Models

models

Methods

Create Model ->
post/v4/models

Description

Creates and hosts a model based on a model template.

Base embedding models, chunk ranking functions, and LLMs are often not sufficient for customer use cases. We have shown in various blogs that fine-tuning these models on customer data can lead to significant improvements in performance.

  1. We Fine-Tuned GPT-4 to Beat the Industry Standard for Text2SQL
  2. OpenAI Names Scale as Preferred Partner to Fine-Tune GPT-3.5
  3. How to Fine-Tune GPT-3.5 Turbo With OpenAI API

Details

Before creating a model, you must first create a model template. A model template serves two purposes: it provides common scaffolding that is static across multiple models, and it exposes variables that can be injected at model creation time to customize the model.

For example, a model template can define a docker image that contains code to run a HuggingFace or SentenceTransformers model. The docker image also accepts environment variables that can be set to swap out the model weights or model name. Refer to the Create Model Template API for more details.

To create a new model, users must refer to an existing model template and provide the parameters that the model template requires in its model_creation_parameters_schema field. The combination of the model template and the model creation parameters is used to create and deploy a new model.

Once a model has been created, it can be executed by calling the Execute Model API.
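For illustration, a minimal Python sketch of a Create Model call using the requests library. The base URL, auth header, and the model_template_id / model_creation_parameters field names are assumptions, not documented values; consult the request schema for the exact fields.

import requests

# Hypothetical base URL and API key; replace with values for your environment.
BASE_URL = "https://api.egp.example.com"
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

# The field names below are illustrative assumptions; see the request schema.
payload = {
    "account_id": "acct_123",
    "name": "my-custom-embedding-model",
    "model_template_id": "tmpl_abc",
    # These values must satisfy the template's model_creation_parameters_schema.
    "model_creation_parameters": {"model_weights_uri": "s3://bucket/weights"},
}

response = requests.post(f"{BASE_URL}/v4/models", headers=HEADERS, json=payload)
response.raise_for_status()
model = response.json()
print(model["id"])  # id is a documented field on ModelInstance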

Coming Soon

Some of our EGP APIs depend on models. For example, Knowledge Base APIs depend on embedding models, Chunk Ranking APIs depend on ranking models, and Completion APIs depend on LLMs.

In the near future, if a model is created from a model template that is compatible with one of these APIs (based on the model template's model_type field), the model will automatically be registered with that API. This will allow users to start using the model with those APIs immediately, without any additional setup.

Delete Model ->
delete/v4/models/{model_id}

Description

Deletes a model

Details

This API can be used to delete a model by ID. To use this API, pass in the id that was returned from your Create Model API call as a path parameter.
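A minimal sketch of the delete call from Python, assuming the same hypothetical base URL and bearer-token auth header:

import requests

BASE_URL = "https://api.egp.example.com"  # hypothetical
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

model_id = "model_123"  # id returned by Create Model
response = requests.delete(f"{BASE_URL}/v4/models/{model_id}", headers=HEADERS)
response.raise_for_status()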

List Models -> PageResponse<>
get/v4/models

Description

Lists all models accessible to the user.

Details

This API can be used to list models. If a user has access to multiple accounts, all models from all accounts the user is associated with will be returned.
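A sketch of listing models and reading the pagination fields documented on PaginatedModelInstanceWithViews (items, current_page, items_per_page); the base URL and auth header are assumptions:

import requests

BASE_URL = "https://api.egp.example.com"  # hypothetical
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

response = requests.get(f"{BASE_URL}/v4/models", headers=HEADERS)
response.raise_for_status()
page = response.json()  # PaginatedModelInstanceWithViews
print(page["current_page"], page["items_per_page"])
for model in page["items"]:
    print(model["id"], model["created_at"])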

Get Model ->
get/v4/models/{model_id}

Description

Gets the details of a model

Details

This API can be used to get information about a single model by ID. To use this API, pass in the id that was returned from your Create Model API call as a path parameter.

Review the response schema to see the fields that will be returned.

Update Model ->
patch/v4/models/{model_id}

Description

Updates a model

Details

This API can be used to update the model that matches the ID that was passed in as a path parameter. To use this API, pass in the id that was returned from your Create Model API call as a path parameter.

Review the request schema to see the fields that can be updated.
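A sketch of a partial update; the name field used here is purely hypothetical, so check the request schema for the fields that are actually patchable:

import requests

BASE_URL = "https://api.egp.example.com"  # hypothetical
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

model_id = "model_123"  # id returned by Create Model
patch = {"name": "renamed-model"}  # hypothetical field; see the request schema
response = requests.patch(f"{BASE_URL}/v4/models/{model_id}", headers=HEADERS, json=patch)
response.raise_for_status()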

Domain types

EmbeddingResponse = { embeddings, tokens_used }
GenericModelResponse = { error_message, status, status_code }
ModelInstance = { id, account_id, created_at, 17 more... }
ModelInstanceWithViews = { id, account_id, created_at, 19 more... }
PaginatedModelInstanceWithViews = { current_page, items, items_per_page, 1 more... }
ParameterBindings = { bindings }
RerankingResponse = { chunk_scores, tokens_used }
Models

Chat Completions

models.chat_completions

Methods

Generate Chat Completion ->
post/v4/models/{model_deployment_id}/chat-completions

Description

Interact with the LLM using the specified model_deployment_id. You can include a list of messages as the conversation history; the conversation can feature multiple messages from the user, assistant, and system roles. If the chosen model does not support chat completion, the API falls back to simple completion and disregards the provided history. The endpoint handles context-length overflow optimistically: it estimates the token count of the provided history and prompt, and if that estimate exceeds the context window or approaches 80% of it, the exact token count is calculated and the history is trimmed to fit the context. An example request:

{
    "prompt": "Generate 5 more",
    "chat_history": [
        { "role": "system", "content": "You are a name generator. Do not generate anything else than names" },
        { "role": "user", "content": "Generate 5 names" },
        { "role": "assistant", "content": "1. Olivia Bennett\n2. Ethan Carter\n3. Sophia Ramirez\n4. Liam Thompson\n5. Ava Mitchell" }
    ]
}
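For illustration, the same request issued from Python with the requests library; the base URL and bearer-token auth header are assumptions, while prompt and chat_history follow the example body above:

import requests

BASE_URL = "https://api.egp.example.com"  # hypothetical
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

model_deployment_id = "deployment_123"
payload = {
    "prompt": "Generate 5 more",
    "chat_history": [
        {"role": "system", "content": "You are a name generator. Do not generate anything else than names"},
        {"role": "user", "content": "Generate 5 names"},
        {"role": "assistant", "content": "1. Olivia Bennett\n2. Ethan Carter\n3. Sophia Ramirez\n4. Liam Thompson\n5. Ava Mitchell"},
    ],
}
response = requests.post(
    f"{BASE_URL}/v4/models/{model_deployment_id}/chat-completions",
    headers=HEADERS,
    json=payload,
)
response.raise_for_status()
print(response.json())  # see the endpoint's response schema for the returned fields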
Models

Completions

models.completions

Methods

Generate Completion ->
post/v4/models/{model_deployment_id}/completions

Description

Interact with the LLM using the specified model_deployment_id. The LLM will generate a text completion based on the provided prompt. An example request:

{
    "prompt": "What is the capital of France?"
}
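The equivalent call for the completions endpoint, under the same assumptions about the base URL and authentication:

import requests

BASE_URL = "https://api.egp.example.com"  # hypothetical
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

model_deployment_id = "deployment_123"
response = requests.post(
    f"{BASE_URL}/v4/models/{model_deployment_id}/completions",
    headers=HEADERS,
    json={"prompt": "What is the capital of France?"},
)
response.raise_for_status()
print(response.json())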
Models

Deployments

models.deployments

Methods

Deploy Model ->
post/v4/models/{model_instance_id}/deployments

Description

Model Deployments are unique endpoints created for custom models in the Scale GenAI Platform. They enable users to interact with and utilize specific instances of models through the API/SDK. Each deployment is associated with a model instance, which contains the necessary model template and model metadata. Model templates describe the creation parameters that are configured on the deployment. Model deployments provide a means to call models for inference, log calls, and monitor usage.

Built-in models also have deployments, so that all models share a consistent interface. These deployments do not represent real deployments; they are simply a way to interact with the built-in models. They are created automatically when the model is created and are immutable.

Endpoint details

This endpoint is used to deploy a model instance. The request payload schema depends on the model_request_parameters_schema of the Model Template from which the model was created.
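A sketch of deploying a model instance; the body shown here is a hypothetical placeholder, since the accepted fields are dictated by the template's model_request_parameters_schema:

import requests

BASE_URL = "https://api.egp.example.com"  # hypothetical
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

model_instance_id = "model_123"
# Hypothetical body; the real fields come from model_request_parameters_schema.
payload = {"name": "my-deployment"}
response = requests.post(
    f"{BASE_URL}/v4/models/{model_instance_id}/deployments",
    headers=HEADERS,
    json=payload,
)
response.raise_for_status()
deployment = response.json()  # ModelDeployment
print(deployment["id"])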

Delete Deployment ->
delete/v4/models/{model_instance_id}/deployments/{deployment_id}

Description

Deletes a deployment

Details

This API can be used to delete a deployment by ID. To use this API, pass in the id that was returned from your Deploy Model API call as a path parameter.

Execute Model Deployment ->
post/v4/models/{model_instance_id}/deployments/{model_deployment_id}/execute

Execute Model Deployment

List Model Deployments Of A Model -> PageResponse<>
get/v4/models/{model_instance_id}/deployments

TODO: Document

List All Model Deployments -> PageResponse<>
get/v4/model-deployments

TODO: Document

Get Deployment ->
get/v4/models/{model_instance_id}/deployments/{deployment_id}

Description

Gets the details of a deployment

Details

This API can be used to get information about a single deployment by ID. To use this API, pass in the id that was returned from your Deploy Model API call as a path parameter.

Review the response schema to see the fields that will be returned.

Update Deployment ->
patch/v4/models/{model_instance_id}/deployments/{deployment_id}

Description

Updates a deployment

Details

This API can be used to update the deployment that matches the ID that was passed in as a path parameter. To use this API, pass in the id that was returned from your Deploy Model API call as a path parameter.

Review the request schema to see the fields that can be updated.

Domain types

ModelDeployment = { id, account_id, created_at, 8 more... }
PaginatedModelDeployments = { current_page, items, items_per_page, 1 more... }
Models Deployments

Usage Statistics

models.deployments.usage_statistics

Methods

Get Model Usage For One Deployment ->
get/v4/model-deployments/{model_deployment_id}/usage-statistics

Gets model usage for one deployment

Models

Embeddings

models.embeddings

Methods

Generate Text Embedding ->
post/v4/models/{model_deployment_id}/embeddings

Description

Computes the text embeddings for text fragments using the model with the given model_deployment_id.

Details

Users can use this API to execute any EMBEDDING-type EGP model they have access to. To use this API, pass in the id of a model returned by the Create Model API. An example text embedding request:

{
    "texts": ["Please compute my embedding vector", "Another text fragment"]
}
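A sketch of the same request from Python, reading back the documented EmbeddingResponse fields (embeddings, tokens_used); the base URL and auth header are assumptions:

import requests

BASE_URL = "https://api.egp.example.com"  # hypothetical
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}

model_deployment_id = "deployment_123"
response = requests.post(
    f"{BASE_URL}/v4/models/{model_deployment_id}/embeddings",
    headers=HEADERS,
    json={"texts": ["Please compute my embedding vector", "Another text fragment"]},
)
response.raise_for_status()
result = response.json()  # EmbeddingResponse
print(len(result["embeddings"]), result["tokens_used"])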
Models

Rerankings

models.rerankings

Methods

Generate Reranking ->
post/v4/models/{model_deployment_id}/rerankings

Description

TODO: Documentation

Models

Usage Statistics

models.usage_statistics

Methods

Get Model Usage By Model Name ->
get/v4/models/{model_name}/usage-statistics

Gets model usage by model name