Models
models
Methods
Description
Creates and hosts a model based on a model template.
Base embedding models, chunk ranking functions, and LLMs are often not sufficient for customer use cases. We have shown in various blogs that fine-tuning these models on customer data can lead to significant improvements in performance.
- We Fine-Tuned GPT-4 to Beat the Industry Standard for Text2SQL
- OpenAI Names Scale as Preferred Partner to Fine-Tune GPT-3.5
- How to Fine-Tune GPT-3.5 Turbo With OpenAI API
Details
Before creating a model, you must first create a model template. A model template serves two purposes. First, it provides common scaffolding that is static across multiple models. Second, it exposes several variables that can be injected at model creation time to customize the model.
For example, a model template can define a Docker image that contains code to run a HuggingFace or SentenceTransformers model. The code in this Docker image also accepts environment variables that can be set to swap out the model weights or model name. Refer to the Create Model Template API for more details.
To create a new model, users must refer to an existing model template and provide the parameters that the model template requires in its model_creation_parameters_schema field. The combination of the model template and the model creation parameters is used to create and deploy a new model.
Once a model has been created, it can be executed by calling the Execute Model API.
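The relationship between a model template and its creation parameters can be illustrated with a minimal sketch. Only the model_creation_parameters_schema field name comes from this page. The layout of the schema (a "parameters" list with name, type, and required keys), the template contents, and the check_creation_parameters helper are assumptions made for illustration and may not match the real API.

```python
from typing import Any, Dict, List

# Hypothetical template. Only the key "model_creation_parameters_schema"
# is taken from the documentation; its inner layout is an assumption.
TEMPLATE: Dict[str, Any] = {
    "name": "sentence-transformers-template",
    "model_creation_parameters_schema": {
        "parameters": [
            {"name": "model_name", "type": "str", "required": True},
            {"name": "max_batch_size", "type": "int", "required": False},
        ]
    },
}

# Maps the assumed type labels to Python types.
_TYPES = {"str": str, "int": int, "float": float, "bool": bool}


def check_creation_parameters(
    template: Dict[str, Any], params: Dict[str, Any]
) -> List[str]:
    """Return a list of problems; an empty list means the parameters fit."""
    fields = template["model_creation_parameters_schema"]["parameters"]
    declared = {field["name"]: field for field in fields}
    problems: List[str] = []

    # Every declared field must be present if required, and well typed.
    for name, field in declared.items():
        if name not in params:
            if field.get("required", False):
                problems.append(f"missing required parameter: {name}")
            continue
        if not isinstance(params[name], _TYPES[field["type"]]):
            problems.append(f"{name} should be {field['type']}")

    # Parameters the template does not declare are reported as well.
    for name in params:
        if name not in declared:
            problems.append(f"unknown parameter: {name}")
    return problems
```

Running a check like this on the client side before calling the Create Model API can surface a missing or mistyped parameter early, although the service remains the authority on what it accepts.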
Coming Soon
Some of our EGP APIs depend on models. For example, Knowledge Base APIs depend on embedding models, Chunk Ranking APIs depend on ranking models, and Completion APIs depend on LLMs.
In the near future, if a model is created from a model template that is compatible with one of these APIs (based on the model template's model_type field), the model will automatically be registered with that API. This will allow users to start using the model with those APIs immediately, without any additional setup.
Description
Deletes a model.
Details
This API can be used to delete a model by ID. To use this API, pass in the id that was returned from your Create Model API call as a path parameter.
Description
Lists all models accessible to the user.
Details
This API can be used to list models. If a user has access to multiple accounts, all models from all accounts the user is associated with will be returned.
Description
Gets the details of a model.
Details
This API can be used to get information about a single model by ID. To use this API, pass in the id that was returned from your Create Model API call as a path parameter.
Review the response schema to see the fields that will be returned.
Description
Updates a model.
Details
This API can be used to update the model that matches the ID that was passed in as a path parameter. To use this API, pass in the id that was returned from your Create Model API call as a path parameter.
Review the request schema to see the fields that can be updated.
Domain types
Chat Completions
models.chat_completions
Methods
Description
Interact with the LLM using the specified model_deployment_id. You can include a list of messages as the conversation history. The conversation can contain multiple messages from the roles user, assistant, and system. If the chosen model does not support chat completion, the API falls back to simple completion and disregards the provided history. The endpoint handles context-length overflow optimistically: it first estimates the token count of the provided history and prompt. If the estimate exceeds the context length or approaches 80% of it, the exact token count is calculated and the history is trimmed to fit the context.
{
"prompt": "Generate 5 more",
"chat_history": [
{ "role": "system", "content": "You are a name generator. Do not generate anything else than names" },
{ "role": "user", "content": "Generate 5 names" },
{ "role": "assistant", "content": "1. Olivia Bennett\n2. Ethan Carter\n3. Sophia Ramirez\n4. Liam Thompson\n5. Ava Mitchell" }
]
}
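The optimistic context handling described above can be sketched as follows. This is not the service's implementation. The two token counters are crude stand-ins (roughly four characters per token for the estimate, whitespace-separated words for the "exact" count), and dropping the oldest messages first is an assumption, since this page does not say which messages are trimmed. The function name fit_history and its parameters are hypothetical.

```python
from typing import Dict, List

Message = Dict[str, str]


def estimate_tokens(text: str) -> int:
    """Cheap estimate: assumes roughly 4 characters per token."""
    return len(text) // 4 + 1


def exact_tokens(text: str) -> int:
    """Stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def fit_history(
    prompt: str,
    history: List[Message],
    context_limit: int,
    threshold: float = 0.8,
) -> List[Message]:
    """Return the history, trimmed only if the cheap estimate looks risky."""
    estimated = estimate_tokens(prompt) + sum(
        estimate_tokens(message["content"]) for message in history
    )
    # Optimistic path: well below the limit, so skip exact counting.
    if estimated < threshold * context_limit:
        return list(history)

    # Pessimistic path: count exactly and keep the newest messages that fit.
    budget = context_limit - exact_tokens(prompt)
    kept: List[Message] = []
    for message in reversed(history):
        cost = exact_tokens(message["content"])
        if cost > budget:
            break  # assumption: everything older than this is dropped
        kept.append(message)
        budget -= cost
    kept.reverse()
    return kept
```

The point of the two-stage design is that the cheap estimate is enough for most requests, so the more expensive exact count only runs when the request is close to the limit.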
Completions
models.completions
Methods
Description
Interact with the LLM using the specified model_deployment_id. The LLM generates a text completion based on the provided prompt.
{
"prompt": "What is the capital of France?"
}
Deployments
models.deployments
Methods
Description
Model Deployments are unique endpoints created for custom models in the Scale GenAI Platform. They enable users to interact with and utilize specific instances of models through the API/SDK. Each deployment is associated with a model instance, which contains the necessary model template and model metadata. Model templates describe the creation parameters that are configured on the deployment. Model deployments provide a means to call models for inference, log calls, and monitor usage.
Built-in models also have deployments, which provide a consistent interface across all models. These do not represent real deployments; they are only a way to interact with the built-in models. They are created automatically when the model is created, and they are immutable.
Endpoint details
This endpoint is used to deploy a model instance. The request payload schema depends on the model_request_parameters_schema of the model template from which the model was created.
Description
Deletes a deployment.
Details
This API can be used to delete a deployment by ID. To use this API, pass in the id that was returned from your Create Deployment API call as a path parameter.
Execute Model Deployment
TODO: Document
Description
Gets the details of a deployment.
Details
This API can be used to get information about a single deployment by ID. To use this API, pass in the id that was returned from your Create Deployment API call as a path parameter.
Review the response schema to see the fields that will be returned.
Description
Updates a deployment.
Details
This API can be used to update the deployment that matches the ID that was passed in as a path parameter. To use this API, pass in the id that was returned from your Create Deployment API call as a path parameter.
Review the request schema to see the fields that can be updated.
Domain types
models.deployments.usage_statistics
Methods
Get Model usage for one deployment
Embeddings
models.embeddings
Methods
Description
Computes the text embeddings for text fragments using the model with the given model_deployment_id.
Details
Users can use this API to execute any EMBEDDING-type EGP model they have access to. To use this API, pass in the id of a model returned by the V3 Create Model API. An example text embedding request:
{
"texts": ["Please compute my embedding vector", "Another text fragment"]
}
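When many text fragments need to be embedded, it can help to split them across several requests. The sketch below only builds request bodies in the shape shown above (a "texts" list) and does not call the API. The embedding_payloads helper and the batch_size value are hypothetical; this page does not state any limit on the number of texts per request.

```python
import json
from typing import Iterator, List


def embedding_payloads(texts: List[str], batch_size: int = 2) -> Iterator[str]:
    """Yield JSON request bodies, each holding at most batch_size texts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        # Each body mirrors the documented example: {"texts": [...]}
        yield json.dumps({"texts": texts[start:start + batch_size]})


if __name__ == "__main__":
    fragments = [
        "Please compute my embedding vector",
        "Another text fragment",
        "A third fragment",
    ]
    for body in embedding_payloads(fragments):
        print(body)
```

Each yielded string can then be sent as the body of a separate embeddings request for the chosen model_deployment_id.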
Rerankings
models.rerankings
Methods
Description
TODO: Documentation
Usage Statistics
models.usage_statistics
Methods
Get Model usage by model name