API
Chat Completions
POST /v1/chat/completions
Generates a chat completion for the given messages. The endpoint is compatible with the OpenAI chat completions API: request and response bodies use the same format.
Request Body
{
  "model": "auto",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in simple terms."}
  ],
  "stream": true,
  "temperature": 0.7,
  "max_completion_tokens": 4096
}
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID or "auto" for smart routing. See Models. |
| messages | array | Yes | Conversation messages. See Message Format below. |
| stream | boolean | No | If true, returns Server-Sent Events. Default false. |
| temperature | number | No | Sampling temperature (0–2). Omit to use the model's default. |
| max_completion_tokens | integer | No | Maximum tokens in the response. The deprecated max_tokens is also accepted. |
| top_p | number | No | Nucleus sampling parameter. |
| frequency_penalty | number | No | Penalize repeated tokens (-2.0 to 2.0). |
| presence_penalty | number | No | Penalize tokens already present (-2.0 to 2.0). |
| stop | string or array | No | Up to 4 sequences where the model stops generating. |
| tools | array | No | Tool definitions for function calling. See Tool Calling. |
| tool_choice | string or object | No | Controls tool usage: "auto", "none", "required", or a specific function. |
| stream_options | object | No | {"include_usage": true} to receive token counts in the final streaming chunk. |
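The parameter table can be mirrored in a small client-side helper. The sketch below is one possible way to assemble a request body in Python; the function name `build_chat_payload` is hypothetical, and the range checks simply restate the limits listed in the table (the server remains the authority on validation).

```python
import json
from typing import Any, Optional


def build_chat_payload(
    messages: list[dict[str, Any]],
    model: str = "auto",
    stream: bool = False,
    temperature: Optional[float] = None,
    max_completion_tokens: Optional[int] = None,
    stop: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Assemble a request body, leaving out optional fields that were not set."""
    if not messages:
        raise ValueError("messages is required")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if stop is not None and len(stop) > 4:
        raise ValueError("stop accepts at most 4 sequences")

    payload: dict[str, Any] = {"model": model, "messages": messages, "stream": stream}
    optional = {
        "temperature": temperature,
        "max_completion_tokens": max_completion_tokens,
        "stop": stop,
    }
    # Omitted optional fields fall back to the server or model defaults.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


body = build_chat_payload(
    [{"role": "user", "content": "Explain quantum computing in simple terms."}],
    temperature=0.7,
    max_completion_tokens=4096,
)
print(json.dumps(body, indent=2))
```

Leaving unset fields out of the body, rather than sending null, keeps the request aligned with the "omit to use the model's default" behavior described for temperature.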
Message Format
Messages follow the OpenAI conversation format:
User Message
{"role": "user", "content": "What is the capital of France?"}
Content can be a string or an array of content parts (text, images):
{
  "role": "user",
  "content": [
    {"type": "text", "text": "What's in this image?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
  ]
}
System Message
{"role": "system", "content": "You are a helpful coding assistant."}
Assistant Message
{"role": "assistant", "content": "The capital of France is Paris."}
Tool Messages
After receiving a tool call from the assistant, send the result back:
{"role": "tool", "tool_call_id": "call_abc123", "content": "{\"result\": 42}"}
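As a rough illustration of the message shapes above, the following Python helpers build a user message (plain string or text plus image parts) and a tool message. The helper names `user_message` and `tool_message` are hypothetical and not part of the API.

```python
import json
from typing import Any, Optional


def user_message(text: str, image_url: Optional[str] = None) -> dict[str, Any]:
    """Plain string content, or a list of content parts when an image is attached."""
    if image_url is None:
        return {"role": "user", "content": text}
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


def tool_message(tool_call_id: str, result: Any) -> dict[str, str]:
    """Tool results travel as a JSON-encoded string in the content field."""
    return {"role": "tool", "tool_call_id": tool_call_id, "content": json.dumps(result)}


plain = user_message("What is the capital of France?")
with_image = user_message("What's in this image?", "https://example.com/photo.jpg")
tool = tool_message("call_abc123", {"result": 42})
print(tool["content"])
```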
Response (Non-Streaming)
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1712100000,
  "model": "deepseek.v3.2",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum bits (qubits)..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}
The model field in the response tells you which model actually handled the request, which is useful when you sent "auto".
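A minimal Python sketch of reading the fields discussed above from a non-streaming response. The function name `summarize_completion` is hypothetical; the sample body is the response example from this section.

```python
import json
from typing import Any


def summarize_completion(response: dict[str, Any]) -> dict[str, Any]:
    """Pull the routed model, the first choice's text, and the token total."""
    choice = response["choices"][0]
    return {
        "model": response["model"],
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": response["usage"]["total_tokens"],
    }


raw = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1712100000,
  "model": "deepseek.v3.2",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Quantum computing uses quantum bits (qubits)..."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 25, "completion_tokens": 150, "total_tokens": 175}
}
"""
summary = summarize_completion(json.loads(raw))
print(summary["model"])
```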
Streaming
Set stream: true to receive Server-Sent Events. Each event is a line starting with the prefix "data: " followed by a JSON chunk object. The stream ends with data: [DONE]:
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1712100000,"model":"deepseek.v3.2","choices":[{"index":0,"delta":{"role":"assistant","content":"Quantum"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1712100000,"model":"deepseek.v3.2","choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}
data: [DONE]
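One way to consume the stream is to read it line by line, strip the data: prefix, and concatenate the content deltas. The sketch below works on any iterable of text lines, so it does not assume a particular HTTP client; the function names `iter_chunks` and `collect_text` are hypothetical, and the sample lines are shortened versions of the events shown above.

```python
import json
from typing import Any, Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield decoded chunk objects from SSE lines until the [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the content deltas of every choice in every chunk."""
    parts: list[str] = []
    for chunk in iter_chunks(lines):
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)


sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"Quantum"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]
text = collect_text(sample)
print(text)
```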
Getting Usage in Streams
Add stream_options: {"include_usage": true} to receive a final chunk with token counts before [DONE]:
{
  "stream_options": {"include_usage": true}
}
The usage chunk has an empty choices array and a populated usage object.
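Given already-decoded chunks, the usage chunk can be picked out by the rule just stated: empty choices, populated usage. This is a sketch only; `find_usage` is a hypothetical name, and the token counts in the sample are illustrative values, not real output.

```python
from typing import Any, Iterable, Optional


def find_usage(chunks: Iterable[dict[str, Any]]) -> Optional[dict[str, int]]:
    """Return the usage object from the chunk with empty choices, if one arrives."""
    usage = None
    for chunk in chunks:
        if not chunk.get("choices") and chunk.get("usage"):
            usage = chunk["usage"]
    return usage


chunks = [
    {"choices": [{"index": 0, "delta": {"content": "Quantum"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
    # Illustrative usage chunk: empty choices, populated usage.
    {"choices": [], "usage": {"prompt_tokens": 25, "completion_tokens": 150, "total_tokens": 175}},
]
usage = find_usage(chunks)
print(usage)
```

Returning None when no usage chunk arrives lets callers handle streams that were requested without stream_options.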
Tool Calling
Define tools in the request, and the model can choose to call them.
Defining Tools
{
  "model": "auto",
  "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string", "description": "City name"}
          },
          "required": ["city"]
        }
      }
    }
  ]
}
Tool Call Response
When the model decides to call a tool, the response looks like:
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\": \"Tokyo\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
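On the client side, each entry in tool_calls has to be mapped to a local function and its arguments string decoded. The following Python sketch shows one possible dispatcher; `TOOLS`, `run_tool_calls`, and the stubbed `get_weather` body are hypothetical stand-ins, not part of the API.

```python
import json
from typing import Any, Callable


def get_weather(city: str) -> dict[str, Any]:
    """Stand-in implementation; a real one would query a weather service."""
    return {"temp": 22, "condition": "sunny"}


# Registry mapping tool names from the request to local callables.
TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(message: dict[str, Any]) -> list[dict[str, str]]:
    """Execute each requested tool and return the matching tool messages."""
    results = []
    for call in message.get("tool_calls") or []:
        function = call["function"]
        args = json.loads(function["arguments"])  # arguments arrive as a JSON string
        output = TOOLS[function["name"]](**args)
        results.append(
            {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(output)}
        )
    return results


assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_abc123",
            "type": "function",
            "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"},
        }
    ],
}
tool_messages = run_tool_calls(assistant_message)
print(tool_messages)
```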
Sending Tool Results
Send the tool result back in the next request to continue the conversation:
{
  "model": "auto",
  "messages": [
    {"role": "user", "content": "What's the weather in Tokyo?"},
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "id": "call_abc123",
          "type": "function",
          "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"}
        }
      ]
    },
    {
      "role": "tool",
      "tool_call_id": "call_abc123",
      "content": "{\"temp\": 22, \"condition\": \"sunny\"}"
    }
  ]
}
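Before sending the follow-up request, it can help to confirm that every tool call issued by the assistant has a tool message with the same tool_call_id. The check below is a hypothetical client-side safeguard (`missing_tool_results` is not part of the API), and it assumes, as in the OpenAI format, that each call should be answered.

```python
from typing import Any


def missing_tool_results(messages: list[dict[str, Any]]) -> list[str]:
    """Return tool_call ids that the assistant issued but no tool message answers."""
    requested: list[str] = []
    answered: set[str] = set()
    for msg in messages:
        if msg["role"] == "assistant":
            requested += [call["id"] for call in msg.get("tool_calls") or []]
        elif msg["role"] == "tool":
            answered.add(msg["tool_call_id"])
    return [call_id for call_id in requested if call_id not in answered]


messages = [
    {"role": "user", "content": "What's the weather in Tokyo?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_abc123",
                "type": "function",
                "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"},
            }
        ],
    },
    {
        "role": "tool",
        "tool_call_id": "call_abc123",
        "content": "{\"temp\": 22, \"condition\": \"sunny\"}",
    },
]
print(missing_tool_results(messages))
```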
Provider Metadata
Responses include provider metadata so you can maintain conversation continuity. When a response includes provider_metadata, echo it back in the metadata field of the next request. This keeps multi-turn tool conversations on the same provider.
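The echo step might look like the Python sketch below. The exact location of these fields is not specified here, so the sketch assumes that provider_metadata is a top-level field of the response and that metadata is a top-level field of the request body; the helper name `carry_provider_metadata` and the sample metadata value are hypothetical.

```python
from typing import Any


def carry_provider_metadata(
    response: dict[str, Any], next_request: dict[str, Any]
) -> dict[str, Any]:
    """Copy provider_metadata from a response into the next request's metadata field."""
    # Assumption: provider_metadata sits at the top level of the response body.
    provider_metadata = response.get("provider_metadata")
    if provider_metadata is None:
        return next_request
    # Assumption: metadata is a top-level field of the request body.
    return {**next_request, "metadata": provider_metadata}


previous_response = {
    "id": "chatcmpl-abc123",
    "provider_metadata": {"provider": "example-provider"},  # illustrative value
}
next_request = {"model": "auto", "messages": [{"role": "user", "content": "And tomorrow?"}]}
prepared = carry_provider_metadata(previous_response, next_request)
print(prepared["metadata"])
```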
Error Responses
Errors follow the OpenAI error format:
{
  "error": {
    "message": "Invalid token",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}
| HTTP Status | Code | Meaning |
|---|---|---|
| 400 | invalid_request | Malformed request body or invalid parameters |
| 401 | invalid_api_key | Missing or invalid JWT token |
| 405 | method_not_allowed | Wrong HTTP method (must be POST) |
| 429 | USAGE_LIMIT_EXCEEDED | Budget exhausted. See Usage & Limits. |
| 500 | internal_error | Server error |
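A client can turn the error body and status code into a single exception type. The sketch below is one possible approach; `ApiError`, `parse_error`, and `should_retry` are hypothetical names, and the retry policy (retry only on 500) is a design choice of this example, not documented behavior.

```python
import json


class ApiError(Exception):
    """Carries the HTTP status plus the code and message from the error body."""

    def __init__(self, status: int, code: str, message: str) -> None:
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message


def parse_error(status: int, body: str) -> ApiError:
    """Build an ApiError from a response body, tolerating non-JSON bodies."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        error = {}
    return ApiError(status, error.get("code", "unknown"), error.get("message", body))


def should_retry(error: ApiError) -> bool:
    # Example policy: a 429 here means the budget is exhausted, so only 500 is retried.
    return error.status == 500


body = '{"error": {"message": "Invalid token", "type": "invalid_request_error", "code": "invalid_api_key"}}'
err = parse_error(401, body)
print(err)
```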
Headers
Request Headers
| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer <jwt-token> |
| Content-Type | Yes | application/json |
| X-Bot-ID | No | Identifies your application for per-app usage tracking. Defaults to "default". |
| X-Provider-API-Key | No | Your own provider API key for BYOK. |
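Putting the headers together with the endpoint path, a request could be prepared with Python's standard library as sketched below. The host in `BASE_URL` is a placeholder because the real base URL is not given here, and `build_headers` and `build_request` are hypothetical helper names. The request is only constructed, not sent.

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "https://api.example.com"  # placeholder host; substitute the real base URL


def build_headers(
    token: str, bot_id: Optional[str] = None, provider_api_key: Optional[str] = None
) -> dict[str, str]:
    """Required headers plus the optional ones when values are supplied."""
    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
    if bot_id:
        headers["X-Bot-ID"] = bot_id
    if provider_api_key:
        headers["X-Provider-API-Key"] = provider_api_key
    return headers


def build_request(
    token: str, payload: dict[str, Any], bot_id: Optional[str] = None
) -> urllib.request.Request:
    """Prepare a POST to /v1/chat/completions without sending it."""
    return urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=build_headers(token, bot_id),
        method="POST",
    )


headers = build_headers("test-token", bot_id="my-app")
request = build_request(
    "test-token",
    {"model": "auto", "messages": [{"role": "user", "content": "Hello"}]},
    bot_id="my-app",
)
print(request.get_method(), request.full_url)
```

Passing the prepared object to urllib.request.urlopen would send it; any HTTP client that lets you set these headers works equally well.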
Response Headers
Rate limit headers are included on every response. See Usage & Limits for the full list.
Next Steps
- Models — available models and how smart routing works
- Usage & Limits — understand your budget and rate limits
- Bring Your Own Key — use your own provider API keys