Chat Completions

POST /v1/chat/completions

Generates a chat completion for the given messages. Fully compatible with the OpenAI chat completions API — request and response formats are identical.

Request Body

{
  "model": "auto",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in simple terms."}
  ],
  "stream": true,
  "temperature": 0.7,
  "max_completion_tokens": 4096
}

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID or "auto" for smart routing. See Models. |
| messages | array | Yes | Conversation messages. See Message Format below. |
| stream | boolean | No | If true, returns Server-Sent Events. Default false. |
| temperature | number | No | Sampling temperature (0–2). Omit to use the model's default. |
| max_completion_tokens | integer | No | Maximum tokens in the response. Also accepts deprecated max_tokens. |
| top_p | number | No | Nucleus sampling parameter. |
| frequency_penalty | number | No | Penalize repeated tokens (-2.0 to 2.0). |
| presence_penalty | number | No | Penalize tokens already present (-2.0 to 2.0). |
| stop | string or array | No | Up to 4 sequences where the model stops generating. |
| tools | array | No | Tool definitions for function calling. See Tool Calling. |
| tool_choice | string or object | No | Controls tool usage: "auto", "none", "required", or a specific function. |
| stream_options | object | No | {"include_usage": true} to receive token counts in the final streaming chunk. |
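
The ranges in the table can be checked client-side before a request is sent. The following is a minimal sketch, not part of the API: build_chat_request is a hypothetical helper, and the only limits it enforces are the ones stated in the table above.

```python
from typing import Any

# Ranges copied from the parameter table; the helper itself is hypothetical.
_RANGES = {
    "temperature": (0.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}


def build_chat_request(
    messages: list[dict[str, Any]], model: str = "auto", **options: Any
) -> dict[str, Any]:
    """Return a request body, rejecting values outside the documented ranges."""
    for name, (low, high) in _RANGES.items():
        value = options.get(name)
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    stop = options.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        raise ValueError("stop accepts at most 4 sequences")
    # Drop unset options so the server applies its own defaults.
    body: dict[str, Any] = {"model": model, "messages": messages}
    body.update({key: value for key, value in options.items() if value is not None})
    return body
```

Passing temperature=None leaves the field out of the body entirely, which matches the table's advice to omit it when the model's default is wanted.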

Message Format

Messages follow the OpenAI conversation format:

User Message

{"role": "user", "content": "What is the capital of France?"}

Content can be a string or an array of content parts (text, images):

{
  "role": "user",
  "content": [
    {"type": "text", "text": "What's in this image?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
  ]
}
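
As an illustration of the two content shapes, the sketch below builds either form from the same inputs. The function name user_message is hypothetical; the field names are the ones shown in the examples above.

```python
def user_message(text: str, image_urls: tuple[str, ...] = ()) -> dict:
    """Build a user message, using plain string content when there are no images."""
    if not image_urls:
        return {"role": "user", "content": text}
    # With images, content becomes a list of typed parts: text first, then images.
    parts: list[dict] = [{"type": "text", "text": text}]
    parts += [{"type": "image_url", "image_url": {"url": url}} for url in image_urls]
    return {"role": "user", "content": parts}
```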

System Message

{"role": "system", "content": "You are a helpful coding assistant."}

Assistant Message

{"role": "assistant", "content": "The capital of France is Paris."}

Tool Messages

After receiving a tool call from the assistant, send the result back:

{"role": "tool", "tool_call_id": "call_abc123", "content": "{\"result\": 42}"}

Response (Non-Streaming)

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1712100000,
  "model": "deepseek.v3.2",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum bits (qubits)..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}

The model field in the response tells you which model actually handled the request — useful when you sent "auto".
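
A small sketch of reading the fields shown above from a decoded response body. The helper name summarize_completion and the keys of the returned dictionary are illustrative choices, not part of the API.

```python
def summarize_completion(response: dict) -> dict:
    """Extract the reply text, finish reason, serving model, and token total."""
    choice = response["choices"][0]
    usage = response.get("usage", {})
    return {
        "text": choice["message"].get("content"),
        "finish_reason": choice["finish_reason"],
        # The model that actually handled the request, e.g. after sending "auto".
        "served_by": response["model"],
        "total_tokens": usage.get("total_tokens"),
    }
```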

Streaming

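Set stream: true to receive Server-Sent Events. Each event is a line that starts with "data: " followed by a JSON chunk object. The stream ends with the sentinel line data: [DONE], which is not JSON: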

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1712100000,"model":"deepseek.v3.2","choices":[{"index":0,"delta":{"role":"assistant","content":"Quantum"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1712100000,"model":"deepseek.v3.2","choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}

data: [DONE]
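
The event format above can be consumed line by line. This is a minimal sketch that assumes the lines have already been read from the HTTP response as text; iter_chunks and collect_text are hypothetical names.

```python
import json
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded chunks from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the content deltas of the first choice."""
    pieces = []
    for chunk in iter_chunks(lines):
        for choice in chunk.get("choices", []):
            if choice.get("index") == 0:
                # A delta may carry only a role, so content can be missing.
                pieces.append(choice.get("delta", {}).get("content") or "")
    return "".join(pieces)
```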

Getting Usage in Streams

Add stream_options: {"include_usage": true} to receive a final chunk with token counts before [DONE]:

{
  "stream_options": {"include_usage": true}
}

The usage chunk has an empty choices array and a populated usage object.
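
Given already-decoded chunks, the usage chunk can be picked out by the shape described above. A short sketch; find_usage is a hypothetical helper.

```python
from typing import Iterable, Optional


def find_usage(chunks: Iterable[dict]) -> Optional[dict]:
    """Return the usage object from the chunk with empty choices, if any."""
    usage = None
    for chunk in chunks:
        # The usage chunk is identified by an empty choices array plus usage data.
        if not chunk.get("choices") and chunk.get("usage"):
            usage = chunk["usage"]
    return usage
```

The function returns None when stream_options was not set, so callers can treat token counts as optional.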

Tool Calling

Define tools in the request and the model can choose to call them:

Defining Tools

{
  "model": "auto",
  "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string", "description": "City name"}
          },
          "required": ["city"]
        }
      }
    }
  ]
}

Tool Call Response

When the model decides to call a tool, the response looks like:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\": \"Tokyo\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}

Sending Tool Results

Send the tool result back in the next request to continue the conversation:

{
  "model": "auto",
  "messages": [
    {"role": "user", "content": "What's the weather in Tokyo?"},
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "id": "call_abc123",
          "type": "function",
          "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"}
        }
      ]
    },
    {
      "role": "tool",
      "tool_call_id": "call_abc123",
      "content": "{\"temp\": 22, \"condition\": \"sunny\"}"
    }
  ]
}
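
The round trip above can be automated on the client. The sketch below turns an assistant message containing tool_calls into the tool messages for the next request. The names run_tool_calls and registry are hypothetical, and encoding each result as JSON is an assumption based on the example content above.

```python
import json
from typing import Callable


def run_tool_calls(
    assistant_message: dict, registry: dict[str, Callable[..., object]]
) -> list[dict]:
    """Execute each requested tool and return the matching tool messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        function = call["function"]
        # Arguments arrive as a JSON-encoded string, not as an object.
        arguments = json.loads(function["arguments"])
        output = registry[function["name"]](**arguments)
        results.append(
            {
                "role": "tool",
                "tool_call_id": call["id"],  # must match the id of the call
                "content": json.dumps(output),
            }
        )
    return results
```

Append the assistant message and the returned tool messages to messages, in that order, before sending the follow-up request.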

Provider Metadata

Responses include provider metadata so you can maintain conversation continuity. When a response contains provider_metadata, echo it back unchanged in the next request's metadata field. This keeps multi-turn tool conversations on the same provider.
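
A sketch of the echo step, with two loud assumptions: provider_metadata is read from the top level of the response body, and it is written to a top-level metadata field of the request, replacing whatever was there. The text above does not spell out either location, and with_provider_metadata is a hypothetical helper.

```python
import copy


def with_provider_metadata(next_request: dict, previous_response: dict) -> dict:
    """Return a copy of next_request with provider_metadata echoed into metadata."""
    request = copy.deepcopy(next_request)
    # Assumption: provider_metadata sits at the top level of the response body.
    provider_metadata = previous_response.get("provider_metadata")
    if provider_metadata is not None:
        # Assumption: the value is echoed as-is into a top-level metadata field.
        request["metadata"] = copy.deepcopy(provider_metadata)
    return request
```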

Error Responses

Errors follow the OpenAI error format:

{
  "error": {
    "message": "Invalid token",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}

| HTTP Status | Code | Meaning |
|---|---|---|
| 400 | invalid_request | Malformed request body or invalid parameters |
| 401 | invalid_api_key | Missing or invalid JWT token |
| 405 | method_not_allowed | Wrong HTTP method (must be POST) |
| 429 | USAGE_LIMIT_EXCEEDED | Budget exhausted; see Usage & Limits |
| 500 | internal_error | Server error |
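
One way to surface these errors in client code is to wrap the status and error body in an exception. This is a sketch only: ApiError and error_from_response are hypothetical names, and the budget check relies solely on the 429 row of the table above.

```python
class ApiError(Exception):
    """Carries the HTTP status plus the code and message from the error body."""

    def __init__(self, status: int, code: str, message: str) -> None:
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message

    @property
    def budget_exhausted(self) -> bool:
        # Per the table, 429 signals an exhausted budget.
        return self.status == 429


def error_from_response(status: int, body: dict) -> ApiError:
    """Build an ApiError from a decoded error response body."""
    error = body.get("error") or {}
    return ApiError(status, str(error.get("code", "")), str(error.get("message", "")))
```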

Headers

Request Headers

| Header | Required | Description |
|---|---|---|
| Authorization | Yes | Bearer <jwt-token> |
| Content-Type | Yes | application/json |
| X-Bot-ID | No | Identifies your application for per-app usage tracking. Defaults to "default". |
| X-Provider-API-Key | No | Your own provider API key for BYOK. |
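
The request headers can be assembled as follows. A minimal sketch: build_headers and its parameter names are hypothetical, and the optional headers are sent only when a value is supplied.

```python
from typing import Optional


def build_headers(
    jwt_token: str,
    bot_id: Optional[str] = None,
    provider_api_key: Optional[str] = None,
) -> dict[str, str]:
    """Return the required headers plus any optional ones that were supplied."""
    headers = {
        "Authorization": f"Bearer {jwt_token}",
        "Content-Type": "application/json",
    }
    if bot_id is not None:
        headers["X-Bot-ID"] = bot_id  # omitted, the server uses "default"
    if provider_api_key is not None:
        headers["X-Provider-API-Key"] = provider_api_key  # BYOK
    return headers
```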

Response Headers

Rate limit headers are included on every response. See Usage & Limits for the full list.
