Chat Completions — Nebo Developers

POST /v1/chat/completions

Generates a chat completion for the given messages. The endpoint is compatible with the OpenAI chat completions API: request and response formats are identical, so existing OpenAI client code should work once it points at this endpoint.

Request Body

{
  "model": "auto",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain quantum computing in simple terms."}
  ],
  "stream": true,
  "temperature": 0.7,
  "max_completion_tokens": 4096
}
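As a minimal sketch, the request body above could be assembled and wrapped in an HTTP request using only the Python standard library. The host `https://api.example.com` and the helper name `build_chat_request` are placeholders invented for illustration (this page does not state the API host); substitute the real base URL and a valid JWT.

```python
import json
import urllib.request

# Hypothetical base URL: this page does not state the API host.
BASE_URL = "https://api.example.com"


def build_chat_request(token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/chat/completions."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


body = {
    "model": "auto",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain quantum computing in simple terms."},
    ],
    "stream": False,  # non-streaming here so the reply is a single JSON object
    "temperature": 0.7,
    "max_completion_tokens": 4096,
}
request = build_chat_request("YOUR_JWT", body)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the example testable without network access.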

Parameters

Parameter	Type	Required	Description
`model`	string	Yes	Model ID or `"auto"` for smart routing. See Models.
`messages`	array	Yes	Conversation messages. See Message Format below.
`stream`	boolean	No	If `true`, returns Server-Sent Events. Default `false`.
`temperature`	number	No	Sampling temperature (0–2). Omit to use the model's default.
`max_completion_tokens`	integer	No	Maximum tokens in the response. Also accepts deprecated `max_tokens`.
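`top_p`	number	No	Nucleus sampling: the model samples only from the most likely tokens whose cumulative probability reaches `top_p`.
`frequency_penalty`	number	No	Penalizes tokens in proportion to how often they have already appeared (-2.0 to 2.0).
`presence_penalty`	number	No	Penalizes tokens that have already appeared at least once, regardless of how often (-2.0 to 2.0).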
`stop`	string or array	No	Up to 4 sequences where the model stops generating.
`tools`	array	No	Tool definitions for function calling. See Tool Calling.
`tool_choice`	string or object	No	Controls tool usage: `"auto"`, `"none"`, `"required"`, or a specific function.
`stream_options`	object	No	`{"include_usage": true}` to receive token counts in the final streaming chunk.
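The ranges and limits in the table can be checked on the client before a request is sent. The sketch below is a hypothetical helper (`validate_params` is not part of any SDK) that encodes only the constraints listed above; the server remains the authority on what is valid.

```python
def validate_params(body: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for name in ("model", "messages"):
        if name not in body:
            problems.append(f"missing required parameter: {name}")
    temperature = body.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        problems.append("temperature must be between 0 and 2")
    for name in ("frequency_penalty", "presence_penalty"):
        value = body.get(name)
        if value is not None and not -2.0 <= value <= 2.0:
            problems.append(f"{name} must be between -2.0 and 2.0")
    stop = body.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop accepts at most 4 sequences")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every issue at once.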

Message Format

Messages follow the OpenAI conversation format:

User Message

{"role": "user", "content": "What is the capital of France?"}

Content can be a string or an array of content parts (text, images):

{
  "role": "user",
  "content": [
    {"type": "text", "text": "What's in this image?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
  ]
}
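Both content shapes can be produced by one small helper. This is an illustrative sketch; `user_message` is a made-up name, and it falls back to the plain-string form when no images are attached.

```python
def user_message(text: str, image_urls: tuple[str, ...] = ()) -> dict:
    """Build a user message; use content parts only when images are attached."""
    if not image_urls:
        return {"role": "user", "content": text}
    parts = [{"type": "text", "text": text}]
    parts += [{"type": "image_url", "image_url": {"url": url}} for url in image_urls]
    return {"role": "user", "content": parts}


plain = user_message("What is the capital of France?")
with_image = user_message("What's in this image?", ("https://example.com/photo.jpg",))
```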

System Message

{"role": "system", "content": "You are a helpful coding assistant."}

Assistant Message

{"role": "assistant", "content": "The capital of France is Paris."}

Tool Messages

After receiving a tool call from the assistant, send the result back:

{"role": "tool", "tool_call_id": "call_abc123", "content": "{\"result\": 42}"}

Response (Non-Streaming)

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1712100000,
  "model": "deepseek.v3.2",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses quantum bits (qubits)..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}

The `model` field in the response tells you which model actually handled the request, which is useful when you sent `"auto"`.
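A short sketch of reading the fields most callers need from a non-streaming response. The `summarize` helper is hypothetical; the sample payload is the one shown above.

```python
import json

raw = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1712100000,
  "model": "deepseek.v3.2",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Quantum computing uses quantum bits (qubits)..."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 25, "completion_tokens": 150, "total_tokens": 175}
}
"""


def summarize(response: dict) -> dict:
    """Pull out the reply text, the model that served it, and token usage."""
    choice = response["choices"][0]
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "model": response["model"],
        "total_tokens": response["usage"]["total_tokens"],
    }


summary = summarize(json.loads(raw))
```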

Streaming

Set `"stream": true` to receive Server-Sent Events. Each event is a line prefixed with `data: ` that carries a JSON chunk; the stream ends with the sentinel `data: [DONE]`, which is not JSON:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1712100000,"model":"deepseek.v3.2","choices":[{"index":0,"delta":{"role":"assistant","content":"Quantum"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1712100000,"model":"deepseek.v3.2","choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}

data: [DONE]
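The stream can be consumed line by line: strip the `data:` prefix, stop at `[DONE]`, and concatenate each `delta.content`. The sketch below works on any iterable of text lines (for example, decoded lines from an HTTP response); the function names are illustrative, and the sample chunks are abbreviated versions of the ones above.

```python
import json
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded JSON chunks from SSE lines until the [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the content deltas of every chunk in the stream."""
    parts = []
    for chunk in iter_chunks(lines):
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)


sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":"Quantum"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}',
    "",
    "data: [DONE]",
]
text = collect_text(sample)
```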

Getting Usage in Streams

Add `"stream_options": {"include_usage": true}` to the request to receive a final chunk with token counts before `[DONE]`:

{
  "stream_options": {"include_usage": true}
}

The usage chunk has an empty `choices` array and a populated `usage` object.
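Given that description, the usage chunk can be recognized while iterating over decoded chunks. This is a sketch under the assumption that the `usage` object has the same fields as in the non-streaming response; the sample chunks and the `extract_usage` name are illustrative.

```python
from typing import Iterable, Optional


def extract_usage(chunks: Iterable[dict]) -> Optional[dict]:
    """Return the usage object from the chunk whose choices array is empty."""
    usage = None
    for chunk in chunks:
        if not chunk.get("choices") and chunk.get("usage"):
            usage = chunk["usage"]
    return usage


chunks = [
    {"choices": [{"index": 0, "delta": {"content": "Quantum"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}]},
    # Assumed shape of the final usage chunk: empty choices, populated usage.
    {"choices": [], "usage": {"prompt_tokens": 25, "completion_tokens": 150, "total_tokens": 175}},
]
usage = extract_usage(chunks)
```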

Tool Calling

Define tools in the request and the model can choose to call them.

Defining Tools

{
  "model": "auto",
  "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string", "description": "City name"}
          },
          "required": ["city"]
        }
      }
    }
  ]
}

Tool Call Response

When the model decides to call a tool, the response looks like:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_abc123",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\": \"Tokyo\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
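Note that `arguments` is a JSON-encoded string, not an object, so it has to be decoded before the function is called. The sketch below dispatches each tool call to a local Python function and produces the matching `tool` messages. `get_weather` here is a stand-in that returns fixed data, and the `TOOLS` registry is an illustrative convention, not part of the API.

```python
import json


def get_weather(city: str) -> dict:
    """Stand-in implementation; a real one would query a weather service."""
    return {"temp": 22, "condition": "sunny"}


# Maps the function names declared in "tools" to local callables.
TOOLS = {"get_weather": get_weather}


def run_tool_calls(message: dict) -> list[dict]:
    """Execute every tool call in an assistant message and build tool messages."""
    results = []
    for call in message.get("tool_calls") or []:
        function = TOOLS[call["function"]["name"]]
        arguments = json.loads(call["function"]["arguments"])  # string -> dict
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],  # must match the id of the call answered
            "content": json.dumps(function(**arguments)),
        })
    return results


assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"},
    }],
}
tool_messages = run_tool_calls(assistant_message)
```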

Sending Tool Results

Send the tool result back in the next request to continue the conversation:

{
  "model": "auto",
  "messages": [
    {"role": "user", "content": "What's the weather in Tokyo?"},
    {
      "role": "assistant",
      "content": null,
      "tool_calls": [
        {
          "id": "call_abc123",
          "type": "function",
          "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"}
        }
      ]
    },
    {
      "role": "tool",
      "tool_call_id": "call_abc123",
      "content": "{\"temp\": 22, \"condition\": \"sunny\"}"
    }
  ]
}
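The follow-up request is the previous request with two additions: the assistant message containing the tool calls, then one `tool` message per call. A small sketch, using a hypothetical `continue_after_tools` helper that leaves the original body untouched:

```python
def continue_after_tools(body: dict, assistant_message: dict, tool_messages: list) -> dict:
    """Return a new request body with the assistant turn and tool results appended."""
    messages = list(body["messages"]) + [assistant_message] + list(tool_messages)
    return {**body, "messages": messages}


first_body = {
    "model": "auto",
    "messages": [{"role": "user", "content": "What's the weather in Tokyo?"}],
}
assistant_turn = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"},
    }],
}
tool_result = {
    "role": "tool",
    "tool_call_id": "call_abc123",
    "content": "{\"temp\": 22, \"condition\": \"sunny\"}",
}
next_body = continue_after_tools(first_body, assistant_turn, [tool_result])
```

The order matters: each `tool` message has to follow the assistant message whose `tool_calls` it answers.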

Provider Metadata

Responses can carry provider metadata so you can maintain conversation continuity. When a response includes `provider_metadata`, echo it back in the next request's `metadata` field; this keeps multi-turn tool conversations on the same provider.
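One way to carry the metadata forward is sketched below. It rests on two assumptions that this page does not confirm: that `provider_metadata` is a top-level field of the response, and that the value is passed unchanged as the request's `metadata`. Check a real response before relying on either.

```python
def with_provider_metadata(next_body: dict, previous_response: dict) -> dict:
    """Copy provider_metadata from the last response into the next request."""
    body = dict(next_body)
    # Assumption: provider_metadata sits at the top level of the response.
    provider_metadata = previous_response.get("provider_metadata")
    if provider_metadata is not None:
        # Assumption: the value is echoed as-is in the "metadata" field.
        body["metadata"] = provider_metadata
    return body


# The metadata content here is an opaque placeholder, not a documented shape.
previous = {"id": "chatcmpl-abc123", "provider_metadata": {"opaque": "value"}}
follow_up = with_provider_metadata({"model": "auto", "messages": []}, previous)
```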

Error Responses

Errors follow the OpenAI error format:

{
  "error": {
    "message": "Invalid token",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}

HTTP Status	Code	Meaning
400	`invalid_request`	Malformed request body or invalid parameters
401	`invalid_api_key`	Missing or invalid JWT token
405	`method_not_allowed`	Wrong HTTP method (must be POST)
429	`USAGE_LIMIT_EXCEEDED`	Budget exhausted — see Usage & Limits
500	`internal_error`	Server error
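On the client side, the error object can be turned into an exception that keeps the HTTP status next to the `code`, `type`, and `message` fields. This is a sketch; `ChatAPIError` and `parse_error` are illustrative names, and the fallback values for a body that is not valid JSON are a choice made for this example.

```python
import json


class ChatAPIError(Exception):
    """Carries the HTTP status plus the fields of the error object."""

    def __init__(self, status: int, code: str, error_type: str, message: str):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.error_type = error_type
        self.message = message


def parse_error(status: int, body_text: str) -> ChatAPIError:
    """Convert an error response body into a ChatAPIError."""
    try:
        parsed = json.loads(body_text)
    except json.JSONDecodeError:
        parsed = None
    error = parsed.get("error", {}) if isinstance(parsed, dict) else {}
    return ChatAPIError(
        status=status,
        code=error.get("code", "unknown"),
        error_type=error.get("type", "unknown"),
        message=error.get("message", body_text),
    )


err = parse_error(
    401,
    '{"error": {"message": "Invalid token", "type": "invalid_request_error", "code": "invalid_api_key"}}',
)
```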

Headers

Request Headers

Header	Required	Description
`Authorization`	Yes	`Bearer <jwt-token>`
`Content-Type`	Yes	`application/json`
`X-Bot-ID`	No	Identifies your application for per-app usage tracking. Defaults to `"default"`.
`X-Provider-API-Key`	No	Your own provider API key for BYOK.
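The header table translates directly into a small builder that adds the optional headers only when they are supplied. `build_headers` is a hypothetical helper name; the header names are the ones listed above.

```python
from typing import Optional


def build_headers(
    token: str,
    bot_id: Optional[str] = None,
    provider_api_key: Optional[str] = None,
) -> dict:
    """Return request headers; optional headers are omitted when not given."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    if bot_id is not None:
        headers["X-Bot-ID"] = bot_id  # the server defaults to "default" if omitted
    if provider_api_key is not None:
        headers["X-Provider-API-Key"] = provider_api_key  # BYOK
    return headers


headers = build_headers("YOUR_JWT", bot_id="support-bot")
```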

Response Headers

Rate limit headers are included on every response. See Usage & Limits for the full list.

Next Steps

Models — available models and how smart routing works
Usage & Limits — understand your budget and rate limits
Bring Your Own Key — use your own provider API keys