
Errors

HTTP status codes, error envelope, and retry guidance.

The API uses standard HTTP status codes. Error bodies use the OpenAI-style envelope:

JSON
{
  "error": {
    "message": "Human-readable description",
    "type": "<category>",
    "code": "<machine-readable>"
  }
}

Common status codes

| Code | Meaning | What to do |
|------|---------|------------|
| 200 | Success | |
| 400 | Bad request: malformed JSON, unknown field, value out of range, or input exceeding the context window | Fix the request; check supported_parameters for the model |
| 401 | Authentication failed: key missing, malformed, revoked, or expired | Verify the Authorization header; create a new key if needed |
| 402 | Budget exhausted: your key or org spend has reached its cap | Raise the key cap in Settings, or ask your org admin |
| 403 | Forbidden: model not in your org's allowed list | Pick a permitted model (see GET /v1/models) |
| 404 | Unknown endpoint or model id | Check the path and the model id (see GET /v1/models) |
| 413 | Payload too large, usually an audio upload over the 75 MB cap | Shorten or split the upload |
| 429 | Rate limited: too many requests per minute on this key | Back off with exponential delay, then retry |
| 5xx | Server error: timeout, memory limit, or transient failure | Retry with exponential backoff; switch models if the error persists |

Retry strategy

Retry on 429 and 5xx responses. Don't retry other 4xx codes: an unchanged request will fail the same way again.

  • Exponential backoff with jitter: start ~1s, cap ~30s.
  • Cap retries at 3–5 attempts and surface the final error rather than retrying forever.
  • For 429, lower per-key concurrency or split traffic across keys.
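The strategy above can be sketched in Python. This is a minimal sketch, not an official client: call_with_retries and the send callable (assumed to perform one request and return the HTTP status and parsed body) are hypothetical names. It uses "full jitter", where each delay is drawn at random between 0 and the exponential ceiling; that is one common way to add jitter, not necessarily the only one the API expects. The sleep function is injectable so the logic can be tested without waiting.

```python
import random
import time
from typing import Any, Callable, Tuple

# Hypothetical response shape: (http_status, parsed_body).
Response = Tuple[int, Any]


def is_retryable(status: int) -> bool:
    """Retry only rate limits (429) and server errors (5xx)."""
    return status == 429 or 500 <= status <= 599


def call_with_retries(
    send: Callable[[], Response],  # hypothetical: performs one request
    max_retries: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, retrying 429/5xx with exponential backoff and full jitter.

    Returns the first non-retryable response, or the last response once
    `max_retries` retries are used up, so the caller can surface it.
    """
    attempt = 0
    while True:
        status, body = send()
        if not is_retryable(status) or attempt >= max_retries:
            return status, body
        # Exponential ceiling: 1s, 2s, 4s, ... capped at max_delay.
        ceiling = min(max_delay, base_delay * (2 ** attempt))
        # Full jitter: wait a random amount in [0, ceiling].
        sleep(random.uniform(0, ceiling))
        attempt += 1
```

Any 4xx other than 429 is returned immediately, matching the rule that those requests will fail the same way on retry.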

Streaming errors

Errors before the stream starts return a regular JSON error body with the appropriate status code.

Errors that occur mid-stream arrive as an SSE chunk with an error field, after which the stream closes:

Text
data: {"error": {"message": "Upstream timeout", "type": "engine_error", "code": "timeout"}}

Always be prepared to receive an error chunk instead of the final [DONE] message.
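A minimal Python sketch of a stream reader that handles both endings, assuming you already have the response body as an iterable of decoded text lines (for example from your HTTP client's line iterator). iter_sse_chunks and StreamError are hypothetical names. Treating a stream that closes without either [DONE] or an error chunk as a failure is an assumption, not documented behavior.

```python
import json
from typing import Any, Dict, Iterable, Iterator


class StreamError(Exception):
    """Hypothetical exception carrying the fields of an error chunk."""

    def __init__(self, error: Dict[str, Any]):
        super().__init__(error.get("message", "stream error"))
        self.type = error.get("type")
        self.code = error.get("code")


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield parsed data chunks until [DONE]; raise StreamError on an error chunk."""
    for raw in lines:
        line = raw.strip()
        # Skip blank separators, SSE comments, and other fields (event:, id:).
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        if "error" in chunk:
            raise StreamError(chunk["error"])
        yield chunk
    # Assumption: a stream that ends without [DONE] or an error chunk
    # was cut off, so report it instead of treating it as complete.
    raise StreamError({"message": "stream ended without [DONE]"})
```

Chunks received before the error are still yielded, so a caller can keep or discard partial output as it sees fit.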

Error codes

| Code | Meaning |
|------|---------|
| budget_exceeded | Key or org budget is exhausted |
| rpm_exceeded | Too many requests per minute |
| timeout | Model didn't respond in time |
| context_exceeded | Total tokens exceed the model's context window |
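A mid-stream error chunk has no HTTP status of its own, so a client may need to decide from the code field instead. The Python sketch below reads error.code from an envelope and classifies it using the retry guidance on this page: rpm_exceeded (a 429 condition) and timeout (a 5xx condition) are transient, while budget_exceeded and context_exceeded fail again until the cap or the request changes. error_code and is_retryable_code are hypothetical helper names, and treating any unlisted or missing code as non-retryable is an assumption, since this table may not be exhaustive.

```python
import json
from typing import Optional

# Codes from the table above that describe transient conditions.
# Assumption: any other or missing code is treated as non-retryable.
RETRYABLE_CODES = frozenset({"rpm_exceeded", "timeout"})


def error_code(body: str) -> Optional[str]:
    """Return error.code from an error envelope, or None if it isn't one."""
    try:
        data = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    error = data.get("error") if isinstance(data, dict) else None
    if not isinstance(error, dict):
        return None
    code = error.get("code")
    return code if isinstance(code, str) else None


def is_retryable_code(code: Optional[str]) -> bool:
    """True for codes that are worth retrying with backoff."""
    return code in RETRYABLE_CODES
```

When an HTTP status is available, it remains the primary signal; the code is most useful for error chunks inside a stream.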

When a request looks stuck

  • Most often: the response is streamed through an intermediate proxy that buffers it, so chunks don't reach the client until the stream ends. See Streaming.
  • Otherwise, check the model's status: a model under maintenance is removed from the GET /v1/models list.
