The API uses standard HTTP status codes. Error bodies follow the OpenAI shape:
```json
{
  "error": {
    "message": "Human-readable description",
    "type": "<category>",
    "code": "<machine-readable>"
  }
}
```

Common status codes
| Code | Meaning | What to do |
|---|---|---|
| 200 | Success | No action needed |
| 400 | Bad request: malformed JSON, unknown field, value out of range, or input that exceeds the context window | Fix the request; check `supported_parameters` for the model |
| 401 | Authentication failed: key missing, malformed, revoked, or expired | Verify the `Authorization` header; create a new key |
| 402 | Budget exhausted: your key or org spend has reached its cap | Raise the key cap in Settings, or ask your org admin |
| 403 | Forbidden: model not in your org's allowed list | Pick a permitted model (see `GET /v1/models`) |
| 404 | Unknown endpoint or model ID | Check the path; check `GET /v1/models` |
| 413 | Payload too large: mainly audio uploads (75 MB cap) | Shorten or split the upload |
| 429 | Rate limited: too many requests per minute on this key | Back off with exponential delay |
| 5xx | Server error: timeout, memory limit, or transient failure | Retry with exponential backoff; switch models if the error persists |
Retry strategy
Retry 429 and 5xx. Don't retry other 4xx codes; they'll fail the same way.
- Use exponential backoff with jitter: start around 1 s and cap the delay around 30 s.
- Limit retries to 3–5 attempts and surface the final error rather than retrying forever.
- For `429`, lower per-key concurrency or split traffic across keys.
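The retry rules above can be sketched in Python as follows. This is a minimal illustration, not an official client: `send` is a hypothetical zero-argument callable that returns `(status, body)`, and the delay uses "full jitter" (a random wait between zero and the current exponential ceiling), which is one common way to add jitter. The defaults (1 s base, 30 s cap, 4 retries) follow the ranges suggested above.

```python
import random
import time
from typing import Any, Callable, Tuple

Response = Tuple[int, Any]  # (HTTP status, parsed body)


def is_retryable(status: int) -> bool:
    """Only 429 and 5xx are worth retrying; other 4xx fail the same way again."""
    return status == 429 or 500 <= status <= 599


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  rand: Callable[[], float] = random.random) -> float:
    """Full jitter: a random delay in [0, min(cap, base * 2**attempt))."""
    return rand() * min(cap, base * 2 ** attempt)


def call_with_retries(send: Callable[[], Response], max_retries: int = 4,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send`, retrying 429/5xx responses up to `max_retries` times.

    Returns the first non-retryable response, or the last response once
    retries are exhausted, so the caller can surface the final error.
    """
    attempt = 0
    while True:
        status, body = send()
        if not is_retryable(status) or attempt >= max_retries:
            return status, body
        sleep(backoff_delay(attempt))
        attempt += 1
```

Passing `sleep` as a parameter lets tests substitute a no-op (or a recorder) instead of actually waiting.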
Streaming errors
Errors before the stream starts return a regular JSON error body with the appropriate status code.
Errors mid-stream are delivered as an SSE chunk with an error field and the stream is closed:
```text
data: {"error": {"message": "Upstream timeout", "type": "engine_error", "code": "timeout"}}
```

Always handle the possibility of receiving an error chunk instead of `[DONE]`.
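Here is a minimal sketch of a stream reader that treats an error chunk as a failure instead of waiting for `[DONE]`. It assumes the response has already been split into decoded text lines and that each event carries a single `data:` line with a JSON payload. `StreamError` and `iter_chunks` are hypothetical names, not part of any SDK.

```python
import json
from typing import Iterable, Iterator


class StreamError(Exception):
    """Hypothetical exception carrying the fields of a mid-stream error chunk."""

    def __init__(self, error: dict):
        super().__init__(error.get("message", "stream error"))
        self.error_type = error.get("type")
        self.code = error.get("code")


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed JSON chunks from SSE `data:` lines until `[DONE]`.

    Raises StreamError on an error chunk, and ConnectionError if the
    stream ends with neither `[DONE]` nor an error chunk.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, ":" comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        if "error" in chunk:
            raise StreamError(chunk["error"])
        yield chunk
    raise ConnectionError("stream closed without [DONE] or an error chunk")
```

Raising when the stream ends without `[DONE]` or an error chunk is a design choice: it surfaces a dropped connection instead of silently returning a truncated response.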
Error codes
| `code` | Meaning |
|---|---|
| `budget_exceeded` | Key or org budget is exhausted |
| `rpm_exceeded` | Too many requests per minute |
| `timeout` | Model didn't respond in time |
| `context_exceeded` | Total tokens exceed the model's context window |
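Because `code` is machine-readable, clients can branch on it rather than on `message`. Below is a minimal Python sketch that assumes error bodies follow the shape shown at the top of this page. `APIError`, `parse_error`, `next_step`, and the advice strings are illustrative names and text that paraphrase the tables above; they are not an official mapping.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Illustrative next steps per `code`, paraphrasing the tables on this page.
NEXT_STEP = {
    "budget_exceeded": "raise the key cap or ask an org admin; do not retry",
    "rpm_exceeded": "back off and retry",
    "timeout": "retry with backoff; switch model if it persists",
    "context_exceeded": "shorten the input; do not retry unchanged",
}


@dataclass
class APIError:
    status: int
    message: str
    error_type: Optional[str]
    code: Optional[str]


def parse_error(status: int, raw_body: str) -> APIError:
    """Parse an OpenAI-shaped error body, tolerating non-JSON or partial bodies."""
    try:
        data = json.loads(raw_body)
    except json.JSONDecodeError:
        data = {}
    err = data.get("error") if isinstance(data, dict) else None
    if not isinstance(err, dict):
        err = {}
    return APIError(
        status=status,
        message=err.get("message", raw_body),  # fall back to the raw text
        error_type=err.get("type"),
        code=err.get("code"),
    )


def next_step(error: APIError) -> str:
    """Look up advice for a known `code`; otherwise defer to the status table."""
    return NEXT_STEP.get(error.code, "see the status-code table")
```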
When a request looks stuck
- Most often, an intermediate proxy is buffering a streaming response. See Streaming.
- Otherwise, check the model's status. A model in maintenance is dropped from `GET /v1/models`.
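To rule out the maintenance case, you can check whether the model ID still appears in `GET /v1/models`. The sketch below assumes an OpenAI-style list response (`{"data": [{"id": ...}, ...]}`) and a `Bearer` token in the `Authorization` header. This page specifies neither detail. `base_url` is a placeholder for your API's base URL, and both function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict


def fetch_models(base_url: str, api_key: str) -> Dict[str, Any]:
    """GET /v1/models (assumes Bearer auth and a JSON response body)."""
    req = urllib.request.Request(
        f"{base_url}/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def model_listed(models_response: Dict[str, Any], model_id: str) -> bool:
    """True if `model_id` appears in the (assumed) `data` list of the response."""
    return any(m.get("id") == model_id for m in models_response.get("data", []))
```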
Where next
- Rate limits for the `402` and `429` thresholds your key inherits.
- Authentication for what `401` means and how to rotate.
- Streaming for how errors surface mid-response.