Rate limits & budgets

Per-key budgets, RPM ceilings, and how to handle 402 and 429.

Every PrivateMind API key carries two independent controls: a budget cap (cumulative spend) and a rate limit (calls per minute). Both are enforced server-side.

Budgets

Each key has a USD budget set at creation. Every request's prompt_tokens and completion_tokens are priced at the model's rates, and the result is added to the key's spend. Once spent_usd >= budget_usd, calls return:

Text
HTTP/1.1 402 Payment Required
JSON
{
  "error": {
    "message": "Key budget exhausted",
    "type": "billing_error",
    "code": "budget_exceeded"
  }
}

To recover, raise the cap in Settings → API Keys, or rotate to a key with budget left.
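
Because a 402 will not clear on its own, a client usually wants to tell it apart from transient failures and stop rather than retry. The following is a minimal sketch, assuming you have the raw status code and response body in hand; the helper name is_budget_exhausted is hypothetical and not part of any SDK.

```python
import json

def is_budget_exhausted(status_code: int, body: str) -> bool:
    """Return True if the response is the 402 budget_exceeded error shown above.

    Hypothetical helper: it only inspects the status code and the
    error.code field of the JSON envelope.
    """
    if status_code != 402:
        return False
    try:
        code = json.loads(body).get("error", {}).get("code")
    except (json.JSONDecodeError, AttributeError, TypeError):
        # Body was not JSON, or not shaped like the documented envelope.
        return False
    return code == "budget_exceeded"
```

A caller might check this before entering any retry loop and, when it returns True, alert an operator or switch to another key instead of sleeping and retrying.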

How spend is calculated

Each model has an input-token cost and an output-token cost. Streaming and non-streaming responses are priced identically. On a streamed response, the final usage block is the source of truth for the tokens billed.

Embeddings are priced on input tokens only (completion_tokens is always 0).
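
As an illustration of the arithmetic, the sketch below estimates the cost of one request from its usage block. The model names, the prices, and the per-million-token pricing unit are all assumptions made for the example; substitute the real rates for your models.

```python
from decimal import Decimal

# Hypothetical prices in USD per one million tokens (assumed unit and values).
PRICES = {
    "example-chat": {"input": Decimal("1.00"), "output": Decimal("3.00")},
    "example-embed": {"input": Decimal("0.10"), "output": Decimal("0")},
}

def request_cost_usd(model: str, usage: dict) -> Decimal:
    """Estimate one request's cost from its prompt and completion token counts."""
    price = PRICES[model]
    prompt = Decimal(usage["prompt_tokens"])
    # Embedding responses report completion_tokens as 0, so this term vanishes.
    completion = Decimal(usage.get("completion_tokens", 0))
    return (prompt * price["input"] + completion * price["output"]) / Decimal(1_000_000)
```

Decimal is used here so that many small per-request costs can be summed without floating-point drift. The server-side figure remains authoritative; a local estimate like this is only useful for logging and alerts.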

Spend is updated only after each request completes, so the cost of an in-flight request is not yet reflected in the key's spend. Instrument your client if you need finer-grained tracking.
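
One way to instrument a client is to keep a local running total and compare it against the key's cap. This is a rough sketch under the assumption that you compute a per-request cost yourself; the SpendTracker class and its 90% warning threshold are hypothetical choices, and the local total may drift from the server's figure.

```python
import threading

class SpendTracker:
    """Thread-safe local estimate of a key's cumulative spend (hypothetical helper)."""

    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.spent_usd = 0.0
        self._lock = threading.Lock()

    def record(self, cost_usd: float) -> float:
        """Add one request's estimated cost and return the estimated budget remaining."""
        with self._lock:
            self.spent_usd += cost_usd
            return self.budget_usd - self.spent_usd

    def near_limit(self, threshold: float = 0.9) -> bool:
        """True once estimated spend reaches the given fraction of the budget."""
        with self._lock:
            return self.spent_usd >= threshold * self.budget_usd
```

Calling record() after every response and checking near_limit() before starting expensive work gives an early warning ahead of the first 402.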

Rate limits

Each key has a sliding-window requests-per-minute limit. Exceeding it returns:

Text
HTTP/1.1 429 Too Many Requests
JSON
{
  "error": {
    "message": "Rate limit exceeded",
    "type": "rate_limit_error",
    "code": "rpm_exceeded"
  }
}

The rate is per-key, not per-org and not per-IP. Splitting traffic across multiple keys multiplies your effective ceiling.
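
To avoid hitting the ceiling in the first place, a client can throttle itself per key. The sketch below is a client-side approximation of a sliding window, not the server's actual algorithm; the class name, the 60-second default, and the injectable clock are assumptions made for illustration.

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Client-side sliding-window throttle for one key (approximation only)."""

    def __init__(self, max_calls: int, window_s: float = 60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_s = window_s
        self._clock = clock          # injectable so the limiter can be tested
        self._calls = deque()        # timestamps of calls inside the window

    def wait_time(self) -> float:
        """Seconds until another call fits in the window (0.0 if it fits now)."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window_s:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            return 0.0
        return self.window_s - (now - self._calls[0])

    def record(self) -> None:
        """Note that a call is being made right now."""
        self._calls.append(self._clock())

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        delay = self.wait_time()
        if delay > 0:
            time.sleep(delay)
        self.record()
```

Create one limiter per key and call acquire() before each request. Since the client's clock and the server's window will not line up exactly, it is safer to set max_calls slightly below the key's real limit and still keep retry handling for 429.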

Backoff

Python
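import random
import time

import openai

def call_with_retry(fn, max_attempts=5):
    """Call fn(), retrying on 429 with capped exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except openai.RateLimitError:
            # Out of attempts: surface the 429 to the caller.
            if attempt == max_attempts - 1:
                raise
            # 1s, 2s, 4s, ... plus up to 1s of jitter, capped at 30s.
            delay = min(30, (2 ** attempt) + random.random())
            time.sleep(delay)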

Designing around the limits

  • One key per workload. Don't share a key across unrelated services. One misbehaving service burns the others' budget.
  • Tight budgets in dev. A runaway loop in a notebook can spend a month's quota in an hour. Use a separate dev key with a low cap.
  • Watch token usage in responses. Every response includes usage; surface it in logs.
  • Cache where you can. Identical embedding queries can be cached client-side.
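
The caching advice above can be sketched as a small in-memory wrapper. It assumes a caller-supplied function embed_fn(model, text) that performs the real API call and returns a vector; that function and the EmbeddingCache class are hypothetical names for this example.

```python
import hashlib

class EmbeddingCache:
    """In-memory cache keyed by model and input text (illustrative sketch)."""

    def __init__(self, embed_fn):
        # embed_fn is a hypothetical callable: (model, text) -> list of floats.
        self._embed_fn = embed_fn
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, model: str, text: str):
        # Hash model and text together so different models never share an entry.
        key = hashlib.sha256(f"{model}\x00{text}".encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        vector = self._embed_fn(model, text)
        self._store[key] = vector
        return vector
```

Each cache hit is a request that counts against neither the budget nor the rate limit. The hits and misses counters can be logged next to the usage data to show how much the cache is saving. A long-running service would likely want an eviction policy or a persistent store, which this sketch omits.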

Where to view spend

Per-key spend, budget remaining, and recent activity are shown in Settings → API Keys alongside the key list. The same view lets you rotate, revoke, and raise caps without involving an admin.

Where next

  • Errors for the full envelope around 402 and 429.
  • Authentication for key shape and rotation.
  • Usage for the API endpoint that surfaces spend programmatically.