Every PrivateMind API key carries two independent controls: a budget cap (cumulative spend) and a rate limit (calls per minute). Both are enforced server-side.
Budgets
Each key has a USD budget set at creation. The prompt_tokens and completion_tokens of every request are priced, and the cost is added to the key's spend. Once spent_usd >= budget_usd, calls return:
    HTTP/1.1 402 Payment Required

    {
      "error": {
        "message": "Key budget exhausted",
        "type": "billing_error",
        "code": "budget_exceeded"
      }
    }

To recover, raise the cap in Settings → API Keys, or rotate to a key with budget left.
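A 402 does not clear on its own, so a client may want to detect it and stop calling instead of retrying. The following is a minimal sketch that assumes the error body has the shape shown above; the function name `is_budget_exhausted` is illustrative and not part of any SDK.

```python
import json


def is_budget_exhausted(status_code: int, body: str) -> bool:
    """Return True when a response matches the budget-exhausted error above."""
    if status_code != 402:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Unparseable body: do not guess at the cause.
        return False
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return False
    return error.get("code") == "budget_exceeded"


sample_body = (
    '{"error": {"message": "Key budget exhausted", '
    '"type": "billing_error", "code": "budget_exceeded"}}'
)
```

A caller could check `is_budget_exhausted(status, body)` before entering any retry loop and alert an operator instead of sleeping.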
How spend is calculated
Each model has an input-token cost and an output-token cost. Streaming and non-streaming responses are priced identically. For a stream, the final usage block is the source of truth.
Embeddings are priced on input tokens only (completion_tokens is always 0).
Spend is updated after each request completes. You can't observe it while a request is in flight; instrument your client if you need finer-grained tracking.
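The pricing rule above can be sketched as a small calculation. The model names, the prices, and the per-million-token unit below are all hypothetical placeholders chosen for illustration; actual rates are not stated on this page.

```python
from decimal import Decimal

# Hypothetical prices in USD per 1M tokens (assumed unit, assumed values).
PRICES = {
    "example-chat": {"input": Decimal("1.00"), "output": Decimal("3.00")},
    "example-embed": {"input": Decimal("0.10"), "output": Decimal("0")},
}

PER_MILLION = Decimal(1_000_000)


def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> Decimal:
    """Price one request from its usage block: input and output tokens separately."""
    price = PRICES[model]
    total = prompt_tokens * price["input"] + completion_tokens * price["output"]
    return total / PER_MILLION


def budget_exhausted(spent_usd: Decimal, budget_usd: Decimal) -> bool:
    """Mirror the cutoff described above: spent_usd >= budget_usd."""
    return spent_usd >= budget_usd
```

Accumulating `request_cost(...)` for each completed response gives a client-side estimate of spend that can be compared against the value shown in Settings → API Keys. Decimal is used here to avoid floating-point drift when summing many small amounts.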
Rate limits
Each key has a sliding-window requests-per-minute limit. Exceeding it returns:
    HTTP/1.1 429 Too Many Requests

    {
      "error": {
        "message": "Rate limit exceeded",
        "type": "rate_limit_error",
        "code": "rpm_exceeded"
      }
    }

The rate is per-key, not per-org and not per-IP. Splitting traffic across multiple keys multiplies your effective ceiling.
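To stay under the limit instead of reacting to 429s, a client can keep its own sliding window. The sketch below is a client-side approximation only: the server's exact windowing is not described here, and the class name `SlidingWindowLimiter` and its parameters are assumptions. Because the limit is per-key, one limiter instance per key is the natural fit.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Track call timestamps and refuse calls beyond max_calls per window."""

    def __init__(self, max_calls: int, window: float = 60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.calls = deque()

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits in the window, else False."""
        now = self.clock()
        # Drop timestamps that have slid out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.max_calls:
            return False
        self.calls.append(now)
        return True
```

When `try_acquire()` returns False, the caller can wait briefly and try again, or route the request to another key's limiter.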
Backoff
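    import random
    import time

    import openai

    def call_with_retry(fn, max_attempts=5):
        """Call fn(), retrying on rate-limit errors with exponential backoff and jitter."""
        for attempt in range(max_attempts):
            try:
                return fn()
            except openai.RateLimitError:
                # Out of attempts: surface the error to the caller.
                if attempt == max_attempts - 1:
                    raise
                # 1s, 2s, 4s, ... plus up to 1s of jitter, capped at 30s.
                delay = min(30, (2 ** attempt) + random.random())
                time.sleep(delay)

Designing around the limits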
- One key per workload. Don't share a key across unrelated services. One misbehaving service burns the others' budget.
- Tight budgets in dev. A runaway loop in a notebook can spend a month's quota in an hour. Use a separate dev key with a low cap.
- Watch token usage in responses. Every response includes usage; surface it in logs.
- Cache where you can. Identical embedding queries can be cached client-side.
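The caching advice in the list above could look something like the following. This is a sketch under assumptions: `EmbeddingCache` is a hypothetical helper, and `embed_fn` stands in for whatever function your client uses to request an embedding. The cache is in-memory and unbounded, which suits a short-lived process but not a long-running service.

```python
import hashlib
from typing import Callable, Dict, List


class EmbeddingCache:
    """Memoize embeddings by (model, text) so identical queries are paid for once."""

    def __init__(self, embed_fn: Callable[[str, str], List[float]]):
        self._embed_fn = embed_fn  # hypothetical: your own call into the API
        self._store: Dict[str, List[float]] = {}
        self.misses = 0

    def get(self, model: str, text: str) -> List[float]:
        # Hash model and text together; the NUL separator avoids key collisions.
        key = hashlib.sha256(f"{model}\x00{text}".encode("utf-8")).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed_fn(model, text)
        return self._store[key]
```

Logging `misses` alongside the usage block from each response gives a rough picture of how many billable embedding calls the cache is avoiding.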
Where to view spend
Per-key spend, budget remaining, and recent activity are shown in Settings → API Keys alongside the key list. The same view lets you rotate, revoke, and raise caps without involving an admin.
Where next
- Errors for the full envelope around 402 and 429.
- Authentication for key shape and rotation.
- Usage for the API endpoint that surfaces spend programmatically.