PrivateMind enforces limits at three levels: per API key, per user, and per organization.
Organization limits
Org admins control budgets from /settings/org/overview and /settings/org/chat-usage. What you see depends on whether your organization meters usage by tokens or by cost.
Token-based organizations
The Overview page at /settings/org/overview shows:
- Token limit / user / month: a badge on the Token Usage card
- Total Tokens (this month): sum across the org for the current calendar month
- Avg Tokens / User: total divided by user count
- Org Allocation Used: percentage with a progress bar
- Top Users by Token Usage: table with each user's tokens and their share of the per-user cap
When the allocation hits 90%, a warning appears. Usage above the cap is throttled via standard HTTP rate-limit responses (see Rate limits & budgets).
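As a rough illustration of how the Token Usage figures relate to one another, the sketch below recomputes them from per-user monthly totals. The function names, the `org_allocation` input, and the choice to count only users present in the mapping are assumptions made for this example; how PrivateMind derives the org allocation is not described here.

```python
from dataclasses import dataclass


@dataclass
class OverviewStats:
    total_tokens: int
    avg_tokens_per_user: float
    allocation_used_pct: float
    warning: bool


def overview_stats(tokens_by_user: dict[str, int], org_allocation: int,
                   warn_at_pct: float = 90.0) -> OverviewStats:
    """Recompute the overview figures from per-user monthly token totals.

    org_allocation is a hypothetical input: the org-wide token allowance
    for the month, however the organization's plan defines it.
    """
    if org_allocation <= 0:
        raise ValueError("org_allocation must be positive")
    total = sum(tokens_by_user.values())
    users = len(tokens_by_user)  # assumption: every counted user is in the mapping
    avg = total / users if users else 0.0
    used_pct = 100.0 * total / org_allocation
    # The warning is shown once usage reaches the 90% threshold.
    return OverviewStats(total, avg, used_pct, used_pct >= warn_at_pct)


def share_of_cap(tokens: int, per_user_cap: int) -> float:
    """A user's share of the per-user cap, as a percentage."""
    if per_user_cap <= 0:
        raise ValueError("per_user_cap must be positive")
    return 100.0 * tokens / per_user_cap


stats = overview_stats({"alice": 600_000, "bob": 300_000}, org_allocation=1_000_000)
print(stats.allocation_used_pct, stats.warning)  # 90.0 True
```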
You cannot raise the per-user token cap from the org admin pages. To request an increase, ask your PrivateMind account contact.
Cost-based organizations
If your org meters by cost, the Overview page shows a Cost stat instead of Tokens. Every API call and chat message is priced using each model's input and output rates; the result is stored against the calling user.
The Chat Usage page at /settings/org/chat-usage is the cost dashboard. It has three tabs:
- Overview: total spend, total budget, count of users over budget, top spenders chart
- Users & Budgets: every user's monthly budget, current spend, percent used, and an Edit action
- Models: usage broken down by model with token and cost totals
Editing a user's budget
From the Users & Budgets tab, click any row to open the user detail panel, then Edit Budget. The dialog accepts a new monthly cap in dollars. Save and the change applies immediately; the user's next request is metered against the new ceiling.
What "monthly" means
Budgets reset on the first of each calendar month, UTC. There is no rolling 30-day window. A budget raised mid-month carries into the next month at the new value unless you lower it again.
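To make the calendar-month rule concrete, here is a minimal sketch that computes the next reset instant. It assumes the reset happens at 00:00:00 UTC on the first of the month; the exact time of day is not stated above, and `next_budget_reset` is a name invented for this example.

```python
from datetime import datetime, timedelta, timezone


def next_budget_reset(now: datetime) -> datetime:
    """Return the next first-of-month midnight UTC strictly after `now`.

    Assumes resets land at 00:00:00 UTC. `now` must be timezone-aware so
    that local clocks are converted to UTC before the month is read.
    """
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now_utc = now.astimezone(timezone.utc)
    year, month = now_utc.year, now_utc.month
    if month == 12:
        year, month = year + 1, 1
    else:
        month += 1
    return datetime(year, month, 1, tzinfo=timezone.utc)


# 20:00 on 31 January in UTC-5 is already 1 February in UTC,
# so the January budget has reset and the next reset is 1 March.
late_january = datetime(2024, 1, 31, 20, 0, tzinfo=timezone(timedelta(hours=-5)))
print(next_budget_reset(late_january))  # 2024-03-01 00:00:00+00:00
```

Because the window is a calendar month rather than a rolling 30 days, the time remaining until reset varies with the month length.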
How spend is calculated
Each request bills prompt_tokens × input_rate + completion_tokens × output_rate for chat. Embeddings bill input tokens only. Streamed and non-streamed responses are priced identically. The final usage block on a stream is the source of truth.
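The pricing formula can be sketched as below. The rates in the example are made-up placeholders, and expressing them in dollars per single token is an assumption for illustration; check your model's actual rates and units. `Decimal` is used so that small per-token prices do not accumulate floating-point error.

```python
from decimal import Decimal


def chat_cost(prompt_tokens: int, completion_tokens: int,
              input_rate: Decimal, output_rate: Decimal) -> Decimal:
    """prompt_tokens x input_rate + completion_tokens x output_rate."""
    return prompt_tokens * input_rate + completion_tokens * output_rate


def embedding_cost(input_tokens: int, input_rate: Decimal) -> Decimal:
    """Embeddings bill input tokens only."""
    return input_tokens * input_rate


# Hypothetical rates, in dollars per token (illustrative values only).
input_rate = Decimal("0.000003")
output_rate = Decimal("0.000015")

# 1200 x 0.000003 + 350 x 0.000015 = 0.0036 + 0.00525
print(chat_cost(1200, 350, input_rate, output_rate))  # 0.008850
```

The same function applies to streamed and non-streamed requests, since both are priced from the same token counts.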
Spend is updated only after each request completes; a request that is still in flight is not yet reflected in the recorded spend. Instrument your clients if you need finer-grained tracking. See Usage for the API-side view.
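One possible shape for client-side instrumentation is sketched below. It assumes each stream chunk is available as a dict and that the final chunk carries a `usage` dict with `prompt_tokens` and `completion_tokens`; that chunk layout, and the names `final_usage` and `UsageLog`, are assumptions for this example rather than a description of the PrivateMind wire format.

```python
from typing import Iterable, Optional


def final_usage(chunks: Iterable[dict]) -> Optional[dict]:
    """Return the last usage block seen on a stream, or None if none arrived."""
    usage = None
    for chunk in chunks:
        if chunk.get("usage"):
            usage = chunk["usage"]  # later blocks supersede earlier ones
    return usage


class UsageLog:
    """Accumulates token counts per completed request on the client side."""

    def __init__(self) -> None:
        self.prompt_tokens = 0
        self.completion_tokens = 0
        self.requests = 0

    def record(self, usage: Optional[dict]) -> None:
        if usage is None:
            return  # stream ended without a usage block; nothing to count
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)
        self.requests += 1


# Simulated stream: content chunks followed by a final usage block.
stream = [
    {"delta": "Hel"},
    {"delta": "lo"},
    {"usage": {"prompt_tokens": 12, "completion_tokens": 2}},
]
log = UsageLog()
log.record(final_usage(stream))
print(log.prompt_tokens, log.completion_tokens, log.requests)  # 12 2 1
```

A log like this gives a local running total between server-side updates, but the server's recorded spend remains authoritative.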
Per-API-key budgets and rate limits
The org-level budgets above cover chat-app users. PrivateMind also issues per-API-key budgets and per-key rate limits, which are documented from the developer's angle in Rate limits & budgets.
The key things for an org admin to know:
- API keys belong to a user. Their spend rolls up to that user's monthly total.
- Rate limits are per-key. A user with two keys gets two RPM windows.
- Budgets are per-key. A user's overall spend is the sum of their keys' spend plus their chat-app spend, all measured against their monthly budget.
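The roll-up described in the list above can be sketched as follows. The key names and dollar amounts are invented for illustration, and `user_spend` and `remaining_budget` are hypothetical helpers, not part of any PrivateMind API.

```python
from decimal import Decimal


def user_spend(key_spend: dict[str, Decimal], chat_spend: Decimal) -> Decimal:
    """Sum of every key's spend plus chat-app spend for one user."""
    return sum(key_spend.values(), Decimal("0")) + chat_spend


def remaining_budget(monthly_budget: Decimal, key_spend: dict[str, Decimal],
                     chat_spend: Decimal) -> Decimal:
    """Budget left this month; zero or negative means the user is at the cap."""
    return monthly_budget - user_spend(key_spend, chat_spend)


# A user with two API keys and some chat-app usage (illustrative figures).
keys = {"key_a": Decimal("12.50"), "key_b": Decimal("4.25")}
chat = Decimal("3.00")

print(user_spend(keys, chat))                         # 19.75
print(remaining_budget(Decimal("25.00"), keys, chat)) # 5.25
```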
Users mint and manage their own keys from /settings/api-keys. Org admins do not mint keys for other users from the admin pages today.
Limits in action
When a user hits their budget or token cap:
- API requests return 402 Payment Required (cost-based budgets) or 429 Too Many Requests (token budgets).
- Chat messages in the app return a friendly error directing the user to ask their org admin.
There is no automatic top-up. Either the budget is raised, the calendar rolls over, or the user is blocked.
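A client can branch on these status codes along the lines of the sketch below. The enum and function names are invented for this example, and the sketch does not try to distinguish a token-cap 429 from any other 429 a server might send; it only mirrors the mapping described above.

```python
from enum import Enum


class LimitOutcome(Enum):
    OK = "ok"
    COST_BUDGET_EXHAUSTED = "cost_budget_exhausted"  # 402 Payment Required
    TOKEN_CAP_REACHED = "token_cap_reached"          # 429 Too Many Requests
    OTHER_ERROR = "other_error"


def classify_status(status_code: int) -> LimitOutcome:
    """Map an HTTP status code to the limit outcome described above."""
    if status_code == 402:
        return LimitOutcome.COST_BUDGET_EXHAUSTED
    if status_code == 429:
        return LimitOutcome.TOKEN_CAP_REACHED
    if 200 <= status_code < 300:
        return LimitOutcome.OK
    return LimitOutcome.OTHER_ERROR


def is_blocked(outcome: LimitOutcome) -> bool:
    """True when the user stays blocked until a budget raise or month rollover."""
    return outcome in (LimitOutcome.COST_BUDGET_EXHAUSTED,
                       LimitOutcome.TOKEN_CAP_REACHED)


print(classify_status(402).value)            # cost_budget_exhausted
print(is_blocked(classify_status(429)))      # True
```

Since there is no automatic top-up, a blocked outcome is better surfaced to the user than retried in a tight loop.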
Where next
- Audit logs: every budget edit is logged with who, when, and the new value
- Rate limits & budgets: the developer-side reference for 402, 429, and per-key behaviour
- Usage: the API endpoint for programmatic spend lookup