Usage

Track tokens, cost, and limits from every response.

Every chat and embedding response includes a usage block. That single object is the most reliable way to track spend in real time — no extra API calls needed.

The response usage block

Any successful chat completion or embedding call returns something like this:

JSON
{
  ...
  "usage": {
    "prompt_tokens": 412,
    "completion_tokens": 188,
    "total_tokens": 600,
    "cost_usd": 0.000915
  }
}

| Field | Meaning |
| --- | --- |
| prompt_tokens | Tokens you sent (system prompt + messages, or input text). |
| completion_tokens | Tokens the model generated in its response. |
| total_tokens | prompt_tokens + completion_tokens. This is what counts against rate limits. |
| cost_usd | Estimated cost of the call in USD. Streaming and non-streaming calls are priced identically. |

If you need running totals, sum these values as you handle responses. The figures are never delayed: the server prices each call synchronously, before it returns the response to you.
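
As a minimal sketch of keeping such running totals, the Python below folds each response's usage block into an accumulator. The `UsageTotals` class and its `add` method are illustrative names, not part of any PrivateMind SDK, and defaulting a missing field to zero (for example `completion_tokens` on an embedding call) is an assumption about the response shape.

```python
from dataclasses import dataclass


@dataclass
class UsageTotals:
    """Running totals built from the usage block of each response (hypothetical helper)."""
    requests: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0
    cost_usd: float = 0.0

    def add(self, response: dict) -> None:
        """Fold one parsed response body into the totals."""
        usage = response.get("usage") or {}
        self.requests += 1
        # Assumption: a field that is absent from the block counts as zero.
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)
        self.total_tokens += usage.get("total_tokens", 0)
        self.cost_usd += usage.get("cost_usd", 0.0)


totals = UsageTotals()
# Sample values taken from the usage block shown above.
totals.add({"usage": {"prompt_tokens": 412, "completion_tokens": 188,
                      "total_tokens": 600, "cost_usd": 0.000915}})
print(totals)
```

Call `totals.add(...)` once per parsed response body. Because `cost_usd` is a float, long-running totals accumulate small rounding error, so treat the sum as an estimate rather than an invoice figure.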

Per-key totals in the dashboard

For a human-friendly view, open the Usage tab on any API key page in the PrivateMind app. It shows total requests, total tokens, and total spend for that key. Data refreshes in real time.

Query your own usage

GET /v1/usage returns the individual usage rows for your own key. Authenticate with the same PMIND… key you call the inference endpoints with — no special header. Results are scoped to that key: you only ever see calls that key made.

cURL
curl -s "https://api.privatemind.com/v1/usage?from=2026-06-01&limit=100" \
  -H "Authorization: Bearer $PMIND_KEY"
JSON
{
  "success": true,
  "data": [
    {
      "id": 90431,
      "model": "deepseek-v4-pro",
      "prompt_tokens": 1200,
      "cached_prompt_tokens": 320,
      "completion_tokens": 280,
      "cost_usd": 0.0041,
      "latency_ms": 1840,
      "status_code": 200,
      "created_at": "2026-06-08T10:12:04Z",
      "metadata": { "project": "load-company-data", "stage": "webcrawl" }
    }
  ],
  "pagination": { "limit": 100, "offset": 0, "has_more": false }
}

Query parameters, all optional:

| Param | Meaning |
| --- | --- |
| model | Only rows for this model id. |
| metadata | Tag filter: a JSON object, URL-encoded when sent. Before encoding it looks like metadata={"project":"x"}. Matches rows whose tags contain every pair you list. See Tagging requests. |
| from / to | RFC 3339 datetime or YYYY-MM-DD date. Defaults to the last 30 days. |
| limit / offset | Page size (default 100, max 1000) and offset. pagination.has_more is true when another page exists. |
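
The parameters above can be combined into a paging loop. The following is a sketch only: `build_usage_url`, `http_get_json`, and `iter_usage_rows` are hypothetical helper names, and advancing `offset` by `limit` until `pagination.has_more` is false is an assumption about how the pages line up. The HTTP call is passed in as a function so the loop can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

BASE_URL = "https://api.privatemind.com/v1/usage"


def build_usage_url(limit: int = 100, offset: int = 0, **filters) -> str:
    """Build a /v1/usage URL; a dict-valued `metadata` filter is sent as URL-encoded JSON."""
    params = {"limit": limit, "offset": offset}
    for name, value in filters.items():
        if value is None:
            continue
        # `from` is a Python keyword, so callers pass it as `from_`.
        key = name.rstrip("_")
        if isinstance(value, dict):
            value = json.dumps(value, separators=(",", ":"))
        params[key] = value
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def http_get_json(url: str, api_key: str) -> dict:
    """Fetch one page with the same bearer key used for inference calls."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def iter_usage_rows(fetch: Callable[[str], dict], limit: int = 100, **filters) -> Iterator[dict]:
    """Yield usage rows page by page until pagination.has_more is false."""
    offset = 0
    while True:
        page = fetch(build_usage_url(limit=limit, offset=offset, **filters))
        rows = page.get("data", [])
        yield from rows
        # Stop on an empty page as well, to avoid looping forever on bad data.
        if not rows or not page.get("pagination", {}).get("has_more"):
            break
        offset += limit
```

A real run might look like `iter_usage_rows(lambda url: http_get_json(url, os.environ["PMIND_KEY"]), from_="2026-06-01", metadata={"project": "x"})`, with `os` imported by the caller.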

Break usage down by tag, model, or day

GET /v1/usage/summary does the rollup server-side, so "tokens per project" is one call instead of paging every row. Same auth and filters as /v1/usage, plus a required group_by:

cURL
# Spend per project
curl -s "https://api.privatemind.com/v1/usage/summary?group_by=metadata.project" \
  -H "Authorization: Bearer $PMIND_KEY"

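# Sub-costs within one project (-G with --data-urlencode URL-encodes the JSON filter)
curl -s -G "https://api.privatemind.com/v1/usage/summary" \
  --data-urlencode "group_by=metadata.stage" \
  --data-urlencode 'metadata={"project":"load-company-data"}' \
  -H "Authorization: Bearer $PMIND_KEY"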
JSON
{
  "success": true,
  "group_by": "metadata.project",
  "data": [
    { "group_value": "load-company-data", "requests": 4120, "prompt_tokens": 5680000, "cached_prompt_tokens": 910000, "completion_tokens": 1240000, "total_tokens": 6920000, "cost_usd": 18.42 },
    { "group_value": "sentiment-batch",   "requests": 1300, "prompt_tokens": 980000,  "cached_prompt_tokens": 0, "completion_tokens": 210000, "total_tokens": 1190000, "cost_usd": 4.10 },
    { "group_value": null,                "requests": 88,   "prompt_tokens": 40000,   "cached_prompt_tokens": 0, "completion_tokens": 9000,   "total_tokens": 49000,   "cost_usd": 0.20 }
  ],
  "pagination": { "limit": 100, "offset": 0, "has_more": false }
}

group_by accepts:

| Value | Groups by |
| --- | --- |
| model | Model id |
| day | UTC calendar date of the request |
| key_id | The key that issued the call |
| workspace | Traffic source (chat, embeddings, …) |
| metadata.<key> | A request-tag value, e.g. metadata.project |

The null bucket is untagged traffic. Groups are ordered by cost, highest first in the example above. The window may span at most 366 days; for a longer range, split it into sub-windows and query each one.
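
One way to cover a range longer than 366 days is sketched below: split it into consecutive sub-windows, build one summary URL per window, and merge the per-group costs. The helper names are hypothetical, and the sketch assumes that `from` and `to` are both inclusive when given as YYYY-MM-DD dates. If the API treats `to` as exclusive, adjust the window arithmetic.

```python
import json
import urllib.parse
from datetime import date, timedelta
from typing import Iterable, Iterator, Optional, Tuple

SUMMARY_URL = "https://api.privatemind.com/v1/usage/summary"
MAX_WINDOW_DAYS = 366


def split_window(start: date, end: date,
                 max_days: int = MAX_WINDOW_DAYS) -> Iterator[Tuple[date, date]]:
    """Yield consecutive (from, to) pairs, each covering at most max_days calendar days."""
    cursor = start
    while cursor <= end:
        # Assumption: both bounds are inclusive, so the span is max_days - 1 days apart.
        window_end = min(cursor + timedelta(days=max_days - 1), end)
        yield cursor, window_end
        cursor = window_end + timedelta(days=1)


def summary_url(group_by: str, start: date, end: date,
                metadata: Optional[dict] = None) -> str:
    """Build a /v1/usage/summary URL with an optional URL-encoded JSON tag filter."""
    params = {"group_by": group_by, "from": start.isoformat(), "to": end.isoformat()}
    if metadata:
        params["metadata"] = json.dumps(metadata, separators=(",", ":"))
    return SUMMARY_URL + "?" + urllib.parse.urlencode(params)


def merge_costs(responses: Iterable[dict]) -> dict:
    """Sum cost_usd per group_value across the responses of several sub-windows."""
    totals: dict = {}
    for response in responses:
        for group in response.get("data", []):
            key = group["group_value"]  # None is the untagged bucket
            totals[key] = totals.get(key, 0.0) + group["cost_usd"]
    return totals


windows = list(split_window(date(2025, 1, 1), date(2026, 6, 30)))
urls = [summary_url("metadata.project", start, end) for start, end in windows]
print(len(urls))
```

Fetch each URL with the same bearer key and pass the parsed bodies to `merge_costs`. The same merge pattern extends to token counts and request counts.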

Tagging requests

Attach an OpenAI-style metadata map to any chat completion to label it, then filter or group your usage by those labels. A map holds up to 16 string-to-string pairs (keys of at most 64 characters, values of at most 512):

cURL
curl -s "https://api.privatemind.com/v1/chat/completions" \
  -H "Authorization: Bearer $PMIND_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fast",
    "messages": [{"role": "user", "content": "…"}],
    "metadata": {"project": "load-company-data", "stage": "webcrawl"}
  }'

Tags are stored on the usage row and never forwarded to the model. A request can carry several tags at once, so a single call rolls up under both project=load-company-data and stage=webcrawl. A map that breaks the limits (too many pairs, non-string values, over-length keys or values) is rejected with a 400 response.
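
To catch an invalid map before the request is sent, the limits above can be checked client-side. This is a sketch under stated assumptions: `metadata_problems` is a hypothetical helper, and it measures length in Python characters, whereas the server may count differently (for example in bytes).

```python
MAX_PAIRS = 16
MAX_KEY_CHARS = 64
MAX_VALUE_CHARS = 512


def metadata_problems(metadata: dict) -> list:
    """Return human-readable problems; an empty list means the map fits the documented limits."""
    problems = []
    if len(metadata) > MAX_PAIRS:
        problems.append(f"{len(metadata)} pairs exceeds the limit of {MAX_PAIRS}")
    for key, value in metadata.items():
        if not isinstance(key, str) or not isinstance(value, str):
            problems.append(f"{key!r}: keys and values must both be strings")
            continue
        # Assumption: limits are counted in characters, not bytes.
        if len(key) > MAX_KEY_CHARS:
            problems.append(f"{key!r}: key is longer than {MAX_KEY_CHARS} characters")
        if len(value) > MAX_VALUE_CHARS:
            problems.append(f"{key!r}: value is longer than {MAX_VALUE_CHARS} characters")
    return problems


tags = {"project": "load-company-data", "stage": "webcrawl"}
print(metadata_problems(tags))
```

A caller could raise an error or drop the offending tags when the returned list is non-empty, instead of waiting for the 400.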

Where next