Usage · PrivateMind Docs

Every chat and embedding response includes a `usage` block. That single object is the most reliable way to track spend in real time, with no extra API calls needed.

The response `usage` block

Any successful chat completion or embedding call returns something like this:

JSON

{
  ...
  "usage": {
    "prompt_tokens": 412,
    "completion_tokens": 188,
    "total_tokens": 600,
    "cost_usd": 0.000915
  }
}

Field	Meaning
`prompt_tokens`	Tokens you sent (system prompt + messages, or input text).
`completion_tokens`	Tokens the model generated in its response.
`total_tokens`	`prompt_tokens + completion_tokens`. This is what counts against rate limits.
`cost_usd`	Estimated cost of the call in USD. Streaming and non-streaming calls are priced identically.

If you need running totals, sum these values as you handle responses. The block has no reporting lag: the server prices the call synchronously, before it returns the response to you.
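A minimal sketch of such a running total in Python, assuming you already parse each response body into a dict. `UsageTotals` is a hypothetical helper, not part of any PrivateMind SDK; any field missing from a block is counted as zero.

```python
from dataclasses import dataclass


@dataclass
class UsageTotals:
    """Running totals accumulated from the `usage` block of each response."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0
    cost_usd: float = 0.0
    calls: int = 0

    def add(self, usage: dict) -> None:
        # .get() treats a field absent from a block as zero.
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)
        self.total_tokens += usage.get("total_tokens", 0)
        self.cost_usd += usage.get("cost_usd", 0.0)
        self.calls += 1


totals = UsageTotals()
# `response` stands in for the parsed JSON body of a chat or embedding call.
response = {"usage": {"prompt_tokens": 412, "completion_tokens": 188,
                      "total_tokens": 600, "cost_usd": 0.000915}}
totals.add(response["usage"])
```

Summing `cost_usd` as a float is fine for monitoring; for billing reconciliation you may prefer `decimal.Decimal`.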

Per-key totals in the dashboard

For a human-friendly view, open the Usage tab on any API key page in the PrivateMind app. It shows total requests, total tokens, and total spend for that key. Data refreshes in real time.

Query your own usage

`GET /v1/usage` returns the individual usage rows for your own key. Authenticate with the same PMIND… key you use for the inference endpoints; no special header is needed. Results are scoped to that key: you only ever see calls that key made.

cURL

curl -s "https://api.privatemind.com/v1/usage?from=2026-06-01&limit=100" \
  -H "Authorization: Bearer $PMIND_KEY"

JSON

{
  "success": true,
  "data": [
    {
      "id": 90431,
      "model": "fast",
      "prompt_tokens": 1200,
      "cached_prompt_tokens": 320,
      "completion_tokens": 280,
      "cost_usd": 0.0041,
      "latency_ms": 1840,
      "status_code": 200,
      "created_at": "2026-06-08T10:12:04Z",
      "metadata": { "project": "load-company-data", "stage": "webcrawl" }
    }
  ],
  "pagination": { "limit": 100, "offset": 0, "has_more": false }
}

Query parameters, all optional:

Param	Meaning
`model`	Only rows for this model id.
`metadata`	Tag filter, a URL-encoded JSON object (shown here before encoding): `metadata={"project":"x"}`. Matches rows whose tags contain every pair you list. See Tagging requests.
`from` / `to`	RFC 3339 datetime or `YYYY-MM-DD` date. Defaults to the last 30 days.
`limit` / `offset`	Page size (default 100, max 1000) and offset. `pagination.has_more` is `true` when another page exists.
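The sketch below shows one way to page through `/v1/usage` from Python using only the standard library. It assumes the parameter names and response shape documented above; `build_usage_request`, `iter_usage_rows`, and `http_fetcher` are hypothetical helper names. The `metadata` filter is serialized to compact JSON and then percent-encoded with the other parameters, and paging stops once `pagination.has_more` is false.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

BASE_URL = "https://api.privatemind.com"


def build_usage_request(path: str, api_key: str, params: dict) -> urllib.request.Request:
    """Build (but do not send) a GET request for /v1/usage or /v1/usage/summary."""
    query = {}
    for name, value in params.items():
        if value is None:
            continue  # leave unset optional parameters out of the query string
        if name == "metadata":
            # The tag filter is a JSON object; urlencode() below percent-encodes it.
            value = json.dumps(value, separators=(",", ":"))
        query[name] = value
    url = BASE_URL + path
    if query:
        url += "?" + urllib.parse.urlencode(query)
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def iter_usage_rows(fetch_page: Callable[[int, int], dict], limit: int = 1000) -> Iterator[dict]:
    """Yield every row, advancing offset until pagination.has_more is false."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        rows = page.get("data", [])
        yield from rows
        if not rows or not page.get("pagination", {}).get("has_more"):
            break  # last page; the empty-page check also guards against looping forever
        offset += len(rows)


def http_fetcher(api_key: str, filters: dict) -> Callable[[int, int], dict]:
    """Return a fetch_page callable that calls GET /v1/usage over the network."""
    def fetch_page(offset: int, limit: int) -> dict:
        params = {**filters, "limit": limit, "offset": offset}
        req = build_usage_request("/v1/usage", api_key, params)
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)
    return fetch_page


# Example (makes real requests; assumes PMIND_KEY is set in the environment):
# import os
# rows = list(iter_usage_rows(http_fetcher(os.environ["PMIND_KEY"], {"from": "2026-06-01"})))
```

Keeping the network call behind a `fetch_page` callable lets you test the paging logic offline, and using the maximum `limit` of 1000 keeps the number of round trips low.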

Break usage down by tag, model, or day

`GET /v1/usage/summary` does the rollup server-side, so "tokens per project" takes one call instead of paging through every row. It accepts the same auth and filters as `/v1/usage`, plus a required `group_by` parameter:

cURL

# Spend per project
curl -s "https://api.privatemind.com/v1/usage/summary?group_by=metadata.project" \
  -H "Authorization: Bearer $PMIND_KEY"

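# Sub-costs within one project (-G turns each --data-urlencode pair into a URL-encoded query parameter)
curl -s -G "https://api.privatemind.com/v1/usage/summary" \
  --data-urlencode "group_by=metadata.stage" \
  --data-urlencode 'metadata={"project":"load-company-data"}' \
  -H "Authorization: Bearer $PMIND_KEY"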

JSON

{
  "success": true,
  "group_by": "metadata.project",
  "data": [
    { "group_value": "load-company-data", "requests": 4120, "prompt_tokens": 5680000, "cached_prompt_tokens": 910000, "completion_tokens": 1240000, "total_tokens": 6920000, "cost_usd": 18.42 },
    { "group_value": "sentiment-batch",   "requests": 1300, "prompt_tokens": 980000,  "cached_prompt_tokens": 0, "completion_tokens": 210000, "total_tokens": 1190000, "cost_usd": 4.10 },
    { "group_value": null,                "requests": 88,   "prompt_tokens": 40000,   "cached_prompt_tokens": 0, "completion_tokens": 9000,   "total_tokens": 49000,   "cost_usd": 0.20 }
  ],
  "pagination": { "limit": 100, "offset": 0, "has_more": false }
}
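To make the rollup concrete, here is a client-side sketch of what grouping by a tag computes, applied to rows already fetched from `/v1/usage`. It is an illustration, not the server's implementation: `rollup_by_tag` is a hypothetical helper, `total_tokens` is derived as `prompt_tokens + completion_tokens` because `/v1/usage` rows do not carry it, rows without the tag land in a `None` group like the `null` bucket in the example response, and groups are sorted by cost, highest first, as in that response.

```python
from collections import defaultdict


def rollup_by_tag(rows: list, tag: str) -> list:
    """Group /v1/usage rows by one metadata tag, ordered by cost (highest first)."""
    groups = defaultdict(lambda: {"requests": 0, "prompt_tokens": 0,
                                  "cached_prompt_tokens": 0,
                                  "completion_tokens": 0, "cost_usd": 0.0})
    for row in rows:
        # Rows without this tag fall into the None bucket (untagged traffic).
        value = (row.get("metadata") or {}).get(tag)
        g = groups[value]
        g["requests"] += 1
        for field in ("prompt_tokens", "cached_prompt_tokens", "completion_tokens"):
            g[field] += row.get(field, 0)
        g["cost_usd"] += row.get("cost_usd", 0.0)

    result = [
        {"group_value": value, **g,
         "total_tokens": g["prompt_tokens"] + g["completion_tokens"]}
        for value, g in groups.items()
    ]
    return sorted(result, key=lambda r: r["cost_usd"], reverse=True)
```

The summary endpoint spares you from fetching every row; a local version like this is mainly useful for cross-checking numbers or for ad-hoc groupings on data you already have.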

group_by accepts:

Value	Groups by
`model`	Model id
`day`	UTC calendar date of the request
`key_id`	The key that issued the call
`workspace`	Traffic source (`chat`, `embeddings`, …)
`metadata.<key>`	A request-tag value, e.g. `metadata.project`

The `null` bucket holds untagged traffic. Groups are ordered by cost, highest first. The `from`/`to` window may span at most 366 days; for longer ranges, query consecutive sub-windows and combine the results.
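One way to cover a range longer than 366 days is to split it into consecutive sub-windows, call `/v1/usage/summary` once per window, and merge the groups. This sketch rests on two assumptions the docs do not state: that `to` is exclusive, so adjacent windows sharing a boundary date do not double-count, and that every field other than `group_value` is a summable number. `sub_windows` and `merge_summaries` are hypothetical helper names.

```python
from datetime import date, timedelta
from typing import Iterator

MAX_WINDOW_DAYS = 366  # documented per-request limit on the from/to span


def sub_windows(start: date, end: date, max_days: int = MAX_WINDOW_DAYS) -> Iterator[tuple]:
    """Split [start, end) into consecutive windows of at most max_days each."""
    cursor = start
    while cursor < end:
        stop = min(cursor + timedelta(days=max_days), end)
        yield cursor, stop
        cursor = stop


def merge_summaries(pages: list) -> list:
    """Merge the `data` lists from several summary responses by group_value."""
    merged = {}
    for data in pages:
        for group in data:
            m = merged.setdefault(group["group_value"], {"group_value": group["group_value"]})
            for field, value in group.items():
                if field != "group_value":
                    m[field] = m.get(field, 0) + value  # assumes numeric fields
    # Re-sort by cost, highest first, to match the endpoint's own ordering.
    return sorted(merged.values(), key=lambda g: g["cost_usd"], reverse=True)
```

If `to` turns out to be inclusive, end each window one day before the next one starts.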

Tagging requests

Attach an OpenAI-style `metadata` map to any chat completion to label it, then filter or group your usage by those labels. A map holds up to 16 string→string pairs (keys ≤64 chars, values ≤512 chars):

cURL

curl -s "https://api.privatemind.com/v1/chat/completions" \
  -H "Authorization: Bearer $PMIND_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fast",
    "messages": [{"role": "user", "content": "…"}],
    "metadata": {"project": "load-company-data", "stage": "webcrawl"}
  }'

Tags are stored on the usage row and never forwarded to the model. A request can carry several tags at once, so the call above rolls up under both `project=load-company-data` and `stage=webcrawl`. A map that breaks the limits (too many pairs, non-string values, or an over-length key or value) is rejected with HTTP 400.
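Checking the map before sending avoids that 400 round trip. A minimal sketch, assuming the limits above and counting characters as Python string length (the server may count differently, for example in bytes); `validate_metadata` and `chat_body` are hypothetical helpers.

```python
MAX_PAIRS = 16
MAX_KEY_LEN = 64
MAX_VALUE_LEN = 512


def validate_metadata(metadata: dict) -> list:
    """Return a list of problems; an empty list means the map is within the documented limits."""
    problems = []
    if len(metadata) > MAX_PAIRS:
        problems.append(f"{len(metadata)} pairs (max {MAX_PAIRS})")
    for key, value in metadata.items():
        if not isinstance(key, str) or not isinstance(value, str):
            problems.append(f"{key!r}: keys and values must both be strings")
            continue
        if len(key) > MAX_KEY_LEN:
            problems.append(f"key {key[:20]!r}...: longer than {MAX_KEY_LEN} chars")
        if len(value) > MAX_VALUE_LEN:
            problems.append(f"{key!r}: value longer than {MAX_VALUE_LEN} chars")
    return problems


def chat_body(model: str, messages: list, metadata: dict) -> dict:
    """Build a tagged chat completion body, raising ValueError if the tags would be rejected."""
    problems = validate_metadata(metadata)
    if problems:
        raise ValueError("invalid metadata: " + "; ".join(problems))
    return {"model": model, "messages": messages, "metadata": metadata}


body = chat_body(
    "fast",
    [{"role": "user", "content": "…"}],
    {"project": "load-company-data", "stage": "webcrawl"},
)
```

The returned dict is the JSON request body, the same shape as the cURL example.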

Where next

Chat completions: the metadata field you set tags with.
Rate limits & budgets: how total_tokens maps to rate-limit buckets, and how to read limit headers.
Authentication: how to use your PMIND… API key.
