Models · PrivateMind Docs

The catalogue changes per environment and per org. Call GET /v1/models to see what's available. Don't assume a model exists or supports a given feature.

Loading model catalog...

List models

cURL

curl -s "https://api.privatemind.com/v1/models" \
  -H "Authorization: Bearer $PMIND_KEY"

Returns an OpenAI-shaped response. Each model carries a model_type, a structured capabilities block, the derived supported_parameters list, any role aliases (such as "fast"), a published cost, and, when configured, a model_full_name and model_icon_url for building a model picker:

JSON

{
  "object": "list",
  "data": [
    {
      "id": "fast",
      "object": "model",
      "owned_by": "privatemind",
      "context_length": 128000,
      "model_type": "vision-chat",
      "model_full_name": "Gemma 4",
      "model_icon_url": "/model-icons/gemma.svg",
      "capabilities": {
        "tools": true,
        "response_format": true,
        "reasoning_effort": true,
        "image_input": true
      },
      "supported_parameters": [
        "max_tokens", "temperature", "top_p", "top_k", "seed",
        "frequency_penalty", "presence_penalty", "stream", "stop",
        "reasoning", "include_reasoning",
        "tools", "tool_choice", "response_format", "reasoning_effort"
      ],
      "aliases": ["fast"],
      "cost": {
        "input_per_m_token": 0.27,
        "output_per_m_token": 0.85
      }
    }
  ]
}

Capability discovery

Feature-detect against capabilities and supported_parameters, not the model id.

`capabilities`

Structured boolean flags. Only the capabilities a model has are present (an absent flag means false):

Flag	Meaning
`tools`	Accepts `tools` / `tool_choice` for function calling. See Tool use
`response_format`	Accepts `response_format` for JSON mode / structured outputs
`reasoning_effort`	Supports the `reasoning_effort` parameter (hybrid and thinking-only models)
`image_input`	Accepts images in chat messages. See Vision

`supported_parameters`

OpenRouter-style list derived from the model's capabilities plus its base sampling controls, a complete list of what the model accepts. Includes reasoning / include_reasoning when the model emits chain-of-thought (see Streaming → Reasoning output), and reasoning_effort, tools, response_format mirroring the flags above. An empty list means the model is not chat-shaped.

`model_type`

What kind of model it is, so you can route to the right endpoint without trial and error:

`model_type`	Endpoint
`chat`, `vision-chat`	`/v1/chat/completions` (vision-chat also accepts image input)
`embeddings`	`/v1/embeddings`
`reranker`	`/v1/rerank`
`ocr`, `ocr-vision`	`/v1/chat/completions` (document extraction)
`tts`	`/v1/audio/speech`
`asr`	`/v1/audio/transcriptions`

The Hybrid badge in the table above flags models that can run in either thinking or non-thinking mode on a per-request basis. Toggle the mode with the reasoning_effort field on /v1/chat/completions: "off" disables thinking, "low"/"medium"/"high" enables it at increasing budgets.

Display metadata

Two optional fields help you label models in a picker instead of showing raw id slugs. Both are omitted when not set. Treat absence as "no value", not an error, and always keep id as the value you send back as model.

Field	Meaning
`model_full_name`	Human-friendly display name (for example `"Gemma 4"` rather than `fast`). Falls back to `id` when absent.
`model_icon_url`	Relative path to a PrivateMind-hosted brand icon for the model (for example `/model-icons/gemma.svg`), used by the PrivateMind UI. It points at PrivateMind's own assets, so unless you're embedding that UI you'll generally show your own icons. Absent for models with no brand icon.

Pricing

When a model has a published price, it carries a cost object. Rates are in USD per 1,000,000 tokens (per 1M):

Field	Meaning
`input_per_m_token`	USD per 1M prompt (input) tokens
`output_per_m_token`	USD per 1M completion (output) tokens
`image_per_generation`	USD per generated image. Present only on image-generation models; omitted otherwise

cost is the published list price for the model, and the rate to use when estimating spend before a call. The object is omitted for models with no published price.

To estimate a request: prompt_tokens / 1e6 × input_per_m_token + completion_tokens / 1e6 × output_per_m_token. Embeddings bill input tokens only. The authoritative spend for any request is the usage block on the response, not an up-front estimate. See Rate limits & budgets.

Context length

context_length is the maximum total token window (prompt + completion). Set max_tokens to fit inside this limit. Exceeding the window returns a 400.

What the list contains

Sourced from the running PrivateMind deployment and filtered by:

What's actually live in this environment
What your org is permitted to use

The list you see is exactly the list of model ids you can call.

Where next

Ready to make a call? Head over to Chat Completions to send your first request using one of these IDs or aliases.

Chat completions for the request shape that consumes model ids.
Tool use and Vision for the capabilities you'll feature-detect against.
Errors for what happens when a model is unavailable or out of budget.