Chat completions · PrivateMind Docs

POST /v1/chat/completions is the primary inference endpoint.

cURL

curl -s "https://api.privatemind.com/v1/chat/completions" \
  -H "Authorization: Bearer $PMIND_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fast",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user",   "content": "Explain the CAP theorem."}
    ],
    "temperature": 0.2,
    "max_tokens": 400
  }'

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://api.privatemind.com/v1",
    api_key="PMIND...:abcdef...",
)

resp = client.chat.completions.create(
    model="fast",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user",   "content": "Explain the CAP theorem."},
    ],
    temperature=0.2,
    max_tokens=400,
)
print(resp.choices[0].message.content)

JavaScript

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.privatemind.com/v1',
  apiKey: process.env.PMIND_KEY,
});

const resp = await client.chat.completions.create({
  model: 'fast',
  messages: [
    { role: 'system', content: 'You are a concise assistant.' },
    { role: 'user',   content: 'Explain the CAP theorem.' },
  ],
  temperature: 0.2,
  max_tokens: 400,
});
console.log(resp.choices[0].message.content);

Parameters

model (required): id of a model returned by GET /v1/models.
messages (required): conversation so far. Each entry has a role (system, user, assistant, or tool) and a content string. For multimodal inputs, content may be an array. See Vision.
temperature (default 1.0): sampling temperature, 0–2. Lower is more deterministic.
top_p (default 1.0): nucleus sampling cutoff.
max_tokens: maximum number of tokens to generate. The hard ceiling is the model's context_length minus the number of tokens in the prompt.
stream (default false): when true, response is delivered as SSE chunks. See Streaming.
stop: up to four stop sequences.
tools: function definitions the model may call. See Tool use.
reasoning_effort: "off", "low", "medium", or "high". Switches thinking on/off on hybrid models. See Reasoning effort below for the full behaviour matrix.
metadata: an OpenAI-style map of up to 16 string→string tags (keys ≤64 chars, values ≤512 chars) for attributing the call. Tags are stored on the usage row and never forwarded to the model; you can query or roll up usage by them later. See Usage → Tagging requests.
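
The limits stated in the list above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any PrivateMind SDK: check_params is a hypothetical helper, and context_length and prompt_tokens are values the caller is assumed to obtain elsewhere (for example from GET /v1/models and a tokenizer). The server remains the source of truth.

```python
def check_params(params, context_length=None, prompt_tokens=None):
    """Return a list of problems found in a chat-completions request body.

    Hypothetical client-side helper; it only mirrors the limits documented
    in the parameter list and does not contact the API.
    """
    problems = []

    # temperature: documented range is 0-2.
    temperature = params.get("temperature")
    if temperature is not None and not 0 <= temperature <= 2:
        problems.append("temperature must be between 0 and 2")

    # stop: up to four stop sequences.
    stop = params.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop accepts at most four sequences")

    # metadata: up to 16 string->string tags, keys <= 64 chars, values <= 512.
    metadata = params.get("metadata") or {}
    if len(metadata) > 16:
        problems.append("metadata accepts at most 16 tags")
    for key, value in metadata.items():
        if not isinstance(key, str) or not isinstance(value, str):
            problems.append("metadata tags must map strings to strings")
        elif len(key) > 64:
            problems.append("metadata key longer than 64 characters")
        elif len(value) > 512:
            problems.append("metadata value longer than 512 characters")

    # max_tokens: ceiling is context_length minus the prompt's token count.
    max_tokens = params.get("max_tokens")
    if max_tokens is not None and context_length is not None and prompt_tokens is not None:
        ceiling = context_length - prompt_tokens
        if max_tokens > ceiling:
            problems.append(f"max_tokens exceeds the ceiling of {ceiling}")

    return problems


print(check_params({"temperature": 0.2, "max_tokens": 400}))  # prints []
```

Returning a list of problems rather than raising on the first one lets a caller report every issue at once.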

Additional OpenAI fields (presence_penalty, frequency_penalty, response_format, seed) are forwarded to the engine. Whether they are honoured depends on the model: check supported_parameters in GET /v1/models.
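
One way to act on this is to drop the optional fields a model does not list before sending the request. The sketch below assumes supported_parameters is a list of parameter names on each model entry; the exact response shape of GET /v1/models is not shown on this page, and split_supported is a hypothetical helper.

```python
# Optional OpenAI fields named in the paragraph above.
OPTIONAL_FIELDS = ("presence_penalty", "frequency_penalty", "response_format", "seed")


def split_supported(params, supported_parameters):
    """Split params into (kept, dropped) using a model's supported_parameters.

    Only the optional fields listed in OPTIONAL_FIELDS are ever dropped;
    everything else (model, messages, temperature, ...) passes through.
    """
    supported = set(supported_parameters)
    kept, dropped = {}, []
    for name, value in params.items():
        if name in OPTIONAL_FIELDS and name not in supported:
            dropped.append(name)
        else:
            kept[name] = value
    return kept, dropped


# Assumed example value; read the real list from GET /v1/models.
supported = ["temperature", "top_p", "seed"]
kept, dropped = split_supported(
    {"model": "fast", "seed": 7, "presence_penalty": 0.5},
    supported,
)
print(kept)     # prints {'model': 'fast', 'seed': 7}
print(dropped)  # prints ['presence_penalty']
```

Reporting the dropped names, rather than discarding them silently, makes it easier to notice when a setting has no effect on a given model.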

Reasoning effort

reasoning_effort (optional) is the standard way to switch a model's thinking on or off without passing raw chat_template_kwargs. Allowed values:

Value	Meaning
`"off"`	Disable thinking (hybrid models only).
`"low"`	Enable thinking with the lightest budget the model exposes.
`"medium"`	Enable thinking with a moderate budget.
`"high"`	Enable thinking with the largest budget the model exposes.

What the API does with the value depends on the model's runtime mode (see Models):

Runtime mode	Example models	`"off"`	`"low" / "medium" / "high"`
Hybrid	Qwen 3.5, DeepSeek V4 Pro, Kimi K2.6, Mistral Medium, Nemotron	Disables thinking	Enables thinking
Thinking-only	Qwen 3 VL Thinking	`400`	No-op (model always thinks)
Non-thinking	Gemma family	`400`	`400`
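
The matrix above can be written as a small lookup, which is handy for predicting a 400 before the request is made. This is a sketch only: the mode strings "hybrid", "thinking-only", and "non-thinking" and the function reasoning_outcome are illustrative names, not identifiers returned by the API.

```python
EFFORTS = ("off", "low", "medium", "high")


def reasoning_outcome(runtime_mode, effort):
    """Return the documented outcome for a runtime mode and reasoning_effort.

    Possible results: "thinking disabled", "thinking enabled", "no-op", "400".
    The runtime_mode strings are assumed labels for the three table rows.
    """
    if effort not in EFFORTS:
        raise ValueError(f"unknown reasoning_effort: {effort!r}")

    if runtime_mode == "hybrid":
        # Hybrid models honour every value.
        return "thinking disabled" if effort == "off" else "thinking enabled"
    if runtime_mode == "thinking-only":
        # The model always thinks, so "off" is rejected and the rest do nothing.
        return "400" if effort == "off" else "no-op"
    if runtime_mode == "non-thinking":
        # Every value is rejected on models without a thinking mode.
        return "400"
    raise ValueError(f"unknown runtime mode: {runtime_mode!r}")


print(reasoning_outcome("hybrid", "low"))         # prints thinking enabled
print(reasoning_outcome("thinking-only", "off"))  # prints 400
```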

cURL

curl -s "https://api.privatemind.com/v1/chat/completions" \
  -H "Authorization: Bearer $PMIND_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "reasoning",
    "messages": [{"role": "user", "content": "Plan a four-step proof of Pythagoras."}],
    "reasoning_effort": "low"
  }'

Response

JSON

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "<model-id>",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "The CAP theorem ..." },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 128,
    "total_tokens": 170
  }
}

usage.prompt_tokens and usage.completion_tokens are what your key is billed against.
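
As an illustration of reading these fields, the sketch below parses the sample body shown above with the standard json module. summarize is a hypothetical helper; it returns the two billed counts separately because this page does not say whether they are priced alike.

```python
import json

# Sample body copied from the Response section.
raw = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "<model-id>",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "The CAP theorem ..." },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 128,
    "total_tokens": 170
  }
}
"""


def summarize(body):
    """Extract the reply text, finish reason, and billed token counts."""
    first = body["choices"][0]
    usage = body["usage"]
    return {
        "content": first["message"]["content"],
        "finish_reason": first["finish_reason"],
        "prompt_tokens": usage["prompt_tokens"],
        "completion_tokens": usage["completion_tokens"],
    }


summary = summarize(json.loads(raw))
print(summary["prompt_tokens"] + summary["completion_tokens"])  # prints 170
```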

Finish reasons

Value	Meaning
`stop`	Natural end of generation or a `stop` sequence matched
`length`	Hit `max_tokens` or the context window
`tool_calls`	The model wants to call one or more tools. See Tool use
`content_filter`	Output was blocked by a safety filter
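
A caller typically branches on finish_reason before using the message. The following is a minimal sketch of that branching; next_step and the action labels it returns are hypothetical, and how each case is handled is up to the application.

```python
def next_step(choice):
    """Map a choice's finish_reason to a suggested client action.

    The returned labels are illustrative, not part of the API.
    """
    reason = choice.get("finish_reason")
    if reason == "stop":
        # Natural end or a stop sequence matched: the content is complete.
        return "use_content"
    if reason == "length":
        # Hit max_tokens or the context window: the content is truncated.
        return "handle_truncation"
    if reason == "tool_calls":
        # The model asked for one or more tools to be called.
        return "run_tools"
    if reason == "content_filter":
        # Output was blocked by a safety filter.
        return "report_blocked"
    # Any value not listed in the table above.
    return "unknown"


print(next_step({"index": 0, "finish_reason": "length"}))  # prints handle_truncation
```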

Where next

Streaming for the SSE variant of this endpoint.
Tool use for function calling on top of chat completions.
Vision for the multimodal content-array shape.
Models for capability discovery against your org's catalogue.