POST /v1/chat/completions is the primary inference endpoint.
cURL
```bash
curl -s "https://api.privatemind.com/v1/chat/completions" \
  -H "Authorization: Bearer $PMIND_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fast",
    "messages": [
      {"role": "system", "content": "You are a concise assistant."},
      {"role": "user", "content": "Explain the CAP theorem."}
    ],
    "temperature": 0.2,
    "max_tokens": 400
  }'
```
Python
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.privatemind.com/v1",
    api_key="PMIND...:abcdef...",
)

resp = client.chat.completions.create(
    model="fast",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain the CAP theorem."},
    ],
    temperature=0.2,
    max_tokens=400,
)
print(resp.choices[0].message.content)
```
JavaScript
```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.privatemind.com/v1',
  apiKey: process.env.PMIND_KEY,
});

const resp = await client.chat.completions.create({
  model: 'fast',
  messages: [
    { role: 'system', content: 'You are a concise assistant.' },
    { role: 'user', content: 'Explain the CAP theorem.' },
  ],
  temperature: 0.2,
  max_tokens: 400,
});
console.log(resp.choices[0].message.content);
```
Parameters
- model (required): id of a model returned by GET /v1/models.
- messages (required): conversation so far. Each entry has a role (system, user, assistant, or tool) and a content string. For multimodal inputs, content may be an array. See Vision.
- temperature (default 1.0): sampling temperature, 0–2. Lower is more deterministic.
- top_p (default 1.0): nucleus sampling cutoff.
- max_tokens: maximum tokens to generate. Hard ceiling is the model's context_length minus the prompt.
- stream (default false): when true, response is delivered as SSE chunks. See Streaming.
- stop: up to four stop sequences.
- tools: function definitions the model may call. See Tool use.
- reasoning_effort: "off", "low", "medium", or "high". Switches thinking on/off on hybrid models. See Reasoning effort below for the full behaviour matrix.
- metadata: an OpenAI-shape map of up to 16 string→string tags (keys ≤64 chars, values ≤512) for attributing the call. Stored on the usage row and never forwarded to the model; query or roll up by it later. See Usage → Tagging requests.
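The metadata limits can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function name validate_metadata and the constant names are hypothetical, and it assumes the limits count characters as Python's len() does, which the text above does not confirm.

```python
# Documented limits for the metadata map: 16 tags, keys <= 64 chars, values <= 512 chars.
MAX_TAGS = 16
MAX_KEY_CHARS = 64
MAX_VALUE_CHARS = 512


def validate_metadata(metadata: dict) -> list[str]:
    """Return a list of problems; an empty list means the map fits the documented limits.

    Hypothetical helper. Assumes lengths are measured in characters (len()),
    which may differ from how the server counts them.
    """
    problems = []
    if len(metadata) > MAX_TAGS:
        problems.append(f"too many tags: {len(metadata)} > {MAX_TAGS}")
    for key, value in metadata.items():
        if not isinstance(key, str) or not isinstance(value, str):
            problems.append(f"{key!r}: keys and values must be strings")
            continue
        if len(key) > MAX_KEY_CHARS:
            problems.append(f"key too long: {len(key)} > {MAX_KEY_CHARS}")
        if len(value) > MAX_VALUE_CHARS:
            problems.append(f"{key!r}: value too long: {len(value)} > {MAX_VALUE_CHARS}")
    return problems
```

An empty result, as in validate_metadata({"team": "search"}), suggests the map is safe to attach to the request body.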
Additional OpenAI fields (presence_penalty, frequency_penalty, response_format, seed) are forwarded to the engine. Whether they are honoured depends on the model; check supported_parameters in GET /v1/models.
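One way to act on supported_parameters is to drop optional fields a model does not list before sending the request. The sketch below assumes supported_parameters is a list of parameter names for the chosen model; the exact shape of the GET /v1/models response is not shown here, and split_supported is a hypothetical helper.

```python
# The pass-through fields named above; anything else in the payload is left untouched.
OPTIONAL_FIELDS = {"presence_penalty", "frequency_penalty", "response_format", "seed"}


def split_supported(payload: dict, supported_parameters: list[str]) -> tuple[dict, list[str]]:
    """Split a request payload into (fields to send, optional fields dropped).

    Hypothetical helper. Assumes supported_parameters is a flat list of
    parameter names taken from the model's entry in GET /v1/models.
    """
    supported = set(supported_parameters)
    kept, dropped = {}, []
    for name, value in payload.items():
        if name in OPTIONAL_FIELDS and name not in supported:
            dropped.append(name)
        else:
            kept[name] = value
    return kept, dropped


payload = {"model": "fast", "messages": [], "seed": 7, "presence_penalty": 0.5}
kept, dropped = split_supported(payload, ["seed", "temperature"])
```

Returning the dropped names rather than discarding them silently lets the caller log or surface which fields the model would not have honoured.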
Reasoning effort
reasoning_effort (optional) is the standard way to switch a model's thinking on or off without passing raw chat_template_kwargs. Allowed values:
| Value | Meaning |
|---|---|
| "off" | Disable thinking (hybrid models only). |
| "low" | Enable thinking with the lightest budget the model exposes. |
| "medium" | Enable thinking with a moderate budget. |
| "high" | Enable thinking with the largest budget the model exposes. |
What the API does with the value depends on the model's runtime mode (see Models):
| Runtime mode | Example models | "off" | "low" / "medium" / "high" |
|---|---|---|---|
| Hybrid | Qwen 3.5, DeepSeek V4 Pro, Kimi K2.6, Mistral Medium, Nemotron | Disables thinking | Enables thinking |
| Thinking-only | Qwen 3 VL Thinking | 400 | No-op (model always thinks) |
| Non-thinking | Gemma family | 400 | 400 |
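The behaviour matrix can be encoded as a small lookup, which is handy for predicting a 400 before sending a request. This is an illustrative sketch only: the mode keys ("hybrid", "thinking-only", "non-thinking") and the outcome strings are hypothetical labels for the rows and cells of the table, not values returned by the API.

```python
EFFORTS = ("off", "low", "medium", "high")

# One entry per runtime mode: behaviour for "off" and for any enabling value.
BEHAVIOUR = {
    "hybrid": {"off": "thinking disabled", "on": "thinking enabled"},
    "thinking-only": {"off": "400", "on": "no-op"},
    "non-thinking": {"off": "400", "on": "400"},
}


def expected_behaviour(runtime_mode: str, reasoning_effort: str) -> str:
    """Look up what the table says happens for a mode and a reasoning_effort value."""
    if reasoning_effort not in EFFORTS:
        raise ValueError(f"unknown reasoning_effort: {reasoning_effort!r}")
    # "low", "medium", and "high" share one column in the matrix.
    column = "off" if reasoning_effort == "off" else "on"
    return BEHAVIOUR[runtime_mode][column]
```

For example, expected_behaviour("thinking-only", "off") yields "400", matching the Thinking-only row.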
cURL
```bash
curl -s "https://api.privatemind.com/v1/chat/completions" \
  -H "Authorization: Bearer $PMIND_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "reasoning",
    "messages": [{"role": "user", "content": "Plan a four-step proof of Pythagoras."}],
    "reasoning_effort": "low"
  }'
```
Response
JSON
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "<model-id>",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "The CAP theorem ..." },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 42,
    "completion_tokens": 128,
    "total_tokens": 170
  }
}
```
usage.prompt_tokens and usage.completion_tokens are what your key is billed against.
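Because billing follows these two counters, it can be useful to total them across calls on the client side. The sketch below works on response bodies parsed into plain dicts; tally_usage is a hypothetical helper, and it only sums token counts, since pricing is not described here.

```python
def tally_usage(responses: list[dict]) -> dict:
    """Sum the billed token counters over a list of parsed response bodies."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0}
    for response in responses:
        usage = response["usage"]
        totals["prompt_tokens"] += usage["prompt_tokens"]
        totals["completion_tokens"] += usage["completion_tokens"]
    return totals


# Usage figures copied from the sample response above.
sample = {"usage": {"prompt_tokens": 42, "completion_tokens": 128, "total_tokens": 170}}
totals = tally_usage([sample, sample])
```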
Finish reasons
| Value | Meaning |
|---|---|
| stop | Natural end of generation or a stop sequence matched |
| length | Hit max_tokens or the context window |
| tool_calls | The model wants to call one or more tools. See Tool use |
| content_filter | Output was blocked by a safety filter |
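Client code usually branches on finish_reason before using the message. A minimal sketch of that dispatch follows, operating on one entry of choices parsed into a dict; the function name next_action and the returned action labels are hypothetical, chosen only for illustration.

```python
def next_action(choice: dict) -> str:
    """Map a choice's finish_reason to a hypothetical client-side action label."""
    reason = choice["finish_reason"]
    if reason == "stop":
        return "done"        # complete answer, or a stop sequence matched
    if reason == "length":
        return "truncated"   # hit max_tokens or the context window
    if reason == "tool_calls":
        return "run_tools"   # the model asked for one or more tool calls
    if reason == "content_filter":
        return "blocked"     # output was blocked by a safety filter
    return "unknown"         # any value not listed in the table above


choice = {"index": 0, "message": {"role": "assistant", "content": "..."}, "finish_reason": "stop"}
action = next_action(choice)
```

Treating unlisted values as "unknown" rather than raising keeps the client tolerant of finish reasons that are not documented in the table.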