
Streaming

Server-Sent Events for token-by-token responses.

Set "stream": true on a chat-completions request and the API responds with Server-Sent Events instead of a single JSON body.

cURL
curl -N "https://api.privatemind.com/v1/chat/completions" \
  -H "Authorization: Bearer $PMIND_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fast",
    "messages": [{"role": "user", "content": "Write a haiku about TCP."}],
    "stream": true
  }'

curl -N disables curl's output buffering so chunks print as they arrive.
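For clients that cannot install the OpenAI SDK, the same request can be sent with Python's standard library. This is a minimal sketch, not an official client. It assumes Python 3.9+, and the helper names build_stream_request and iter_lines are hypothetical. It reuses the URL, headers, and body from the cURL example and yields raw response lines as they arrive, including the blank separators between events.

```python
import json
import urllib.request
from typing import Iterator

# Endpoint taken from the cURL example above.
API_URL = "https://api.privatemind.com/v1/chat/completions"


def build_stream_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build the same POST as the cURL example, with "stream": true (hypothetical helper)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def iter_lines(req: urllib.request.Request) -> Iterator[str]:
    """Yield each response line, without its line ending, as the server sends it."""
    with urllib.request.urlopen(req) as resp:
        # Iterating the response calls readline(), so each line is
        # yielded as soon as it arrives rather than after the body ends.
        for raw in resp:
            yield raw.decode("utf-8").rstrip("\r\n")
```

A call such as iter_lines(build_stream_request(os.environ["PMIND_KEY"], "fast", "Write a haiku about TCP.")) produces the data: lines described under Wire format. Note that urlopen raises urllib.error.HTTPError for non-2xx responses before any line is yielded.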

Wire format

Each event is a single data: <json> line, followed by a blank line that ends the event:

Text
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant"},"index":0}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":"Packets"},"index":0}]}

...

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{},"index":0,"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":28,"total_tokens":40}}

data: [DONE]
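  • choices[].delta.content is the incremental token text. Concatenate the pieces in order to build the full response. Not every chunk has content (in the example above, the first chunk sets only role and the last sets only finish_reason), so check before appending.
  • The last chunk before data: [DONE] carries usage. PrivateMind always requests stream_options.include_usage, so you get token counts at the end of the stream without making a second call.
  • data: [DONE] terminates the stream. It is not JSON: treat it as a sentinel and stop reading instead of passing it to a JSON parser.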
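The three rules above can be folded into one small reader. This is a sketch, not PrivateMind's client code: the function name read_stream is hypothetical, and it assumes a single choice per request (n=1). It accepts any iterable of text lines, such as lines read from an HTTP response, and returns the concatenated text, the usage block, and the finish reason.

```python
import json
from typing import Iterable, Optional


def read_stream(lines: Iterable[str]) -> tuple[str, Optional[dict], Optional[str]]:
    """Fold an SSE chat-completions stream into (text, usage, finish_reason)."""
    parts: list[str] = []
    usage: Optional[dict] = None
    finish_reason: Optional[str] = None
    for line in lines:
        line = line.strip()
        # Blank lines end events; SSE comment lines start with ":". Neither is payload.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # [DONE] is a sentinel, not JSON: stop before it reaches json.loads.
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        # Assumes n=1, so every content delta belongs to the same answer.
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("content"):
                parts.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
        # The last chunk before [DONE] carries the usage block.
        if chunk.get("usage"):
            usage = chunk["usage"]
    return "".join(parts), usage, finish_reason
```

Fed the wire-format example (with the elided middle chunks filled in), it returns the haiku text, {"prompt_tokens": 12, "completion_tokens": 28, "total_tokens": 40}, and "stop". Breaking on [DONE] rather than skipping it means any trailing bytes after the sentinel are never parsed.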

SDK examples

Python
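from openai import OpenAI

# Point the OpenAI SDK at PrivateMind's base URL.
client = OpenAI(base_url="https://api.privatemind.com/v1", api_key="PMIND...:...")

stream = client.chat.completions.create(
    model="<model-id>",
    messages=[{"role": "user", "content": "Write a haiku about TCP."}],
    stream=True,  # return an iterator of chunks instead of one response
)
# The SDK consumes data: [DONE] itself, so the loop simply ends.
for chunk in stream:
    # Skip chunks without choices or with an empty delta (role-only and final chunks).
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)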
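To keep the full text and the token counts as well as printing, the loop can collect both. This is a sketch with a hypothetical helper, collect. It assumes the SDK's chunk objects expose a usage attribute that is None on every chunk except the last one before [DONE], which matches the wire format above and current openai-python releases.

```python
from typing import Any, Iterable, Optional


def collect(stream: Iterable[Any]) -> tuple[str, Optional[Any]]:
    """Print content deltas as they arrive; return the full text and the usage object."""
    parts: list[str] = []
    usage = None
    for chunk in stream:
        # Same guard as above: some chunks have no choices or an empty delta.
        if chunk.choices and chunk.choices[0].delta.content:
            text = chunk.choices[0].delta.content
            parts.append(text)
            print(text, end="", flush=True)
        # Only the final chunk carries usage; earlier chunks leave it as None.
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
    print()  # end the streamed line
    return "".join(parts), usage
```

Passing the stream from the example, text, usage = collect(stream) leaves the whole haiku in text and the counts in usage.prompt_tokens, usage.completion_tokens, and usage.total_tokens.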

Reasoning output

Reasoning models stream their chain-of-thought in a separate field, delta.reasoning, and keep delta.content for the clean final answer, with no inline <think>…</think> wrapper. Non-streaming responses carry the same text in message.reasoning.

Text
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"reasoning":"Let me work through"},"index":0}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"reasoning":" the proof step by step…"},"index":0}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":"The answer is 42."},"index":0}]}

Render or collapse delta.reasoning as you like, and concatenate delta.content for the answer. To feature-detect, check whether reasoning appears in the model's supported_parameters; to turn thinking on or off per request, set reasoning_effort.
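Keeping the two fields apart means accumulating them into separate buffers. The sketch below works on raw SSE lines; the name split_reasoning is hypothetical, and it assumes a single choice per request. With the OpenAI Python SDK, where reasoning is not a declared field on the delta model, getattr(chunk.choices[0].delta, "reasoning", None) is the hedged equivalent, assuming the SDK keeps unknown response fields on its objects.

```python
import json
from typing import Iterable


def split_reasoning(lines: Iterable[str]) -> tuple[str, str]:
    """Accumulate delta.reasoning and delta.content separately from SSE lines."""
    reasoning: list[str] = []
    answer: list[str] = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank event separators and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # sentinel, not JSON
        for choice in json.loads(payload).get("choices", []):
            delta = choice.get("delta", {})
            # A delta may carry either field, both, or neither.
            reasoning.append(delta.get("reasoning") or "")
            answer.append(delta.get("content") or "")
    return "".join(reasoning), "".join(answer)
```

On the example stream above, this returns "Let me work through the proof step by step…" as the reasoning and "The answer is 42." as the answer, so a UI can show the first in a collapsible panel and the second as the reply.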

When to use streaming

  • Interactive UIs. Show output as it arrives.
  • Long generations. Render progress instead of a spinner.
  • Cost capture. The final chunk's usage block reports exact token counts, so you can record spend as soon as the stream ends.

For batch jobs or short responses, a regular non-streaming request is simpler.

Where next

  • Chat completions for the non-streaming variant and the full parameter list.
  • Tool use for how tool calls surface inside a stream.
  • Errors for how errors appear mid-stream.