Set "stream": true on a chat-completions request and the API responds with Server-Sent Events instead of a single JSON body.
curl -N "https://api.privatemind.com/v1/chat/completions" \
-H "Authorization: Bearer $PMIND_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "fast",
"messages": [{"role": "user", "content": "Write a haiku about TCP."}],
"stream": true
}'
curl -N disables curl's output buffering so chunks print as they arrive.
Wire format
Each event is a data: <json> line:
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant"},"index":0}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":"Packets"},"index":0}]}
...
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{},"index":0,"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":28,"total_tokens":40}}
data: [DONE]
- choices[].delta.content is the incremental token text. Concatenate to build the full response.
- The last non-DONE chunk carries usage. PrivateMind always requests stream_options.include_usage so you get token counts at the end without a second call.
- data: [DONE] terminates the stream. It is not JSON. Treat it as a sentinel.
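The rules above can be sketched as a small parser for clients that read the raw event stream instead of using an SDK. This is a minimal illustration, not PrivateMind's reference implementation: the function name collect_stream is hypothetical, and it assumes the lines have already been read from the HTTP response and decoded to text.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Assemble the full response text and the usage block from raw SSE lines."""
    parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # sentinel, not JSON: stop before json.loads would fail
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]  # carried by the last non-DONE chunk
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                parts.append(text)
    return "".join(parts), usage


# Hypothetical sample lines, shaped like the wire format shown above.
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"},"index":0}]}',
    'data: {"choices":[{"delta":{"content":"Packets"},"index":0}]}',
    'data: {"choices":[{"delta":{"content":" drift"},"index":0}]}',
    'data: {"choices":[{"delta":{},"index":0,"finish_reason":"stop"}],'
    '"usage":{"prompt_tokens":12,"completion_tokens":28,"total_tokens":40}}',
    "data: [DONE]",
]
text, usage = collect_stream(sample)
print(text)
print(usage)
```

Checking for the sentinel before calling json.loads matters because [DONE] is not valid JSON. The sketch also assumes a single choice per chunk; a request that returns several choices would need to group deltas by index.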
SDK examples
from openai import OpenAI
# Point the OpenAI SDK at the PrivateMind endpoint.
client = OpenAI(base_url="https://api.privatemind.com/v1", api_key="PMIND...:...")
stream = client.chat.completions.create(
    model="<model-id>",
    messages=[{"role": "user", "content": "Write a haiku about TCP."}],
    stream=True,
)
# Print each content delta as it arrives; chunks without choices or content are skipped.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Reasoning output
Reasoning models stream their chain-of-thought in a separate field, delta.reasoning, and keep delta.content for the clean final answer — no inline <think>…</think> wrapper. Non-streaming responses carry the same text in message.reasoning.
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"reasoning":"Let me work through"},"index":0}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"reasoning":" the proof step by step…"},"index":0}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":"The answer is 42."},"index":0}]}
Render or collapse delta.reasoning as you like; concatenate delta.content for the answer. To feature-detect, check for reasoning in supported_parameters, and turn thinking on or off per request with reasoning_effort.
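As a rough illustration of keeping the two fields apart, the sketch below accumulates delta.reasoning and delta.content into separate buffers. It works on chunks that have already been parsed into dictionaries; the helper name split_reasoning and the sample data are hypothetical, and it assumes one choice per chunk.

```python
from typing import Iterable, Tuple


def split_reasoning(chunks: Iterable[dict]) -> Tuple[str, str]:
    """Return (reasoning, answer) accumulated from parsed chunk objects."""
    reasoning = []
    answer = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("reasoning"):
                reasoning.append(delta["reasoning"])  # chain-of-thought text
            if delta.get("content"):
                answer.append(delta["content"])  # final answer text
    return "".join(reasoning), "".join(answer)


# Hypothetical chunks, shaped like the events shown above.
chunks = [
    {"choices": [{"delta": {"reasoning": "Let me work through"}, "index": 0}]},
    {"choices": [{"delta": {"reasoning": " the proof step by step."}, "index": 0}]},
    {"choices": [{"delta": {"content": "The answer is 42."}, "index": 0}]},
]
thoughts, final = split_reasoning(chunks)
print(thoughts)
print(final)
```

Separate buffers let a UI show the reasoning in a collapsible panel while the answer renders in the main view. Models that emit no reasoning simply leave the first string empty.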
When to use streaming
- Interactive UIs. Show output as it arrives.
- Long generations. Render progress instead of a spinner.
- Cost capture mid-flight. The final chunk's usage block tells you the exact spend.
For batch jobs or short responses, the non-streaming endpoint is simpler.
Where next
- Chat completions for the non-streaming variant and the full parameter list.
- Tool use for how tool calls surface inside a stream.
- Errors for how errors appear mid-stream.