Vision-capable models accept the standard OpenAI multimodal content shape: an array of typed parts instead of a plain string.
cURL
curl -s "https://api.privatemind.com/v1/chat/completions" \
  -H "Authorization: Bearer $PMIND_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-vl-32b-thinking-fp8",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgo..."}}
      ]
    }]
  }'

The content array can mix any number of text and image_url parts.
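As a minimal sketch of mixing parts, the helper below assembles one text part followed by any number of image_url parts. The name build_content is hypothetical and not part of any SDK, and the URLs shown are placeholders rather than real images.

```python
def build_content(prompt: str, image_urls: list[str]) -> list[dict]:
    """Build a content array: one text part, then one image_url part per URL.

    Hypothetical helper for illustration; it only assembles plain dicts.
    """
    parts = [{"type": "text", "text": prompt}]
    for url in image_urls:
        parts.append({"type": "image_url", "image_url": {"url": url}})
    return parts


# Placeholder data URLs; substitute real base64 payloads before sending.
messages = [{
    "role": "user",
    "content": build_content(
        "Compare these two images.",
        ["data:image/png;base64,<payload-1>", "data:image/jpeg;base64,<payload-2>"],
    ),
}]
```

The resulting messages list can be passed as the messages field of a chat completions request.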
Image sources
- Data URL: data:image/<mime>;base64,<payload>. Use this when you have the image bytes locally.
- HTTPS URL: a public URL the model can fetch. Subject to the model's network reachability; data URLs are more reliable.
Supported MIME types depend on the model. PNG and JPEG always work; WebP and GIF depend on the engine.
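Because the data URL must carry the right MIME type, one possible approach is to derive it from the file extension instead of hard-coding it. The sketch below uses only the Python standard library. The helper name to_data_url is hypothetical, and rejecting unrecognised extensions (rather than guessing) is an assumption made here, not documented behaviour of the API.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_url(path: str) -> str:
    """Return a data:image/<mime>;base64,<payload> URL for a local image file.

    Hypothetical helper: the MIME type is guessed from the file extension.
    """
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        # Assumption: fail loudly instead of sending a mislabeled image.
        raise ValueError(f"cannot determine an image MIME type for {path!r}")
    payload = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"
```

The returned string can be used directly as the url value of an image_url part. Whether a given type is accepted still depends on the model, as noted above.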
Encoding a local file
Python
import base64

from openai import OpenAI

client = OpenAI(base_url="https://api.privatemind.com/v1", api_key="PMIND...:...")

# Read the image bytes and base64-encode them for the data URL.
with open("photo.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

# Send one text part and one image part in the same user message.
resp = client.chat.completions.create(
    model="qwen3-vl-32b-thinking-fp8",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)

Which models are vision-capable?
Vision models appear in the GET /v1/models response like any other model. The best available signal is currently the model id: look for vision, vl, or vlm.
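The id check can be sketched as a small filter. This is a heuristic only: matching whole tokens (split on non-alphanumeric characters) rather than raw substrings is an assumption made here to avoid accidental matches, and it will miss ids that fuse the hint with other characters. Both function names are hypothetical. list_vision_models expects an OpenAI SDK client configured as in the Python example above.

```python
import re

# Hints taken from the text: look for these tokens in the model id.
VISION_HINTS = ("vision", "vl", "vlm")


def looks_vision_capable(model_id: str) -> bool:
    """Heuristic: True if any hyphen/underscore/dot-separated token is a hint."""
    tokens = re.split(r"[^a-z0-9]+", model_id.lower())
    return any(token in VISION_HINTS for token in tokens)


def list_vision_models(client) -> list[str]:
    """Return the ids from GET /v1/models that pass the heuristic.

    `client` is assumed to expose the OpenAI SDK's client.models.list().
    """
    return [m.id for m in client.models.list() if looks_vision_capable(m.id)]
```

For example, looks_vision_capable("qwen3-vl-32b-thinking-fp8") is true because vl appears as its own token in the id.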
Where next
- Chat completions for the underlying request shape.
- Models to identify vision-capable models in your catalogue.
- Tool use for combining vision with function calling.