Vision

Multimodal chat: text plus images, in the OpenAI content-array shape.

Vision-capable models accept the standard OpenAI multimodal content shape: a message's content is an array of typed parts instead of a plain string.

cURL
curl -s "https://api.privatemind.com/v1/chat/completions" \
  -H "Authorization: Bearer $PMIND_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-vl-32b-thinking-fp8",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0KGgo..."}}
      ]
    }]
  }'

The content array can mix any number of text and image_url parts.
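As a minimal sketch of mixing parts, the helpers below build a user message with two images and two text parts. The helper names (text_part, image_part, user_message) and the example.com URLs are illustrative assumptions, not part of the API; only the resulting dictionary shape comes from the request shown above.

```python
def text_part(text):
    """Return a text part in the OpenAI content-array shape."""
    return {"type": "text", "text": text}


def image_part(url):
    """Return an image_url part; url may be a data URL or an HTTPS URL."""
    return {"type": "image_url", "image_url": {"url": url}}


def user_message(*parts):
    """Wrap any number of parts into a single user message, keeping their order."""
    return {"role": "user", "content": list(parts)}


# Placeholder URLs for illustration only.
message = user_message(
    text_part("Compare these two images."),
    image_part("https://example.com/a.png"),
    image_part("https://example.com/b.png"),
    text_part("Which one is brighter?"),
)
```

The resulting message can be passed as an element of the messages list in a chat completions request.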

Image sources

  • Data URL: data:<mime>;base64,<payload>, for example data:image/png;base64,.... Use this when you have the image bytes locally.
  • HTTPS URL: a public URL the model can fetch. Fetching depends on the model's network reachability, so data URLs are more reliable.

Supported MIME types depend on the model and its serving engine. PNG and JPEG always work; WebP and GIF support varies by engine.

Encoding a local file

Python
import base64
from openai import OpenAI

client = OpenAI(base_url="https://api.privatemind.com/v1", api_key="PMIND...:...")

# Read the raw image bytes and base64-encode them for the data URL.
with open("photo.png", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="qwen3-vl-32b-thinking-fp8",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this."},
            # The MIME type in the data URL must match the file (PNG here).
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)
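The example above hard-codes image/png. For files of mixed types, one option is to derive the MIME type from the filename. The following is a sketch using only the Python standard library; the function names data_url_from_bytes and data_url_from_file are hypothetical helpers, not part of any SDK.

```python
import base64
import mimetypes


def data_url_from_bytes(data, mime):
    """Encode raw image bytes as a data URL with the given MIME type."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{payload}"


def data_url_from_file(path):
    """Build a data URL for a local image, guessing the MIME type from the extension."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        # Refuse to guess: sending a wrong MIME type may be rejected by the engine.
        raise ValueError(f"cannot determine an image MIME type for {path!r}")
    with open(path, "rb") as f:
        return data_url_from_bytes(f.read(), mime)
```

The returned string can be used directly as the url value of an image_url part. Whether a given type (for example WebP or GIF) is accepted still depends on the model and engine.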

Which models are vision-capable?

Vision models appear in the GET /v1/models listing like any other model. The best signal currently available is the model id: look for vision, vl, or vlm in the name.
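The id check can be sketched as a small filter. This is a naming heuristic only, under the assumption that ids separate words with punctuation such as hyphens; it can miss vision models with other naming schemes and is not an authoritative capability flag. The function names are hypothetical; client.models.list() is the standard OpenAI Python SDK call for GET /v1/models.

```python
import re

# Hints taken from the guidance above; matching is a heuristic, not a capability flag.
VISION_HINTS = ("vision", "vl", "vlm")


def looks_vision_capable(model_id):
    """Return True if any hyphen/underscore/dot-separated token of the id is a vision hint."""
    tokens = re.split(r"[^a-z0-9]+", model_id.lower())
    return any(tok in VISION_HINTS or "vision" in tok for tok in tokens)


def vision_model_ids(client):
    """List ids from GET /v1/models that look vision-capable.

    `client` is assumed to be an openai.OpenAI instance configured as in the
    Python example above.
    """
    return [m.id for m in client.models.list() if looks_vision_capable(m.id)]
```

Matching whole tokens rather than raw substrings avoids flagging ids that merely contain the letters "vl" inside another word.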

Where next

  • Chat completions for the underlying request shape.
  • Models to identify vision-capable models in your catalogue.
  • Tool use for combining vision with function calling.