POST /v1/embeddings returns dense vector representations of input text. The request and response shapes match OpenAI's embeddings API, so the OpenAI SDKs work once the base URL is changed.
curl -s "https://api.privatemind.com/v1/embeddings" \
  -H "Authorization: Bearer $PMIND_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kalm-embedding-gemma3-12b-2511",
    "input": ["the quick brown fox", "jumps over the lazy dog"]
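}' | jq '.data[].embedding | length'

# Python (openai SDK); the key is read from the PMIND_KEY environment variable
import os
from openai import OpenAI
client = OpenAI(base_url="https://api.privatemind.com/v1", api_key=os.environ["PMIND_KEY"])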
resp = client.embeddings.create(
    model="kalm-embedding-gemma3-12b-2511",
    input=["the quick brown fox", "jumps over the lazy dog"],
)
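print(len(resp.data[0].embedding))

// JavaScript (openai SDK on Node); the key is read from the PMIND_KEY environment variable
import OpenAI from 'openai';
const client = new OpenAI({ baseURL: 'https://api.privatemind.com/v1', apiKey: process.env.PMIND_KEY });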
const resp = await client.embeddings.create({
  model: 'kalm-embedding-gemma3-12b-2511',
  input: ['the quick brown fox', 'jumps over the lazy dog'],
});
console.log(resp.data[0].embedding.length);

Parameters
- model (required): id of an embedding model. In GET /v1/models, embedding models report an empty supported_parameters array; identify them by id or modelType.
- input (required): a single string or an array of strings. Batch processing is more efficient than one call per string.
- encoding_format (default float): float or base64. Base64 is ~4× smaller on the wire, useful for large batches.
- dimensions: truncate the output vector to N dimensions, where the model supports it. Lower dimensions trade quality for storage.
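
When encoding_format is base64, each embedding arrives as a string rather than a list of numbers, so the client has to unpack it. The sketch below assumes the payload is packed little-endian float32 values, which is how OpenAI's API encodes it; this page does not state the byte layout, so verify it against a real response before relying on it. The helper name decode_embedding is hypothetical.

```python
import base64
import struct


def decode_embedding(b64: str) -> list[float]:
    """Decode a base64 embedding string into floats.

    Assumption: the bytes are packed little-endian float32 values.
    """
    raw = base64.b64decode(b64)
    if len(raw) % 4:
        raise ValueError("payload length is not a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


# Round-trip check on a locally packed vector (no network call involved).
sample = [0.5, -0.25, 1.0]
encoded = base64.b64encode(struct.pack("<3f", *sample)).decode("ascii")
decoded = decode_embedding(encoded)
```

The sample values are exactly representable in float32, so the round trip returns them unchanged; real embeddings decoded this way carry float32 precision.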
Response
JSON
{
  "object": "list",
  "data": [
    { "object": "embedding", "index": 0, "embedding": [0.012, -0.087, ...] },
    { "object": "embedding", "index": 1, "embedding": [0.034, 0.121, ...] }
  ],
  "model": "<embedding-model-id>",
  "usage": { "prompt_tokens": 14, "total_tokens": 14 }
}

completion_tokens is always zero for embeddings. Billing is on input tokens only.
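
As a minimal sketch of consuming this response, the helper below pairs each vector with its input by sorting on the index field. It assumes index is the position of the string in the input array, as in OpenAI's API; sorting is a defensive step, since this page does not say whether data is always returned in order. The function name and the shortened sample vectors are illustrative only.

```python
def vectors_in_input_order(response: dict) -> list[list[float]]:
    """Return the embeddings ordered by their `index` field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Illustrative payload with made-up two-dimensional vectors,
# deliberately listed out of order to exercise the sort.
sample_response = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.0, 1.0]},
        {"object": "embedding", "index": 0, "embedding": [1.0, 0.0]},
    ],
    "model": "<embedding-model-id>",
    "usage": {"prompt_tokens": 14, "total_tokens": 14},
}

vectors = vectors_in_input_order(sample_response)
```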
Batching
Send up to several hundred strings per request. Batching is much faster than one call per string, often 10× or more, because tokenisation and dispatch overhead dominate at small batch sizes.
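
One way to apply this to a large corpus is to split it into fixed-size chunks and send one request per chunk. The sketch below is an illustration under assumptions: the batch size of 256 is an arbitrary choice within "several hundred", not a documented limit, and embed_batch is a hypothetical callable standing in for the real API call so the example runs offline.

```python
from typing import Callable, Iterator


def chunked(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(
    texts: list[str],
    embed_batch: Callable[[list[str]], list[list[float]]],
    batch_size: int = 256,  # assumed value, not a documented limit
) -> list[list[float]]:
    """Embed `texts` batch by batch, preserving input order."""
    vectors: list[list[float]] = []
    for batch in chunked(texts, batch_size):
        vectors.extend(embed_batch(batch))
    return vectors


def fake_embed_batch(batch: list[str]) -> list[list[float]]:
    """Stand-in for the API call: one-dimensional 'embedding' = text length."""
    return [[float(len(text))] for text in batch]


result = embed_all(["a", "bb", "ccc", "dddd", "eeeee"], fake_embed_batch, batch_size=2)
```

With the OpenAI Python SDK, embed_batch could wrap client.embeddings.create(model=..., input=batch) and return the embedding of each item in resp.data.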
Where next
- Models to identify embedding models in your org's catalogue.
- Sources for the built-in ingestion + retrieval pipeline.
- Rate limits for budget and RPM behaviour on embedding calls.