Audio · PrivateMind Docs

PrivateMind exposes OpenAI-compatible audio endpoints for synthesis and transcription.

Text to speech

POST /v1/audio/speech synthesises an audio file from text.

cURL

curl -s "https://api.privatemind.com/v1/audio/speech" \
  -H "Authorization: Bearer $PMIND_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "chatterbox-turbo",
    "voice": "default",
    "input": "Hello world."
  }' \
  --output speech.mp3

model (required): TTS model id. List with GET /v1/models.
voice (required): voice id. List available voices with GET /v1/voices.
input (required): text to synthesise.
response_format (default mp3): mp3, opus, aac, flac, wav, pcm. Not every engine supports every format.
speed (default 1.0): playback speed multiplier.

The response body is the raw audio bytes.
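The same call can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the function names build_speech_request and synthesize_to_file are hypothetical, the default model and voice are copied from the cURL example above, and the optional response_format and speed fields are sent exactly as listed in the parameter table.

```python
import json
import urllib.request

API_BASE = "https://api.privatemind.com/v1"


def build_speech_request(api_key, text, model="chatterbox-turbo", voice="default",
                         response_format="mp3", speed=1.0):
    """Build (but do not send) a POST /v1/audio/speech request."""
    payload = {
        "model": model,
        "voice": voice,
        "input": text,
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        f"{API_BASE}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(api_key, text, path, **options):
    """Send the request and write the raw audio bytes to `path`.

    Returns the number of bytes written. HTTP errors surface as
    urllib.error.HTTPError.
    """
    request = build_speech_request(api_key, text, **options)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(path, "wb") as out:
        out.write(audio)
    return len(audio)
```

Assuming the key is exported as PMIND_KEY, a call such as synthesize_to_file(os.environ["PMIND_KEY"], "Hello world.", "speech.wav", response_format="wav") would save WAV output, provided the chosen engine supports that format.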

Speech to text

POST /v1/audio/transcriptions takes an audio file and returns text.

cURL

curl -s "https://api.privatemind.com/v1/audio/transcriptions" \
  -H "Authorization: Bearer $PMIND_KEY" \
  -F "file=@call.m4a" \
  -F "model=qwen3-asr-1-7b"

The request body is multipart/form-data. curl sets the Content-Type header, including the boundary, automatically when you use -F.

file (required): audio file. Supported formats: mp3, mp4, m4a, wav, webm, flac, ogg, mpga, mpeg.
model (required): ASR model id.
language: ISO-639-1 code (en, fr, de, ...). Improves accuracy and latency when known.
prompt: priming text, such as names, jargon, or context.
response_format (default json): json, text, srt, verbose_json, vtt.

Default json response:

JSON

{ "text": "Hello, this is a test transcription." }

verbose_json adds per-segment timestamps and language detection. srt and vtt return ready-to-use subtitle formats.
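For callers not using curl, the multipart body has to be assembled by hand or by an HTTP library. The sketch below does it with the Python standard library so the structure is visible. The helper names (encode_multipart, build_transcription_request, transcribe) are hypothetical, the default model id is the one from the cURL example, and optional fields are only sent when set.

```python
import mimetypes
import os
import uuid
import urllib.request

API_BASE = "https://api.privatemind.com/v1"


def encode_multipart(fields, filename, file_bytes):
    """Encode text fields plus one "file" part as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f'Content-Type: {file_type}\r\n\r\n'
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing delimiter
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transcription_request(api_key, path, model="qwen3-asr-1-7b",
                                language=None, prompt=None, response_format="json"):
    """Build (but do not send) a POST /v1/audio/transcriptions request."""
    fields = {"model": model, "response_format": response_format}
    if language:
        fields["language"] = language
    if prompt:
        fields["prompt"] = prompt
    with open(path, "rb") as audio:
        body, content_type = encode_multipart(fields, os.path.basename(path), audio.read())
    return urllib.request.Request(
        f"{API_BASE}/audio/transcriptions",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )


def transcribe(api_key, path, **options):
    """Send the request and return the response body decoded as text."""
    request = build_transcription_request(api_key, path, **options)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With response_format left at json, the returned string can be passed to json.loads and the transcript read from its "text" key. With srt or vtt, the string can be written straight to a subtitle file.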

List voices

GET /v1/voices lists the voice ids available for the voice parameter of POST /v1/audio/speech.

cURL

curl -s "https://api.privatemind.com/v1/voices" \
  -H "Authorization: Bearer $PMIND_KEY"

Limits

Transcription accepts uploads up to 75 MB. Larger uploads get an HTTP 413 response.
TTS responses are buffered before being sent. Keep input under a few thousand characters.
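Both limits can be checked on the client before a request is made. The following is a rough sketch under stated assumptions: "75 MB" is read as 75 × 1024 × 1024 bytes (the server may count differently), and MAX_TTS_CHARS = 2000 is a hypothetical stand-in for "a few thousand characters". The function names are illustrative only.

```python
import os
import re

MAX_UPLOAD_BYTES = 75 * 1024 * 1024  # assumption: "75 MB" interpreted as 75 MiB
MAX_TTS_CHARS = 2000                 # hypothetical value for "a few thousand"


def upload_fits(path, limit=MAX_UPLOAD_BYTES):
    """Return True if the file at `path` is within the upload limit."""
    return os.path.getsize(path) <= limit


def split_text(text, limit=MAX_TTS_CHARS):
    """Split text into chunks of at most `limit` characters.

    Chunks break at sentence boundaries where possible. A single
    sentence longer than `limit` is cut at the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-split any sentence that cannot fit in one chunk.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be sent as its own /v1/audio/speech request. Joining the resulting audio files is outside the scope of this sketch and depends on the chosen response_format.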

Where next

Models: list audio-capable models in your org.
Errors: the 413 upload-size response and other status codes.
Rate limits: budget and RPM behaviour on audio calls.