Audio

PrivateMind exposes OpenAI-compatible audio endpoints for synthesis and transcription.

Text to speech

POST /v1/audio/speech synthesises an audio file from text.

cURL
curl -s "https://api.privatemind.com/v1/audio/speech" \
  -H "Authorization: Bearer $PMIND_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "chatterbox-turbo",
    "voice": "default",
    "input": "Hello world."
  }' \
  --output speech.mp3

Request body fields (JSON):

  • model (required): TTS model id. List available models with GET /v1/models.
  • voice (required): voice id. List available voices with GET /v1/voices.
  • input (required): text to synthesise.
  • response_format (default mp3): one of mp3, opus, aac, flac, wav, pcm. Not every engine supports every format.
  • speed (default 1.0): playback speed multiplier.

The response body is the raw audio bytes, not JSON, so write it straight to a file as the example does with --output.
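The same call can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the function names build_speech_request and synthesise are hypothetical, and the 60-second timeout is an arbitrary assumption. The URL, headers, and field names are taken from the cURL example above.

```python
import json
import urllib.request

API_BASE = "https://api.privatemind.com/v1"  # base URL from the cURL example


def build_speech_request(text, api_key, model="chatterbox-turbo",
                         voice="default", response_format="mp3", speed=1.0):
    """Build (but do not send) a POST /v1/audio/speech request."""
    payload = {
        "model": model,
        "voice": voice,
        "input": text,
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        f"{API_BASE}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesise(text, out_path, api_key, **options):
    """Send the request and write the raw audio bytes to out_path."""
    request = build_speech_request(text, api_key, **options)
    # The 60 s timeout is an assumption, not a documented value.
    with urllib.request.urlopen(request, timeout=60) as response:
        audio = response.read()  # raw audio bytes, not JSON
    with open(out_path, "wb") as fh:
        fh.write(audio)
    return len(audio)


# Example (performs a real network call; PMIND_KEY as in the cURL examples):
#   import os
#   synthesise("Hello world.", "speech.mp3", os.environ["PMIND_KEY"])
```

Separating request construction from sending keeps the payload easy to inspect or log before any audio is generated.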

Speech to text

POST /v1/audio/transcriptions takes an audio file and returns text.

cURL
curl -s "https://api.privatemind.com/v1/audio/transcriptions" \
  -H "Authorization: Bearer $PMIND_KEY" \
  -F "file=@call.m4a" \
  -F "model=qwen3-asr-1-7b"

The request body is multipart/form-data. curl -F sets the Content-Type header, including the boundary, automatically, so do not set it by hand.

  • file (required): audio file. Supported formats: mp3, mp4, m4a, wav, webm, flac, ogg, mpga, mpeg.
  • model (required): ASR model id.
  • language: ISO 639-1 code (en, fr, de, ...). Supplying it when the language is known improves accuracy and latency.
  • prompt: priming text, such as names, jargon, or other context.
  • response_format (default json): json, text, srt, verbose_json, vtt.
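Outside curl, the multipart body and its Content-Type header have to be built by the client. The sketch below shows one way to do that with the Python standard library. The helper names encode_multipart and build_transcription_request are hypothetical, and the fallback to application/octet-stream for unknown file types is an assumption; the field names come from the list above.

```python
import mimetypes
import uuid
import urllib.request
from pathlib import Path

API_BASE = "https://api.privatemind.com/v1"  # base URL from the cURL example


def encode_multipart(fields, file_field, filename, file_bytes):
    """Return (body, content_type) for a multipart/form-data request."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text fields: model, language, prompt, response_format.
    for name, value in fields.items():
        parts.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode("utf-8"))
    # The file part carries its own Content-Type (assumed fallback below).
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts.append((
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    ).encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transcription_request(path, api_key, model="qwen3-asr-1-7b",
                                language=None, prompt=None,
                                response_format="json"):
    """Build (but do not send) a POST /v1/audio/transcriptions request."""
    fields = {"model": model, "response_format": response_format}
    if language is not None:
        fields["language"] = language
    if prompt is not None:
        fields["prompt"] = prompt
    audio_path = Path(path)
    body, content_type = encode_multipart(
        fields, "file", audio_path.name, audio_path.read_bytes()
    )
    return urllib.request.Request(
        f"{API_BASE}/audio/transcriptions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": content_type,
        },
        method="POST",
    )
```

The boundary is random per request, which is what curl -F does on your behalf. An HTTP library with built-in multipart support would remove the need for the hand-written encoder.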

Default json response:

JSON
{ "text": "Hello, this is a test transcription." }

verbose_json adds per-segment timestamps and language detection. srt and vtt return ready-to-use subtitle formats.
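Because the response body differs by response_format, a client needs to decode it accordingly. The following is a small sketch of that branching. The function name decode_transcription is hypothetical, and it assumes the text, srt, and vtt bodies are UTF-8 encoded. The verbose_json object is returned as-is because its field names are not listed here.

```python
import json

TEXT_FORMATS = {"text", "srt", "vtt"}      # returned as plain text
JSON_FORMATS = {"json", "verbose_json"}    # returned as a JSON object


def decode_transcription(body, response_format="json"):
    """Decode a transcription response body according to response_format."""
    if response_format in TEXT_FORMATS:
        # Assumption: plain-text formats are UTF-8 encoded.
        return body.decode("utf-8")
    if response_format not in JSON_FORMATS:
        raise ValueError(f"unknown response_format: {response_format!r}")
    data = json.loads(body)
    if response_format == "json":
        return data["text"]  # the single field shown in the default response
    return data  # verbose_json: hand back the whole object unmodified


sample = b'{ "text": "Hello, this is a test transcription." }'
print(decode_transcription(sample))
```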

List voices

GET /v1/voices returns the voice ids accepted by the voice field of POST /v1/audio/speech.

cURL
curl -s "https://api.privatemind.com/v1/voices" \
  -H "Authorization: Bearer $PMIND_KEY"
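A Python equivalent of the request above, as a rough sketch. The function names are hypothetical, and the code assumes only that the endpoint returns JSON; the response schema is not described here, so the parsed value is returned without interpreting it.

```python
import json
import urllib.request

API_BASE = "https://api.privatemind.com/v1"  # base URL from the cURL example


def build_voices_request(api_key):
    """Build (but do not send) a GET /v1/voices request."""
    return urllib.request.Request(
        f"{API_BASE}/voices",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def list_voices(api_key):
    """Fetch the voice list. Assumes a JSON body; the schema is not assumed."""
    with urllib.request.urlopen(build_voices_request(api_key), timeout=30) as response:
        return json.loads(response.read())


# Example (performs a real network call; PMIND_KEY as in the cURL examples):
#   import os
#   print(list_voices(os.environ["PMIND_KEY"]))
```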

Limits

  • Transcription accepts uploads up to 75 MB. Larger uploads are rejected with HTTP 413.
  • TTS responses are buffered in full before being sent, so no audio arrives until synthesis finishes. Keep input under a few thousand characters.
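Both limits can be checked on the client before a request is sent. The sketch below is illustrative only. It reads 75 MB as 75,000,000 bytes, the more conservative interpretation, and picks 2000 characters as a stand-in for "a few thousand". Both constants and both function names are assumptions.

```python
import os
import re

MAX_UPLOAD_BYTES = 75 * 1000 * 1000  # assumption: decimal MB, the stricter reading
MAX_TTS_CHARS = 2000                 # assumption: stand-in for "a few thousand"


def upload_fits(path):
    """Return True if the file is small enough to avoid a 413."""
    return os.path.getsize(path) <= MAX_UPLOAD_BYTES


def split_for_tts(text, max_chars=MAX_TTS_CHARS):
    """Split text into chunks of at most max_chars, preferring sentence ends.

    Whitespace between sentences is normalised to a single space.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at fixed offsets.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as a separate POST /v1/audio/speech request and the resulting audio files played or joined in order.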

Where next

  • Models to list audio-capable models in your org.
  • Errors for the 413 upload-size response and other status codes.
  • Rate limits for budget and RPM behaviour on audio calls.