PrivateMind exposes OpenAI-compatible audio endpoints for synthesis and transcription.
Text to speech
POST /v1/audio/speech synthesises an audio file from text.
cURL
curl -s "https://api.privatemind.com/v1/audio/speech" \
-H "Authorization: Bearer $PMIND_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "chatterbox-turbo",
"voice": "default",
"input": "Hello world."
}' \
--output speech.mp3
- model (required): TTS model id. List with GET /v1/models.
- voice (required): voice id. List available voices with GET /v1/voices.
- input (required): text to synthesise.
- response_format (default mp3): mp3, opus, aac, flac, wav, pcm. Engine support varies.
- speed (default 1.0): playback speed multiplier.
The response body is the raw audio bytes.
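The request above can be wrapped in a small bash function that also sets the optional response_format and speed fields. This is a sketch, not PrivateMind tooling. pm_tts and json_escape are hypothetical names, and the function assumes bash, curl, and an API key in PMIND_KEY. json_escape only escapes backslashes, double quotes, newlines, carriage returns, and tabs. That is enough for ordinary prose but not for arbitrary control characters.

```bash
#!/usr/bin/env bash
# Hypothetical helper: escape a string for use inside a JSON string literal.
# Only handles backslash, double quote, newline, carriage return and tab.
json_escape() {
  local s=$1
  s=${s//'\'/'\\'}     # backslash -> \\
  s=${s//'"'/'\"'}     # double quote -> \"
  s=${s//$'\n'/'\n'}   # newline -> \n
  s=${s//$'\r'/'\r'}   # carriage return -> \r
  s=${s//$'\t'/'\t'}   # tab -> \t
  printf '%s' "$s"
}

# Hypothetical wrapper around POST /v1/audio/speech.
# Usage: pm_tts TEXT OUTFILE [FORMAT] [SPEED]
# model and voice are the values from the example request above.
pm_tts() {
  local text=$1 out=$2 format=${3:-mp3} speed=${4:-1.0}
  local body
  # speed is inserted unquoted, so it must be a valid JSON number
  body=$(printf '{"model":"chatterbox-turbo","voice":"default","input":"%s","response_format":"%s","speed":%s}' \
    "$(json_escape "$text")" "$format" "$speed")
  # --fail makes curl exit non-zero on HTTP errors instead of saving
  # the error body as if it were audio
  curl -s --fail "https://api.privatemind.com/v1/audio/speech" \
    -H "Authorization: Bearer $PMIND_KEY" \
    -H "Content-Type: application/json" \
    -d "$body" \
    --output "$out"
}
```

For example, pm_tts "Hello world." hello.wav wav 1.25 requests a WAV file at 1.25x speed. This assumes the chosen engine supports wav. Check the exit status before using the output file.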
Speech to text
POST /v1/audio/transcriptions takes an audio file and returns text.
cURL
curl -s "https://api.privatemind.com/v1/audio/transcriptions" \
-H "Authorization: Bearer $PMIND_KEY" \
-F "file=@call.m4a" \
-F "model=qwen3-asr-1-7b"
The request is multipart/form-data. curl -F sets the Content-Type header, including the boundary, automatically.
- file (required): audio file. Supported formats: mp3, mp4, m4a, wav, webm, flac, ogg, mpga, mpeg.
- model (required): ASR model id.
- language: ISO-639-1 code (en, fr, de, ...). Set it when the language is known: this improves accuracy and latency.
- prompt: priming text, such as names, jargon, or context.
- response_format (default json): json, text, srt, verbose_json, vtt.
Default json response:
JSON
{ "text": "Hello, this is a test transcription." }
verbose_json adds per-segment timestamps and language detection. srt and vtt return ready-to-use subtitle formats.
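The optional fields can be combined in a single request, for example to get subtitles directly. The sketch below passes language and prompt only when they are given. pm_transcribe is a hypothetical name, and the function assumes bash, curl, and a key in PMIND_KEY. The prompt is sent with curl's --form-string, which uses the value literally. A prompt that starts with @ or <, or that contains ;type=, is therefore not interpreted as a file reference or content-type directive.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around POST /v1/audio/transcriptions; prints the response body.
# Usage: pm_transcribe FILE FORMAT [LANGUAGE] [PROMPT]
pm_transcribe() {
  local file=$1 format=$2 language=${3:-} prompt=${4:-}
  local args=(
    -s --fail "https://api.privatemind.com/v1/audio/transcriptions"
    -H "Authorization: Bearer $PMIND_KEY"
    -F "file=@$file"
    -F "model=qwen3-asr-1-7b"
    -F "response_format=$format"
  )
  # Optional fields are only sent when a value is given
  if [ -n "$language" ]; then
    args+=(-F "language=$language")
  fi
  if [ -n "$prompt" ]; then
    # --form-string sends the value literally (no @file, <file or ;type= parsing)
    args+=(--form-string "prompt=$prompt")
  fi
  curl "${args[@]}"
}
```

For example, pm_transcribe call.m4a srt en "Acme, Kubernetes" > call.srt writes SubRip subtitles for an English recording.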
List voices
GET /v1/voices returns the voice ids accepted by the voice parameter of POST /v1/audio/speech.
cURL
curl -s "https://api.privatemind.com/v1/voices" \
-H "Authorization: Bearer $PMIND_KEY"
Limits
- Transcription accepts uploads up to 75 MB. Larger uploads are rejected with 413.
- TTS responses are buffered in full before being sent, so long input delays the response. Keep input under a few thousand characters.
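Both limits can be checked on the client before a request is sent. The sketch below uses two hypothetical helpers, pm_check_upload and pm_check_tts_input. It makes two assumptions, which are also labelled in the code. First, 75 MB is read as 75,000,000 bytes, the stricter of the decimal and binary readings. Second, the 3000-character TTS cap is an illustrative stand-in for "a few thousand characters", not a documented limit. The server remains the authority, so a 413 can still come back.

```bash
#!/usr/bin/env bash
# Assumption: "75 MB" read as 75,000,000 bytes (stricter than 75 MiB).
PM_MAX_UPLOAD_BYTES=$((75 * 1000 * 1000))
# Hypothetical soft cap; the docs only say "a few thousand characters".
PM_MAX_TTS_CHARS=3000

# Fails (message on stderr) if FILE is unreadable or larger than the upload cap.
pm_check_upload() {
  local file=$1 size
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 1
  fi
  size=$(wc -c < "$file")
  size=${size//[[:space:]]/}   # some wc implementations pad the count with spaces
  if [ "$size" -gt "$PM_MAX_UPLOAD_BYTES" ]; then
    echo "$file is $size bytes; it likely exceeds the 75 MB upload limit" >&2
    return 1
  fi
}

# Fails if TEXT is longer than the soft TTS cap (length counted in characters).
pm_check_tts_input() {
  local text=$1
  if [ "${#text}" -gt "$PM_MAX_TTS_CHARS" ]; then
    echo "input is ${#text} characters; consider splitting it across requests" >&2
    return 1
  fi
}
```

Run pm_check_upload call.m4a before the transcription request and skip the upload if it fails.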
Where next
- Models to list audio-capable models in your org.
- Errors for the 413 upload-size response and other status codes.
- Rate limits for budget and RPM behaviour on audio calls.