API Reference

Base URL

The server listens on http://<host>:<port>.

POST /generate

Generate speech from text (and optional input audio) and return a WAV response. Parameters are sent as multipart form fields.

text (string, required): Input prompt.
audio (file, optional): Input audio for STS-capable models.
streaming (bool, optional, default true):
  - true: streams WAV chunks as they are produced.
  - false: returns a single WAV file after completion.

curl -X POST "http://localhost:8000/generate" \
  -F "text=Hello world" \
  -F "streaming=true" \
  -o output.wav

curl -X POST "http://localhost:8000/generate" \
  -F "text=Hello world" \
  -F "audio=@input.wav" \
  -F "streaming=true" \
  -o output.wav
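
The same requests can be issued from Python. The following is a minimal sketch using only the standard library, not an official client. The helper names `build_generate_request` and `save_wav` are hypothetical, the `input.wav` filename and `audio/wav` content type are illustrative, and sending `streaming` as the literal strings `true`/`false` is an assumption based on the curl examples above.

```python
import uuid
import urllib.request
from typing import BinaryIO, Optional


def build_generate_request(base_url: str, text: str,
                           audio: Optional[bytes] = None,
                           streaming: bool = True) -> urllib.request.Request:
    """Build a multipart/form-data POST for /generate (hypothetical helper)."""
    boundary = uuid.uuid4().hex
    parts = []

    def add_field(name: str, value: str) -> None:
        # One plain form field, equivalent to curl's -F "name=value".
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )

    add_field("text", text)
    # Assumption: the server accepts "true"/"false" as in the curl examples.
    add_field("streaming", "true" if streaming else "false")
    if audio is not None:
        # Optional file part, equivalent to curl's -F "audio=@input.wav".
        parts.append(
            f'--{boundary}\r\n'
            'Content-Disposition: form-data; name="audio"; filename="input.wav"\r\n'
            'Content-Type: audio/wav\r\n\r\n'.encode("utf-8") + audio + b"\r\n"
        )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        base_url.rstrip("/") + "/generate",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def save_wav(response: BinaryIO, out: BinaryIO, chunk_size: int = 4096) -> int:
    """Copy the response body to `out` chunk by chunk; return bytes written."""
    total = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:
            break
        out.write(chunk)
        total += len(chunk)
    return total
```

To run it against a live server, pass the request to `urllib.request.urlopen(req)` and hand the resulting response object, together with a file opened in `"wb"` mode, to `save_wav`. Reading in chunks lets the file grow as audio arrives when `streaming` is true, and works unchanged when the server returns a single WAV file.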

Health check endpoint.

Response JSON payload:

{"status": "healthy"}
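
A client can poll this endpoint before sending generation requests. The sketch below shows one way to do that with the Python standard library. The function names `is_healthy` and `wait_until_healthy` and the timeout defaults are hypothetical, and because this section does not state the endpoint's path, the caller supplies the full `health_url`.

```python
import json
import time
import urllib.error
import urllib.request


def is_healthy(payload: bytes) -> bool:
    """Return True if the payload is a JSON object with status "healthy"."""
    try:
        data = json.loads(payload)
    except ValueError:
        # Covers both invalid JSON and undecodable bytes.
        return False
    return isinstance(data, dict) and data.get("status") == "healthy"


def wait_until_healthy(health_url: str,
                       timeout_s: float = 60.0,
                       interval_s: float = 1.0) -> bool:
    """Poll `health_url` until it reports healthy or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(health_url, timeout=5) as resp:
                if is_healthy(resp.read()):
                    return True
        except (urllib.error.URLError, OSError):
            # Server not reachable yet (or returned an error); retry.
            pass
        time.sleep(interval_s)
    return False
```

Keeping the payload check separate from the polling loop makes the parsing logic easy to test without a running server.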

VoxServe: A serving system for Speech Language Models.