Text-to-Speech (TTS) Service User Guide
Welcome! This guide will help you quickly get started with the Language Server Text-to-Speech (TTS) API. You'll learn how to convert text into speech using a simple API call, understand the required parameters, and see example requests and responses.
Supported Providers: Azure, Google, Sarvam, and Karya Local (open-source Indic-Parler)
Quickstart
- Get your API key from your admin or dashboard.
- Choose your provider (Azure, Google, Sarvam, or Karya Local).
- Prepare your text and language code (see supported languages below).
- Send a POST request to the TTS endpoint with the required headers and form data.
- Download your audio from the response URLs.
1. Request Headers
Include these headers in every request:
Accept: application/json
X-API-Key: YOUR_API_KEY # Replace with your actual API key
Content-Type: multipart/form-data
2. Request Body: Form Data
Send the request body as multipart/form-data with a field named payload_data (see below for structure).
Synchronous Only
Text-to-Speech is only available as a synchronous (real-time) service. Unlike transcription and completion, there is no batch/async TTS endpoint — results are returned directly in the API response.
3. Endpoint
- URL: https://languageserver.karya.in/v1/task/text-to-speech
- Method: POST
4. Example Request (cURL)
curl -X POST 'https://languageserver.karya.in/v1/task/text-to-speech' \
  -H 'accept: application/json' \
  -H 'X-API-Key: YOUR_API_KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'payload_data={
    "user_email": "your_email@example.com",
    "project_name": "Individual",
    "provider": "AZURE",
    "models": ["NEURAL"],
    "voice_gender": "FEMALE",
    "task_name": "my_first_tts_task_unique_001",
    "datapoints": {
      "datapoint 1": {"text": "नमस्ते, यह एक परीक्षण है।", "speed": "1.0x", "lang": "hi-IN"}
    }
  }'
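For callers who prefer Python over cURL, the same request can be sketched with the standard library alone. This is a minimal illustration, not an official client: the helper names `build_tts_request` and `send` are hypothetical, and the hand-built multipart body assumes the server accepts `payload_data` as a plain form field, exactly as the cURL `-F` flag sends it.

```python
import json
import urllib.request
import uuid

TTS_URL = "https://languageserver.karya.in/v1/task/text-to-speech"


def build_tts_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST carrying payload_data."""
    boundary = uuid.uuid4().hex
    # ensure_ascii=False keeps Indic text as-is; the whole body is UTF-8 encoded.
    payload_json = json.dumps(payload, ensure_ascii=False)
    body = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="payload_data"\r\n'
        "\r\n"
        f"{payload_json}\r\n"
        f"--{boundary}--\r\n"
    ).encode("utf-8")
    headers = {
        "Accept": "application/json",
        "X-API-Key": api_key,
        # The boundary must be declared so the server can split the form parts.
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(TTS_URL, data=body, headers=headers, method="POST")


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it possible to inspect the headers and body before any network call is made.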
5. payload_data Structure
This is a JSON string containing your task and datapoints. Example:
{
  "user_email": "your_email@example.com",
  "project_name": "your_project_name",
  "task_name": "my_first_tts_task_001",
  "provider": "AZURE",
  "models": ["NEURAL"],
  "voice_gender": "FEMALE",
  "datapoints": {
    "datapoint 1": {
      "text": "नमस्ते, यह एक परीक्षण है।",
      "speed": "1.0x",
      "lang": "hi-IN"
    },
    "datapoint 2": {
      "text": "This is a test for English audio generation.",
      "speed": "1.2x",
      "lang": "en-IN"
    }
  }
}
Important: The models field is mandatory. It must be a non-empty list of model names (e.g., ["NEURAL"]). Omitting this field or leaving it empty will result in an error.
You can add as many datapoints as you need.
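When the datapoints come from a list rather than being typed by hand, the JSON string can be assembled programmatically. The sketch below is one possible approach; the function name `build_payload` and the `(text, speed, lang)` tuple layout are illustrative choices, while the field names and the "datapoint N" key pattern follow the example above.

```python
import json


def build_payload(user_email, project_name, provider, models, items,
                  voice_gender="FEMALE", task_name=None):
    """Assemble the payload_data JSON string from (text, speed, lang) tuples."""
    # Keys only need to be unique; "datapoint N" mirrors the documented example.
    datapoints = {
        f"datapoint {index}": {"text": text, "speed": speed, "lang": lang}
        for index, (text, speed, lang) in enumerate(items, start=1)
    }
    payload = {
        "user_email": user_email,
        "project_name": project_name,
        "provider": provider,
        "models": list(models),
        "voice_gender": voice_gender,
        "datapoints": datapoints,
    }
    # task_name is optional, so it is only included when supplied.
    if task_name is not None:
        payload["task_name"] = task_name
    return json.dumps(payload, ensure_ascii=False)


payload_data = build_payload(
    "your_email@example.com", "Individual", "AZURE", ["NEURAL"],
    [("नमस्ते, यह एक परीक्षण है।", "1.0x", "hi-IN"),
     ("This is a test for English audio generation.", "1.2x", "en-IN")],
)
```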
Required Fields in payload_data
| Field | Description | Type |
|---|---|---|
| user_email | Your registered email address | string |
| project_name | Name of your project | string |
| provider | Backend provider (see table below) | string |
| models | Required. List of model names to use (e.g., ["NEURAL"]). | list of strings |
| voice_gender | Optional. Voice gender: "MALE" or "FEMALE". Defaults to "FEMALE" if not provided. | string |
| datapoints | Dictionary of datapoint objects. Each key (e.g., "datapoint 1") is a unique identifier. | dict |
Optional Fields in payload_data
| Field | Description | Type | Default |
|---|---|---|---|
| task_name | Optional label for the task (does not need to be unique) | string | Auto-generated |
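Because a malformed payload is rejected by the server, it can help to check the top-level fields locally before sending. The following is a rough client-side sketch based only on the field tables above; `validate_payload` is a hypothetical helper and the server may apply additional checks that are not documented here.

```python
REQUIRED_FIELDS = ("user_email", "project_name", "provider", "models", "datapoints")


def validate_payload(payload: dict) -> list:
    """Return a list of problems found; an empty list means none were detected."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS
                if name not in payload]

    # models must be present AND a non-empty list.
    if "models" in payload:
        models = payload["models"]
        if not isinstance(models, list) or not models:
            problems.append("models must be a non-empty list")

    # voice_gender is optional; when present it must be one of two values.
    if payload.get("voice_gender", "FEMALE") not in ("MALE", "FEMALE"):
        problems.append('voice_gender must be "MALE" or "FEMALE"')

    if "datapoints" in payload and not isinstance(payload["datapoints"], dict):
        problems.append("datapoints must be a dict")

    return problems
```

Returning a list instead of raising on the first problem lets a caller report every issue at once.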
Provider & Model Mapping
| Provider | provider value | Required models value | Notes |
|---|---|---|---|
| Azure | "AZURE" | ["NEURAL"] | Commercial. Only the model name NEURAL is supported. |
| Google | "GOOGLE" | ["WAVENET"] or ["STANDARD"] | Commercial. Use ["WAVENET"] for premium, ["STANDARD"] for basic. |
| Sarvam | "SARVAM" | ["BULBULV2"] | Commercial. Only BULBULV2 is supported. |
| Karya Local (Indic-Parler) | "KARYA_LOCAL" | ["INDIC_PARLER"] | Open Source. Requires a separately deployed Indic-Parler TTS service. Contact the server administrator to confirm availability. |
Tip: For personal use, set project_name to "Individual".
Available Models for Each Provider
Azure
Set models to ["NEURAL"] (the only supported model for Azure TTS).
Google
Set models to either ["WAVENET"] (premium) or ["STANDARD"] (basic).
Google WAVENET Fallback: If WAVENET is not available for a specific language (e.g., Telugu), the server automatically falls back to the STANDARD model.
Sarvam
Set models to ["BULBULV2"] (the only supported model).
Karya Local (Indic-Parler)
Set models to ["INDIC_PARLER"] (the only supported model).
Warning: Leaving models as ["default"] or omitting it will not work. You must specify the correct model for your provider as shown above.
Note: If a model exists but is currently unserviceable (e.g., due to downtime or errors), the system will fall back to the most basic model for that provider.
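The provider-to-model rules above can be captured in a small lookup so that a mismatched pair is caught before the request is sent. This is an illustrative sketch: the mapping is transcribed from this guide, and `check_models` is a hypothetical helper rather than part of the service.

```python
# Model names accepted per provider, as listed in this guide.
ALLOWED_MODELS = {
    "AZURE": {"NEURAL"},
    "GOOGLE": {"WAVENET", "STANDARD"},
    "SARVAM": {"BULBULV2"},
    "KARYA_LOCAL": {"INDIC_PARLER"},
}


def check_models(provider: str, models: list) -> None:
    """Raise ValueError if the provider is unknown or a model is not listed for it."""
    if provider not in ALLOWED_MODELS:
        raise ValueError(f"unknown provider: {provider!r}")
    if not models:
        raise ValueError("models must be a non-empty list")
    unsupported = [name for name in models if name not in ALLOWED_MODELS[provider]]
    if unsupported:
        raise ValueError(f"{provider} does not support: {unsupported}")


check_models("GOOGLE", ["WAVENET"])  # passes silently
```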
Each datapoint must include:
| Field | Description | Type |
|---|---|---|
| text | Text to convert to speech | string |
| speed | Playback speed as a string ending with "x" (e.g., "1.0x", "1.2x"). Valid ranges differ by provider; see the table below. | string |
| lang | Language code (see below) | string |
Speed Ranges by Provider
| Provider | Speed Range | Notes |
|---|---|---|
| Azure | Any positive value | Interpreted as percentage adjustment |
| Google | 0.25x – 4.0x | Values outside the range are silently clamped |
| Sarvam | 0.5x – 2.0x | Values outside the range are silently clamped |
| Karya Local | Any positive value | Converted to "slow" (< 0.9x), "moderate" (0.9x–1.1x), or "fast" (> 1.1x) |
Note on Voice Gender: The voice_gender field is specified at the task level (not per datapoint) and applies to all datapoints in the request. If not provided, it defaults to "FEMALE".
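To make the per-provider speed handling concrete, the sketch below mimics it on the client side. It is an approximation for illustration only: the accepted string format (a positive decimal followed by "x") is an assumption drawn from the examples, the function names are hypothetical, and the Azure percentage conversion is left to the server because this guide does not specify it.

```python
import re

# Assumed format: a positive decimal number followed by a lowercase "x".
SPEED_PATTERN = re.compile(r"^(\d+(?:\.\d+)?)x$")
CLAMP_RANGES = {"GOOGLE": (0.25, 4.0), "SARVAM": (0.5, 2.0)}


def parse_speed(speed: str) -> float:
    """Convert a string such as "1.2x" to 1.2, rejecting malformed values."""
    match = SPEED_PATTERN.match(speed)
    if match is None or float(match.group(1)) <= 0:
        raise ValueError(f"invalid speed: {speed!r}")
    return float(match.group(1))


def effective_speed(provider: str, speed: str):
    """Approximate what each provider does with the requested speed."""
    value = parse_speed(speed)
    if provider in CLAMP_RANGES:
        low, high = CLAMP_RANGES[provider]
        return min(max(value, low), high)  # out-of-range values are clamped
    if provider == "KARYA_LOCAL":
        if value < 0.9:
            return "slow"
        if value > 1.1:
            return "fast"
        return "moderate"
    return value  # Azure: any positive value; conversion happens server-side
```

Since Google and Sarvam clamp silently, comparing the requested value with `effective_speed` is one way to notice that a request asked for a speed the provider will not honor.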
About the models Field
- The models field is required for all TTS requests.
- It must be a non-empty list of model names. Valid names depend on the provider (for example, ["NEURAL"] for Azure).
- If you omit this field or provide an empty list, your request will be rejected.
- See the Provider & Model Mapping table above for the model names each provider accepts.
About the voice_gender Field
- The voice_gender field is optional and can be set to "MALE" or "FEMALE".
- If not provided, it defaults to "FEMALE".
- This is a task-level setting that applies to all datapoints in the request.
- Voice availability varies by provider and language. Some provider/language pairs currently expose only one configured gender.
Supported Languages
| Language | Code | Supported By |
|---|---|---|
| Assamese | as-IN | Azure, Sarvam, Karya Local |
| Bengali | bn-IN | All providers |
| Bodo | brx-IN | Karya Local only |
| Chhattisgarhi | hne-IN | Karya Local only |
| Dogri | doi-IN | Karya Local only |
| English (India) | en-IN | All providers |
| Gujarati | gu-IN | All providers |
| Hindi | hi-IN | All providers |
| Kannada | kn-IN | All providers |
| Malayalam | ml-IN | All providers |
| Manipuri | mni-IN | Karya Local only |
| Marathi | mr-IN | All providers |
| Nepali | ne-IN | Karya Local only |
| Odia | or-IN | Azure, Sarvam, Karya Local |
| Punjabi | pa-IN | All providers |
| Sanskrit | sa-IN | Karya Local only |
| Tamil | ta-IN | All providers |
| Telugu | te-IN | All providers* |
| Urdu | ur-IN | Azure, Google |
*Telugu uses STANDARD model only for Google (WAVENET not available; falls back automatically)
Provider Language Support
| Language | Azure | Google | Sarvam | Karya Local |
|---|---|---|---|---|
| Assamese (as-IN) | ✅ | ❌ | ✅ | ✅ |
| Bengali (bn-IN) | ✅ | ✅ | ✅ | ✅ |
| Bodo (brx-IN) | ❌ | ❌ | ❌ | ✅ |
| Chhattisgarhi (hne-IN) | ❌ | ❌ | ❌ | ✅ |
| Dogri (doi-IN) | ❌ | ❌ | ❌ | ✅ |
| English (en-IN) | ✅ | ✅ | ✅ | ✅ |
| Gujarati (gu-IN) | ✅ | ✅ | ✅ | ✅ |
| Hindi (hi-IN) | ✅ | ✅ | ✅ | ✅ |
| Kannada (kn-IN) | ✅ | ✅ | ✅ | ✅ |
| Malayalam (ml-IN) | ✅ | ✅ | ✅ | ✅ |
| Manipuri (mni-IN) | ❌ | ❌ | ❌ | ✅ |
| Marathi (mr-IN) | ✅ | ✅ | ✅ | ✅ |
| Nepali (ne-IN) | ❌ | ❌ | ❌ | ✅ |
| Odia (or-IN) | ✅ | ❌ | ✅ | ✅ |
| Punjabi (pa-IN) | ✅ | ✅ | ✅ | ✅ |
| Sanskrit (sa-IN) | ❌ | ❌ | ❌ | ✅ |
| Tamil (ta-IN) | ✅ | ✅ | ✅ | ✅ |
| Telugu (te-IN) | ✅ | ✅* | ✅ | ✅ |
| Urdu (ur-IN) | ✅ | ✅ | ❌ | ❌ |
*Telugu uses STANDARD model only for Google (WAVENET not available; falls back automatically)
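The language tables above can also be expressed as a lookup for checking datapoints before a request is sent. This is a sketch transcribed from this guide, so it may fall out of date as providers change; `unsupported_datapoints` is a hypothetical helper name.

```python
ALL = frozenset({"AZURE", "GOOGLE", "SARVAM", "KARYA_LOCAL"})
LOCAL_ONLY = frozenset({"KARYA_LOCAL"})
NO_GOOGLE = ALL - {"GOOGLE"}

# Language code -> providers that support it, per the tables in this guide.
LANGUAGE_SUPPORT = {
    "as-IN": NO_GOOGLE, "bn-IN": ALL, "brx-IN": LOCAL_ONLY, "hne-IN": LOCAL_ONLY,
    "doi-IN": LOCAL_ONLY, "en-IN": ALL, "gu-IN": ALL, "hi-IN": ALL,
    "kn-IN": ALL, "ml-IN": ALL, "mni-IN": LOCAL_ONLY, "mr-IN": ALL,
    "ne-IN": LOCAL_ONLY, "or-IN": NO_GOOGLE, "pa-IN": ALL, "sa-IN": LOCAL_ONLY,
    "ta-IN": ALL, "te-IN": ALL, "ur-IN": frozenset({"AZURE", "GOOGLE"}),
}


def unsupported_datapoints(provider: str, datapoints: dict) -> list:
    """Return the keys of datapoints whose lang is not listed for the provider."""
    return [
        key for key, datapoint in datapoints.items()
        if provider not in LANGUAGE_SUPPORT.get(datapoint["lang"], frozenset())
    ]
```

Unknown language codes are treated as unsupported, which errs on the side of flagging a datapoint rather than letting it through.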
6. Example Response
On success, you'll get a JSON response with audio URLs:
{
  "task_id": "your_task_id",
  "status": "COMPLETED",
  "result": {
    "user_email": "your_email@example.com",
    "project_name": "your_project_name",
    "task_name": "my_first_tts_task_001",
    "provider": "AZURE",
    "models": ["NEURAL"],
    "voice_gender": "FEMALE",
    "datapoints": {
      "datapoint 1": {
        "text": "नमस्ते, यह एक परीक्षण है।",
        "speed": "1.0x",
        "lang": "hi-IN",
        "saas_url": "https://<storage_account_name>.blob.core.windows.net/audio-output1/..."
      }
    }
  }
}
The outer task_id and status fields are always present. For a successful request, status is "COMPLETED"; a failed synthesis returns HTTP 500 instead.
Field Descriptions:
| Field | Description |
|---|---|
| task_id | Unique task identifier |
| status | Task status (COMPLETED) |
| saas_url | Temporary download link for the audio file |
Warning: Download your audio files within 24 hours. Links expire after that.
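Since the links are temporary, a client will usually want to pull the URLs out of the response and save the audio straight away. The sketch below assumes the response has the shape shown in the example above; `extract_audio_urls` and `download` are hypothetical helpers, and the destination path (including its file extension) is left to the caller because the audio format is not stated here.

```python
import shutil
import urllib.request


def extract_audio_urls(response: dict) -> dict:
    """Map each datapoint key to its saas_url from a COMPLETED response."""
    if response.get("status") != "COMPLETED":
        raise ValueError(f"unexpected status: {response.get('status')!r}")
    datapoints = response["result"]["datapoints"]
    return {key: datapoint["saas_url"] for key, datapoint in datapoints.items()}


def download(url: str, destination: str) -> None:
    """Stream the audio at url into the file at destination."""
    with urllib.request.urlopen(url) as source, open(destination, "wb") as target:
        shutil.copyfileobj(source, target)
```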
All-or-Nothing Processing
If any datapoint fails to synthesize (e.g., unsupported language for the chosen provider), the entire task fails with HTTP 500. There is no partial-success behavior — all datapoints must succeed for the request to return successfully.
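One way to limit the impact of all-or-nothing processing is to send datapoints in smaller groups, so that a single bad datapoint only fails its own request. This is a client-side workaround sketched under assumptions, not a documented feature: `split_datapoints` is a hypothetical helper, and whether the service applies rate limits to the extra requests is not covered in this guide.

```python
from itertools import islice


def split_datapoints(datapoints: dict, batch_size: int) -> list:
    """Split a datapoints dict into smaller dicts of at most batch_size entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    items = iter(datapoints.items())
    batches = []
    while True:
        # islice consumes the next batch_size items from the shared iterator.
        batch = dict(islice(items, batch_size))
        if not batch:
            break
        batches.append(batch)
    return batches
```

Each returned dict can be used as the datapoints value of its own request, keeping the original keys so results can be matched back to their inputs.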
7. Response & Error Codes
| Code | Description |
|---|---|
| 200 | Request successful |
| 400 | Invalid or missing parameters (invalid provider, model, speed format, missing required fields) |
| 500 | Internal server error (TTS synthesis failure, provider API error) |
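A client can translate these codes into distinct exceptions so that callers can tell a request they need to fix (400) from a synthesis or provider failure (500). The sketch below uses only the standard library; the exception classes and function names are hypothetical, and the format of the error body is not documented here, so it is passed through as raw text.

```python
import urllib.error
import urllib.request


class TTSRequestError(Exception):
    """HTTP 400: invalid or missing parameters; fix the payload."""


class TTSServerError(Exception):
    """HTTP 500: synthesis failure or provider API error."""


def error_for_status(code: int, detail: str = "") -> Exception:
    """Map a non-200 status code to an exception instance."""
    if code == 400:
        return TTSRequestError(detail or "invalid or missing parameters")
    if code == 500:
        return TTSServerError(detail or "internal server error")
    return RuntimeError(f"unexpected HTTP status {code}: {detail}")


def send_checked(request: urllib.request.Request) -> bytes:
    """Send the request; raise a typed exception for non-success responses."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as exc:
        detail = exc.read().decode("utf-8", errors="replace")
        raise error_for_status(exc.code, detail) from exc
```

Note that a 500 can be caused by the request itself (for example, an unsupported language for the chosen provider), so retrying the same payload unchanged may not help.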