Transcription Service Guide¶
This guide explains how to use the Language Server Transcription API for both batch (asynchronous) and single (synchronous) audio transcription tasks. It covers endpoints, request formats, and all supported payload parameters.
Request Headers¶
Ensure your requests include the following headers:
Accept: application/json
X-API-Key: YOUR_API_KEY # (Replace YOUR_API_KEY with your actual API key)
Content-Type: multipart/form-data
Request Body: Form Data Fields¶
The request body should be sent as multipart/form-data and must contain the following fields:
payload_data (JSON string)¶
This field contains essential information about your transcription task.
Example payload_data:
{
"user_email": "your_email@example.com",
"project_name": "Individual",
"dataset_ids": ["id1", "id2"],
"language_identification": false,
"locale": "hi-IN",
"provider": "AZURE",
"models": ["long"],
"wordLevelTimeStamp": false,
"diarization": false
}
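For programmatic clients, the payload_data form field can be assembled with a standard JSON library. A minimal Python sketch; build_payload_data is an illustrative helper, not part of any official SDK:

```python
import json

# Hypothetical helper: serialize the payload_data form field as a JSON
# string. Field names match the documented payload; values are placeholders.
def build_payload_data(user_email, project_name, dataset_ids=None, **options):
    payload = {"user_email": user_email, "project_name": project_name}
    if dataset_ids is not None:  # required for batch, omitted for single
        payload["dataset_ids"] = dataset_ids
    payload.update(options)  # optional fields: locale, provider, models, etc.
    return json.dumps(payload)

payload_data = build_payload_data(
    "your_email@example.com", "Individual",
    dataset_ids=["id1", "id2"],
    locale="hi-IN", provider="AZURE", models=["long"],
)
print(payload_data)
```

The resulting string is what you would pass as the `payload_data` part of the multipart/form-data body.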
Mandatory Fields¶
| Field | Description |
|---|---|
| `user_email` | Your user email address |
| `project_name` | Name of your project |
| `dataset_ids` | List of dataset IDs (batch transcription only) |
Optional Fields¶
| Field | Type | Description | Default |
|---|---|---|---|
| `models` | array | Model(s) to use for transcription | `long` for Google, `base` for Azure |
| `locale` | string | Language in which you want the transcription | `en-US` |
| `provider` | string | `AZURE` or `GOOGLE` | `AZURE` |
| `language_identification` | boolean | Automatic language detection of the audio | `false` |
| `wordLevelTimeStamp` | boolean | Include timing information for each word | `false` |
| `diarization` | boolean | Enable speaker identification | `false` |
Single Transcription: File Upload¶
files (file upload): This field is used to upload one or more audio files for transcription.
- File Format: All files must be audio files (.mp3, .wav, .flac, etc.)
Note
- For single transcription, users can upload one or more audio files directly with the request, as long as the total size of all files is less than 25 MB. For batch transcription, users must first upload their audio files as a dataset using the provided script, then include the corresponding dataset_id(s) in the payload when making the API request.
- For Sarvam ASR, users can upload 1–20 audio files per request. Uploading more than 20 files may cause the task to fail.
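The upload limits above can be checked client-side before submitting. A minimal sketch; `preflight`, its parameters, and its return strings are illustrative, not part of the API:

```python
# Minimal client-side pre-flight check for the documented upload limits.
# `preflight` is a hypothetical helper, not part of the Transcription API.
MAX_TOTAL_BYTES = 25 * 1024 * 1024   # single transcription: total upload < 25 MB
SARVAM_MAX_FILES = 20                # Sarvam ASR: 1-20 files per request

def preflight(file_sizes, provider):
    """file_sizes: sizes of the files to upload, in bytes."""
    if sum(file_sizes) >= MAX_TOTAL_BYTES:
        return "total size must be under 25 MB"
    if provider == "SARVAM" and not (1 <= len(file_sizes) <= SARVAM_MAX_FILES):
        return "Sarvam accepts 1-20 files per request"
    return "ok"

print(preflight([10 * 1024 * 1024], "GOOGLE"))  # ok
```

Running such a check locally avoids a round trip that would fail server-side.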
1. Batch (Async) Transcription¶
Info
Batch transcription is used for processing multiple audio files or large files asynchronously. The API will return a job ID, and you can poll for results.
Endpoint Details¶
- Endpoint: https://language-server-url/v1/batch/transcription
- Method: POST
Example cURL Command¶
Here's an example of how to make a request using curl:
curl -X 'POST' 'https://language-server-url/v1/batch/transcription' \
  -H 'accept: application/json' \
  -H 'X-API-Key: YOUR_API_KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'payload_data={"user_email":"your_email@example.com","project_name":"Individual","dataset_ids":["d1","d2"],"models":["long"],"locale":"hi-IN","provider":"GOOGLE"}'
Note
Replace YOUR_API_KEY with your actual API key and your_email@example.com with your email.
Example Response Body¶
Upon successful submission, the API will return a JSON object providing details about the submitted task:
{
"task_id": "c64baaa1c99d42f589ee184369aa6843",
"dataset_ids": ["4dd4e846bc014a988cbc49bccd217e96"],
"status": "PENDING"
}
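The submission response can be parsed and its status inspected before polling. A short Python sketch using the example response above; the set of terminal statuses follows the documented values:

```python
import json

# Parse the batch-submission response shown above and check its status
# before polling for results.
response_text = (
    '{"task_id": "c64baaa1c99d42f589ee184369aa6843",'
    ' "dataset_ids": ["4dd4e846bc014a988cbc49bccd217e96"],'
    ' "status": "PENDING"}'
)
task = json.loads(response_text)

TERMINAL_STATUSES = {"COMPLETED", "FAILED"}
if task["status"] in TERMINAL_STATUSES:
    print("done:", task["status"])
else:
    print("still pending; poll again for task", task["task_id"])
```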
Response Fields¶
| Field | Description |
|---|---|
| `task_id` | A unique identifier assigned to your submitted task |
| `dataset_ids` | IDs of the datasets sent by the user |
| `status` | The current status of the task (e.g., "PENDING", "COMPLETED", "FAILED") |
2. Single (Sync) Transcription¶
Info
Single transcription is used for processing audio files synchronously. The API will return the transcription result directly.
Endpoint Details¶
- Endpoint: https://language-server-url/v1/transcription
- Method: POST
Example cURL Command¶
Here's an example of how to make a request using curl:
curl -X 'POST' 'https://language-server-url/v1/transcription' \
  -H 'accept: application/json' \
  -H 'X-API-Key: YOUR_API_KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'payload_data={"user_email":"your_email@example.com","project_name":"Individual","models":["long"],"locale":"hi-IN","provider":"GOOGLE"}' \
  -F 'files=@output.wav;type=audio/wav'
Note
Replace YOUR_API_KEY with your actual API key and your_email@example.com with your email.
Example Response Body¶
Upon successful submission, the API returns a JSON object containing the result of the submitted task.
Response Fields¶
| Field | Description |
|---|---|
| `result` | Final output of the transcription task (a JSON file containing the transcriptions of all audio files uploaded by the user) |
| `skipped_files` | A list of files skipped during preprocessing (e.g., 0-second audio) |
| `status` | The status of the task ("COMPLETED", "FAILED") |
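A response with these fields can be handled as follows. `summarize` is an illustrative helper, and the example response shape is an assumption based on the field table above:

```python
# Hypothetical helper: summarize a single-transcription response using the
# documented fields (result, skipped_files, status).
def summarize(response):
    if response.get("status") != "COMPLETED":
        return "task failed"
    skipped = response.get("skipped_files", [])
    if skipped:
        return f"transcription ready ({len(skipped)} file(s) skipped during preprocessing)"
    return "transcription ready"

# Example shape is an assumption; the API returns result/skipped_files/status.
example = {"result": {"output.wav": "..."}, "skipped_files": [], "status": "COMPLETED"}
print(summarize(example))  # transcription ready
```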
Response & Error Codes¶
| Code | Description |
|---|---|
| 200 | Request successful |
| 404 | Client-side issue: a required parameter is missing |
| 500 | Server-side issue: server unavailable |
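A client can branch on these codes; a minimal sketch in which the returned action strings are illustrative, not prescribed by the API:

```python
# Map the documented response codes to a suggested client action.
# The action strings are illustrative, not part of the API contract.
def classify(code):
    if code == 200:
        return "success"
    if code == 404:
        return "check request parameters"  # client-side issue
    if code == 500:
        return "retry later"  # server unavailable
    return "unexpected status"

print(classify(200))  # success
```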
Models and Locales¶
Provider Models¶
| Provider | Default Model | Available Models | Supported Locales |
|---|---|---|---|
| Azure | base | base | "en-US", "gu-IN", "bn-IN", "hi-IN", "kn-IN", "ml-IN", "mr-IN", "pa-IN", "ta-IN", "te-IN" |
| Google | long | long, short, telephony, telephony_short | "hi-IN", "en-IN", "mr-IN"... |
| Sarvam | saarika:v2.5 | saarika:v2.5 | "en-IN", "gu-IN", "bn-IN", "hi-IN", "kn-IN", "ml-IN", "mr-IN", "pa-IN", "ta-IN", "te-IN", "od-IN" |
| AWS | base | base | "en-IN", "gu-IN", "bn-IN", "hi-IN", "kn-IN", "ml-IN", "mr-IN", "pa-IN", "ta-IN", "te-IN", "od-IN" |
Google Model and Locale Combinations¶
| Model | Supported Locales |
|---|---|
| long | "hi-IN", "en-IN", "en-US", "mr-IN", "te-IN", "ta-IN", "kn-IN", "ml-IN", "bn-BD" |
| short | "hi-IN", "en-IN", "en-US", "mr-IN", "te-IN", "ta-IN", "kn-IN", "ml-IN", "bn-BD" |
| telephony | "hi-IN", "en-IN", "en-US" |
| telephony_short | "hi-IN", "en-IN", "en-US" |
Model Selection Guide¶
Google Models¶
| Model | Use Case | Description |
|---|---|---|
| long | Long-form content | Use for any type of long-form content, such as media or spontaneous speech and conversations. Consider this model especially if more specialized models aren't available in your target language |
| short | Short utterances | Use for utterances that are a few seconds in length. Useful for capturing commands or other single-shot directed speech use cases |
| telephony | Phone calls | Use for audio that originated from a phone call, typically recorded at an 8 kHz sampling rate. Ideal for customer service, teleconferencing, and automated kiosk applications |
| telephony_short | Short phone calls | Dedicated version of the telephony model for short or even single-word utterances from phone-call audio, typically recorded at an 8 kHz sampling rate. Useful for utterances only a few seconds long in customer service, teleconferencing, and automated kiosk applications |