
Transcription Service Guide

This guide explains how to use the Language Server Transcription API for both batch (asynchronous) and single (synchronous) audio transcription tasks. It covers endpoints, request formats, and all supported payload parameters.


Request Headers

Ensure your requests include the following headers:

Accept: application/json
X-API-Key: YOUR_API_KEY   # (Replace YOUR_API_KEY with your actual API key)
Content-Type: multipart/form-data

Request Body: Form Data Fields

The request body should be sent as multipart/form-data and must contain the following fields:

payload_data (JSON string)

This field contains essential information about your transcription task.

Example payload_data:

{
  "user_email": "your_email@example.com",
  "project_name": "Individual",
  "dataset_ids": ["id1", "id2"],
  "language_identification": false,
  "locale": "hi-IN",
  "provider": "AZURE",
  "models": ["long"],
  "wordLevelTimeStamp": false,
  "diarization": false
}

Mandatory Fields

Field | Description
------|------------
user_email | Your user email address
project_name | Name of your project
dataset_ids | List of dataset IDs (required only for batch transcription)

Optional Fields

Field | Type | Description | Default
------|------|-------------|--------
models | array | List of models specifying which model to use | long for Google, base for Azure
locale | string | Language in which you want the transcription | en-US
provider | string | AZURE or GOOGLE | AZURE
language_identification | boolean | Automatic language detection of the audio | false
wordLevelTimeStamp | boolean | Include timing information for each word | false
diarization | boolean | Enable speaker identification | false
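
For reference, a minimal Python sketch that builds the example payload above and serializes it to the JSON string expected in the payload_data form field (values are the placeholders from the example):

import json

# Mandatory fields plus a few optional ones; values are placeholders
# taken from the example payload above.
payload = {
    "user_email": "your_email@example.com",
    "project_name": "Individual",
    "dataset_ids": ["id1", "id2"],        # batch transcription only
    "locale": "hi-IN",                    # defaults to en-US if omitted
    "provider": "AZURE",                  # AZURE or GOOGLE
    "models": ["long"],
    "language_identification": False,
    "wordLevelTimeStamp": False,
    "diarization": False,
}

# The API expects payload_data as a JSON string inside the multipart form.
payload_data = json.dumps(payload)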

For Single Transcription

  • files (File upload): This field is used to upload one or more audio files for transcription
  • File Format: All files must be audio files (.mp3, .wav, .flac, etc.)

Note

  • For single transcription, users can upload one or more audio files directly with the request, as long as the total size of all files is less than 25 MB (a size pre-check sketch follows this list).
  • For batch transcription, users must first upload their audio files as a dataset using the provided script, and then include the corresponding dataset_id(s) in the payload when making the API request.
  • For Sarvam ASR, users can upload 1–20 audio files per request. Uploading more than 20 files may cause the task to fail.
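
A minimal pre-check sketch for these limits, assuming local file paths (the 25 MB and 1–20 file limits come from the notes above):

from pathlib import Path

MAX_TOTAL_BYTES = 25 * 1024 * 1024   # 25 MB limit for single transcription
MAX_FILES_SARVAM = 20                # per-request file limit for Sarvam ASR

def check_upload(paths: list[str], provider: str = "AZURE") -> None:
    """Raise ValueError if the upload would violate the documented limits."""
    total = sum(Path(p).stat().st_size for p in paths)
    if total >= MAX_TOTAL_BYTES:
        raise ValueError(f"Total upload size {total} bytes exceeds the 25 MB limit")
    if provider.upper() == "SARVAM" and not 1 <= len(paths) <= MAX_FILES_SARVAM:
        raise ValueError("Sarvam ASR accepts 1-20 audio files per request")

# Example:
# check_upload(["output.wav", "clip2.wav"], provider="SARVAM")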

1. Batch (Async) Transcription

Info

Batch transcription is used for processing multiple or large audio files asynchronously. The API returns a task ID that you can use to poll for results.

Endpoint Details

  • Endpoint: https://language-server-url/v1/batch/transcription
  • Method: POST

Example cURL Command

Here's an example of how to make a request using curl:

curl -X 'POST' \
  'https://language-server-url/v1/batch/transcription' \
  -H 'accept: application/json' \
  -H 'X-API-Key: YOUR_API_KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'payload_data={"user_email":"your_email@example.com","project_name":"Individual","dataset_ids":["d1","d2"],"models":["long"],"locale":"hi-IN","provider":"GOOGLE"}'

Note

Replace YOUR_API_KEY with your actual API key and your_email@example.com with your email.

Example Response Body

Upon successful submission, the API will return a JSON object providing details about the submitted task:

{
  "task_id": "c64baaa1c99d42f589ee184369aa6843",
  "dataset_ids": ["4dd4e846bc014a988cbc49bccd217e96"],
  "status": "PENDING"
}

Response Fields

Field | Description
------|------------
task_id | A unique identifier assigned to your submitted task
dataset_ids | IDs of the datasets submitted by the user
status | The current status of the task (e.g., "PENDING", "COMPLETED", "FAILED")
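
A minimal Python sketch of the same batch request using the requests library; https://language-server-url and YOUR_API_KEY are the placeholders from the examples above. Note that requests builds the multipart Content-Type header (including the boundary) itself, so it is not set explicitly.

import json
import requests

BASE_URL = "https://language-server-url"   # placeholder, as in the examples above
API_KEY = "YOUR_API_KEY"

payload = {
    "user_email": "your_email@example.com",
    "project_name": "Individual",
    "dataset_ids": ["d1", "d2"],
    "models": ["long"],
    "locale": "hi-IN",
    "provider": "GOOGLE",
}

response = requests.post(
    f"{BASE_URL}/v1/batch/transcription",
    headers={"accept": "application/json", "X-API-Key": API_KEY},
    # payload_data is sent as a JSON string inside the multipart form.
    files={"payload_data": (None, json.dumps(payload))},
)
response.raise_for_status()

job = response.json()
print(job["task_id"], job["status"])   # e.g. "c64b..." "PENDING"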

2. Single (Sync) Transcription

Info

Single transcription is used for processing audio files synchronously. The API will return the transcription result directly.

Endpoint Details

  • Endpoint: https://language-server-url/v1/transcription
  • Method: POST

Example cURL Command

Here's an example of how to make a request using curl:

curl -X 'POST' \
  'https://language-server-url/v1/transcription' \
  -H 'accept: application/json' \
  -H 'X-API-Key: YOUR_API_KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'payload_data={"user_email":"your_email@example.com","project_name":"Individual","models":["long"],"locale":"hi-IN","provider":"GOOGLE"}' \
  -F 'files=@output.wav;type=audio/wav'

Note

Replace YOUR_API_KEY with your actual API key and your_email@example.com with your email.

Example Response Body

Upon success, the API will return a JSON object containing the result of the submitted task:

{
  "result": {
    "transcription": "...."
  },
  "skipped_files": ["file1"],
  "status": "COMPLETED"
}

Response Fields

Field | Description
------|------------
result | Final output of the transcription task (a JSON object containing the transcription of every uploaded audio file)
skipped_files | A list of files skipped during preprocessing (for example, zero-length audio)
status | The status of the task ("COMPLETED", "FAILED")
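
A minimal Python sketch of the synchronous request above, uploading output.wav as in the cURL example (base URL and API key are placeholders, as before):

import json
import requests

BASE_URL = "https://language-server-url"   # placeholder, as in the examples above
API_KEY = "YOUR_API_KEY"

payload = {
    "user_email": "your_email@example.com",
    "project_name": "Individual",
    "models": ["long"],
    "locale": "hi-IN",
    "provider": "GOOGLE",
}

with open("output.wav", "rb") as audio:
    response = requests.post(
        f"{BASE_URL}/v1/transcription",
        headers={"accept": "application/json", "X-API-Key": API_KEY},
        files={
            "payload_data": (None, json.dumps(payload)),
            "files": ("output.wav", audio, "audio/wav"),
        },
    )
response.raise_for_status()

body = response.json()
print(body["status"])                   # "COMPLETED" or "FAILED"
print(body["result"]["transcription"])  # transcription text
print(body.get("skipped_files", []))    # files skipped during preprocessing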

Response & Error Codes

Code | Description
-----|------------
200 | Request successful
404 | Client-side issue, e.g. a required parameter is missing or not found
500 | Server-side issue, e.g. the server is unavailable
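
A short sketch of handling these codes on the client, assuming the requests-based calls from the sketches above; the guide does not specify error response bodies, so only the status code is inspected:

import requests

def submit(url: str, **kwargs) -> dict:
    """POST and map the documented status codes to Python exceptions."""
    response = requests.post(url, **kwargs)
    if response.status_code == 200:
        return response.json()
    if response.status_code == 404:
        raise ValueError("Request rejected: a required parameter is missing")
    if response.status_code == 500:
        raise RuntimeError("Server unavailable; retry later")
    response.raise_for_status()   # any other unexpected code
    return response.json()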

Models and Locales

Provider Models

Provider | Default Model | Available Models | Supported Locales
---------|---------------|------------------|-------------------
Azure | base | base | "en-US", "gu-IN", "bn-IN", "hi-IN", "kn-IN", "ml-IN", "mr-IN", "pa-IN", "ta-IN", "te-IN"
Google | long | long, short, telephony, telephony_short | "hi-IN", "en-IN", "mr-IN"...
Sarvam | saarika:v2.5 | saarika:v2.5 | "en-IN", "gu-IN", "bn-IN", "hi-IN", "kn-IN", "ml-IN", "mr-IN", "pa-IN", "ta-IN", "te-IN", "od-IN"
AWS | base | base | "en-IN", "gu-IN", "bn-IN", "hi-IN", "kn-IN", "ml-IN", "mr-IN", "pa-IN", "ta-IN", "te-IN", "od-IN"

Google Model and Locale Combinations

Provider | Model | Supported Locales
---------|-------|-------------------
Google | long | "hi-IN", "en-IN", "en-US", "mr-IN", "te-IN", "ta-IN", "kn-IN", "ml-IN", "bn-BD"
Google | short | "hi-IN", "en-IN", "en-US", "mr-IN", "te-IN", "ta-IN", "kn-IN", "ml-IN", "bn-BD"
Google | telephony | "hi-IN", "en-IN", "en-US"
Google | telephony_short | "hi-IN", "en-IN", "en-US"
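
For client-side validation before submitting, the combinations above can be transcribed into a small lookup (a sketch only; the server remains authoritative):

# Supported Google model/locale combinations, transcribed from the table above.
GOOGLE_LOCALES = {
    "long": {"hi-IN", "en-IN", "en-US", "mr-IN", "te-IN", "ta-IN", "kn-IN", "ml-IN", "bn-BD"},
    "short": {"hi-IN", "en-IN", "en-US", "mr-IN", "te-IN", "ta-IN", "kn-IN", "ml-IN", "bn-BD"},
    "telephony": {"hi-IN", "en-IN", "en-US"},
    "telephony_short": {"hi-IN", "en-IN", "en-US"},
}

def is_supported(model: str, locale: str) -> bool:
    """Return True if the Google model supports the locale per the table above."""
    return locale in GOOGLE_LOCALES.get(model, set())

# Example:
# is_supported("telephony", "mr-IN")  -> False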

Model Selection Guide

Google Models

Model | Use Case | Description
------|----------|------------
long | Long-form content | Use for any type of long-form content, such as media or spontaneous speech and conversations. Consider using this model when the more specialized models aren't available in your target language.
short | Short utterances | Use for short utterances that are a few seconds in length. Useful for capturing commands or other single-shot directed speech use cases.
telephony | Phone calls | Use for audio that originated from a phone call, typically recorded at an 8 kHz sampling rate. Ideal for customer service, teleconferencing, and automated kiosk applications.
telephony_short | Short phone calls | Dedicated version of the telephony model for short or even single-word utterances from audio that originated from a phone call, typically recorded at an 8 kHz sampling rate. Useful for utterances only a few seconds long in customer service, teleconferencing, and automated kiosk applications.
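
As a rough heuristic based on the selection guide above (a sketch, not an official recommendation), a helper for picking a Google model might look like:

def pick_google_model(from_phone_call: bool, short_utterance: bool) -> str:
    """Pick a Google model following the selection guide above."""
    if from_phone_call:
        return "telephony_short" if short_utterance else "telephony"
    return "short" if short_utterance else "long"

# Example: a few-second voice command recorded on a phone line
# pick_google_model(from_phone_call=True, short_utterance=True)  -> "telephony_short"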