
Transcription Service Guide

This guide explains how to use the Language Server Transcription API for both batch (asynchronous) and single (synchronous) audio transcription tasks. It covers endpoints, request formats, and all supported payload parameters.


Request Headers

Ensure your requests include the following headers:

Accept: application/json
X-API-Key: YOUR_API_KEY   # (Replace YOUR_API_KEY with your actual API key)
Content-Type: multipart/form-data

Request Body: Form Data Fields

The request body should be sent as multipart/form-data and must contain the following fields:

payload_data (JSON string)

This field contains essential information about your transcription task.

Example payload_data:

{
  "user_email": "your_email@example.com",
  "project_name": "Individual",
  "dataset_ids": ["id1", "id2"],
  "language_identification": false,
  "locale": "hi-IN",
  "provider": "AZURE",
  "models": ["long"],
  "wordLevelTimeStamp": false,
  "diarization": false
}

Mandatory Fields

Field | Description
------|------------
user_email | Your user email address
project_name | Name of your project
dataset_ids | List of dataset IDs (required only for batch transcription)

Optional Fields

Field | Type | Description | Default
------|------|-------------|--------
models | array | List of models specifying which model to use | long for Google, base for Azure
locale | string | Language in which you want the transcription | en-US
provider | string | AZURE or GOOGLE | AZURE
language_identification | boolean | Automatic language detection of the audio | false
wordLevelTimeStamp | boolean | Include timing information for each word | false
diarization | boolean | Enable speaker identification | false
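
For reference, a minimal Python sketch that builds the example payload above and serializes it to the JSON string expected in the payload_data form field (values are the placeholders from the example):

import json

# Mandatory fields plus a few optional ones; values are placeholders
# taken from the example payload above.
payload = {
    "user_email": "your_email@example.com",
    "project_name": "Individual",
    "dataset_ids": ["id1", "id2"],        # batch transcription only
    "locale": "hi-IN",                    # defaults to en-US if omitted
    "provider": "AZURE",                  # AZURE or GOOGLE
    "models": ["long"],
    "language_identification": False,
    "wordLevelTimeStamp": False,
    "diarization": False,
}

# The API expects payload_data as a JSON string inside the multipart form.
payload_data = json.dumps(payload)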

For Single Transcription

  • files (File upload): This field is used to upload one or more audio files for transcription
  • File Format: All files must be audio files (.mp3, .wav, .flac, etc.)

Note

  • For single transcription, users can upload one or more audio files directly with the request, as long as the total size of all files is less than 25 MB (a size pre-check sketch follows this list).
  • For batch transcription, users must first upload their audio files as a dataset using the provided script, and then include the corresponding dataset_id(s) in the payload when making the API request.
  • For Sarvam ASR, users can upload 1–20 audio files per request. Uploading more than 20 files may cause the task to fail.
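
A minimal pre-check sketch for these limits, assuming local file paths (the 25 MB and 1–20 file limits come from the notes above):

from pathlib import Path

MAX_TOTAL_BYTES = 25 * 1024 * 1024   # 25 MB limit for single transcription
MAX_FILES_SARVAM = 20                # per-request file limit for Sarvam ASR

def check_upload(paths: list[str], provider: str = "AZURE") -> None:
    """Raise ValueError if the upload would violate the documented limits."""
    total = sum(Path(p).stat().st_size for p in paths)
    if total >= MAX_TOTAL_BYTES:
        raise ValueError(f"Total upload size {total} bytes exceeds the 25 MB limit")
    if provider.upper() == "SARVAM" and not 1 <= len(paths) <= MAX_FILES_SARVAM:
        raise ValueError("Sarvam ASR accepts 1-20 audio files per request")

# Example:
# check_upload(["output.wav", "clip2.wav"], provider="SARVAM")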

1. Batch (Async) Transcription

Info

Batch transcription is used for processing multiple or large audio files asynchronously. The API returns a task ID that you can use to poll for results.

Endpoint Details

  • Endpoint: https://language-server-url/v1/batch/transcription
  • Method: POST

Example cURL Command

Here's an example of how to make a request using curl:

curl -X 'POST' \
  'https://language-server-url/v1/batch/transcription' \
  -H 'accept: application/json' \
  -H 'X-API-Key: YOUR_API_KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'payload_data={"user_email":"your_email@example.com","project_name":"Individual","dataset_ids":["d1","d2"],"models":["long"],"locale":"hi-IN","provider":"GOOGLE"}'

Note

Replace YOUR_API_KEY with your actual API key and your_email@example.com with your email.

Example Response Body

Upon successful submission, the API will return a JSON object providing details about the submitted task:

{
  "task_id": "c64baaa1c99d42f589ee184369aa6843",
  "dataset_ids": ["4dd4e846bc014a988cbc49bccd217e96"],
  "status": "PENDING"
}

Response Fields

Field | Description
------|------------
task_id | A unique identifier assigned to your submitted task
dataset_ids | IDs of the datasets submitted by the user
status | The current status of the task (e.g., "PENDING", "COMPLETED", "FAILED")
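
A minimal Python sketch of the same batch request using the requests library; https://language-server-url and YOUR_API_KEY are the placeholders from the examples above. Note that requests builds the multipart Content-Type header (including the boundary) itself, so it is not set explicitly.

import json
import requests

BASE_URL = "https://language-server-url"   # placeholder, as in the examples above
API_KEY = "YOUR_API_KEY"

payload = {
    "user_email": "your_email@example.com",
    "project_name": "Individual",
    "dataset_ids": ["d1", "d2"],
    "models": ["long"],
    "locale": "hi-IN",
    "provider": "GOOGLE",
}

response = requests.post(
    f"{BASE_URL}/v1/batch/transcription",
    headers={"accept": "application/json", "X-API-Key": API_KEY},
    # payload_data is sent as a JSON string inside the multipart form.
    files={"payload_data": (None, json.dumps(payload))},
)
response.raise_for_status()

job = response.json()
print(job["task_id"], job["status"])   # e.g. "c64b..." "PENDING"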

2. Single (Sync) Transcription

Info

Single transcription is used for processing audio files synchronously. The API will return the transcription result directly.

Endpoint Details

  • Endpoint: https://language-server-url/v1/transcription
  • Method: POST

Example cURL Command

Here's an example of how to make a request using curl:

curl -X 'POST' \
  'https://language-server-url/v1/transcription' \
  -H 'accept: application/json' \
  -H 'X-API-Key: YOUR_API_KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'payload_data={"user_email":"your_email@example.com","project_name":"Individual","models":["long"],"locale":"hi-IN","provider":"GOOGLE"}' \
  -F 'files=@output.wav;type=audio/wav'

Note

Replace YOUR_API_KEY with your actual API key and your_email@example.com with your email.

Example Response Body

Upon success, the API will return a JSON object containing the result of the submitted task:

{
  "result": {
    "transcription": "...."
  },
  "skipped_files": ["file1"],
  "status": "COMPLETED"
}

Response Fields

Field | Description
------|------------
result | Final output of the transcription task (a JSON object containing the transcription of every uploaded audio file)
skipped_files | A list of files skipped during preprocessing (for example, zero-length audio)
status | The status of the task ("COMPLETED", "FAILED")
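
A minimal Python sketch of the synchronous request above, uploading output.wav as in the cURL example (base URL and API key are placeholders, as before):

import json
import requests

BASE_URL = "https://language-server-url"   # placeholder, as in the examples above
API_KEY = "YOUR_API_KEY"

payload = {
    "user_email": "your_email@example.com",
    "project_name": "Individual",
    "models": ["long"],
    "locale": "hi-IN",
    "provider": "GOOGLE",
}

with open("output.wav", "rb") as audio:
    response = requests.post(
        f"{BASE_URL}/v1/transcription",
        headers={"accept": "application/json", "X-API-Key": API_KEY},
        files={
            "payload_data": (None, json.dumps(payload)),
            "files": ("output.wav", audio, "audio/wav"),
        },
    )
response.raise_for_status()

body = response.json()
print(body["status"])                   # "COMPLETED" or "FAILED"
print(body["result"]["transcription"])  # transcription text
print(body.get("skipped_files", []))    # files skipped during preprocessing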

Response & Error Codes

Code | Description
-----|------------
200 | Request successful
404 | Client-side issue, e.g. a required parameter is missing or not found
500 | Server-side issue, e.g. the server is unavailable
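
A short sketch of handling these codes on the client, assuming the requests-based calls from the sketches above; the guide does not specify error response bodies, so only the status code is inspected:

import requests

def submit(url: str, **kwargs) -> dict:
    """POST and map the documented status codes to Python exceptions."""
    response = requests.post(url, **kwargs)
    if response.status_code == 200:
        return response.json()
    if response.status_code == 404:
        raise ValueError("Request rejected: a required parameter is missing")
    if response.status_code == 500:
        raise RuntimeError("Server unavailable; retry later")
    response.raise_for_status()   # any other unexpected code
    return response.json()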

Models and Locales

Provider Models

Provider | Default Model | Available Models | Supported Locales
---------|---------------|------------------|-------------------
Azure | base | base | "en-US", "gu-IN", "bn-IN", "hi-IN", "kn-IN", "ml-IN", "mr-IN", "pa-IN", "ta-IN", "te-IN"
Google | long | long, short, telephony, telephony_short | "hi-IN", "en-IN", "mr-IN"...
Sarvam | saarika:v2.5 | saarika:v2.5 | "en-IN", "gu-IN", "bn-IN", "hi-IN", "kn-IN", "ml-IN", "mr-IN", "pa-IN", "ta-IN", "te-IN", "od-IN"
AWS | base | base | "en-IN", "gu-IN", "bn-IN", "hi-IN", "kn-IN", "ml-IN", "mr-IN", "pa-IN", "ta-IN", "te-IN", "od-IN"

Google Model and Locale Combinations

Provider | Model | Supported Locales
---------|-------|-------------------
Google | long | "hi-IN", "en-IN", "en-US", "mr-IN", "te-IN", "ta-IN", "kn-IN", "ml-IN", "bn-BD"
Google | short | "hi-IN", "en-IN", "en-US", "mr-IN", "te-IN", "ta-IN", "kn-IN", "ml-IN", "bn-BD"
Google | telephony | "hi-IN", "en-IN", "en-US"
Google | telephony_short | "hi-IN", "en-IN", "en-US"
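
For client-side validation before submitting, the combinations above can be transcribed into a small lookup (a sketch only; the server remains authoritative):

# Supported Google model/locale combinations, transcribed from the table above.
GOOGLE_LOCALES = {
    "long": {"hi-IN", "en-IN", "en-US", "mr-IN", "te-IN", "ta-IN", "kn-IN", "ml-IN", "bn-BD"},
    "short": {"hi-IN", "en-IN", "en-US", "mr-IN", "te-IN", "ta-IN", "kn-IN", "ml-IN", "bn-BD"},
    "telephony": {"hi-IN", "en-IN", "en-US"},
    "telephony_short": {"hi-IN", "en-IN", "en-US"},
}

def is_supported(model: str, locale: str) -> bool:
    """Return True if the Google model supports the locale per the table above."""
    return locale in GOOGLE_LOCALES.get(model, set())

# Example:
# is_supported("telephony", "mr-IN")  -> False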

Model Selection Guide

Google Models

Model | Use Case | Description
------|----------|------------
long | Long-form content | Use for any type of long-form content, such as media or spontaneous speech and conversations. Consider using this model when the more specialized models aren't available in your target language.
short | Short utterances | Use for short utterances that are a few seconds in length. Useful for capturing commands or other single-shot directed speech use cases.
telephony | Phone calls | Use for audio that originated from a phone call, typically recorded at an 8 kHz sampling rate. Ideal for customer service, teleconferencing, and automated kiosk applications.
telephony_short | Short phone calls | Dedicated version of the telephony model for short or even single-word utterances from audio that originated from a phone call, typically recorded at an 8 kHz sampling rate. Useful for utterances only a few seconds long in customer service, teleconferencing, and automated kiosk applications.
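
As a rough heuristic based on the selection guide above (a sketch, not an official recommendation), a helper for picking a Google model might look like:

def pick_google_model(from_phone_call: bool, short_utterance: bool) -> str:
    """Pick a Google model following the selection guide above."""
    if from_phone_call:
        return "telephony_short" if short_utterance else "telephony"
    return "short" if short_utterance else "long"

# Example: a few-second voice command recorded on a phone line
# pick_google_model(from_phone_call=True, short_utterance=True)  -> "telephony_short"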