Transcription Service Guide
This guide explains how to use the Language Server Transcription API for both batch (asynchronous) and single (synchronous) audio transcription tasks. It covers endpoints, request formats, and all supported payload parameters.
Request Headers
Ensure your requests include the following headers:
Accept: application/json
X-API-Key: YOUR_API_KEY # (Replace YOUR_API_KEY with your actual API key)
Content-Type: multipart/form-data # (single transcription; batch transcription uses application/json)
Request Format
- Batch transcription (/v1/task/batch/transcription): Send as Content-Type: application/json with a JSON body.
- Single transcription (/v1/task/transcription): Send as Content-Type: multipart/form-data with payload_data (JSON string) and files (audio files) form fields.
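Request Body: Form Data Fields
For single transcription, the request body is sent as multipart/form-data and must contain the fields below. For batch transcription, the same payload_data fields are sent directly as the JSON body.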
payload_data (JSON string)
This field contains essential information about your transcription task.
Example payload_data:
{
"user_email": "your_email@example.com",
"project_name": "Individual",
"dataset_ids": ["id1", "id2"],
"language_identification": false,
"locale": "hi-IN",
"provider": "AZURE",
"models": ["long"],
"wordlevel_timestamp": false,
"diarization": false
}
Mandatory Fields
| Field | Description |
|---|---|
| user_email | Your user email address |
| project_name | Name of your project |
| dataset_ids | List of dataset IDs (only for batch transcription) |
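As a rough client-side illustration of the mandatory fields above, the sketch below checks a payload before it is sent and serializes it into the JSON string expected in the payload_data form field. The function names `missing_mandatory_fields` and `to_payload_data` are hypothetical helpers, not part of the API, and the server performs its own validation regardless.

```python
import json

# Fields the guide lists as mandatory for every request.
REQUIRED_FIELDS = ("user_email", "project_name")


def missing_mandatory_fields(payload: dict, batch: bool) -> list:
    """Return the names of mandatory fields that are absent or empty.

    dataset_ids is only required for batch transcription.
    """
    required = list(REQUIRED_FIELDS)
    if batch:
        required.append("dataset_ids")
    return [name for name in required if not payload.get(name)]


def to_payload_data(payload: dict) -> str:
    """Serialize the payload as a JSON string for the payload_data form field."""
    return json.dumps(payload)
```

A caller might run `missing_mandatory_fields(payload, batch=True)` before a batch submission and only proceed when the returned list is empty.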
Optional Fields
| Field | Type | Description | Default |
|---|---|---|---|
| models | array | List of models specifying which model to use | Per provider: Google long, Azure base, Sarvam saarika:v2.5, AWS base, KARYA_LOCAL indic-conformer |
| locale | string | Language in which you want the transcription (BCP-47, e.g. hi-IN) | If omitted, set automatically by provider: Azure en-US, Google/Sarvam/AWS en-IN, KARYA_LOCAL hi-IN |
| provider | string | AZURE, GOOGLE, SARVAM, AWS, or KARYA_LOCAL (Indic-Conformer) | AZURE |
| language_identification | boolean | Automatic language detection. Supported for Azure only. Ignored by Google, Sarvam, and AWS. For KARYA_LOCAL, use false and set locale yourself. If you send language_identification: true with KARYA_LOCAL, the API returns HTTP 400 with an error message explaining that automatic detection is not supported for this provider yet. | false |
| wordlevel_timestamp | boolean | Include timing information for each word. Ignored for KARYA_LOCAL (Indic-Conformer). | false |
| diarization | boolean | Enable speaker identification. Ignored for KARYA_LOCAL (Indic-Conformer). | false |
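To make the defaults in the table above concrete, here is a minimal sketch that fills in omitted optional fields the way the guide describes. The helper name `with_defaults` is hypothetical, and the code only mirrors the documented defaults on the client side; the server remains the authority on what it actually applies.

```python
# Defaults copied from the Optional Fields table.
DEFAULT_PROVIDER = "AZURE"
DEFAULT_MODELS = {
    "GOOGLE": ["long"],
    "AZURE": ["base"],
    "SARVAM": ["saarika:v2.5"],
    "AWS": ["base"],
    "KARYA_LOCAL": ["indic-conformer"],
}
DEFAULT_LOCALE = {
    "AZURE": "en-US",
    "GOOGLE": "en-IN",
    "SARVAM": "en-IN",
    "AWS": "en-IN",
    "KARYA_LOCAL": "hi-IN",
}
BOOLEAN_FLAGS = ("language_identification", "wordlevel_timestamp", "diarization")


def with_defaults(payload: dict) -> dict:
    """Return a copy of payload with documented defaults filled in."""
    resolved = dict(payload)
    provider = resolved.setdefault("provider", DEFAULT_PROVIDER)
    if provider not in DEFAULT_MODELS:
        raise ValueError(f"unknown provider: {provider!r}")
    resolved.setdefault("models", list(DEFAULT_MODELS[provider]))
    resolved.setdefault("locale", DEFAULT_LOCALE[provider])
    for flag in BOOLEAN_FLAGS:
        resolved.setdefault(flag, False)
    return resolved
```

Resolving defaults locally is mainly useful for logging or previewing what a request will effectively ask for; sending the unresolved payload should behave the same.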
If Single Transcription
- files (File upload): This field is used to upload one or more audio files for transcription.
  - File Format: All files must be audio files. Supported extensions: .wav, .mp3, .flac, .webm, .m4a, .aac, .speex. The server validates by MIME type, so any format recognized as audio/* will also be accepted.
Note
- For single transcription, users can directly upload one or more audio files with their request, as long as the total size of all files is less than 25 MB. For batch transcription, users must first upload their audio files as a dataset using the provided script, and then include the corresponding dataset_id(s) in the payload when making the API request.
- For Sarvam ASR, users can upload 1–20 audio files per request. Uploading more than 20 files may cause the task to fail.
- For Indic-Conformer (provider: "KARYA_LOCAL"), you can send up to 20 files per request. When audio is referenced by remote URIs (as with batch datasets backed by Google Cloud Storage or Azure Blob), the service reads it from that cloud storage, so files must already live there. For single transcription, you can upload files directly with the request; the server accepts them and temporarily stages them for Indic-Conformer (see Indic-Conformer below).
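The limits in the note above can be checked before uploading. The following is a sketch only: `upload_problems` is a hypothetical helper, and it assumes "25 MB" means 25 × 1024 × 1024 bytes, which the guide does not specify. The extension check is advisory, since the server validates by MIME type.

```python
from pathlib import PurePath

MAX_TOTAL_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" interpreted as 25 MiB
MAX_FILES = {"SARVAM": 20, "KARYA_LOCAL": 20}  # per-request limits from the guide
KNOWN_EXTENSIONS = {".wav", ".mp3", ".flac", ".webm", ".m4a", ".aac", ".speex"}


def upload_problems(files: list, provider: str) -> list:
    """Check (filename, size_in_bytes) pairs against the documented limits.

    Returns a list of human-readable problems; an empty list means no
    documented limit is violated.
    """
    problems = []
    if not files:
        problems.append("no files supplied")
    limit = MAX_FILES.get(provider)
    if limit is not None and len(files) > limit:
        problems.append(f"{provider} accepts at most {limit} files, got {len(files)}")
    total = sum(size for _, size in files)
    if total >= MAX_TOTAL_BYTES:
        problems.append(f"total size {total} bytes is not below {MAX_TOTAL_BYTES}")
    for name, _ in files:
        if PurePath(name).suffix.lower() not in KNOWN_EXTENSIONS:
            problems.append(
                f"{name}: extension not in the documented list "
                "(may still pass the server's MIME check)"
            )
    return problems
```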
1. Batch (Async) Transcription
Info
Batch transcription is used for processing multiple audio files or large files asynchronously. The API will return a job ID, and you can poll for results.
Endpoint Details
- Endpoint: https://languageserver.karya.in/v1/task/batch/transcription
- Method: POST
Example cURL Command
Here's an example of how to make a request using curl:
curl -X 'POST' 'https://languageserver.karya.in/v1/task/batch/transcription' \
-H 'accept: application/json' \
-H 'X-API-Key: YOUR_API_KEY' \
-H 'Content-Type: application/json' \
-d '{"user_email":"your_email@example.com","project_name":"Individual","dataset_ids":["d1","d2"],"models":["long"],"locale":"hi-IN","provider":"GOOGLE"}'
Note
Replace YOUR_API_KEY with your actual API key and your_email@example.com with your email.
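For callers who prefer Python over curl, the request above could be assembled roughly as follows using only the standard library. `build_batch_request` is a hypothetical helper name; the URL and headers are the ones from the curl command. The function only builds the request, so it can be inspected before anything is sent.

```python
import json
import urllib.request

BATCH_URL = "https://languageserver.karya.in/v1/task/batch/transcription"


def build_batch_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the batch transcription POST request."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BATCH_URL,
        data=body,
        method="POST",
        headers={
            "Accept": "application/json",
            "X-API-Key": api_key,
            "Content-Type": "application/json",
        },
    )


# Sending is a separate step, for example:
#   with urllib.request.urlopen(build_batch_request(key, payload)) as resp:
#       submitted = json.load(resp)
```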
Example Response Body
Upon successful submission, the API will return a JSON object providing details about the submitted task:
{
"task_id": "c64baaa1c99d42f589ee184369aa6843",
"dataset_ids": ["4dd4e846bc014a988cbc49bccd217e96"],
"status": "PENDING"
}
Response Fields
| Field | Description |
|---|---|
| task_id | A unique identifier assigned to your submitted task |
| dataset_ids | IDs of the datasets sent by the user |
| status | The current status of the task (e.g., "PENDING", "COMPLETED", "FAILED") |
Note
Once submitted, poll GET /v1/status/{task_id} until the status is COMPLETED or FAILED. The completed response will contain the transcription results in the responses field.
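The polling step described in the note can be sketched as a small loop. This is an illustration under stated assumptions: `poll_until_done` is a hypothetical helper, `fetch_status` stands for whatever function performs GET /v1/status/{task_id} and returns the parsed JSON body, and the 5-second interval and 120-attempt cap are arbitrary choices, not values from the API.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval_s: float = 5.0,   # assumed polling interval
    max_attempts: int = 120,   # assumed upper bound on polls
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until the task reaches COMPLETED or FAILED.

    Returns the final status body, or raises TimeoutError if the task is
    still not finished after max_attempts polls.
    """
    for attempt in range(max_attempts):
        body = fetch_status()
        if body.get("status") in TERMINAL_STATUSES:
            return body
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"task not finished after {max_attempts} polls")
```

Passing `fetch_status` and `sleep` in as parameters keeps the loop independent of any particular HTTP client and makes it straightforward to exercise without network access.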
2. Single (Sync) Transcription
Info
Single transcription is used for processing audio files synchronously. The API will return the transcription result directly.
Endpoint Details
- Endpoint: https://languageserver.karya.in/v1/task/transcription
- Method: POST
Example cURL Command
Here's an example of how to make a request using curl:
curl -X 'POST' 'https://languageserver.karya.in/v1/task/transcription' \
-H 'accept: application/json' \
-H 'X-API-Key: YOUR_API_KEY' \
-H 'Content-Type: multipart/form-data' \
-F 'payload_data={"user_email":"your_email@example.com","project_name":"Individual","models":["long"],"locale":"hi-IN","provider":"GOOGLE"}' \
-F 'files=@output.wav;type=audio/wav'
Note
Replace YOUR_API_KEY with your actual API key and your_email@example.com with your email.
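A Python counterpart to the curl command above might look like the sketch below, which assembles the multipart/form-data body by hand with the standard library. `build_single_request` and its private helpers are hypothetical names; only the URL, header names, and the payload_data and files field names come from this guide. In practice a higher-level HTTP client would usually handle the multipart encoding.

```python
import json
import uuid
import urllib.request

SINGLE_URL = "https://languageserver.karya.in/v1/task/transcription"


def _text_part(boundary: str, name: str, value: str) -> bytes:
    """Encode one plain text form field."""
    return (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{name}"\r\n'
        "\r\n"
        f"{value}\r\n"
    ).encode("utf-8")


def _file_part(boundary: str, name: str, filename: str, content: bytes, mime: str) -> bytes:
    """Encode one file form field with its MIME type."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n"
        "\r\n"
    ).encode("utf-8")
    return head + content + b"\r\n"


def build_single_request(api_key: str, payload: dict, files: list) -> urllib.request.Request:
    """Build (but do not send) the single transcription request.

    files is a list of (filename, content_bytes, mime_type) tuples.
    """
    boundary = uuid.uuid4().hex
    parts = [_text_part(boundary, "payload_data", json.dumps(payload))]
    for filename, content, mime in files:
        parts.append(_file_part(boundary, "files", filename, content, mime))
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        SINGLE_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Accept": "application/json",
            "X-API-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```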
Example Response Body
Upon successful completion, the API returns a JSON object containing the result of the submitted task:
{
"task_id": "c64baaa1c99d42f589ee184369aa6843",
"status": "COMPLETED",
"result": {
"transcriptions": [
{
"file": "audio_file_name.wav",
"transcription": "The transcribed text here..."
}
]
}
}
Response Fields
| Field | Description |
|---|---|
| task_id | Unique identifier for the task |
| status | Task status (COMPLETED or FAILED) |
| result | Task output; structure varies by provider (see below) |
| result.transcriptions | Array of transcription objects, one per audio file |
Note
The result structure varies by provider. Google returns results keyed by model name: {"long": {"transcriptions": [...]}}. Azure, Sarvam, AWS, and KARYA_LOCAL (Indic-Conformer) return {"transcriptions": [...]} directly, with one entry per file (uri plus transcript text under transcript).
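Because the result shape differs between Google and the other providers, client code may want to normalize it. The sketch below, using the hypothetical helper `flatten_transcriptions`, turns either shape into a flat list of (model, entry) pairs. It treats each entry as opaque and does not assume particular field names inside it.

```python
def flatten_transcriptions(result: dict) -> list:
    """Normalize a result object into a list of (model, entry) pairs.

    - Direct shape ({"transcriptions": [...]}): model is None.
    - Google shape ({"long": {"transcriptions": [...]}}): model is the key.
    """
    direct = result.get("transcriptions")
    if isinstance(direct, list):
        return [(None, entry) for entry in direct]

    pairs = []
    for model, value in result.items():
        if isinstance(value, dict):
            for entry in value.get("transcriptions", []):
                pairs.append((model, entry))
    return pairs
```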
Response & Error Codes
| Code | Description |
|---|---|
| 200 | Request successful |
| 400 | Bad request — invalid payload, unsupported locale or model, no valid files, too many files (Sarvam and KARYA_LOCAL: max 20 per request), language_identification: true with KARYA_LOCAL, or file storage URLs the service cannot use for Indic-Conformer |
| 403 | Permission denied (e.g., GCS credentials issue for Google) |
| 408 | Request timeout — provider took too long to respond (Sarvam, AWS) |
| 422 | Validation error — missing required fields |
| 429 | Rate limit exceeded (Google) |
| 500 | Internal server error |
| 502 | Provider API error (Azure, Google) |
| 504 | Transcription timed out waiting for provider result (Azure) |
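One way a client might act on the table above is a small lookup, sketched here. The descriptions are shortened from the table. The `RETRYABLE` set is purely an assumption for illustration (timeouts, rate limiting, and provider errors are often transient); the guide itself does not say which codes are safe to retry.

```python
# Short descriptions taken from the Response & Error Codes table.
STATUS_MEANINGS = {
    200: "Request successful",
    400: "Bad request",
    403: "Permission denied",
    408: "Request timeout",
    422: "Validation error",
    429: "Rate limit exceeded",
    500: "Internal server error",
    502: "Provider API error",
    504: "Transcription timed out waiting for provider result",
}

# Assumption, not stated in the guide: codes treated as worth retrying.
RETRYABLE = {408, 429, 502, 504}


def describe_status(code: int) -> str:
    """Return the documented meaning of a status code, if any."""
    return STATUS_MEANINGS.get(code, "Undocumented status code")


def is_retryable(code: int) -> bool:
    """Return True if this sketch's retry policy would retry the request."""
    return code in RETRYABLE
```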
Models and Locales
Provider Models
| Provider | Default Model | Available Models | Supported Locales |
|---|---|---|---|
| Azure | base | base | "en-US", "gu-IN", "bn-IN", "hi-IN", "kn-IN", "ml-IN", "mr-IN", "pa-IN", "ta-IN", "te-IN" |
| Google | long | long, short, telephony, telephony_short | "hi-IN", "en-IN", "mr-IN"... |
| Sarvam | saarika:v2.5 | saarika:v2.5 | "en-IN", "gu-IN", "bn-IN", "hi-IN", "kn-IN", "ml-IN", "mr-IN", "pa-IN", "ta-IN", "te-IN", "od-IN" |
| AWS | base | base | "en-IN", "gu-IN", "bn-IN", "hi-IN", "kn-IN", "ml-IN", "mr-IN", "pa-IN", "ta-IN", "te-IN" |
| KARYA_LOCAL (Indic-Conformer) | indic-conformer | indic-conformer only | See supported locales below |
Google Model and Locale Combinations
| Provider | Model | Supported Locales |
|---|---|---|
| Google | long | "hi-IN", "en-IN", "en-US", "mr-IN", "te-IN", "ta-IN", "kn-IN", "ml-IN", "bn-BD" |
| Google | short | "hi-IN", "en-IN", "en-US", "mr-IN", "te-IN", "ta-IN", "kn-IN", "ml-IN", "bn-BD" |
| Google | telephony | "hi-IN", "en-IN", "en-US" |
| Google | telephony_short | "hi-IN", "en-IN", "en-US" |
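The combinations above lend themselves to a quick pre-flight check. This is a minimal sketch with a hypothetical helper name, `unsupported_google_models`; the locale sets are copied from the table, and the server's own validation (which returns HTTP 400 for unsupported combinations) remains authoritative.

```python
_GENERAL = frozenset(
    {"hi-IN", "en-IN", "en-US", "mr-IN", "te-IN", "ta-IN", "kn-IN", "ml-IN", "bn-BD"}
)
_TELEPHONY = frozenset({"hi-IN", "en-IN", "en-US"})

# Model name -> locales documented as supported for Google.
GOOGLE_LOCALES = {
    "long": _GENERAL,
    "short": _GENERAL,
    "telephony": _TELEPHONY,
    "telephony_short": _TELEPHONY,
}


def unsupported_google_models(models: list, locale: str) -> list:
    """Return the requested models that do not list this locale.

    Unknown model names are reported as unsupported as well.
    """
    return [m for m in models if locale not in GOOGLE_LOCALES.get(m, frozenset())]
```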
Model Selection Guide
Google Models
| Model | Use Case | Description |
|---|---|---|
| long | Long-form content | Use for any type of long-form content, such as media or spontaneous speech and conversations. Consider this model especially when other models are not available in your target language. |
| short | Short utterances | Use for short utterances that are a few seconds in length. It is useful for capturing commands or other single-shot directed speech use cases. |
| telephony | Phone calls | Use for audio that originated from a phone call, typically recorded at an 8 kHz sampling rate. Ideal for customer service, teleconferencing, and automated kiosk applications. |
| telephony_short | Short phone calls | Dedicated version of the telephony model for short or even single-word utterances in audio that originated from a phone call, typically recorded at an 8 kHz sampling rate. Useful for utterances only a few seconds long in customer service, teleconferencing, and automated kiosk applications. |
Indic-Conformer (KARYA_LOCAL)
Indic-Conformer is an option for transcribing Indic and related languages when your organization routes traffic to that engine. In API requests it appears as provider KARYA_LOCAL with model indic-conformer.
Availability
This provider is only active on deployments where it has been turned on. If you use KARYA_LOCAL and see errors about the service being unavailable or misconfigured, contact your platform or DevOps team—setup is handled outside this API.
Indic-Conformer: what callers need to know
- Set provider to KARYA_LOCAL and choose a supported locale (for example hi-IN). If you omit locale, the server uses the default for this provider: hi-IN.
- Automatic language detection is not supported: keep language_identification false and set locale yourself. If you send language_identification: true, you get HTTP 400 with a clear error message.
- Up to 20 audio files per request (single or batch).
- wordlevel_timestamp and diarization are ignored for this provider (no word timings or speaker separation from Indic-Conformer through this API).
Choosing Indic-Conformer
Set provider to KARYA_LOCAL when you want this engine (for example for supported Indian languages). You can omit models; the default for this provider is indic-conformer.
Set locale to the spoken language of the audio. Use one of the supported locale codes below (for example hi-IN for Hindi). If you omit locale, the default is hi-IN.
Keep language_identification set to false. You must pick the language yourself via locale. Sending language_identification: true results in HTTP 400 with a message that automatic language detection is not available for KARYA_LOCAL yet.
Word-level timestamps and speaker diarization are not supported for Indic-Conformer—the wordlevel_timestamp and diarization fields are ignored for this provider.
Files and storage
- Single or batch: at most 20 audio files per request (same idea as Sarvam’s limit).
- Single transcription (direct upload): send audio with multipart/form-data as for other providers. The server accepts the upload and temporarily stages it in cloud storage before calling Indic-Conformer, so you do not need to pre-store files in your own bucket when uploading this way.
- Batch (remote URIs via datasets): upload files to a dataset as usual. The service reads them from Google Cloud Storage (gs:// links or standard GCS HTTPS URLs) or from Azure Blob using a SAS URL; audio must already be in storage the backend can access. Arbitrary HTTP links are not supported for this provider.
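To illustrate the storage rule above, the following heuristic classifies a URI as Google Cloud Storage, Azure Blob with a SAS token, or neither. It is a sketch under assumptions: `storage_kind` is a hypothetical helper, the hostnames it checks (storage.googleapis.com, storage.cloud.google.com, *.blob.core.windows.net) are the usual public endpoints for those services, and the presence of a `sig` query parameter is used as a rough signal for a SAS URL. The service's actual acceptance logic may differ.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse

_GCS_HOSTS = {"storage.googleapis.com", "storage.cloud.google.com"}


def storage_kind(uri: str) -> Optional[str]:
    """Return "gcs", "azure_blob_sas", or None for anything else."""
    parsed = urlparse(uri)
    host = (parsed.hostname or "").lower()

    # gs://bucket/object links
    if parsed.scheme == "gs" and parsed.netloc:
        return "gcs"

    if parsed.scheme == "https":
        # Standard GCS HTTPS URLs, path-style or bucket-subdomain style.
        if host in _GCS_HOSTS or host.endswith(".storage.googleapis.com"):
            return "gcs"
        # Azure Blob URL carrying a SAS signature in the query string.
        if host.endswith(".blob.core.windows.net") and "sig" in parse_qs(parsed.query):
            return "azure_blob_sas"

    return None
```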
Supported locales
Use these values in the locale field:
as-IN, bn-IN, br-IN, doi-IN, en-IN, gu-IN, hi-IN, kn-IN, kok-IN, ks-IN, mai-IN, ml-IN, mni-IN, mr-IN, ne-IN, or-IN, pa-IN, sa-IN, sat-IN, sd-IN, ta-IN, te-IN, ur-IN
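The KARYA_LOCAL rules described in this section can be collected into one pre-flight check. This is a sketch with a hypothetical helper name, `karya_local_problems`; the locale list is copied from the line above, and the check only anticipates the conditions the guide says lead to HTTP 400.

```python
# Locale codes listed for Indic-Conformer in this guide.
KARYA_LOCAL_LOCALES = frozenset(
    "as-IN bn-IN br-IN doi-IN en-IN gu-IN hi-IN kn-IN kok-IN ks-IN mai-IN "
    "ml-IN mni-IN mr-IN ne-IN or-IN pa-IN sa-IN sat-IN sd-IN ta-IN te-IN ur-IN".split()
)


def karya_local_problems(payload: dict) -> list:
    """Return reasons a KARYA_LOCAL payload is likely to be rejected."""
    problems = []
    if payload.get("language_identification", False):
        problems.append("language_identification must be false for KARYA_LOCAL")
    locale = payload.get("locale", "hi-IN")  # documented default when omitted
    if locale not in KARYA_LOCAL_LOCALES:
        problems.append(f"locale {locale!r} is not in the documented list")
    models = payload.get("models", ["indic-conformer"])  # documented default
    if models != ["indic-conformer"]:
        problems.append("only the indic-conformer model is available")
    return problems
```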
Example payload (single transcription)
{
"user_email": "your_email@example.com",
"project_name": "Individual",
"provider": "KARYA_LOCAL",
"locale": "hi-IN",
"language_identification": false
}
For batch transcription, use the same fields in your JSON body plus dataset_ids, as for other providers.