
Transcription Service Guide

This guide explains how to use the Language Server Transcription API for both batch (asynchronous) and single (synchronous) audio transcription tasks. It covers endpoints, request formats, and all supported payload parameters.


Request Headers

Ensure your requests include the following headers:

Accept: application/json
X-API-Key: YOUR_API_KEY   # (Replace YOUR_API_KEY with your actual API key)
Content-Type: multipart/form-data   # (single transcription; batch transcription uses application/json, see Request Format)

Request Format

  • Batch transcription (/v1/task/batch/transcription): Send as Content-Type: application/json with a JSON body.
  • Single transcription (/v1/task/transcription): Send as Content-Type: multipart/form-data with payload_data (JSON string) and files (audio files) form fields.

Request Body: Form Data Fields

For single transcription, send the request body as multipart/form-data with the fields below. For batch transcription, send the contents of payload_data directly as the JSON request body.

payload_data (JSON string)

This field contains essential information about your transcription task.

Example payload_data:

{
  "user_email": "your_email@example.com",
  "project_name": "Individual",
  "dataset_ids": ["id1", "id2"],
  "language_identification": false,
  "locale": "hi-IN",
  "provider": "AZURE",
  "models": ["long"],
  "wordlevel_timestamp": false,
  "diarization": false
}

Mandatory Fields

| Field | Description |
| --- | --- |
| user_email | Your user email address |
| project_name | Name of your project |
| dataset_ids | List of dataset IDs (only for batch transcription) |
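
The mandatory-field rules above can be checked on the client before a request is sent. The following is a minimal sketch; the helper name `build_payload` and its `batch` flag are hypothetical and not part of the API, and the server still performs its own validation.

```python
import json


def build_payload(user_email, project_name, dataset_ids=None, batch=False, **optional):
    """Return the request payload as a dict after checking the mandatory fields.

    Hypothetical client-side helper, not part of the API.
    """
    if not user_email:
        raise ValueError("user_email is required")
    if not project_name:
        raise ValueError("project_name is required")
    payload = {"user_email": user_email, "project_name": project_name}
    if batch:
        # dataset_ids is mandatory only for batch transcription.
        if not dataset_ids:
            raise ValueError("dataset_ids is required for batch transcription")
        payload["dataset_ids"] = list(dataset_ids)
    # Optional fields (provider, locale, models, ...) are passed through unchanged.
    payload.update(optional)
    return payload


# For single transcription the payload travels as a JSON string in payload_data.
payload_data = json.dumps(
    build_payload("your_email@example.com", "Individual", provider="AZURE", locale="hi-IN")
)
```

For batch transcription, the returned dict would be sent as the JSON request body instead of being wrapped in a form field.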

Optional Fields

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| models | array | List of models to use | Per provider: Google long, Azure base, Sarvam saarika:v2.5, AWS base, KARYA_LOCAL indic-conformer |
| locale | string | Language of the transcription (BCP-47, e.g. hi-IN) | If omitted, set automatically by provider: Azure en-US, Google/Sarvam/AWS en-IN, KARYA_LOCAL hi-IN |
| provider | string | AZURE, GOOGLE, SARVAM, AWS, or KARYA_LOCAL (Indic-Conformer) | AZURE |
| language_identification | boolean | Automatic language detection. Supported for Azure only. Ignored by Google, Sarvam, and AWS. For KARYA_LOCAL, use false and set locale yourself; sending language_identification: true with KARYA_LOCAL returns HTTP 400 with an error message explaining that automatic detection is not supported for this provider yet. | false |
| wordlevel_timestamp | boolean | Include timing information for each word. Ignored for KARYA_LOCAL (Indic-Conformer). | false |
| diarization | boolean | Enable speaker identification. Ignored for KARYA_LOCAL (Indic-Conformer). | false |
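
To see which values a request ends up with when optional fields are omitted, the documented defaults can be mirrored in a small lookup. This is only an illustrative sketch: `effective_options` is a hypothetical helper, and the server remains the source of truth for defaults.

```python
# Defaults copied from the Optional Fields table above.
DEFAULT_MODEL = {
    "GOOGLE": "long",
    "AZURE": "base",
    "SARVAM": "saarika:v2.5",
    "AWS": "base",
    "KARYA_LOCAL": "indic-conformer",
}
DEFAULT_LOCALE = {
    "AZURE": "en-US",
    "GOOGLE": "en-IN",
    "SARVAM": "en-IN",
    "AWS": "en-IN",
    "KARYA_LOCAL": "hi-IN",
}


def effective_options(payload):
    """Return the optional fields with documented defaults filled in.

    Hypothetical helper; raises KeyError for a provider not listed above.
    """
    provider = payload.get("provider", "AZURE")
    return {
        "provider": provider,
        "models": payload.get("models") or [DEFAULT_MODEL[provider]],
        "locale": payload.get("locale") or DEFAULT_LOCALE[provider],
        "language_identification": payload.get("language_identification", False),
        "wordlevel_timestamp": payload.get("wordlevel_timestamp", False),
        "diarization": payload.get("diarization", False),
    }
```

Calling `effective_options({"provider": "KARYA_LOCAL"})` would report the model indic-conformer and the locale hi-IN, matching the table.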

If Single Transcription

  • files (File upload): This field is used to upload one or more audio files for transcription
  • File Format: All files must be audio files. Supported extensions: .wav, .mp3, .flac, .webm, .m4a, .aac, .speex. The server validates by MIME type, so any format recognized as audio/* will also be accepted.

Note

  • For single transcription, users can directly upload one or more audio files with their request, as long as the total size of all files is less than 25 MB. For batch transcription, users must first upload their audio files as a dataset using the provided script, and then include the corresponding dataset_id(s) in the payload when making the API request.
  • For Sarvam ASR, upload between 1 and 20 audio files per request. Requests with more than 20 files are rejected with HTTP 400.
  • For Indic-Conformer (provider: "KARYA_LOCAL"), you can send up to 20 files per request. When audio is referenced by remote URIs (as with batch datasets backed by Google Cloud Storage or Azure Blob), the service reads it from that cloud storage, so files must already live there. For single transcription, you can upload files directly with the request; the server accepts them and temporarily stages them for Indic-Conformer (see Indic-Conformer below).
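
The size and file-count limits in these notes can be checked locally before uploading. The sketch below is illustrative only: `check_upload` and `sizes_of` are hypothetical helpers, and it assumes "25 MB" means 25 × 1024 × 1024 bytes, which the guide does not state.

```python
import os

# Assumption: 25 MB is read as 25 MiB; the guide does not specify the unit exactly.
MAX_TOTAL_BYTES = 25 * 1024 * 1024
# Per-request file limits documented for specific providers.
MAX_FILES = {"SARVAM": 20, "KARYA_LOCAL": 20}


def check_upload(sizes, provider="AZURE"):
    """Return a list of problems for a single-transcription upload.

    sizes is a list of file sizes in bytes. An empty result means no
    documented limit is violated.
    """
    problems = []
    if not sizes:
        problems.append("at least one audio file is required")
    if sum(sizes) >= MAX_TOTAL_BYTES:
        problems.append("total size of all files must be less than 25 MB")
    limit = MAX_FILES.get(provider)
    if limit is not None and len(sizes) > limit:
        problems.append(f"{provider} accepts at most {limit} files per request")
    return problems


def sizes_of(paths):
    """Read the on-disk size of each file path."""
    return [os.path.getsize(p) for p in paths]
```

A typical call would be `check_upload(sizes_of(["output.wav"]), provider="SARVAM")`, sending the request only when the returned list is empty.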

1. Batch (Async) Transcription

Info

Batch transcription is used for processing multiple audio files or large files asynchronously. The API will return a job ID, and you can poll for results.

Endpoint Details

  • Endpoint: https://languageserver.karya.in/v1/task/batch/transcription
  • Method: POST

Example cURL Command

Here's an example of how to make a request using curl:

curl -X 'POST' 'https://languageserver.karya.in/v1/task/batch/transcription' \
  -H 'accept: application/json' \
  -H 'X-API-Key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"user_email":"your_email@example.com","project_name":"Individual","dataset_ids":["d1","d2"],"models":["long"],"locale":"hi-IN","provider":"GOOGLE"}'

Note

Replace YOUR_API_KEY with your actual API key and your_email@example.com with your email.
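
The same batch request can be expressed in Python using only the standard library. This is a sketch that mirrors the curl command above; the function names `build_batch_request` and `submit_batch` and the 30-second timeout are assumptions made for illustration.

```python
import json
import urllib.request

BATCH_URL = "https://languageserver.karya.in/v1/task/batch/transcription"


def build_batch_request(api_key, payload):
    """Build (but do not send) the POST request for batch transcription."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BATCH_URL,
        data=body,
        method="POST",
        headers={
            "Accept": "application/json",
            "X-API-Key": api_key,
            "Content-Type": "application/json",
        },
    )


def submit_batch(api_key, payload, timeout=30):
    """Send the request and return the parsed JSON response body."""
    request = build_batch_request(api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect the headers and body without touching the network.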

Example Response Body

Upon successful submission, the API will return a JSON object providing details about the submitted task:

{
  "task_id": "c64baaa1c99d42f589ee184369aa6843",
  "dataset_ids": ["4dd4e846bc014a988cbc49bccd217e96"],
  "status": "PENDING"
}

Response Fields

| Field | Description |
| --- | --- |
| task_id | A unique identifier assigned to your submitted task |
| dataset_ids | IDs of the datasets sent in the request |
| status | The current status of the task (e.g., "PENDING", "COMPLETED", "FAILED") |

Note

Once submitted, poll GET /v1/status/{task_id} until the status is COMPLETED or FAILED. The completed response will contain the transcription results in the responses field.
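
The polling step described above could be sketched as follows. The caller supplies `fetch_status`, a hypothetical function that performs GET /v1/status/{task_id} and returns the parsed JSON; the 5-second interval and the limit of 120 attempts are arbitrary assumptions, not documented values.

```python
import time

# Statuses after which polling should stop, per the note above.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def poll_until_done(fetch_status, task_id, interval=5.0, max_attempts=120, sleep=time.sleep):
    """Call fetch_status(task_id) until the task reaches a terminal status.

    Returns the final response dict, or raises TimeoutError if the task is
    still not finished after max_attempts polls.
    """
    for attempt in range(max_attempts):
        body = fetch_status(task_id)
        if body.get("status") in TERMINAL_STATUSES:
            return body
        # Wait between polls, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_attempts} polls")
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real delays. On a COMPLETED task, the transcription results would then be read from the `responses` field of the returned dict.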


2. Single (Sync) Transcription

Info

Single transcription is used for processing audio files synchronously. The API will return the transcription result directly.

Endpoint Details

  • Endpoint: https://languageserver.karya.in/v1/task/transcription
  • Method: POST

Example cURL Command

Here's an example of how to make a request using curl:

curl -X 'POST' 'https://languageserver.karya.in/v1/task/transcription' \
  -H 'accept: application/json' \
  -H 'X-API-Key: YOUR_API_KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'payload_data={"user_email":"your_email@example.com","project_name":"Individual","models":["long"],"locale":"hi-IN","provider":"GOOGLE"}' \
  -F 'files=@output.wav;type=audio/wav'

Note

Replace YOUR_API_KEY with your actual API key and your_email@example.com with your email.
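
For readers who want to see what the multipart body contains, here is a minimal standard-library sketch of the encoding that `curl -F` performs for this request. `encode_multipart` is a hypothetical helper; in practice an HTTP client library would normally build this body for you.

```python
import json
import uuid


def encode_multipart(payload, files, boundary=None):
    """Encode payload_data plus audio files as a multipart/form-data body.

    files is a list of (filename, content_bytes, mime_type) tuples.
    Returns (body_bytes, content_type_header_value).
    Filenames are assumed not to contain double quotes.
    """
    boundary = boundary or uuid.uuid4().hex
    chunks = []
    # payload_data is a plain form field whose value is a JSON string.
    chunks.append(
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="payload_data"\r\n\r\n'
            f"{json.dumps(payload)}\r\n"
        ).encode("utf-8")
    )
    # Each audio file is sent under the repeated form field name "files".
    for filename, content, mime_type in files:
        chunks.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
                f"Content-Type: {mime_type}\r\n\r\n"
            ).encode("utf-8")
        )
        chunks.append(content)
        chunks.append(b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"
```

The returned content type, including its boundary, would be sent as the Content-Type header alongside Accept and X-API-Key when posting the body to /v1/task/transcription.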

Example Response Body

On success, the API returns a JSON object containing the result of the submitted task:

{
  "task_id": "c64baaa1c99d42f589ee184369aa6843",
  "status": "COMPLETED",
  "result": {
    "transcriptions": [
      {
        "file": "audio_file_name.wav",
        "transcription": "The transcribed text here..."
      }
    ]
  }
}

Response Fields

| Field | Description |
| --- | --- |
| task_id | Unique identifier for the task |
| status | Task status (COMPLETED or FAILED) |
| result | Task output; structure varies by provider (see below) |
| result.transcriptions | Array of transcription objects, one per audio file |

Note

The result structure varies by provider. Google returns results keyed by model name: {"long": {"transcriptions": [...]}}. Azure, Sarvam, AWS, and KARYA_LOCAL (Indic-Conformer) return {"transcriptions": [...]} directly, with one entry per file (uri plus transcript text under transcript).
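
Because the result shape differs between Google and the other providers, client code may want to normalize it before use. The sketch below assumes only the two shapes described in the note above; `transcriptions_by_model` is a hypothetical helper and treats each transcription entry as opaque.

```python
def transcriptions_by_model(result):
    """Normalize a result object to {model_name: [transcription entries]}.

    Providers that return {"transcriptions": [...]} directly are reported
    under the key None, since no model name is present in that shape.
    """
    if "transcriptions" in result:
        return {None: result["transcriptions"]}
    # Google shape: one nested object per model name.
    return {
        model: inner["transcriptions"]
        for model, inner in result.items()
        if isinstance(inner, dict) and "transcriptions" in inner
    }
```

With this in place, calling code can iterate over `transcriptions_by_model(response["result"]).items()` regardless of which provider handled the task.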


Response & Error Codes

| Code | Description |
| --- | --- |
| 200 | Request successful |
| 400 | Bad request: invalid payload, unsupported locale or model, no valid files, too many files (Sarvam and KARYA_LOCAL: max 20 per request), language_identification: true with KARYA_LOCAL, or file storage URLs the service cannot use for Indic-Conformer |
| 403 | Permission denied (e.g., GCS credentials issue for Google) |
| 408 | Request timeout: provider took too long to respond (Sarvam, AWS) |
| 422 | Validation error: missing required fields |
| 429 | Rate limit exceeded (Google) |
| 500 | Internal server error |
| 502 | Provider API error (Azure, Google) |
| 504 | Transcription timed out waiting for provider result (Azure) |
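
One way a client might act on these codes is to separate errors worth retrying from errors that need a corrected request. The grouping below is an assumption for illustration: the guide does not say which codes are safe to retry, and `classify_status` is a hypothetical helper.

```python
# Assumption: timeouts, rate limiting, and provider-side failures are treated
# as transient. The guide itself does not define retry behaviour.
RETRYABLE_CODES = {408, 429, 502, 504}


def classify_status(status_code):
    """Map an HTTP status code to "ok", "retry", or "fail"."""
    if status_code == 200:
        return "ok"
    if status_code in RETRYABLE_CODES:
        return "retry"
    # 400, 403, 422, 500 and anything unlisted: do not retry blindly.
    return "fail"
```

A caller could, for example, back off and resubmit on "retry" while surfacing the response body to the user on "fail".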

Models and Locales

Provider Models

| Provider | Default Model | Available Models | Supported Locales |
| --- | --- | --- | --- |
| Azure | base | base | "en-US", "gu-IN", "bn-IN", "hi-IN", "kn-IN", "ml-IN", "mr-IN", "pa-IN", "ta-IN", "te-IN" |
| Google | long | long, short, telephony, telephony_short | "hi-IN", "en-IN", "mr-IN"... |
| Sarvam | saarika:v2.5 | saarika:v2.5 | "en-IN", "gu-IN", "bn-IN", "hi-IN", "kn-IN", "ml-IN", "mr-IN", "pa-IN", "ta-IN", "te-IN", "od-IN" |
| AWS | base | base | "en-IN", "gu-IN", "bn-IN", "hi-IN", "kn-IN", "ml-IN", "mr-IN", "pa-IN", "ta-IN", "te-IN" |
| KARYA_LOCAL (Indic-Conformer) | indic-conformer | indic-conformer only | See supported locales below |

Google Model and Locale Combinations

| Provider | Model | Supported Locales |
| --- | --- | --- |
| Google | long | "hi-IN", "en-IN", "en-US", "mr-IN", "te-IN", "ta-IN", "kn-IN", "ml-IN", "bn-BD" |
| Google | short | "hi-IN", "en-IN", "en-US", "mr-IN", "te-IN", "ta-IN", "kn-IN", "ml-IN", "bn-BD" |
| Google | telephony | "hi-IN", "en-IN", "en-US" |
| Google | telephony_short | "hi-IN", "en-IN", "en-US" |
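
Since an unsupported locale or model results in HTTP 400, it can help to check Google model and locale combinations before submitting. This sketch simply encodes the table above; `unsupported_google_models` is a hypothetical helper and the table may change on the server side.

```python
_GENERAL_LOCALES = {
    "hi-IN", "en-IN", "en-US", "mr-IN", "te-IN", "ta-IN", "kn-IN", "ml-IN", "bn-BD",
}
_TELEPHONY_LOCALES = {"hi-IN", "en-IN", "en-US"}

# Model-to-locale mapping copied from the table above.
GOOGLE_LOCALES = {
    "long": _GENERAL_LOCALES,
    "short": _GENERAL_LOCALES,
    "telephony": _TELEPHONY_LOCALES,
    "telephony_short": _TELEPHONY_LOCALES,
}


def unsupported_google_models(models, locale):
    """Return the requested models that are unknown or do not list the locale."""
    return [m for m in models if locale not in GOOGLE_LOCALES.get(m, set())]
```

For instance, requesting both long and telephony with locale mr-IN would flag telephony, because the telephony models list only hi-IN, en-IN, and en-US.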

Model Selection Guide

Google Models

| Model | Use Case | Description |
| --- | --- | --- |
| long | Long-form content | Use for any type of long-form content, such as media or spontaneous speech and conversations. Consider this model especially when other models are not available in your target language. |
| short | Short utterances | Use for short utterances that are a few seconds in length. Useful for capturing commands or other single-shot directed speech use cases. |
| telephony | Phone calls | Use for audio that originated from a phone call, typically recorded at an 8 kHz sampling rate. Ideal for customer service, teleconferencing, and automated kiosk applications. |
| telephony_short | Short phone calls | Dedicated version of the telephony model for short or even single-word utterances in audio that originated from a phone call, typically recorded at an 8 kHz sampling rate. Useful for utterances only a few seconds long in customer service, teleconferencing, and automated kiosk applications. |

Indic-Conformer (KARYA_LOCAL)

Indic-Conformer is an option for transcribing Indic and related languages when your organization routes traffic to that engine. In API requests it appears as provider KARYA_LOCAL with model indic-conformer.

Availability

This provider is only active on deployments where it has been turned on. If you use KARYA_LOCAL and see errors about the service being unavailable or misconfigured, contact your platform or DevOps team—setup is handled outside this API.

Indic-Conformer — what callers need to know

  1. Set provider to KARYA_LOCAL and choose a supported locale (for example hi-IN). If you omit locale, the server uses the default for this provider: hi-IN.
  2. Automatic language detection is not supported: keep language_identification false and set locale yourself. If you send language_identification: true, you get HTTP 400 with a clear error message.
  3. Up to 20 audio files per request (single or batch).
  4. wordlevel_timestamp and diarization are ignored for this provider (no word timings or speaker separation from Indic-Conformer through this API).

Choosing Indic-Conformer

Set provider to KARYA_LOCAL when you want this engine (for example for supported Indian languages). You can omit models; the default for this provider is indic-conformer.

Set locale to the spoken language of the audio. Use one of the supported locale codes below (for example hi-IN for Hindi). If you omit locale, the default is hi-IN.

Keep language_identification set to false. You must pick the language yourself via locale. Sending language_identification: true results in HTTP 400 with a message that automatic language detection is not available for KARYA_LOCAL yet.

Word-level timestamps and speaker diarization are not supported for Indic-Conformer—the wordlevel_timestamp and diarization fields are ignored for this provider.

Files and storage

  • Single or batch: at most 20 audio files per request (the same limit as Sarvam).
  • Single transcription (direct upload): send audio with multipart/form-data as for other providers. The server accepts the upload and temporarily stages it in cloud storage before calling Indic-Conformer, so you do not need to pre-store files in your own bucket when uploading this way.
  • Batch (remote URIs via datasets): upload files to a dataset as usual. The service reads them from Google Cloud Storage (gs:// links or standard GCS HTTPS URLs) or from Azure Blob using a SAS URL, so the audio must already be in storage the backend can access. Arbitrary HTTP links are not supported for this provider.
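
A rough client-side check for the storage rule above might look like the following. It is a heuristic sketch, not the server's actual validation: the host patterns (storage.googleapis.com and *.blob.core.windows.net) and the use of the `sig` query parameter to recognize a SAS URL are assumptions based on the usual URL formats of those services, and `looks_like_supported_storage` is a hypothetical name.

```python
from urllib.parse import parse_qs, urlparse


def looks_like_supported_storage(uri):
    """Heuristically check whether a URI points at GCS or an Azure Blob SAS URL."""
    parsed = urlparse(uri)
    if parsed.scheme == "gs":
        # gs://bucket/object
        return bool(parsed.netloc)
    if parsed.scheme != "https":
        return False
    host = parsed.hostname or ""
    # Assumed GCS HTTPS forms: path-style and bucket-subdomain style.
    if host == "storage.googleapis.com" or host.endswith(".storage.googleapis.com"):
        return True
    # Assumed Azure Blob form; a SAS URL carries its signature in "sig".
    if host.endswith(".blob.core.windows.net"):
        return "sig" in parse_qs(parsed.query)
    return False
```

A URI that fails this check is likely to be rejected with HTTP 400 for KARYA_LOCAL, although a URI that passes may still be refused by the service.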

Supported locales

Use these values in the locale field:

as-IN, bn-IN, br-IN, doi-IN, en-IN, gu-IN, hi-IN, kn-IN, kok-IN, ks-IN, mai-IN, ml-IN, mni-IN, mr-IN, ne-IN, or-IN, pa-IN, sa-IN, sat-IN, sd-IN, ta-IN, te-IN, ur-IN
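
The KARYA_LOCAL rules on locale, model, and language identification can be verified before a request is sent. The sketch below encodes the locale list above; `check_karya_local` is a hypothetical helper, and the server's own validation remains authoritative.

```python
# Locale codes copied from the supported locales list above.
KARYA_LOCAL_LOCALES = {
    "as-IN", "bn-IN", "br-IN", "doi-IN", "en-IN", "gu-IN", "hi-IN", "kn-IN",
    "kok-IN", "ks-IN", "mai-IN", "ml-IN", "mni-IN", "mr-IN", "ne-IN", "or-IN",
    "pa-IN", "sa-IN", "sat-IN", "sd-IN", "ta-IN", "te-IN", "ur-IN",
}


def check_karya_local(payload):
    """Return a list of problems with a KARYA_LOCAL payload (empty if none)."""
    problems = []
    if payload.get("language_identification"):
        problems.append("language_identification must be false for KARYA_LOCAL")
    # The documented default locale for this provider is hi-IN.
    locale = payload.get("locale", "hi-IN")
    if locale not in KARYA_LOCAL_LOCALES:
        problems.append(f"locale {locale!r} is not supported by KARYA_LOCAL")
    models = payload.get("models")
    if models and list(models) != ["indic-conformer"]:
        problems.append("the only available model is indic-conformer")
    return problems
```

An empty list means the payload satisfies the documented constraints; any entries describe conditions the guide says lead to HTTP 400.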

Example payload (single transcription)

{
  "user_email": "your_email@example.com",
  "project_name": "Individual",
  "provider": "KARYA_LOCAL",
  "locale": "hi-IN",
  "language_identification": false
}

For batch transcription, use the same fields in your JSON body plus dataset_ids, as for other providers.