Transcription Service Guide

This guide explains how to use the Language Server Transcription API for both batch (asynchronous) and single (synchronous) audio transcription tasks. It covers endpoints, request formats, and all supported payload parameters.

Request Headers

Ensure your requests include the following headers:

Accept: application/json
X-API-Key: YOUR_API_KEY   # (Replace YOUR_API_KEY with your actual API key)
Content-Type: multipart/form-data   # (Single transcription; use application/json for batch transcription)

Request Format

Batch transcription (/v1/task/batch/transcription): Send as Content-Type: application/json with a JSON body.
Single transcription (/v1/task/transcription): Send as Content-Type: multipart/form-data with payload_data (JSON string) and files (audio files) form fields.
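
The difference between the two request formats can be sketched as a small helper that picks the endpoint and headers for each mode. This is only an illustration: the function name `endpoint_and_headers` is hypothetical, and it assumes your HTTP client generates the multipart `Content-Type` (including its boundary) on its own, which is why the header is left out for single transcription.

```python
from __future__ import annotations

BASE_URL = "https://languageserver.karya.in"


def endpoint_and_headers(api_key: str, batch: bool) -> tuple[str, dict[str, str]]:
    """Return the URL and headers for a batch or single transcription request."""
    headers = {"Accept": "application/json", "X-API-Key": api_key}
    if batch:
        # Batch requests carry a plain JSON body.
        headers["Content-Type"] = "application/json"
        return f"{BASE_URL}/v1/task/batch/transcription", headers
    # Single requests are multipart/form-data; the client library is assumed
    # to add the Content-Type header together with the boundary parameter.
    return f"{BASE_URL}/v1/task/transcription", headers
```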

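Request Body: Form Data Fields

For single transcription, the request body is sent as multipart/form-data and must contain the fields below. For batch transcription, send the same payload_data content directly as the JSON request body.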

payload_data (JSON string)

This field contains essential information about your transcription task.

Example payload_data:

{
  "user_email": "your_email@example.com",
  "project_name": "Individual",
  "dataset_ids": ["id1", "id2"],
  "language_identification": false,
  "locale": "hi-IN",
  "provider": "AZURE",
  "models": ["long"],
  "wordlevel_timestamp": false,
  "diarization": false
}

Mandatory Fields

Field	Description
`user_email`	Your user email address
`project_name`	Name of your project
`dataset_ids`	List of dataset ids (only for batch transcription)

Optional Fields

Field	Type	Description	Default
`models`	array	List of models specifying which model to use	Per provider: Google `long`, Azure `base`, Sarvam `saarika:v2.5`, AWS `base`, KARYA_LOCAL `indic-conformer`
`locale`	string	Language in which you want the transcription (BCP-47, e.g. `hi-IN`)	If omitted, set automatically by provider: Azure `en-US`, Google/Sarvam/AWS `en-IN`, KARYA_LOCAL `hi-IN`
`provider`	string	`AZURE`, `GOOGLE`, `SARVAM`, `AWS`, or `KARYA_LOCAL` (Indic-Conformer)	`AZURE`
`language_identification`	boolean	Automatic language detection. Supported for Azure only. Ignored by Google, Sarvam, and AWS. For `KARYA_LOCAL`, use `false` and set `locale` yourself. If you send `language_identification: true` with `KARYA_LOCAL`, the API returns HTTP 400 with an error message explaining that automatic detection is not supported for this provider yet.	`false`
`wordlevel_timestamp`	boolean	Include timing information for each word. Ignored for `KARYA_LOCAL` (Indic-Conformer).	`false`
`diarization`	boolean	Enable speaker identification. Ignored for `KARYA_LOCAL` (Indic-Conformer).	`false`
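
To see how the documented defaults combine, the sketch below fills in omitted optional fields on the client side. The server applies these defaults itself, so this is purely illustrative; the function name `resolve_payload` and the two lookup tables are hypothetical and simply restate the table above.

```python
# Defaults restated from the Optional Fields table (illustrative only).
DEFAULT_MODEL = {
    "GOOGLE": "long",
    "AZURE": "base",
    "SARVAM": "saarika:v2.5",
    "AWS": "base",
    "KARYA_LOCAL": "indic-conformer",
}
DEFAULT_LOCALE = {
    "AZURE": "en-US",
    "GOOGLE": "en-IN",
    "SARVAM": "en-IN",
    "AWS": "en-IN",
    "KARYA_LOCAL": "hi-IN",
}


def resolve_payload(payload: dict) -> dict:
    """Return a copy of payload with omitted optional fields set to their defaults."""
    resolved = dict(payload)
    provider = resolved.setdefault("provider", "AZURE")
    resolved.setdefault("models", [DEFAULT_MODEL[provider]])
    resolved.setdefault("locale", DEFAULT_LOCALE[provider])
    for flag in ("language_identification", "wordlevel_timestamp", "diarization"):
        resolved.setdefault(flag, False)
    return resolved
```

For example, a payload containing only `user_email` and `project_name` resolves to provider `AZURE`, model `base`, and locale `en-US`.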

For Single Transcription

files (File upload): This field is used to upload one or more audio files for transcription.
File Format: All files must be audio files. Supported extensions: .wav, .mp3, .flac, .webm, .m4a, .aac, .speex. The server validates by MIME type, so any format recognized as audio/* will also be accepted.

Note

For single transcription, users can directly upload one or more audio files with their request, as long as the total size of all files is less than 25 MB. For batch transcription, users must first upload their audio files as a dataset using the provided script, and then include the corresponding dataset_id(s) in the payload when making the API request.
For Sarvam ASR, users can upload 1–20 audio files per request. Requests with more than 20 files fail with HTTP 400 (too many files).
For Indic-Conformer (provider: "KARYA_LOCAL"), you can send up to 20 files per request. When audio is referenced by remote URIs (as with batch datasets backed by Google Cloud Storage or Azure Blob), the service reads it from that cloud storage, so files must already live there. For single transcription, you can upload files directly with the request; the server accepts them and temporarily stages them for Indic-Conformer (see Indic-Conformer below).
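
The upload limits above can be checked before sending a single transcription request. The following is a minimal sketch, not part of the API: `check_upload` is a hypothetical helper, and it assumes that "25 MB" means 25 × 1024 × 1024 bytes. Because the server validates by MIME type, an unlisted extension is reported only as a possible problem.

```python
from __future__ import annotations

import os

MAX_TOTAL_BYTES = 25 * 1024 * 1024  # assumption: 25 MB interpreted as 25 MiB
MAX_FILES = {"SARVAM": 20, "KARYA_LOCAL": 20}
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".webm", ".m4a", ".aac", ".speex"}


def check_upload(files: dict[str, int], provider: str) -> list[str]:
    """Check a {file name: size in bytes} mapping against the documented limits."""
    problems = []
    if not files:
        problems.append("no files supplied")
    limit = MAX_FILES.get(provider)
    if limit is not None and len(files) > limit:
        problems.append(f"{len(files)} files exceeds the {provider} limit of {limit}")
    # The guide requires the total size to be strictly less than 25 MB.
    if sum(files.values()) >= MAX_TOTAL_BYTES:
        problems.append("total size must be less than 25 MB")
    for name in files:
        extension = os.path.splitext(name)[1].lower()
        if extension not in AUDIO_EXTENSIONS:
            problems.append(f"{name}: unlisted extension (accepted only if the MIME type is audio/*)")
    return problems
```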

1. Batch (Async) Transcription

Info

Batch transcription is used for processing multiple audio files or large files asynchronously. The API will return a job ID, and you can poll for results.

Endpoint Details

Endpoint: https://languageserver.karya.in/v1/task/batch/transcription
Method: POST

Example cURL Command

Here's an example of how to make a request using curl:

curl -X 'POST' 'https://languageserver.karya.in/v1/task/batch/transcription' \
  -H 'accept: application/json' \
  -H 'X-API-Key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"user_email":"your_email@example.com","project_name":"Individual","dataset_ids":["d1","d2"],"models":["long"],"locale":"hi-IN","provider":"GOOGLE"}'

Note

Replace YOUR_API_KEY with your actual API key and your_email@example.com with your email.
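
The same batch request can be built with Python's standard library. This is a sketch mirroring the curl command above rather than an official client; the function names `build_batch_request` and `submit_batch` are hypothetical.

```python
import json
import urllib.request

BATCH_URL = "https://languageserver.karya.in/v1/task/batch/transcription"


def build_batch_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for a batch transcription task."""
    return urllib.request.Request(
        BATCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "X-API-Key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit_batch(api_key: str, payload: dict) -> dict:
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(build_batch_request(api_key, payload)) as response:
        return json.load(response)
```

Calling `submit_batch("YOUR_API_KEY", {...})` with the payload from the curl example performs the actual network call.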

Example Response Body

Upon successful submission, the API will return a JSON object providing details about the submitted task:

{
  "task_id": "c64baaa1c99d42f589ee184369aa6843",
  "dataset_ids": ["4dd4e846bc014a988cbc49bccd217e96"],
  "status": "PENDING"
}

Response Fields

Field	Description
`task_id`	A unique identifier assigned to your submitted task
`dataset_ids`	IDs of the datasets submitted with the request
`status`	The current status of the task (e.g., "PENDING", "COMPLETED", "FAILED")

Note

Once submitted, poll GET /v1/status/{task_id} until the status is COMPLETED or FAILED. The completed response will contain the transcription results in the responses field.
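
The polling step can be sketched as a loop that stops on a terminal status. The helper below is an assumption-laden illustration: `poll_until_done` is a hypothetical name, the five-second interval and attempt limit are arbitrary choices, and `fetch_status` stands in for whatever function you write to call GET /v1/status/{task_id} and decode its JSON.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"COMPLETED", "FAILED"}


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval_s: float = 5.0,   # arbitrary default, not specified by the API
    max_attempts: int = 120,   # arbitrary default, not specified by the API
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status repeatedly until the task is COMPLETED or FAILED."""
    for _ in range(max_attempts):
        task = fetch_status()
        if task.get("status") in TERMINAL_STATUSES:
            return task
        sleep(interval_s)
    raise TimeoutError(f"task still not finished after {max_attempts} polls")
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.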

2. Single (Sync) Transcription

Info

Single transcription is used for processing audio files synchronously. The API will return the transcription result directly.

Endpoint Details

Endpoint: https://languageserver.karya.in/v1/task/transcription
Method: POST

Example cURL Command

Here's an example of how to make a request using curl:

curl -X 'POST' 'https://languageserver.karya.in/v1/task/transcription' \
  -H 'accept: application/json' \
  -H 'X-API-Key: YOUR_API_KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'payload_data={"user_email":"your_email@example.com","project_name":"Individual","models":["long"],"locale":"hi-IN","provider":"GOOGLE"}' \
  -F 'files=@output.wav;type=audio/wav'

Note

Replace YOUR_API_KEY with your actual API key and your_email@example.com with your email.
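
For clients without a multipart helper, the form body that curl builds with `-F` can be assembled by hand. The sketch below uses only the standard library; `encode_multipart` is a hypothetical name, and the per-file content type is guessed from the file name, which is an assumption about what the server will accept.

```python
from __future__ import annotations

import json
import mimetypes
import uuid


def encode_multipart(payload: dict, files: dict[str, bytes]) -> tuple[bytes, str]:
    """Encode payload_data and audio files; return (body, Content-Type header value)."""
    boundary = uuid.uuid4().hex
    parts = [
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="payload_data"\r\n\r\n'
            f"{json.dumps(payload)}\r\n"
        ).encode("utf-8")
    ]
    for name, data in files.items():
        # Guess the MIME type from the extension; fall back to a generic type.
        content_type = mimetypes.guess_type(name)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="files"; filename="{name}"\r\n'
            f"Content-Type: {content_type}\r\n\r\n"
        ).encode("utf-8")
        parts.append(header + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned body and header value can then be passed to any HTTP client together with the `Accept` and `X-API-Key` headers.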

Example Response Body

Upon successful completion, the API returns a JSON object containing the result of the task:

{
  "task_id": "c64baaa1c99d42f589ee184369aa6843",
  "status": "COMPLETED",
  "result": {
    "transcriptions": [
      {
        "file": "audio_file_name.wav",
        "transcription": "The transcribed text here..."
      }
    ]
  }
}

Response Fields

Field	Description
`task_id`	Unique identifier for the task
`status`	Task status (`COMPLETED` or `FAILED`)
`result`	Task output — structure varies by provider (see below)
`result.transcriptions`	Array of transcription objects, one per audio file

Note

The result structure varies by provider. Google returns results keyed by model name: {"long": {"transcriptions": [...]}}. Azure, Sarvam, AWS, and KARYA_LOCAL (Indic-Conformer) return {"transcriptions": [...]} directly, with one entry per file (uri plus transcript text under transcript).
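
Because the result shape depends on the provider, client code may want to flatten both shapes into one list. A minimal sketch, assuming only the two shapes described in the note above; `extract_transcriptions` is a hypothetical helper, and the added `model` key is an illustration rather than something the API returns.

```python
from __future__ import annotations


def extract_transcriptions(result: dict) -> list[dict]:
    """Flatten a provider-specific result into a single list of entries."""
    # Azure, Sarvam, AWS, KARYA_LOCAL: {"transcriptions": [...]}
    if "transcriptions" in result:
        return list(result["transcriptions"])
    # Google: {"<model name>": {"transcriptions": [...]}, ...}
    entries = []
    for model_name, model_result in result.items():
        if isinstance(model_result, dict):
            for entry in model_result.get("transcriptions", []):
                # Tag each entry with the model it came from (client-side addition).
                entries.append({**entry, "model": model_name})
    return entries
```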

Response & Error Codes

Code	Description
200	Request successful
400	Bad request — invalid payload, unsupported locale or model, no valid files, too many files (Sarvam and KARYA_LOCAL: max 20 per request), `language_identification: true` with `KARYA_LOCAL`, or file storage URLs the service cannot use for Indic-Conformer
403	Permission denied (e.g., GCS credentials issue for Google)
408	Request timeout — provider took too long to respond (Sarvam, AWS)
422	Validation error — missing required fields
429	Rate limit exceeded (Google)
500	Internal server error
502	Provider API error (Azure, Google)
504	Transcription timed out waiting for provider result (Azure)
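
One way a client might act on these codes is to separate errors worth retrying from errors that need a changed request. The grouping below is an assumption, not something the API specifies: it treats timeouts, rate limiting, and provider errors (408, 429, 502, 504) as retryable. `describe_status` is a hypothetical helper.

```python
from __future__ import annotations

STATUS_MEANINGS = {
    200: "Request successful",
    400: "Bad request",
    403: "Permission denied",
    408: "Request timeout",
    422: "Validation error",
    429: "Rate limit exceeded",
    500: "Internal server error",
    502: "Provider API error",
    504: "Transcription timed out waiting for provider result",
}

# Assumption: transient failures that may succeed if the request is repeated later.
RETRYABLE = {408, 429, 502, 504}


def describe_status(code: int) -> tuple[str, bool]:
    """Return (meaning, should_retry) for an HTTP status code from this API."""
    meaning = STATUS_MEANINGS.get(code, "Undocumented status code")
    return meaning, code in RETRYABLE
```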

Models and Locales

Provider Models

Provider	Default Model	Available Models	Supported Locales
Azure	`base`	base	"en-US", "gu-IN", "bn-IN", "hi-IN", "kn-IN", "ml-IN", "mr-IN", "pa-IN", "ta-IN", "te-IN"
Google	`long`	long, short, telephony, telephony_short	"hi-IN", "en-IN", "mr-IN"...
Sarvam	`saarika:v2.5`	`saarika:v2.5`	"en-IN", "gu-IN", "bn-IN", "hi-IN", "kn-IN", "ml-IN", "mr-IN", "pa-IN", "ta-IN", "te-IN", "od-IN"
AWS	`base`	base	"en-IN", "gu-IN", "bn-IN", "hi-IN", "kn-IN", "ml-IN", "mr-IN", "pa-IN", "ta-IN", "te-IN"
KARYA_LOCAL (Indic-Conformer)	`indic-conformer`	`indic-conformer` only	See supported locales below

Google Model and Locale Combinations

Provider	Model	Supported Locales
Google	`long`	"hi-IN", "en-IN", "en-US", "mr-IN", "te-IN", "ta-IN", "kn-IN", "ml-IN", "bn-BD"
Google	`short`	"hi-IN", "en-IN", "en-US", "mr-IN", "te-IN", "ta-IN", "kn-IN", "ml-IN", "bn-BD"
Google	`telephony`	"hi-IN", "en-IN", "en-US"
Google	`telephony_short`	"hi-IN", "en-IN", "en-US"
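
The Google combinations above can be checked client-side before submitting a request, which avoids an HTTP 400 for an unsupported locale or model. This is a sketch that restates the table; `is_supported_google_combo` is a hypothetical helper and the server remains the source of truth.

```python
# Locales per Google model, restated from the table above.
_GENERAL = {"hi-IN", "en-IN", "en-US", "mr-IN", "te-IN", "ta-IN", "kn-IN", "ml-IN", "bn-BD"}
_TELEPHONY = {"hi-IN", "en-IN", "en-US"}

GOOGLE_LOCALES = {
    "long": _GENERAL,
    "short": _GENERAL,
    "telephony": _TELEPHONY,
    "telephony_short": _TELEPHONY,
}


def is_supported_google_combo(model: str, locale: str) -> bool:
    """Return True if the Google model/locale pair appears in the table."""
    return locale in GOOGLE_LOCALES.get(model, set())
```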

Model Selection Guide

Google Models

Model	Use Case	Description
`long`	Long-form content	Use for any type of long-form content, such as media or spontaneous speech and conversations. Consider this model especially when other models are not available in your target language
`short`	Short utterances	Use for short utterances that are a few seconds in length. It is useful for capturing commands or other single-shot directed speech use cases
`telephony`	Phone calls	Use for audio that originated from an audio phone call, typically recorded at an 8 kHz sampling rate. Ideal for customer service, teleconferencing, and automated kiosk applications
`telephony_short`	Short phone calls	Dedicated version of the telephony model for short or even single-word utterances for audio that originated from a phone call, typically recorded at an 8 kHz sampling rate. Useful for utterances only a few seconds long in customer service, teleconferencing, and automated kiosk applications

Indic-Conformer (`KARYA_LOCAL`)

Indic-Conformer is an option for transcribing Indic and related languages when your organization routes traffic to that engine. In API requests it appears as provider KARYA_LOCAL with model indic-conformer.

Availability

This provider is only active on deployments where it has been turned on. If you use KARYA_LOCAL and see errors about the service being unavailable or misconfigured, contact your platform or DevOps team—setup is handled outside this API.

Indic-Conformer — what callers need to know

Set provider to KARYA_LOCAL and choose a supported locale (for example hi-IN). If you omit locale, the server uses the default for this provider: hi-IN.
Automatic language detection is not supported: keep language_identification false and set locale yourself. If you send language_identification: true, you get HTTP 400 with a clear error message.
Up to 20 audio files per request (single or batch).
wordlevel_timestamp and diarization are ignored for this provider (no word timings or speaker separation from Indic-Conformer through this API).

Choosing Indic-Conformer

Set provider to KARYA_LOCAL when you want this engine (for example for supported Indian languages). You can omit models; the default for this provider is indic-conformer.

Set locale to the spoken language of the audio. Use one of the supported locale codes below (for example hi-IN for Hindi). If you omit locale, the default is hi-IN.

Keep language_identification set to false. You must pick the language yourself via locale. Sending language_identification: true results in HTTP 400 with a message that automatic language detection is not available for KARYA_LOCAL yet.

Word-level timestamps and speaker diarization are not supported for Indic-Conformer—the wordlevel_timestamp and diarization fields are ignored for this provider.

Files and storage

Single or batch: at most 20 audio files per request (same idea as Sarvam’s limit).
Single transcription (direct upload): send audio with multipart/form-data as for other providers. The server accepts the upload and temporarily stages it in cloud storage before calling Indic-Conformer—you do not need to pre-store files in your own bucket when uploading this way.
Batch (remote URIs via datasets): upload files to a dataset as usual. The service reads them from Google Cloud Storage (gs:// links or standard GCS HTTPS URLs) or from Azure Blob using a SAS URL, so audio must already be in storage the backend can access. Arbitrary HTTP links are not supported for this provider.

Supported locales

Use these values in the locale field:

as-IN, bn-IN, br-IN, doi-IN, en-IN, gu-IN, hi-IN, kn-IN, kok-IN, ks-IN, mai-IN, ml-IN, mni-IN, mr-IN, ne-IN, or-IN, pa-IN, sa-IN, sat-IN, sd-IN, ta-IN, te-IN, ur-IN

Example payload (single transcription)

{
  "user_email": "your_email@example.com",
  "project_name": "Individual",
  "provider": "KARYA_LOCAL",
  "locale": "hi-IN",
  "language_identification": false
}

For batch transcription, use the same fields in your JSON body plus dataset_ids, as for other providers.
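
The KARYA_LOCAL rules described in this guide (supported locale, no automatic language detection, `indic-conformer` as the only model) can be checked before a request is sent. The helper below is a hypothetical client-side sketch, not part of the API; the server performs its own validation and returns HTTP 400 where applicable.

```python
from __future__ import annotations

# Locale codes copied from the supported locales list in this guide.
KARYA_LOCAL_LOCALES = frozenset(
    "as-IN, bn-IN, br-IN, doi-IN, en-IN, gu-IN, hi-IN, kn-IN, kok-IN, ks-IN, "
    "mai-IN, ml-IN, mni-IN, mr-IN, ne-IN, or-IN, pa-IN, sa-IN, sat-IN, sd-IN, "
    "ta-IN, te-IN, ur-IN".split(", ")
)


def karya_local_problems(payload: dict) -> list[str]:
    """Return the reasons a KARYA_LOCAL payload would likely be rejected."""
    problems = []
    locale = payload.get("locale", "hi-IN")  # hi-IN is the documented default
    if locale not in KARYA_LOCAL_LOCALES:
        problems.append(f"unsupported locale: {locale}")
    if payload.get("language_identification", False):
        problems.append("language_identification must be false for KARYA_LOCAL")
    if payload.get("models", ["indic-conformer"]) != ["indic-conformer"]:
        problems.append("only the indic-conformer model is available")
    return problems
```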
