Language Server Completion API Guide

This document outlines how to interact with the Language Server Completion API, covering both asynchronous batch processing and synchronous single (chat) completion methods.

Batch Completion (Asynchronous)

Info

This method is designed for processing a large number of requests from a .jsonl file.

The process involves four steps:
1. Dataset Preparation - Prepare your .jsonl file
2. Dataset Upload - Upload the file to the server
3. Send Batch Request - Submit the batch completion request
4. Check Status - Poll the task status and retrieve the results

Step 1: Dataset Preparation

Prepare your .jsonl file where each line contains a single completion request in JSON format.

Example .jsonl file content:

{"task_id": "task_1", "system_prompt": "You are a Helpful Assistant", "messages": [{"role": "user", "content": "Tell me about Karya Inc?"}], "max_tokens": 1000}
{"task_id": "task_2", "system_prompt": "You are a coding assistant", "messages": [{"role": "user", "content": "Write a Python function to calculate factorial"}], "max_tokens": 1000}
{"task_id": "task_3", "system_prompt": "You are a helpful assistant", "messages": [{"role": "user", "content": "What is machine learning?"}], "max_tokens": 1000}
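If you generate requests programmatically, a minimal Python sketch like the following writes one JSON object per line. The file name batch_requests.jsonl and the sample records are placeholders, not values the server requires.

```python
import json

# Placeholder records; field names follow the Required Fields table.
records = [
    {
        "task_id": "task_1",
        "system_prompt": "You are a helpful assistant",
        "messages": [{"role": "user", "content": "What is machine learning?"}],
        "max_tokens": 1000,
    },
    {
        "task_id": "task_2",
        "system_prompt": "You are a coding assistant",
        "messages": [
            {"role": "user", "content": "Write a Python function to calculate factorial"}
        ],
        "max_tokens": 1000,
    },
]


def write_jsonl(rows, path):
    """Write each record as one JSON object on its own line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            # ensure_ascii=False keeps non-English prompts readable in the file
            f.write(json.dumps(row, ensure_ascii=False) + "\n")


write_jsonl(records, "batch_requests.jsonl")
```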

Required Fields

Field	Description
`task_id`	Unique identifier for the request
`messages`	Array of message objects with `role` and `content`
`system_prompt`	System instruction for the model
`max_tokens`	Maximum tokens for the response

Message Roles in Batch JSONL

In the batch JSONL format, the messages array supports only the "user" and "assistant" roles. Do not put system instructions in messages; use the dedicated system_prompt field instead.

Warning

max_tokens >= 1000

Some models are reasoning models that spend "thinking tokens" on internal reasoning before producing a response. Thinking tokens count against max_tokens, so a low limit can use up the budget before the visible response is complete. We recommend setting max_tokens to at least 1000 to leave room for both reasoning and the response.

Optional Fields

Field	Type	Description
`temperature`	float	Controls randomness (0.0 to 1.0)
`top_p`	float	Controls nucleus sampling (0.0 to 1.0)
`top_k`	integer	Controls top-k sampling (integer ≥ 0)
`stop_sequences`	array	List of strings to stop generation
`response_format`	object	Format specification for the response. For Gemini models, supports `{"type": "json_object"}` for JSON output and `{"type": "json_object", "schema": {...}}` to pass a JSON Schema for structured output.
`n`	integer	Number of responses to generate (integer > 0)
`presence_penalty`	float	Penalizes tokens that have already appeared, encouraging new content
`frequency_penalty`	float	Penalizes tokens in proportion to how often they have already appeared
`seed`	integer	Random seed for reproducible results
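Before uploading, you can catch obvious mistakes locally. The sketch below checks one record against the rules stated in the tables above: required fields, only user and assistant roles, and ranges for the sampling fields. It also flags max_tokens values below the recommended 1000. The function name is hypothetical, and the server's own validation remains authoritative.

```python
REQUIRED_FIELDS = ("task_id", "messages", "system_prompt", "max_tokens")
ALLOWED_ROLES = {"user", "assistant"}  # batch JSONL: no "system" role in messages


def validate_batch_record(record):
    """Return a list of problems found in one batch JSONL record (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in record:
            problems.append(f"missing required field: {field}")
    for i, msg in enumerate(record.get("messages", [])):
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"messages[{i}] has unsupported role: {msg.get('role')!r}")
    max_tokens = record.get("max_tokens")
    if isinstance(max_tokens, int) and max_tokens < 1000:
        # A recommendation from this guide, not necessarily a server-side rule
        problems.append("max_tokens below the recommended minimum of 1000")
    for name in ("temperature", "top_p"):
        value = record.get(name)
        if value is not None and not (isinstance(value, (int, float)) and 0.0 <= value <= 1.0):
            problems.append(f"{name} must be between 0.0 and 1.0")
    top_k = record.get("top_k")
    if top_k is not None and (not isinstance(top_k, int) or top_k < 0):
        problems.append("top_k must be an integer >= 0")
    n = record.get("n")
    if n is not None and (not isinstance(n, int) or n <= 0):
        problems.append("n must be an integer > 0")
    return problems
```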

Step 2: Upload Dataset

Upload your prepared .jsonl file to the Language Server. For detailed instructions on dataset upload, see the Dataset upload guide.

Step 3: Send Batch Completion Request

Endpoint Details

Endpoint: https://languageserver.karya.in/v1/task/batch/completion
Method: POST

Request Headers

Accept: application/json
X-API-Key: YOUR_API_KEY
Content-Type: application/json

Request Body

The request body must be JSON. The fields user_email, project_name, and dataset_ids are required; task_name and models are optional. Omit models or pass ["default"] to use the default model (gemini-3-flash-preview for batch completion).

Example request body:

{
  "user_email": "user@example.com",
  "dataset_ids": ["dataset_id_123"],
  "project_name": "Example-Project",
  "task_name": "batch_completion_task",
  "models": ["gemini-3-flash-preview", "gemini-2.5-flash", "claude-sonnet-4-6"]
}

Example cURL Command (Batch Completion):

curl -X 'POST' \
  'https://languageserver.karya.in/v1/task/batch/completion' \
  -H 'accept: application/json' \
  -H 'X-API-Key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "user_email": "user@example.com",
    "project_name": "Example-Project",
    "task_name": "batch_completion_task",
    "dataset_ids": ["dataset_id_123"],
    "models": ["gemini-3-flash-preview", "gemini-2.5-flash", "claude-sonnet-4-6"]
  }'
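The same request can be sent from Python using only the standard library. This is a sketch that assumes the endpoint and headers shown above. The names build_batch_request and submit are illustrative; they are not part of any official client.

```python
import json
import urllib.request

BATCH_URL = "https://languageserver.karya.in/v1/task/batch/completion"


def build_batch_request(api_key, user_email, project_name, dataset_ids,
                        task_name=None, models=None):
    """Build (but do not send) a POST request for the batch completion endpoint."""
    body = {
        "user_email": user_email,
        "project_name": project_name,
        "dataset_ids": dataset_ids,
    }
    if task_name is not None:
        body["task_name"] = task_name
    if models is not None:  # omit to fall back to the default model
        body["models"] = models
    return urllib.request.Request(
        BATCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "X-API-Key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(request):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling submit(build_batch_request(...)) returns the submission response. Use its task_id in Step 4.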

Example Response (Submission)

Upon successful submission, the API returns identifiers for the asynchronous task.

{
  "task_id": "task_id_456",
  "dataset_ids": ["dataset_id_789"],
  "status": "PENDING"
}

Step 4: Checking Task Status and Retrieving Results

Poll this endpoint with the task_id returned in the submission response. This is not the same as the per-request task_id values in your .jsonl file.

Endpoint Details

Endpoint: https://languageserver.karya.in/v1/status/{task_id}
Method: GET

Example cURL Command (Check Status):

curl -X 'GET' \
  'https://languageserver.karya.in/v1/status/task_id_456' \
  -H 'accept: application/json' \
  -H 'X-API-Key: YOUR_API_KEY'

Task Status Values

Status	Description
`PENDING`	Task is queued
`COMPLETED`	All items processed successfully
`PARTIAL_COMPLETE`	Some items succeeded, some failed
`FAILED`	Task failed entirely
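Because batch tasks run asynchronously, clients typically poll until the task reaches a terminal status. The sketch below treats COMPLETED, PARTIAL_COMPLETE, and FAILED as terminal and keeps waiting on anything else, such as PENDING. The 30-second interval and one-hour timeout are arbitrary assumptions, not server requirements. The fetch parameter is injectable so the loop can be tested without network access.

```python
import json
import time
import urllib.request

STATUS_URL = "https://languageserver.karya.in/v1/status/{task_id}"
TERMINAL_STATUSES = {"COMPLETED", "PARTIAL_COMPLETE", "FAILED"}


def fetch_status(task_id, api_key):
    """GET the status endpoint and return the parsed JSON body."""
    req = urllib.request.Request(
        STATUS_URL.format(task_id=task_id),
        headers={"Accept": "application/json", "X-API-Key": api_key},
        method="GET",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_for_task(task_id, api_key, fetch=fetch_status, interval_s=30.0, timeout_s=3600.0):
    """Poll until the task reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        body = fetch(task_id, api_key)
        if body.get("status") in TERMINAL_STATUSES:
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {body.get('status')!r} after {timeout_s}s")
        time.sleep(interval_s)
```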

Example Response (Completed)

When the task finishes, the response identifies the dataset that holds the results (output_dataset_id) and reports each model's status in model_statuses. The failure_summary field gives per-model counts of successful and failed files.

{
  "status": "COMPLETED",
  "responses": [
    {
      "id": "response_id_123",
      "task_id": "task_id_456",
      "response": {
        "model_statuses": {
          "gemini-2.5-flash": "COMPLETED",
          "gemini-3-flash-preview": "COMPLETED"
        },
        "input_dataset_ids": ["dataset_id_789"],
        "output_dataset_id": "output_dataset_id_101",
        "failure_summary": {
          "gemini-2.5-flash": {"successful_files": 10, "failed_files": 0},
          "gemini-3-flash-preview": {"successful_files": 10, "failed_files": 0}
        }
      },
      "created_at": "2025-01-01T12:00:00.000000",
      "updated_at": "2025-01-01T12:05:00.000000"
    }
  ]
}
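To see which models had failures, you can combine model_statuses and failure_summary into one entry per model. This is a sketch that assumes the response shape shown above. The function names are hypothetical.

```python
def summarize_batch_result(status_body):
    """Return {model: (status, successful_files, failed_files)} from a status response."""
    summary = {}
    for entry in status_body.get("responses", []):
        response = entry.get("response", {})
        failures = response.get("failure_summary", {})
        for model, model_status in response.get("model_statuses", {}).items():
            counts = failures.get(model, {})
            summary[model] = (
                model_status,
                counts.get("successful_files", 0),
                counts.get("failed_files", 0),
            )
    return summary


def output_dataset_ids(status_body):
    """Collect every output_dataset_id present in the status response."""
    return [
        entry["response"]["output_dataset_id"]
        for entry in status_body.get("responses", [])
        if "output_dataset_id" in entry.get("response", {})
    ]
```

For example, [m for m, (_, _, failed) in summarize_batch_result(body).items() if failed] lists the models that had at least one failed file.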

Single Completion (Synchronous Chat)

Info

This method is for single, synchronous chat completion requests and suits real-time and interactive applications.

Endpoint Details

Endpoint: https://languageserver.karya.in/v1/task/completion
Method: POST

Request Body

The request must be sent as multipart/form-data with a payload field containing a JSON string. The payload includes user_email, project_name, models (one or more model IDs), and messages. It may also include task_name, max_tokens, and any other parameter from the Model Parameters Reference. For requests that include documents, images, or audio, also send a files field with one or more file parts.

Payload field	Required	Description
`user_email`	✅	User email for the request
`project_name`	✅	Project name
`messages`	✅	Array of message objects with `role` and `content`
`task_name`	❌	Optional task identifier
`models`	✅	List of one or more supported model IDs. Omitting this field or passing `["default"]` returns a 400 error.
`max_tokens`, `temperature`, `reasoning_effort`, etc.	❌	See Model Parameters Reference below

Always specify models explicitly

The default model for single completion is gemini-3-flash-preview, but you must name it explicitly — e.g. "models": ["gemini-3-flash-preview"]. Omitting models or passing ["default"] will return a 400 error.
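Your code might sometimes pass the "default" alias, for example when it shares settings with batch requests. In that case, a small hypothetical helper can rewrite the alias to the explicit model name before sending. It assumes gemini-3-flash-preview remains the documented default.

```python
DEFAULT_SINGLE_MODEL = "gemini-3-flash-preview"  # documented default; update if it changes


def resolve_single_models(models=None):
    """Return an explicit model list that single completion will accept."""
    if not models:
        return [DEFAULT_SINGLE_MODEL]
    # Replace the "default" alias with the model it stands for
    resolved = [DEFAULT_SINGLE_MODEL if m == "default" else m for m in models]
    # Drop duplicates while keeping the original order
    return list(dict.fromkeys(resolved))
```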

Example cURL (text-only, multiple models):

curl -X 'POST' \
  'https://languageserver.karya.in/v1/task/completion' \
  -H 'accept: application/json' \
  -H 'X-API-Key: YOUR_API_KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'payload={"user_email":"user@example.com","project_name":"Example-Project","task_name":"single_completion_task","models":["gpt-4o","gemini-3-flash-preview","claude-sonnet-4-6"],"messages":[{"role":"user","content":"What is one plus one?"}],"max_tokens":4000}'

Example cURL (with file: document summary across all models):

curl -X 'POST' \
  'https://languageserver.karya.in/v1/task/completion' \
  -H 'accept: application/json' \
  -H 'X-API-Key: YOUR_API_KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'payload={"user_email":"user@example.com","project_name":"test","task_name":"manual_pdf_all","models":["gpt-4o","gpt-4o-mini","gpt-5","gpt-5-chat","gpt-5-mini","gpt-5-nano","gemini-2.5-flash","gemini-2.5-pro","gemini-3-flash-preview","gemini-3.1-pro-preview","gemini-3.1-flash-lite-preview","claude-sonnet-4","claude-haiku-4-5","claude-sonnet-4-6","claude-opus-4-6"],"messages":[{"role":"user","content":"Summarize the key points in this document."}],"max_tokens":10000}' \
  -F 'files=@document.pdf;type=application/pdf'

Replace document.pdf with the path to your own file. Gemini and Claude models accept PDFs. Azure OpenAI models accept images only, so the PDF is skipped for those models and listed under skipped_files in the response.
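The curl -F flags above build a multipart/form-data body. A standard-library Python sketch of the same request follows. It assumes the field names payload and files from this section and guesses each file's MIME type from its extension using mimetypes. The function name is hypothetical.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

COMPLETION_URL = "https://languageserver.karya.in/v1/task/completion"


def build_completion_request(api_key, payload, file_paths=()):
    """Build a multipart/form-data POST with a `payload` field and optional `files` parts."""
    boundary = uuid.uuid4().hex
    parts = []
    # The payload travels as a JSON string in a plain form field
    parts.append((
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="payload"\r\n\r\n'
        f"{json.dumps(payload)}\r\n"
    ).encode("utf-8"))
    # Each file is sent as a separate part named "files"
    for path in file_paths:
        filename = os.path.basename(path)
        mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        with open(path, "rb") as f:
            content = f.read()
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
            f"Content-Type: {mime}\r\n\r\n"
        ).encode("utf-8")
        parts.append(header + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        COMPLETION_URL,
        data=b"".join(parts),
        headers={
            "Accept": "application/json",
            "X-API-Key": api_key,
            # The boundary must be declared so the server can split the parts
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

Send the request with urllib.request.urlopen and parse the JSON body of the response.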

Example Payloads

Minimal Payload

{
  "user_email": "user@example.com",
  "project_name": "Example-Project",
  "task_name": "your_task",
  "models": ["gemini-3-flash-preview"],
  "messages": [
    {"role": "system", "content": "You are a Helpful Assistant"},
    {"role": "user", "content": "Hello, who are you?"}
  ]
}

Comprehensive Payload

{
  "user_email": "user@example.com",
  "project_name": "Example-Project",
  "task_name": "comprehensive_completion_task",
  "models": ["gpt-4o"],
  "messages": [
    {"role": "system", "content": "You are a Helpful Assistant"},
    {"role": "user", "content": "Describe this image."}
  ],
  "max_tokens": 1024,
  "reasoning_effort": "medium",
  "temperature": 0.7,
  "top_p": 0.9,
  "n": 1,
  "stream": false,
  "stop": ["END", "STOP"],
  "presence_penalty": 0.0,
  "frequency_penalty": 0.0,
  "logit_bias": {
    "12345": 0.1,
    "67890": -0.1
  },
  "seed": 42,
  "logprobs": true,
  "top_logprobs": 5
}

Model Parameters Reference

Provider-Specific Parameter Support

Not all parameters are supported by every model:

Claude models: reasoning_effort forces temperature=1 and top_p=0.95. n, presence_penalty, and frequency_penalty are not supported.
GPT-5 family (gpt-5, gpt-5-chat, gpt-5-mini, gpt-5-nano): top_p is not applied. reasoning_effort is not supported for gpt-5-chat.
Batch Anthropic: Only temperature, top_p, top_k, and stop_sequences are passed to the provider; other parameters are silently ignored.
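When you target one model per request, you can drop the parameters that the list above marks as unsupported or not applied, so the payload reflects what the provider actually receives. This hypothetical sketch covers only the single-completion rules listed above. In a multi-model request all models share one payload, so this filtering only makes sense per request.

```python
GPT5_FAMILY = {"gpt-5", "gpt-5-chat", "gpt-5-mini", "gpt-5-nano"}


def drop_unsupported_params(model, params):
    """Return a copy of params without the ones this guide says `model` ignores."""
    unsupported = set()
    if model.startswith("claude-"):
        unsupported |= {"n", "presence_penalty", "frequency_penalty"}
    if model in GPT5_FAMILY:
        unsupported.add("top_p")  # not applied for the GPT-5 family
    if model == "gpt-5-chat":
        unsupported.add("reasoning_effort")
    return {k: v for k, v in params.items() if k not in unsupported}
```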

Parameter	Type	Description
`messages`	array	Conversation history; each message has a role (user, system, or assistant) and content
`max_tokens`	integer	Maximum number of tokens to generate in the response
`reasoning_effort`	string	Level of reasoning effort ("disable", "low", "medium", "high"). Use `"disable"` to turn off extended thinking.
`temperature`	float	Controls randomness in generation (0.0 = deterministic, 1.0 = creative)
`top_p`	float	Nucleus sampling parameter (0.0-1.0, higher = more diverse)
`n`	integer	Number of response variations to generate
`stream`	boolean	Whether to stream the response in real-time
`stop`	string/array	Sequences that will stop generation when encountered
`presence_penalty`	float	Penalizes tokens that have already appeared, encouraging new topics
`frequency_penalty`	float	Penalizes tokens in proportion to how often they have already appeared (reduces verbatim repetition)
`logit_bias`	object	Bias specific token probabilities using token IDs
`seed`	integer	Random seed for reproducible results
`logprobs`	boolean	Whether to return log probabilities for each token
`top_logprobs`	integer	Number of top log probabilities to return per token

Claude and reasoning_effort

When using reasoning_effort with Claude models, the server automatically:

Sets temperature to 1 (overriding any value you specified)
Sets top_p to 0.95 (overriding any value you specified)
Increases max_tokens if it is smaller than the thinking budget

Budget tokens by level: low = 1024, medium = 2048, high = 4096.
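The adjustments above can be written as a small function, which helps predict what a Claude request will actually run with. This is a sketch with two assumptions. It treats "disable" as skipping the overrides. The guide also does not say what value max_tokens is raised to, so the function only reports whether a raise will happen.

```python
CLAUDE_THINKING_BUDGETS = {"low": 1024, "medium": 2048, "high": 4096}


def claude_effective_params(params):
    """Predict Claude sampling params; returns (adjusted_params, max_tokens_will_be_raised)."""
    effort = params.get("reasoning_effort")
    # Assumption: with no effort or "disable", the server leaves params untouched
    if effort is None or effort == "disable":
        return dict(params), False
    if effort not in CLAUDE_THINKING_BUDGETS:
        raise ValueError(f"invalid reasoning_effort: {effort!r}")
    budget = CLAUDE_THINKING_BUDGETS[effort]
    # The server overrides any temperature/top_p you set
    adjusted = dict(params, temperature=1, top_p=0.95)
    max_tokens = params.get("max_tokens")
    # Raised when smaller than the budget; the new value is not documented
    raised = max_tokens is not None and max_tokens < budget
    return adjusted, raised
```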

Multimodal Requests

Single completion supports file inputs (images, and for some providers, audio and documents). Send a multipart/form-data request with a payload JSON string and a files field containing one or more file parts. You do not need to reference files in the message content—the server attaches all uploaded files to the last user message in the conversation automatically.

Info

Single completion only — Multimodal file uploads are supported only for the single (chat) completion endpoint. Batch completion uses JSONL input only and does not accept file uploads.

Multimodal support by provider

Provider	Images	Audio	Documents
Azure OpenAI (gpt-4o, gpt-4o-mini, etc.)	✅	❌	❌
Vertex AI — Google (Gemini models)	✅	✅	✅
Vertex AI — Anthropic (Claude models)	✅	❌	✅

Supported file types:

Images (all providers): .jpg, .jpeg, .png, .gif, .webp, .bmp, .tiff
Audio (Google only): .mp3, .wav, .m4a, .flac, .aac, .ogg, .webm
Documents (Google, Anthropic): .pdf, .txt, .csv, .json

Files are validated by extension at upload. The total size of all files in one request must not exceed 25 MB. If a model does not support a given file type, the file is skipped for that model and reported in the response under skipped_files.
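A client-side pre-check can catch files that would be rejected or skipped before you upload them. The sketch takes file names and sizes (for example, from os.path.getsize). It maps model IDs to providers by prefix, which is an assumption based on the naming in this guide. It also reads 25 MB conservatively as 25,000,000 bytes. The function names are hypothetical.

```python
import os

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".bmp", ".tiff"}
AUDIO_EXTS = {".mp3", ".wav", ".m4a", ".flac", ".aac", ".ogg", ".webm"}
DOC_EXTS = {".pdf", ".txt", ".csv", ".json"}
MAX_TOTAL_BYTES = 25_000_000  # conservative reading of the 25 MB limit

# Extensions each provider accepts, from the support table
PROVIDER_EXTS = {
    "azure_openai": IMAGE_EXTS,
    "google": IMAGE_EXTS | AUDIO_EXTS | DOC_EXTS,
    "anthropic": IMAGE_EXTS | DOC_EXTS,
}


def provider_for(model):
    """Guess the provider from the model-ID prefix (assumption)."""
    for prefix, provider in (("gpt-", "azure_openai"), ("gemini-", "google"), ("claude-", "anthropic")):
        if model.startswith(prefix):
            return provider
    raise ValueError(f"unknown model family: {model}")


def precheck_files(file_sizes, models):
    """Given {filename: size_in_bytes}, return {model: [files that model would skip]}."""
    total = sum(file_sizes.values())
    if total > MAX_TOTAL_BYTES:
        raise ValueError(f"total upload of {total} bytes exceeds 25 MB")
    exts = {name: os.path.splitext(name)[1].lower() for name in file_sizes}
    all_exts = IMAGE_EXTS | AUDIO_EXTS | DOC_EXTS
    for name, ext in exts.items():
        if ext not in all_exts:
            raise ValueError(f"unsupported file type: {name}")
    return {
        model: [name for name, ext in exts.items() if ext not in PROVIDER_EXTS[provider_for(model)]]
        for model in models
    }
```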

Example cURL (image):

curl -X 'POST' \
  'https://languageserver.karya.in/v1/task/completion' \
  -H 'accept: application/json' \
  -H 'X-API-Key: YOUR_API_KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'payload={"user_email":"user@example.com","project_name":"Example-Project","task_name":"multimodal_task","models":["gpt-4o","gemini-3-flash-preview","claude-sonnet-4-6"],"messages":[{"role":"user","content":"What do you see in this image?"}],"max_tokens":4000}' \
  -F 'files=@image.png;type=image/png'

Example cURL (PDF document): The Request Body section above includes an example that sends a PDF with a "Summarize the key points" prompt and all supported models; use -F 'files=@yourfile.pdf;type=application/pdf' for your file.

Example Response Body

The API returns the synchronous response from the model(s). The top-level keys are task_id, status, and result. result.responses contains per-model results as full model response objects. Models that could not run are listed in result.skipped_models; files that weren't compatible with a model appear in result.skipped_files.

{
  "task_id": "task_id_789",
  "status": "COMPLETED",
  "result": {
    "responses": {
      "gpt-4o": {
        "id": "chatcmpl-...",
        "choices": [
          {
            "index": 0,
            "message": {
              "role": "assistant",
              "content": "The image depicts a humorous and imaginative scene where Formula 1 cars are playing soccer..."
            },
            "finish_reason": "stop"
          }
        ],
        "model": "gpt-4o",
        "usage": {
          "completion_tokens": 1448,
          "prompt_tokens": 2333,
          "total_tokens": 3781
        }
      },
      "claude-sonnet-4-6": {
        "id": "msg_...",
        "choices": [
          {
            "index": 0,
            "message": {
              "role": "assistant",
              "content": "This is a creative, digitally-composed image that combines Formula 1 racing with soccer/football..."
            },
            "finish_reason": "end_turn"
          }
        ],
        "model": "claude-sonnet-4-6",
        "usage": {
          "completion_tokens": 326,
          "prompt_tokens": 1413,
          "total_tokens": 1739
        }
      }
    },
    "skipped_models": [],
    "skipped_files": []
  }
}
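To extract the generated text and any skipped items, you can walk result.responses. This sketch assumes the response shape above and keeps only the first choice per model, which matters if you set n greater than 1. The function names are hypothetical.

```python
def extract_texts(body):
    """Map each model ID to the content of its first choice."""
    responses = body.get("result", {}).get("responses", {})
    texts = {}
    for model, resp in responses.items():
        choices = resp.get("choices") or []
        if choices:
            texts[model] = choices[0]["message"]["content"]
    return texts


def skipped_items(body):
    """Return (skipped_models, skipped_files), defaulting to empty lists."""
    result = body.get("result", {})
    return result.get("skipped_models", []), result.get("skipped_files", [])
```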

Currently Supported Models

Batch completion (default model: gemini-3-flash-preview):

Azure OpenAI: gpt-4o, gpt-4o-mini
Vertex AI — Anthropic: claude-sonnet-4, claude-haiku-4-5, claude-sonnet-4-6, claude-opus-4-6
Vertex AI — Google: gemini-2.5-flash, gemini-2.5-pro, gemini-3-flash-preview, gemini-3.1-pro-preview, gemini-3.1-flash-lite-preview

Single (chat) completion (default model: gemini-3-flash-preview):

Azure OpenAI: gpt-4o, gpt-4o-mini, gpt-5, gpt-5-chat, gpt-5-mini, gpt-5-nano
Vertex AI — Google: gemini-2.5-flash, gemini-2.5-pro, gemini-3-flash-preview, gemini-3.1-pro-preview, gemini-3.1-flash-lite-preview
Vertex AI — Anthropic: claude-sonnet-4, claude-haiku-4-5, claude-sonnet-4-6, claude-opus-4-6
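The two endpoints support different model sets; the GPT-5 family is available for single completion only. A quick membership check can catch a mismatched model before you submit. The sets below are copied from the lists above and will need updating whenever those lists change. The function name is hypothetical.

```python
BATCH_MODELS = {
    "gpt-4o", "gpt-4o-mini",
    "claude-sonnet-4", "claude-haiku-4-5", "claude-sonnet-4-6", "claude-opus-4-6",
    "gemini-2.5-flash", "gemini-2.5-pro", "gemini-3-flash-preview",
    "gemini-3.1-pro-preview", "gemini-3.1-flash-lite-preview",
}
# Single completion supports everything batch does, plus the GPT-5 family
SINGLE_MODELS = BATCH_MODELS | {"gpt-5", "gpt-5-chat", "gpt-5-mini", "gpt-5-nano"}


def unsupported_models(models, mode):
    """Return requested models not listed for mode ("batch" or "single")."""
    supported = {"batch": BATCH_MODELS, "single": SINGLE_MODELS}[mode]
    # The "default" alias is accepted by batch completion but rejected by single
    return [m for m in models if m not in supported and not (mode == "batch" and m == "default")]
```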

Error Codes

HTTP Code	Cause
`400`	Invalid model name, missing `dataset_ids`, no valid files in JSONL, invalid `reasoning_effort` value, or total upload exceeds 25 MB
`404`	Dataset IDs not found in database
`422`	Missing required fields
`500`	Task execution failed — check server logs

Best Practices

Message Structure

✅ System message: Usually placed first to set the model's behavior and context (in batch JSONL, use the system_prompt field instead)
👤 User message: Contains the actual question, request, or input from the user
🤖 Assistant message: Used for multi-turn conversations to maintain context
📋 Order matters: Messages are processed in the order they appear in the array
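Batch JSONL does not accept a "system" role in messages, so chat-style conversations need converting before they go into a .jsonl file. The hypothetical sketch below moves system messages into system_prompt and keeps the remaining messages in their original order. Joining multiple system messages with a blank line is an assumption; this guide does not cover that case.

```python
def to_batch_record(task_id, messages, max_tokens=1000, **optional):
    """Convert chat-style messages into one batch JSONL record."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    # Only user/assistant turns stay in messages, order preserved
    conversation = [m for m in messages if m["role"] != "system"]
    record = {
        "task_id": task_id,
        "system_prompt": "\n\n".join(system_parts),
        "messages": conversation,
        "max_tokens": max_tokens,
    }
    record.update(optional)  # e.g. temperature, top_p, seed
    return record
```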

Token Management

🎯 Max Tokens: Set max_tokens to at least 1000 for reasoning models to account for thinking tokens
💰 Token Efficiency: Monitor usage patterns to optimize costs and performance
📏 Context Length: Keep conversation history reasonable to avoid hitting token limits

Model Selection

🧠 Reasoning Models: Use for complex problem-solving tasks that benefit from internal reasoning
📝 Standard Models: Use for straightforward text generation and simple tasks
🔄 Multiple Models: Compare outputs from different models for better results

Parameter Tuning

Parameter	Use Case	Recommended Value
Temperature	Factual tasks	0.0-0.3
Temperature	Creative tasks	0.7-1.0
Top-p	General use	Start with 0.9
Stop Sequences	Content control	Use to prevent unwanted content
Penalties	Reduce repetition	Apply presence/frequency penalties