Language Server Completion API Guide

This document outlines how to interact with the Language Server Completion API, covering both asynchronous batch processing and synchronous single (chat) completion methods.


Batch Completion (Asynchronous)

Info

This method is designed for processing a large number of requests from a .jsonl file.

The process involves four main steps:

  1. Dataset Preparation - Prepare your .jsonl file
  2. Dataset Upload - Upload the file to the server
  3. Send Batch Request - Submit the batch completion request
  4. Check Status - Poll the status endpoint and retrieve results

Step 1: Dataset Preparation

Prepare your .jsonl file where each line contains a single completion request in JSON format.

Example .jsonl file content:

{"task_id": "task_1", "system_prompt": "You are a Helpful Assistant", "messages": [{"role": "user", "content": "Tell me about Karya Inc?"}], "max_tokens": 1000}
{"task_id": "task_2", "system_prompt": "You are a coding assistant", "messages": [{"role": "user", "content": "Write a Python function to calculate factorial"}], "max_tokens": 500}
{"task_id": "task_3", "system_prompt": "You are a helpful assistant", "messages": [{"role": "user", "content": "What is machine learning?"}], "max_tokens": 300}
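
If you are generating the dataset programmatically, each request becomes one JSON object per line. A minimal Python sketch (request contents are illustrative; the filename is our choice):

```python
import json

# Illustrative requests; task_id must be unique per line.
requests_data = [
    {
        "task_id": "task_1",
        "system_prompt": "You are a helpful assistant",
        "messages": [{"role": "user", "content": "What is machine learning?"}],
        "max_tokens": 1000,
    },
    {
        "task_id": "task_2",
        "system_prompt": "You are a coding assistant",
        "messages": [{"role": "user", "content": "Write a Python function to calculate factorial"}],
        "max_tokens": 1000,
    },
]

with open("batch_requests.jsonl", "w", encoding="utf-8") as f:
    for req in requests_data:
        f.write(json.dumps(req) + "\n")  # exactly one JSON object per line
```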

Required Fields

Field          Description
-------------  ----------------------------------------------
task_id        Unique identifier for the request
messages       Array of message objects with role and content
system_prompt  System instruction for the model
max_tokens     Maximum tokens for the response

Warning

max_tokens >= 1000

When setting max_tokens, consider that some models are reasoning models that use "thinking tokens" for internal processing. These thinking tokens are included in your token count, so setting max_tokens too low may result in incomplete responses. We recommend setting max_tokens to at least 1000 to ensure adequate space for both reasoning and response generation.

Optional Fields

Field              Type     Description
-----------------  -------  --------------------------------------------
temperature        float    Controls randomness (0.0 to 1.0)
top_p              float    Controls nucleus sampling (0.0 to 1.0)
top_k              integer  Controls top-k sampling (integer > 0)
stop_sequences     array    List of strings to stop generation
response_format    object   Format specification for the response
n                  integer  Number of responses to generate (integer > 0)
presence_penalty   float    Penalty for presence of tokens
frequency_penalty  float    Penalty for frequency of tokens
seed               integer  Random seed for reproducible results
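
Before uploading, it can help to validate each line against the field lists above. A small sketch (the validation rules are inferred from these tables and the warning about max_tokens; adjust to the server's actual schema):

```python
import json

# Required fields per the table above (assumed schema).
REQUIRED = {"task_id", "messages", "system_prompt", "max_tokens"}

def validate_line(line: str) -> list[str]:
    """Return a list of problems found in one .jsonl line (empty list = OK)."""
    problems = []
    try:
        req = json.loads(line)
    except json.JSONDecodeError as e:
        return [f"invalid JSON: {e}"]
    for field in sorted(REQUIRED - req.keys()):
        problems.append(f"missing required field: {field}")
    if req.get("max_tokens", 0) < 1000:
        problems.append("max_tokens below recommended minimum of 1000")
    if not 0.0 <= req.get("temperature", 0.5) <= 1.0:
        problems.append("temperature outside 0.0-1.0")
    return problems
```

Run it over every line of the file before upload and fix any reported problems.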

Step 2: Upload Dataset

Upload your prepared .jsonl file to the Language Server. For detailed instructions on dataset upload, see the Dataset upload guide.

Step 3: Send Batch Completion Request

Endpoint Details

  • Endpoint: https://language-server-url/v1/task/batch/completion
  • Method: POST

Request Headers

Accept: application/json
X-API-Key: YOUR_API_KEY
Content-Type: application/json

Request Body

Example request body:

{
  "user_email": "user@example.com",
  "dataset_ids": ["dataset_id_123"],
  "project_name": "Example-Project",
  "task_name": "batch_completion_task",
  "models": ["claude-3-5-sonnet-v2", "gemini-2.5-flash"]
}

Example cURL Command (Batch Completion):

curl -X 'POST' \
  'https://language-server-url/v1/task/batch/completion' \
  -H 'accept: application/json' \
  -H 'X-API-Key: YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "user_email": "user@example.com",
    "project_name": "Example-Project",
    "task_name": "batch_completion_task",
    "dataset_ids": ["dataset_id_123"],
    "models": ["gemini-2.0-flash-lite", "gemini-2.5-flash"]
  }'
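
The same submission can be scripted. A stdlib-only Python sketch (the base URL and field values are placeholders mirroring the example above; the helper names are ours, not part of the API):

```python
import json
import urllib.request

def build_batch_payload(user_email, project_name, task_name, dataset_ids, models):
    """Assemble the batch completion request body shown above."""
    return {
        "user_email": user_email,
        "project_name": project_name,
        "task_name": task_name,
        "dataset_ids": dataset_ids,
        "models": models,
    }

def submit_batch(base_url, api_key, payload):
    """POST the payload as JSON and return the parsed submission response."""
    req = urllib.request.Request(
        f"{base_url}/v1/task/batch/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/json",
            "X-API-Key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())
```

For example, `submit_batch("https://language-server-url", "YOUR_API_KEY", build_batch_payload(...))` should return the submission response with `task_id` and `status`.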

Example Response (Submission)

Upon successful submission, the API returns identifiers for the asynchronous task.

{
  "task_id": "task_id_456",
  "dataset_ids": ["dataset_id_789"],
  "status": "PENDING"
}

Step 4: Checking Task Status and Retrieving Results

Use the task_id from the submission response to poll this endpoint.

Endpoint Details

  • Endpoint: https://language-server-url/v1/status/{task_id}
  • Method: GET

Example cURL Command (Check Status):

curl -X 'GET' \
  'https://language-server-url/v1/status/task_id_456' \
  -H 'accept: application/json' \
  -H 'X-API-Key: YOUR_API_KEY'
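
Because the task runs asynchronously, clients typically poll this endpoint until the task leaves the pending state. A sketch with the HTTP call injected as a function, so the loop itself is easy to test (the RUNNING status value is an assumption; only PENDING and COMPLETED appear in this guide):

```python
import time

def poll_status(fetch_status, task_id, interval=5.0, max_attempts=60):
    """Poll until the task leaves PENDING/RUNNING.

    fetch_status(task_id) -> dict is any callable that GETs /v1/status/{task_id}.
    Returns the final status payload, or raises TimeoutError.
    """
    for attempt in range(max_attempts):
        status = fetch_status(task_id)
        if status.get("status") not in ("PENDING", "RUNNING"):
            return status
        if attempt < max_attempts - 1:
            time.sleep(interval)  # back off between polls
    raise TimeoutError(f"task {task_id} still pending after {max_attempts} polls")
```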

Example Response (Completed)

When the task is complete, the response will contain the output_dataset_id that identifies the dataset holding the results.

{
  "status": "COMPLETED",
  "responses": [
    {
      "id": "response_id_123",
      "task_id": "task_id_456",
      "response": {
        "model_statuses": {
          "gemini-2.5-flash": "COMPLETED",
          "gemini-2.0-flash-lite": "COMPLETED"
        },
        "input_dataset_ids": [
          "dataset_id_789"
        ],
        "output_dataset_id": "output_dataset_id_101"
      },
      "created_at": "2025-01-01T12:00:00.000000",
      "updated_at": "2025-01-01T12:05:00.000000"
    }
  ]
}
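
The completed payload nests the results inside each response entry; the fields you usually want are model_statuses and output_dataset_id. A parsing sketch against the example above:

```python
def extract_outputs(status_payload):
    """Return (model_statuses, output_dataset_id) pairs from a status payload."""
    results = []
    for entry in status_payload.get("responses", []):
        response = entry.get("response", {})
        results.append(
            (response.get("model_statuses", {}), response.get("output_dataset_id"))
        )
    return results
```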

Single Completion (Synchronous Chat)

Info

This method is for single, synchronous chat completion requests, suitable for real-time and interactive applications.

Endpoint Details

  • Endpoint: https://language-server-url/v1/task/completion
  • Method: POST

Request Body

The request must be multipart/form-data and contain a payload JSON string. For multimodal requests, also include a files field.

Example cURL Command (Single Completion):

curl -X 'POST' \
  'https://language-server-url/v1/task/completion' \
  -H 'accept: application/json' \
  -H 'X-API-Key: YOUR_API_KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'payload={"user_email":"user@example.com","project_name":"Example-Project","task_name":"single_completion_task","models":["gpt-4o","gemini-2.5-flash","claude-sonnet-4"],"messages":[{"role":"system","content":"You are a helpful assistant"},{"role":"user","content":"What is one plus one?"}],"max_tokens":4000,"reasoning_effort":"low"}'

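Note that the payload travels as a JSON string inside a form field, not as the request body itself. A sketch that builds that string (the helper name is ours; field values echo the example):

```python
import json

def build_single_payload(user_email, project_name, task_name, models, messages,
                         max_tokens=4000, **extra):
    """Build the JSON string for the multipart 'payload' form field."""
    payload = {
        "user_email": user_email,
        "project_name": project_name,
        "task_name": task_name,
        "models": models,
        "messages": messages,
        "max_tokens": max_tokens,
    }
    payload.update(extra)  # e.g. reasoning_effort="low", temperature=0.7
    return json.dumps(payload)
```

The returned string is what goes after `payload=` in the `-F` option above.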

Example Payloads

Minimal Payload
{
  "user_email": "user@example.com",
  "project_name": "Example-Project",
  "task_name": "your_task",
  "models": ["claude-3-5-sonnet-v2"],
  "messages": [
    {"role": "system", "content": "You are a Helpful Assistant"},
    {"role": "user", "content": "Hello, who are you?"}
  ]
}
Comprehensive Payload
{
  "user_email": "user@example.com",
  "project_name": "Example-Project",
  "task_name": "comprehensive_completion_task",
  "models": ["gpt-4o"],
  "messages": [
    {"role": "system", "content": "You are a Helpful Assistant"},
    {"role": "user", "content": "Describe this image."}
  ],
  "max_tokens": 1024,
  "reasoning_effort": "medium",
  "temperature": 0.7,
  "top_p": 0.9,
  "n": 1,
  "stream": false,
  "stop": ["END", "STOP"],
  "presence_penalty": 0.0,
  "frequency_penalty": 0.0,
  "logit_bias": {
    "12345": 0.1,
    "67890": -0.1
  },
  "seed": 42,
  "logprobs": true,
  "top_logprobs": 5
}

Model Parameters Reference

Parameter          Type          Description
-----------------  ------------  ------------------------------------------------------------------------
messages           array         Conversation history with role (user, system, assistant) and content
max_tokens         integer       Maximum number of tokens to generate in the response
reasoning_effort   string        Level of reasoning effort for reasoning models ("low", "medium", "high")
temperature        float         Controls randomness in generation (0.0 = deterministic, 1.0 = creative)
top_p              float         Nucleus sampling parameter (0.0-1.0, higher = more diverse)
n                  integer       Number of response variations to generate
stream             boolean       Whether to stream the response in real time
stop               string/array  Sequences that stop generation when encountered
presence_penalty   float         Penalty applied to tokens already present (encourages new topics)
frequency_penalty  float         Penalty scaled by how often a token has appeared (reduces repetition)
logit_bias         object        Bias specific token probabilities using token IDs
seed               integer       Random seed for reproducible results
logprobs           boolean       Whether to return log probabilities for each token
top_logprobs       integer       Number of top log probabilities to return per token

Multimodal Requests

To send a request with files (e.g., images), include a files field in your multipart/form-data request.

Warning

Currently supported: Image inputs only

Audio inputs are coming soon!

Example cURL Command (Multimodal):

curl -X 'POST' \
  'https://language-server-url/v1/task/completion' \
  -H 'accept: application/json' \
  -H 'X-API-Key: YOUR_API_KEY' \
  -H 'Content-Type: multipart/form-data' \
  -F 'payload={"user_email":"user@example.com","project_name":"Example-Project","task_name":"multimodal_task","models":["gpt-4o","gemini-2.5-flash","claude-sonnet-4"],"messages":[{"role":"system","content":"You are a helpful assistant"},{"role":"user","content":"What do you see in this image?"}],"max_tokens":4000,"reasoning_effort":"low"}' \
  -F 'files=@image.png;type=image/png'
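
Most HTTP clients assemble the multipart body for you (as `curl -F` does above), but it is useful to see what goes on the wire. A stdlib-only sketch that builds a multipart/form-data body with the `payload` field and one `files` part (the helper is ours; the overall request must carry a `Content-Type: multipart/form-data; boundary=...` header):

```python
import uuid

def build_multipart(payload_json: str, filename: str, file_bytes: bytes,
                    content_type: str = "image/png"):
    """Assemble a multipart/form-data body with 'payload' and 'files' parts.

    Returns (body_bytes, boundary) for use with any HTTP client.
    """
    boundary = uuid.uuid4().hex
    payload_part = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="payload"\r\n\r\n'
        f'{payload_json}\r\n'
    )
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'
        f'Content-Type: {content_type}\r\n\r\n'
    )
    body = (payload_part.encode() + file_header.encode() + file_bytes
            + f"\r\n--{boundary}--\r\n".encode())
    return body, boundary
```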

Example Response Body

The API returns the synchronous response from the model(s).

{
  "task_id": "task_id_789",
  "responses": {
    "responses": {
      "gpt-4o": {
        "response": "The image depicts a humorous and imaginative scene where Formula 1 cars are playing soccer...",
        "usage": {
          "completion_tokens": 1448,
          "prompt_tokens": 2333,
          "total_tokens": 3781,
          "completion_tokens_details": {
            "accepted_prediction_tokens": null,
            "audio_tokens": null,
            "reasoning_tokens": 1127,
            "rejected_prediction_tokens": null,
            "text_tokens": 321
          },
          "prompt_tokens_details": {
            "audio_tokens": null,
            "cached_tokens": null,
            "text_tokens": 11,
            "image_tokens": null
          }
        }
      },
      "claude-sonnet-4": {
        "response": "This is a creative, digitally-composed image that combines Formula 1 racing with soccer/football...",
        "usage": {
          "completion_tokens": 326,
          "prompt_tokens": 1413,
          "total_tokens": 1739,
          "completion_tokens_details": {
            "accepted_prediction_tokens": null,
            "audio_tokens": null,
            "reasoning_tokens": null,
            "rejected_prediction_tokens": null,
            "text_tokens": 326
          },
          "prompt_tokens_details": {
            "audio_tokens": null,
            "cached_tokens": null,
            "text_tokens": 1413,
            "image_tokens": null
          }
        }
      }
    }
  }
}
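
Note the doubly nested "responses" key; per-model results live two levels down. A sketch that pulls out the response text and total token usage for each model:

```python
def summarize_responses(result):
    """Map model name -> (response text, total tokens) from a single-completion result."""
    out = {}
    for model, data in result.get("responses", {}).get("responses", {}).items():
        usage = data.get("usage", {})
        out[model] = (data.get("response", ""), usage.get("total_tokens"))
    return out
```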

Currently Supported Models

Batch:

  • gpt-4o
  • gpt-4o-mini
  • claude-3-5-sonnet-v2 (deprecated; will be removed soon)
  • claude-sonnet-4
  • gemini-2.0-flash
  • gemini-2.0-flash-lite
  • gemini-2.5-flash
  • gemini-2.5-pro
  • gemini-3-pro

Single:

  • gpt-4o
  • gpt-4o-mini
  • gpt-5
  • gpt-5-chat
  • gpt-5-mini
  • gpt-5-nano
  • gemini-2.0-flash
  • gemini-2.0-flash-lite
  • gemini-2.5-flash
  • gemini-2.5-pro
  • gemini-3-pro
  • claude-sonnet-4

Best Practices

Message Structure

  • ⚙️ System message: Usually placed first to set the AI's behavior and context
  • 👤 User message: Contains the actual question, request, or input from the user
  • 🤖 Assistant message: Used for multi-turn conversations to maintain context
  • 📋 Order matters: Messages are processed in the order they appear in the array
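
Applied in code, the points above reduce to building an ordered array of role/content objects, system message first. A small sketch (helper name is ours):

```python
def add_turn(messages, role, content):
    """Append one message, preserving order; roles per the guide."""
    assert role in ("system", "user", "assistant")
    messages.append({"role": role, "content": content})
    return messages

conversation = []
add_turn(conversation, "system", "You are a helpful assistant")   # behavior first
add_turn(conversation, "user", "What is one plus one?")
add_turn(conversation, "assistant", "Two.")                       # prior turn for context
add_turn(conversation, "user", "And times three?")                # follow-up question
```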

Token Management

  • 🎯 Max Tokens: Set max_tokens to at least 1000 for reasoning models to account for thinking tokens
  • 💰 Token Efficiency: Monitor usage patterns to optimize costs and performance
  • 📏 Context Length: Keep conversation history reasonable to avoid hitting token limits
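
A rough budgeting heuristic can make these guidelines concrete. The 4-characters-per-token ratio below is a common rule of thumb for English text, NOT the server's tokenizer, so treat it as ballpark only:

```python
def rough_token_estimate(text: str) -> int:
    """Very rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def pick_max_tokens(expected_response_chars: int, reasoning_model: bool) -> int:
    """Ballpark max_tokens from an expected response length in characters.

    Doubles the budget for reasoning models (thinking tokens count against
    max_tokens) and applies the guide's floor of 1000.
    """
    estimate = max(1, expected_response_chars // 4)
    return max(1000, estimate * 2 if reasoning_model else estimate)
```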

Model Selection

  • 🧠 Reasoning Models: Use for complex problem-solving tasks that benefit from internal reasoning
  • 📝 Standard Models: Use for straightforward text generation and simple tasks
  • 🔄 Multiple Models: Compare outputs from different models for better results

Parameter Tuning

Parameter       Use Case           Recommended Value
--------------  -----------------  ----------------------------------
Temperature     Factual tasks      0.0-0.3
Temperature     Creative tasks     0.7-1.0
Top-p           General use        Start with 0.9
Stop sequences  Content control    Use to prevent unwanted content
Penalties       Reduce repetition  Apply presence/frequency penalties
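
The tuning table can be folded into a small preset helper (the preset names and chosen values within the table's ranges are ours):

```python
# Starting points taken from the tuning table above.
PRESETS = {
    "factual":  {"temperature": 0.2, "top_p": 0.9},
    "creative": {"temperature": 0.9, "top_p": 0.9},
}

def sampling_params(task_kind: str, **overrides):
    """Return sampling parameters for a task kind, with optional overrides."""
    params = dict(PRESETS[task_kind])  # copy so presets stay untouched
    params.update(overrides)
    return params
```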