Model (Provider) Onboarding Guide

This guide outlines how to integrate a new model or provider (e.g., Sarvam) into an existing service (e.g., Transcription). It assumes the new provider plugs into the existing asynchronous (batch) and/or synchronous (single) task pipelines.


1. Update Request Payload and Message Types

Request Payload (common/types/request_types.py)

If the new provider requires specific input fields (e.g., a proprietary API key, specific language codes), extend the relevant request schema (BatchTranscriptionRequest, SingleTranscriptionRequest).

Important

Ensure these new fields are optional to maintain backward compatibility for existing providers.
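
As a minimal sketch of what an optional provider-specific field could look like, the example below uses a standard-library dataclass so that it runs on its own. The real schemas in common/types/request_types.py may be built on a different base class, and the field names `dataset_id`, `provider` and `sarvam_language_code` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BatchTranscriptionRequest:
    # Existing fields (hypothetical stand-ins for the real schema).
    dataset_id: str
    provider: str = "GOOGLE"
    # New provider-specific field: optional with a None default, so
    # payloads built for existing providers still construct unchanged.
    sarvam_language_code: Optional[str] = None


# A request written before the new field existed still constructs.
legacy_request = BatchTranscriptionRequest(dataset_id="ds-1")

# A request for the new provider can supply the extra field.
sarvam_request = BatchTranscriptionRequest(
    dataset_id="ds-2", provider="SARVAM", sarvam_language_code="hi-IN"
)
```

Because the new field defaults to None, callers that never heard of the new provider do not need to change.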

Message Types (common/types/message_types.py)

For the asynchronous (batch) workflow, add any provider-specific fields to the message schema. This is necessary if core logic handlers require additional information to be published to Pub/Sub.

Important

Like the request payload, these fields should be optional.
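
The sketch below illustrates why optional message fields matter in the batch workflow: messages published before the change must still parse. It assumes message bodies are JSON-encoded bytes, which the guide does not state, and the names `task_id`, `sarvam_model` and `parse_message` are hypothetical.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class BatchTranscriptionMessage:
    task_id: str
    provider: str
    # Optional provider-specific field (hypothetical name).
    sarvam_model: Optional[str] = None


def parse_message(raw: bytes) -> BatchTranscriptionMessage:
    """Decode a JSON message body, ignoring keys the schema does not know."""
    data = json.loads(raw.decode("utf-8"))
    known = {f.name for f in fields(BatchTranscriptionMessage)}
    return BatchTranscriptionMessage(
        **{key: value for key, value in data.items() if key in known}
    )


# A message published before the new field existed still parses.
old_message = parse_message(b'{"task_id": "t-1", "provider": "AZURE"}')

# A message for the new provider carries the extra field.
new_message = parse_message(
    b'{"task_id": "t-2", "provider": "SARVAM", "sarvam_model": "example-model"}'
)
```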


2. Manage Configuration and Secrets

Add any new provider-related secrets, such as API keys or service account credentials, to the project's .env file.
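
One way to consume such a secret is to read it from the process environment and fail early when it is missing. This is only a sketch: it assumes the values in .env have already been loaded into the environment by whatever mechanism the project uses, and both the variable name `SARVAM_API_KEY` and the helper `get_required_secret` are hypothetical.

```python
import os


def get_required_secret(name: str) -> str:
    """Return the named environment variable or fail with a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required secret: {name}")
    return value


# Demonstration only: a real deployment sets this through .env, not in code.
os.environ.setdefault("SARVAM_API_KEY", "dummy-key-for-illustration")
sarvam_api_key = get_required_secret("SARVAM_API_KEY")
```

Failing at startup makes a missing key visible immediately instead of at the first provider call.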


3. Implement Provider-Specific Preprocessing

Preprocessors (app/tasks/{service_name}/{batch|single}/preprocessor.py)

Update the preprocessor logic to handle any provider-specific data formatting or validation that is required before calling the core model logic. This ensures that the data sent to the new model is in the correct format.
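
A provider-specific preprocessing step might look roughly like the following. The payload keys, the set of language codes and the function name `preprocess_request` are all assumptions made for illustration; the real preprocessor's interface may differ.

```python
from typing import Any, Dict

# Hypothetical set of language codes accepted by the new provider.
SARVAM_LANGUAGE_CODES = {"hi-IN", "ta-IN", "en-IN"}


def preprocess_request(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a validated copy of the payload with the provider normalised."""
    cleaned = dict(payload)
    provider = str(cleaned.get("provider", "")).upper()
    cleaned["provider"] = provider

    if provider == "SARVAM":
        code = cleaned.get("language_code")
        if code not in SARVAM_LANGUAGE_CODES:
            raise ValueError(f"Unsupported language code for SARVAM: {code!r}")
    # Other providers need no extra checks in this sketch.
    return cleaned


ready = preprocess_request({"provider": "sarvam", "language_code": "hi-IN"})
```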


4. Implement Core Logic for the New Provider

New Provider Handler (app/tasks/{service_name}/batch/{newprovider}_batch_{servicename}.py)

Create a dedicated file for the new provider's core logic, for example app/tasks/transcription/batch/sarvam_batch_transcription.py. Keeping each provider in its own file separates concerns and keeps provider-specific code out of the shared task handler.

In this file, you should:

- ✅ Write the core logic for the new provider's execution
- 🔧 Include client setup and API calls to the provider's library or endpoint
- 🔄 Handle the provider's response and convert it to a standardized internal format
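
The three responsibilities above can be sketched as follows. Nothing here reflects the provider's real SDK: the `SpeechClient` interface, the `transcript` response key, the `TranscriptionResult` format and the function name are hypothetical, and a fake client stands in for the network call so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Protocol


class SpeechClient(Protocol):
    """Hypothetical interface for the provider's SDK or HTTP wrapper."""

    def transcribe(self, audio_uri: str) -> Dict[str, Any]: ...


@dataclass
class TranscriptionResult:
    """Hypothetical standardized internal format."""

    file_uri: str
    text: str
    provider: str


def run_sarvam_batch_transcription(
    client: SpeechClient, audio_uris: List[str]
) -> List[TranscriptionResult]:
    """Call the provider once per file and normalise each response."""
    results = []
    for uri in audio_uris:
        raw = client.transcribe(uri)  # provider API call
        results.append(
            TranscriptionResult(
                file_uri=uri,
                # "transcript" is an assumed response key, not the real API.
                text=raw.get("transcript", ""),
                provider="SARVAM",
            )
        )
    return results


class FakeClient:
    """Stand-in client so the sketch runs without network access."""

    def transcribe(self, audio_uri: str) -> Dict[str, Any]:
        return {"transcript": f"text for {audio_uri}"}


results = run_sarvam_batch_transcription(FakeClient(), ["a.wav", "b.wav"])
```

Passing the client in as an argument keeps client setup separate from the call logic and makes the handler easy to test with a fake.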


5. Update File Handling Utility

File Fetching Logic (common/cloud_utils/get_urls_from_dataset.py)

This is a critical step for handling different cloud providers. Extend the existing conditional logic to include the new provider. This ensures the correct, provider-specific URLs are generated for fetching input files.

Example of the updated logic:

# One result list per provider; initialised here so the snippet is complete
sas_urls, gcs_uris, sarvam_uris = [], [], []

# The generate_*_from_public_url helpers are assumed to be defined or
# imported in this module; generate_sarvam_uri_from_public_url is the new one.
for file_record in file_records:
    if file_record.provider == "AZURE":
        sas_url = generate_sas_url_from_public_url(file_record.file_url)
        sas_urls.append(sas_url)
    elif file_record.provider == "GOOGLE":
        gcs_uri = generate_gcs_uri_from_public_url(file_record.file_url)
        gcs_uris.append(gcs_uri)
    elif file_record.provider == "SARVAM":  # New branch for Sarvam
        sarvam_uri = generate_sarvam_uri_from_public_url(file_record.file_url)
        sarvam_uris.append(sarvam_uri)

6. Extend Core Handler to Branch by Provider

Task Handler (app/tasks/{service_name}/batch/task_handler.py)

Extend the service's main task handler so that it acts as a router. Based on the provider field in the request payload, it calls the provider handler you created in Step 4. This is where the service branches to the correct model.
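
One possible shape for this branching is a lookup table keyed by provider name, sketched below. The handler functions, the `PROVIDER_HANDLERS` registry and `route_by_provider` are hypothetical names; the real task handler may use conditionals or a different message structure.

```python
from typing import Any, Callable, Dict


def run_google_batch(message: Dict[str, Any]) -> str:
    return "handled by GOOGLE"


def run_sarvam_batch(message: Dict[str, Any]) -> str:
    return "handled by SARVAM"


# Hypothetical registry: provider name -> provider-specific handler.
PROVIDER_HANDLERS: Dict[str, Callable[[Dict[str, Any]], str]] = {
    "GOOGLE": run_google_batch,
    "SARVAM": run_sarvam_batch,
}


def route_by_provider(message: Dict[str, Any]) -> str:
    """Route the message to the handler registered for its provider."""
    provider = str(message.get("provider", "")).upper()
    try:
        handler = PROVIDER_HANDLERS[provider]
    except KeyError:
        raise ValueError(f"Unsupported provider: {provider!r}") from None
    return handler(message)


outcome = route_by_provider({"provider": "SARVAM"})
```

With a table like this, onboarding another provider means adding one entry, and an unknown provider fails loudly instead of being silently skipped.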

Handler Router (app/handlers/handler_router.py)

The centralized router at app/handlers/handler_router.py contains the MESSAGE_TYPE_HANDLERS dictionary, which maps task types to their handler functions. When adding a new service (as opposed to a new provider for an existing service), you'll need to:

  1. Import your new handler function
  2. Add an entry to the MESSAGE_TYPE_HANDLERS dictionary

Example:

from app.tasks.translation.batch.task_handler import handle_batch_translation

MESSAGE_TYPE_HANDLERS = {
    "transcription": handle_transcription,
    "completion": handle_batch_completion,
    "translation": handle_batch_translation,  # Add your new handler
}


7. Finalize and Monitor

Sync Endpoint (app/routes/v1/root.py)

If you are adding a new sync provider, ensure the endpoint for the service (/task/{service_name}) is extended to handle sync requests for the new provider. This involves routing the request to the new provider's specific handler for synchronous processing.

Finalization Logic

Confirm that the output from the new provider's handler is correctly processed and saved to the database. As described in service_onboarding.md, this final step applies to both the batch and single workflows and keeps the stored state consistent regardless of the provider used.