Implementation:Togethercomputer Together python Batch Request Format

Attribute	Value
Type	Implementation (Pattern Doc)
Domains	Batch_Processing, Inference, API_Client
Repository	togethercomputer/together-python
Source	src/together/types/batch.py:L26-33
Last Updated	2026-02-15 16:00 GMT

Overview

This page documents the pattern for constructing JSONL input files for Together AI batch inference. Users build these files themselves, one JSON request per line; the SDK defines the supported endpoint values via the BatchEndpoint enum.

Code Reference

The BatchEndpoint enum defines the two endpoint paths that a batch request can target:

class BatchEndpoint(str, Enum):
    """
    The endpoint of a batch job
    """

    COMPLETIONS = "/v1/completions"
    CHAT_COMPLETIONS = "/v1/chat/completions"
    # More endpoints can be added here as needed

Source: src/together/types/batch.py:L26-33

Supported Endpoints

Enum Value	Endpoint URL	Request Body Type
`BatchEndpoint.COMPLETIONS`	`/v1/completions`	CompletionRequest (model + prompt)
`BatchEndpoint.CHAT_COMPLETIONS`	`/v1/chat/completions`	ChatCompletionRequest (model + messages)
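The mapping in the table above can be sketched as a small lookup that reports which body keys are missing for a given endpoint. This is an illustration only: the enum is a local mirror of the SDK definition so the snippet runs without the SDK installed, and `REQUIRED_BODY_KEYS` and `missing_body_keys` are hypothetical names. The required keys are taken from the table (model + prompt, model + messages) and are not a complete description of either request schema.

```python
from enum import Enum


class BatchEndpoint(str, Enum):
    # Local mirror of the SDK enum, reproduced so this sketch is self-contained.
    COMPLETIONS = "/v1/completions"
    CHAT_COMPLETIONS = "/v1/chat/completions"


# Assumption: minimum body keys per endpoint, taken from the table above.
REQUIRED_BODY_KEYS = {
    BatchEndpoint.COMPLETIONS: {"model", "prompt"},
    BatchEndpoint.CHAT_COMPLETIONS: {"model", "messages"},
}


def missing_body_keys(url: str, body: dict) -> set:
    """Hypothetical helper: return the required keys absent from `body`."""
    # Looking up the enum by value raises ValueError for an unsupported URL.
    endpoint = BatchEndpoint(url)
    return REQUIRED_BODY_KEYS[endpoint] - set(body)
```

For instance, `missing_body_keys("/v1/completions", {"model": "some-model"})` returns a set containing `"prompt"`, while an unsupported path such as `"/v1/embeddings"` raises `ValueError`.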

I/O Contract

Input: A JSONL file where each line is a JSON object with the following fields:

Field	Type	Description
`custom_id`	string	Unique identifier for tracking this request in the output
`method`	string	HTTP method, always `"POST"`
`url`	string	Target endpoint, either `/v1/completions` or `/v1/chat/completions`
`body`	object	Request payload matching the target endpoint schema

Output: A JSONL file that is uploaded to Together AI and then used as the input to a batch job.
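Because the files are written by hand, it can help to check them locally before uploading. The following is a minimal sketch of such a check against the four fields listed above. The `validate_jsonl` function and `ALLOWED_URLS` constant are hypothetical names, not part of the SDK, and treating a repeated `custom_id` as an error is an assumption based on the field being described as a unique identifier.

```python
import json

# The two endpoint paths listed in the I/O contract.
ALLOWED_URLS = {"/v1/completions", "/v1/chat/completions"}


def validate_jsonl(text: str) -> list:
    """Hypothetical helper: return a list of problems; empty means none found."""
    problems = []
    seen_ids = set()
    for lineno, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            continue  # ignore blank lines such as a trailing newline
        try:
            record = json.loads(raw)
        except json.JSONDecodeError as exc:
            problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {lineno}: expected a JSON object")
            continue

        custom_id = record.get("custom_id")
        if not isinstance(custom_id, str):
            problems.append(f"line {lineno}: custom_id must be a string")
        elif custom_id in seen_ids:
            # Assumption: IDs must be unique within one file.
            problems.append(f"line {lineno}: duplicate custom_id {custom_id!r}")
        else:
            seen_ids.add(custom_id)

        if record.get("method") != "POST":
            problems.append(f"line {lineno}: method must be \"POST\"")

        url = record.get("url")
        if not isinstance(url, str) or url not in ALLOWED_URLS:
            problems.append(f"line {lineno}: unsupported url {url!r}")

        if not isinstance(record.get("body"), dict):
            problems.append(f"line {lineno}: body must be a JSON object")
    return problems
```

The function only checks the envelope of each line. It does not validate the contents of `body` against the endpoint schema, and passing this check does not guarantee that the batch service will accept the file.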

Usage Examples

Chat Completions Endpoint

Example JSONL line for the /v1/chat/completions endpoint:

{"custom_id": "req-001", "method": "POST", "url": "/v1/chat/completions", "body": {"model": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo", "messages": [{"role": "user", "content": "What is the capital of France?"}], "max_tokens": 256}}

Completions Endpoint

Example JSONL line for the /v1/completions endpoint:

{"custom_id": "req-002", "method": "POST", "url": "/v1/completions", "body": {"model": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo", "prompt": "The capital of France is", "max_tokens": 64}}

Full Example: Creating a Batch Input File

import json

# Each dict below becomes one line of the JSONL file.
requests = [
    {
        "custom_id": "req-001",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
            "messages": [{"role": "user", "content": "What is the capital of France?"}],
            "max_tokens": 256,
        },
    },
    {
        "custom_id": "req-002",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
            "messages": [{"role": "user", "content": "Explain quantum computing briefly."}],
            "max_tokens": 512,
        },
    },
]

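# Write one JSON object per line. json.dumps escapes newlines inside strings,
# so each request stays on a single line.
with open("batch_input.jsonl", "w", encoding="utf-8") as f:
    for req in requests:
        f.write(json.dumps(req) + "\n")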
