Principle:Togethercomputer Together python Batch Input Preparation
| Attribute | Value |
|---|---|
| Type | Principle |
| Domains | Batch_Processing, Inference, API_Client |
| Repository | togethercomputer/together-python |
| Last Updated | 2026-02-15 16:00 GMT |
Overview
Pattern for constructing JSONL input files containing batched inference requests for offline processing.
Description
Batch input preparation defines the schema for creating JSONL files where each line represents one API request. Each line contains a custom_id for tracking, the HTTP method (POST), the target endpoint URL (/v1/chat/completions or /v1/completions), and a body matching the corresponding API request format. This enables large-scale offline inference by collecting many requests into a single file that can be submitted as one batch job.
The input file must be valid JSONL (JSON Lines), meaning each line is an independent, self-contained JSON object terminated by a newline. The server processes each line as an individual inference request, matching it against the specified endpoint's request schema.
Usage
Use this principle when you need to process many inference requests asynchronously at reduced cost, rather than making individual real-time API calls. Batch inference is appropriate when:
- You have a large number of prompts or conversations to process and do not require immediate responses.
- You want to take advantage of reduced pricing for offline/batch workloads.
- You need to track individual request results via custom identifiers.
- You are running evaluation, data generation, or bulk classification pipelines.
Theoretical Basis
Batch processing amortizes API overhead across many requests. Instead of incurring per-request connection, authentication, and scheduling overhead, a batch job processes an entire file of requests in a single submission. This reduces total latency and cost for large workloads.
Each line in the JSONL input file follows this schema:
{"custom_id": "<unique-string>", "method": "POST", "url": "/v1/chat/completions", "body": { ... }}
Where:
- custom_id (string) -- A unique identifier for tracking this request in the output. Must be unique within the file.
- method (string) -- The HTTP method, always
"POST". - url (string) -- The target endpoint. Must be one of the supported
BatchEndpointvalues:/v1/completionsor/v1/chat/completions. - body (object) -- The request payload, conforming to the schema of the target endpoint (CompletionRequest for
/v1/completions, ChatCompletionRequest for/v1/chat/completions).
The two supported endpoint types are:
- /v1/chat/completions -- For chat-based models. The body must include a
modeland amessagesarray. - /v1/completions -- For text completion models. The body must include a
modeland apromptstring.