Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:Togethercomputer Together python Batch Input Preparation

From Leeroopedia
Attribute Value
Type Principle
Domains Batch_Processing, Inference, API_Client
Repository togethercomputer/together-python
Last Updated 2026-02-15 16:00 GMT

Overview

Pattern for constructing JSONL input files containing batched inference requests for offline processing.

Description

Batch input preparation defines the schema for creating JSONL files where each line represents one API request. Each line contains a custom_id for tracking, the HTTP method (POST), the target endpoint URL (/v1/chat/completions or /v1/completions), and a body matching the corresponding API request format. This enables large-scale offline inference by collecting many requests into a single file that can be submitted as one batch job.

The input file must be valid JSONL (JSON Lines), meaning each line is an independent, self-contained JSON object terminated by a newline. The server processes each line as an individual inference request, matching it against the specified endpoint's request schema.

Usage

Use this principle when you need to process many inference requests asynchronously at reduced cost, rather than making individual real-time API calls. Batch inference is appropriate when:

  • You have a large number of prompts or conversations to process and do not require immediate responses.
  • You want to take advantage of reduced pricing for offline/batch workloads.
  • You need to track individual request results via custom identifiers.
  • You are running evaluation, data generation, or bulk classification pipelines.

Theoretical Basis

Batch processing amortizes API overhead across many requests. Instead of incurring per-request connection, authentication, and scheduling overhead, a batch job processes an entire file of requests in a single submission. This reduces total latency and cost for large workloads.

Each line in the JSONL input file follows this schema:

{"custom_id": "<unique-string>", "method": "POST", "url": "/v1/chat/completions", "body": { ... }}

Where:

  • custom_id (string) -- A unique identifier for tracking this request in the output. Must be unique within the file.
  • method (string) -- The HTTP method, always "POST".
  • url (string) -- The target endpoint. Must be one of the supported BatchEndpoint values: /v1/completions or /v1/chat/completions.
  • body (object) -- The request payload, conforming to the schema of the target endpoint (CompletionRequest for /v1/completions, ChatCompletionRequest for /v1/chat/completions).

The two supported endpoint types are:

  • /v1/chat/completions -- For chat-based models. The body must include a model and a messages array.
  • /v1/completions -- For text completion models. The body must include a model and a prompt string.

Related

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment