
Principle:Togethercomputer Together python File Upload

From Leeroopedia
Attribute        Value
Principle Name   File_Upload
Overview         Mechanism for uploading local files to Together AI's cloud storage for use in fine-tuning and batch operations.
Domain           MLOps, Fine_Tuning, Data_Preparation
Repository       togethercomputer/together-python
Last Updated     2026-02-15 16:00 GMT

Description

File upload handles transferring local dataset files to Together AI's servers where they are stored and made available for fine-tuning job creation. The upload mechanism incorporates several design decisions:

  • Automatic validation -- By default (check=True), the upload method runs the check_file() validation pipeline before initiating any network transfer. If validation fails, a FileTypeError is raised and no data is sent. This prevents wasting bandwidth on files that would be rejected server-side.
  • Adaptive upload strategy -- The SDK automatically routes between two upload paths based on file size:
    • Single-request upload (via UploadManager): Used for files smaller than the multipart threshold (5.0 GB). The entire file is sent in a single HTTP request with redirect handling.
    • Multipart concurrent upload (via MultipartUploadManager): Used for files exceeding the 5.0 GB threshold. The file is split into parts (target 250 MB each, minimum 5 MB per S3 requirements) and uploaded concurrently (up to 4 concurrent parts) with a per-part timeout of 300 seconds.
  • Purpose tagging -- Each uploaded file is tagged with a purpose (default: "fine-tune") that determines how the Together API processes and validates the file on the server side.
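The size-based routing described above can be sketched as follows. The constant names mirror the values stated in this section, but the function itself is illustrative, not the SDK's actual code:

```python
# Illustrative sketch of the size-based upload routing described above.
# The constants come from the description; the function is hypothetical,
# not the SDK's actual implementation.

MULTIPART_THRESHOLD_GB = 5.0
NUM_BYTES_IN_GB = 1024 ** 3

def choose_upload_path(file_size_bytes: int) -> str:
    """Return which upload manager would handle a file of this size."""
    if file_size_bytes < MULTIPART_THRESHOLD_GB * NUM_BYTES_IN_GB:
        return "UploadManager"           # single HTTP request with redirect handling
    return "MultipartUploadManager"      # concurrent multipart upload
```

For example, a 100 MB dataset would take the single-request path, while a 6 GB dataset would be routed to the multipart manager.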

The upload returns a FileResponse object containing a unique file ID that is subsequently used when creating fine-tuning jobs.

Usage

Use this principle after preparing (and optionally validating) a dataset to make it available for fine-tuning job creation. The typical workflow is:

  1. Prepare the dataset file locally (see Dataset_Preparation principle).
  2. Upload via client.files.upload(file_path).
  3. Use the returned FileResponse.id as the training_file parameter in client.fine_tuning.create().
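The three steps above can be sketched as a small helper. The client is passed in as a parameter so the sketch stays self-contained; the method names (`files.upload`, `fine_tuning.create`) and the `training_file` parameter follow the together-python API as described on this page, but treat the exact signatures as assumptions:

```python
# Sketch of the upload -> fine-tune workflow described above. `client` is
# expected to expose files.upload() and fine_tuning.create() as in the
# together-python SDK; exact signatures are assumptions, not verified here.

def upload_and_create_job(client, file_path: str, model: str):
    # Step 2: upload the prepared dataset (check=True validates before transfer)
    file_response = client.files.upload(file=file_path, check=True)
    # Step 3: pass the returned file ID as the training file
    return client.fine_tuning.create(
        training_file=file_response.id,
        model=model,
    )
```

In a real session, `client` would be a `together.Together()` instance and `file_path` a local JSONL dataset prepared per the Dataset_Preparation principle.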

The upload can be configured with:

  • check=True (default) to enable pre-upload validation.
  • check=False to skip validation (useful if you have already validated the file separately or are uploading for a non-fine-tune purpose).
  • purpose parameter to specify the file's intended use (defaults to "fine-tune").
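The effect of the check flag can be sketched as a validation gate: with check=True, validation runs locally before any bytes are transferred, and a failure aborts the upload. FileTypeError is the exception named in the Description; everything else below (the helper name, the report shape) is illustrative:

```python
# Hypothetical sketch of the pre-upload validation gate described above.
# With check=True, validation runs before any network transfer; a failed
# check raises FileTypeError and no data is sent. The helper name and the
# report dictionary shape are assumptions for illustration.

class FileTypeError(Exception):
    """Raised when pre-upload validation rejects the file."""

def guarded_upload(check_file, do_transfer, path: str, check: bool = True):
    if check:
        report = check_file(path)           # local validation, no network I/O
        if not report.get("is_check_passed", False):
            raise FileTypeError(f"validation failed for {path!r}")
    return do_transfer(path)                # only reached after passing checks
```

Passing check=False skips the gate entirely, which matches the documented use case of uploading already-validated files or files intended for a non-fine-tune purpose.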

Theoretical Basis

The adaptive upload strategy follows the common cloud storage pattern of using multipart uploads for large files. This approach provides several benefits:

  • Resilience -- If a single part fails during upload, only that part needs to be retried rather than the entire file.
  • Throughput -- Concurrent part uploads can saturate available bandwidth more effectively than a single sequential upload.
  • S3 compatibility -- The multipart upload follows Amazon S3 conventions (minimum 5 MB part size, maximum 250 parts) since Together's storage layer uses S3-compatible object storage.

The threshold of 5.0 GB is defined in src/together/constants.py as MULTIPART_THRESHOLD_GB. Files below this threshold use a simpler single-request path to avoid the overhead of multipart coordination.
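The interaction of the stated constraints (250 MB target parts, 5 MB S3 minimum, 250-part maximum) can be worked through in a short sketch. The logic below is illustrative of the arithmetic, not the SDK's actual algorithm:

```python
import math

# Illustrative part-sizing under the constraints stated above: aim for
# 250 MB parts, never go below S3's 5 MB minimum, and never exceed 250
# parts. This mirrors the description, not the SDK's actual code.

MB = 1024 ** 2
TARGET_PART_SIZE = 250 * MB
MIN_PART_SIZE = 5 * MB
MAX_PARTS = 250

def plan_parts(file_size: int) -> tuple[int, int]:
    """Return (part_size, num_parts) for a multipart upload."""
    part_size = max(TARGET_PART_SIZE, MIN_PART_SIZE)
    num_parts = math.ceil(file_size / part_size)
    if num_parts > MAX_PARTS:
        # Grow the part size so the file fits within the part-count cap.
        part_size = math.ceil(file_size / MAX_PARTS)
        num_parts = math.ceil(file_size / part_size)
    return part_size, num_parts
```

A 6 GB file, for instance, splits into 25 parts of 250 MB at the target size, well within the 250-part cap; only very large files force the part size above the target.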

Related Pages
