
Principle:Togethercomputer Together python File Upload

From Leeroopedia
Attribute        Value
Principle Name   File_Upload
Overview         Mechanism for uploading local files to Together AI's cloud storage for use in fine-tuning and batch operations.
Domain           MLOps, Fine_Tuning, Data_Preparation
Repository       togethercomputer/together-python
Last Updated     2026-02-15 16:00 GMT

Description

File upload handles transferring local dataset files to Together AI's servers where they are stored and made available for fine-tuning job creation. The upload mechanism incorporates several design decisions:

  • Automatic validation -- By default (check=True), the upload method runs the check_file() validation pipeline before initiating any network transfer. If validation fails, a FileTypeError is raised and no data is sent. This prevents wasting bandwidth on files that would be rejected server-side.
  • Adaptive upload strategy -- The SDK automatically routes between two upload paths based on file size:
    • Single-request upload (via UploadManager): Used for files smaller than the multipart threshold (5.0 GB). The entire file is sent in a single HTTP request with redirect handling.
    • Multipart concurrent upload (via MultipartUploadManager): Used for files exceeding the 5.0 GB threshold. The file is split into parts (target 250 MB each, minimum 5 MB per S3 requirements) and uploaded concurrently (up to 4 concurrent parts) with a per-part timeout of 300 seconds.
  • Purpose tagging -- Each uploaded file is tagged with a purpose (default: "fine-tune") that determines how the Together API processes and validates the file on the server side.
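The size-based routing described above can be sketched as follows. The constant names mirror the values stated in this section, but the function itself is illustrative, not the SDK's actual code:

```python
# Illustrative sketch of the size-based upload routing described above.
# The constants come from the description; the function is hypothetical,
# not the SDK's actual implementation.

MULTIPART_THRESHOLD_GB = 5.0
NUM_BYTES_IN_GB = 1024 ** 3

def choose_upload_path(file_size_bytes: int) -> str:
    """Return which upload manager would handle a file of this size."""
    if file_size_bytes < MULTIPART_THRESHOLD_GB * NUM_BYTES_IN_GB:
        return "UploadManager"           # single HTTP request with redirect handling
    return "MultipartUploadManager"      # concurrent multipart upload
```

For example, a 100 MB dataset would take the single-request path, while a 6 GB dataset would be routed to the multipart manager.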

The upload returns a FileResponse object containing a unique file ID that is subsequently used when creating fine-tuning jobs.

Usage

Use this principle after preparing (and optionally validating) a dataset to make it available for fine-tuning job creation. The typical workflow is:

  1. Prepare the dataset file locally (see Dataset_Preparation principle).
  2. Upload via client.files.upload(file_path).
  3. Use the returned FileResponse.id as the training_file parameter in client.fine_tuning.create().
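The three steps above can be sketched as a small helper. The client is passed in as a parameter so the sketch stays self-contained; the method names (`files.upload`, `fine_tuning.create`) and the `training_file` parameter follow the together-python API as described on this page, but treat the exact signatures as assumptions:

```python
# Sketch of the upload -> fine-tune workflow described above. `client` is
# expected to expose files.upload() and fine_tuning.create() as in the
# together-python SDK; exact signatures are assumptions, not verified here.

def upload_and_create_job(client, file_path: str, model: str):
    # Step 2: upload the prepared dataset (check=True validates before transfer)
    file_response = client.files.upload(file=file_path, check=True)
    # Step 3: pass the returned file ID as the training file
    return client.fine_tuning.create(
        training_file=file_response.id,
        model=model,
    )
```

In a real session, `client` would be a `together.Together()` instance and `file_path` a local JSONL dataset prepared per the Dataset_Preparation principle.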

The upload can be configured with:

  • check=True (default) to enable pre-upload validation.
  • check=False to skip validation (useful if you have already validated the file separately or are uploading for a non-fine-tune purpose).
  • purpose parameter to specify the file's intended use (defaults to "fine-tune").
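The effect of the check flag can be sketched as a validation gate: with check=True, validation runs locally before any bytes are transferred, and a failure aborts the upload. FileTypeError is the exception named in the Description; everything else below (the helper name, the report shape) is illustrative:

```python
# Hypothetical sketch of the pre-upload validation gate described above.
# With check=True, validation runs before any network transfer; a failed
# check raises FileTypeError and no data is sent. The helper name and the
# report dictionary shape are assumptions for illustration.

class FileTypeError(Exception):
    """Raised when pre-upload validation rejects the file."""

def guarded_upload(check_file, do_transfer, path: str, check: bool = True):
    if check:
        report = check_file(path)           # local validation, no network I/O
        if not report.get("is_check_passed", False):
            raise FileTypeError(f"validation failed for {path!r}")
    return do_transfer(path)                # only reached after passing checks
```

Passing check=False skips the gate entirely, which matches the documented use case of uploading already-validated files or files intended for a non-fine-tune purpose.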

Theoretical Basis

The adaptive upload strategy follows the common cloud storage pattern of using multipart uploads for large files. This approach provides several benefits:

  • Resilience -- If a single part fails during upload, only that part needs to be retried rather than the entire file.
  • Throughput -- Concurrent part uploads can saturate available bandwidth more effectively than a single sequential upload.
  • S3 compatibility -- The multipart upload follows Amazon S3 conventions (minimum 5 MB part size, maximum 250 parts) since Together's storage layer uses S3-compatible object storage.

The threshold of 5.0 GB is defined in src/together/constants.py as MULTIPART_THRESHOLD_GB. Files below this threshold use a simpler single-request path to avoid the overhead of multipart coordination.
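The interaction of the stated constraints (250 MB target parts, 5 MB S3 minimum, 250-part maximum) can be worked through in a short sketch. The logic below is illustrative of the arithmetic, not the SDK's actual algorithm:

```python
import math

# Illustrative part-sizing under the constraints stated above: aim for
# 250 MB parts, never go below S3's 5 MB minimum, and never exceed 250
# parts. This mirrors the description, not the SDK's actual code.

MB = 1024 ** 2
TARGET_PART_SIZE = 250 * MB
MIN_PART_SIZE = 5 * MB
MAX_PARTS = 250

def plan_parts(file_size: int) -> tuple[int, int]:
    """Return (part_size, num_parts) for a multipart upload."""
    part_size = max(TARGET_PART_SIZE, MIN_PART_SIZE)
    num_parts = math.ceil(file_size / part_size)
    if num_parts > MAX_PARTS:
        # Grow the part size so the file fits within the part-count cap.
        part_size = math.ceil(file_size / MAX_PARTS)
        num_parts = math.ceil(file_size / part_size)
    return part_size, num_parts
```

A 6 GB file, for instance, splits into 25 parts of 250 MB at the target size, well within the 250-part cap; only very large files force the part size above the target.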

Related Pages
