Principle: File Upload (togethercomputer/together-python)
| Attribute | Value |
|---|---|
| Principle Name | File_Upload |
| Overview | Mechanism for uploading local files to Together AI's cloud storage for use in fine-tuning and batch operations. |
| Domain | MLOps, Fine_Tuning, Data_Preparation |
| Repository | togethercomputer/together-python |
| Last Updated | 2026-02-15 16:00 GMT |
Description
File upload handles transferring local dataset files to Together AI's servers where they are stored and made available for fine-tuning job creation. The upload mechanism incorporates several design decisions:
- Automatic validation -- By default (`check=True`), the upload method runs the `check_file()` validation pipeline before initiating any network transfer. If validation fails, a `FileTypeError` is raised and no data is sent. This prevents wasting bandwidth on files that would be rejected server-side.
- Adaptive upload strategy -- The SDK automatically routes between two upload paths based on file size:
  - Single-request upload (via `UploadManager`): used for files smaller than the multipart threshold (5.0 GB). The entire file is sent in a single HTTP request with redirect handling.
  - Multipart concurrent upload (via `MultipartUploadManager`): used for files exceeding the 5.0 GB threshold. The file is split into parts (target 250 MB each, minimum 5 MB per S3 requirements) and uploaded concurrently (up to 4 parts at a time) with a per-part timeout of 300 seconds.
- Purpose tagging -- Each uploaded file is tagged with a purpose (default: `"fine-tune"`) that determines how the Together API processes and validates the file on the server side.
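The size-based routing described above can be sketched in a few lines. The function name and the dispatch shape here are illustrative (the real decision happens inside the SDK's upload code); the 5.0 GB constant mirrors the threshold stated in this document:

```python
MULTIPART_THRESHOLD_GB = 5.0  # assumed to match the SDK constant described in this document

def choose_upload_path(file_size_bytes: int) -> str:
    """Return which upload path a file of this size would take.

    Simplified sketch of the SDK's internal routing: files at or below
    the threshold go through the single-request path, larger files
    through the multipart path.
    """
    threshold_bytes = int(MULTIPART_THRESHOLD_GB * 1024**3)
    return "multipart" if file_size_bytes > threshold_bytes else "single"

small = choose_upload_path(100 * 1024**2)  # a 100 MB file
large = choose_upload_path(8 * 1024**3)    # an 8 GB file
```

Keeping the threshold in one named constant means the routing policy can change without touching either upload implementation.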
The upload returns a `FileResponse` object containing a unique file ID that is subsequently used when creating fine-tuning jobs.
Usage
Use this principle after preparing (and optionally validating) a dataset to make it available for fine-tuning job creation. The typical workflow is:
- Prepare the dataset file locally (see Dataset_Preparation principle).
- Upload via `client.files.upload(file_path)`.
- Use the returned `FileResponse.id` as the `training_file` parameter in `client.fine_tuning.create()`.
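The steps above can be sketched end to end. To keep the example runnable offline, stub classes stand in for the real client; the attribute and method names (`files.upload`, `fine_tuning.create`, `FileResponse.id`) mirror the workflow described above, while the stub bodies and returned values are invented for illustration:

```python
from dataclasses import dataclass

@dataclass
class FileResponse:
    # Illustrative stand-in; the real SDK response carries more fields,
    # but the id is what links upload to fine-tuning.
    id: str
    filename: str

class StubFiles:
    """Stand-in for client.files; the real method performs the network upload."""
    def upload(self, file: str, purpose: str = "fine-tune", check: bool = True) -> FileResponse:
        # Real SDK: validates locally when check=True, then uploads and
        # returns a server-assigned file ID.
        return FileResponse(id="file-abc123", filename=file)

class StubFineTuning:
    """Stand-in for client.fine_tuning."""
    def create(self, training_file: str, model: str) -> dict:
        # Real SDK: creates a fine-tuning job referencing the uploaded file by ID.
        return {"training_file": training_file, "model": model, "status": "pending"}

class StubClient:
    files = StubFiles()
    fine_tuning = StubFineTuning()

client = StubClient()
resp = client.files.upload(file="dataset.jsonl", purpose="fine-tune")
job = client.fine_tuning.create(training_file=resp.id, model="example-base-model")
```

The file ID returned by the upload is the only handoff between the two calls, which is why the upload step can be performed once and reused across multiple jobs.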
The upload can be configured with:
- `check=True` (default) to enable pre-upload validation.
- `check=False` to skip validation (useful if you have already validated the file separately or are uploading for a non-fine-tune purpose).
- `purpose` to specify the file's intended use (defaults to `"fine-tune"`).
Theoretical Basis
The adaptive upload strategy follows the common cloud storage pattern of using multipart uploads for large files. This approach provides several benefits:
- Resilience -- If a single part fails during upload, only that part needs to be retried rather than the entire file.
- Throughput -- Concurrent part uploads can saturate available bandwidth more effectively than a single sequential upload.
- S3 compatibility -- The multipart upload follows Amazon S3 conventions (minimum 5 MB part size; uploads are additionally capped at 250 parts) since Together's storage layer uses S3-compatible object storage.
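The part-sizing constraints above (250 MB target, 250-part cap, 5 MB minimum) can be sketched as a small planning function. This is an illustrative sketch, not the SDK's actual helper, which may round or name things differently:

```python
TARGET_PART_SIZE = 250 * 1024 * 1024  # 250 MB target per part, per the text above
MAX_PARTS = 250                       # part-count cap noted above

def plan_parts(file_size: int) -> tuple[int, int]:
    """Return (part_size, num_parts) honoring the target size and part cap.

    S3's 5 MB minimum applies to every part except the last, which the
    250 MB target satisfies by a wide margin.
    """
    part_size = TARGET_PART_SIZE
    if file_size > part_size * MAX_PARTS:
        # Too large for 250 parts of 250 MB: grow the part size instead.
        part_size = -(-file_size // MAX_PARTS)  # ceiling division
    num_parts = max(1, -(-file_size // part_size))
    return part_size, num_parts

size, n = plan_parts(6 * 1024**3)  # a 6 GB file
```

Growing the part size (rather than failing) when the cap would be exceeded keeps arbitrarily large files uploadable within the fixed part budget.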
The threshold of 5.0 GB is defined in `src/together/constants.py` as `MULTIPART_THRESHOLD_GB`. Files below this threshold use a simpler single-request path to avoid the overhead of multipart coordination.
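The concurrent-part pattern described above (up to 4 parts in flight, 300-second per-part timeout) can be sketched with a bounded thread pool. Here `upload_part` is a trivial stand-in for the real per-part HTTP request; the constants echo the figures quoted in this document:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_CONCURRENT_PARTS = 4  # concurrency limit described above
PART_TIMEOUT_S = 300      # per-part timeout described above

def upload_part(part_number: int, data: bytes) -> int:
    # Placeholder for the real per-part HTTP PUT; here it simply succeeds.
    return part_number

def upload_all_parts(parts: dict[int, bytes]) -> list[int]:
    """Upload parts with bounded concurrency.

    Because each part is an independent unit of work, a failed part can
    be retried alone rather than restarting the whole file.
    """
    completed = []
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_PARTS) as pool:
        futures = [pool.submit(upload_part, n, data) for n, data in parts.items()]
        for fut in as_completed(futures):
            completed.append(fut.result(timeout=PART_TIMEOUT_S))  # per-part wait
    return sorted(completed)

uploaded = upload_all_parts({1: b"aaa", 2: b"bbb", 3: b"ccc"})
```

Bounding the pool at 4 workers is the balance point the resilience and throughput bullets describe: enough parallelism to fill the pipe, few enough in-flight parts to keep memory and retry cost per failure small.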