Principle: Togethercomputer_Together_python_Fine_Tuning_Job_Creation
| Attribute | Value |
|---|---|
| Principle Name | Fine_Tuning_Job_Creation |
| Overview | Mechanism for initiating a fine-tuning job on Together AI's infrastructure using uploaded training data. |
| Domain | MLOps, Fine_Tuning |
| Repository | togethercomputer/together-python |
| Last Updated | 2026-02-15 16:00 GMT |
Description
Fine-tuning job creation submits a training configuration to Together AI's servers, instructing the platform to train a model on the specified dataset. The configuration encompasses multiple dimensions:
Base Model and Starting Point
A job must specify either a base model (via `model`) or a previous checkpoint (via `from_checkpoint`) to continue training from, but not both. Additionally, a Hugging Face Hub model can be specified (via `from_hf_model`) as the starting weights, which requires that a compatible `model` also be provided.
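The starting-point rule can be sketched as a small validation helper (hypothetical code illustrating the constraint, not the SDK's actual implementation):

```python
def resolve_starting_point(model=None, from_checkpoint=None, from_hf_model=None):
    """Validate the starting-point rule (illustrative sketch).

    Exactly one of `model` or `from_checkpoint` must be given; a Hugging Face
    starting model additionally requires a compatible `model`.
    """
    if (model is None) == (from_checkpoint is None):
        raise ValueError("Specify exactly one of `model` or `from_checkpoint`")
    if from_hf_model is not None and model is None:
        raise ValueError("`from_hf_model` requires a compatible `model`")
    return from_checkpoint or model
```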
Training Method
Two training methods are supported:
- SFT (Supervised Fine-Tuning, default) -- Standard supervised learning on input-output pairs. Supports the `train_on_inputs` parameter, which controls whether user messages/prompts are masked during loss computation. When set to `"auto"`, masking is determined by the data format: conversational and instruction formats mask inputs, while general text format trains on all tokens.
- DPO (Direct Preference Optimization) -- Preference-based training using preferred and non-preferred outputs. Supports additional parameters: `dpo_beta` (regularization strength), `dpo_normalize_logratios_by_length` (length normalization), `rpo_alpha` (NLL loss inclusion), and `simpo_gamma` (SimPO variant, which enables reference-free training and length normalization).
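The `"auto"` behavior of `train_on_inputs` can be sketched as a simple lookup (the data-format names here are illustrative, not the SDK's internal enum):

```python
def resolve_train_on_inputs(train_on_inputs, data_format):
    """Resolve `train_on_inputs="auto"` by data format (illustrative sketch).

    Returns True when the loss is computed on input tokens as well,
    False when inputs are masked out of the loss.
    """
    if train_on_inputs != "auto":
        return bool(train_on_inputs)
    # Conversational and instruction data mask inputs;
    # general text trains on all tokens.
    masked_formats = {"conversational", "instruction"}
    return data_format not in masked_formats
```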
Hyperparameters
The job configuration includes:
- Training schedule: `n_epochs`, `batch_size` (integer or `"max"`), `learning_rate`, `warmup_ratio`.
- Learning rate scheduler: `lr_scheduler_type` (`"cosine"` or `"linear"`), `min_lr_ratio`, `scheduler_num_cycles` (for cosine).
- Regularization: `max_grad_norm`, `weight_decay`.
- Checkpointing: `n_checkpoints`, `n_evals`.
- Validation: `validation_file`, `n_evals`.
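How `learning_rate`, `warmup_ratio`, and `min_lr_ratio` interact under a cosine schedule can be illustrated with a standalone schedule function (a sketch of the common warmup-plus-cosine shape, not the platform's exact implementation):

```python
import math

def lr_at_step(step, total_steps, learning_rate,
               warmup_ratio=0.0, min_lr_ratio=0.0):
    """Warmup-then-cosine learning-rate schedule (illustrative sketch)."""
    warmup_steps = int(total_steps * warmup_ratio)
    min_lr = learning_rate * min_lr_ratio
    if warmup_steps and step < warmup_steps:
        return learning_rate * step / warmup_steps  # linear warmup from 0
    # Cosine decay from learning_rate down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1 + math.cos(math.pi * progress))
    return min_lr + (learning_rate - min_lr) * cosine
```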
LoRA Configuration
LoRA (Low-Rank Adaptation) is enabled by default (`lora=True`). When enabled:
- `lora_r` -- Rank of the LoRA adapters (defaults to the model's maximum rank).
- `lora_alpha` -- Scaling factor (defaults to `2 * lora_r`).
- `lora_dropout` -- Dropout rate (must be in the [0, 1) range).
- `lora_trainable_modules` -- Which modules to apply LoRA to (default: `"all-linear"`).
Full fine-tuning is used when `lora=False`, but it is not supported by all models.
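The defaulting rules above can be captured in a small helper (hypothetical; the real SDK resolves the model's maximum rank from the API):

```python
def resolve_lora_config(model_max_rank, lora_r=None,
                        lora_alpha=None, lora_dropout=0.0):
    """Apply the LoRA defaults described above (illustrative sketch)."""
    if not 0.0 <= lora_dropout < 1.0:
        raise ValueError("lora_dropout must be in the [0, 1) range")
    r = lora_r if lora_r is not None else model_max_rank  # default: max rank
    alpha = lora_alpha if lora_alpha is not None else 2 * r  # default: 2 * r
    return {
        "lora_r": r,
        "lora_alpha": alpha,
        "lora_dropout": lora_dropout,
        "lora_trainable_modules": "all-linear",
    }
```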
Integration Options
- Weights & Biases: `wandb_api_key`, `wandb_base_url`, `wandb_project_name`, `wandb_name` for logging training metrics.
- Hugging Face Hub: `from_hf_model`, `hf_model_revision`, `hf_api_token` for loading starting weights; `hf_output_repo_name` for pushing the fine-tuned model.
- Multimodal: `train_vision` for training the vision encoder in multimodal models.
Price Estimation
Before submitting the job, the SDK automatically estimates the training cost and warns if the estimated price significantly exceeds the user's available credits. This estimation is skipped when continuing from a checkpoint or using a Hugging Face model.
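The pre-submission check can be pictured as follows (the function, threshold, and dollar figures are hypothetical; the SDK queries real pricing and credit balances):

```python
def check_estimated_price(estimated_price, available_credits, threshold=1.0):
    """Return a warning string when the estimated training cost exceeds
    available credits, else None (illustrative sketch)."""
    if estimated_price > available_credits * threshold:
        return (f"Warning: estimated cost ${estimated_price:.2f} exceeds "
                f"available credits ${available_credits:.2f}")
    return None
```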
Usage
Use this after uploading a training file. The typical workflow is:
- Upload the training file via `client.files.upload()` to obtain a file ID.
- Call `client.fine_tuning.create(training_file=file_id, model="model-name", ...)`.
- Use the returned job ID to monitor progress via `client.fine_tuning.retrieve()` and `client.fine_tuning.list_events()`.
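Put together, the workflow looks roughly like this (a sketch assuming the `together` package is installed and `TOGETHER_API_KEY` is set; `"model-name"` is a placeholder, the hyperparameter values are arbitrary, and the calls are wrapped in a function so nothing runs at import time):

```python
def run_fine_tuning_workflow(train_path):
    """End-to-end sketch: upload, create, monitor (not executed here)."""
    from together import Together  # assumes `pip install together`

    client = Together()

    # 1. Upload the training file to obtain a file ID.
    train_file = client.files.upload(file=train_path)

    # 2. Submit the fine-tuning job against the uploaded file.
    job = client.fine_tuning.create(
        training_file=train_file.id,
        model="model-name",   # placeholder base model
        n_epochs=3,           # arbitrary example hyperparameters
        learning_rate=1e-5,
        lora=True,
    )

    # 3. Monitor progress with the returned job ID.
    status = client.fine_tuning.retrieve(job.id)
    events = client.fine_tuning.list_events(job.id)
    return status, events
```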
The method automatically fetches model training limits from the API to validate hyperparameters against model-specific constraints (batch size limits, LoRA support, vision support).
Theoretical Basis
Fine-tuning adapts a pre-trained language model to a specific task by continuing the training process on a targeted dataset. The two supported methods represent different optimization objectives:
- SFT minimizes the standard cross-entropy loss between the model's predictions and the target tokens, optionally masking input tokens to focus the loss on output generation.
- DPO directly optimizes a preference model from pairs of preferred and non-preferred outputs, without requiring a separate reward model. The `dpo_beta` parameter controls the strength of the KL divergence constraint from the reference model.
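The DPO objective can be written out numerically for a single preference pair (a minimal sketch of the standard DPO loss; the log-probabilities in the test are made up):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, dpo_beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch)."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = dpo_beta * (chosen_logratio - rejected_logratio)
    # -log sigmoid(margin): small when the policy prefers the chosen output
    # relative to the reference model, large when it prefers the rejected one.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Larger `dpo_beta` amplifies the margin, penalizing deviation from the reference model more sharply.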
LoRA reduces the number of trainable parameters by decomposing weight updates into low-rank matrices, enabling fine-tuning of large models with significantly less GPU memory and compute. The rank (`lora_r`) determines the expressiveness of the adaptation, while `lora_alpha` scales the magnitude of the updates.
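The savings can be made concrete with a parameter count (the 4096x4096 layer size and rank 16 are illustrative):

```python
def lora_trainable_params(d_out, d_in, lora_r):
    """Trainable parameters for one LoRA-adapted linear layer (sketch).

    The frozen weight is d_out x d_in; its update is decomposed as
    B (d_out x r) @ A (r x d_in), so only r * (d_out + d_in) params train.
    """
    return lora_r * (d_out + d_in)

full = 4096 * 4096                                     # full fine-tuning: 16,777,216
low_rank = lora_trainable_params(4096, 4096, lora_r=16)  # LoRA r=16: 131,072
```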
Related Pages
- Implementation:Togethercomputer_Together_python_FineTuning_Create
- Principle:Togethercomputer_Together_python_File_Upload
- Principle:Togethercomputer_Together_python_Fine_Tuning_Job_Monitoring
- Principle:Togethercomputer_Together_python_Model_Download
- Heuristic:Togethercomputer_Together_python_Fine_Tuning_Parameter_Validation