Workflow: NeuML txtai Model Training
| Knowledge Sources | |
|---|---|
| Domains | Model_Training, Fine_Tuning, ONNX_Export |
| Last Updated | 2026-02-09 18:00 GMT |
Overview
End-to-end process for fine-tuning Hugging Face Transformer models on custom datasets and optionally exporting them to ONNX format for optimized inference using txtai's training pipelines.
Description
This workflow covers the training and model export capabilities in txtai. The HFTrainer pipeline wraps the Hugging Face Trainer framework to fine-tune models for text classification, question answering, sequence-to-sequence tasks, language modeling, and causal language modeling. It supports quantization via BitsAndBytes and parameter-efficient fine-tuning via LoRA/PEFT. After training, models can be exported to ONNX format using the HFOnnx pipeline for faster inference, or scikit-learn models can be exported via MLOnnx. The training pipeline handles data tokenization, training loop execution, evaluation metrics, checkpointing, and model saving.
Usage
Execute this workflow when you need to adapt a pre-trained transformer model to a domain-specific task. This includes training text classifiers on custom label sets, fine-tuning question-answering models on domain-specific corpora, training sequence-to-sequence models for custom generation tasks, or creating optimized ONNX models for production deployment with reduced latency.
Execution Steps
Step 1: Prepare Training Data
Format the training dataset according to the target task type. For text classification, provide text-label pairs. For question answering, provide question-context-answer triples. For sequence-to-sequence tasks, provide source-target text pairs. For language modeling, provide raw text. Data can be provided as lists, Hugging Face datasets, or file paths.
Key considerations:
- The columns parameter maps dataset columns to the expected format (text, context, label)
- Validation data is optional but recommended for monitoring training quality
- Data is automatically tokenized using the base model's tokenizer
- Maximum sequence length (maxlength) should match the base model's capacity
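As a sketch of these formats, the snippet below shows example records for each task type as plain Python lists of dicts. The field names follow txtai's documented defaults (`text`/`label`, `question`/`context`/`answers`); datasets with other column names can be remapped via the `columns` parameter. The values themselves are illustrative placeholders.

```python
# Text classification: text-label pairs
classification = [
    {"text": "great service, would buy again", "label": 1},
    {"text": "terrible experience", "label": 0},
]

# Question answering: question-context-answer triples
qa = [
    {
        "question": "What ingredient is added?",
        "context": "Pour in one can of whole tomatoes and simmer.",
        "answers": "tomatoes",
    },
]

# Sequence-to-sequence: source-target text pairs
# (field names here are illustrative; map them with the columns parameter)
seq2seq = [
    {"source": "Hello, how are you?", "target": "greeting"},
]

# Language modeling: raw text
lm = [
    {"text": "The quick brown fox jumps over the lazy dog."},
]
```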
Step 2: Configure the Base Model
Select a pre-trained model from Hugging Face Hub as the starting point. The task parameter determines which model architecture to load (text-classification, question-answering, sequence-sequence, language-generation, etc.). Optionally configure quantization for memory-efficient training on consumer hardware.
Key considerations:
- The task parameter must match the training data format and intended use
- Quantization (via BitsAndBytes) enables training larger models on limited GPU memory
- LoRA configuration enables parameter-efficient fine-tuning that trains only small adapter matrices
- A pre-existing (model, tokenizer) tuple can be passed instead of a model path
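The configuration dictionaries below sketch typical quantization and LoRA settings. The assumption here is that these mirror the fields of `transformers.BitsAndBytesConfig` and `peft.LoraConfig`, which is how recent txtai releases document the `quantize` and `lora` keyword arguments; the specific values are illustrative starting points, not tuned recommendations.

```python
# BitsAndBytes-style quantization settings for memory-efficient training
quantize = {
    "load_in_4bit": True,               # load weights in 4-bit precision
    "bnb_4bit_use_double_quant": True,  # nested (double) quantization
    "bnb_4bit_quant_type": "nf4",       # normal-float 4-bit data type
}

# LoRA settings: train only small adapter matrices, freeze the base model
lora = {
    "r": 16,                          # rank of the adapter matrices
    "lora_alpha": 8,                  # scaling factor for adapter updates
    "lora_dropout": 0.05,             # dropout applied to adapter layers
    "target_modules": "all-linear",   # attach adapters to all linear layers
}
```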
Step 3: Configure Training Arguments
Set the training hyperparameters: learning rate, batch size, number of epochs, output directory, evaluation strategy, and other Hugging Face TrainingArguments. These are passed as keyword arguments to the trainer call.
Key considerations:
- Key parameters include learning_rate, per_device_train_batch_size, num_train_epochs, and output_dir
- Gradient accumulation steps can simulate larger batch sizes on limited memory
- Checkpoint resumption is supported via the checkpoint parameter
- Custom evaluation metrics can be provided via the metrics parameter
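A representative set of hyperparameters is sketched below as the keyword arguments that would be passed through to Hugging Face `TrainingArguments`. The values are illustrative defaults, and the exact evaluation-strategy key name varies slightly across transformers versions.

```python
training_args = {
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 4,  # effective batch size of 32
    "num_train_epochs": 3,
    "output_dir": "model-output",      # checkpoints and final model land here
    "logging_steps": 50,
}
```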
Step 4: Execute Training
Call the HFTrainer pipeline with the base model, training data, and configuration. The trainer handles the full training loop: data tokenization, forward/backward passes, gradient updates, evaluation, checkpointing, and final model saving. The trained model and tokenizer are returned as a tuple.
Key considerations:
- Training automatically runs on available GPUs; device placement is handled for you
- The returned tuple contains (model, tokenizer) ready for inference or export
- Training progress and metrics are logged during execution
- The model is saved to the specified output directory
Step 5: Export to ONNX (Optional)
Convert the trained model to ONNX format using the HFOnnx pipeline for optimized inference. ONNX models run on ONNX Runtime, which applies hardware-specific optimizations on CPU and GPU and supports quantized execution. The export supports various task types including text classification, question answering, and sequence-to-sequence models.
Key considerations:
- ONNX export accepts a model path or a (model, tokenizer) tuple
- The task parameter must match the model's task type for correct export
- Quantization during export further reduces model size and inference latency
- Exported models can be registered with txtai's custom ONNX model classes for seamless integration