Workflow: NeuML txtai Model Training
| Knowledge Sources | |
|---|---|
| Domains | Model_Training, Fine_Tuning, ONNX_Export |
| Last Updated | 2026-02-09 18:00 GMT |
Overview
End-to-end process for fine-tuning Hugging Face Transformer models on custom datasets and optionally exporting them to ONNX format for optimized inference using txtai's training pipelines.
Description
This workflow covers the training and model export capabilities in txtai. The HFTrainer pipeline wraps the Hugging Face Trainer framework to fine-tune models for text classification, question answering, sequence-to-sequence tasks, language modeling, and causal language modeling. It supports quantization via BitsAndBytes and parameter-efficient fine-tuning via LoRA/PEFT. After training, models can be exported to ONNX format using the HFOnnx pipeline for faster inference, or scikit-learn models can be exported via MLOnnx. The training pipeline handles data tokenization, training loop execution, evaluation metrics, checkpointing, and model saving.
Usage
Execute this workflow when you need to adapt a pre-trained transformer model to a domain-specific task. This includes training text classifiers on custom label sets, fine-tuning question-answering models on domain-specific corpora, training sequence-to-sequence models for custom generation tasks, or creating optimized ONNX models for production deployment with reduced latency.
Execution Steps
Step 1: Prepare Training Data
Format the training dataset according to the target task type. For text classification, provide text-label pairs. For question answering, provide question-context-answer triples. For sequence-to-sequence tasks, provide source-target text pairs. For language modeling, provide raw text. Data can be provided as lists, Hugging Face datasets, or file paths.
Key considerations:
- The columns parameter maps dataset columns to the expected format (text, context, label)
- Validation data is optional but recommended for monitoring training quality
- Data is automatically tokenized using the base model's tokenizer
- Maximum sequence length (maxlength) should match the base model's capacity
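As a sketch of these formats, the snippet below shows example records for each task type as plain Python lists of dicts. The field names follow txtai's documented defaults (`text`/`label`, `question`/`context`/`answers`); datasets with other column names can be remapped via the `columns` parameter. The values themselves are illustrative placeholders.

```python
# Text classification: text-label pairs
classification = [
    {"text": "great service, would buy again", "label": 1},
    {"text": "terrible experience", "label": 0},
]

# Question answering: question-context-answer triples
qa = [
    {
        "question": "What ingredient is added?",
        "context": "Pour in one can of whole tomatoes and simmer.",
        "answers": "tomatoes",
    },
]

# Sequence-to-sequence: source-target text pairs
# (field names here are illustrative; map them with the columns parameter)
seq2seq = [
    {"source": "Hello, how are you?", "target": "greeting"},
]

# Language modeling: raw text
lm = [
    {"text": "The quick brown fox jumps over the lazy dog."},
]
```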
Step 2: Configure the Base Model
Select a pre-trained model from Hugging Face Hub as the starting point. The task parameter determines which model architecture to load (text-classification, question-answering, sequence-sequence, language-generation, etc.). Optionally configure quantization for memory-efficient training on consumer hardware.
Key considerations:
- The task parameter must match the training data format and intended use
- Quantization (via BitsAndBytes) enables training larger models on limited GPU memory
- LoRA configuration enables parameter-efficient fine-tuning that trains only small adapter matrices
- A pre-existing (model, tokenizer) tuple can be passed instead of a model path
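The configuration dictionaries below sketch typical quantization and LoRA settings. The assumption here is that these mirror the fields of `transformers.BitsAndBytesConfig` and `peft.LoraConfig`, which is how recent txtai releases document the `quantize` and `lora` keyword arguments; the specific values are illustrative starting points, not tuned recommendations.

```python
# BitsAndBytes-style quantization settings for memory-efficient training
quantize = {
    "load_in_4bit": True,               # load weights in 4-bit precision
    "bnb_4bit_use_double_quant": True,  # nested (double) quantization
    "bnb_4bit_quant_type": "nf4",       # normal-float 4-bit data type
}

# LoRA settings: train only small adapter matrices, freeze the base model
lora = {
    "r": 16,                          # rank of the adapter matrices
    "lora_alpha": 8,                  # scaling factor for adapter updates
    "lora_dropout": 0.05,             # dropout applied to adapter layers
    "target_modules": "all-linear",   # attach adapters to all linear layers
}
```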
Step 3: Configure Training Arguments
Set the training hyperparameters: learning rate, batch size, number of epochs, output directory, evaluation strategy, and other Hugging Face TrainingArguments. These are passed as keyword arguments to the trainer call.
Key considerations:
- Key parameters include learning_rate, per_device_train_batch_size, num_train_epochs, and output_dir
- Gradient accumulation steps can simulate larger batch sizes on limited memory
- Checkpoint resumption is supported via the checkpoint parameter
- Custom evaluation metrics can be provided via the metrics parameter
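A representative set of hyperparameters is sketched below as the keyword arguments that would be passed through to Hugging Face `TrainingArguments`. The values are illustrative defaults, and the exact evaluation-strategy key name varies slightly across transformers versions.

```python
training_args = {
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 4,  # effective batch size of 32
    "num_train_epochs": 3,
    "output_dir": "model-output",      # checkpoints and final model land here
    "logging_steps": 50,
}
```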
Step 4: Execute Training
Call the HFTrainer pipeline with the base model, training data, and configuration. The trainer handles the full training loop: data tokenization, forward/backward passes, gradient updates, evaluation, checkpointing, and final model saving. The trained model and tokenizer are returned as a tuple.
Key considerations:
- Training automatically runs on available GPUs; device placement is handled for you
- The returned tuple contains (model, tokenizer) ready for inference or export
- Training progress and metrics are logged during execution
- The model is saved to the specified output directory
Step 5: Export to ONNX (Optional)
Convert the trained model to ONNX format using the HFOnnx pipeline for optimized inference. ONNX models run on ONNX Runtime, which applies hardware-specific optimizations on CPU and GPU and supports quantized execution. The export supports various task types including text classification, question answering, and sequence-to-sequence models.
Key considerations:
- ONNX export accepts a model path or a (model, tokenizer) tuple
- The task parameter must match the model's task type for correct export
- Quantization during export further reduces model size and inference latency
- Exported models can be registered with txtai's custom ONNX model classes for seamless integration