Implementation:Neuml Txtai HFTrainer Parse

From Leeroopedia


Knowledge Sources
Domains: Training, NLP
Last Updated: 2026-02-09 00:00 GMT

Overview

HFTrainer.parse() is the txtai method that builds the training hyperparameter configuration. It wraps HuggingFace TrainingArguments and fills in defaults suited to txtai's typical fine-tuning workflow, where a model is trained and used in the same process.

Description

HFTrainer.parse() merges a dictionary of user-provided training argument overrides with a set of txtai-specific defaults, then returns a TrainingArguments instance. The defaults are designed for ephemeral, in-memory training runs where the model is used immediately rather than saved to disk.

txtai also defines a custom TrainingArguments subclass (at lines 384-399 of the same file) that extends the standard HuggingFace class. This subclass overrides the should_save property to return False when output_dir is empty, preventing the Trainer from attempting to write checkpoint files during transient training.
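
The override pattern described above can be illustrated without installing transformers. The sketch below is a simplified illustration, not the library code: BaseArguments and TransientArguments are hypothetical stand-ins, and the base should_save is reduced to a constant, whereas the real HuggingFace property also depends on which process is running.

```python
class BaseArguments:
    """Hypothetical stand-in for HuggingFace TrainingArguments (not the real class)."""

    def __init__(self, output_dir=""):
        self.output_dir = output_dir

    @property
    def should_save(self):
        # Simplified assumption: the base class always allows saving
        return True


class TransientArguments(BaseArguments):
    """Same override shape as the txtai subclass: never save without an output_dir."""

    @property
    def should_save(self):
        # Defer to the parent only when an output directory was provided
        return super().should_save if self.output_dir else False


print(TransientArguments(output_dir="").should_save)            # False
print(TransientArguments(output_dir="./my-model").should_save)  # True
```

The empty string is falsy in Python, so the default output_dir="" is enough to switch saving off without a separate flag.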

Defaults applied by parse():

  • output_dir="" -- empty string; no directory created for saving.
  • save_strategy="no" -- disables periodic checkpoint saving.
  • report_to="none" -- disables experiment tracker integrations.
  • log_level="warning" -- suppresses verbose training output.
  • use_cpu -- set automatically: True when Models.hasaccelerator() reports no GPU/accelerator, False otherwise.
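
The merge behavior described above can be sketched with a plain dictionary update. This is a minimal illustration under stated assumptions, not the txtai source: merge_training_args is a hypothetical helper, and the has_accelerator parameter stands in for the result of Models.hasaccelerator().

```python
def merge_training_args(updates, has_accelerator):
    """Hypothetical helper: overlay user overrides on the documented defaults."""
    defaults = {
        "output_dir": "",
        "save_strategy": "no",
        "report_to": "none",
        "log_level": "warning",
        # Assumption: fall back to CPU only when no accelerator is available
        "use_cpu": not has_accelerator,
    }

    # Copy first so the defaults are never mutated, then let overrides win
    merged = dict(defaults)
    merged.update(updates)
    return merged


merged = merge_training_args({"num_train_epochs": 5, "log_level": "info"}, has_accelerator=False)
print(merged["log_level"])      # info
print(merged["save_strategy"])  # no
print(merged["use_cpu"])        # True
```

Because user-supplied keys are applied last, any default can be replaced, including output_dir and save_strategy.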

Usage

This method is called internally as the first step of HFTrainer.__call__(). It can also be called directly to inspect or debug the resulting TrainingArguments before starting training.

Code Reference

Source Location

  • Repository: txtai
  • File: src/python/txtai/pipeline/train/hftrainer.py (Lines 146-163 for parse(), Lines 384-399 for TrainingArguments subclass)

Signature

def parse(self, updates):
    """
    Parses and merges custom arguments with defaults.

    Args:
        updates: custom arguments dict

    Returns:
        TrainingArguments
    """

The custom TrainingArguments subclass:

class TrainingArguments(HFTrainingArguments):
    """
    Extends standard TrainingArguments to make the output directory optional
    for transient models.
    """

    @property
    def should_save(self):
        return super().should_save if self.output_dir else False

Import

from txtai.pipeline import HFTrainer

I/O Contract

Inputs

Name | Type | Required | Description
updates | dict | Yes | Dictionary of training argument overrides. Keys must match valid HFTrainingArguments field names. Common keys include num_train_epochs, learning_rate, per_device_train_batch_size, output_dir, fp16, bf16, warmup_steps, weight_decay and seed.
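
Since HuggingFace TrainingArguments is a dataclass, a misspelled key would normally surface as a TypeError at construction time. One way to check an overrides dictionary beforehand is to compare its keys against the dataclass fields. The sketch below assumes this approach and uses a hypothetical stand-in dataclass (ExampleArguments) and helper (unknown_keys) so it runs without transformers installed.

```python
from dataclasses import dataclass, fields


@dataclass
class ExampleArguments:
    """Hypothetical stand-in with a few TrainingArguments-style fields."""

    output_dir: str = ""
    learning_rate: float = 5e-5
    num_train_epochs: float = 3.0


def unknown_keys(updates, argsclass):
    """Hypothetical helper: list override keys that are not fields of argsclass."""
    valid = {field.name for field in fields(argsclass)}
    return sorted(key for key in updates if key not in valid)


# "epochs" is not a field name, so it is reported
print(unknown_keys({"learning_rate": 3e-5, "epochs": 5}, ExampleArguments))  # ['epochs']
```

Passing the real TrainingArguments class in place of ExampleArguments should work the same way, since dataclasses.fields accepts any dataclass.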

Outputs

Name | Type | Description
args | TrainingArguments | A txtai TrainingArguments instance (subclass of HFTrainingArguments) with defaults merged with user overrides. The should_save property returns False when output_dir is empty.

Usage Examples

Basic Example: Use Defaults

from txtai.pipeline import HFTrainer

trainer = HFTrainer()

# No overrides -- all defaults applied
args = trainer.parse({})
print(args.output_dir)      # ""
print(args.save_strategy)   # enum member with value "no"
print(args.report_to)       # [] (transformers normalizes "none" to an empty list)

Override Epochs and Learning Rate

from txtai.pipeline import HFTrainer

trainer = HFTrainer()

args = trainer.parse({
    "num_train_epochs": 5,
    "learning_rate": 3e-5,
    "per_device_train_batch_size": 16,
})
print(args.num_train_epochs)              # 5
print(args.learning_rate)                 # 3e-05
print(args.per_device_train_batch_size)   # 16
print(args.save_strategy)                 # enum member with value "no" (default preserved)

Enable Saving and Logging

from txtai.pipeline import HFTrainer

trainer = HFTrainer()

args = trainer.parse({
    "output_dir": "./my-model",
    "save_strategy": "epoch",
    "report_to": "tensorboard",
    "log_level": "info",
    "num_train_epochs": 3,
})
print(args.should_save)  # True (output_dir is non-empty)
