
Implementation:Shiyu coder Kronos SequentialTrainer Usage

Field                 Value
Implementation Name   SequentialTrainer_Usage
Repository            Shiyu_coder_Kronos
Repository URL        https://github.com/shiyu-coder/Kronos
Type                  API Doc
Source File           finetune_csv/train_sequential.py
Lines                 L18-316
Class                 SequentialTrainer
Implements Principle  Principle:Shiyu_coder_Kronos_Sequential_Two_Stage_Training
Dependencies          torch, torch.distributed (optional DDP), config_loader.CustomFinetuneConfig, finetune_tokenizer.train_tokenizer, finetune_base_model.train_model
Last Updated          2026-02-09 14:00 GMT

Overview

SequentialTrainer orchestrates the two-phase Kronos finetuning pipeline: first it trains the tokenizer (VQ-VAE reconstruction), then the predictor (next-token prediction with the tokenizer frozen). It supports skip/resume logic, pretrained or random initialization, and optional multi-GPU training via DistributedDataParallel (DDP).

API

from train_sequential import SequentialTrainer

# 1. Constructor (config_path: str, optional; defaults to None)
trainer = SequentialTrainer(config_path=None)

# 2. Run both phases sequentially
success = trainer.run_training()            # -> bool

# 3. Run individual phases
success = trainer.train_tokenizer_phase()   # -> bool
success = trainer.train_basemodel_phase()   # -> bool

Import

from train_sequential import SequentialTrainer

Constructor Parameters

Parameter    Type  Default  Description
config_path  str   None     Path to the YAML config file; passed to CustomFinetuneConfig. If None, defaults to config.yaml in the same directory.
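
Both construction styles follow directly from this table (a minimal example; the explicit path is illustrative):

from train_sequential import SequentialTrainer

# Falls back to the default config.yaml when no path is given
trainer = SequentialTrainer()

# Explicit config path (illustrative filename)
trainer = SequentialTrainer(config_path="configs/config.yaml")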

Instance Attributes

Attribute    Type                  Description
config       CustomFinetuneConfig  Parsed configuration object
rank         int                   Process rank (from the RANK env var; default 0)
world_size   int                   Total number of processes (from the WORLD_SIZE env var; default 1)
local_rank   int                   Local GPU rank (from the LOCAL_RANK env var; default config.device_id)
device       torch.device          Computed device (CUDA or CPU)
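
The distributed-context attributes can be inspected right after construction (a minimal sketch; the printed values assume a single-process run):

trainer = SequentialTrainer("config.yaml")
print(trainer.rank, trainer.world_size, trainer.local_rank)  # 0 1 0 without DDP
print(trainer.device)  # e.g. cuda:0 when CUDA is available, otherwise cpu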

Methods

run_training() -> bool

Orchestrates the full two-phase training pipeline:

  1. Calls _setup_distributed() to initialize DDP if applicable
  2. Calls _create_directories() to ensure output paths exist
  3. Calls _check_existing_models() to detect pre-existing checkpoints
  4. If config.train_tokenizer is True, runs train_tokenizer_phase()
  5. If config.train_basemodel is True, runs train_basemodel_phase()
  6. Returns True on success, False on failure
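
The six steps condense into the following control-flow sketch (method and config names follow the list above; the body is a simplification, not the verbatim source):

def run_training(self) -> bool:
    try:
        self._setup_distributed()      # step 1: DDP init if applicable
        self._create_directories()     # step 2: ensure output paths exist
        self._check_existing_models()  # step 3: detect existing checkpoints
        if self.config.train_tokenizer and not self.train_tokenizer_phase():
            return False               # step 4: Phase 1 (tokenizer)
        if self.config.train_basemodel and not self.train_basemodel_phase():
            return False               # step 5: Phase 2 (predictor)
        return True                    # step 6
    except Exception:
        return False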

train_tokenizer_phase() -> bool

Executes Phase 1 (tokenizer finetuning):

  1. Checks if tokenizer model already exists and skip_existing is True; if so, returns early
  2. Sets up logging
  3. Loads pretrained tokenizer via KronosTokenizer.from_pretrained() or randomly initializes from architecture config
  4. Moves tokenizer to device
  5. Calls train_tokenizer() (imported from finetune_tokenizer)
  6. Saves best model based on validation loss
  7. Returns True on success
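
The skip-or-train decision at the heart of Phase 1 looks roughly like this (a sketch: _random_init_tokenizer is a hypothetical helper name, and the train_tokenizer call signature is assumed):

import os

def train_tokenizer_phase(self) -> bool:
    # Step 1: skip when a finetuned tokenizer already exists
    if self.config.skip_existing and os.path.exists(self.config.tokenizer_best_model_path):
        return True
    # Step 3: pretrained weights or random init from the architecture config
    if self.config.pre_trained_tokenizer:
        tokenizer = KronosTokenizer.from_pretrained(self.config.pretrained_tokenizer_path)
    else:
        tokenizer = self._random_init_tokenizer()  # hypothetical helper
    tokenizer.to(self.device)                      # step 4
    train_tokenizer(tokenizer, self.config)        # step 5; signature assumed
    return True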

train_basemodel_phase() -> bool

Executes Phase 2 (predictor finetuning):

  1. Validates that finetuned tokenizer exists (if using pretrained tokenizer)
  2. Checks if basemodel already exists and skip_existing is True; if so, returns early
  3. Sets up logging
  4. Loads finetuned tokenizer (from Phase 1 output) or randomly initializes
  5. Loads pretrained predictor via Kronos.from_pretrained() or randomly initializes from architecture config
  6. Moves both models to device
  7. Calls train_model() (imported from finetune_base_model)
  8. Saves best model based on validation loss
  9. Returns True on success
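
Phase 2's ordering matters: the tokenizer dependency is validated before any predictor work begins. A hedged sketch (pretrained_predictor_path is an assumed attribute name, and the train_model call signature is assumed):

import os

def train_basemodel_phase(self) -> bool:
    # Step 1: the predictor phase depends on the Phase-1 tokenizer output
    if self.config.pre_trained_tokenizer and not os.path.exists(self.config.tokenizer_best_model_path):
        return False
    # Step 2: skip when a finetuned basemodel already exists
    if self.config.skip_existing and os.path.exists(self.config.basemodel_best_model_path):
        return True
    # Steps 4-5: finetuned tokenizer plus pretrained predictor
    tokenizer = KronosTokenizer.from_pretrained(self.config.tokenizer_best_model_path)
    model = Kronos.from_pretrained(self.config.pretrained_predictor_path)  # attribute name assumed
    tokenizer.to(self.device)                   # step 6
    model.to(self.device)
    train_model(model, tokenizer, self.config)  # step 7; signature assumed
    return True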

CLI Usage

# Standard usage
python train_sequential.py --config path/to/config.yaml

# Skip tokenizer phase (use existing finetuned tokenizer)
python train_sequential.py --config path/to/config.yaml --skip-tokenizer

# Skip basemodel phase (train only tokenizer)
python train_sequential.py --config path/to/config.yaml --skip-basemodel

# Skip training for phases where models already exist on disk
python train_sequential.py --config path/to/config.yaml --skip-existing

CLI Arguments

Argument          Type  Default        Description
--config          str   'config.yaml'  Path to the YAML configuration file
--skip-tokenizer  flag  False          Skip the tokenizer training phase
--skip-basemodel  flag  False          Skip the basemodel training phase
--skip-existing   flag  False          Skip training for models that already exist on disk
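
The table corresponds to a straightforward argparse setup; a sketch of what the parser plausibly looks like (destination names follow argparse's default dash-to-underscore conversion):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--config', type=str, default='config.yaml')
parser.add_argument('--skip-tokenizer', action='store_true')
parser.add_argument('--skip-basemodel', action='store_true')
parser.add_argument('--skip-existing', action='store_true')
args = parser.parse_args()  # e.g. args.skip_tokenizer disables Phase 1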

Output

  • Finetuned tokenizer: Saved to config.tokenizer_best_model_path (e.g., .../tokenizer/best_model)
  • Finetuned predictor: Saved to config.basemodel_best_model_path (e.g., .../basemodel/best_model)
  • Training logs: Written to config.base_save_path/logs/
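
The saved artifacts can be reloaded with the same from_pretrained entry points used during training (a sketch; the import location of KronosTokenizer and Kronos is assumed):

# Reload the finetuned models for inference or further training
tokenizer = KronosTokenizer.from_pretrained(trainer.config.tokenizer_best_model_path)
predictor = Kronos.from_pretrained(trainer.config.basemodel_best_model_path)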

Key Implementation Details

Random Initialization Support

When pre_trained_tokenizer or pre_trained_predictor is set to False, the trainer reads the architecture configuration from config.json in the pretrained model directory and constructs a fresh model with random weights:

import json
import os

# Example: random tokenizer initialization
cfg_path = os.path.join(config.pretrained_tokenizer_path, 'config.json')
with open(cfg_path, 'r') as f:
    arch = json.load(f)
tokenizer = KronosTokenizer(
    d_in=arch.get('d_in', 6),
    d_model=arch.get('d_model', 256),
    n_heads=arch.get('n_heads', 4),
    # ... additional architecture params
)

DDP Distribution

Distributed training is initialized when WORLD_SIZE > 1 and CUDA is available:

import os
import torch.distributed as dist

# Automatically detected from environment at construction time
self.rank = int(os.environ.get("RANK", "0"))
self.world_size = int(os.environ.get("WORLD_SIZE", "1"))
self.local_rank = int(os.environ.get("LOCAL_RANK", "0"))

# DDP initialization in _setup_distributed(), reached when WORLD_SIZE > 1
dist.init_process_group(backend="nccl")
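
torchrun sets exactly these RANK, WORLD_SIZE, and LOCAL_RANK variables, so a multi-GPU run needs no code changes (illustrative command; set --nproc_per_node to your GPU count):

torchrun --nproc_per_node=4 train_sequential.py --config path/to/config.yaml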

Usage Example (Python API)

from train_sequential import SequentialTrainer

# Create trainer with config
trainer = SequentialTrainer("configs/config_ali09988_candle-5min.yaml")

# Override settings programmatically
trainer.config.train_tokenizer = True
trainer.config.train_basemodel = True
trainer.config.skip_existing = False

# Run full pipeline
success = trainer.run_training()
if success:
    print(f"Tokenizer saved to: {trainer.config.tokenizer_best_model_path}")
    print(f"Predictor saved to: {trainer.config.basemodel_best_model_path}")

See Also

  • Principle:Shiyu_coder_Kronos_Sequential_Two_Stage_Training