Implementation:Axolotl ai cloud Axolotl DPO SwanLab Full Featured Config


Knowledge Sources
Domains: Training, Experiment_Tracking, RLHF
Last Updated: 2026-02-07 00:00 GMT

Overview

A production-grade YAML configuration example that demonstrates all SwanLab integration features in combination with DPO (Direct Preference Optimization) training.

Description

The dpo-swanlab-full-featured.yml file is a comprehensive reference configuration that combines DPO preference-alignment training with the full suite of SwanLab experiment-tracking features. It configures Llama-3-8B-Instruct with 8-bit LoRA for DPO training and demonstrates: (1) basic SwanLab cloud experiment tracking with project and experiment naming, (2) team workspace collaboration, (3) RLHF completion table logging for qualitative analysis of chosen versus rejected responses, (4) Lark/Feishu team notifications with HMAC authentication, (5) performance profiling metrics under the profiling/ namespace, and (6) optional private deployment and model checkpointing settings. The file also serves as operational documentation, with sections on expected training behavior, production checklists, and troubleshooting guides.

Usage

Use this configuration as a reference template when setting up production DPO training with SwanLab monitoring. Copy the SwanLab-specific sections (use_swanlab, swanlab_project, swanlab_log_completions, Lark webhook settings) into your own training configs and adapt them. Set SWANLAB_API_KEY, SWANLAB_LARK_WEBHOOK_URL, and SWANLAB_LARK_SECRET as environment variables rather than in the config file, so that credentials are not committed alongside the configuration.
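
Because the credentials live in the environment rather than in the YAML file, it can help to check them before launching a long run. The following is a minimal sketch of such a pre-flight check; the function name check_swanlab_env is hypothetical and is not part of Axolotl or SwanLab. It only inspects the three environment variables named on this page.

```shell
# Hypothetical pre-flight helper (an illustration, not an Axolotl or SwanLab command).
check_swanlab_env() {
  # SWANLAB_API_KEY is the only variable this page marks as required.
  if [ -z "${SWANLAB_API_KEY:-}" ]; then
    echo "error: SWANLAB_API_KEY is not set" >&2
    return 1
  fi
  # The Lark variables are optional. Warn when the webhook URL is present
  # without a secret, since that may or may not be intentional.
  if [ -n "${SWANLAB_LARK_WEBHOOK_URL:-}" ] && [ -z "${SWANLAB_LARK_SECRET:-}" ]; then
    echo "warning: SWANLAB_LARK_WEBHOOK_URL is set without SWANLAB_LARK_SECRET" >&2
  fi
  return 0
}
```

One way to use it is to chain it in front of the launch command, for example check_swanlab_env && accelerate launch -m axolotl.cli.train examples/swanlab/dpo-swanlab-full-featured.yml, so that training starts only when the API key is present.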

Code Reference

Source Location

Signature

# Key SwanLab integration fields:
plugins:
  - axolotl.integrations.swanlab.SwanLabPlugin

use_swanlab: true
swanlab_project: dpo-production
swanlab_experiment_name: llama-3-dpo-full-featured-v1
swanlab_mode: cloud
swanlab_workspace: ml-research-team
swanlab_log_completions: true
swanlab_completion_log_interval: 100
swanlab_completion_max_buffer: 256
swanlab_log_model: false

Import

# Run with Axolotl CLI:
export SWANLAB_API_KEY=your-api-key
accelerate launch -m axolotl.cli.train examples/swanlab/dpo-swanlab-full-featured.yml

I/O Contract

Inputs

Name | Type | Required | Description
SWANLAB_API_KEY | str (env var) | Yes | SwanLab authentication token
SWANLAB_LARK_WEBHOOK_URL | str (env var) | No | Lark bot webhook for team notifications
SWANLAB_LARK_SECRET | str (env var) | No | HMAC secret for Lark webhook authentication
DPO dataset | HuggingFace dataset | Yes | Preference pairs with chosen/rejected fields
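
For context on the SWANLAB_LARK_SECRET input: Lark/Feishu custom bots document a signing scheme in which the HMAC-SHA256 key is the timestamp, a newline, and the secret, the signed message is empty, and the result is Base64-encoded. The sketch below reproduces that scheme with openssl and base64, both assumed to be installed. The function name lark_sign is hypothetical, and whether the SwanLab integration computes the signature in exactly this way is an assumption; in normal use the signing is handled for you and you only supply the secret.

```shell
# Sketch of the Lark/Feishu custom-bot signature; lark_sign is a hypothetical name.
# Assumes the openssl and base64 command-line tools are available.
lark_sign() {
  # $1: Unix timestamp in seconds
  # $2: the bot's signing secret
  # The HMAC key is "<timestamp>\n<secret>" and the message is empty (stdin from /dev/null).
  openssl dgst -sha256 -hmac "$(printf '%s\n%s' "$1" "$2")" -binary </dev/null | base64
}
```

A call such as lark_sign "$(date +%s)" "$SWANLAB_LARK_SECRET" prints a 44-character Base64 string, which can be useful when debugging a webhook that rejects signed requests.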

Outputs

Name | Type | Description
Trained model | Model files | Saved to output_dir with LoRA adapters
SwanLab dashboard | Web UI | Training metrics, completion tables, profiling data
Lark notifications | Messages | Training start/complete/error notifications to team chat

Usage Examples

Run Full Featured DPO Training

# Set required environment variables
export SWANLAB_API_KEY=your-api-key
export SWANLAB_LARK_WEBHOOK_URL=https://open.feishu.cn/open-apis/bot/v2/hook/xxx
export SWANLAB_LARK_SECRET=your-webhook-secret

# Launch training
accelerate launch -m axolotl.cli.train examples/swanlab/dpo-swanlab-full-featured.yml
