Implementation:Hpcaitech ColossalAI AccumulativeMeanMeter
| Knowledge Sources | |
|---|---|
| Domains | Training Utilities, Metrics, RLHF |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Accumulative mean meter utility for tracking running averages of named metrics during training.
Description
This module provides two classes for computing running averages. AccumulativeMeanVariable is a single-variable tracker that maintains a cumulative sum and count, computing the mean on demand. AccumulativeMeanMeter wraps a dictionary of AccumulativeMeanVariable instances, allowing you to track multiple named metrics simultaneously. Both classes support adding values with custom count increments and resetting all tracked statistics.
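The behaviour described above can be sketched as follows. This is a minimal illustration, not the repository's code. The class names `MeanVariableSketch` and `MeanMeterSketch` and the attribute names `_sum`, `_count`, and `_variables` are hypothetical. The sketch also assumes that `value` is added to the sum unchanged, so `count_update` only advances the count, and that a tracker is created the first time a name is seen.

```python
class MeanVariableSketch:
    """Illustrative stand-in for AccumulativeMeanVariable (not the library code)."""

    def __init__(self):
        self._sum = 0.0  # assumed attribute name
        self._count = 0  # assumed attribute name

    def add(self, value, count_update=1):
        # Assumption: value is added as-is; count_update only advances the count.
        self._sum += value
        self._count += count_update

    def get(self):
        # Mean computed on demand; 0 when nothing has been counted yet.
        return self._sum / self._count if self._count > 0 else 0

    def reset(self):
        self._sum = 0.0
        self._count = 0


class MeanMeterSketch:
    """Illustrative stand-in for AccumulativeMeanMeter: one tracker per metric name."""

    def __init__(self):
        self._variables = {}  # assumed attribute name

    def add(self, name, value, count_update=1):
        # Assumption: an unseen name gets a fresh tracker on first use.
        if name not in self._variables:
            self._variables[name] = MeanVariableSketch()
        self._variables[name].add(value, count_update)

    def get(self, name):
        return self._variables[name].get()

    def reset(self):
        # Resets every tracked metric while keeping the names registered.
        for variable in self._variables.values():
            variable.reset()


meter = MeanMeterSketch()
meter.add("loss", 2.0)
meter.add("loss", 4.0)
print(meter.get("loss"))  # 3.0
meter.reset()
print(meter.get("loss"))  # 0
```

Keeping the sum and count separate means the mean is never stored, so adding a value is a constant-time update and `get` always reflects everything added since the last reset.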
Usage
Use this utility during ColossalChat training loops to track metrics such as loss, reward, KL divergence, and other scalar values across batches and epochs. It provides a simple API for accumulating and retrieving running averages.
Code Reference
Source Location
- Repository: Hpcaitech_ColossalAI
- File: applications/ColossalChat/coati/utils/accumulative_meter.py
- Lines: 1-69
Signature
class AccumulativeMeanVariable:
    def __init__(self):
    def add(self, value, count_update=1):
    def get(self):
    def reset(self):

class AccumulativeMeanMeter:
    def __init__(self):
    def add(self, name, value, count_update=1):
    def get(self, name):
    def reset(self):
Import
from coati.utils.accumulative_meter import AccumulativeMeanMeter, AccumulativeMeanVariable
I/O Contract
Inputs (AccumulativeMeanMeter.add)
| Name | Type | Required | Description |
|---|---|---|---|
| name | str | Yes | Name of the metric to track |
| value | float | Yes | Value to add to the accumulator |
| count_update | int | No | Amount by which the metric's sample count is incremented; defaults to 1 |
Outputs (AccumulativeMeanMeter.get)
| Name | Type | Description |
|---|---|---|
| return | float | The current accumulative mean of the named metric; 0 if count is 0 |
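The returned value can be written out explicitly. The formula below assumes that `add` accumulates `value` unchanged and advances the count by `count_update`, which is worth confirming against the source file. Here $v_i$ and $c_i$ denote the `value` and `count_update` arguments of the $i$-th call since the last reset:

```latex
\[
\mathrm{get}() =
\begin{cases}
\dfrac{\sum_{i=1}^{n} v_i}{\sum_{i=1}^{n} c_i} & \text{if } \sum_{i=1}^{n} c_i > 0,\\[1ex]
0 & \text{otherwise.}
\end{cases}
\]
```

With the default `count_update=1` this reduces to the arithmetic mean of the added values. If each $v_i$ is a sum over $c_i$ samples, the result is the per-sample mean.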
Usage Examples
from coati.utils.accumulative_meter import AccumulativeMeanMeter

meter = AccumulativeMeanMeter()

# Track loss and reward across batches.
# `dataloader` and `train_step` are placeholders for your own training code;
# `train_step` is assumed to return scalar loss and reward tensors.
for batch in dataloader:
    loss, reward = train_step(batch)
    meter.add("loss", loss.item())
    meter.add("reward", reward.item())

# Retrieve running averages
avg_loss = meter.get("loss")
avg_reward = meter.get("reward")
print(f"Average loss: {avg_loss}, Average reward: {avg_reward}")

# Reset at epoch end
meter.reset()
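When batches differ in size, averaging per-batch means weights a small final batch as heavily as a full one. One way to obtain a per-sample mean, assuming `add` accumulates `value` unchanged, is to pass the batch sum together with `count_update=batch_size`. The sketch below reproduces that arithmetic with plain variables so it runs without ColossalChat installed. `batch_means` and `batch_sizes` are made-up numbers.

```python
# Made-up per-batch mean losses and batch sizes (the last batch is smaller).
batch_means = [1.0, 2.0, 6.0]
batch_sizes = [4, 4, 2]

# Equivalent of meter.add("loss", batch_mean) with the default count_update=1.
unweighted = sum(batch_means) / len(batch_means)

# Equivalent of meter.add("loss", batch_mean * size, count_update=size),
# assuming add() accumulates the value unchanged.
total = sum(mean * size for mean, size in zip(batch_means, batch_sizes))
count = sum(batch_sizes)
per_sample = total / count if count > 0 else 0

print(unweighted)  # 3.0
print(per_sample)  # 2.4
```

The two results differ because the small last batch has the largest loss. Which one is appropriate depends on whether the metric should be reported per batch or per sample.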