Implementation:Lm sys FastChat Clean Battle Data
| Knowledge Sources | |
|---|---|
| Domains | Model_Evaluation, Data_Cleaning |
| Last Updated | 2026-02-07 06:00 GMT |
Overview
clean_battle_data filters, validates, and deduplicates raw arena battle log files to produce a clean DataFrame suitable for Elo rating computation and statistical analysis.
Description
The clean_battle_data.py module is a critical preprocessing step in the Chatbot Arena analytics pipeline. Raw battle logs collected from the live arena contain noise from various sources: bot traffic, duplicate submissions, banned users, malformed conversations, and invalid model pairings. This module systematically removes these artifacts to produce a reliable dataset for downstream rating computations.
The primary function, clean_battle_data, accepts a list of log file paths along with filtering parameters and returns a pandas DataFrame of validated battle records. The cleaning process applies its filters in sequence: it removes battles involving excluded model names, drops requests from banned IP addresses, deduplicates battles by conversation content hash, validates that each battle contains properly formatted conversation turns, and enforces a minimum conversation length.
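The filter sequence described above can be sketched as follows. This is a minimal illustration on plain dictionaries, not the module's actual code: the helper names (`conversation_hash`, `is_well_formed`, `filter_battles`), the `role`/`content` message layout, and the `min_turns` parameter are all assumptions made for the example.

```python
import hashlib
import json


def conversation_hash(battle):
    """Hash both conversations so identical submissions collide (assumed scheme)."""
    payload = json.dumps(
        [battle.get("conversation_a"), battle.get("conversation_b")],
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_well_formed(conversation, min_turns=1):
    """Check for alternating user/assistant messages with non-empty text."""
    if not isinstance(conversation, list):
        return False
    if len(conversation) < 2 * min_turns or len(conversation) % 2 != 0:
        return False
    for index, message in enumerate(conversation):
        expected_role = "user" if index % 2 == 0 else "assistant"
        if not isinstance(message, dict) or message.get("role") != expected_role:
            return False
        content = message.get("content")
        if not isinstance(content, str) or not content.strip():
            return False
    return True


def filter_battles(battles, exclude_model_names=(), ban_ip_list=(), min_turns=1):
    """Apply the filters in the order the description lists them."""
    excluded = set(exclude_model_names)
    banned = set(ban_ip_list)
    seen_hashes = set()
    kept = []
    for battle in battles:
        # 1. Excluded model names.
        if battle["model_a"] in excluded or battle["model_b"] in excluded:
            continue
        # 2. Banned IP addresses.
        if battle.get("ip") in banned:
            continue
        # 3. Deduplication by conversation content hash.
        digest = conversation_hash(battle)
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)
        # 4 and 5. Turn format and minimum conversation length.
        if not is_well_formed(battle.get("conversation_a"), min_turns):
            continue
        if not is_well_formed(battle.get("conversation_b"), min_turns):
            continue
        kept.append(battle)
    return kept
```

Running the cheap membership checks first means content is hashed and validated only for records that survive them; the real module may order or implement these steps differently.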
The module supports two operating modes controlled by the mode parameter. In the default mode, it applies standard filtering suitable for general leaderboard computation. An alternative strict mode applies additional constraints for research-quality datasets, such as requiring longer conversations and enforcing stricter deduplication thresholds. The sanitize_ip flag controls whether IP addresses are hashed in the output for privacy compliance.
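As a rough illustration of the two ideas in this paragraph, the sketch below shows one way a mode could map to thresholds and one way an IP address could be hashed. The threshold values, the `MODE_SETTINGS` table, the `settings_for` and `hash_ip` helpers, and the choice of salted SHA-256 are assumptions for the example; the source does not state how the module implements either.

```python
import hashlib

# Hypothetical per-mode thresholds; the actual values are not given in the source.
MODE_SETTINGS = {
    "default": {"min_turns": 1},
    "strict": {"min_turns": 2},
}


def settings_for(mode="default"):
    """Return the threshold set for a cleaning mode, rejecting unknown modes."""
    if mode not in MODE_SETTINGS:
        raise ValueError(f"unknown mode: {mode!r}")
    return MODE_SETTINGS[mode]


def hash_ip(ip, salt=""):
    """Replace an IP address with a stable, non-reversible token (assumed scheme)."""
    digest = hashlib.sha256((salt + ip).encode("utf-8")).hexdigest()
    # A truncated digest keeps the column compact while remaining consistent per IP.
    return digest[:16]
```

Because the hash is deterministic, the same address always maps to the same token, so per-user analyses such as vote counts still work on sanitized output. A secret salt makes it harder to recover addresses by hashing the whole IPv4 space.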
Usage
Use this module as the first step in any arena data analysis workflow. It should be called before computing Elo ratings, generating leaderboard tables, or performing any statistical analysis on battle data. The cleaned output is consumed by elo_analysis.py, rating_systems.py, and the monitor dashboard.
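To show why cleaning comes first, here is a minimal sketch of a standard online Elo update run over cleaned battle records. It is a generic textbook formulation, not the code in elo_analysis.py; the function name `compute_online_elo`, the default constants, and the treatment of every non-decisive `winner` value as a tie are assumptions for the example.

```python
from collections import defaultdict


def compute_online_elo(battles, k=4.0, scale=400.0, base=10.0, init_rating=1000.0):
    """Update ratings one battle at a time using the classic Elo rule."""
    ratings = defaultdict(lambda: init_rating)
    for battle in battles:
        model_a, model_b = battle["model_a"], battle["model_b"]
        rating_a, rating_b = ratings[model_a], ratings[model_b]
        # Expected score of model_a given the current rating gap.
        expected_a = 1.0 / (1.0 + base ** ((rating_b - rating_a) / scale))
        if battle["winner"] == "model_a":
            score_a = 1.0
        elif battle["winner"] == "model_b":
            score_a = 0.0
        else:
            # Assumption: any other value is treated as a tie.
            score_a = 0.5
        ratings[model_a] = rating_a + k * (score_a - expected_a)
        ratings[model_b] = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return dict(ratings)
```

With a DataFrame, the records can be obtained through `battles_df.to_dict("records")`. Each duplicate or bot-generated battle would shift ratings by up to `k` points, which is why unfiltered logs distort the result.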
Code Reference
Source Location
- Repository: Lm_sys_FastChat
- File: fastchat/serve/monitor/clean_battle_data.py
- Lines: 1-423
Signature
def clean_battle_data(
    log_files: list[str],
    exclude_model_names: list[str] = None,
    ban_ip_list: list[str] = None,
    sanitize_ip: bool = False,
    mode: str = "default",
) -> pd.DataFrame:
    """Clean and validate arena battle log data for Elo rating computation.

    Args:
        log_files: Paths to raw battle log JSON files.
        exclude_model_names: Model names to exclude from results.
        ban_ip_list: IP addresses to filter out.
        sanitize_ip: Whether to hash IP addresses in output.
        mode: Cleaning mode, either "default" or "strict".

    Returns:
        A pandas DataFrame of cleaned battle records.
    """
    ...
Import
from fastchat.serve.monitor.clean_battle_data import clean_battle_data
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| log_files | list[str] | Yes | List of file paths to raw arena battle log JSON files |
| exclude_model_names | list[str] | No | Model names to exclude from the cleaned dataset (default: None) |
| ban_ip_list | list[str] | No | IP addresses to filter out from battle records (default: None) |
| sanitize_ip | bool | No | If True, hash IP addresses in the output for privacy (default: False) |
| mode | str | No | Cleaning mode: "default" for standard filtering, "strict" for research-quality constraints (default: "default") |
Outputs
| Name | Type | Description |
|---|---|---|
| battles_df | pd.DataFrame | Cleaned DataFrame with columns: model_a, model_b, winner, conversation_a, conversation_b, judge, turn, tstamp, ip (hashed if sanitized) |
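A downstream consumer may want to confirm the output contract before using the data. The sketch below checks a collection of column names against the columns listed in the table; `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names introduced for this example and are not part of the module.

```python
# Column names taken from the Outputs table above.
EXPECTED_COLUMNS = (
    "model_a",
    "model_b",
    "winner",
    "conversation_a",
    "conversation_b",
    "judge",
    "turn",
    "tstamp",
    "ip",
)


def missing_columns(columns, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent, in contract order."""
    present = set(columns)
    return [name for name in expected if name not in present]
```

With a DataFrame, this can be called as `missing_columns(battles_df.columns)`; an empty list means every documented column is present.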
Usage Examples
from fastchat.serve.monitor.clean_battle_data import clean_battle_data

# Clean battle data with default settings
log_files = [
    "logs/battles_2024_01.json",
    "logs/battles_2024_02.json",
]
battles_df = clean_battle_data(log_files)
print(f"Cleaned battles: {len(battles_df)}")
print(battles_df.head())

# Clean with strict mode and IP sanitization
battles_strict = clean_battle_data(
    log_files,
    exclude_model_names=["deprecated-model-v1"],
    ban_ip_list=["192.168.1.100"],
    sanitize_ip=True,
    mode="strict",
)
print(f"Strict cleaned battles: {len(battles_strict)}")

# Check model distribution in cleaned data
print(battles_strict["model_a"].value_counts().head(10))
Related Pages
- Implements: Principle:Lm_sys_FastChat_Battle_Data_Cleaning
- Environment:Lm_sys_FastChat_GPU_CUDA_Inference
- Lm_sys_FastChat_Elo_Analysis - Consumes cleaned battle data for Elo computation
- Lm_sys_FastChat_Rating_Systems - Statistical rating systems applied to cleaned data
- Lm_sys_FastChat_Monitor_Dashboard - Displays results derived from cleaned data
- Lm_sys_FastChat_Category_Label_Pipeline - Labels cleaned conversations with categories