
Implementation:Lm sys FastChat Clean Battle Data

From Leeroopedia
Revision as of 15:33, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Lm_sys_FastChat_Clean_Battle_Data.md)


Knowledge Sources
Domains: Model_Evaluation, Data_Cleaning
Last Updated: 2026-02-07 06:00 GMT

Overview

clean_battle_data filters, validates, and deduplicates raw arena battle log files to produce a clean DataFrame suitable for Elo rating computation and statistical analysis.

Description

The clean_battle_data.py module is a critical preprocessing step in the Chatbot Arena analytics pipeline. Raw battle logs collected from the live arena contain noise from various sources: bot traffic, duplicate submissions, banned users, malformed conversations, and invalid model pairings. This module systematically removes these artifacts to produce a reliable dataset for downstream rating computations.

The primary function, clean_battle_data, accepts a list of log file paths along with filtering parameters and returns a pandas DataFrame of validated battle records. It applies its filters in sequence: (1) removes battles involving excluded model names, (2) drops requests from banned IP addresses, (3) deduplicates battles by conversation content hash, (4) validates that each battle contains properly formatted conversation turns, and (5) enforces a minimum conversation length.
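The filter sequence described above can be illustrated with a minimal sketch. This is not the FastChat implementation: the function names (sketch_clean, _is_valid_conversation), the min_turns parameter, and the assumption that each conversation is a list of {"role", "content"} dictionaries are all hypothetical and chosen only for illustration. The record keys follow the output columns documented on this page.

```python
import hashlib
import json


def _is_valid_conversation(conv):
    """Assumed format: a non-empty list of {"role": ..., "content": str} dicts."""
    return (
        isinstance(conv, list)
        and len(conv) > 0
        and all(
            isinstance(m, dict) and "role" in m and isinstance(m.get("content"), str)
            for m in conv
        )
    )


def sketch_clean(records, exclude_model_names=(), ban_ip_list=(), min_turns=1):
    """Illustrative filter chain over a list of battle dicts (hypothetical)."""
    excluded = set(exclude_model_names)
    banned = set(ban_ip_list)
    seen_hashes = set()
    kept = []
    for rec in records:
        # 1. Drop battles that involve an excluded model.
        if rec["model_a"] in excluded or rec["model_b"] in excluded:
            continue
        # 2. Drop battles submitted from a banned IP address.
        if rec.get("ip") in banned:
            continue
        # 3. Deduplicate on a hash of both conversations' content.
        convs = [rec.get("conversation_a"), rec.get("conversation_b")]
        payload = json.dumps(convs, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(payload).hexdigest()
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)
        # 4. Require properly formatted conversation turns on both sides.
        if not all(_is_valid_conversation(c) for c in convs):
            continue
        # 5. Enforce a minimum number of user turns (assumed length metric).
        user_turns = sum(1 for m in convs[0] if m["role"] == "user")
        if user_turns < min_turns:
            continue
        kept.append(rec)
    return kept
```

Each check short-circuits with continue, so a record is kept only if it passes every filter in the stated order.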

The mode parameter selects one of two operating modes. The default mode ("default") applies standard filtering suitable for general leaderboard computation. The strict mode ("strict") adds constraints intended for research-quality datasets, such as requiring longer conversations and enforcing stricter deduplication thresholds. Separately, the sanitize_ip flag controls whether IP addresses are hashed in the output for privacy compliance.
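As a rough illustration of what hashing an IP address for output could look like, the sketch below derives a short token with SHA-256 from the standard library. The function name sketch_sanitize_ip, the salt parameter, and the 16-character truncation are assumptions made for this example; the page does not specify which hashing scheme the module uses.

```python
import hashlib


def sketch_sanitize_ip(ip, salt=""):
    """Map an IP string to a stable 16-hex-character token (hypothetical scheme)."""
    # The same (salt, ip) pair always yields the same token, so records from
    # one address can still be grouped after the raw IP is removed.
    digest = hashlib.sha256((salt + ip).encode("utf-8")).hexdigest()
    return digest[:16]
```

The salt is included because the IPv4 address space is small enough that unsalted hashes can be reversed by enumerating every address; keeping a secret salt out of the published dataset makes that harder.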

Usage

Call this module first in any arena data analysis workflow: before computing Elo ratings, generating leaderboard tables, or running any other statistical analysis on battle data. Its cleaned output is consumed by elo_analysis.py, rating_systems.py, and the monitor dashboard.
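To show how cleaned battles might feed a rating computation, here is a minimal online Elo update over battle records. It is a generic sketch, not the code in elo_analysis.py or rating_systems.py. The function name sketch_online_elo, the constants (k, scale, base, init_rating), and the assumption that winner takes the values "model_a", "model_b", or a tie label are all illustrative. The input is a list of dictionaries, which a pandas DataFrame can produce with battles_df.to_dict("records").

```python
from collections import defaultdict


def sketch_online_elo(battles, k=4.0, scale=400.0, base=10.0, init_rating=1000.0):
    """Sequential Elo update; any winner value other than a model side is a tie."""
    ratings = defaultdict(lambda: init_rating)
    for b in battles:
        ra = ratings[b["model_a"]]
        rb = ratings[b["model_b"]]
        # Expected score of model_a under the logistic Elo model.
        expected_a = 1.0 / (1.0 + base ** ((rb - ra) / scale))
        if b["winner"] == "model_a":
            score_a = 1.0
        elif b["winner"] == "model_b":
            score_a = 0.0
        else:
            score_a = 0.5
        # The update is zero-sum: what model_a gains, model_b loses.
        delta = k * (score_a - expected_a)
        ratings[b["model_a"]] = ra + delta
        ratings[b["model_b"]] = rb - delta
    return dict(ratings)
```

Because the update is order-dependent, duplicate or bot-generated battles left in the input would shift ratings directly, which is why cleaning comes first.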

Code Reference

Source Location

Signature

from typing import Optional

import pandas as pd


def clean_battle_data(
    log_files: list[str],
    exclude_model_names: Optional[list[str]] = None,
    ban_ip_list: Optional[list[str]] = None,
    sanitize_ip: bool = False,
    mode: str = "default",
) -> pd.DataFrame:
    """Clean and validate arena battle log data for Elo rating computation.

    Args:
        log_files: Paths to raw battle log JSON files.
        exclude_model_names: Model names to exclude from results.
        ban_ip_list: IP addresses to filter out.
        sanitize_ip: Whether to hash IP addresses in output.
        mode: Cleaning mode, either "default" or "strict".

    Returns:
        A pandas DataFrame of cleaned battle records.
    """
    ...

Import

from fastchat.serve.monitor.clean_battle_data import clean_battle_data

I/O Contract

Inputs

Name | Type | Required | Description
log_files | list[str] | Yes | List of file paths to raw arena battle log JSON files
exclude_model_names | list[str] | No | Model names to exclude from the cleaned dataset (default: None)
ban_ip_list | list[str] | No | IP addresses to filter out from battle records (default: None)
sanitize_ip | bool | No | If True, hash IP addresses in the output for privacy (default: False)
mode | str | No | Cleaning mode: "default" for standard filtering, "strict" for research-quality constraints (default: "default")

Outputs

Name | Type | Description
battles_df | pd.DataFrame | Cleaned DataFrame with columns: model_a, model_b, winner, conversation_a, conversation_b, judge, turn, tstamp, ip (hashed if sanitized)
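A caller may want to confirm that the returned DataFrame carries the documented columns before passing it downstream. The helper below is a hypothetical convenience, not part of FastChat; it only encodes the column list from the Outputs table above.

```python
# Column names copied from the Outputs table on this page.
EXPECTED_COLUMNS = [
    "model_a",
    "model_b",
    "winner",
    "conversation_a",
    "conversation_b",
    "judge",
    "turn",
    "tstamp",
    "ip",
]


def missing_columns(columns):
    """Return the documented columns that are absent, in documented order."""
    present = set(columns)
    return [name for name in EXPECTED_COLUMNS if name not in present]
```

With a pandas DataFrame, the check would be called as missing_columns(battles_df.columns); an empty list means every documented column is present.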

Usage Examples

from fastchat.serve.monitor.clean_battle_data import clean_battle_data

# Clean battle data with default settings
log_files = [
    "logs/battles_2024_01.json",
    "logs/battles_2024_02.json",
]
battles_df = clean_battle_data(log_files)
print(f"Cleaned battles: {len(battles_df)}")
print(battles_df.head())

# Clean with strict mode and IP sanitization
battles_strict = clean_battle_data(
    log_files,
    exclude_model_names=["deprecated-model-v1"],
    ban_ip_list=["192.168.1.100"],
    sanitize_ip=True,
    mode="strict",
)
print(f"Strict cleaned battles: {len(battles_strict)}")

# Check model distribution in cleaned data
print(battles_strict["model_a"].value_counts().head(10))
