Principle:Lm sys FastChat Arena Usage Statistics
| Field | Value |
|---|---|
| Page Type | Principle |
| Title | Arena Usage Statistics |
| Repository | lm-sys/FastChat |
| Workflow | Arena_Data_Analysis |
| Domains | Data_Analysis, Statistics |
| Knowledge Sources | fastchat/serve/monitor/basic_stats.py |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
This principle governs the computation and reporting of aggregate usage statistics from Chatbot Arena battle logs and general chat logs. Summarizing platform engagement metrics (total battles, unique users, model usage frequencies, time-series trends, and geographic distribution) provides the analytical foundation for understanding how the Arena is used, which models attract the most traffic, and how usage patterns evolve over time.
Description
Battle and Conversation Counting
The most fundamental statistic is the total count of battles (pairwise model comparisons where a user votes for a winner) and chat conversations. These raw counts serve as the primary measure of platform scale and are typically segmented by time period (daily, weekly, monthly) and by model pair. Counting must account for incomplete battles (where a user navigated away before voting) and distinguish between anonymous and authenticated sessions.
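The counting rule above can be sketched as follows. This is a minimal illustration and not the code in `fastchat/serve/monitor/basic_stats.py`: the record fields (`type`, `tstamp`, `models`) and the set of vote types are assumptions made for the example. A battle is counted only when a vote was recorded, so incomplete battles are excluded.

```python
from collections import Counter
from datetime import datetime, timezone

# Assumed event types that mark a completed (voted) battle; hypothetical names.
VOTE_TYPES = {"leftvote", "rightvote", "tievote", "bothbad_vote"}


def count_battles(records):
    """Count completed battles per UTC day and per unordered model pair."""
    per_day = Counter()
    per_pair = Counter()
    for rec in records:
        if rec.get("type") not in VOTE_TYPES:
            continue  # skip plain chats and battles that ended without a vote
        day = datetime.fromtimestamp(rec["tstamp"], tz=timezone.utc).date().isoformat()
        pair = tuple(sorted(rec["models"]))  # (A, B) and (B, A) are the same pair
        per_day[day] += 1
        per_pair[pair] += 1
    return per_day, per_pair


# Illustrative records only; not real Arena data.
sample = [
    {"type": "leftvote", "tstamp": 1700000000, "models": ["model-b", "model-a"]},
    {"type": "chat", "tstamp": 1700000100, "models": ["model-a", "model-b"]},
    {"type": "tievote", "tstamp": 1700090000, "models": ["model-a", "model-b"]},
]
per_day, per_pair = count_battles(sample)
```

Sorting the model names before counting keeps the pair statistic independent of which side a model was shown on.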
Unique User Estimation
Unique user counts are derived from session identifiers or hashed IP addresses recorded in the battle logs. Because Arena users do not always authenticate, uniqueness estimation relies on a combination of browser fingerprints and IP-based heuristics, so the result is an estimate and not an exact count. Care must be taken to avoid double-counting users who access the platform from multiple devices or networks; the opposite error also occurs, since several users behind one shared address collapse into a single identifier. Privacy constraints must be respected by never storing raw IP addresses in published statistics.
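One way to count distinct users without exposing raw addresses is to hash each IP with a secret salt and count distinct digests. The sketch below assumes each record carries an `ip` field and that a salt is supplied by the caller; both are illustrative assumptions, and browser fingerprinting is not modeled here.

```python
import hashlib


def anonymize_ip(ip, salt):
    """Return a salted SHA-256 hex digest so the raw IP is never published."""
    return hashlib.sha256((salt + ip).encode("utf-8")).hexdigest()


def estimate_unique_users(records, salt):
    """Estimate unique users as the number of distinct hashed IPs."""
    return len({anonymize_ip(rec["ip"], salt) for rec in records if rec.get("ip")})


# Illustrative records using documentation-reserved addresses.
visits = [
    {"ip": "192.0.2.1"},
    {"ip": "198.51.100.7"},
    {"ip": "192.0.2.1"},   # repeat visitor, counted once
    {"ip": None},          # missing address, ignored
]
unique_users = estimate_unique_users(visits, salt="example-salt")
```

The salt should stay private: without it, the small IPv4 address space makes unsalted hashes easy to reverse by enumeration.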
Model Usage Frequencies
Per-model usage frequencies record how often each model appears in battles and conversations. These frequencies inform capacity planning decisions (allocating more GPU resources to high-demand models) and reveal user preferences. Frequency tables are typically presented as both absolute counts and relative proportions, with optional filtering by time window to highlight trending or declining models.
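A frequency table with both absolute counts and relative proportions can be built with a single pass over the logs. In this sketch the `models` field and the optional `tstamp` window filter are assumptions made for illustration.

```python
from collections import Counter


def model_frequencies(records, start=None, end=None):
    """Map each model to (count, share), optionally restricted to start <= tstamp < end."""
    counts = Counter()
    for rec in records:
        ts = rec.get("tstamp")
        if start is not None and (ts is None or ts < start):
            continue
        if end is not None and (ts is None or ts >= end):
            continue
        counts.update(rec["models"])  # each appearance of a model counts once
    total = sum(counts.values())
    return {model: (n, n / total) for model, n in counts.most_common()}


# Illustrative records only.
logs = [
    {"tstamp": 100, "models": ["model-a", "model-b"]},
    {"tstamp": 200, "models": ["model-a", "model-c"]},
]
freqs = model_frequencies(logs)
recent = model_frequencies(logs, start=150)
```

Comparing `freqs` with a windowed table such as `recent` is one simple way to spot models whose share is rising or falling.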
Time-Series Aggregation
Raw event logs are aggregated into time-series data at configurable granularities (hourly, daily, weekly). Time-series representations enable trend analysis, anomaly detection (e.g., traffic spikes from media coverage), and seasonality identification (e.g., lower weekend usage). Standard aggregation methods include simple binning by timestamp and rolling window averages for smoothed trend lines.
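The two aggregation methods named above, timestamp binning and a rolling average, can be sketched with the standard library. The function names are hypothetical, timestamps are assumed to be Unix seconds, and days are taken in UTC.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def daily_counts(timestamps):
    """Bin Unix timestamps into UTC days, filling days without events with zero."""
    days = Counter(datetime.fromtimestamp(t, tz=timezone.utc).date() for t in timestamps)
    if not days:
        return []
    series = []
    day, last = min(days), max(days)
    while day <= last:
        series.append((day, days.get(day, 0)))
        day += timedelta(days=1)
    return series


def rolling_mean(values, window):
    """Trailing rolling mean; early points average over the values seen so far."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


series = daily_counts([0, 10, 2 * 86400])      # two events on day 1, one on day 3
counts = [n for _, n in series]
smooth = rolling_mean([2, 4, 6, 8], window=2)
```

Filling empty days with zero matters for the rolling average: skipping them would silently shorten the window and overstate quiet periods.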
Geographic Distribution
When IP metadata is available, geographic distribution statistics summarize user locations at the country or region level. These statistics help the platform team understand the global reach of the Arena, identify regions with poor model latency (informing infrastructure decisions), and ensure that evaluation data represents diverse user populations.
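Country-level summaries reduce to a grouped count once each record has a location label. The sketch below assumes a `country` field has already been attached by an upstream geolocation step (not shown, and not specified by the source); records without it are grouped under `"unknown"` so they stay visible in the totals.

```python
from collections import Counter


def country_shares(records):
    """Return [(country, count, share)] sorted by count, largest first."""
    counts = Counter((rec.get("country") or "unknown") for rec in records)
    total = sum(counts.values())
    return [(country, n, n / total) for country, n in counts.most_common()]


# Illustrative records only.
sessions = [{"country": "US"}, {"country": "US"}, {"country": "DE"}, {}]
shares = country_shares(sessions)
```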
Theoretical Basis
Descriptive statistics provide the foundational summary of platform engagement metrics. Measures of central tendency (mean battles per day) and dispersion (variance in daily usage) characterize the typical and extreme operating conditions of the platform. Time-series decomposition, which separates observed usage into trend, seasonal, and residual components, reveals underlying growth trajectories and cyclical patterns that simple aggregate counts obscure. Per-model usage frequencies, when viewed as empirical probability distributions, inform model allocation and capacity planning through principles of queueing theory: models with higher arrival rates require proportionally more serving capacity to maintain acceptable latency. Geographic aggregation relies on the assumption that IP geolocation databases provide sufficiently accurate country-level resolution for infrastructure planning purposes, even though individual-level accuracy may be limited.
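The quantities named above can be written out explicitly. The notation is introduced here for illustration: $x_t$ is the number of battles on day $t$ over $n$ days, and for model $m$, $\lambda_m$ is the request arrival rate, $\mu_m$ the service rate of one replica, and $c_m$ the number of replicas. The additive form of the decomposition is an assumption; a multiplicative form is a common alternative when seasonal swings grow with traffic.

```latex
% Central tendency and dispersion of daily battle counts
\bar{x} = \frac{1}{n}\sum_{t=1}^{n} x_t,
\qquad
s^2 = \frac{1}{n-1}\sum_{t=1}^{n}\left(x_t - \bar{x}\right)^2

% Additive decomposition into trend, seasonal, and residual components
x_t = T_t + S_t + R_t

% Utilization of model m; stable service requires \rho_m < 1
\rho_m = \frac{\lambda_m}{c_m\,\mu_m}
```

Holding the target utilization $\rho_m$ fixed, the required capacity is $c_m = \lambda_m / (\rho_m \mu_m)$, which grows linearly with the arrival rate. This is the sense in which higher-traffic models need proportionally more serving capacity.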