Implementation:Truera Trulens Compare Tab

Knowledge Sources	Truera_Trulens TruLens Docs
Domains	Dashboard, Visualization
Last Updated	2026-02-14 08:00 GMT

Overview

A Streamlit dashboard tab for comparing multiple app versions side by side: aggregate feedback metrics, per-record difference grids, and detailed trace views for overlapping records.

Description

The Compare Tab module (Compare.py) implements the full comparison workflow for the TruLens dashboard. It allows users to select between 2 and 5 app versions belonging to the same application, fetches records and feedback data for each, and renders:

App Metric Comparison -- a grouped bar chart (and DataFrame) showing mean feedback scores per version, with a variance or diff column highlighting the largest discrepancies.
Overlapping Records -- an interactive AgGrid (or native Streamlit dataframe fallback) that joins records across versions by input text, computes per-metric absolute differences (2 versions) or standard deviations (3+ versions), and sorts by aggregate divergence.
Advanced Filters -- a dynamic form-based filter builder that lets users add comparison clauses (e.g. metric_A of version_1 > metric_A of version_2) to narrow the shared record set.
Record Comparison -- once a row is selected, renders per-record feedback charts, feedback pill selectors with call details, and trace viewers (standard JSON, OTEL spans, or SIS-compatible JSON) side-by-side for each version.
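
The Overlapping Records computation described above can be sketched with plain pandas. This is a minimal illustration under stated assumptions, not the module's actual code: the function name join_and_score, the "_<version>" column suffixes, the "agg_diff" column name, and the use of the sample standard deviation are all hypothetical choices, and each input text is assumed to appear at most once per version.

```python
import pandas as pd


def join_and_score(records_by_version: dict, feedback_cols: list) -> pd.DataFrame:
    """Join per-version records on input text and score how much they diverge."""
    versions = list(records_by_version)
    merged = None
    for version, df in records_by_version.items():
        # Suffix every column with the version name so columns stay distinct.
        renamed = (
            df.set_index("input")[["record_id", *feedback_cols]]
            .add_suffix(f"_{version}")
        )
        # An inner join keeps only inputs that every version has seen.
        merged = renamed if merged is None else merged.join(renamed, how="inner")

    diff_cols = []
    for col in feedback_cols:
        per_version = merged[[f"{col}_{v}" for v in versions]]
        diff_col = f"{col}_diff"
        if len(versions) == 2:
            # Two versions: absolute difference between the two scores.
            merged[diff_col] = (per_version.iloc[:, 0] - per_version.iloc[:, 1]).abs()
        else:
            # Three or more versions: spread of the scores across versions.
            merged[diff_col] = per_version.std(axis=1)
        diff_cols.append(diff_col)

    # Aggregate divergence: mean of the per-metric differences (an assumption).
    merged["agg_diff"] = merged[diff_cols].mean(axis=1)
    return merged.sort_values("agg_diff", ascending=False)
```

Rows at the top of the result are the inputs on which the versions disagree most, which is the ordering the grid is described as using.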

The module uses Streamlit session state extensively to persist selected app IDs, column data caches, and filter results across reruns. Query parameters are also synchronized so that comparison URLs can be shared.
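
The state handling described above can be sketched without Streamlit by letting plain dictionaries stand in for the session state and the query parameters. The function name resolve_app_ids, the "app_ids" query-parameter name, the comma-separated encoding, and the precedence of session state over the URL are assumptions made for illustration; only the `page_name.app_ids` key and the `[None, None]` default come from this page.

```python
from typing import List, MutableMapping, Optional

PAGE_NAME = "Compare"  # hypothetical page name used to namespace the key
MIN_COMPARATORS = 2


def resolve_app_ids(
    session_state: MutableMapping,
    query_params: MutableMapping,
    page_name: str = PAGE_NAME,
) -> List[Optional[str]]:
    """Pick the selected app IDs and keep session state and URL in sync."""
    key = f"{page_name}.app_ids"
    app_ids = session_state.get(key)
    if not app_ids:
        # Fall back to the URL, assumed here to hold a comma-separated list.
        raw = query_params.get("app_ids", "")
        app_ids = [part for part in raw.split(",") if part]
    if not app_ids:
        # Nothing selected yet: start with empty selector slots.
        app_ids = [None] * MIN_COMPARATORS
    session_state[key] = app_ids

    # Mirror concrete selections back to the URL so the link can be shared.
    selected = [app_id for app_id in app_ids if app_id is not None]
    if selected:
        query_params["app_ids"] = ",".join(selected)
    return app_ids
```

In a Streamlit page the two mappings would be st.session_state and st.query_params, both of which support dictionary-style access.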

Usage

Use the Compare Tab to evaluate how versions of the same LLM application differ in feedback metric scores and in record-level behavior. It is especially useful for A/B testing prompt variations, model upgrades, or pipeline changes. Open it from the Leaderboard page by selecting 2 to 5 app versions and clicking the "Compare" button, or navigate directly to the Compare page with app IDs specified as query parameters.

Code Reference

Source Location

Repository: Truera_Trulens
File: src/dashboard/trulens/dashboard/tabs/Compare.py
Lines: 1-865

Signature

from typing import Dict, List, Optional

import pandas as pd

MAX_COMPARATORS = 5
MIN_COMPARATORS = 2
DEFAULT_COMPARATORS = MIN_COMPARATORS

def init_page_state() -> None: ...

def _preprocess_df(records_df: pd.DataFrame) -> pd.DataFrame: ...

def _feedback_cols_intersect(
    col_data: Dict[str, Dict[str, List[str]]],
) -> List[str]: ...

def _render_all_app_feedback_plot(
    col_data: Dict[str, Dict[str, pd.DataFrame]],
    feedback_cols: List[str],
) -> None: ...

def _highlight_variance(row: pd.Series) -> List[str]: ...

def _render_advanced_filters(
    query_col: pd.DataFrame,
    feedback_cols: List[str],
) -> None: ...

def _build_grid_options(
    df: pd.DataFrame,
    agg_diff_col: str,
    diff_cols: List[str],
    record_id_cols: List[str],
    num_comparators: int,
) -> dict: ...

def _render_grid(
    df: pd.DataFrame,
    agg_diff_col: str,
    diff_cols: List[str],
    record_id_cols: List[str],
    num_comparators: int,
    grid_key: Optional[str] = None,
) -> pd.DataFrame: ...

def _render_shared_records(
    col_data: Dict[str, Dict[str, pd.DataFrame]],
    feedback_cols: List[str],
) -> Optional[pd.DataFrame]: ...

def _lookup_app_version(
    versions_df: pd.DataFrame,
    app_version: Optional[str] = None,
    app_id: Optional[str] = None,
) -> Optional[pd.Series]: ...

def _render_version_selectors(
    app_name: str,
    versions_df: pd.DataFrame,
) -> None: ...

def _reset_page_state() -> None: ...

def render_app_comparison(app_name: str) -> None: ...

def compare_main() -> None: ...

Import

from trulens.dashboard.tabs.Compare import render_app_comparison
from trulens.dashboard.tabs.Compare import compare_main
from trulens.dashboard.tabs.Compare import init_page_state

I/O Contract

Inputs

init_page_state

Name	Type	Required	Description
(none)	--	--	Reads `page_name.app_ids` from session state and query parameters. If no app IDs are present, initializes them to `[None, None]`.

render_app_comparison

Name	Type	Required	Description
app_name	str	yes	The name of the application whose versions will be compared.

_preprocess_df

Name	Type	Required	Description
records_df	pd.DataFrame	yes	DataFrame of records with "input" and "output" columns to be UTF-8 re-encoded.

_feedback_cols_intersect

Name	Type	Required	Description
col_data	Dict[str, Dict[str, List[str]]]	yes	Per-app-id dictionary in which each entry contains a "feedback_cols" list of feedback column names.
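
The intersection step can be illustrated as follows. This is a sketch, not the module's code: the public name feedback_cols_intersect and the decision to preserve the first app's column order are assumptions, since this page does not specify the order of the returned list.

```python
from typing import Dict, List


def feedback_cols_intersect(col_data: Dict[str, Dict[str, List[str]]]) -> List[str]:
    """Return the feedback columns that every compared app version has."""
    col_lists = [entry["feedback_cols"] for entry in col_data.values()]
    if not col_lists:
        return []
    shared = set(col_lists[0]).intersection(*col_lists[1:])
    # Keep the first app's ordering so charts and grids stay stable.
    return [col for col in col_lists[0] if col in shared]
```

Only metrics present in all selected versions survive, so every comparison column downstream has a value for each version.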

_render_all_app_feedback_plot

Name	Type	Required	Description
col_data	Dict[str, Dict[str, pd.DataFrame]]	yes	Per-app-id dictionary with "records" (DataFrame) and "version" (str) keys.
feedback_cols	List[str]	yes	List of feedback column names to plot.

_render_advanced_filters

Name	Type	Required	Description
query_col	pd.DataFrame	yes	Merged DataFrame with per-version feedback columns, indexed by input.
feedback_cols	List[str]	yes	List of base feedback column names available for filtering.
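
A comparison clause of the kind the filter builder produces can be applied to the merged DataFrame roughly as shown here. The helper name apply_clauses, the (left column, operator, right column) tuple format, the "relevance_v1" style column names, and the choice to combine clauses with a logical AND are all assumptions for illustration.

```python
import operator

import pandas as pd

# Map the operator symbols a user might pick to element-wise comparisons.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
}


def apply_clauses(df: pd.DataFrame, clauses: list) -> pd.DataFrame:
    """Keep rows that satisfy every (left_col, op, right_col) clause."""
    mask = pd.Series(True, index=df.index)
    for left_col, op, right_col in clauses:
        mask &= OPERATORS[op](df[left_col], df[right_col])
    return df[mask]
```

For example, the clause ("relevance_v1", ">", "relevance_v2") keeps only the inputs on which version 1 scored higher on relevance than version 2.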

_build_grid_options

Name	Type	Required	Description
df	pd.DataFrame	yes	DataFrame to be displayed in the AgGrid.
agg_diff_col	str	yes	Column name for the aggregate difference/variance metric (e.g. "Mean Diff").
diff_cols	List[str]	yes	Per-feedback difference column names.
record_id_cols	List[str]	yes	Per-version record ID columns to hide.
num_comparators	int	yes	Number of app versions being compared (affects tooltip text).
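
To make the roles of these parameters concrete, the sketch below assembles a plain dictionary of AG Grid column definitions. It is not the module's implementation: the function takes a list of column names rather than a DataFrame, the tooltip wording is invented, and only the standard AG Grid column properties field, hide, headerTooltip, and sort are used.

```python
from typing import Any, Dict, List


def build_grid_options(
    columns: List[str],
    agg_diff_col: str,
    diff_cols: List[str],
    record_id_cols: List[str],
    num_comparators: int,
) -> Dict[str, Any]:
    """Build column definitions: hide record IDs, label and sort diff columns."""
    # Two versions are compared by difference, three or more by spread.
    measure = "absolute difference" if num_comparators == 2 else "standard deviation"
    column_defs = []
    for name in columns:
        col_def: Dict[str, Any] = {"field": name}
        if name in record_id_cols:
            # Record IDs are needed for lookups but not shown to the user.
            col_def["hide"] = True
        elif name == agg_diff_col:
            col_def["headerTooltip"] = f"Mean {measure} across feedback metrics"
            col_def["sort"] = "desc"
        elif name in diff_cols:
            col_def["headerTooltip"] = f"Per-record {measure} for this metric"
        column_defs.append(col_def)
    return {"columnDefs": column_defs}
```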

_render_grid

Name	Type	Required	Description
df	pd.DataFrame	yes	DataFrame to render in the grid.
agg_diff_col	str	yes	Aggregate difference column name.
diff_cols	List[str]	yes	Individual difference column names.
record_id_cols	List[str]	yes	Hidden record ID column names.
num_comparators	int	yes	Number of app versions being compared.
grid_key	Optional[str]	no	Streamlit key for the AgGrid widget. Defaults to None.

_lookup_app_version

Name	Type	Required	Description
versions_df	pd.DataFrame	yes	DataFrame containing app_version and app_id columns.
app_version	Optional[str]	no	App version string to look up. Mutually exclusive with app_id.
app_id	Optional[str]	no	App ID string to look up. Mutually exclusive with app_version.
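
The lookup contract (exactly one of the two keys, a matching row or None) can be sketched like this. The public name lookup_app_version and the ValueError raised when zero or both keys are supplied are assumptions; this page only states that the arguments are mutually exclusive.

```python
from typing import Optional

import pandas as pd


def lookup_app_version(
    versions_df: pd.DataFrame,
    app_version: Optional[str] = None,
    app_id: Optional[str] = None,
) -> Optional[pd.Series]:
    """Return the first row matching app_version or app_id, or None."""
    if (app_version is None) == (app_id is None):
        # Hypothetical guard: require exactly one lookup key.
        raise ValueError("Pass exactly one of app_version or app_id.")
    if app_version is not None:
        column, value = "app_version", app_version
    else:
        column, value = "app_id", app_id
    matches = versions_df[versions_df[column] == value]
    return None if matches.empty else matches.iloc[0]
```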

compare_main

Name	Type	Required	Description
(none)	--	--	Entry point function. Calls set_page_config, init_page_state, render_sidebar, and render_app_comparison.

Outputs

render_app_comparison

Name	Type	Description
(none -- renders to Streamlit)	None	Renders the complete comparison UI to the Streamlit page.

_feedback_cols_intersect

Name	Type	Description
feedback_cols	List[str]	The intersection of feedback column names across all compared app versions.

_render_shared_records

Name	Type	Description
selected_record_ids	Optional[pd.DataFrame]	DataFrame with per-version record_id columns for the user-selected row, or None if no row is selected.

_render_grid

Name	Type	Description
selected_rows	pd.DataFrame	DataFrame of user-selected rows from the grid (may be empty).

_lookup_app_version

Name	Type	Description
row	Optional[pd.Series]	The matching row from versions_df, or None if not found.

Usage Examples

# Example 1: Running the compare tab as a standalone Streamlit page
# (typically invoked as: streamlit run tabs/Compare.py)
from trulens.dashboard.tabs.Compare import compare_main

if __name__ == "__main__":
    compare_main()

# Example 2: Programmatically navigating to the Compare tab from the Leaderboard
import streamlit as st
from trulens.dashboard.constants import COMPARE_PAGE_NAME

# After selecting app versions on the Leaderboard tab:
selected_app_ids = ["app_id_v1", "app_id_v2", "app_id_v3"]
st.session_state[f"{COMPARE_PAGE_NAME}.app_ids"] = selected_app_ids
st.switch_page("tabs/Compare.py")

# Example 3: Rendering the comparison view within a custom Streamlit layout
from trulens.dashboard.tabs.Compare import init_page_state, render_app_comparison

init_page_state()
render_app_comparison(app_name="my_chatbot_app")

Related Pages

Environment:Truera_Trulens_Streamlit_Dashboard_Environment
Environment:Truera_Trulens_Python_Core_Environment
Truera_Trulens_Leaderboard_Tab -- the Leaderboard page from which comparisons are typically initiated.
Truera_Trulens_Records_Tab -- the Records page for examining individual records.
Truera_Trulens_Display_Utils -- shared display utility functions used for rendering feedback results.
