
Principle:Arize AI Phoenix Annotation Querying

From Leeroopedia
Knowledge Sources
Domains AI Observability, Data Retrieval, Span Evaluation
Last Updated 2026-02-14 00:00 GMT

Overview

Annotation querying is the practice of retrieving previously recorded span annotations from an observability platform using filters such as span IDs, annotation names, and time ranges.

Description

After annotations have been logged -- whether by human reviewers, LLM judges, or code-based evaluators -- the next step in an observability workflow is to query and retrieve those annotations for analysis, reporting, or downstream processing. Annotation querying provides a structured way to fetch annotations from the server with flexible filtering options:

  • By Span IDs: Retrieve annotations for a specific set of spans, typically obtained from a prior span query.
  • By Annotation Name: Filter to include or exclude specific annotation dimensions (e.g., only "relevance" annotations, or all annotations except "note").
  • By Time Range: Restrict results to annotations created within a specific time window.
  • By Project: Scope queries to a particular project identifier (name or ID).

The results can be returned in two formats depending on the use case:

  • DataFrame format: Ideal for data analysis workflows where results will be aggregated, pivoted, or joined with span data using pandas.
  • List format: Ideal for programmatic consumption where individual annotation objects are processed in application logic.

Both formats use cursor-based pagination internally and batch span IDs into groups (up to 100 per request) to handle large-scale queries efficiently.
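The span-ID batching behavior can be sketched as follows; the 100-per-request limit comes from the text, while the function name is illustrative:

```python
def batch_span_ids(span_ids: list[str], max_per_request: int = 100) -> list[list[str]]:
    """Partition span IDs into request-sized batches of at most max_per_request."""
    return [
        span_ids[i : i + max_per_request]
        for i in range(0, len(span_ids), max_per_request)
    ]

batches = batch_span_ids([f"span-{i}" for i in range(250)])
print([len(b) for b in batches])  # [100, 100, 50]
```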

Usage

Use annotation querying when:

  • Analyzing evaluation results after an automated or manual annotation pipeline has run.
  • Building dashboards that display annotation score distributions, label frequencies, or agreement metrics.
  • Comparing annotations across different annotator kinds (e.g., human vs. LLM judge agreement).
  • Exporting data for downstream training, fine-tuning, or reporting workflows.
  • Filtering spans based on annotation outcomes (e.g., finding all spans scored below a quality threshold).
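As a small illustration of the last use case, queried annotations (here as hypothetical flat records) can be filtered to surface spans scoring below a quality threshold:

```python
# Hypothetical annotation records returned by a query.
annotations = [
    {"span_id": "s1", "name": "quality", "score": 0.92},
    {"span_id": "s2", "name": "quality", "score": 0.41},
    {"span_id": "s3", "name": "quality", "score": 0.78},
]

threshold = 0.8
low_quality_spans = [a["span_id"] for a in annotations if a["score"] < threshold]
print(low_quality_spans)  # ['s2', 's3']
```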

Theoretical Basis

Annotation querying implements a filtered retrieval pattern with server-side pagination. The query model can be expressed as:

Query(
    project_identifier: str,
    span_ids: set[str]?,
    include_annotation_names: list[str]?,
    exclude_annotation_names: list[str]?,
    limit: int = 1000
) -> list[SpanAnnotation]

The server returns paginated results via a cursor mechanism:

Response = {
    "data": [SpanAnnotation, ...],
    "next_cursor": str?  # null when no more pages
}

The client transparently handles pagination by following next_cursor links until all matching annotations are retrieved. When the number of span IDs exceeds the per-request maximum (100), the client partitions them into batches and issues separate paginated queries for each batch, merging the results.
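The cursor-following loop described above can be sketched against the Response shape given earlier. The `fetch_page` callable stands in for an actual HTTP request; the page contents are fabricated for illustration:

```python
def fetch_all(fetch_page, limit: int = 1000) -> list[dict]:
    """Follow next_cursor links until the server reports no more pages."""
    results: list[dict] = []
    cursor = None
    while True:
        page = fetch_page(cursor=cursor, limit=limit)
        results.extend(page["data"])
        cursor = page.get("next_cursor")
        if cursor is None:  # null cursor means the last page was reached
            break
    return results

# A stand-in server serving three fixed pages, keyed by cursor.
_pages = {
    None: {"data": [{"id": 1}, {"id": 2}], "next_cursor": "c1"},
    "c1": {"data": [{"id": 3}], "next_cursor": "c2"},
    "c2": {"data": [{"id": 4}], "next_cursor": None},
}

def fake_fetch_page(cursor=None, limit=1000):
    return _pages[cursor]

all_annotations = fetch_all(fake_fetch_page)
print([a["id"] for a in all_annotations])  # [1, 2, 3, 4]
```

When span IDs are batched, this loop runs once per batch and the per-batch results are concatenated.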

The exclude_annotation_names parameter defaults to ["note"] to filter out note-type annotations, which are typically added via the UI and are not relevant to programmatic evaluation analysis.

The DataFrame output flattens the nested result object so that label, score, and explanation appear as top-level columns alongside span_id, annotation_name, annotator_kind, metadata, created_at, and updated_at.
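The flattening step can be sketched as a plain-dict transformation, producing one row per annotation with the columns named in the text. The nested `result` key is an assumption about the record shape, inferred from the field list above:

```python
def flatten_annotation(ann: dict) -> dict:
    """Lift nested result fields (label, score, explanation) to top-level columns."""
    result = ann.get("result") or {}
    return {
        "span_id": ann["span_id"],
        "annotation_name": ann["name"],
        "annotator_kind": ann.get("annotator_kind"),
        "label": result.get("label"),
        "score": result.get("score"),
        "explanation": result.get("explanation"),
        "metadata": ann.get("metadata"),
        "created_at": ann.get("created_at"),
        "updated_at": ann.get("updated_at"),
    }

row = flatten_annotation({
    "span_id": "s1",
    "name": "relevance",
    "annotator_kind": "LLM",
    "result": {"label": "relevant", "score": 0.9, "explanation": "on topic"},
})
print(row["label"], row["score"])  # relevant 0.9
```

A list of such rows can be handed directly to `pandas.DataFrame(...)` for aggregation or joining with span data.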

Related Pages

Implemented By
