Principle: Arize AI Phoenix Annotation Querying
| Knowledge Sources | |
|---|---|
| Domains | AI Observability, Data Retrieval, Span Evaluation |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Annotation querying is the practice of retrieving previously recorded span annotations from an observability platform using filters such as span IDs, annotation names, and time ranges.
Description
After annotations have been logged -- whether by human reviewers, LLM judges, or code-based evaluators -- the next step in an observability workflow is to query and retrieve those annotations for analysis, reporting, or downstream processing. Annotation querying provides a structured way to fetch annotations from the server with flexible filtering options:
- By Span IDs: Retrieve annotations for a specific set of spans, typically obtained from a prior span query.
- By Annotation Name: Filter to include or exclude specific annotation dimensions (e.g., only "relevance" annotations, or all annotations except "note").
- By Time Range: Restrict results to annotations created within a specific time window.
- By Project: Scope queries to a particular project identifier (name or ID).
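The filter options above can be sketched as a small helper that assembles them into query parameters. This is illustrative only: `build_annotation_query_params` and its parameter names are hypothetical stand-ins, not the Phoenix client's actual API.

```python
from typing import Optional

def build_annotation_query_params(
    project_identifier: str,
    span_ids: Optional[list[str]] = None,
    include_annotation_names: Optional[list[str]] = None,
    exclude_annotation_names: Optional[list[str]] = None,
    limit: int = 1000,
) -> dict:
    """Hypothetical helper: collect the filters into request parameters."""
    params: dict = {"project_identifier": project_identifier, "limit": limit}
    if span_ids:
        # De-duplicate span IDs before sending them to the server.
        params["span_ids"] = sorted(set(span_ids))
    if include_annotation_names:
        params["include_annotation_names"] = include_annotation_names
    if exclude_annotation_names:
        params["exclude_annotation_names"] = exclude_annotation_names
    return params

params = build_annotation_query_params(
    "my-project",
    span_ids=["span-a", "span-b", "span-a"],
    include_annotation_names=["relevance"],
)
print(params["span_ids"])  # duplicates removed
```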
The results can be returned in two formats depending on the use case:
- DataFrame format: Ideal for data analysis workflows where results will be aggregated, pivoted, or joined with span data using pandas.
- List format: Ideal for programmatic consumption where individual annotation objects are processed in application logic.
Both formats use cursor-based pagination internally and batch span IDs into groups (up to 100 per request) to handle large-scale queries efficiently.
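The trade-off between the two formats can be illustrated with toy records (plain dicts stand in for annotation objects; the field names are an illustrative subset, and pandas is assumed available):

```python
import pandas as pd

# Toy annotation records in the list format.
annotations = [
    {"span_id": "s1", "annotation_name": "relevance", "label": "relevant", "score": 0.9},
    {"span_id": "s2", "annotation_name": "relevance", "label": "irrelevant", "score": 0.1},
]

# List format: process individual records in application logic.
high_scoring = [a for a in annotations if a["score"] >= 0.5]

# DataFrame format: aggregate with pandas.
df = pd.DataFrame(annotations)
mean_score = df.groupby("annotation_name")["score"].mean()
print(len(high_scoring), float(mean_score["relevance"]))
```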
Usage
Use annotation querying when:
- Analyzing evaluation results after an automated or manual annotation pipeline has run.
- Building dashboards that display annotation score distributions, label frequencies, or agreement metrics.
- Comparing annotations across different annotator kinds (e.g., human vs. LLM judge agreement).
- Exporting data for downstream training, fine-tuning, or reporting workflows.
- Filtering spans based on annotation outcomes (e.g., finding all spans scored below a quality threshold).
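As one concrete case from the list above, human-vs-LLM agreement can be computed by grouping retrieved annotations by span and comparing labels across annotator kinds. A minimal sketch over toy records (field names illustrative):

```python
from collections import defaultdict

# Toy records: the same spans annotated by a human and an LLM judge.
annotations = [
    {"span_id": "s1", "annotator_kind": "HUMAN", "label": "good"},
    {"span_id": "s1", "annotator_kind": "LLM", "label": "good"},
    {"span_id": "s2", "annotator_kind": "HUMAN", "label": "bad"},
    {"span_id": "s2", "annotator_kind": "LLM", "label": "good"},
]

# Group labels per span, keyed by annotator kind.
labels_by_span: dict[str, dict[str, str]] = defaultdict(dict)
for a in annotations:
    labels_by_span[a["span_id"]][a["annotator_kind"]] = a["label"]

# Agreement over spans annotated by both kinds.
agreements = [
    kinds["HUMAN"] == kinds["LLM"]
    for kinds in labels_by_span.values()
    if {"HUMAN", "LLM"} <= kinds.keys()
]
agreement_rate = sum(agreements) / len(agreements)
print(agreement_rate)  # 0.5
```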
Theoretical Basis
Annotation querying implements a filtered retrieval pattern with server-side pagination. The query model can be expressed as:
    Query(
        project_identifier: str,
        span_ids: set[str]?,
        include_annotation_names: list[str]?,
        exclude_annotation_names: list[str]?,
        limit: int = 1000
    ) -> list[SpanAnnotation]
The server returns paginated results via a cursor mechanism:
    Response = {
        "data": [SpanAnnotation, ...],
        "next_cursor": str?  # null when no more pages
    }
The client transparently handles pagination by following next_cursor links until all matching annotations are retrieved. When the number of span IDs exceeds the per-request maximum (100), the client partitions them into batches and issues separate paginated queries for each batch, merging the results.
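The batching-plus-cursor logic can be sketched against an in-memory stand-in for the server. Here `fetch_page` is a hypothetical substitute for a real paginated HTTP request, and the page size is kept tiny so the toy data actually paginates:

```python
from typing import Optional

MAX_SPAN_IDS_PER_REQUEST = 100
PAGE_SIZE = 3  # small so the toy data paginates

# Toy server-side store: one annotation per span.
_STORE = [{"span_id": f"s{i}", "annotation_name": "relevance"} for i in range(250)]

def fetch_page(span_ids: list[str], cursor: Optional[int]) -> dict:
    """Stand-in for one paginated server request (hypothetical)."""
    wanted = set(span_ids)
    matches = [a for a in _STORE if a["span_id"] in wanted]
    start = cursor or 0
    page = matches[start : start + PAGE_SIZE]
    next_cursor = start + PAGE_SIZE if start + PAGE_SIZE < len(matches) else None
    return {"data": page, "next_cursor": next_cursor}

def get_span_annotations(span_ids: list[str]) -> list[dict]:
    results: list[dict] = []
    # Partition span IDs into batches of at most 100 per request.
    for i in range(0, len(span_ids), MAX_SPAN_IDS_PER_REQUEST):
        batch = span_ids[i : i + MAX_SPAN_IDS_PER_REQUEST]
        cursor: Optional[int] = None
        # Follow next_cursor links until this batch is exhausted.
        while True:
            page = fetch_page(batch, cursor)
            results.extend(page["data"])
            cursor = page["next_cursor"]
            if cursor is None:
                break
    return results

annotations = get_span_annotations([f"s{i}" for i in range(250)])
print(len(annotations))  # 250
```

With 250 span IDs, the client issues three batches (100, 100, 50) and pages through each batch independently before merging the results.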
The exclude_annotation_names parameter defaults to ["note"] to filter out note-type annotations, which are typically added via the UI and are not relevant to programmatic evaluation analysis.
The DataFrame output flattens the nested result object so that label, score, and explanation appear as top-level columns alongside span_id, annotation_name, annotator_kind, metadata, created_at, and updated_at.
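This flattening can be sketched as follows. The nested `result` shape and the subset of columns are illustrative, not the exact wire format:

```python
import pandas as pd

# Toy raw payload with a nested "result" object per annotation.
raw = [
    {
        "span_id": "s1",
        "annotation_name": "relevance",
        "annotator_kind": "LLM",
        "result": {"label": "relevant", "score": 0.9, "explanation": "on topic"},
    },
    {
        "span_id": "s2",
        "annotation_name": "relevance",
        "annotator_kind": "HUMAN",
        "result": {"label": "irrelevant", "score": 0.1, "explanation": "off topic"},
    },
]

# Promote the nested result fields to top-level columns.
rows = [
    {**{k: v for k, v in r.items() if k != "result"}, **r["result"]}
    for r in raw
]
df = pd.DataFrame(rows)
print(sorted(df.columns))
```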