Principle:Arize AI Phoenix Batch Span Annotation
| Knowledge Sources | |
|---|---|
| Domains | AI Observability, Batch Processing, Span Evaluation |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Batch span annotation is the practice of submitting multiple structured quality assessments to an observability platform in a single operation, enabling efficient annotation of spans at scale.
Description
While individual span annotation suits one-at-a-time human review or real-time evaluation callbacks, many practical workflows need to annotate hundreds or thousands of spans at once. Batch span annotation addresses this need by accepting collections of annotations, either as typed dictionaries or as pandas DataFrames, and submitting them to the server in chunks of bounded size.
Key characteristics of batch annotation include:
- Bulk Submission: Many annotations travel in a single HTTP request, which reduces network overhead compared with making one call per annotation.
- Chunked Processing: Large collections (especially DataFrames) are automatically split into manageable chunks (typically 100 rows) to avoid exceeding server limits and to provide incremental progress.
- Flexible Input Formats: Annotations can be provided as iterables of typed dictionaries (SpanAnnotationData) for programmatic pipelines, or as pandas DataFrames for data-science-oriented workflows.
- Synchronous and Asynchronous Modes: Batch operations support both an asynchronous, fire-and-forget mode and a synchronous mode; only the synchronous mode returns the IDs of all inserted annotations.
- Document-Level Annotations: Beyond span-level annotations, batch operations also support annotating individual retrieved documents within a span (e.g., scoring each document in a RAG retrieval step).
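To make document-level annotation concrete, the sketch below turns per-document relevance scores for one retrieval span into a batch of annotation records, one per retrieved document. It assumes each record locates its document by a zero-based document_position field; that field name, the 0.5 threshold, the label strings, and the build_document_annotations helper are illustrative assumptions, not the Phoenix client API.

```python
from typing import Any, Dict, List


def build_document_annotations(
    span_id: str,
    scores: List[float],
    name: str = "relevance",
    threshold: float = 0.5,
) -> List[Dict[str, Any]]:
    """Return one annotation record per retrieved document in a retrieval span."""
    records = []
    for position, score in enumerate(scores):
        records.append(
            {
                "span_id": span_id,
                # Hypothetical field: zero-based index of the document in the span's results.
                "document_position": position,
                "name": name,
                "annotator_kind": "LLM",
                "result": {
                    "score": score,
                    # Hypothetical labelling rule: threshold the relevance score.
                    "label": "relevant" if score >= threshold else "irrelevant",
                },
            }
        )
    return records


# Scores for three documents returned by one retrieval span.
batch = build_document_annotations("a1b2c3d4e5f60718", [0.92, 0.31, 0.67])
for record in batch:
    print(record["document_position"], record["result"]["label"])
# 0 relevant
# 1 irrelevant
# 2 relevant
```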
Usage
Use batch span annotation when:
- Running automated evaluation pipelines that produce annotations for an entire project or time window at once.
- Importing pre-computed evaluation results from offline analysis (e.g., LLM judge scores computed in a notebook).
- Processing DataFrames that combine span IDs with evaluation scores, labels, and explanations.
- Annotating retrieved documents within retrieval spans to assess individual document relevance.
- Migrating or backfilling annotations from an external system into Phoenix.
Theoretical Basis
Batch annotation follows the bulk write pattern common in database and API design. The core data structure is:
SpanAnnotationData = {
    "span_id": str,              # Target span
    "name": str,                 # Annotation dimension
    "annotator_kind": str,       # "HUMAN" | "LLM" | "CODE"
    "result": {                  # Assessment values
        "label": str?,
        "score": float?,
        "explanation": str?
    },
    "metadata": dict?,           # Optional key-value metadata
    "identifier": str?           # Optional dedup key
}
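For type-checked Python pipelines, the schema above can be expressed as a TypedDict. This is a minimal sketch rather than the Phoenix client's own definition: the class names, the REQUIRED_KEYS and ANNOTATOR_KINDS constants, and the schema_errors helper are hypothetical, and the runtime check covers only the required keys and the allowed annotator_kind values.

```python
from typing import Any, Dict, List, Literal, Mapping, TypedDict


class AnnotationResult(TypedDict, total=False):
    # All three fields are optional, mirroring "str?" and "float?" in the schema.
    label: str
    score: float
    explanation: str


class _RequiredAnnotationFields(TypedDict):
    span_id: str  # target span
    name: str  # annotation dimension
    annotator_kind: Literal["HUMAN", "LLM", "CODE"]


class SpanAnnotationData(_RequiredAnnotationFields, total=False):
    result: AnnotationResult
    metadata: Dict[str, Any]
    identifier: str  # optional dedup key


REQUIRED_KEYS = ("span_id", "name", "annotator_kind")
ANNOTATOR_KINDS = {"HUMAN", "LLM", "CODE"}


def schema_errors(annotation: Mapping[str, Any]) -> List[str]:
    """Return schema violations found at runtime; an empty list means valid."""
    errors = [f"missing required field: {key}" for key in REQUIRED_KEYS if key not in annotation]
    kind = annotation.get("annotator_kind")
    if kind is not None and kind not in ANNOTATOR_KINDS:
        errors.append(f"invalid annotator_kind: {kind!r}")
    return errors


example: SpanAnnotationData = {
    "span_id": "a1b2c3d4e5f60718",
    "name": "correctness",
    "annotator_kind": "LLM",
    "result": {"label": "correct", "score": 1.0},
    "identifier": "judge-v2",
}
print(schema_errors(example))  # []
```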
The server processes each annotation using the same upsert semantics as individual annotations: the composite key (span_id, name, identifier) determines whether to insert a new record or update an existing one.
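The upsert rule can be modeled with an in-memory dictionary keyed by (span_id, name, identifier). This sketches the semantics only, not the server: upsert_batch is a hypothetical helper, and treating a missing identifier as None is an assumption of the sketch.

```python
from typing import Any, Dict, List, Optional, Tuple

# Composite key: (span_id, name, identifier); a missing identifier maps to None.
Key = Tuple[str, str, Optional[str]]


def upsert_batch(
    store: Dict[Key, Dict[str, Any]], batch: List[Dict[str, Any]]
) -> Tuple[int, int]:
    """Apply a batch to the store and return (inserted, updated) counts."""
    inserted = updated = 0
    for annotation in batch:
        key = (annotation["span_id"], annotation["name"], annotation.get("identifier"))
        if key in store:
            updated += 1
        else:
            inserted += 1
        store[key] = annotation  # a repeated key overwrites the earlier record
    return inserted, updated


store: Dict[Key, Dict[str, Any]] = {}
first = [
    {"span_id": "s1", "name": "correctness", "annotator_kind": "LLM",
     "result": {"score": 0.4}},
    {"span_id": "s1", "name": "correctness", "annotator_kind": "HUMAN",
     "result": {"score": 1.0}, "identifier": "reviewer-7"},
]
print(upsert_batch(store, first))  # (2, 0): different identifiers, two records
rerun = [
    {"span_id": "s1", "name": "correctness", "annotator_kind": "LLM",
     "result": {"score": 0.9}},
]
print(upsert_batch(store, rerun))  # (0, 1): same key, score replaced
```

Because the two records differ only by identifier, a human review and an LLM score for the same dimension can coexist on one span, while re-running the LLM pass updates its record instead of duplicating it.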
For DataFrame inputs, the mapping from DataFrame columns to SpanAnnotationData fields is:
| DataFrame Column | SpanAnnotationData Field | Notes |
|---|---|---|
| span_id (column or index) | span_id | Required; can come from column or DataFrame index |
| name or annotation_name | name | Required unless global annotation_name is provided |
| annotator_kind | annotator_kind | Required unless global annotator_kind is provided |
| label | result.label | Optional |
| score | result.score | Optional |
| explanation | result.explanation | Optional |
| metadata | metadata | Optional |
| identifier | identifier | Optional |
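The column mapping can be sketched in plain Python over row dictionaries, which keeps the example free of a pandas dependency. With pandas, rows of this shape can come from df.reset_index().to_dict(orient="records") when span_id lives in an index named span_id; missing cells would first need converting from NaN to None. The function name rows_to_annotations, and the rule that a per-row name or annotator_kind takes precedence over the global argument, are assumptions of this sketch rather than documented Phoenix behavior.

```python
from typing import Any, Dict, Iterable, List, Optional

RESULT_COLUMNS = ("label", "score", "explanation")
PASSTHROUGH_COLUMNS = ("metadata", "identifier")


def rows_to_annotations(
    rows: Iterable[Dict[str, Any]],
    annotation_name: Optional[str] = None,
    annotator_kind: Optional[str] = None,
) -> List[Dict[str, Any]]:
    """Map row dicts to SpanAnnotationData-shaped dicts following the column table.

    annotation_name and annotator_kind are global fallbacks; in this sketch a
    per-row value takes precedence (an assumption, not documented behavior).
    """
    annotations = []
    for row in rows:
        name = row.get("name") or row.get("annotation_name") or annotation_name
        kind = row.get("annotator_kind") or annotator_kind
        if not row.get("span_id") or not name or not kind:
            raise ValueError(f"row lacks span_id, name, or annotator_kind: {row}")
        annotation: Dict[str, Any] = {
            "span_id": row["span_id"],
            "name": name,
            "annotator_kind": kind,
            # Keep only the result columns that are present and not None.
            "result": {col: row[col] for col in RESULT_COLUMNS if row.get(col) is not None},
        }
        for col in PASSTHROUGH_COLUMNS:
            if row.get(col) is not None:
                annotation[col] = row[col]
        annotations.append(annotation)
    return annotations


rows = [
    {"span_id": "s1", "label": "correct", "score": 1.0,
     "explanation": "Matches the reference answer."},
    {"span_id": "s2", "label": "incorrect", "score": 0.0},
]
annotations = rows_to_annotations(rows, annotation_name="correctness", annotator_kind="LLM")
print(annotations[1])
# {'span_id': 's2', 'name': 'correctness', 'annotator_kind': 'LLM', 'result': {'label': 'incorrect', 'score': 0.0}}
```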
The chunking strategy (100 rows per request) balances throughput against server memory pressure and request timeout constraints.
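A minimal sketch of the 100-row chunking strategy follows. The send callable stands in for whatever issues one HTTP request per chunk; send, fake_send, chunked, and submit_in_chunks are hypothetical names, and fake_send fabricates placeholder IDs instead of contacting a Phoenix server. In fire-and-forget mode a real sender would have no IDs to report, so it would return an empty list.

```python
from typing import Any, Callable, Dict, Iterator, List, Sequence

CHUNK_SIZE = 100  # rows per request


def chunked(
    items: Sequence[Dict[str, Any]], size: int = CHUNK_SIZE
) -> Iterator[Sequence[Dict[str, Any]]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start : start + size]


def submit_in_chunks(
    annotations: Sequence[Dict[str, Any]],
    send: Callable[[Sequence[Dict[str, Any]]], List[str]],
    size: int = CHUNK_SIZE,
) -> List[str]:
    """Call `send` once per chunk and collect whatever IDs it returns."""
    ids: List[str] = []
    for number, chunk in enumerate(chunked(annotations, size), start=1):
        ids.extend(send(chunk))
        print(f"chunk {number}: sent {len(chunk)} annotations")  # incremental progress
    return ids


def fake_send(chunk: Sequence[Dict[str, Any]]) -> List[str]:
    # Stand-in for one HTTP request; returns one made-up ID per annotation.
    return [f"id-{a['span_id']}" for a in chunk]


batch = [{"span_id": f"s{i}", "name": "correctness", "annotator_kind": "CODE"} for i in range(250)]
ids = submit_in_chunks(batch, fake_send)  # three requests: 100, 100, and 50 rows
```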