Implementation: Arize AI Phoenix RAG Span Helpers
| Knowledge Sources | |
|---|---|
| Domains | AI_Observability, Client_SDK |
| Last Updated | 2026-02-14 05:30 GMT |
Overview
The RAG Span Helpers module provides synchronous and asynchronous functions for extracting retrieved documents and Q&A context from OpenTelemetry spans for Retrieval-Augmented Generation (RAG) evaluation.
Description
This module provides four public functions that query Phoenix span data and produce pandas DataFrames formatted for use with the phoenix.evals evaluation framework. It uses the OpenInference semantic conventions for span attributes (SpanAttributes, DocumentAttributes) to identify retriever spans and extract structured data.
get_retrieved_documents() queries retriever spans (where span_kind == 'RETRIEVER') and returns a DataFrame with each row representing a single retrieved document, including the document content, relevance score, metadata, and the input query. The DataFrame uses a multi-index of (context.span_id, document_position).
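The documented index and column layout can be illustrated with a toy frame. The row values below are invented for illustration; only the index and column names mirror the description above:

```python
import pandas as pd

# Hypothetical rows illustrating the documented schema; real values
# come from Phoenix retriever spans.
docs_df = pd.DataFrame(
    {
        "context.trace_id": ["trace-1", "trace-1"],
        "input": ["What is RAG?", "What is RAG?"],
        "document": ["RAG combines retrieval with generation...", "A second chunk..."],
        "document_score": [0.91, 0.72],
        "document_metadata": [{"source": "a.md"}, {"source": "b.md"}],
    },
    index=pd.MultiIndex.from_tuples(
        [("span-1", 0), ("span-1", 1)],
        names=["context.span_id", "document_position"],
    ),
)

print(list(docs_df.index.names))  # ['context.span_id', 'document_position']
```

Each retrieved document occupies its own row, so a retriever span that returned three documents contributes three rows sharing one `context.span_id`.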
get_input_output_context() combines data from root spans (Q&A pairs) with concatenated retrieved document content to produce a DataFrame suitable for hallucination and Q&A correctness evaluation. It queries root spans (where parent_id is None) for input/output pairs and retriever spans for document context, then joins them by trace_id. Returns None if no spans or retrieval documents are found.
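A minimal pandas sketch of that combining step, using hypothetical data. This illustrates the join-by-trace_id idea only; it is not the library's internal code:

```python
import pandas as pd

# Hypothetical root-span Q&A pairs (spans where parent_id is None).
qa = pd.DataFrame(
    {
        "context.trace_id": ["trace-1"],
        "input": ["What is RAG?"],
        "output": ["RAG augments generation with retrieval."],
    },
    index=pd.Index(["root-span-1"], name="context.span_id"),
)

# Hypothetical retrieved documents from retriever spans in the same trace.
docs = pd.DataFrame(
    {
        "context.trace_id": ["trace-1", "trace-1"],
        "document": ["chunk one", "chunk two"],
    }
)

# Concatenate document content per trace, then join it onto the Q&A rows.
context = (
    docs.groupby("context.trace_id")["document"]
    .apply("\n\n".join)
    .rename("context")
)
qa_with_context = qa.join(context, on="context.trace_id")

print(qa_with_context["context"].iloc[0])
```

Traces with no retriever spans end up with a missing `context` value here; the real helper instead returns None when no retrieval documents are found at all.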
Both functions have async counterparts (async_get_retrieved_documents() and async_get_input_output_context()) that accept an AsyncClient and use await for the span queries.
All functions support optional time-range filtering via start_time and end_time parameters, and project identification via project_name or project_identifier (falling back to the PHOENIX_PROJECT_NAME environment variable).
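The project-resolution order can be sketched as a tiny helper. This resolver is illustrative only (`resolve_project` is not part of the Phoenix client API); it encodes the precedence described above:

```python
import os
from typing import Optional

def resolve_project(
    project_name: Optional[str] = None,
    project_identifier: Optional[str] = None,
) -> Optional[str]:
    """Illustrative resolution order: project_identifier wins,
    then project_name, then the PHOENIX_PROJECT_NAME env var."""
    return (
        project_identifier
        or project_name
        or os.environ.get("PHOENIX_PROJECT_NAME")
    )

os.environ["PHOENIX_PROJECT_NAME"] = "env-project"
print(resolve_project())                                     # env-project
print(resolve_project(project_name="named"))                 # named
print(resolve_project("named", project_identifier="ident"))  # ident
```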
Usage
Use get_retrieved_documents() to extract retrieval data for RAG retrieval evaluation (e.g., relevance scoring). Use get_input_output_context() to assemble the full Q&A-with-context DataFrame needed for hallucination detection and Q&A correctness evaluation with phoenix.evals.
Code Reference
Source Location
- Repository: Arize_ai_Phoenix
- File: packages/phoenix-client/src/phoenix/client/helpers/spans/rag.py
- Lines: 428
Signature
def get_retrieved_documents(
    client: Client,
    *,
    start_time: Optional[datetime] = None,
    end_time: Optional[datetime] = None,
    project_name: Optional[str] = None,
    project_identifier: Optional[str] = None,
    timeout: Optional[int] = 5,
) -> pd.DataFrame: ...

def get_input_output_context(
    client: Client,
    *,
    start_time: Optional[datetime] = None,
    end_time: Optional[datetime] = None,
    project_name: Optional[str] = None,
    project_identifier: Optional[str] = None,
    timeout: Optional[int] = 5,
) -> Optional[pd.DataFrame]: ...

async def async_get_retrieved_documents(
    client: AsyncClient,
    *,
    start_time: Optional[datetime] = None,
    end_time: Optional[datetime] = None,
    project_name: Optional[str] = None,
    project_identifier: Optional[str] = None,
    timeout: Optional[int] = 5,
) -> pd.DataFrame: ...

async def async_get_input_output_context(
    client: AsyncClient,
    *,
    start_time: Optional[datetime] = None,
    end_time: Optional[datetime] = None,
    project_name: Optional[str] = None,
    project_identifier: Optional[str] = None,
    timeout: Optional[int] = 5,
) -> Optional[pd.DataFrame]: ...
Import
from phoenix.client.helpers.spans import (
    get_retrieved_documents,
    get_input_output_context,
    async_get_retrieved_documents,
    async_get_input_output_context,
)

# or directly:
from phoenix.client.helpers.spans.rag import (
    get_retrieved_documents,
    get_input_output_context,
)
I/O Contract
get_retrieved_documents() / async_get_retrieved_documents()
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| client | Client / AsyncClient | Yes | Phoenix client instance |
| start_time | Optional[datetime] | No | Inclusive lower bound for filtering spans by time |
| end_time | Optional[datetime] | No | Exclusive upper bound for filtering spans by time |
| project_name | Optional[str] | No | Project name (alias for project_identifier); falls back to PHOENIX_PROJECT_NAME env var |
| project_identifier | Optional[str] | No | Project identifier (name or ID); takes precedence over project_name |
| timeout | Optional[int] | No | Request timeout in seconds (default: 5) |
Outputs
| Name | Type | Description |
|---|---|---|
| return | pd.DataFrame | DataFrame with multi-index (context.span_id, document_position) and columns: context.trace_id, input, document, document_score, document_metadata |
get_input_output_context() / async_get_input_output_context()
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| client | Client / AsyncClient | Yes | Phoenix client instance |
| start_time | Optional[datetime] | No | Inclusive lower bound for filtering spans by time |
| end_time | Optional[datetime] | No | Exclusive upper bound for filtering spans by time |
| project_name | Optional[str] | No | Project name; falls back to PHOENIX_PROJECT_NAME env var |
| project_identifier | Optional[str] | No | Project identifier (name or ID); takes precedence over project_name |
| timeout | Optional[int] | No | Request timeout in seconds (default: 5) |
Outputs
| Name | Type | Description |
|---|---|---|
| return | Optional[pd.DataFrame] | DataFrame with index context.span_id and columns: context.trace_id, input, output, context, metadata. Returns None if no spans or retrieval documents found |
Usage Examples
from phoenix.client import Client
from phoenix.client.helpers.spans import (
    get_retrieved_documents,
    get_input_output_context,
)
client = Client()
# Extract retrieved documents for RAG retrieval evaluation
docs_df = get_retrieved_documents(client, project_name="my-rag-app")
print(docs_df[["input", "document", "document_score"]].head())
# Extract Q&A with context for hallucination/correctness evaluation
qa_df = get_input_output_context(client, project_name="my-rag-app")
if qa_df is not None:
    # Use with phoenix.evals evaluators
    from phoenix.evals import HallucinationEvaluator, QAEvaluator, run_evals

    # eval_model is a configured LLM wrapper, e.g. phoenix.evals.OpenAIModel(...)
    hallucination, qa_correctness = run_evals(
        evaluators=[
            HallucinationEvaluator(eval_model),
            QAEvaluator(eval_model),
        ],
        dataframe=qa_df,
    )
# With time filtering
from datetime import datetime, timedelta
docs_df = get_retrieved_documents(
    client,
    project_name="my-rag-app",
    start_time=datetime.now() - timedelta(days=1),
)