Implementation: Arize AI Phoenix RAG Span Helpers
| Knowledge Sources | |
|---|---|
| Domains | AI_Observability, Client_SDK |
| Last Updated | 2026-02-14 05:30 GMT |
Overview
The RAG Span Helpers module provides synchronous and asynchronous functions for extracting retrieved documents and Q&A context from OpenTelemetry spans for Retrieval-Augmented Generation (RAG) evaluation.
Description
This module provides four public functions that query Phoenix span data and produce pandas DataFrames formatted for use with the phoenix.evals evaluation framework. It uses the OpenInference semantic conventions for span attributes (SpanAttributes, DocumentAttributes) to identify retriever spans and extract structured data.
get_retrieved_documents() queries retriever spans (where span_kind == 'RETRIEVER') and returns a DataFrame with each row representing a single retrieved document, including the document content, relevance score, metadata, and the input query. The DataFrame uses a multi-index of (context.span_id, document_position).
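The documented index and column layout can be illustrated with a toy frame. The row values below are invented for illustration; only the index and column names mirror the description above:

```python
import pandas as pd

# Hypothetical rows illustrating the documented schema; real values
# come from Phoenix retriever spans.
docs_df = pd.DataFrame(
    {
        "context.trace_id": ["trace-1", "trace-1"],
        "input": ["What is RAG?", "What is RAG?"],
        "document": ["RAG combines retrieval with generation...", "A second chunk..."],
        "document_score": [0.91, 0.72],
        "document_metadata": [{"source": "a.md"}, {"source": "b.md"}],
    },
    index=pd.MultiIndex.from_tuples(
        [("span-1", 0), ("span-1", 1)],
        names=["context.span_id", "document_position"],
    ),
)

print(list(docs_df.index.names))  # ['context.span_id', 'document_position']
```

Each retrieved document occupies its own row, so a retriever span that returned three documents contributes three rows sharing one `context.span_id`.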
get_input_output_context() combines data from root spans (Q&A pairs) with concatenated retrieved document content to produce a DataFrame suitable for hallucination and Q&A correctness evaluation. It queries root spans (where parent_id is None) for input/output pairs and retriever spans for document context, then joins them by trace_id. Returns None if no spans or retrieval documents are found.
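A minimal pandas sketch of that combining step, using hypothetical data. This illustrates the join-by-trace_id idea only; it is not the library's internal code:

```python
import pandas as pd

# Hypothetical root-span Q&A pairs (spans where parent_id is None).
qa = pd.DataFrame(
    {
        "context.trace_id": ["trace-1"],
        "input": ["What is RAG?"],
        "output": ["RAG augments generation with retrieval."],
    },
    index=pd.Index(["root-span-1"], name="context.span_id"),
)

# Hypothetical retrieved documents from retriever spans in the same trace.
docs = pd.DataFrame(
    {
        "context.trace_id": ["trace-1", "trace-1"],
        "document": ["chunk one", "chunk two"],
    }
)

# Concatenate document content per trace, then join it onto the Q&A rows.
context = (
    docs.groupby("context.trace_id")["document"]
    .apply("\n\n".join)
    .rename("context")
)
qa_with_context = qa.join(context, on="context.trace_id")

print(qa_with_context["context"].iloc[0])
```

Traces with no retriever spans end up with a missing `context` value here; the real helper instead returns None when no retrieval documents are found at all.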
Both functions have async counterparts (async_get_retrieved_documents() and async_get_input_output_context()) that accept an AsyncClient and use await for the span queries.
All functions support optional time-range filtering via start_time and end_time parameters, and project identification via project_name or project_identifier (falling back to the PHOENIX_PROJECT_NAME environment variable).
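The project-resolution order can be sketched as a tiny helper. This resolver is illustrative only (`resolve_project` is not part of the Phoenix client API); it encodes the precedence described above:

```python
import os
from typing import Optional

def resolve_project(
    project_name: Optional[str] = None,
    project_identifier: Optional[str] = None,
) -> Optional[str]:
    """Illustrative resolution order: project_identifier wins,
    then project_name, then the PHOENIX_PROJECT_NAME env var."""
    return (
        project_identifier
        or project_name
        or os.environ.get("PHOENIX_PROJECT_NAME")
    )

os.environ["PHOENIX_PROJECT_NAME"] = "env-project"
print(resolve_project())                                     # env-project
print(resolve_project(project_name="named"))                 # named
print(resolve_project("named", project_identifier="ident"))  # ident
```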
Usage
Use get_retrieved_documents() to extract retrieval data for RAG retrieval evaluation (e.g., relevance scoring). Use get_input_output_context() to assemble the full Q&A-with-context DataFrame needed for hallucination detection and Q&A correctness evaluation with phoenix.evals.
Code Reference
Source Location
- Repository: Arize_ai_Phoenix
- File: packages/phoenix-client/src/phoenix/client/helpers/spans/rag.py
- Lines: 428
Signature
def get_retrieved_documents(
    client: Client,
    *,
    start_time: Optional[datetime] = None,
    end_time: Optional[datetime] = None,
    project_name: Optional[str] = None,
    project_identifier: Optional[str] = None,
    timeout: Optional[int] = 5,
) -> pd.DataFrame: ...

def get_input_output_context(
    client: Client,
    *,
    start_time: Optional[datetime] = None,
    end_time: Optional[datetime] = None,
    project_name: Optional[str] = None,
    project_identifier: Optional[str] = None,
    timeout: Optional[int] = 5,
) -> Optional[pd.DataFrame]: ...

async def async_get_retrieved_documents(
    client: AsyncClient,
    *,
    start_time: Optional[datetime] = None,
    end_time: Optional[datetime] = None,
    project_name: Optional[str] = None,
    project_identifier: Optional[str] = None,
    timeout: Optional[int] = 5,
) -> pd.DataFrame: ...

async def async_get_input_output_context(
    client: AsyncClient,
    *,
    start_time: Optional[datetime] = None,
    end_time: Optional[datetime] = None,
    project_name: Optional[str] = None,
    project_identifier: Optional[str] = None,
    timeout: Optional[int] = 5,
) -> Optional[pd.DataFrame]: ...
Import
from phoenix.client.helpers.spans import (
    get_retrieved_documents,
    get_input_output_context,
    async_get_retrieved_documents,
    async_get_input_output_context,
)

# or directly:
from phoenix.client.helpers.spans.rag import (
    get_retrieved_documents,
    get_input_output_context,
)
I/O Contract
get_retrieved_documents() / async_get_retrieved_documents()
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| client | Client / AsyncClient | Yes | Phoenix client instance |
| start_time | Optional[datetime] | No | Inclusive lower bound for filtering spans by time |
| end_time | Optional[datetime] | No | Exclusive upper bound for filtering spans by time |
| project_name | Optional[str] | No | Project name (alias for project_identifier); falls back to PHOENIX_PROJECT_NAME env var |
| project_identifier | Optional[str] | No | Project identifier (name or ID); takes precedence over project_name |
| timeout | Optional[int] | No | Request timeout in seconds (default: 5) |
Outputs
| Name | Type | Description |
|---|---|---|
| return | pd.DataFrame | DataFrame with multi-index (context.span_id, document_position) and columns: context.trace_id, input, document, document_score, document_metadata |
get_input_output_context() / async_get_input_output_context()
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| client | Client / AsyncClient | Yes | Phoenix client instance |
| start_time | Optional[datetime] | No | Inclusive lower bound for filtering spans by time |
| end_time | Optional[datetime] | No | Exclusive upper bound for filtering spans by time |
| project_name | Optional[str] | No | Project name; falls back to PHOENIX_PROJECT_NAME env var |
| project_identifier | Optional[str] | No | Project identifier (name or ID); takes precedence over project_name |
| timeout | Optional[int] | No | Request timeout in seconds (default: 5) |
Outputs
| Name | Type | Description |
|---|---|---|
| return | Optional[pd.DataFrame] | DataFrame with index context.span_id and columns: context.trace_id, input, output, context, metadata. Returns None if no spans or retrieval documents found |
Usage Examples
from phoenix.client import Client
from phoenix.client.helpers.spans import (
    get_retrieved_documents,
    get_input_output_context,
)
client = Client()
# Extract retrieved documents for RAG retrieval evaluation
docs_df = get_retrieved_documents(client, project_name="my-rag-app")
print(docs_df[["input", "document", "document_score"]].head())
# Extract Q&A with context for hallucination/correctness evaluation
qa_df = get_input_output_context(client, project_name="my-rag-app")
if qa_df is not None:
    # Use with phoenix.evals evaluators
    from phoenix.evals import HallucinationEvaluator, QAEvaluator, run_evals

    # eval_model is a configured LLM wrapper, e.g. phoenix.evals.OpenAIModel(...)
    hallucination, qa_correctness = run_evals(
        evaluators=[
            HallucinationEvaluator(eval_model),
            QAEvaluator(eval_model),
        ],
        dataframe=qa_df,
    )
# With time filtering
from datetime import datetime, timedelta
docs_df = get_retrieved_documents(
    client,
    project_name="my-rag-app",
    start_time=datetime.now() - timedelta(days=1),
)