
Implementation:Arize AI Phoenix RAG Span Helpers

From Leeroopedia
Knowledge Sources
Domains AI_Observability, Client_SDK
Last Updated 2026-02-14 05:30 GMT

Overview

The RAG Span Helpers module provides synchronous and asynchronous functions for extracting retrieved documents and Q&A context from OpenTelemetry spans for Retrieval-Augmented Generation (RAG) evaluation.

Description

This module provides four public functions that query Phoenix span data and produce pandas DataFrames formatted for use with the phoenix.evals evaluation framework. It uses the OpenInference semantic conventions for span attributes (SpanAttributes, DocumentAttributes) to identify retriever spans and extract structured data.

get_retrieved_documents() queries retriever spans (where span_kind == 'RETRIEVER') and returns a DataFrame with each row representing a single retrieved document, including the document content, relevance score, metadata, and the input query. The DataFrame uses a multi-index of (context.span_id, document_position).

get_input_output_context() combines data from root spans (Q&A pairs) with concatenated retrieved document content to produce a DataFrame suitable for hallucination and Q&A correctness evaluation. It queries root spans (where parent_id is None) for input/output pairs and retriever spans for document context, then joins them by trace_id. Returns None if no spans or retrieval documents are found.
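The join described above can be sketched in plain pandas with toy data (the real helper pulls both frames from Phoenix span queries; the column and variable names below are illustrative):

```python
import pandas as pd

# Toy stand-ins for the two span queries: root spans carry the Q&A pair,
# retriever spans carry the documents, and both share a trace_id.
roots = pd.DataFrame(
    {
        "trace_id": ["t1"],
        "input": ["What is RAG?"],
        "output": ["Retrieval-augmented generation."],
    }
)
docs = pd.DataFrame({"trace_id": ["t1", "t1"], "document": ["doc A", "doc B"]})

# Concatenate each trace's retrieved documents into one context string,
# then join onto the root spans by trace_id.
context = docs.groupby("trace_id")["document"].agg("\n\n".join).rename("context")
qa_with_context = roots.join(context, on="trace_id")
print(qa_with_context.loc[0, "context"])  # doc A\n\ndoc B
```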

Both functions have async counterparts (async_get_retrieved_documents() and async_get_input_output_context()) that accept an AsyncClient and use await for the span queries.
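Because the async variants are ordinary coroutines, the two independent queries can be issued concurrently. The pattern looks like this; the stand-in coroutines below simulate the helpers, since the real ones require an AsyncClient and a running Phoenix server:

```python
import asyncio

# Stand-ins for async_get_retrieved_documents / async_get_input_output_context;
# the real helpers take an AsyncClient and await Phoenix span queries.
async def fake_get_retrieved_documents():
    await asyncio.sleep(0)
    return "docs_df"

async def fake_get_input_output_context():
    await asyncio.sleep(0)
    return "qa_df"

async def main():
    # The two queries are independent, so gather them concurrently.
    return await asyncio.gather(
        fake_get_retrieved_documents(),
        fake_get_input_output_context(),
    )

docs_df, qa_df = asyncio.run(main())
```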

All functions support optional time-range filtering via start_time and end_time parameters, and project identification via project_name or project_identifier (falling back to the PHOENIX_PROJECT_NAME environment variable).
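The documented precedence can be sketched as follows; `resolve_project` is a hypothetical illustration of the fallback order, not a real API, and returning None when nothing is set is an assumption:

```python
import os

def resolve_project(project_identifier=None, project_name=None):
    # Documented precedence: an explicit project_identifier wins over
    # project_name, which in turn falls back to the PHOENIX_PROJECT_NAME
    # environment variable. (Hypothetical helper, for illustration only.)
    return (
        project_identifier
        or project_name
        or os.environ.get("PHOENIX_PROJECT_NAME")
    )

os.environ["PHOENIX_PROJECT_NAME"] = "env-project"
print(resolve_project())                       # env-project
print(resolve_project(project_name="my-app"))  # my-app
```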

Usage

Use get_retrieved_documents() to extract retrieval data for RAG retrieval evaluation (e.g., relevance scoring). Use get_input_output_context() to assemble the full Q&A-with-context DataFrame needed for hallucination detection and Q&A correctness evaluation with phoenix.evals.

Code Reference

Source Location

Signature

def get_retrieved_documents(
    client: Client,
    *,
    start_time: Optional[datetime] = None,
    end_time: Optional[datetime] = None,
    project_name: Optional[str] = None,
    project_identifier: Optional[str] = None,
    timeout: Optional[int] = 5,
) -> pd.DataFrame: ...

def get_input_output_context(
    client: Client,
    *,
    start_time: Optional[datetime] = None,
    end_time: Optional[datetime] = None,
    project_name: Optional[str] = None,
    project_identifier: Optional[str] = None,
    timeout: Optional[int] = 5,
) -> Optional[pd.DataFrame]: ...

async def async_get_retrieved_documents(
    client: AsyncClient,
    *,
    start_time: Optional[datetime] = None,
    end_time: Optional[datetime] = None,
    project_name: Optional[str] = None,
    project_identifier: Optional[str] = None,
    timeout: Optional[int] = 5,
) -> pd.DataFrame: ...

async def async_get_input_output_context(
    client: AsyncClient,
    *,
    start_time: Optional[datetime] = None,
    end_time: Optional[datetime] = None,
    project_name: Optional[str] = None,
    project_identifier: Optional[str] = None,
    timeout: Optional[int] = 5,
) -> Optional[pd.DataFrame]: ...

Import

from phoenix.client.helpers.spans import (
    get_retrieved_documents,
    get_input_output_context,
    async_get_retrieved_documents,
    async_get_input_output_context,
)
# or directly:
from phoenix.client.helpers.spans.rag import (
    get_retrieved_documents,
    get_input_output_context,
)

I/O Contract

get_retrieved_documents() / async_get_retrieved_documents()

Inputs

Name Type Required Description
client Client / AsyncClient Yes Phoenix client instance
start_time Optional[datetime] No Inclusive lower bound for filtering spans by time
end_time Optional[datetime] No Exclusive upper bound for filtering spans by time
project_name Optional[str] No Project name (alias for project_identifier); falls back to PHOENIX_PROJECT_NAME env var
project_identifier Optional[str] No Project identifier (name or ID); takes precedence over project_name
timeout Optional[int] No Request timeout in seconds (default: 5)

Outputs

Name Type Description
return pd.DataFrame DataFrame with multi-index (context.span_id, document_position) and columns: context.trace_id, input, document, document_score, document_metadata
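A toy frame matching the documented shape looks like this (all values are illustrative, not real span data):

```python
import pandas as pd

# Illustrative frame with the documented schema: one row per retrieved
# document, indexed by (context.span_id, document_position).
docs_df = pd.DataFrame(
    {
        "context.trace_id": ["t1", "t1"],
        "input": ["What is RAG?", "What is RAG?"],
        "document": ["doc A", "doc B"],
        "document_score": [0.92, 0.75],
        "document_metadata": [{}, {}],
    },
    index=pd.MultiIndex.from_tuples(
        [("span1", 0), ("span1", 1)],
        names=["context.span_id", "document_position"],
    ),
)
print(list(docs_df.index.names))  # ['context.span_id', 'document_position']
```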

get_input_output_context() / async_get_input_output_context()

Inputs

Name Type Required Description
client Client / AsyncClient Yes Phoenix client instance
start_time Optional[datetime] No Inclusive lower bound for filtering spans by time
end_time Optional[datetime] No Exclusive upper bound for filtering spans by time
project_name Optional[str] No Project name; falls back to PHOENIX_PROJECT_NAME env var
project_identifier Optional[str] No Project identifier (name or ID); takes precedence over project_name
timeout Optional[int] No Request timeout in seconds (default: 5)

Outputs

Name Type Description
return Optional[pd.DataFrame] DataFrame with index context.span_id and columns: context.trace_id, input, output, context, metadata. Returns None if no spans or retrieval documents found

Usage Examples

from phoenix.client import Client
from phoenix.client.helpers.spans import (
    get_retrieved_documents,
    get_input_output_context,
)

client = Client()

# Extract retrieved documents for RAG retrieval evaluation
docs_df = get_retrieved_documents(client, project_name="my-rag-app")
print(docs_df[["input", "document", "document_score"]].head())

# Extract Q&A with context for hallucination/correctness evaluation
qa_df = get_input_output_context(client, project_name="my-rag-app")
if qa_df is not None:
    # Use with phoenix.evals evaluators
    from phoenix.evals import HallucinationEvaluator, OpenAIModel, QAEvaluator, run_evals

    eval_model = OpenAIModel(model="gpt-4o")  # example; any supported eval model works

    hallucination, qa_correctness = run_evals(
        evaluators=[
            HallucinationEvaluator(eval_model),
            QAEvaluator(eval_model),
        ],
        dataframe=qa_df,
    )

# With time filtering
from datetime import datetime, timedelta

docs_df = get_retrieved_documents(
    client,
    project_name="my-rag-app",
    start_time=datetime.now() - timedelta(days=1),
)
