Implementation:Evidentlyai Evidently Legacy OpenAI Feature
| Knowledge Sources | |
|---|---|
| Domains | ML Monitoring, LLM Evaluation, OpenAI Integration |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
Provides a generated feature that uses OpenAI models to evaluate text data via customizable prompts, supporting both legacy completion and chat completion APIs with configurable response post-processing.
Description
The OpenAIFeature class extends both FeatureTypeFieldMixin and GeneratedFeature to produce features by sending text data to OpenAI models with custom prompts. It supports two API modes based on the model:
- Legacy completions API (for models: gpt-3.5-turbo-instruct, babbage-002, davinci-002): Uses client.completions.create with a single formatted prompt string.
- Chat completions API (all other models): Uses client.chat.completions.create with a system message (context) and a user message (formatted prompt).
Key features of the class:
- Prompt templating: The prompt string contains a prompt_replace_string (default: "REPLACE") that is substituted with the actual text value, and a context_replace_string (default: "CONTEXT") that is substituted with optional context.
- Context support: Context can be provided as a static string (context) or from a DataFrame column (context_column). These are mutually exclusive.
- Response post-processing: The _postprocess_response function processes the LLM's response based on check_mode and possible_values. It supports modes like "any_line" (checks each line) and "contains" (checks if a possible value is contained in the line). If possible_values is set, the response is matched against these values.
- Feature type handling: For categorical features, the post-processed string is returned directly. For numerical features, the post-processed response is cast to a float, with None returned on failure.
- Unique feature IDs: Each instance generates a unique feature_id via new_id() to ensure column name uniqueness.
Usage
Use this feature when you need to evaluate or classify text data using OpenAI models within Evidently monitoring pipelines. This is a legacy feature class; for newer implementations, consider using LLMJudge which provides a more structured template-based approach.
Code Reference
Source Location
- Repository: Evidentlyai_Evidently
- File: src/evidently/legacy/features/openai_feature.py
Signature
class OpenAIFeature(FeatureTypeFieldMixin, GeneratedFeature):
class Config:
type_alias = "evidently:feature:OpenAIFeature"
column_name: str
feature_id: str
prompt: str
prompt_replace_string: str
context: Optional[str]
context_column: Optional[str]
context_replace_string: str
openai_params: dict
model: str
check_mode: str
possible_values: Optional[List[str]]
def __init__(
self,
column_name: str,
model: str,
prompt: str,
feature_type: str,
context: Optional[str] = None,
context_column: Optional[str] = None,
prompt_replace_string: str = "REPLACE",
context_replace_string: str = "CONTEXT",
check_mode: str = "any_line",
possible_values: Optional[List[str]] = None,
openai_params: Optional[dict] = None,
display_name: Optional[str] = None,
): ...
def generate_feature(self, data: pd.DataFrame, data_definition: DataDefinition) -> pd.DataFrame: ...
def _as_column(self) -> ColumnName: ...
def _feature_column_name(self) -> str: ...
Import
from evidently.legacy.features.openai_feature import OpenAIFeature
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| column_name | str | Yes | Name of the text column in the DataFrame to evaluate |
| model | str | Yes | OpenAI model name (e.g., "gpt-4", "gpt-3.5-turbo") |
| prompt | str | Yes | Prompt template string with a placeholder for the text value |
| feature_type | str | Yes | Output feature type: "cat" for categorical, anything else for numerical |
| context | Optional[str] | No | Static context string (mutually exclusive with context_column) |
| context_column | Optional[str] | No | Name of a DataFrame column to use as per-row context (mutually exclusive with context) |
| prompt_replace_string | str | No | Placeholder in the prompt to replace with the text value (default: "REPLACE") |
| context_replace_string | str | No | Placeholder in the prompt to replace with context (default: "CONTEXT") |
| check_mode | str | No | Response parsing mode: "any_line", "any_line_contains", "first_line", "first_line_contains" (default: "any_line") |
| possible_values | Optional[List[str]] | No | List of valid response values to match against (case-insensitive) |
| openai_params | Optional[dict] | No | Additional parameters to pass to the OpenAI API call |
| display_name | Optional[str] | No | Custom display name for the feature |
Outputs
| Name | Type | Description |
|---|---|---|
| return | pd.DataFrame | A single-column DataFrame with string values (categorical) or float values (numerical), or None for unparseable responses |
Usage Examples
from evidently.legacy.features.openai_feature import OpenAIFeature
# Categorical classification with possible values
sentiment_feature = OpenAIFeature(
column_name="review",
model="gpt-4",
prompt="Classify the sentiment of the following text as positive, negative, or neutral: REPLACE",
feature_type="cat",
possible_values=["positive", "negative", "neutral"],
context="You are a sentiment analysis expert.",
display_name="Sentiment"
)
# Numerical scoring
quality_feature = OpenAIFeature(
column_name="response",
model="gpt-4",
prompt="Rate the quality of the following response on a scale of 1-10: REPLACE",
feature_type="num",
display_name="Quality Score"
)
# With context column
relevance_feature = OpenAIFeature(
column_name="answer",
model="gpt-4",
prompt="Given the context CONTEXT, is the following answer relevant? REPLACE",
feature_type="cat",
context_column="question",
possible_values=["yes", "no"],
)