Overview
Concrete type for evaluation creation parameters provided by the openai-python SDK.
Description
The EvalCreateParams TypedDict defines the parameters for creating a new evaluation via client.evals.create(). It requires a data_source_config (a Union of DataSourceConfigCustom, DataSourceConfigLogs, or DataSourceConfigStoredCompletions) that dictates the schema of evaluation data, and testing_criteria (an iterable of TestingCriterion -- a Union of LabelModel, StringCheckGrader, TextSimilarity, Python, and ScoreModel grader types). Optional fields include metadata and name. The module also defines numerous nested TypedDicts for grader input messages, eval item content types, and criterion-specific configurations.
Usage
Import this type when constructing or type-hinting parameters for client.evals.create(). The complex nested types allow you to define custom data schemas, log-based or stored-completions data sources, and multiple grading criteria.
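As a minimal sketch of the type-hinting use described above, the snippet below annotates a builder function with EvalCreateParams. The import sits under TYPE_CHECKING so the code also runs where the SDK is not installed. `build_params` and the example field values are hypothetical, chosen only for illustration.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by static type checkers; needs the openai package there.
    from openai.types import EvalCreateParams


def build_params(name: str) -> "EvalCreateParams":
    """Hypothetical helper: returns a dict shaped like EvalCreateParams."""
    return {
        "name": name,
        "data_source_config": {
            "type": "custom",
            "item_schema": {
                "type": "object",
                "properties": {"input": {"type": "string"}},
                "required": ["input"],
            },
        },
        "testing_criteria": [
            {
                "type": "string_check",
                "name": "input echo",
                "input": "{{item.input}}",
                "operation": "eq",
                "reference": "{{item.input}}",
            }
        ],
    }


params = build_params("Smoke test")
# A TypedDict is a plain dict at runtime, so it can be unpacked directly:
# client.evals.create(**params)
```

Because the annotation is a string, nothing from the SDK is imported at runtime; a type checker such as mypy or pyright still validates the returned dict against the TypedDict.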
Code Reference
Source Location
Signature
class EvalCreateParams(TypedDict, total=False):
    data_source_config: Required[DataSourceConfig]
    testing_criteria: Required[Iterable[TestingCriterion]]
    metadata: Optional[Metadata]
    name: str

DataSourceConfig = Union[DataSourceConfigCustom, DataSourceConfigLogs, DataSourceConfigStoredCompletions]

class DataSourceConfigCustom(TypedDict, total=False):
    item_schema: Required[Dict[str, object]]
    type: Required[Literal["custom"]]
    include_sample_schema: bool

class DataSourceConfigLogs(TypedDict, total=False):
    type: Required[Literal["logs"]]
    metadata: Dict[str, object]

class DataSourceConfigStoredCompletions(TypedDict, total=False):
    type: Required[Literal["stored_completions"]]
    metadata: Dict[str, object]

TestingCriterion = Union[
    TestingCriterionLabelModel,
    StringCheckGraderParam,
    TestingCriterionTextSimilarity,
    TestingCriterionPython,
    TestingCriterionScoreModel,
]
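The three DataSourceConfig variants in the signature are distinguished by their `type` field. The following sketch shows one dict per variant and a small local check on that discriminator. `config_kind` is a hypothetical helper, and the metadata keys and values are made up for illustration.

```python
# One example dict per DataSourceConfig variant from the signature above.
custom_config = {
    "type": "custom",
    "item_schema": {
        "type": "object",
        "properties": {"input": {"type": "string"}},
        "required": ["input"],
    },
}
logs_config = {"type": "logs", "metadata": {"usecase": "chatbot"}}
stored_config = {"type": "stored_completions", "metadata": {"usecase": "chatbot"}}

KNOWN_TYPES = {"custom", "logs", "stored_completions"}


def config_kind(config: dict) -> str:
    """Hypothetical local check: return the variant name or raise ValueError."""
    kind = config.get("type")
    if kind not in KNOWN_TYPES:
        raise ValueError(f"unknown data source type: {kind!r}")
    # item_schema is the only other Required key, and only on the custom variant.
    if kind == "custom" and "item_schema" not in config:
        raise ValueError("custom data sources require item_schema")
    return kind


kinds = [config_kind(c) for c in (custom_config, logs_config, stored_config)]
```

This mirrors only the Required markers visible in the signature; it does not replace the validation the API performs.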
Import
from openai.types import EvalCreateParams
I/O Contract
Fields
| Name | Type | Required | Description |
|---|---|---|---|
| data_source_config | DataSourceConfig | Yes | Data source configuration (custom, logs, or stored_completions) |
| testing_criteria | Iterable[TestingCriterion] | Yes | List of graders; grader templates can reference {{item.variable}} and {{sample.output_text}} |
| metadata | Optional[Metadata] | No | Up to 16 key-value pairs for additional information |
| name | str | No | Name of the evaluation |
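The 16-pair limit on metadata can be checked locally before a request is sent. This is a sketch only: `check_metadata` is a hypothetical helper, and it enforces nothing beyond the pair count stated in the table.

```python
from typing import Mapping, Optional

MAX_METADATA_PAIRS = 16  # limit stated in the Fields table


def check_metadata(metadata: Optional[Mapping[str, str]]) -> None:
    """Hypothetical pre-flight check: raise if metadata has too many pairs."""
    if metadata is None:
        return  # metadata is optional and may be None
    if len(metadata) > MAX_METADATA_PAIRS:
        raise ValueError(
            f"metadata has {len(metadata)} pairs; at most {MAX_METADATA_PAIRS} are allowed"
        )


check_metadata({"team": "qa", "suite": "regression"})  # passes silently
```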
DataSourceConfigCustom Fields
| Name | Type | Required | Description |
|---|---|---|---|
| item_schema | Dict[str, object] | Yes | JSON schema for each row in the data source |
| type | Literal["custom"] | Yes | Always "custom" |
| include_sample_schema | bool | No | Whether to populate the sample namespace |
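Since item_schema defines which `item.*` variables exist, a grader template can be checked against it before the evaluation is created. The sketch below is a local convenience, not a description of how the API validates templates; `unknown_item_refs` and the regular expression are assumptions introduced here.

```python
import re

item_schema = {
    "type": "object",
    "properties": {
        "input": {"type": "string"},
        "expected": {"type": "string"},
    },
    "required": ["input", "expected"],
}

# Matches the variable name in references such as {{item.input}} or {{ item.input }}.
_ITEM_REF = re.compile(r"\{\{\s*item\.(\w+)")


def unknown_item_refs(template: str, schema: dict) -> list:
    """Hypothetical helper: list item.* names used in a template but not declared."""
    declared = set(schema.get("properties", {}))
    return [name for name in _ITEM_REF.findall(template) if name not in declared]


ok = unknown_item_refs("Compare {{item.input}} with {{ item.expected }}", item_schema)
bad = unknown_item_refs("Grade {{item.answer}}", item_schema)
```

Here `ok` is an empty list and `bad` contains the undeclared name `answer`. References to `sample.*` are ignored by this check because they depend on include_sample_schema rather than on item_schema.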
TestingCriterion Variants
| Variant | Description |
|---|---|
| TestingCriterionLabelModel | Model-based label grader with input messages, labels, and passing_labels |
| StringCheckGraderParam | String matching grader |
| TestingCriterionTextSimilarity | Text similarity grader with pass_threshold |
| TestingCriterionPython | Python script grader with optional pass_threshold |
| TestingCriterionScoreModel | Model scoring grader with optional pass_threshold |
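To make the variants in the table more concrete, here is a sketch of three non-model graders written as plain dicts. The field names (`operation`, `reference`, `evaluation_metric`, `source`) and the `grade(sample, item)` entry point reflect my understanding of the SDK's grader types and should be verified against the installed version. The grader names and threshold values are illustrative only.

```python
# String matching grader: compares the model output with an expected value.
string_check = {
    "type": "string_check",
    "name": "exact match",
    "input": "{{sample.output_text}}",
    "operation": "eq",
    "reference": "{{item.expected}}",
}

# Text similarity grader: passes when the score reaches pass_threshold.
text_similarity = {
    "type": "text_similarity",
    "name": "fuzzy similarity",
    "input": "{{sample.output_text}}",
    "reference": "{{item.expected}}",
    "evaluation_metric": "fuzzy_match",
    "pass_threshold": 0.8,
}

# Python script grader: the source is assumed to define grade(sample, item) -> float.
python_grader = {
    "type": "python",
    "name": "non-empty output",
    "source": (
        "def grade(sample, item):\n"
        "    return 1.0 if sample.get('output_text') else 0.0\n"
    ),
    "pass_threshold": 1.0,
}

# Any mix of variants can be passed together as testing_criteria.
testing_criteria = [string_check, text_similarity, python_grader]
```

Each dict is selected by its `type` value, so several graders of different kinds can score the same evaluation run.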
Usage Examples
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

eval_obj = client.evals.create(
    name="Chatbot Quality Check",
    data_source_config={
        "type": "custom",
        # JSON schema describing each row of evaluation data
        "item_schema": {
            "type": "object",
            "properties": {
                "input": {"type": "string"},
                "expected": {"type": "string"},
            },
            "required": ["input", "expected"],
        },
        # Populate the sample namespace so graders can use {{sample.output_text}}
        "include_sample_schema": True,
    },
    testing_criteria=[
        {
            "type": "label_model",
            "name": "correctness",
            "model": "gpt-4o",
            "input": [
                {"role": "user", "content": "Is '{{sample.output_text}}' correct for '{{item.input}}'?"}
            ],
            "labels": ["correct", "incorrect"],
            "passing_labels": ["correct"],
        }
    ],
)
print(eval_obj.id)
Related Pages