Implementation:Openai Openai python Eval Create Params

From Leeroopedia
Knowledge Sources
Domains API_Types, Python
Last Updated 2026-02-15 00:00 GMT

Overview

Concrete type for evaluation creation parameters provided by the openai-python SDK.

Description

The EvalCreateParams TypedDict defines the parameters accepted by client.evals.create(). Two fields are required. data_source_config is a Union of DataSourceConfigCustom, DataSourceConfigLogs, and DataSourceConfigStoredCompletions, and it determines the schema of the evaluation data. testing_criteria is an iterable of TestingCriterion, which is itself a Union of the LabelModel, StringCheckGrader, TextSimilarity, Python, and ScoreModel grader types. The optional fields are metadata and name. The module also defines the nested TypedDicts used for grader input messages, eval item content types, and criterion-specific configuration.

Usage

Import this type when constructing or type-hinting parameters for client.evals.create(). The complex nested types allow you to define custom data schemas, log-based or stored-completions data sources, and multiple grading criteria.
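As a minimal sketch of the type-hinting use described above, the helper below returns a dict annotated as EvalCreateParams. The name build_eval_params is hypothetical and not part of the SDK. The import sits under TYPE_CHECKING so the snippet also runs where the SDK is not installed, and the field values reuse the shapes from the usage example on this page.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Needed only by static type checkers; skipped at runtime.
    from openai.types import EvalCreateParams


def build_eval_params(name: str) -> "EvalCreateParams":
    """Assemble keyword arguments for client.evals.create() (hypothetical helper)."""
    return {
        "name": name,
        "data_source_config": {
            "type": "custom",
            "item_schema": {
                "type": "object",
                "properties": {"input": {"type": "string"}},
                "required": ["input"],
            },
            # Assumed necessary because the grader below reads {{sample.output_text}}.
            "include_sample_schema": True,
        },
        "testing_criteria": [
            {
                "type": "label_model",
                "name": "correctness",
                "model": "gpt-4o",
                "input": [
                    {
                        "role": "user",
                        "content": "Is '{{sample.output_text}}' correct for '{{item.input}}'?",
                    }
                ],
                "labels": ["correct", "incorrect"],
                "passing_labels": ["correct"],
            }
        ],
    }


params = build_eval_params("smoke-test")
# With a configured client, the dict can be unpacked: client.evals.create(**params)
```

Keeping the parameters in a typed dict lets a static checker flag a missing required key before any request is sent.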

Code Reference

Source Location

Signature

class EvalCreateParams(TypedDict, total=False):
    data_source_config: Required[DataSourceConfig]
    testing_criteria: Required[Iterable[TestingCriterion]]
    metadata: Optional[Metadata]
    name: str

DataSourceConfig = Union[DataSourceConfigCustom, DataSourceConfigLogs, DataSourceConfigStoredCompletions]

class DataSourceConfigCustom(TypedDict, total=False):
    item_schema: Required[Dict[str, object]]
    type: Required[Literal["custom"]]
    include_sample_schema: bool

class DataSourceConfigLogs(TypedDict, total=False):
    type: Required[Literal["logs"]]
    metadata: Dict[str, object]

class DataSourceConfigStoredCompletions(TypedDict, total=False):
    type: Required[Literal["stored_completions"]]
    metadata: Dict[str, object]

TestingCriterion = Union[
    TestingCriterionLabelModel,
    StringCheckGraderParam,
    TestingCriterionTextSimilarity,
    TestingCriterionPython,
    TestingCriterionScoreModel,
]
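The Required[...] markers in the signature above can be turned into a small runtime check. The sketch below is illustrative only: REQUIRED_KEYS and missing_required_keys are hypothetical names, the table of required keys is transcribed by hand from the signature, and the metadata value is made up for the example.

```python
from typing import Dict, List

# Required keys per data source variant, copied from the Required[...] fields above.
REQUIRED_KEYS: Dict[str, List[str]] = {
    "custom": ["type", "item_schema"],
    "logs": ["type"],
    "stored_completions": ["type"],
}


def missing_required_keys(config: Dict[str, object]) -> List[str]:
    """Return the required keys absent from a data_source_config dict."""
    variant = config.get("type")
    if variant not in REQUIRED_KEYS:
        raise ValueError(f"unknown data source type: {variant!r}")
    return [key for key in REQUIRED_KEYS[variant] if key not in config]


# Example value only; the metadata key is not taken from the SDK.
logs_config = {"type": "logs", "metadata": {"usecase": "chatbot"}}
incomplete_custom = {"type": "custom"}

print(missing_required_keys(logs_config))        # []
print(missing_required_keys(incomplete_custom))  # ['item_schema']
```

Because the TypedDicts are declared with total=False, only the fields wrapped in Required[...] are mandatory; a check like this mirrors what a static type checker would report.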

Import

from openai.types import EvalCreateParams

I/O Contract

Fields

Name | Type | Required | Description
data_source_config | DataSourceConfig | Yes | Data source configuration (custom, logs, or stored_completions)
testing_criteria | Iterable[TestingCriterion] | Yes | List of graders; can reference {{item.variable}} and {{sample.output_text}}
metadata | Optional[Metadata] | No | Up to 16 key-value pairs for additional information
name | str | No | Name of the evaluation

DataSourceConfigCustom Fields

Name | Type | Required | Description
item_schema | Dict[str, object] | Yes | JSON schema for each row in the data source
type | Literal["custom"] | Yes | Always "custom"
include_sample_schema | bool | No | Whether to populate the sample namespace
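Since item_schema describes each row and graders refer to row fields through {{item.<field>}} placeholders, a template can be cross-checked against the schema before an eval is created. The following is a rough sketch: undeclared_item_fields is a hypothetical helper, and the regular expression is an assumption about the placeholder syntax rather than the service's actual template grammar.

```python
import re
from typing import Any, Dict, List

# Matches {{item.<field>}} placeholders (assumed syntax, top-level fields only).
ITEM_REF = re.compile(r"\{\{\s*item\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def undeclared_item_fields(template: str, item_schema: Dict[str, Any]) -> List[str]:
    """Return item fields used in the template but missing from the schema's properties."""
    properties = item_schema.get("properties") or {}
    return [name for name in ITEM_REF.findall(template) if name not in properties]


item_schema = {
    "type": "object",
    "properties": {
        "input": {"type": "string"},
        "expected": {"type": "string"},
    },
    "required": ["input", "expected"],
}

template = "Is '{{sample.output_text}}' correct for '{{item.input}}' given '{{item.answer}}'?"
print(undeclared_item_fields(template, item_schema))  # ['answer']
```

References into the sample namespace are ignored here; they depend on include_sample_schema rather than on item_schema.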

TestingCriterion Variants

Variant | Description
TestingCriterionLabelModel | Model-based label grader with input messages, labels, and passing_labels
StringCheckGraderParam | String matching grader
TestingCriterionTextSimilarity | Text similarity grader with pass_threshold
TestingCriterionPython | Python script grader with optional pass_threshold
TestingCriterionScoreModel | Model scoring grader with optional pass_threshold
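For the label-model variant, one sanity check worth running locally is that every entry in passing_labels also appears in labels. The sketch below assumes this relationship is intended; whether the SDK or the API enforces it is not stated on this page, and unknown_passing_labels is a hypothetical helper.

```python
from typing import Any, Dict, List


def unknown_passing_labels(criterion: Dict[str, Any]) -> List[str]:
    """Return passing_labels entries that are not declared in labels."""
    labels = set(criterion.get("labels", []))
    return [label for label in criterion.get("passing_labels", []) if label not in labels]


good = {
    "type": "label_model",
    "name": "correctness",
    "labels": ["correct", "incorrect"],
    "passing_labels": ["correct"],
}
# Same criterion with a capitalization slip in passing_labels.
typo = dict(good, passing_labels=["Correct"])

print(unknown_passing_labels(good))  # []
print(unknown_passing_labels(typo))  # ['Correct']
```

The comparison is case-sensitive on purpose, so a mismatch in capitalization surfaces instead of being silently accepted.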

Usage Examples

from openai import OpenAI

client = OpenAI()

eval_obj = client.evals.create(
    name="Chatbot Quality Check",
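    data_source_config={
        "type": "custom",
        # JSON schema describing one row of evaluation data.
        "item_schema": {
            "type": "object",
            "properties": {
                "input": {"type": "string"},
                "expected": {"type": "string"},
            },
            "required": ["input", "expected"],
        },
        # Populate the sample namespace so the grader can read {{sample.output_text}}.
        "include_sample_schema": True,
    },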
    testing_criteria=[
        {
            "type": "label_model",
            "name": "correctness",
            "model": "gpt-4o",
            "input": [
                {"role": "user", "content": "Is '{{sample.output_text}}' correct for '{{item.input}}'?"}
            ],
            "labels": ["correct", "incorrect"],
            "passing_labels": ["correct"],
        }
    ],
)
print(eval_obj.id)
