Implementation:Evidentlyai Evidently Custom Descriptors

From Leeroopedia
Knowledge Sources
Domains: Descriptors, Data Processing, Extensibility
Last Updated: 2026-02-14 12:00 GMT

Overview

Descriptor classes that allow users to apply custom Python functions to individual columns or entire datasets for computing derived features.

Description

This module provides two custom descriptor classes that enable user-defined transformations:

CustomColumnDescriptor applies a custom callable to a single named column. The constructor accepts either a string (a fully qualified function name) or a callable. When a callable is provided, it is cached internally and the string representation is derived from the callable's __module__ and __name__ attributes, giving a dotted path of the form module.func_name. The generate_data method retrieves the column from the dataset and applies the stored callable, returning a DatasetColumn.
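The naming rule can be sketched in plain Python. This is a minimal illustration, not evidently's code: the helper name resolve_func_name is hypothetical, and the dotted format is assumed from the alias example (module.func_name) given on this page.

```python
from typing import Callable, List, Union


def resolve_func_name(func: Union[str, Callable]) -> str:
    """Return a dotted path for a callable, or a string argument unchanged.

    Hypothetical helper: it mirrors the naming rule described above,
    not evidently's actual implementation.
    """
    if callable(func):
        # Functions expose the defining module and their own name as attributes.
        return f"{func.__module__}.{func.__name__}"
    # A string is assumed to already be a fully qualified function name.
    return func


def uppercase_transform(values: List[str]) -> List[str]:
    return [v.upper() for v in values]


derived = resolve_func_name(uppercase_transform)
passed_through = resolve_func_name("my_package.transforms.uppercase_transform")
```

The module part of the derived name depends on where the function is defined, so a function defined in a script or notebook will carry that context's module name rather than an importable package path.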

CustomDescriptor applies a custom callable to the entire Dataset object, enabling transformations that may depend on multiple columns or complex logic. Like CustomColumnDescriptor, it accepts either a string or a callable, with the same caching and naming logic. The generate_data method passes the full dataset to the callable.

Both classes:

  • Store the function reference as a string (func field) for serialization, with the actual callable held in a Pydantic PrivateAttr
  • Generate default aliases from the function path (e.g., custom_column_descriptor:module.func_name)
  • Support optional tests via the tests parameter
  • Raise ValueError if invoked without a configured callable (e.g., after deserialization without restoring the callable)
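
The behaviour listed above can be illustrated with a small stand-in class. This sketch uses neither evidently nor Pydantic: ColumnFuncSketch, to_dict, and generate are hypothetical names, the dataset is replaced by a plain dict of lists, and the error message is invented. It only shows why a descriptor rebuilt from its serialized fields has no callable to run.

```python
from typing import Callable, Dict, List, Optional, Union


class ColumnFuncSketch:
    """Hypothetical stand-in for a custom column descriptor (not evidently's class)."""

    def __init__(
        self,
        column_name: str,
        func: Union[str, Callable[[List[str]], List[str]]],
        alias: Optional[str] = None,
    ):
        self.column_name = column_name
        if callable(func):
            self._func = func  # live callable, never serialized
            self.func = f"{func.__module__}.{func.__name__}"
        else:
            self._func = None  # only the name is known
            self.func = func
        # Default alias follows the pattern described in the list above.
        self.alias = alias or f"custom_column_descriptor:{self.func}"

    def to_dict(self) -> Dict[str, str]:
        # Only the string fields survive serialization.
        return {"column_name": self.column_name, "func": self.func, "alias": self.alias}

    def generate(self, data: Dict[str, List[str]]) -> List[str]:
        if self._func is None:
            raise ValueError(f"No callable configured for {self.func}")
        return self._func(data[self.column_name])


def shout(values: List[str]) -> List[str]:
    return [v.upper() for v in values]


live = ColumnFuncSketch("text", shout)
restored = ColumnFuncSketch(**live.to_dict())  # rebuilt from strings only

live_result = live.generate({"text": ["hello"]})
try:
    restored.generate({"text": ["hello"]})
    restored_failed = False
except ValueError:
    restored_failed = True
```

In this sketch the restored object keeps its name and alias but cannot compute anything until a callable is supplied again, which matches the ValueError case described in the last bullet.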

Usage

Use CustomColumnDescriptor when your transformation logic operates on a single column (e.g., text preprocessing, feature extraction from one field). Use CustomDescriptor when the transformation requires access to multiple columns or the full dataset context.

Code Reference

Source Location

Signature

CustomColumnCallable = Callable[[DatasetColumn], DatasetColumn]

class CustomColumnDescriptor(Descriptor):
    column_name: str
    func: str
    _func: Optional[CustomColumnCallable] = PrivateAttr(None)

    def __init__(
        self, column_name: str, func: Union[str, CustomColumnCallable],
        alias: Optional[str] = None, tests: Optional[List[AnyDescriptorTest]] = None,
    ): ...
    def generate_data(self, dataset: Dataset, options: Options) -> Union[DatasetColumn, Dict[str, DatasetColumn]]: ...
    def list_input_columns(self) -> Optional[List[str]]: ...

CustomDescriptorCallable = Callable[[Dataset], Union[DatasetColumn, Dict[str, DatasetColumn]]]

class CustomDescriptor(Descriptor):
    func: str
    _func: Optional[CustomDescriptorCallable] = PrivateAttr(None)

    def __init__(
        self, func: Union[str, CustomDescriptorCallable],
        alias: Optional[str] = None, tests: Optional[List[AnyDescriptorTest]] = None,
    ): ...
    def generate_data(self, dataset: "Dataset", options: Options) -> Union[DatasetColumn, Dict[str, DatasetColumn]]: ...

Import

from evidently.descriptors._custom_descriptors import CustomColumnDescriptor, CustomDescriptor

I/O Contract

Inputs

Name | Type | Required | Description
column_name | str | Yes (CustomColumnDescriptor) | Name of the column to apply the function to
func | Union[str, Callable] | Yes | Custom function or fully qualified function name string
alias | Optional[str] | No | Custom display name for the descriptor; auto-generated from func if not provided
tests | Optional[List[AnyDescriptorTest]] | No | Tests to apply to the computed descriptor values

Outputs

Name | Type | Description
CustomColumnDescriptor.generate_data return | Union[DatasetColumn, Dict[str, DatasetColumn]] | Transformed column data from the custom function
CustomDescriptor.generate_data return | Union[DatasetColumn, Dict[str, DatasetColumn]] | Transformed data from the custom function applied to the full dataset
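
Because the return type allows either a single column or a dict of named columns, code that consumes the result has to handle both shapes. The following is a minimal sketch under stated assumptions: ColumnSketch is a hypothetical stand-in for DatasetColumn, normalize_result and length_stats are invented names, and how evidently itself names or stores the columns of a returned dict is not specified on this page.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class ColumnSketch:
    """Hypothetical stand-in for DatasetColumn: a type tag plus values."""
    type: str
    data: List[float]


Result = Union[ColumnSketch, Dict[str, ColumnSketch]]


def normalize_result(result: Result, default_name: str) -> Dict[str, ColumnSketch]:
    """Wrap a single column in a one-entry dict; copy a dict result as-is."""
    if isinstance(result, ColumnSketch):
        return {default_name: result}
    return dict(result)


def length_stats(rows: Dict[str, List[str]]) -> Result:
    """Dataset-level function that returns two named columns at once."""
    return {
        "question_length": ColumnSketch("num", [float(len(v)) for v in rows["question"]]),
        "response_length": ColumnSketch("num", [float(len(v)) for v in rows["response"]]),
    }


rows = {"question": ["hi"], "response": ["hello"]}
multi = normalize_result(length_stats(rows), default_name="length_stats")
single = normalize_result(ColumnSketch("num", [1.0]), default_name="ratio")
```

Normalizing to a dict keeps downstream code uniform: a single-column result is addressed by the fallback name, while a multi-column result keeps the keys chosen by the custom function.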

Usage Examples

from evidently.descriptors._custom_descriptors import CustomColumnDescriptor, CustomDescriptor
from evidently.core.datasets import Dataset, DatasetColumn

# Column-level descriptor: receives one DatasetColumn and returns a new one
def uppercase_transform(column: DatasetColumn) -> DatasetColumn:
    return DatasetColumn(
        type=column.type,
        data=column.data.str.upper(),  # column.data is a pandas Series
    )

col_descriptor = CustomColumnDescriptor(
    column_name="text",
    func=uppercase_transform,
    alias="uppercase_text",
)

# Dataset-level descriptor: receives the full Dataset and can combine columns
def length_ratio(dataset: Dataset) -> DatasetColumn:
    df = dataset.as_dataframe()
    # Character-length ratio of response to question (not a word count);
    # an empty question produces inf or NaN
    ratio = df["response"].str.len() / df["question"].str.len()
    # "num" is assumed here to be the string value of the numerical column type
    return DatasetColumn(type="num", data=ratio)

full_descriptor = CustomDescriptor(
    func=length_ratio,
    alias="response_question_length_ratio",
)

# Add descriptors to an existing Dataset
# (`dataset` is assumed to exist already and to contain "text", "question", and "response" columns)
dataset.add_descriptors([col_descriptor, full_descriptor])
