Implementation:Datajuicer Data juicer PythonLambdaMapper

From Leeroopedia
Knowledge Sources
Domains Data_Processing, Mapping
Last Updated 2026-02-14 16:00 GMT

Overview

Concrete tool, provided by Data-Juicer, for applying inline Python lambda functions to data samples.

Description

PythonLambdaMapper is a mapper operator that enables users to define custom transformations using a Python lambda function string. The lambda is parsed into an AST, validated to be a proper lambda with exactly one argument, compiled and evaluated safely, then applied to each sample or batch. The function must return a dictionary. If no lambda string is provided, the identity function is used.
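The parse, validate, compile, and apply steps described above can be sketched in plain Python. This is a minimal illustration, not the operator's actual source: the helper names `build_lambda` and `apply_lambda` are hypothetical, and the check only validates the shape of the expression (a lambda with one positional argument); it does not sandbox what the lambda body may do.

```python
import ast
import builtins


def build_lambda(lambda_str: str = ""):
    """Hypothetical helper: turn a lambda string into a callable."""
    if not lambda_str:
        # No string given: fall back to the identity function.
        return lambda sample: sample

    # Parse the string as a single expression.
    tree = ast.parse(lambda_str, mode="eval")

    # The expression must be a lambda, not an arbitrary expression.
    if not isinstance(tree.body, ast.Lambda):
        raise ValueError("Input string must be a lambda expression.")

    # The lambda must take exactly one positional argument.
    if len(tree.body.args.args) != 1:
        raise ValueError("Lambda must take exactly one argument.")

    # Compile the validated AST and evaluate it to get the function object.
    code = compile(tree, "<lambda_str>", "eval")
    return eval(code, {"__builtins__": builtins})


def apply_lambda(func, sample):
    """Hypothetical helper: apply the function and enforce a dict result."""
    result = func(sample)
    if not isinstance(result, dict):
        raise ValueError(f"Lambda must return a dict, got {type(result).__name__}.")
    return result


lower_text = build_lambda('lambda sample: {**sample, "text": sample["text"].lower()}')
lowered = apply_lambda(lower_text, {"text": "Hello World"})
```

Rejecting anything that is not an `ast.Lambda` node before calling `eval` means that statements or non-lambda expressions in the configuration string fail early with a clear error instead of being executed.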

Usage

Use this operator when you need a quick, lightweight transformation defined inline in the pipeline configuration, without maintaining an external Python file.

Code Reference

Source Location

Signature

@OPERATORS.register_module("python_lambda_mapper")
class PythonLambdaMapper(Mapper):
    def __init__(self, lambda_str: str = "", batched: bool = False, **kwargs):

Import

from data_juicer.ops.mapper.python_lambda_mapper import PythonLambdaMapper

I/O Contract

Inputs

Name | Type | Required | Description
lambda_str | str | No | String representation of the lambda function (default: empty string, which uses the identity function)
batched | bool | No | Whether to process data in batches (default: False)

Outputs

Name | Type | Description
samples | Dict | Transformed samples returned by the lambda function

Usage Examples

process:
  - python_lambda_mapper:
      lambda_str: 'lambda sample: {**sample, "text": sample["text"].lower()}'
      batched: false
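To make the effect of the configuration above concrete, the following sketch applies the same lambda to a sample in plain Python, without importing Data-Juicer. The second half shows what a batched variant might look like; it assumes that a batch is a dictionary mapping each field name to a list of values, which is an assumption about the batch layout and not something stated on this page. The sample data is made up for illustration.

```python
# Single-sample mode: the lambda receives one sample dict and returns a new dict.
lower_single = lambda sample: {**sample, "text": sample["text"].lower()}

sample = {"text": "Hello World", "lang": "en"}
single_out = lower_single(sample)
# The original sample is not mutated because {**sample, ...} builds a new dict.

# Batched mode (assumed layout: one dict whose values are lists, one entry per sample).
lower_batched = lambda samples: {
    **samples,
    "text": [t.lower() for t in samples["text"]],
}

batch = {"text": ["Hello World", "FOO Bar"], "lang": ["en", "en"]}
batched_out = lower_batched(batch)
```

Under that assumption, the batched form would be written in the configuration as `lambda_str: 'lambda samples: {**samples, "text": [t.lower() for t in samples["text"]]}'` together with `batched: true`.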
