Implementation:Datajuicer Data juicer PythonLambdaMapper

Knowledge Sources	Datajuicer_Data_juicer
Domains	Data_Processing, Mapping
Last Updated	2026-02-14 16:00 GMT

Overview

Concrete tool, provided by Data-Juicer, for applying inline Python lambda functions to data samples.

Description

PythonLambdaMapper is a mapper operator that lets users define a custom transformation as a Python lambda expression passed in as a string. The string is parsed into an AST and validated to be a lambda with exactly one argument; only then is it compiled and evaluated into a callable, which is applied to each sample, or to each batch when batched is enabled. The lambda must return a dictionary. If no lambda string is provided, the identity function is used.
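The parse, validate, compile, and evaluate steps described above can be sketched in plain Python. This is an illustrative sketch, not the Data-Juicer source: the helper name build_lambda, the error messages, and the choice of ValueError are assumptions made for the example.

```python
import ast


def build_lambda(lambda_str: str = ""):
    """Hypothetical helper: turn a lambda string into a callable."""
    # An empty string falls back to the identity function.
    if not lambda_str:
        return lambda sample: sample

    # Parse the string as a single expression into an AST.
    try:
        node = ast.parse(lambda_str, mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"Cannot parse lambda string: {lambda_str!r}") from exc

    # The expression body must be a lambda, not an arbitrary expression.
    if not isinstance(node.body, ast.Lambda):
        raise ValueError("Input string must be a lambda expression.")

    # The lambda must take exactly one positional argument.
    if len(node.body.args.args) != 1:
        raise ValueError("Lambda must have exactly one argument.")

    # Compile the validated AST and evaluate it to obtain the function object.
    code = compile(node, "<lambda_str>", "eval")
    return eval(code, {})


to_lower = build_lambda('lambda sample: {**sample, "text": sample["text"].lower()}')
print(to_lower({"text": "Hello WORLD"}))
```

Validating the AST before calling eval means that a string such as an import call or a two-argument lambda is rejected before any of it runs.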

Usage

Use when you need a quick, lightweight inline transformation directly in the pipeline configuration without requiring an external file.

Code Reference

Source Location

Repository: Datajuicer_Data_juicer
File: data_juicer/ops/mapper/python_lambda_mapper.py

Signature

@OPERATORS.register_module("python_lambda_mapper")
class PythonLambdaMapper(Mapper):
    def __init__(self, lambda_str: str = "", batched: bool = False, **kwargs):

Import

from data_juicer.ops.mapper.python_lambda_mapper import PythonLambdaMapper

I/O Contract

Inputs

Name	Type	Required	Description
lambda_str	str	No	String representation of the lambda function (default: empty, uses identity)
batched	bool	No	Whether to process data in batches (default: False)

Outputs

Name	Type	Description
samples	Dict	Transformed samples returned by the lambda function
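The contract above can be illustrated with a short sketch of how the lambda's argument might differ between the two modes. It assumes that a single sample is a dictionary of field values and that a batch is a dictionary of lists, one list per field; the helper apply_lambda and its error message are hypothetical and stand in for the operator's own return-type check.

```python
def apply_lambda(fn, data):
    """Hypothetical wrapper: call the lambda and enforce the dict return contract."""
    result = fn(data)
    if not isinstance(result, dict):
        raise ValueError(
            f"Lambda must return a dict, got {type(result).__name__}."
        )
    return result


# batched = False: the lambda receives one sample at a time.
sample = {"text": "Hello World"}
single_result = apply_lambda(
    lambda s: {**s, "text": s["text"].lower()}, sample
)

# batched = True (assumed layout): the lambda receives a dict of lists.
batch = {"text": ["Hello", "WORLD"]}
batch_result = apply_lambda(
    lambda b: {**b, "text": [t.lower() for t in b["text"]]}, batch
)

print(single_result)
print(batch_result)
```

Under this assumption, a lambda written for one mode will generally not work in the other, so lambda_str and batched need to be chosen together.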

Usage Examples

process:
  - python_lambda_mapper:
      lambda_str: 'lambda sample: {**sample, "text": sample["text"].lower()}'
      batched: false
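Before placing a lambda string in the configuration, it can help to try it on a hand-written sample in a local Python session. The snippet below is such a sketch: it evaluates the same string directly with eval, which is reasonable only for a string you wrote yourself, and the sample contents (including the source field) are made up for illustration.

```python
# The same string used for lambda_str in the configuration above.
lambda_str = 'lambda sample: {**sample, "text": sample["text"].lower()}'

# Direct eval is for local experimentation only.
transform = eval(lambda_str)

sample = {"text": "Hello WORLD", "source": "demo"}
result = transform(sample)

# The lambda builds a new dict, so other fields are carried over
# and the input sample is left unchanged.
print(result)
print(sample)
```

Using {**sample, ...} keeps every existing field and overrides only text, which avoids silently dropping fields from the output.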

Related Pages

Environment:Datajuicer_Data_juicer_Python_Runtime_Environment
