Implementation:Datajuicer Data juicer PythonLambdaMapper
| Knowledge Sources | |
|---|---|
| Domains | Data_Processing, Mapping |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
A concrete Data-Juicer operator that applies an inline Python lambda function to each data sample (or to each batch of samples).
Description
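PythonLambdaMapper is a mapper operator that lets users define a custom transformation as a Python lambda expression written in a string. The string is parsed into an abstract syntax tree (AST) and checked to be a lambda expression with exactly one argument. It is then compiled and evaluated into a function, which is applied to each sample or, when `batched` is enabled, to each batch. The function must return a dictionary. If `lambda_str` is empty, the identity function is used and samples pass through unchanged. Note that validating the AST only checks the lambda's shape and argument count; it does not restrict what the lambda body can execute, so `lambda_str` should come from a trusted configuration.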
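The parse, validate, compile, and evaluate steps described above can be sketched with Python's standard `ast` module. This is a minimal illustration under stated assumptions, not Data-Juicer's source: the helper name `compile_lambda` is hypothetical, "exactly one argument" is interpreted as one positional parameter, and wrapping every failure in `ValueError` is an assumption.

```python
import ast
from typing import Callable, Dict


def compile_lambda(lambda_str: str) -> Callable[[Dict], Dict]:
    """Hypothetical helper mirroring the steps described above (not Data-Juicer's source)."""
    # An empty string falls back to the identity function.
    if not lambda_str:
        return lambda sample: sample

    # Parse the string as a single Python expression.
    try:
        tree = ast.parse(lambda_str, mode="eval")
    except SyntaxError as err:
        raise ValueError(f"invalid lambda_str: {err}") from err

    # The whole expression must be a lambda, not e.g. a function name or a call.
    if not isinstance(tree.body, ast.Lambda):
        raise ValueError("lambda_str must be a lambda expression")

    # Require exactly one positional parameter (the sample or batch).
    if len(tree.body.args.args) != 1:
        raise ValueError("the lambda must take exactly one argument")

    # Compile the validated AST and evaluate it to get the function object.
    # Note: checking the AST's shape does not sandbox what the body can do.
    code = compile(tree, "<lambda_str>", "eval")
    return eval(code, {})


lower_text = compile_lambda('lambda sample: {**sample, "text": sample["text"].lower()}')
print(lower_text({"text": "Hello World", "id": 1}))  # {'text': 'hello world', 'id': 1}
```

Rejected inputs in this sketch include a bare name such as `'len'`, a two-argument lambda, and a zero-argument lambda.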
Usage
Use when you need a quick, lightweight inline transformation directly in the pipeline configuration without requiring an external file.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/mapper/python_lambda_mapper.py
Signature
```python
@OPERATORS.register_module("python_lambda_mapper")
class PythonLambdaMapper(Mapper):
    def __init__(self, lambda_str: str = "", batched: bool = False, **kwargs):
```
Import
```python
from data_juicer.ops.mapper.python_lambda_mapper import PythonLambdaMapper
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
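| lambda_str | str | No | Source string of a one-argument lambda, e.g. `'lambda sample: sample'` (default: `""`, which applies the identity function) |
| batched | bool | No | If true, the lambda receives a batch (a dict mapping each field name to a list of values) instead of a single sample (default: False) |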
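How `batched` changes what the lambda receives can be illustrated with a small driver. This sketch assumes the column-oriented batch layout (a dict mapping each field to a list of values) used by Hugging Face `datasets` batched mapping; `run_mapper` and the `n_chars` field are hypothetical and only show why a batched lambda must operate on lists.

```python
from typing import Callable, Dict, List


def run_mapper(func: Callable[[Dict], Dict], samples: List[Dict], batched: bool) -> List[Dict]:
    """Hypothetical driver: call func per sample, or once on a column-oriented batch."""
    if not batched:
        # Non-batched: the lambda receives one sample dict at a time.
        return [func(sample) for sample in samples]
    if not samples:
        return []
    # Batched: pack rows into {field: [value, ...]} and call the lambda once.
    batch = {key: [s[key] for s in samples] for key in samples[0]}
    out = func(batch)
    # Unpack the returned columns back into one dict per row.
    n_rows = len(next(iter(out.values()), []))
    return [{key: values[i] for key, values in out.items()} for i in range(n_rows)]


# Per-sample form: every field is a scalar.
per_sample = lambda sample: {**sample, "n_chars": len(sample["text"])}
# Batched form: the same logic, but every field is a list.
per_batch = lambda batch: {**batch, "n_chars": [len(t) for t in batch["text"]]}

rows = [{"text": "ab"}, {"text": "cde"}]
print(run_mapper(per_sample, rows, batched=False))
print(run_mapper(per_batch, rows, batched=True))
# both print [{'text': 'ab', 'n_chars': 2}, {'text': 'cde', 'n_chars': 3}]
```

Passing the per-sample lambda with `batched=True` would fail here, because `len(batch["text"])` counts rows rather than characters and a scalar cannot be unpacked per row.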
Outputs
| Name | Type | Description |
|---|---|---|
| samples | Dict | Transformed sample (or batch, when `batched` is true) returned by the lambda; must be a dict |
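The dict-return requirement can be enforced with a thin wrapper like the one below. This is a sketch of one way to check the contract; `checked` is a hypothetical name, and raising `ValueError` for a non-dict result is an assumption rather than confirmed Data-Juicer behavior.

```python
from typing import Any, Callable, Dict


def checked(func: Callable[[Any], Any]) -> Callable[[Dict], Dict]:
    """Hypothetical wrapper that rejects non-dict results from the lambda."""
    def wrapper(data: Dict) -> Dict:
        result = func(data)
        if not isinstance(result, dict):
            raise ValueError(f"lambda must return a dict, got {type(result).__name__}")
        return result
    return wrapper


ok = checked(lambda sample: {**sample, "length": len(sample["text"])})
print(ok({"text": "abc"}))  # {'text': 'abc', 'length': 3}

bad = checked(lambda sample: sample["text"].upper())  # returns a str, not a dict
try:
    bad({"text": "abc"})
except ValueError as err:
    print(err)  # lambda must return a dict, got str
```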
Usage Examples
```yaml
process:
  - python_lambda_mapper:
      lambda_str: 'lambda sample: {**sample, "text": sample["text"].lower()}'
      batched: false
```
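Before pasting a lambda into the configuration, it can help to exercise the exact string locally. The sketch below evaluates the `lambda_str` from the example above, plus a list-aware variant one might write for `batched: true`; the `load` helper and the variant string are illustrative assumptions, not part of Data-Juicer.

```python
import ast

# The exact string from the YAML example (the YAML single quotes keep the inner double quotes).
LAMBDA_STR = 'lambda sample: {**sample, "text": sample["text"].lower()}'
# Hypothetical list-aware variant for use with batched: true.
BATCHED_LAMBDA_STR = 'lambda batch: {**batch, "text": [t.lower() for t in batch["text"]]}'


def load(lambda_str: str):
    """Minimal local check: the string must parse as a one-argument lambda."""
    tree = ast.parse(lambda_str, mode="eval")
    if not isinstance(tree.body, ast.Lambda) or len(tree.body.args.args) != 1:
        raise ValueError("expected a one-argument lambda")
    return eval(compile(tree, "<lambda_str>", "eval"), {})


per_sample = load(LAMBDA_STR)
per_batch = load(BATCHED_LAMBDA_STR)

print(per_sample({"text": "Data-Juicer", "meta": 1}))
# {'text': 'data-juicer', 'meta': 1}
print(per_batch({"text": ["Foo", "BAR"], "meta": [1, 2]}))
# {'text': ['foo', 'bar'], 'meta': [1, 2]}
```

The `**sample` / `**batch` spread keeps the other fields in the returned dict, which is the pattern the configuration example uses.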