Implementation:Datajuicer Data juicer PythonFileMapper

From Leeroopedia
Knowledge Sources
Domains Data_Processing, Mapping
Last Updated 2026-02-14 16:00 GMT

Overview

A concrete Data-Juicer tool for applying custom Python functions, loaded from external files, to data samples.

Description

PythonFileMapper is a mapper operator that dynamically loads a Python function from a specified external .py file and applies it to input data samples. The function is loaded using importlib.util and validated to be callable with exactly one argument; the value it returns must be a dictionary. The operator supports both single-sample and batched processing modes. If file_path is left empty (the default), the operator acts as an identity function and returns samples unchanged.
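The loading and validation steps described above can be sketched with the standard library alone. This is a minimal illustration, not Data-Juicer's actual implementation: the helper name `load_single_arg_function`, the specific exception types, and the use of `inspect.signature` for the argument check are assumptions made for this example.

```python
import importlib.util
import inspect
import os


def load_single_arg_function(file_path: str, function_name: str = "process_single"):
    """Hypothetical helper: load `function_name` from a .py file and validate it."""
    if not os.path.isfile(file_path):
        raise FileNotFoundError(f"File not found: {file_path}")
    if not file_path.endswith(".py"):
        raise ValueError(f"Expected a .py file, got: {file_path}")

    # Build a module object directly from the file path, without touching sys.path.
    module_name = os.path.splitext(os.path.basename(file_path))[0]
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot create an import spec for: {file_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    # The named attribute must exist and be callable.
    func = getattr(module, function_name, None)
    if not callable(func):
        raise ValueError(f"'{function_name}' is missing or not callable in {file_path}")

    # Require exactly one parameter (assumed way to enforce the one-argument rule).
    if len(inspect.signature(func).parameters) != 1:
        raise ValueError(f"'{function_name}' must take exactly one argument")
    return func
```

Loading by file path keeps the user's code out of the package namespace, which is what allows custom logic to be injected without editing the library itself.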

Usage

Use this operator when you need to inject custom transformation logic into a Data-Juicer pipeline without modifying the Data-Juicer source code. You do so by pointing it at an external Python file that contains your processing function.

Code Reference

Source Location

Signature

@OPERATORS.register_module("python_file_mapper")
class PythonFileMapper(Mapper):
    def __init__(self, file_path: str = "", function_name: str = "process_single", batched: bool = False, **kwargs):

Import

from data_juicer.ops.mapper.python_file_mapper import PythonFileMapper

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| file_path | str | No | Path to the Python file containing the function (default: empty string, acts as identity) |
| function_name | str | No | Name of the function to execute (default: "process_single") |
| batched | bool | No | Whether to process data in batches (default: False) |

Outputs

| Name | Type | Description |
|---|---|---|
| samples | Dict | Transformed samples returned by the user-defined function |
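The contract above (a dictionary comes back, and an empty file_path means identity) can be illustrated with a small wrapper. This is a sketch under stated assumptions: `apply_user_function` is a hypothetical name, and raising `ValueError` on a non-dictionary result is an illustrative choice rather than confirmed library behavior.

```python
from typing import Any, Callable, Dict, Optional

Sample = Dict[str, Any]


def apply_user_function(func: Optional[Callable[[Sample], Any]], sample: Sample) -> Sample:
    """Hypothetical wrapper enforcing the 'must return a dictionary' rule."""
    if func is None:
        # No user function configured: behave as the identity mapping.
        return sample
    result = func(sample)
    if not isinstance(result, dict):
        raise ValueError(
            f"User function must return a dict, got {type(result).__name__}"
        )
    return result
```

Checking the return type at the call site surfaces a misbehaving user function at the first sample it processes, rather than later in the pipeline.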

Usage Examples

process:
  - python_file_mapper:
      file_path: '/path/to/custom_transform.py'
      function_name: 'process_single'
      batched: false
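For illustration, the file referenced by file_path in the configuration above might look like the following. The contents are hypothetical: the "text" field name and the whitespace-stripping logic are assumptions for this example, and the batched variant assumes that a batch arrives as a dictionary mapping each field name to a list of values.

```python
# custom_transform.py (hypothetical contents)


def process_single(sample):
    """Take one sample dict and return a dict with the 'text' field stripped."""
    text = sample.get("text", "").strip()
    return {**sample, "text": text}


def process_batched(samples):
    """Batched variant: assumes `samples` maps each field name to a list of values."""
    texts = [t.strip() for t in samples.get("text", [])]
    return {**samples, "text": texts}
```

To use the batched variant under these assumptions, the configuration would set function_name to 'process_batched' and batched to true.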
