Implementation:Datajuicer Data juicer PythonFileMapper
| Knowledge Sources | |
|---|---|
| Domains | Data_Processing, Mapping |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Concrete tool, provided by Data-Juicer, for applying custom Python functions defined in external files to data samples.
Description
PythonFileMapper is a mapper operator that dynamically loads a Python function from a specified external .py file and applies it to input data samples. The function is loaded using importlib.util, validated to be callable with exactly one argument, and must return a dictionary. It supports both single-sample and batched processing modes. If no file path is provided, the operator acts as an identity function.
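The loading and validation steps described above can be sketched roughly as follows. This is an illustrative approximation, not the Data-Juicer source: the helper names `load_function` and `apply_function`, the error messages, and the `text` field are assumptions made for the example. Only the use of `importlib.util`, the one-argument check, the dictionary return check, and the identity behaviour come from the description.

```python
import importlib.util
import inspect
import os
import tempfile


def load_function(file_path, function_name):
    """Load `function_name` from the .py file at `file_path` (hypothetical helper)."""
    if not file_path:
        # No file given: signal the caller to behave as an identity mapper.
        return None
    if not os.path.isfile(file_path):
        raise FileNotFoundError(f"File not found: {file_path}")

    # Build a module object from the file and execute it.
    module_name = os.path.splitext(os.path.basename(file_path))[0]
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    func = getattr(module, function_name, None)
    if not callable(func):
        raise ValueError(f"'{function_name}' is not a callable in {file_path}")
    # The description requires a function taking exactly one argument.
    if len(inspect.signature(func).parameters) != 1:
        raise ValueError(f"'{function_name}' must take exactly one argument")
    return func


def apply_function(func, sample):
    """Apply the loaded function, or return the sample unchanged if none was loaded."""
    if func is None:
        return sample
    result = func(sample)
    if not isinstance(result, dict):
        raise ValueError("The user-defined function must return a dictionary")
    return result


# Small demonstration using a temporary file as the external module.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "custom_transform.py")
    with open(path, "w") as f:
        f.write(
            "def process_single(sample):\n"
            "    sample['text'] = sample['text'].strip()\n"
            "    return sample\n"
        )
    loaded = load_function(path, "process_single")
    result = apply_function(loaded, {"text": "  hello  "})

identity_result = apply_function(load_function("", "process_single"), {"text": "x"})
```

Checking the signature at load time, rather than on first call, lets a misconfigured pipeline fail before any data is processed.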
Usage
Use when you need to inject custom transformation logic into the Data-Juicer pipeline without modifying the source code, by specifying an external Python file containing your processing function.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/mapper/python_file_mapper.py
Signature
@OPERATORS.register_module("python_file_mapper")
class PythonFileMapper(Mapper):
    def __init__(self, file_path: str = "", function_name: str = "process_single", batched: bool = False, **kwargs):
Import
from data_juicer.ops.mapper.python_file_mapper import PythonFileMapper
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| file_path | str | No | Path to the Python file containing the function (default: empty string, acts as identity) |
| function_name | str | No | Name of the function to execute (default: "process_single") |
| batched | bool | No | Whether to process data in batches (default: False) |
Outputs
| Name | Type | Description |
|---|---|---|
| samples | Dict | Transformed samples returned by the user-defined function |
Usage Examples
process:
  - python_file_mapper:
      file_path: '/path/to/custom_transform.py'
      function_name: 'process_single'
      batched: false
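For illustration, the external file referenced by the configuration above might look like the sketch below. The `text` field, the lower-casing transformation, and the `process_batched` name are assumptions chosen for the example. The batched variant additionally assumes that a batch arrives as a dictionary mapping field names to lists of values, which should be checked against the Data-Juicer version in use.

```python
# custom_transform.py (contents are a hypothetical example)


def process_single(sample):
    """Take one sample as a dict and return the modified dict."""
    # Assumes the sample carries its content under a "text" key.
    text = sample.get("text", "")
    sample["text"] = text.strip().lower()
    return sample


def process_batched(samples):
    """Batched variant: assumes a dict mapping field names to lists of values."""
    samples["text"] = [t.strip().lower() for t in samples.get("text", [])]
    return samples
```

To use the batched variant, the configuration would set `function_name: 'process_batched'` and `batched: true`. Both functions take exactly one argument and return a dictionary, as the operator requires.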