Implementation:Datajuicer Data juicer ReplaceContentMapper

From Leeroopedia
Knowledge Sources
Domains Data_Processing, Mapping
Last Updated 2026-02-14 16:00 GMT

Overview

Concrete tool, provided by Data-Juicer, for regex-based content replacement in text samples.

Description

ReplaceContentMapper is a mapper operator that performs regex-based find-and-replace on text samples. It accepts either a single pattern-replacement pair or lists of patterns and replacements. Each pattern is pre-compiled with the re.DOTALL flag, so "." also matches newline characters and a single match can span several lines. Raw-string notation around a pattern (for example r'...') is stripped automatically before compilation. The pattern-replacement pairs are applied to each text sample sequentially, in order, so later patterns operate on the output of earlier ones. A ValueError is raised if the number of patterns and the number of replacements do not match. The operator runs in batched mode.
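
The behaviour described above can be approximated with the standard library alone. The following is a minimal sketch, not the Data-Juicer source: the helper names `prepare_pattern` and `replace_content` are hypothetical, and two details are assumptions made for illustration (a `None` pattern leaves the text unchanged, and a single string `repl` is reused for every pattern in a list).

```python
import re
from typing import List, Optional, Union


def prepare_pattern(pattern: str) -> re.Pattern:
    """Compile a pattern, stripping literal raw-string notation first."""
    # A config file may deliver the text r'...' or r"..." verbatim;
    # drop the leading r and the surrounding quotes (assumed behaviour).
    if len(pattern) > 2 and (
        (pattern.startswith("r'") and pattern.endswith("'"))
        or (pattern.startswith('r"') and pattern.endswith('"'))
    ):
        pattern = pattern[2:-1]
    # re.DOTALL makes "." match newline characters as well.
    return re.compile(pattern, flags=re.DOTALL)


def replace_content(
    text: str,
    pattern: Union[str, List[str], None] = None,
    repl: Union[str, List[str]] = "",
) -> str:
    """Apply each pattern-replacement pair to the text, in order."""
    if pattern is None:
        return text  # assumption: nothing to replace
    patterns = [pattern] if isinstance(pattern, str) else list(pattern)
    # Assumption: a single replacement string is shared by all patterns.
    repls = [repl] * len(patterns) if isinstance(repl, str) else list(repl)
    if len(patterns) != len(repls):
        raise ValueError("pattern and repl must have the same length")
    for compiled, replacement in zip(map(prepare_pattern, patterns), repls):
        text = compiled.sub(replacement, text)
    return text


# Strip tags first, then collapse the whitespace that is left behind.
cleaned = replace_content("Hello   <b>world</b>", [r"<[^>]+>", r"\s+"], ["", " "])
print(cleaned)  # Hello world
```

Because the pairs run in order, putting the tag-stripping pattern before the whitespace pattern matters: the second pattern sees the text after the first substitution.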

Usage

Use when you need pattern-based content cleaning, redaction, or reformatting of text within the data processing pipeline.

Code Reference

Source Location

Signature

@OPERATORS.register_module("replace_content_mapper")
class ReplaceContentMapper(Mapper):
    def __init__(self, pattern: Union[str, List[str], None] = None, repl: Union[str, List[str]] = "", *args, **kwargs):

Import

from data_juicer.ops.mapper.replace_content_mapper import ReplaceContentMapper

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| pattern | Union[str, List[str], None] | No | Regular expression pattern(s) to search for within text (default: None) |
| repl | Union[str, List[str]] | No | Replacement string(s) (default: empty string) |

Outputs

| Name | Type | Description |
|------|------|-------------|
| samples | Dict | Transformed samples with content replaced |
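
To make the batched contract concrete, here is a small sketch of how a batch might pass through such an operator. It assumes the batch is a dict of column lists and that the text lives under a `"text"` key; the function `process_batched_sketch` and the email pattern are illustrative only and are not taken from the Data-Juicer code.

```python
import re
from typing import Dict, List


def process_batched_sketch(
    samples: Dict[str, List],
    patterns: List[re.Pattern],
    repls: List[str],
    text_key: str = "text",  # assumption: name of the text column
) -> Dict[str, List]:
    """Rewrite every text in the batch; other columns are left untouched."""
    for idx, text in enumerate(samples[text_key]):
        for compiled, replacement in zip(patterns, repls):
            text = compiled.sub(replacement, text)
        samples[text_key][idx] = text
    return samples


# Illustrative redaction pattern, deliberately simple.
email = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", flags=re.DOTALL)

batch = {
    "text": ["mail bob@example.com now", "no address here"],
    "meta": [{"id": 1}, {"id": 2}],
}
result = process_batched_sketch(batch, [email], ["<EMAIL>"])
print(result["text"])  # ['mail <EMAIL> now', 'no address here']
```

The returned dict has the same keys and the same number of rows as the input; only the text column changes.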

Usage Examples

process:
  - replace_content_mapper:
      pattern: '\s+'
      repl: ' '
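
For intuition, the effect of this configuration on one string can be reproduced with plain `re`. This is a rough stand-alone equivalent rather than a call into Data-Juicer; `collapse_whitespace` is a hypothetical helper name.

```python
import re

# Same pattern as in the config, compiled with re.DOTALL as described above.
WHITESPACE = re.compile(r"\s+", flags=re.DOTALL)


def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace (spaces, tabs, newlines) with one space."""
    return WHITESPACE.sub(" ", text)


print(repr(collapse_whitespace("  a\t\nb  ")))  # ' a b '
```

Leading and trailing whitespace is collapsed to a single space rather than removed, so a separate pattern such as `^\s+|\s+$` with an empty replacement would be needed to trim it.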
