Implementation:Datajuicer Data juicer CleanIpMapper

From Leeroopedia
Revision as of 12:20, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Datajuicer_Data_juicer_CleanIpMapper.md)
Knowledge Sources
Domains: Data_Processing, Mapping
Last Updated: 2026-02-14 16:00 GMT

Overview

Concrete tool, provided by Data-Juicer, for removing or replacing IPv4 and IPv6 addresses in text samples.

Description

CleanIpMapper is a mapper operator that removes or replaces IP addresses in text samples using regular-expression matching. Its default pattern matches both IPv4 addresses (e.g., 192.168.1.1) and IPv6 addresses (colon-separated hex groups). Every match in each text of a batch is replaced with a configurable replacement string, which is empty by default, so matched addresses are simply removed. A custom regex pattern can be supplied for specialized needs. The operator extends the Mapper base class and runs in batched mode for efficiency.
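The regex-substitution behaviour described above can be sketched in a few lines of standalone Python. This is a minimal illustration, not the Data-Juicer source: the function name `clean_ips` and the patterns `IPV4`, `IPV6`, and `DEFAULT_PATTERN` are assumptions made for this example, and the default pattern shipped with Data-Juicer may differ (for instance, this sketch only recognizes the full, uncompressed IPv6 form).

```python
import re
from typing import List, Optional

# Hypothetical patterns for illustration only; Data-Juicer's own default
# pattern is not reproduced here and may differ.
IPV4 = (
    r"(?:(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}"
    r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
)
# Full eight-group form only; compressed forms such as "::1" are not handled.
IPV6 = r"(?:[0-9A-Fa-f]{1,4}:){7}[0-9A-Fa-f]{1,4}"
DEFAULT_PATTERN = rf"\b(?:{IPV4}|{IPV6})\b"


def clean_ips(texts: List[str],
              pattern: Optional[str] = None,
              repl: str = "") -> List[str]:
    """Replace every pattern match in each text of a batch with `repl`."""
    # Compile once per batch, then apply to every text.
    regex = re.compile(pattern if pattern is not None else DEFAULT_PATTERN)
    return [regex.sub(repl, text) for text in texts]


batch = [
    "Login from 192.168.1.1 at noon",
    "Peer 2001:0db8:85a3:0000:0000:8a2e:0370:7334 replied",
    "no address here",
]
cleaned = clean_ips(batch, repl="<IP>")
```

With the default empty `repl`, a match is deleted outright, which leaves the surrounding whitespace in place. A pattern of this kind also matches dotted version strings such as 1.2.3.4, which is one reason a custom pattern may be preferable for some corpora.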

Usage

Import this operator when you need to anonymize datasets by removing IP addresses that could identify users or systems.

Code Reference

Source Location

Signature

@OPERATORS.register_module("clean_ip_mapper")
class CleanIpMapper(Mapper):
    def __init__(self,
                 pattern: Optional[str] = None,
                 repl: str = "",
                 *args, **kwargs):

Import

from data_juicer.ops.mapper.clean_ip_mapper import CleanIpMapper

I/O Contract

Inputs

Name | Type | Required | Description
pattern | Optional[str] | No | Regular expression pattern to search for within text. Default: pattern matching IPv4 and IPv6 addresses
repl | str | No | Replacement string for matched patterns. Default: "" (removes IP addresses)

Outputs

Name | Type | Description
samples | Dict | Transformed samples with IP addresses removed or replaced in text
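To make the contract concrete, here is a small sketch of a batched mapper with the same constructor parameters. It assumes that a batch is a column-oriented dict mapping field names to lists and that the text lives under a `text` key; the page itself only states that the output is a Dict. The class `SketchCleanIpMapper`, its `text_key` parameter, its `process_batched` method, and its simplified IPv4-only default pattern are all hypothetical stand-ins, not the Data-Juicer class.

```python
import re
from typing import Dict, List, Optional


class SketchCleanIpMapper:
    """Hypothetical stand-in that mirrors the documented parameters."""

    # Simplified IPv4-only default, for illustration only.
    _DEFAULT = r"\b(?:\d{1,3}\.){3}\d{1,3}\b"

    def __init__(self,
                 pattern: Optional[str] = None,
                 repl: str = "",
                 text_key: str = "text"):
        self.pattern = re.compile(pattern or self._DEFAULT)
        self.repl = repl
        self.text_key = text_key  # assumed name of the text column

    def process_batched(self, samples: Dict[str, List]) -> Dict[str, List]:
        # Only the text column is rewritten; other columns pass through.
        samples[self.text_key] = [
            self.pattern.sub(self.repl, text)
            for text in samples[self.text_key]
        ]
        return samples


batch = {
    "text": ["ping 10.0.0.1 failed", "nothing to clean"],
    "meta": [{"id": 1}, {"id": 2}],
}
out = SketchCleanIpMapper(repl="<IP>").process_batched(batch)
```

The returned dict has the same keys and list lengths as the input; only the strings in the text column change.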

Usage Examples

YAML Configuration

process:
  - clean_ip_mapper:
      repl: "<IP>"
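The link between the registered name `clean_ip_mapper` in the signature and the entry of the same name in the configuration above can be sketched as a name-to-class lookup. This is an illustration under stated assumptions, not Data-Juicer's loader: `REGISTRY`, `register_module`, `SketchMapper`, and `build_ops` are hypothetical names, and the config is written as the nested dict a YAML parser would yield so that the example needs no third-party packages.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical registry standing in for whatever OPERATORS does internally.
REGISTRY: Dict[str, Callable[..., Any]] = {}


def register_module(name: str):
    """Return a class decorator that stores the class under `name`."""
    def decorator(cls):
        REGISTRY[name] = cls
        return cls
    return decorator


@register_module("clean_ip_mapper")
class SketchMapper:
    """Placeholder operator that only records its arguments."""

    def __init__(self, pattern: Optional[str] = None, repl: str = ""):
        self.pattern = pattern
        self.repl = repl


# The configuration above, as the structure a YAML parser would produce.
config = {"process": [{"clean_ip_mapper": {"repl": "<IP>"}}]}


def build_ops(cfg: Dict[str, Any]) -> List[Any]:
    """Instantiate one operator per process entry, passing its kwargs."""
    ops = []
    for entry in cfg["process"]:
        for name, kwargs in entry.items():
            ops.append(REGISTRY[name](**(kwargs or {})))
    return ops


ops = build_ops(config)
```

Under this reading, omitting `pattern` from the configuration leaves it at its default of `None`, so the operator falls back to its built-in IP pattern.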
