
Implementation:Datajuicer Data juicer TagsSpecifiedFieldSelector

From Leeroopedia
Knowledge Sources
Domains Data_Processing, Selection
Last Updated 2026-02-14 16:00 GMT

Overview

Concrete tool, provided by Data-Juicer, for filtering samples by matching a field's value against a set of target tags.

Description

TagsSpecifiedFieldSelector extends Selector and keeps only those dataset samples whose specified field value matches one of a predefined set of target tags. It iterates over all samples, reads the value at field_key (a dot-separated path for multi-level fields), and checks it for membership in the set of target tags. The indices of matching samples are collected and passed to dataset.select() to produce the filtered subset. Matching is case-sensitive, and the field value must be a string, a number, or None. If the dataset has fewer than two samples, or if field_key is empty, the dataset is returned unchanged.
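
The selection logic described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Data-Juicer implementation: samples are modeled as a list of nested dicts, and the helper names resolve_field and select_indices are hypothetical. The real operator works on a dataset object and calls dataset.select() with the collected indices.

```python
import numbers
from collections.abc import Mapping
from typing import Any, Iterable, List, Sequence


def resolve_field(sample: Mapping, field_key: str) -> Any:
    """Walk a dot-separated key such as 'meta.lang' through nested mappings.

    Hypothetical helper for illustration only.
    """
    value: Any = sample
    for key in field_key.split("."):
        if not isinstance(value, Mapping) or key not in value:
            raise KeyError(f"'{key}' not found while resolving '{field_key}'")
        value = value[key]
    return value


def select_indices(
    samples: Sequence[Mapping],
    field_key: str,
    target_tags: Iterable[Any],
) -> List[int]:
    """Return indices of samples whose field value is one of target_tags."""
    # Documented short-circuit: fewer than two samples or an empty key
    # means the dataset is returned unchanged, i.e. every index is kept.
    if len(samples) <= 1 or not field_key:
        return list(range(len(samples)))

    tags = set(target_tags)  # exact set membership, hence case-sensitive
    selected: List[int] = []
    for index, sample in enumerate(samples):
        value = resolve_field(sample, field_key)
        # The description restricts field values to str, number, or None.
        if not (value is None or isinstance(value, (str, numbers.Number))):
            raise TypeError(f"unsupported field value type: {type(value).__name__}")
        if value in tags:
            selected.append(index)
    return selected


samples = [
    {"text": "a", "meta": {"lang": "en"}},
    {"text": "b", "meta": {"lang": "fr"}},
    {"text": "c", "meta": {"lang": "EN"}},  # differs only in case, so no match
    {"text": "d", "meta": {"lang": "zh"}},
]
kept = select_indices(samples, "meta.lang", ["en", "zh"])
print(kept)  # [0, 3]
```

Converting the tags to a set keeps each membership check cheap, and because the comparison is exact, "EN" does not match the tag "en".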

Usage

Use this selector when a data processing pipeline needs tag-based or category-based filtering, that is, when only samples belonging to specific categories, labels, or groups should be kept.

Code Reference

Source Location

Signature

@OPERATORS.register_module("tags_specified_field_selector")
class TagsSpecifiedFieldSelector(Selector):
    def __init__(self, field_key: str = "",
                 target_tags: List[str] = None,
                 *args, **kwargs):

Import

from data_juicer.ops.selector.tags_specified_field_selector import TagsSpecifiedFieldSelector

I/O Contract

Inputs

Name | Type | Required | Description
--- | --- | --- | ---
field_key | str | No | Target field key. Multi-level fields are separated by '.'. Default: ""
target_tags | List[str] | Yes (signature default is None) | List of tags to match against the field value

Outputs

Name | Type | Description
--- | --- | ---
dataset | Dataset | Filtered dataset containing only samples whose field values match a target tag

Usage Examples

process:
  - tags_specified_field_selector:
      field_key: "__dj__stats__.lang"
      target_tags: ["en", "zh"]
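
To make the effect of this configuration concrete, the following plain-Python sketch shows which samples it would keep. The sample layout (a "text" field plus a nested "__dj__stats__" dict holding "lang") and the helper field_value are assumptions made for illustration; the snippet imitates the matching rule rather than calling Data-Juicer.

```python
# Same parameters as the YAML config, expressed as a Python dict.
config = {"field_key": "__dj__stats__.lang", "target_tags": ["en", "zh"]}

# Assumed sample layout, for illustration only.
samples = [
    {"text": "Hello", "__dj__stats__": {"lang": "en"}},
    {"text": "Bonjour", "__dj__stats__": {"lang": "fr"}},
    {"text": "你好", "__dj__stats__": {"lang": "zh"}},
    {"text": "HELLO", "__dj__stats__": {"lang": "EN"}},  # wrong case, dropped
]


def field_value(sample, field_key):
    """Hypothetical helper: follow a dot-separated key through nested dicts."""
    value = sample
    for key in field_key.split("."):
        value = value[key]
    return value


tags = set(config["target_tags"])
kept_texts = [
    sample["text"]
    for sample in samples
    if field_value(sample, config["field_key"]) in tags
]
print(kept_texts)  # ['Hello', '你好']
```

Because matching is case-sensitive, a sample tagged "EN" is dropped; if tags vary in case, they need to be normalized upstream or listed explicitly in target_tags.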
