Implementation:Datajuicer Data juicer NaiveReverseGrouper

From Leeroopedia
Knowledge Sources
Domains Data_Processing, Grouping
Last Updated 2026-02-14 16:00 GMT

Overview

Concrete tool, provided by Data-Juicer, for splitting batched samples back into individual samples.

Description

NaiveReverseGrouper extends Grouper and serves as the inverse operation of NaiveGrouper. It iterates over batched samples, separates any batch_meta field for optional JSON export, then uses convert_dict_list_to_list_dict to split each batched dict-of-lists back into a list of individual dictionaries. If batch_meta_export_path is specified, batch metadata is written as JSON lines to the given file path, creating directories as needed. If no samples are present in the dataset, the original dataset is returned unchanged.
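
The flow described above can be approximated with a minimal, standalone sketch. This is not Data-Juicer's implementation: the helper names `dict_of_lists_to_list_of_dicts` and `reverse_group` are hypothetical, the input is assumed to be a plain Python list of dicts rather than a dataset object, and the metadata key is assumed to be the literal string `"batch_meta"`.

```python
import json
import os


def dict_of_lists_to_list_of_dicts(batch):
    """Split {"text": ["a", "b"]} into [{"text": "a"}, {"text": "b"}]."""
    if not batch:
        return []
    keys = list(batch.keys())
    # Assumes every field in the batch holds a list of the same length.
    length = len(batch[keys[0]])
    return [{key: batch[key][i] for key in keys} for i in range(length)]


def reverse_group(batched_samples, batch_meta_export_path=None):
    """Hypothetical stand-in for the unbatching step described above."""
    # Empty input is returned unchanged.
    if len(batched_samples) == 0:
        return batched_samples

    samples, batch_metas = [], []
    for batch in batched_samples:
        batch = dict(batch)  # shallow copy so the caller's dict is not mutated
        # Assumption: batch-level metadata lives under the key "batch_meta".
        meta = batch.pop("batch_meta", None)
        if meta is not None:
            batch_metas.append(meta)
        samples.extend(dict_of_lists_to_list_of_dicts(batch))

    if batch_meta_export_path is not None:
        parent = os.path.dirname(batch_meta_export_path)
        if parent:
            os.makedirs(parent, exist_ok=True)  # create directories as needed
        with open(batch_meta_export_path, "w", encoding="utf-8") as f:
            for meta in batch_metas:
                # One JSON object per line (JSON lines).
                f.write(json.dumps(meta, ensure_ascii=False) + "\n")

    return samples
```

In this sketch the metadata is simply discarded when no export path is given, mirroring the default described in the I/O contract.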

Usage

Use when you need to unbatch grouped data after aggregation operations have been applied, returning the pipeline to per-sample processing, with optional export of batch-level metadata.

Code Reference

Source Location

Signature

@OPERATORS.register_module("naive_reverse_grouper")
class NaiveReverseGrouper(Grouper):
    def __init__(self, batch_meta_export_path=None,
                 *args, **kwargs):

Import

from data_juicer.ops.grouper.naive_reverse_grouper import NaiveReverseGrouper

I/O Contract

Inputs

Name | Type | Required | Description
batch_meta_export_path | str | No | Path to export batch metadata as JSON lines. Default: None (batch meta is dropped)

Outputs

Name | Type | Description
samples | list of dict | Individual samples split from batched input, with batch_meta separated

Usage Examples

process:
  - naive_reverse_grouper:
      batch_meta_export_path: "./output/batch_meta.jsonl"
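
After a pipeline like the one above has run, the exported file could be read back with the standard library alone. The sketch below assumes the file holds one JSON object per line, as the JSON lines format implies. `read_batch_meta` is a hypothetical helper, not part of Data-Juicer.

```python
import json


def read_batch_meta(path):
    """Load a JSON lines file into a list, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

Calling `read_batch_meta("./output/batch_meta.jsonl")` would then return one entry per exported batch, assuming the export path from the configuration above.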
