Implementation:Datajuicer Data juicer NaiveReverseGrouper
| Knowledge Sources | |
|---|---|
| Domains | Data_Processing, Grouping |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Concrete tool, provided by Data-Juicer, for splitting batched samples back into individual samples.
Description
NaiveReverseGrouper extends Grouper and serves as the inverse operation of NaiveGrouper. It iterates over batched samples, separates any batch_meta field for optional JSON export, then uses convert_dict_list_to_list_dict to split each batched dict-of-lists back into a list of individual dictionaries. If batch_meta_export_path is specified, batch metadata is written as JSON lines to the given file path, creating directories as needed. If no samples are present in the dataset, the original dataset is returned unchanged.
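The splitting step described above can be illustrated with a minimal, self-contained sketch. It does not call Data-Juicer: `dict_of_lists_to_list_of_dicts` and `unbatch` are hypothetical stand-ins for `convert_dict_list_to_list_dict` and the operator's processing logic. The literal key `"batch_meta"` and the assumption that every list in a batch has the same length are illustrative, not taken from the library source.

```python
from typing import Any, Dict, List, Tuple


def dict_of_lists_to_list_of_dicts(batch: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for convert_dict_list_to_list_dict."""
    if not batch:
        return []
    keys = list(batch.keys())
    # Assumption: every column in the batch has the same length.
    length = len(batch[keys[0]])
    return [{key: batch[key][i] for key in keys} for i in range(length)]


def unbatch(
    batched_samples: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[Any]]:
    """Split batched samples into individual samples and collect batch metadata."""
    samples: List[Dict[str, Any]] = []
    batch_metas: List[Any] = []
    for batch in batched_samples:
        batch = dict(batch)  # shallow copy so the caller's dict is not mutated
        # Assumption: batch-level metadata lives under the key "batch_meta".
        meta = batch.pop("batch_meta", None)
        if meta is not None:
            batch_metas.append(meta)
        samples.extend(dict_of_lists_to_list_of_dicts(batch))
    return samples, batch_metas


batched = [
    {"text": ["a", "b"], "id": [1, 2], "batch_meta": {"num_samples": 2}},
    {"text": ["c"], "id": [3]},  # a batch without metadata
]
samples, metas = unbatch(batched)
```

In this sketch, `samples` holds three per-sample dictionaries and `metas` holds the single metadata record from the first batch; the second batch contributes no metadata.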
Usage
Use when you need to unbatch grouped data after aggregation operations have been applied, returning the pipeline to per-sample processing, with optional export of batch-level metadata.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/grouper/naive_reverse_grouper.py
Signature
@OPERATORS.register_module("naive_reverse_grouper")
class NaiveReverseGrouper(Grouper):
    def __init__(self, batch_meta_export_path=None, *args, **kwargs):
Import
from data_juicer.ops.grouper.naive_reverse_grouper import NaiveReverseGrouper
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| batch_meta_export_path | str | No | Path to export batch metadata as JSON lines. Default: None (batch meta is dropped) |
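What a JSON lines export with on-demand directory creation might look like is sketched below. `export_batch_meta` is a hypothetical helper written for illustration, not the operator's actual code, and the metadata records are made-up placeholders.

```python
import json
import os
import tempfile
from typing import Any, Iterable


def export_batch_meta(batch_metas: Iterable[Any], export_path: str) -> None:
    """Hypothetical helper: write one JSON object per line to export_path."""
    parent = os.path.dirname(export_path)
    if parent:
        # Create missing parent directories; do nothing if they already exist.
        os.makedirs(parent, exist_ok=True)
    with open(export_path, "w", encoding="utf-8") as f:
        for meta in batch_metas:
            f.write(json.dumps(meta, ensure_ascii=False) + "\n")


# Write into a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "output", "batch_meta.jsonl")
    export_batch_meta([{"num_samples": 2}, {"num_samples": 1}], path)
    with open(path, encoding="utf-8") as f:
        exported_lines = f.read().splitlines()
```

Each line of the resulting file is an independent JSON document, so the export can be read back one record at a time.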
Outputs
| Name | Type | Description |
|---|---|---|
| samples | list of dict | Individual samples split from batched input, with batch_meta separated |
Usage Examples
process:
  - naive_reverse_grouper:
      batch_meta_export_path: "./output/batch_meta.jsonl"