Implementation:Datajuicer Data juicer NaiveGrouper
| Knowledge Sources | |
|---|---|
| Domains | Data_Processing, Grouping |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Concrete Data-Juicer operator that groups all samples in a dataset into a single batch.
Description
NaiveGrouper extends Grouper and is the simplest grouper implementation: it combines all samples in a dataset into a single batched sample. It uses convert_list_dict_to_dict_list to transform the entire dataset (a list of dictionaries) into a single dictionary of lists, where each key maps to the list of that key's values across all samples. The result is returned wrapped in a single-element list. If the dataset is empty, it is returned unchanged. The operator can be used on its own and is also used internally by KeyValueGrouper.
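The transformation described above can be illustrated with a minimal standalone sketch. It does not import Data-Juicer: the names list_of_dicts_to_dict_of_lists and naive_group are hypothetical stand-ins for convert_list_dict_to_dict_list and the operator's grouping step, and the sketch assumes every sample carries the same keys. The library's actual handling of edge cases may differ.

```python
from typing import Any, Dict, List


def list_of_dicts_to_dict_of_lists(samples: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Hypothetical stand-in for convert_list_dict_to_dict_list.

    Assumes all samples share the same keys.
    """
    batched: Dict[str, List[Any]] = {}
    for sample in samples:
        for key, value in sample.items():
            # Collect each key's values in sample order.
            batched.setdefault(key, []).append(value)
    return batched


def naive_group(samples: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Sketch of the grouping step: all samples become one batched sample."""
    if len(samples) == 0:
        # An empty dataset is returned unchanged.
        return samples
    # Wrap the single batched sample in a one-element list.
    return [list_of_dicts_to_dict_of_lists(samples)]


samples = [
    {"text": "a", "id": 1},
    {"text": "b", "id": 2},
]
grouped = naive_group(samples)
print(grouped)  # [{'text': ['a', 'b'], 'id': [1, 2]}]
```

Because the output is a single dictionary of lists, a downstream aggregator sees every value for a given key at once, which is what makes global aggregation possible.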
Usage
Use when all samples need to be processed together as a single group for global aggregation operations, or as a simple batching step before applying aggregators.
Code Reference
Source Location
- Repository: Datajuicer_Data_juicer
- File: data_juicer/ops/grouper/naive_grouper.py
Signature
@OPERATORS.register_module("naive_grouper")
class NaiveGrouper(Grouper):
    def __init__(self, *args, **kwargs):
Import
from data_juicer.ops.grouper.naive_grouper import NaiveGrouper
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| dataset | Dataset | Yes | The dataset containing samples to group into a single batch |
Outputs
| Name | Type | Description |
|---|---|---|
| batched_samples | list of dict | Single-element list containing one batched sample with all dataset records combined |
Usage Examples
process:
  - naive_grouper: