

Implementation:Datajuicer Data juicer NaiveGrouper

From Leeroopedia
Knowledge Sources
Domains Data_Processing, Grouping
Last Updated 2026-02-14 16:00 GMT

Overview

Concrete tool, provided by Data-Juicer, for grouping all samples into a single batch.

Description

NaiveGrouper extends Grouper and is the simplest grouper implementation: it combines all samples in a dataset into a single batched sample. It uses convert_list_dict_to_dict_list to transform the entire dataset (a list of dictionaries) into a single dictionary of lists, where each key maps to the list of that key's values across all samples. The result is returned wrapped in a single-element list. If the dataset is empty, it is returned unchanged. The operator can be used on its own, and KeyValueGrouper also uses it internally as a building block.
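The grouping behavior described above can be sketched in plain Python. This is a minimal, non-authoritative sketch: it works on a plain list of dicts rather than a Data-Juicer `Dataset`, and the names `list_dict_to_dict_list` and `naive_group` are hypothetical stand-ins for `convert_list_dict_to_dict_list` and the grouper's processing step. It assumes every sample has the same keys as the first one.

```python
from typing import Any, Dict, List


def list_dict_to_dict_list(samples: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Hypothetical stand-in for convert_list_dict_to_dict_list.

    Assumes every sample has the same keys as the first sample.
    """
    keys = list(samples[0].keys())
    # Each key maps to the list of that key's values, in sample order.
    return {key: [sample[key] for sample in samples] for key in keys}


def naive_group(dataset: List[Dict[str, Any]]):
    """Hypothetical sketch of NaiveGrouper's grouping behavior."""
    if len(dataset) == 0:
        # An empty dataset is returned unchanged.
        return dataset
    # Wrap the single batched sample in a one-element list.
    return [list_dict_to_dict_list(dataset)]


samples = [
    {"text": "hello", "lang": "en"},
    {"text": "bonjour", "lang": "fr"},
    {"text": "hola", "lang": "es"},
]
print(naive_group(samples))
# [{'text': ['hello', 'bonjour', 'hola'], 'lang': ['en', 'fr', 'es']}]
```

The dict-of-lists layout is what lets a later operator see every value of a field at once instead of one sample at a time.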

Usage

Use it when all samples must be processed together as one group, for example for global aggregation operations, or as a simple batching step before applying aggregators.
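As an illustration of a global aggregation step, the sketch below groups all samples into one batch and then computes dataset-wide statistics over a batched field. The functions `group_all` and `aggregate_stats` and the `score` field are hypothetical; in a Data-Juicer pipeline the aggregation would be done by an aggregator operator placed after the grouper, and this sketch does not call any Data-Juicer API.

```python
from statistics import mean
from typing import Any, Dict, List


def group_all(dataset: List[Dict[str, Any]]) -> List[Dict[str, List[Any]]]:
    """Hypothetical sketch: combine all samples into one dict of lists."""
    if not dataset:
        return dataset
    keys = list(dataset[0].keys())
    return [{key: [sample[key] for sample in dataset] for key in keys}]


def aggregate_stats(batched: Dict[str, List[Any]]) -> Dict[str, Any]:
    """Hypothetical aggregator: statistics computed across the whole batch."""
    scores = batched["score"]
    return {
        "num_samples": len(scores),
        "mean_score": mean(scores),
        "max_score": max(scores),
    }


dataset = [
    {"text": "a", "score": 2},
    {"text": "b", "score": 8},
    {"text": "c", "score": 5},
]
(batched,) = group_all(dataset)  # unpacking enforces exactly one batched sample
print(aggregate_stats(batched))
```

Because the grouper emits a single record, any statistic computed on it covers the entire dataset rather than a per-sample view.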

Code Reference

Source Location

Signature

@OPERATORS.register_module("naive_grouper")
class NaiveGrouper(Grouper):
    def __init__(self, *args, **kwargs):
        ...

Import

from data_juicer.ops.grouper.naive_grouper import NaiveGrouper

I/O Contract

Inputs

Name | Type | Required | Description
dataset | Dataset | Yes | The dataset whose samples are grouped into a single batch

Outputs

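Name | Type | Description
batched_samples | list of dict | Single-element list holding one batched sample: a dict that maps each field name to the list of that field's values across all dataset records (an empty dataset is returned unchanged)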
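Because the batched sample keeps every field's values in their original order, the single output record can be expanded back into the original rows. The sketch below checks that output contract using a hypothetical `batch` helper (mirroring the grouper's output) and a hypothetical inverse `unbatch`; neither is part of the documented Data-Juicer API, and both assume all samples share the same keys.

```python
from typing import Any, Dict, List


def batch(dataset: List[Dict[str, Any]]) -> List[Dict[str, List[Any]]]:
    """Hypothetical sketch of the grouper's output: one dict of lists."""
    if not dataset:
        return dataset
    keys = list(dataset[0].keys())
    return [{key: [row[key] for row in dataset] for key in keys}]


def unbatch(batched: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Hypothetical inverse: turn a dict of lists back into a list of dicts."""
    keys = list(batched.keys())
    num_rows = len(batched[keys[0]]) if keys else 0
    return [{key: batched[key][i] for key in keys} for i in range(num_rows)]


rows = [{"id": 1, "text": "x"}, {"id": 2, "text": "y"}]
result = batch(rows)

# Output contract: exactly one batched sample, one list entry per input row.
assert len(result) == 1
assert all(len(values) == len(rows) for values in result[0].values())
# No records are lost: expanding the batch recovers the original rows.
assert unbatch(result[0]) == rows
```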

Usage Examples

process:
  - naive_grouper:
