Implementation:Wandb Weave Dataset

From Leeroopedia
Knowledge Sources
Domains Data_Engineering, Evaluation
Last Updated 2026-02-14 00:00 GMT

Overview

A concrete tool, provided by the Wandb Weave library, for creating versioned evaluation datasets.

Description

The Dataset class is a Weave Object that stores tabular evaluation data as a list of dictionaries, one dictionary per row. It can be constructed from raw lists, pandas DataFrames, HuggingFace datasets, and Weave Call objects. It supports iteration, indexing, appending rows, selecting rows by index, and conversion back to pandas or HuggingFace formats.
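
Because rows are stored as a list of dictionaries, it can help to check the row shape before constructing a Dataset. The following is a minimal sketch, not part of Weave: `check_rows` and `build_dataset` are hypothetical helpers, and the requirement that every row share the same keys is an assumption made for this example rather than a documented rule. `build_dataset` uses only the `Dataset(name=..., rows=...)` constructor shown in the usage examples on this page.

```python
from typing import Any


def check_rows(rows: list[dict[str, Any]]) -> list[str]:
    """Hypothetical helper: return the shared column names or raise ValueError."""
    if not rows:
        raise ValueError("expected at least one row")
    if not all(isinstance(row, dict) for row in rows):
        raise ValueError("every row must be a dict")
    columns = list(rows[0].keys())
    for index, row in enumerate(rows):
        # Assumption for this sketch: all rows use the same set of keys.
        if set(row.keys()) != set(columns):
            raise ValueError(
                f"row {index} has keys {sorted(row)}; expected {sorted(columns)}"
            )
    return columns


def build_dataset(name: str, rows: list[dict[str, Any]]):
    """Validate rows, then construct a weave.Dataset (requires weave installed)."""
    import weave  # imported lazily so check_rows works without weave

    check_rows(rows)
    return weave.Dataset(name=name, rows=rows)


rows = [
    {"id": "0", "sentence": "He no likes ice cream."},
    {"id": "1", "sentence": "She goed to the store."},
]
columns = check_rows(rows)
```

Validating up front keeps shape errors close to the data source instead of surfacing later during an evaluation run.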

Because Dataset is a registered Weave object, a dataset can be published and versioned with weave.publish().

Usage

Use this class to create evaluation datasets that will be consumed by Evaluation.evaluate(). Datasets can be created inline or loaded from external sources.
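
How a dataset feeds an evaluation can be sketched as follows. This is an illustrative example under stated assumptions, not the library's reference usage: `exact_match`, `uppercase_model`, `run_evaluation`, and the dataset name are hypothetical. The sketch also assumes that the scorer receives the model result in a parameter named `output`, and that scorer and model parameters are matched to dataset columns by name. Running `run_evaluation` requires weave to be installed and a W&B login.

```python
import asyncio


def exact_match(expected: str, output: str) -> dict:
    """Scorer: compare the model output with the 'expected' column."""
    return {"correct": expected == output}


def uppercase_model(input: str) -> str:
    """Toy model for this sketch: upper-cases the 'input' column."""
    # The parameter is named 'input' to match the dataset column.
    return input.upper()


def run_evaluation(project: str) -> dict:
    """Run the evaluation and return its summary (requires weave and a login)."""
    import weave  # imported lazily so the helpers above work without weave

    weave.init(project)
    dataset = weave.Dataset(
        name="uppercase-examples",  # hypothetical name
        rows=[
            {"input": "hello", "expected": "HELLO"},
            {"input": "world", "expected": "WORLD"},
        ],
    )
    evaluation = weave.Evaluation(
        dataset=dataset,
        scorers=[weave.op(exact_match)],
    )
    # Evaluation.evaluate is a coroutine, so it is driven with asyncio.run.
    return asyncio.run(evaluation.evaluate(weave.op(uppercase_model)))
```

Calling `run_evaluation("my-team/my-project")` would execute the model once per row and aggregate the scorer results.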

Code Reference

Source Location

  • Repository: wandb/weave
  • File: weave/dataset/dataset.py
  • Lines: L26-239

Signature

@register_object
class Dataset(Object):
    """Dataset object with easy saving and automatic versioning."""

    rows: Table | WeaveTable

    @classmethod
    def from_pandas(cls, df: "pd.DataFrame") -> Self:
        """Construct a Dataset from a pandas DataFrame."""

    @classmethod
    def from_hf(cls, hf_dataset: Union["HFDataset", "HFDatasetDict"]) -> Self:
        """Construct a Dataset from a HuggingFace Dataset or DatasetDict."""

    @classmethod
    def from_calls(cls, calls: Iterable[Call]) -> Self:
        """Construct a Dataset from an iterable of Weave Call objects."""

    def add_rows(self, rows: Iterable[dict]) -> "Dataset":
        """Create new dataset version by appending rows."""

    def select(self, indices: Iterable[int]) -> Self:
        """Select rows by indices, returning a new Dataset."""

    def to_pandas(self) -> "pd.DataFrame":
        """Convert Dataset to pandas DataFrame."""

    def to_hf(self) -> "HFDataset":
        """Convert Dataset to HuggingFace Dataset."""
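
One possible use of select() is carving a small smoke-test subset out of a larger dataset. The sketch below is illustrative only: `evenly_spaced_indices` and `smoke_test_subset` are hypothetical helpers, and the code relies on the `num_rows` property listed in the I/O Contract on this page.

```python
def evenly_spaced_indices(num_rows: int, count: int) -> list[int]:
    """Hypothetical helper: pick up to `count` indices spread over range(num_rows)."""
    if num_rows <= 0 or count <= 0:
        return []
    count = min(count, num_rows)
    step = num_rows / count  # step >= 1, so the truncated indices stay distinct
    return [int(i * step) for i in range(count)]


def smoke_test_subset(dataset, count: int = 10):
    """Return a smaller Dataset via Dataset.select (expects a weave Dataset)."""
    indices = evenly_spaced_indices(dataset.num_rows, count)
    return dataset.select(indices)
```

Spreading the indices across the whole range, rather than taking the first few rows, gives a subset that is less sensitive to how the source data happens to be ordered.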

Import

import weave
# or
from weave import Dataset

I/O Contract

Inputs

Name         Type                Required  Description
rows         Table | WeaveTable  Yes       Tabular data as a list of dicts or a Table
name         str | None          No        Name for the dataset (inherited from Object)
description  str | None          No        Description text

Outputs

Name         Type          Description
Dataset      Dataset       Iterable, indexable collection with properties columns_names and num_rows
to_pandas()  pd.DataFrame  Pandas DataFrame representation
to_hf()      HFDataset     HuggingFace Dataset representation

Usage Examples

From List of Dicts

import weave

dataset = weave.Dataset(
    name="grammar",
    rows=[
        {"id": "0", "sentence": "He no likes ice cream.", "correction": "He doesn't like ice cream."},
        {"id": "1", "sentence": "She goed to the store.", "correction": "She went to the store."},
    ],
)

# Publish for versioning
weave.init("my-team/my-project")
weave.publish(dataset)

From Pandas DataFrame

import pandas as pd
from weave import Dataset

df = pd.DataFrame({"input": ["hello", "world"], "expected": ["HELLO", "WORLD"]})
dataset = Dataset.from_pandas(df)

From HuggingFace

from datasets import load_dataset
from weave import Dataset

hf_ds = load_dataset("glue", "mrpc", split="test")
dataset = Dataset.from_hf(hf_ds)

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
