Implementation:Wandb Weave Dataset
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Evaluation |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
A concrete tool, provided by the Wandb Weave library, for creating versioned evaluation datasets.
Description
The Dataset class is a Weave Object that stores tabular evaluation data as a list of dictionaries, one dictionary per row. It can be constructed from raw lists, pandas DataFrames, HuggingFace datasets, and Weave Call objects. It supports iteration, indexing, row addition, row selection by index, and conversion back to pandas or HuggingFace formats.
Because Dataset is a registered Weave object, datasets can be published and versioned with weave.publish().
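Since rows are plain dictionaries, it can help to check their shape before constructing a Dataset. The sketch below is a hypothetical pre-flight check, not part of Weave: `check_rows` and `build_dataset` are names introduced here for illustration, and the rule that every row must share the same keys is an assumption of this example rather than a documented Weave requirement.

```python
from typing import Any


def check_rows(rows: list[dict[str, Any]]) -> list[str]:
    """Return the shared column names, or raise ValueError if rows are inconsistent.

    Hypothetical helper: the same-keys rule is this sketch's assumption,
    not a documented Weave constraint.
    """
    if not rows:
        raise ValueError("rows must contain at least one row")
    for i, row in enumerate(rows):
        if not isinstance(row, dict):
            raise ValueError(f"row {i} is not a dict")
    columns = list(rows[0].keys())
    for i, row in enumerate(rows):
        if set(row) != set(columns):
            found = sorted(map(str, row))
            raise ValueError(f"row {i} has columns {found}, expected {columns}")
    return columns


def build_dataset(name: str, rows: list[dict[str, Any]]):
    """Validate rows, then wrap them in a weave.Dataset."""
    import weave  # imported lazily so check_rows works without Weave installed

    check_rows(rows)
    return weave.Dataset(name=name, rows=rows)
```

Running the check first means a malformed row is reported with its position before any Weave object is created.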
Usage
Use this class to create evaluation datasets that will be consumed by Evaluation.evaluate(). Datasets can be created inline or loaded from external sources.
Code Reference
Source Location
- Repository: wandb/weave
- File: weave/dataset/dataset.py
- Lines: L26-239
Signature
@register_object
class Dataset(Object):
    """Dataset object with easy saving and automatic versioning."""

    rows: Table | WeaveTable

    @classmethod
    def from_pandas(cls, df: "pd.DataFrame") -> Self:
        """Construct a Dataset from a pandas DataFrame."""

    @classmethod
    def from_hf(cls, hf_dataset: Union["HFDataset", "HFDatasetDict"]) -> Self:
        """Construct a Dataset from a HuggingFace Dataset or DatasetDict."""

    @classmethod
    def from_calls(cls, calls: Iterable[Call]) -> Self:
        """Construct a Dataset from an iterable of Weave Call objects."""

    def add_rows(self, rows: Iterable[dict]) -> "Dataset":
        """Create new dataset version by appending rows."""

    def select(self, indices: Iterable[int]) -> Self:
        """Select rows by indices, returning a new Dataset."""

    def to_pandas(self) -> "pd.DataFrame":
        """Convert Dataset to pandas DataFrame."""

    def to_hf(self) -> "HFDataset":
        """Convert Dataset to HuggingFace Dataset."""
Import
import weave
# or
from weave import Dataset
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| rows | Table \| WeaveTable | Yes | Tabular data as list of dicts or Table |
| name | str \| None | No | Name for the dataset (inherited from Object) |
| description | str \| None | No | Description text |
Outputs
| Name | Type | Description |
|---|---|---|
| Dataset | Dataset | Iterable, indexable collection with properties: columns_names, num_rows |
| to_pandas() | pd.DataFrame | Pandas DataFrame representation |
| to_hf() | HFDataset | HuggingFace Dataset representation |
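One way to use `num_rows` and `select()` together is to carve a dataset into two disjoint subsets, for example a small holdout set. This is a minimal sketch: `split_indices` and `split_dataset` are hypothetical helpers written for this page, and the sketch assumes `select()` accepts any iterable of integer row positions, as its signature suggests.

```python
import random


def split_indices(
    num_rows: int, holdout_fraction: float = 0.2, seed: int = 0
) -> tuple[list[int], list[int]]:
    """Shuffle row positions and split them into (main, holdout) index lists."""
    if num_rows < 2:
        raise ValueError("need at least two rows to split")
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be strictly between 0 and 1")
    indices = list(range(num_rows))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable
    # Clamp so that both sides receive at least one row.
    n_holdout = min(max(1, round(num_rows * holdout_fraction)), num_rows - 1)
    return sorted(indices[n_holdout:]), sorted(indices[:n_holdout])


def split_dataset(dataset, holdout_fraction: float = 0.2, seed: int = 0):
    """Return (main, holdout) datasets built with Dataset.select()."""
    main_idx, holdout_idx = split_indices(dataset.num_rows, holdout_fraction, seed)
    return dataset.select(main_idx), dataset.select(holdout_idx)
```

Because the index computation is separate from the Weave calls, it can be tested on its own. Each subset returned by `select()` is itself a Dataset, so it should be publishable in the same way as the original.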
Usage Examples
From List of Dicts
import weave

dataset = weave.Dataset(
    name="grammar",
    rows=[
        {"id": "0", "sentence": "He no likes ice cream.", "correction": "He doesn't like ice cream."},
        {"id": "1", "sentence": "She goed to the store.", "correction": "She went to the store."},
    ],
)

# Publish for versioning
weave.init("my-team/my-project")
weave.publish(dataset)
From Pandas DataFrame
import pandas as pd
from weave import Dataset
df = pd.DataFrame({"input": ["hello", "world"], "expected": ["HELLO", "WORLD"]})
dataset = Dataset.from_pandas(df)
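When pandas is not available, column-oriented data like the dictionary passed to `pd.DataFrame` above can be turned by hand into the list-of-dicts form that `rows` expects. This is a minimal sketch: `columns_to_rows` is a hypothetical helper, not a Weave function. It only illustrates the row layout and makes no claim about how `from_pandas` works internally.

```python
from typing import Any


def columns_to_rows(columns: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Convert {"col": [v0, v1, ...]} into [{"col": v0}, {"col": v1}, ...]."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    names = list(columns)
    # zip(*...) walks the columns in parallel, yielding one tuple per row.
    return [dict(zip(names, values)) for values in zip(*columns.values())]


rows = columns_to_rows({"input": ["hello", "world"], "expected": ["HELLO", "WORLD"]})
```

The resulting list can be passed as `rows=` when constructing a Dataset, in the same way as the list-of-dicts example.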
From HuggingFace
from datasets import load_dataset
from weave import Dataset
hf_ds = load_dataset("glue", "mrpc", split="test")
dataset = Dataset.from_hf(hf_ds)