Implementation:Cohere ai Cohere python Dataset Model

Knowledge Sources	Cohere Python SDK
Domains	SDK, Datasets, Fine-Tuning
Last Updated	2026-02-15 14:00 GMT

Overview

Dataset is a Pydantic model representing a dataset entity in the Cohere platform, used for embeddings, classification fine-tuning, chat fine-tuning, and batch operations.

Description

The Dataset model captures metadata and validation state for a dataset registered in the Cohere platform. It includes:

id -- The unique identifier for the dataset.
name -- The human-readable name of the dataset.
created_at and updated_at -- Timestamps tracking the dataset lifecycle.
dataset_type -- The type of dataset, specified as a DatasetType. Accepted values include: "embed-input", "embed-result", "cluster-result", "cluster-outliers", "reranker-finetune-input", "single-label-classification-finetune-input", "chat-finetune-input", "multi-label-classification-finetune-input", "batch-chat-input", "batch-openai-chat-input", "batch-embed-v2-input", "batch-chat-v2-input".
validation_status -- The current validation state of the dataset (DatasetValidationStatus).
validation_error -- An optional string containing errors found during validation.
validation_warnings -- An optional list of warnings found during validation.
schema_ -- The Avro schema of the dataset, as an optional string. On the model the attribute is schema_; in the API's JSON the field is named schema.
required_fields -- An optional list of required field names.
preserve_fields -- An optional list of fields to preserve.
dataset_parts -- An optional list of DatasetPart objects representing the underlying files that make up the dataset.
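The validation fields above are usually read together. The sketch below is a hypothetical helper (validation_report is not part of the SDK) that condenses them into one line. It only reads attributes, so it works on a Dataset or on any object with the same field names; the usage values are made up.

```python
from types import SimpleNamespace
from typing import Any


def validation_report(dataset: Any) -> str:
    """Hypothetical helper: summarize a Dataset's validation fields in one line."""
    parts = [f"{dataset.name} [{dataset.validation_status}]"]
    # validation_error is Optional[str]; only report it when present
    if dataset.validation_error:
        parts.append(f"error: {dataset.validation_error}")
    # validation_warnings is Optional[List[str]]; treat None as "no warnings"
    warnings = dataset.validation_warnings or []
    if warnings:
        parts.append(f"{len(warnings)} warning(s): " + "; ".join(warnings))
    return " | ".join(parts)


# Usage with made-up values standing in for a real Dataset
example = SimpleNamespace(
    name="support-chats",
    validation_status="failed",
    validation_error="missing field 'text'",
    validation_warnings=None,
)
print(validation_report(example))  # support-chats [failed] | error: missing field 'text'
```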

Usage

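Dataset objects are typically obtained from the Cohere datasets API rather than constructed by hand: client.datasets.list() returns them in the response's datasets list, and client.datasets.get() returns one on the response's dataset attribute. Use them to inspect a dataset's type, validation state, schema, and parts. Datasets serve as inputs for embedding jobs, fine-tuning runs, batch inference, and clustering operations.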
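As a small illustration of inspecting listed datasets before using them, the following hypothetical helper (not an SDK function) returns the IDs of datasets of one type that have finished validation. It assumes "validated" is the status string for a successfully validated dataset; check DatasetValidationStatus in your SDK version. The records in the usage are made up.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Optional


def ready_dataset_ids(
    datasets: Optional[Iterable[Any]],
    dataset_type: str,
    ready_status: str = "validated",  # assumption: status string of a usable dataset
) -> List[str]:
    """Hypothetical helper: IDs of datasets of one type that passed validation."""
    # The list response's `datasets` field is optional, so accept None as "no datasets"
    return [
        d.id
        for d in (datasets or [])
        if d.dataset_type == dataset_type and d.validation_status == ready_status
    ]


# Usage with made-up records; in practice pass client.datasets.list().datasets
records = [
    SimpleNamespace(id="ds-1", dataset_type="embed-input", validation_status="validated"),
    SimpleNamespace(id="ds-2", dataset_type="embed-input", validation_status="processing"),
    SimpleNamespace(id="ds-3", dataset_type="chat-finetune-input", validation_status="validated"),
]
print(ready_dataset_ids(records, "embed-input"))  # ['ds-1']
```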

Code Reference

Source Location

Repository: Cohere Python SDK
File: src/cohere/types/dataset.py

Signature

class Dataset(UncheckedBaseModel):
    id: str = pydantic.Field()
    name: str = pydantic.Field()
    created_at: dt.datetime = pydantic.Field()
    updated_at: dt.datetime = pydantic.Field()
    dataset_type: DatasetType
    validation_status: DatasetValidationStatus
    validation_error: typing.Optional[str] = pydantic.Field(default=None)
    schema_: typing_extensions.Annotated[
        typing.Optional[str],
        FieldMetadata(alias="schema"),
        pydantic.Field(alias="schema", description="the avro schema of the dataset"),
    ] = None
    required_fields: typing.Optional[typing.List[str]] = None
    preserve_fields: typing.Optional[typing.List[str]] = None
    dataset_parts: typing.Optional[typing.List[DatasetPart]] = pydantic.Field(default=None)
    validation_warnings: typing.Optional[typing.List[str]] = pydantic.Field(default=None)

Import

from cohere.types import Dataset

I/O Contract

Fields

Field	Type	Required	Description
`id`	`str`	Yes	The unique dataset ID.
`name`	`str`	Yes	The name of the dataset.
`created_at`	`datetime`	Yes	The creation timestamp of the dataset.
`updated_at`	`datetime`	Yes	The timestamp of the dataset's last update.
`dataset_type`	`DatasetType`	Yes	The type of dataset (e.g., `"embed-input"`, `"chat-finetune-input"`).
`validation_status`	`DatasetValidationStatus`	Yes	The current validation status of the dataset.
`validation_error`	`Optional[str]`	No	Errors found during validation.
`schema_`	`Optional[str]`	No	The Avro schema of the dataset (JSON alias: `schema`).
`required_fields`	`Optional[List[str]]`	No	List of required field names in the dataset.
`preserve_fields`	`Optional[List[str]]`	No	List of fields to preserve in the dataset.
`dataset_parts`	`Optional[List[DatasetPart]]`	No	The underlying files that make up the dataset.
`validation_warnings`	`Optional[List[str]]`	No	Warnings found during validation.
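The field types in the table matter when exporting datasets, for example to a CSV report: the timestamps are datetime objects, several list fields may be None, and the schema lives on schema_. The sketch below is a hypothetical flattening helper (to_row is not an SDK function) that handles those cases; the input values are made up.

```python
import datetime as dt
from types import SimpleNamespace
from typing import Any, Dict


def to_row(dataset: Any) -> Dict[str, str]:
    """Hypothetical helper: flatten a Dataset into string columns (e.g. for CSV)."""
    return {
        "id": dataset.id,
        "name": dataset.name,
        "dataset_type": str(dataset.dataset_type),
        "validation_status": str(dataset.validation_status),
        # created_at / updated_at are datetime objects on the model
        "created_at": dataset.created_at.isoformat(),
        "updated_at": dataset.updated_at.isoformat(),
        # Optional list fields may be None; report counts instead of contents
        "parts": str(len(dataset.dataset_parts or [])),
        "warnings": str(len(dataset.validation_warnings or [])),
        # The attribute is schema_ (the JSON field name is schema)
        "has_schema": str(dataset.schema_ is not None),
    }


row = to_row(
    SimpleNamespace(
        id="ds-1",
        name="docs",
        dataset_type="embed-input",
        validation_status="validated",
        created_at=dt.datetime(2024, 1, 2, 3, 4, 5),
        updated_at=dt.datetime(2024, 1, 3),
        dataset_parts=None,
        validation_warnings=["w"],
        schema_=None,
    )
)
print(row["created_at"], row["warnings"])  # 2024-01-02T03:04:05 1
```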

DatasetType Values

Value	Description
`"embed-input"`	Input data for embedding jobs.
`"embed-result"`	Result data from embedding jobs.
`"cluster-result"`	Result data from clustering operations.
`"cluster-outliers"`	Outlier data from clustering operations.
`"reranker-finetune-input"`	Input data for reranker fine-tuning.
`"single-label-classification-finetune-input"`	Input data for single-label classification fine-tuning.
`"chat-finetune-input"`	Input data for chat model fine-tuning.
`"multi-label-classification-finetune-input"`	Input data for multi-label classification fine-tuning.
`"batch-chat-input"`	Input data for batch chat inference.
`"batch-openai-chat-input"`	Input data for batch OpenAI-compatible chat inference.
`"batch-embed-v2-input"`	Input data for batch V2 embedding inference.
`"batch-chat-v2-input"`	Input data for batch V2 chat inference.
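The type strings above follow a naming pattern: batch inputs start with "batch-", fine-tuning inputs end in "-finetune-input", and embedding and clustering types start with "embed-" and "cluster-". A hypothetical helper that relies on that pattern to route datasets might look like this. It is an illustration only; a type added in a later SDK version may not fit the pattern and would fall through to "other".

```python
def dataset_type_category(dataset_type: str) -> str:
    """Hypothetical helper: coarse category of a DatasetType string, inferred from its name."""
    if dataset_type.startswith("batch-"):
        return "batch"  # batch-chat-input, batch-openai-chat-input, batch-*-v2-input
    if dataset_type.endswith("-finetune-input"):
        return "fine-tuning"  # reranker, classification, and chat fine-tuning inputs
    if dataset_type.startswith("embed-"):
        return "embedding"
    if dataset_type.startswith("cluster-"):
        return "clustering"
    return "other"  # values that do not match any of the patterns above


print(dataset_type_category("reranker-finetune-input"))  # fine-tuning
```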

Usage Examples

import os

import cohere
from cohere.types import Dataset

# Create a client; here the API key is read from the CO_API_KEY environment variable
client = cohere.Client(api_key=os.environ["CO_API_KEY"])

# List all datasets; the response's `datasets` field is optional, so guard against None
response = client.datasets.list()
for dataset in response.datasets or []:
    print(f"ID: {dataset.id}")
    print(f"Name: {dataset.name}")
    print(f"Type: {dataset.dataset_type}")
    print(f"Status: {dataset.validation_status}")
    if dataset.validation_error:
        print(f"Error: {dataset.validation_error}")
    if dataset.validation_warnings:
        for warning in dataset.validation_warnings:
            print(f"Warning: {warning}")

# Get a specific dataset; datasets.get returns a response that wraps the Dataset
dataset: Dataset = client.datasets.get(id="my-dataset-id").dataset
print(f"Dataset: {dataset.name}")
print(f"Created: {dataset.created_at}")
print(f"Updated: {dataset.updated_at}")
print(f"Schema: {dataset.schema_}")  # attribute is schema_, JSON field name is schema
if dataset.dataset_parts:
    for part in dataset.dataset_parts:
        print(f"Part: {part}")
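Because a dataset's validation status can change after it is uploaded, scripts often poll until validation settles before starting a job. The sketch below is a hypothetical polling helper, not an SDK function. It assumes "validated", "failed", and "skipped" are the statuses after which validation no longer changes (check DatasetValidationStatus in your SDK version), and it relies on client.datasets.get returning a response whose dataset attribute holds the Dataset.

```python
import time
from typing import Any, Collection


def wait_for_validation(
    client: Any,
    dataset_id: str,
    terminal_statuses: Collection[str] = ("validated", "failed", "skipped"),  # assumption
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
) -> Any:
    """Hypothetical helper: poll client.datasets.get until validation settles."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        # datasets.get returns a response; the Dataset is on its `dataset` attribute
        dataset = client.datasets.get(id=dataset_id).dataset
        if dataset.validation_status in terminal_statuses:
            return dataset
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"dataset {dataset_id} still {dataset.validation_status!r} "
                f"after {timeout_seconds}s"
            )
        time.sleep(poll_seconds)
```

With a real client, a call such as wait_for_validation(client, "my-dataset-id") returns the Dataset once its status is terminal; inspect validation_status and validation_error on the result before using the dataset.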

Related Pages

Environment:Cohere_ai_Cohere_python_Python_SDK_Runtime
