

Implementation:Cohere ai Cohere python Dataset Model

From Leeroopedia
Revision as of 12:17, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Cohere_ai_Cohere_python_Dataset_Model.md)
Knowledge Sources
Domains: SDK, Datasets, Fine-Tuning
Last Updated: 2026-02-15 14:00 GMT

Overview

Dataset is a Pydantic model representing a dataset entity in the Cohere platform. Datasets serve as inputs to embedding jobs, fine-tuning (classification, chat, and reranker), and batch inference, and they also hold the results of embedding and clustering operations.

Description

The Dataset model captures metadata and validation state for a dataset registered in the Cohere platform. It includes:

  • id -- The unique identifier for the dataset.
  • name -- The human-readable name of the dataset.
  • created_at and updated_at -- Timestamps tracking the dataset lifecycle.
  • dataset_type -- The type of dataset, specified as a DatasetType. Accepted values include: "embed-input", "embed-result", "cluster-result", "cluster-outliers", "reranker-finetune-input", "single-label-classification-finetune-input", "chat-finetune-input", "multi-label-classification-finetune-input", "batch-chat-input", "batch-openai-chat-input", "batch-embed-v2-input", "batch-chat-v2-input".
  • validation_status -- The current validation state of the dataset (DatasetValidationStatus).
  • validation_error -- An optional string containing errors found during validation.
  • validation_warnings -- An optional list of warnings found during validation.
  • schema_ -- The Avro schema of the dataset, as an optional string. The Python attribute is named schema_ and is aliased to the schema key in the API payload.
  • required_fields -- An optional list of required field names.
  • preserve_fields -- An optional list of fields to preserve.
  • dataset_parts -- An optional list of DatasetPart objects representing the underlying files that make up the dataset.
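
The validation fields listed above can be combined into a short report. The following is a minimal sketch, not part of the SDK: validation_report is a hypothetical helper, it relies only on the attribute names from the list above, and the stand-in object and its values (including the "failed" status string) are assumptions made for illustration.

```python
from types import SimpleNamespace
from typing import Any, List


def validation_report(dataset: Any) -> List[str]:
    """Collect human-readable validation messages from a Dataset-like object."""
    lines = [f"{dataset.name} ({dataset.id}): status={dataset.validation_status}"]
    # validation_error is an optional string; skip it when None or empty.
    if dataset.validation_error:
        lines.append(f"error: {dataset.validation_error}")
    # validation_warnings is an optional list; treat None as "no warnings".
    for warning in dataset.validation_warnings or []:
        lines.append(f"warning: {warning}")
    return lines


# Stand-in object with the same attribute names as Dataset (values are made up).
example = SimpleNamespace(
    id="ds-123",
    name="demo",
    validation_status="failed",
    validation_error="missing column 'text'",
    validation_warnings=["2 empty rows skipped"],
)
print("\n".join(validation_report(example)))
```

Because the helper only reads attributes, it should accept a real Dataset instance as well as the stand-in used here.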

Usage

Use Dataset when working with the Cohere datasets API to upload, list, inspect, or manage datasets. Datasets are used as inputs for embedding jobs, fine-tuning runs, batch inference, and clustering operations.

Code Reference

Source Location

Signature

class Dataset(UncheckedBaseModel):
    id: str = pydantic.Field()
    name: str = pydantic.Field()
    created_at: dt.datetime = pydantic.Field()
    updated_at: dt.datetime = pydantic.Field()
    dataset_type: DatasetType
    validation_status: DatasetValidationStatus
    validation_error: typing.Optional[str] = pydantic.Field(default=None)
    schema_: typing_extensions.Annotated[
        typing.Optional[str],
        FieldMetadata(alias="schema"),
        pydantic.Field(alias="schema", description="the avro schema of the dataset"),
    ] = None
    required_fields: typing.Optional[typing.List[str]] = None
    preserve_fields: typing.Optional[typing.List[str]] = None
    dataset_parts: typing.Optional[typing.List[DatasetPart]] = pydantic.Field(default=None)
    validation_warnings: typing.Optional[typing.List[str]] = pydantic.Field(default=None)

Import

from cohere.types import Dataset

I/O Contract

Fields

Field | Type | Required | Description
id | str | Yes | The unique dataset ID.
name | str | Yes | The name of the dataset.
created_at | datetime | Yes | The creation date of the dataset.
updated_at | datetime | Yes | The last update date of the dataset.
dataset_type | DatasetType | Yes | The type of dataset (e.g., "embed-input", "chat-finetune-input").
validation_status | DatasetValidationStatus | Yes | The current validation status of the dataset.
validation_error | Optional[str] | No | Errors found during validation.
schema_ | Optional[str] | No | The Avro schema of the dataset (JSON alias: schema).
required_fields | Optional[List[str]] | No | List of required field names in the dataset.
preserve_fields | Optional[List[str]] | No | List of fields to preserve in the dataset.
dataset_parts | Optional[List[DatasetPart]] | No | The underlying files that make up the dataset.
validation_warnings | Optional[List[str]] | No | Warnings found during validation.
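
Since schema_ carries the Avro schema as a string, the column names of a dataset can be read by parsing that string. The sketch below assumes the string is a JSON-encoded Avro record schema (Avro schemas are written in JSON); avro_field_names is a hypothetical helper and the sample schema is invented for illustration.

```python
import json
from typing import List, Optional


def avro_field_names(schema: Optional[str]) -> List[str]:
    """Return the top-level field names of an Avro record schema string."""
    if not schema:
        return []
    parsed = json.loads(schema)
    # A record schema is a JSON object with a "fields" array; primitive or
    # union schemas (a JSON string or array) have no named fields.
    if not isinstance(parsed, dict):
        return []
    return [field["name"] for field in parsed.get("fields", [])]


# Hypothetical schema string, shaped like an Avro record definition.
sample_schema = json.dumps(
    {
        "type": "record",
        "name": "example",
        "fields": [
            {"name": "text", "type": "string"},
            {"name": "label", "type": "string"},
        ],
    }
)
print(avro_field_names(sample_schema))
```

With a real Dataset, the call would be avro_field_names(dataset.schema_), using the Python attribute name rather than the JSON key schema.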

DatasetType Values

Value | Description
"embed-input" | Input data for embedding jobs.
"embed-result" | Result data from embedding jobs.
"cluster-result" | Result data from clustering operations.
"cluster-outliers" | Outlier data from clustering operations.
"reranker-finetune-input" | Input data for reranker fine-tuning.
"single-label-classification-finetune-input" | Input data for single-label classification fine-tuning.
"chat-finetune-input" | Input data for chat model fine-tuning.
"multi-label-classification-finetune-input" | Input data for multi-label classification fine-tuning.
"batch-chat-input" | Input data for batch chat inference.
"batch-openai-chat-input" | Input data for batch OpenAI-compatible chat inference.
"batch-embed-v2-input" | Input data for batch V2 embedding inference.
"batch-chat-v2-input" | Input data for batch V2 chat inference.

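The values above follow a naming pattern that separates inputs from outputs and fine-tuning from batch work. As a rough sketch, they can be grouped by prefix and suffix; group_dataset_types and its group labels are hypothetical and not part of the SDK, and the grouping rule is inferred from the value names only.

```python
from typing import Dict, List

# The twelve DatasetType values listed in the table above.
DATASET_TYPES = [
    "embed-input",
    "embed-result",
    "cluster-result",
    "cluster-outliers",
    "reranker-finetune-input",
    "single-label-classification-finetune-input",
    "chat-finetune-input",
    "multi-label-classification-finetune-input",
    "batch-chat-input",
    "batch-openai-chat-input",
    "batch-embed-v2-input",
    "batch-chat-v2-input",
]


def group_dataset_types(types: List[str]) -> Dict[str, List[str]]:
    """Group dataset type strings by naming pattern (an inferred convention)."""
    groups: Dict[str, List[str]] = {
        "finetune-input": [],
        "batch-input": [],
        "other-input": [],
        "output": [],
    }
    for name in types:
        if name.endswith("-finetune-input"):
            groups["finetune-input"].append(name)
        elif name.startswith("batch-"):
            groups["batch-input"].append(name)
        elif name.endswith("-input"):
            groups["other-input"].append(name)
        else:
            # Results and outliers produced by embedding or clustering jobs.
            groups["output"].append(name)
    return groups


for label, members in group_dataset_types(DATASET_TYPES).items():
    print(f"{label}: {len(members)}")
```

A grouping like this can help filter a dataset listing down to, say, fine-tuning inputs only.
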
Usage Examples

import cohere
from cohere.types import Dataset

# Create a client; replace the placeholder with a real API key.
client = cohere.Client(api_key="YOUR_API_KEY")

# List all datasets (the datasets attribute of the response may be None)
response = client.datasets.list()
for dataset in response.datasets or []:
    print(f"ID: {dataset.id}")
    print(f"Name: {dataset.name}")
    print(f"Type: {dataset.dataset_type}")
    print(f"Status: {dataset.validation_status}")
    if dataset.validation_error:
        print(f"Error: {dataset.validation_error}")
    if dataset.validation_warnings:
        for warning in dataset.validation_warnings:
            print(f"Warning: {warning}")

# Get a specific dataset; the Dataset is on the response's dataset attribute
dataset: Dataset = client.datasets.get(id="my-dataset-id").dataset
print(f"Dataset: {dataset.name}")
print(f"Created: {dataset.created_at}")
print(f"Updated: {dataset.updated_at}")
print(f"Schema: {dataset.schema_}")
if dataset.dataset_parts:
    for part in dataset.dataset_parts:
        print(f"Part: {part}")
