Implementation:Cohere ai Cohere python Dataset Model
| Knowledge Sources | |
|---|---|
| Domains | SDK, Datasets, Fine-Tuning |
| Last Updated | 2026-02-15 14:00 GMT |
Overview
Dataset is a Pydantic model representing a dataset entity in the Cohere platform, used for embeddings, classification fine-tuning, chat fine-tuning, and batch operations.
Description
The Dataset model captures metadata and validation state for a dataset registered in the Cohere platform. It includes:
- id -- The unique identifier for the dataset.
- name -- The human-readable name of the dataset.
- created_at and updated_at -- Timestamps tracking the dataset lifecycle.
- dataset_type -- The type of dataset, specified as a DatasetType. Accepted values include: "embed-input", "embed-result", "cluster-result", "cluster-outliers", "reranker-finetune-input", "single-label-classification-finetune-input", "chat-finetune-input", "multi-label-classification-finetune-input", "batch-chat-input", "batch-openai-chat-input", "batch-embed-v2-input", "batch-chat-v2-input".
- validation_status -- The current validation state of the dataset (DatasetValidationStatus).
- validation_error -- An optional string containing errors found during validation.
- validation_warnings -- An optional list of warnings found during validation.
- schema_ -- The Avro schema of the dataset (aliased from schema).
- required_fields -- An optional list of required field names.
- preserve_fields -- An optional list of fields to preserve.
- dataset_parts -- An optional list of DatasetPart objects representing the underlying files that make up the dataset.
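The schema alias is easiest to see with a small stand-in. The sketch below does not use the SDK: `DatasetSketch` and `from_payload` are hypothetical names, the dataclass mirrors only a subset of the fields listed above, and the payload values (including the status string) are illustrative placeholders. It shows a JSON-style key named schema being stored on an attribute named schema_.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class DatasetSketch:
    """Hypothetical stand-in for a subset of the Dataset fields (not the SDK class)."""

    id: str
    name: str
    dataset_type: str
    validation_status: str
    schema_: Optional[str] = None
    required_fields: Optional[List[str]] = None


def from_payload(payload: Dict[str, Any]) -> DatasetSketch:
    """Build a DatasetSketch from a JSON-style dict, renaming `schema` to `schema_`."""
    data = dict(payload)  # copy so the caller's dict is not mutated
    if "schema" in data:
        data["schema_"] = data.pop("schema")
    return DatasetSketch(**data)


# Illustrative payload; every value here is a placeholder.
example = from_payload(
    {
        "id": "ds-123",
        "name": "demo",
        "dataset_type": "embed-input",
        "validation_status": "validated",
        "schema": '{"type": "record"}',
    }
)
print(example.schema_)
```

In the real model the renaming is handled by the field alias declared in the signature, so no manual step like `from_payload` should be needed.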
Usage
Use Dataset when working with the Cohere datasets API to upload, list, inspect, or manage datasets. Datasets are used as inputs for embedding jobs, fine-tuning runs, batch inference, and clustering operations.
Code Reference
Source Location
- Repository: Cohere Python SDK
- File: src/cohere/types/dataset.py
Signature
class Dataset(UncheckedBaseModel):
    id: str = pydantic.Field()
    name: str = pydantic.Field()
    created_at: dt.datetime = pydantic.Field()
    updated_at: dt.datetime = pydantic.Field()
    dataset_type: DatasetType
    validation_status: DatasetValidationStatus
    validation_error: typing.Optional[str] = pydantic.Field(default=None)
    schema_: typing_extensions.Annotated[
        typing.Optional[str],
        FieldMetadata(alias="schema"),
        pydantic.Field(alias="schema", description="the avro schema of the dataset"),
    ] = None
    required_fields: typing.Optional[typing.List[str]] = None
    preserve_fields: typing.Optional[typing.List[str]] = None
    dataset_parts: typing.Optional[typing.List[DatasetPart]] = pydantic.Field(default=None)
    validation_warnings: typing.Optional[typing.List[str]] = pydantic.Field(default=None)
Import
from cohere.types import Dataset
I/O Contract
Fields
| Field | Type | Required | Description |
|---|---|---|---|
| id | str | Yes | The unique dataset ID. |
| name | str | Yes | The name of the dataset. |
| created_at | datetime | Yes | The creation date of the dataset. |
| updated_at | datetime | Yes | The last update date of the dataset. |
| dataset_type | DatasetType | Yes | The type of dataset (e.g., "embed-input", "chat-finetune-input"). |
| validation_status | DatasetValidationStatus | Yes | The current validation status of the dataset. |
| validation_error | Optional[str] | No | Errors found during validation. |
| schema_ | Optional[str] | No | The Avro schema of the dataset (JSON alias: schema). |
| required_fields | Optional[List[str]] | No | List of required field names in the dataset. |
| preserve_fields | Optional[List[str]] | No | List of fields to preserve in the dataset. |
| dataset_parts | Optional[List[DatasetPart]] | No | The underlying files that make up the dataset. |
| validation_warnings | Optional[List[str]] | No | Warnings found during validation. |
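Because several of these fields are optional, code that reads them has to handle None. The following is a minimal sketch of one way to collect validation messages defensively; `validation_messages` is a hypothetical helper, and the `SimpleNamespace` stub stands in for a real Dataset so the snippet runs without the SDK.

```python
from types import SimpleNamespace
from typing import Any, List


def validation_messages(dataset: Any) -> List[str]:
    """Collect the validation error and warnings from a dataset-like object.

    Both fields are optional on the model, so None is treated as "nothing to report".
    """
    messages: List[str] = []
    error = getattr(dataset, "validation_error", None)
    if error:
        messages.append(f"error: {error}")
    for warning in getattr(dataset, "validation_warnings", None) or []:
        messages.append(f"warning: {warning}")
    return messages


# Stub object with illustrative values; a real Dataset exposes the same attribute names.
stub = SimpleNamespace(validation_error=None, validation_warnings=["missing header"])
print(validation_messages(stub))
```

Any object with the same attribute names works, so the helper can be passed a Dataset returned by the SDK unchanged.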
DatasetType Values
| Value | Description |
|---|---|
| "embed-input" | Input data for embedding jobs. |
| "embed-result" | Result data from embedding jobs. |
| "cluster-result" | Result data from clustering operations. |
| "cluster-outliers" | Outlier data from clustering operations. |
| "reranker-finetune-input" | Input data for reranker fine-tuning. |
| "single-label-classification-finetune-input" | Input data for single-label classification fine-tuning. |
| "chat-finetune-input" | Input data for chat model fine-tuning. |
| "multi-label-classification-finetune-input" | Input data for multi-label classification fine-tuning. |
| "batch-chat-input" | Input data for batch chat inference. |
| "batch-openai-chat-input" | Input data for batch OpenAI-compatible chat inference. |
| "batch-embed-v2-input" | Input data for batch V2 embedding inference. |
| "batch-chat-v2-input" | Input data for batch V2 chat inference. |
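The values in this table fall into a few broad purposes, which can be handy when filtering a dataset listing. The grouping below is an illustrative convention rather than part of the SDK: `dataset_purpose` and the two constant sets are hypothetical names, and only the string values themselves come from the table.

```python
FINETUNE_INPUT_TYPES = frozenset(
    {
        "reranker-finetune-input",
        "single-label-classification-finetune-input",
        "chat-finetune-input",
        "multi-label-classification-finetune-input",
    }
)

BATCH_INPUT_TYPES = frozenset(
    {
        "batch-chat-input",
        "batch-openai-chat-input",
        "batch-embed-v2-input",
        "batch-chat-v2-input",
    }
)


def dataset_purpose(dataset_type: str) -> str:
    """Map a DatasetType string to a coarse purpose label (hypothetical grouping)."""
    if dataset_type in FINETUNE_INPUT_TYPES:
        return "fine-tuning"
    if dataset_type in BATCH_INPUT_TYPES:
        return "batch"
    if dataset_type.startswith("embed-"):
        return "embedding"
    if dataset_type.startswith("cluster-"):
        return "clustering"
    return "unknown"


print(dataset_purpose("chat-finetune-input"))
```

Returning "unknown" for unrecognized strings keeps the helper usable if the platform adds new dataset types later.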
Usage Examples
import cohere
from cohere.types import Dataset

# Assumes the API key is available to the client,
# for example through the CO_API_KEY environment variable.
client = cohere.Client()

# List all datasets (the list may be absent, so fall back to an empty one)
response = client.datasets.list()
for dataset in response.datasets or []:
    print(f"ID: {dataset.id}")
    print(f"Name: {dataset.name}")
    print(f"Type: {dataset.dataset_type}")
    print(f"Status: {dataset.validation_status}")
    if dataset.validation_error:
        print(f"Error: {dataset.validation_error}")
    if dataset.validation_warnings:
        for warning in dataset.validation_warnings:
            print(f"Warning: {warning}")

# Get a specific dataset; get() returns a response wrapper
# whose dataset attribute holds the Dataset model
dataset: Dataset = client.datasets.get(id="my-dataset-id").dataset
print(f"Dataset: {dataset.name}")
print(f"Created: {dataset.created_at}")
print(f"Updated: {dataset.updated_at}")
print(f"Schema: {dataset.schema_}")
if dataset.dataset_parts:
    for part in dataset.dataset_parts:
        print(f"Part: {part}")