Implementation:Cohere ai Cohere python DatasetsClient Create

From Leeroopedia
Field         Value
Type          Implementation
Source        Cohere Python SDK
Domain        Data Ingestion, Fine-tuning, Dataset Management
Last Updated  2026-02-15
Implements    Principle:Cohere_ai_Cohere_python_Dataset_Upload

Overview

Concrete method for uploading datasets to Cohere's managed storage and waiting for their validation.

Description

DatasetsClient.create() uploads a data file together with a dataset name and a type classification. Training data and optional evaluation data can be uploaded in a single call. The wait() utility then polls DatasetsClient.get() at a configurable interval until the dataset status indicates that validation has finished.
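
The polling behaviour described above can be sketched without touching the SDK. This is a minimal illustration, not the code in src/cohere/utils.py: the helper name poll_until_terminal, the status strings, and the set of terminal states are all assumptions, and a scripted status sequence stands in for repeated DatasetsClient.get() calls.

```python
import time
from typing import Callable, Iterable, Optional

# Assumed terminal states; the set the SDK actually checks may differ.
TERMINAL_STATES = frozenset({"validated", "failed", "skipped"})


def poll_until_terminal(
    get_status: Callable[[], str],
    interval: float = 10.0,
    timeout: Optional[float] = None,
    terminal_states: Iterable[str] = TERMINAL_STATES,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() repeatedly until it returns a terminal state."""
    terminal = frozenset(terminal_states)
    start = time.monotonic()
    while True:
        status = get_status()
        if status in terminal:
            return status
        # Give up once the optional deadline has passed.
        if timeout is not None and time.monotonic() - start > timeout:
            raise TimeoutError(f"status still {status!r} after {timeout} seconds")
        sleep(interval)


# Hypothetical stand-in for DatasetsClient.get(): a scripted status sequence.
_statuses = iter(["queued", "processing", "validated"])
final_status = poll_until_terminal(lambda: next(_statuses), interval=0.0)
print(final_status)
```

Passing the status lookup as a callable keeps the loop independent of any particular client, which also makes the timeout path easy to exercise in tests.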

Code Reference

  • src/cohere/datasets/client.py Lines L112-200 (create)
  • src/cohere/datasets/client.py Lines L229-258 (get)
  • src/cohere/utils.py Lines L93-116 (wait)

Signature

def create(
    self, *, name: str, type: DatasetType, data: core.File,
    keep_original_file: typing.Optional[bool] = None,
    skip_malformed_input: typing.Optional[bool] = None,
    keep_fields: typing.Optional[typing.Union[str, typing.Sequence[str]]] = None,
    optional_fields: typing.Optional[typing.Union[str, typing.Sequence[str]]] = None,
    text_separator: typing.Optional[str] = None,
    csv_delimiter: typing.Optional[str] = None,
    eval_data: typing.Optional[core.File] = None,
    request_options: typing.Optional[RequestOptions] = None,
) -> DatasetsCreateResponse:

Import

Access via client.datasets.create() and client.wait()

Inputs

Parameter             Type            Required  Description
name                  str             Yes       Name for the dataset
type                  DatasetType     Yes       Dataset type (e.g., "chat-finetune-input")
data                  File            Yes       Data file in JSONL or CSV format
eval_data             Optional[File]  No        Optional evaluation data file
skip_malformed_input  Optional[bool]  No        Whether to skip malformed input rows
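
Since malformed rows are otherwise only discovered during upload and validation, it can help to scan a JSONL file locally first. The sketch below uses a hypothetical helper, find_malformed_rows, which is not part of the Cohere SDK. It checks only that each non-blank row parses as a JSON object and does not reproduce Cohere's own validation rules. The sample rows are placeholders rather than a documented schema.

```python
import json
from typing import Iterable, List, Tuple


def find_malformed_rows(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, error) for each non-blank row that is not a JSON object."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are ignored
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "row is not a JSON object"))
    return problems


# Placeholder rows for illustration only.
sample = [
    '{"messages": [{"role": "User", "content": "Hi"}]}',
    '{"messages": [',           # truncated row
    '["not", "an", "object"]',  # valid JSON, but not an object
]
problems = find_malformed_rows(sample)
for number, error in problems:
    print(f"line {number}: {error}")
```

For a file on disk, the same helper can be called with an open text-mode file handle, since iterating a file yields one line at a time.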

Outputs

create() returns a DatasetsCreateResponse containing the dataset ID. Passing that response to wait() returns a DatasetsGetResponse that carries the validation status.

Example

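from cohere import Client

# With no api_key argument, the client looks for the key in the environment.
client = Client()

# Open both files in binary mode; the context manager closes them after the upload.
with open("training_data.jsonl", "rb") as train_file, open("eval_data.jsonl", "rb") as eval_file:
    dataset = client.datasets.create(
        name="my-finetune-data",
        type="chat-finetune-input",
        data=train_file,
        eval_data=eval_file,
    )

# Block until validation finishes; wait() returns a DatasetsGetResponse,
# whose dataset attribute holds the ID and validation status.
validated = client.wait(dataset)
print(f"Dataset {validated.dataset.id} validated: {validated.dataset.validation_status}")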
