Implementation:Cohere ai Cohere python EmbedJob Model
| Knowledge Sources | |
|---|---|
| Domains | SDK, Embeddings, Batch Processing |
| Last Updated | 2026-02-15 14:00 GMT |
Overview
EmbedJob is a Pydantic model representing the metadata and status of a batch embedding job in the Cohere platform, used for asynchronous large-scale embedding operations.
Description
The EmbedJob class models the state of a batch embedding job that processes an entire dataset asynchronously. Instead of embedding texts one request at a time, batch embed jobs allow users to submit a dataset and receive embeddings for all entries once processing completes.
Each embed job tracks:
- job_id: The unique identifier for the job
- name: An optional human-readable name
- status: The current processing state (one of "processing", "complete", "cancelling", "cancelled", or "failed")
- created_at: The datetime when the job was created
- input_dataset_id: The ID of the dataset being embedded
- output_dataset_id: The ID of the resulting dataset containing embeddings (populated upon completion)
- model: The embedding model used (e.g., "embed-english-v3.0")
- truncate: The truncation strategy applied ("START" or "END")
- meta: Optional API metadata
The class extends UncheckedBaseModel and is auto-generated by the Fern API definition toolchain.
Usage
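Use EmbedJob when working with the Cohere batch embed jobs API to create, monitor, and retrieve results from large-scale embedding operations. The status-checking endpoint (embed_jobs.get) returns this model directly, and the listing endpoint returns it as the items of the embed_jobs field in its response. Job creation returns a lighter response carrying the job_id, which can then be passed to embed_jobs.get to obtain the full EmbedJob.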
Code Reference
Source Location
- Repository: Cohere Python SDK
- File: src/cohere/types/embed_job.py
Signature
class EmbedJob(UncheckedBaseModel):
    job_id: str
    name: typing.Optional[str] = None
    status: EmbedJobStatus  # "processing" | "complete" | "cancelling" | "cancelled" | "failed"
    created_at: dt.datetime
    input_dataset_id: str
    output_dataset_id: typing.Optional[str] = None
    model: str
    truncate: EmbedJobTruncate  # "START" | "END"
    meta: typing.Optional[ApiMeta] = None
Import
from cohere.types import EmbedJob
I/O Contract
Fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| job_id | str | Yes | -- | ID of the embed job |
| name | Optional[str] | No | None | The name of the embed job |
| status | EmbedJobStatus | Yes | -- | The status of the embed job: "processing", "complete", "cancelling", "cancelled", or "failed" |
| created_at | datetime | Yes | -- | The creation date of the embed job |
| input_dataset_id | str | Yes | -- | ID of the input dataset |
| output_dataset_id | Optional[str] | No | None | ID of the resulting output dataset (available when job is complete) |
| model | str | Yes | -- | ID of the model used to embed |
| truncate | EmbedJobTruncate | Yes | -- | The truncation option used: "START" or "END" |
| meta | Optional[ApiMeta] | No | None | API metadata including token counts and warnings |
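The required/optional split in the table can be illustrated with a small stand-in. The sketch below is not the SDK class: EmbedJobRecord, REQUIRED_FIELDS, and record_from_payload are hypothetical names, the stand-in uses a standard-library dataclass rather than Pydantic, and it assumes created_at arrives as an ISO 8601 string.

```python
import datetime as dt
from dataclasses import dataclass
from typing import Any, Mapping, Optional

# Fields marked "Yes" in the Required column of the table above.
REQUIRED_FIELDS = (
    "job_id", "status", "created_at", "input_dataset_id", "model", "truncate",
)


@dataclass(frozen=True)
class EmbedJobRecord:
    """Hypothetical stand-in mirroring the EmbedJob fields (not the SDK class)."""

    job_id: str
    status: str
    created_at: dt.datetime
    input_dataset_id: str
    model: str
    truncate: str
    # Optional fields default to None, as in the table.
    name: Optional[str] = None
    output_dataset_id: Optional[str] = None
    meta: Optional[Mapping[str, Any]] = None


def record_from_payload(payload: Mapping[str, Any]) -> EmbedJobRecord:
    """Build a record from a dict, rejecting payloads that lack required fields."""
    missing = [field for field in REQUIRED_FIELDS if field not in payload]
    if missing:
        raise ValueError(f"missing required fields: {', '.join(missing)}")
    return EmbedJobRecord(
        job_id=payload["job_id"],
        status=payload["status"],
        # Assumption: the timestamp is an ISO 8601 string with an explicit offset.
        created_at=dt.datetime.fromisoformat(payload["created_at"]),
        input_dataset_id=payload["input_dataset_id"],
        model=payload["model"],
        truncate=payload["truncate"],
        name=payload.get("name"),
        output_dataset_id=payload.get("output_dataset_id"),
        meta=payload.get("meta"),
    )
```

A payload for a job that is still processing would typically omit output_dataset_id, so the stand-in leaves it as None until a completed job supplies it.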
Usage Examples
Creating and Monitoring an Embed Job
import time

import cohere

# With no arguments, the client reads the API key from the CO_API_KEY environment variable
co = cohere.Client()

# Create an embed job
created = co.embed_jobs.create(
    model="embed-english-v3.0",
    dataset_id="my-dataset-id",
    input_type="search_document",
    truncate="END",
)
print(f"Job ID: {created.job_id}")

# Fetch the full EmbedJob; the create response carries the job ID, not the status
job = co.embed_jobs.get(id=created.job_id)
print(f"Status: {job.status}")
print(f"Created at: {job.created_at}")

# Poll for completion
while job.status == "processing":
    time.sleep(10)
    job = co.embed_jobs.get(id=job.job_id)
    print(f"Status: {job.status}")

if job.status == "complete":
    print(f"Output dataset ID: {job.output_dataset_id}")
elif job.status == "failed":
    print("Embed job failed")
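A polling loop like the one above runs forever if a job never leaves its in-progress state. The following is a minimal sketch of a bounded wait, not an SDK feature: wait_for_embed_job and its parameters are hypothetical, fetch_job stands for any callable that returns an object with a status attribute, and treating "cancelling" as an in-progress state is an assumption based on its name.

```python
import time
from typing import Any, Callable

# Assumption: these are the states in which a job may still change.
IN_PROGRESS_STATUSES = frozenset({"processing", "cancelling"})


def wait_for_embed_job(
    fetch_job: Callable[[str], Any],
    job_id: str,
    poll_interval: float = 10.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Poll fetch_job until the job reaches a terminal status or timeout elapses."""
    deadline = clock() + timeout
    job = fetch_job(job_id)
    while job.status in IN_PROGRESS_STATUSES:
        if clock() >= deadline:
            raise TimeoutError(
                f"embed job {job_id} still {job.status!r} after {timeout} seconds"
            )
        sleep(poll_interval)
        job = fetch_job(job_id)
    return job
```

With the client from the example above, the fetcher could be passed as `lambda job_id: co.embed_jobs.get(id=job_id)`. The sleep and clock parameters exist only so the loop can be exercised in tests without real delays.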
Listing Embed Jobs
import cohere

co = cohere.Client()

# List all embed jobs
jobs = co.embed_jobs.list()
for job in jobs.embed_jobs:
    print(f"Job: {job.job_id} | Model: {job.model} | Status: {job.status}")
    if job.name:
        print(f"  Name: {job.name}")
    print(f"  Input dataset: {job.input_dataset_id}")
    print(f"  Truncate: {job.truncate}")
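When the list is long, a per-status tally can be easier to read than one line per job. This is a small sketch under the assumption that each listed item exposes a status attribute, as EmbedJob does; count_by_status is a hypothetical helper and not part of the SDK.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def count_by_status(jobs: Iterable[Any]) -> Dict[str, int]:
    """Return how many jobs are in each status, keyed by the status string."""
    return dict(Counter(job.status for job in jobs))
```

With the listing example above, this could be called as `count_by_status(jobs.embed_jobs)`.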