
Workflow:Cohere AI Cohere Python Chat Completion

From Leeroopedia
Knowledge Sources
Domains LLMs, Text_Generation, API_Client
Last Updated 2026-02-15 14:00 GMT

Overview

End-to-end process for sending chat messages to Cohere language models and receiving non-streaming text responses using the Python SDK.

Description

This workflow covers the standard procedure for generating text completions through Cohere's Chat API (V2). It consists of installing and configuring the SDK client with authentication credentials, constructing a properly formatted message sequence (system, user, and assistant roles), sending the request to a specified model, and processing the structured response, including text content, citations, and usage metadata.

Usage

Execute this workflow when you need to send a single prompt or multi-turn conversation to a Cohere model and receive a complete response in one request. This is appropriate for use cases where you do not need incremental token delivery (streaming) and prefer to wait for the full response before processing.

Execution Steps

Step 1: Install SDK and Configure Authentication

Install the Cohere Python package from PyPI and configure API authentication. The client reads the API key from the constructor parameter or from the CO_API_KEY environment variable. Optionally, set a custom base URL via CO_API_URL for private deployments.

Key considerations:

  • The CO_API_KEY environment variable is the recommended approach to avoid hardcoding secrets
  • COHERE_API_KEY is also accepted as a fallback environment variable
  • The client supports both string API keys and callable factories for dynamic key rotation

Step 2: Initialize the Client

Instantiate the ClientV2 class, which provides access to both V1 and V2 API endpoints. The constructor creates the underlying HTTP client (httpx-based), sets up the client wrapper with authentication headers, and configures retry logic with exponential backoff.

Key considerations:

  • ClientV2 combines V1 and V2 API methods via multiple inheritance
  • A _CombinedRawClient proxy resolves attribute collisions between the two API versions
  • The client supports context manager usage for proper resource cleanup
  • Custom httpx clients can be injected for advanced HTTP configuration
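A minimal initialization sketch, assuming `pip install cohere` has been run. The construction is guarded on the presence of `CO_API_KEY` so the snippet has no side effects in an unconfigured environment; the `timeout` value is an illustrative choice, not a required parameter.

```python
import os

client_ready = False
# Only construct a client when a key is configured (assumes `pip install cohere`).
if os.environ.get("CO_API_KEY"):
    import cohere

    # ClientV2 exposes both V1 and V2 endpoints; the key is read from
    # CO_API_KEY when not passed explicitly.
    co = cohere.ClientV2(timeout=30)
    client_ready = True
```

Per the considerations above, the client can also be used as a context manager (`with cohere.ClientV2() as co: ...`) so the underlying httpx resources are released on exit.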

Step 3: Construct the Message Sequence

Build the message list following the V2 chat message schema. Each message has a role (user, assistant, system, or tool) and content. The system message sets the model's behavior and personality, while user and assistant messages form the conversation history.

Key considerations:

  • Messages follow the ChatMessageV2 union type with role-based discrimination
  • System messages define the preamble and overall model behavior
  • Multi-turn conversations include alternating user and assistant messages
  • Content can be a plain string or a list of structured content items (text, image)
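The message schema above can be expressed with plain dictionaries, which the SDK accepts alongside typed `ChatMessageV2` objects. The conversation text here is invented for illustration.

```python
# A multi-turn sequence: system message first, then alternating user/assistant turns.
messages = [
    {"role": "system", "content": "You are a concise technical assistant."},
    {"role": "user", "content": "What is retrieval-augmented generation?"},
    {"role": "assistant", "content": "RAG pairs a retriever with a generator."},
    {"role": "user", "content": "Give me a one-sentence summary."},
]

# Structured content: a list of typed items instead of a plain string.
structured_message = {
    "role": "user",
    "content": [{"type": "text", "text": "Describe this image."}],
}
```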

Step 4: Send the Chat Request

Call the chat method on the V2 client with the model name and message sequence. The request flows through the V2Client to the RawV2Client, which serializes parameters, makes the HTTP POST request, and maps the response to a typed V2ChatResponse object.

Key considerations:

  • The model parameter specifies which Cohere model to use (e.g., command-r-plus-08-2024)
  • Optional parameters include temperature, max_tokens, stop_sequences, frequency_penalty, and presence_penalty
  • The safety_mode parameter controls content filtering behavior
  • Request options allow per-call timeout and header overrides
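Putting the step together, a request sketch under the assumptions above: the model name and tuning values are examples, and the API call is guarded on `CO_API_KEY` so the snippet does not attempt a network request in an unconfigured environment.

```python
import os

# Request parameters per the step above; values are illustrative.
params = {
    "model": "command-r-plus-08-2024",
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.3,
    "max_tokens": 256,
}

if os.environ.get("CO_API_KEY"):  # only hit the API when a key is configured
    import cohere

    co = cohere.ClientV2()
    response = co.chat(**params)
    print(response.message.content[0].text)
```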

Step 5: Process the Response

Extract the generated text, citations, and metadata from the V2ChatResponse object. The response contains the assistant's message with content items, a finish reason, and usage statistics including billed units and token counts.

Key considerations:

  • The response includes a message object with role and content fields
  • Content items can be text or thinking blocks (when thinking mode is enabled)
  • The usage field provides input_tokens, output_tokens, and billed_units for cost tracking
  • The finish_reason indicates whether the response completed normally or was truncated
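Response handling can be sketched against a mock shaped like the fields described above. Note the assumption: a live `V2ChatResponse` exposes these as object attributes (`response.message.content`, `response.usage`), while the mock below uses dictionary keys purely so the sketch is self-contained.

```python
# Mock standing in for a V2ChatResponse (field names per the step above).
mock_response = {
    "message": {
        "role": "assistant",
        "content": [{"type": "text", "text": "RAG pairs retrieval with generation."}],
    },
    "finish_reason": "COMPLETE",
    "usage": {
        "tokens": {"input_tokens": 12, "output_tokens": 9},
        "billed_units": {"input_tokens": 12, "output_tokens": 9},
    },
}

# Concatenate text items, skipping non-text (e.g. thinking) content blocks.
text = "".join(
    item["text"]
    for item in mock_response["message"]["content"]
    if item["type"] == "text"
)
truncated = mock_response["finish_reason"] == "MAX_TOKENS"
total_tokens = sum(mock_response["usage"]["tokens"].values())
```

Filtering on the content item type keeps the extraction robust when thinking mode adds non-text blocks to the content list.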

Execution Diagram

GitHub URL

Workflow Repository