Workflow:Cohere ai Cohere python AWS Bedrock Deployment
| Knowledge Sources | |
|---|---|
| Domains | LLMs, Cloud_Deployment, AWS, API_Client |
| Last Updated | 2026-02-15 14:00 GMT |
Overview
End-to-end process for accessing Cohere models deployed on AWS Bedrock through the Cohere Python SDK, using AWS SigV4 authentication and Bedrock-specific request/response transformation.
Description
This workflow demonstrates how to use Cohere models hosted on AWS Bedrock through the same Python SDK used for the Cohere platform. The SDK provides specialized BedrockClient and BedrockClientV2 classes that transparently handle AWS SigV4 request signing, URL rewriting to Bedrock endpoints, and response format transformation. This enables teams to use Cohere models within their existing AWS infrastructure while maintaining the familiar SDK interface.
Usage
Use this workflow when your organization must access Cohere models through AWS Bedrock for compliance, data residency, or infrastructure consolidation reasons. It applies when model access must stay within your AWS account and API calls should route through AWS endpoints rather than directly to the Cohere platform.
Execution Steps
Step 1: Configure AWS Credentials
Set up AWS authentication credentials. The SDK accepts explicit credentials (access key, secret key, session token) or falls back to the default boto3 session, which reads from environment variables, IAM roles, or AWS configuration files.
Key considerations:
- Explicit parameters: aws_access_key, aws_secret_key, and aws_session_token (credentials), plus aws_region
- If credentials are not provided, boto3.Session() uses the default credential chain
- Ensure the IAM role/user has permissions for bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream
- The aws_region determines which Bedrock regional endpoint is used
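The credential options above can be sketched as a small helper. This is a minimal sketch, not SDK code: `bedrock_client_kwargs` and `invoke_policy` are hypothetical names, the `us-east-1` default is an arbitrary assumption, and the wildcard `Resource` is a placeholder that would normally be narrowed to specific model ARNs. The environment variable names are the standard AWS ones.

```python
import os


def bedrock_client_kwargs(env=None, default_region="us-east-1"):
    """Build keyword arguments for the Bedrock client (hypothetical helper).

    Explicit credentials are included only when both keys are present;
    otherwise only the region is returned, so the default boto3
    credential chain applies.
    """
    env = os.environ if env is None else env
    kwargs = {"aws_region": env.get("AWS_REGION", default_region)}
    access_key = env.get("AWS_ACCESS_KEY_ID")
    secret_key = env.get("AWS_SECRET_ACCESS_KEY")
    if access_key and secret_key:
        kwargs["aws_access_key"] = access_key
        kwargs["aws_secret_key"] = secret_key
        session_token = env.get("AWS_SESSION_TOKEN")
        if session_token:  # only set for temporary credentials
            kwargs["aws_session_token"] = session_token
    return kwargs


def invoke_policy(resource="*"):
    """Return an IAM policy document granting the two invoke permissions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "bedrock:InvokeModel",
                    "bedrock:InvokeModelWithResponseStream",
                ],
                # Assumption: "*" is a placeholder; restrict in practice.
                "Resource": resource,
            }
        ],
    }
```

Leaving the explicit keys out when they are not set keeps the same code usable both on a developer machine with static keys and on an instance that relies on an IAM role.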
Step 2: Initialize the Bedrock Client
Create a BedrockClientV2 instance with AWS credentials. The constructor creates a boto3 session, obtains SigV4 signing credentials, and configures httpx event hooks that intercept every outgoing request for AWS-specific transformation.
Key considerations:
- BedrockClientV2 extends AwsClientV2, which extends ClientV2 with AWS-specific httpx event hooks
- BedrockClient (V1) does not support the rerank endpoint; use BedrockClientV2 instead
- The base_url is set to a placeholder; actual Bedrock URLs are computed per-request
- SagemakerClient/SagemakerClientV2 follow the same pattern for SageMaker deployments
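Client construction might look like the following. It is a sketch under assumptions: `make_bedrock_client` is a hypothetical wrapper, and it assumes the `cohere` package (with its boto3 dependency) is installed. The keyword names are the ones listed in Step 1.

```python
def make_bedrock_client(aws_region, aws_access_key=None,
                        aws_secret_key=None, aws_session_token=None):
    """Create a BedrockClientV2 (hypothetical convenience wrapper)."""
    # Imported lazily so this module loads even where cohere is absent.
    import cohere

    explicit = {
        "aws_access_key": aws_access_key,
        "aws_secret_key": aws_secret_key,
        "aws_session_token": aws_session_token,
    }
    # Drop unset values so the default boto3 credential chain is used.
    kwargs = {name: value for name, value in explicit.items() if value is not None}
    return cohere.BedrockClientV2(aws_region=aws_region, **kwargs)
```

Calling `make_bedrock_client("us-east-1")` with no keys relies entirely on the default credential chain; the region value here is only an example.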
Step 3: Make API Calls Using Standard SDK Methods
Use the same chat(), chat_stream(), embed(), and rerank() methods as with the standard Cohere client. The httpx request event hook intercepts each outgoing call, rewrites the URL to the appropriate Bedrock endpoint, applies SigV4 signing, and transforms the request body.
Key considerations:
- The model parameter must specify a Bedrock model ARN or model ID
- The request hook extracts the endpoint type (chat, embed, rerank, generate) from the URL path
- The model field is removed from the request body and encoded into the Bedrock URL
- The stream parameter is removed from the body; streaming is determined by the endpoint URL pattern
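The request rewriting described above can be illustrated with a standalone function. This is not the SDK's implementation: `to_bedrock_request` is a hypothetical name, SigV4 signing is omitted, and the URL layout assumes the public Bedrock Runtime pattern (`/model/{modelId}/invoke` and `/model/{modelId}/invoke-with-response-stream`).

```python
from urllib.parse import quote


def to_bedrock_request(path, body, aws_region):
    """Return (endpoint, url, payload) for a Bedrock invocation (illustrative)."""
    # The endpoint type is the last path segment, e.g. "/v2/chat" -> "chat".
    endpoint = path.rstrip("/").split("/")[-1]

    payload = dict(body)  # copy so the caller's dict is left untouched
    model = payload.pop("model")  # moves from the body into the URL
    stream = bool(payload.pop("stream", False))  # expressed by the URL instead

    action = "invoke-with-response-stream" if stream else "invoke"
    url = (
        f"https://bedrock-runtime.{aws_region}.amazonaws.com"
        f"/model/{quote(model, safe='')}/{action}"
    )
    return endpoint, url, payload
```

Percent-encoding the model with `safe=''` matters mainly for ARNs, which contain `:` and `/` characters that would otherwise be read as part of the URL structure.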
Step 4: Handle Response Transformation
The httpx response event hook transforms Bedrock responses back to the standard Cohere API format. For non-streaming responses, the JSON body is parsed, token count headers are extracted, and the response is reconstructed. For streaming responses, the Amazon EventStream format is decoded into SSE-compatible events.
Key considerations:
- Non-streaming responses are mapped through response_mapping (chat, embed, generate, rerank)
- Token counts are extracted from X-Amzn-Bedrock-Input-Token-Count and X-Amzn-Bedrock-Output-Token-Count headers
- Streaming responses use the application/vnd.amazon.eventstream content type
- The stream generator decodes base64-encoded event payloads into typed event objects
- The rerank endpoint includes api_version mapping (v1=1, v2=2) for Bedrock compatibility
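Two pieces of the response handling above can be sketched in isolation. The helper names are hypothetical, and the sketch assumes each streamed event payload is a JSON object whose `bytes` field holds base64-encoded JSON; the binary EventStream framing itself is not parsed here.

```python
import base64
import json

INPUT_HEADER = "x-amzn-bedrock-input-token-count"
OUTPUT_HEADER = "x-amzn-bedrock-output-token-count"


def token_counts(headers):
    """Read token usage from Bedrock response headers (case-insensitive)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "input_tokens": int(lowered.get(INPUT_HEADER, 0)),
        "output_tokens": int(lowered.get(OUTPUT_HEADER, 0)),
    }


def decode_chunk(payload):
    """Decode one event payload of the assumed form {"bytes": "<base64 JSON>"}."""
    wrapper = json.loads(payload)
    return json.loads(base64.b64decode(wrapper["bytes"]))
```

Missing headers fall back to zero in this sketch; a stricter version could raise instead so that absent usage data is not mistaken for a zero-token call.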
Step 5: Process Results
Process the response using the same code as for standard Cohere API responses. The SDK's transparent transformation layer ensures that downstream code does not need to handle Bedrock-specific formats.
Key considerations:
- Response types (V2ChatResponse, EmbedResponse, RerankResponse) are identical to platform responses
- Token usage metadata is populated from Bedrock headers rather than the response body
- Error handling follows the same pattern (ApiError subclasses for HTTP status codes)
- The SageMaker client follows an identical pattern with different URL formatting
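Because the response types match the platform's, downstream code can stay client-agnostic. The sketch below uses a hypothetical helper, `top_documents`, which assumes only that the rerank response exposes `results` whose items carry `index` and `relevance_score`.

```python
def top_documents(rerank_response, documents, k=3):
    """Pair the k highest-scoring results with their source documents."""
    ranked = sorted(
        rerank_response.results,
        key=lambda result: result.relevance_score,
        reverse=True,
    )
    # Each result's index points back into the documents list that was sent.
    return [(documents[result.index], result.relevance_score) for result in ranked[:k]]
```

The same function would accept the result of a `rerank()` call made through either the Bedrock client or the standard client, since it never inspects anything Bedrock-specific.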