Implementation:Cohere ai Cohere python AwsGeneration

Knowledge Sources	Cohere Python SDK
Domains	SDK, AWS, Text Generation
Last Updated	2026-02-15 14:00 GMT

Overview

Implements text generation result structures and a streaming response handler for Cohere generation models deployed on AWS.

Description

The AwsGeneration module defines five types for handling text generation output from Cohere models running on AWS SageMaker or Amazon Bedrock: TokenLikelihood, Generation, Generations, StreamingText, and StreamingGenerations. Generations provides a from_dict factory method that deserializes a raw API response into structured objects. StreamingGenerations consumes chunked streaming responses from either SageMaker or Bedrock endpoints, reassembles partial JSON payloads, and yields StreamingText named tuples as text fragments arrive.
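
The reassembly of partial JSON payloads can be pictured with a minimal, standard-library-only sketch. It assumes the endpoint emits newline-delimited JSON objects with `index`, `text`, and `is_finished` fields. The helper name `iter_streaming_text` is hypothetical, and the `StreamingText` defined here is a local stand-in rather than the SDK type.

```python
import json
from typing import Iterable, Iterator, NamedTuple, Optional

# Local stand-in mirroring the documented StreamingText fields.
StreamingText = NamedTuple("StreamingText", [
    ("index", Optional[int]),
    ("text", str),
    ("is_finished", bool),
])


def iter_streaming_text(chunks: Iterable[bytes]) -> Iterator[StreamingText]:
    """Hypothetical helper: buffer raw bytes and yield one item per complete JSON line."""
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        # A chunk may end mid-object, so only parse data up to a newline.
        while b"\n" in buffer:
            line, _, rest = buffer.partition(b"\n")
            buffer = bytearray(rest)
            if not line.strip():
                continue
            item = json.loads(line)
            yield StreamingText(
                index=item.get("index"),
                text=item.get("text", ""),
                is_finished=item.get("is_finished", False),
            )


# Two JSON objects split across three chunks at arbitrary byte offsets.
chunks = [
    b'{"index": 0, "text": "Hel',
    b'lo", "is_finished": false}\n{"index": 0, "te',
    b'xt": " world", "is_finished": true}\n',
]
fragments = list(iter_streaming_text(chunks))
print("".join(f.text for f in fragments))
```

Buffering until a delimiter is seen keeps the parser independent of where the transport happens to split the byte stream.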

Usage

Use these classes when consuming text generation results from Cohere models deployed on AWS. Generation and Generations cover synchronous (non-streaming) invocations. StreamingGenerations applies when streaming is enabled and lets callers process generated text incrementally as it arrives from the endpoint.

Code Reference

Source Location

Repository: Cohere Python SDK
File: src/cohere/manually_maintained/cohere_aws/generation.py

Signature

class TokenLikelihood(CohereObject):
    def __init__(self, token: str, likelihood: float) -> None: ...

class Generation(CohereObject):
    def __init__(self, text: str, token_likelihoods: List[TokenLikelihood]) -> None: ...

class Generations(CohereObject):
    def __init__(self, generations: List[Generation]) -> None: ...
    @classmethod
    def from_dict(cls, response: Dict[str, Any]) -> "Generations": ...
    def __iter__(self) -> iter: ...
    def __next__(self) -> next: ...

StreamingText = NamedTuple("StreamingText", [
    ("index", Optional[int]),
    ("text", str),
    ("is_finished", bool),
])

class StreamingGenerations(CohereObject):
    def __init__(self, stream, mode: Mode) -> None: ...
    def _make_response_item(self, streaming_item) -> Optional[StreamingText]: ...
    def __iter__(self) -> Generator[StreamingText, None, None]: ...

Import

from cohere.manually_maintained.cohere_aws.generation import (
    TokenLikelihood,
    Generation,
    Generations,
    StreamingText,
    StreamingGenerations,
)

I/O Contract

TokenLikelihood

Parameter	Type	Description
`token`	`str`	The text token.
`likelihood`	`float`	The log-likelihood score for this token.

Generation

Parameter	Type	Description
`text`	`str`	The generated text content.
`token_likelihoods`	`Optional[List[TokenLikelihood]]`	Per-token likelihood scores for the generated text. May be `None` if likelihoods were not requested.

Generations

Parameter	Type	Description
`generations`	`List[Generation]`	A list of Generation objects.

Method	Return Type	Description
`from_dict(response)`	`Generations`	Class method that parses a raw API response dictionary (with a `"generations"` key) into a Generations instance.
`__iter__()`	`Iterator`	Returns an iterator over the contained Generation objects.
`__next__()`	`Generation`	Returns the next Generation from the internal iterator.
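
To make the `from_dict` contract concrete, here is a rough sketch of the parsing it implies, written against local dataclass stand-ins so that it runs without the SDK. The helper `parse_generations` is hypothetical, and its treatment of a missing `"token_likelihoods"` key (mapping it to `None`) is an assumption based on the table above, not a copy of the SDK code.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class TokenLikelihood:  # stand-in, not the SDK class
    token: str
    likelihood: Optional[float]


@dataclass
class Generation:  # stand-in, not the SDK class
    text: str
    token_likelihoods: Optional[List[TokenLikelihood]]


def parse_generations(response: Dict[str, Any]) -> List[Generation]:
    """Hypothetical helper: turn a raw response dict into Generation objects."""
    parsed = []
    for gen in response["generations"]:
        raw = gen.get("token_likelihoods")
        likelihoods = None
        if raw is not None:
            likelihoods = [
                TokenLikelihood(tl["token"], tl.get("likelihood")) for tl in raw
            ]
        parsed.append(Generation(gen["text"], likelihoods))
    return parsed


response = {
    "generations": [
        {
            "text": "Paris.",
            "token_likelihoods": [{"token": "Paris", "likelihood": -0.2}],
        },
        {"text": "Berlin."},  # likelihoods not requested
    ]
}
result = parse_generations(response)
print([g.text for g in result])
```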

StreamingText (NamedTuple)

Field	Type	Description
`index`	`Optional[int]`	The index of the generation stream this text belongs to.
`text`	`str`	The text fragment received from the stream.
`is_finished`	`bool`	Whether this is the final text chunk in the stream.
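
Because each fragment carries an `index`, a consumer can rebuild one string per generation. The sketch below assumes fragments from several generations may arrive interleaved, which the source does not confirm. `collect_by_index` is a hypothetical helper, and `StreamingText` is redefined locally so the example is self-contained.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional

# Local stand-in mirroring the documented StreamingText fields.
StreamingText = NamedTuple("StreamingText", [
    ("index", Optional[int]),
    ("text", str),
    ("is_finished", bool),
])


def collect_by_index(items: Iterable[StreamingText]) -> Dict[Optional[int], str]:
    """Hypothetical helper: concatenate text fragments per generation index."""
    texts: Dict[Optional[int], str] = defaultdict(str)
    for item in items:
        texts[item.index] += item.text
    return dict(texts)


items = [
    StreamingText(0, "Par", False),
    StreamingText(1, "Ber", False),
    StreamingText(0, "is", True),
    StreamingText(1, "lin", True),
]
combined = collect_by_index(items)
print(combined)
```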

StreamingGenerations

Parameter	Type	Description
`stream`	(iterable)	The raw streaming response from the AWS endpoint (SageMaker or Bedrock).
`mode`	`Mode`	The deployment mode, either `Mode.SAGEMAKER` or `Mode.BEDROCK`. Determines the payload and byte keys used to parse stream chunks.
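
As an illustration of how `mode` can select the payload and byte keys, the sketch below uses the event shapes that boto3 returns for the two services: SageMaker response streams wrap data as `{"PayloadPart": {"Bytes": ...}}` and Bedrock response streams as `{"chunk": {"bytes": ...}}`. The `Mode` enum here is a local stand-in with arbitrary values. `STREAM_KEYS` and `extract_bytes` are hypothetical names, and whether the SDK stores the keys this way is an assumption.

```python
from enum import Enum
from typing import Any, Dict, Optional


class Mode(Enum):  # stand-in for the SDK's Mode enum; values are arbitrary
    SAGEMAKER = 1
    BEDROCK = 2


# (payload key, bytes key) per deployment mode, following boto3 event shapes.
STREAM_KEYS = {
    Mode.SAGEMAKER: ("PayloadPart", "Bytes"),
    Mode.BEDROCK: ("chunk", "bytes"),
}


def extract_bytes(event: Dict[str, Any], mode: Mode) -> Optional[bytes]:
    """Hypothetical helper: pull the raw bytes out of one stream event, if present."""
    payload_key, bytes_key = STREAM_KEYS[mode]
    part = event.get(payload_key)
    if part is None:
        return None
    return part.get(bytes_key)


sagemaker_event = {"PayloadPart": {"Bytes": b'{"text": "hi"}'}}
bedrock_event = {"chunk": {"bytes": b'{"text": "hi"}'}}
print(extract_bytes(sagemaker_event, Mode.SAGEMAKER))
print(extract_bytes(bedrock_event, Mode.BEDROCK))
```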

Attribute	Type	Description
`id`	`str` or `None`	The response ID, populated once the stream completes.
`generations`	`Generations` or `None`	The full Generations object, populated from the final stream item.
`finish_reason`	`str` or `None`	The reason the generation stream ended (e.g., `"COMPLETE"`).
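
These attributes only become meaningful after the stream has been fully consumed. The sketch below imitates that behavior with a hypothetical `StreamSummary` class. The shape of the final item (`finish_reason` at the top level and the ID nested under `response`) is an assumption made for illustration, and the real payload may differ.

```python
from typing import Any, Dict, Iterable, Iterator, Optional


class StreamSummary:
    """Hypothetical stand-in that records end-of-stream metadata while iterating."""

    def __init__(self, items: Iterable[Dict[str, Any]]) -> None:
        self._items = items
        self.id: Optional[str] = None
        self.finish_reason: Optional[str] = None

    def __iter__(self) -> Iterator[str]:
        for item in self._items:
            if item.get("is_finished"):
                # Assumed shape of the final item; real field names may differ.
                self.finish_reason = item.get("finish_reason")
                self.id = item.get("response", {}).get("id")
                continue
            yield item["text"]


items = [
    {"text": "Hello", "is_finished": False},
    {"text": " world", "is_finished": False},
    {"is_finished": True, "finish_reason": "COMPLETE", "response": {"id": "abc-123"}},
]
summary = StreamSummary(items)
before = summary.finish_reason  # still None: nothing has been consumed yet
text = "".join(summary)         # iterating drives the stream to completion
print(text, summary.finish_reason, summary.id)
```

Reading `finish_reason` before iterating returns `None` in this sketch, which is why the usage example on this page prints it only after the loop.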

Usage Examples

from cohere.manually_maintained.cohere_aws.generation import (
    Generations,
    StreamingGenerations,
)
from cohere.manually_maintained.cohere_aws.mode import Mode

# Parse a synchronous generation response
response_dict = {
    "generations": [
        {
            "text": "The capital of France is Paris.",
            "token_likelihoods": [
                {"token": "The", "likelihood": -0.5},
                {"token": " capital", "likelihood": -0.3},
            ]
        }
    ]
}
generations = Generations.from_dict(response_dict)
for gen in generations:
    print(gen.text)  # "The capital of France is Paris."
    for tl in gen.token_likelihoods:
        print(f"  {tl.token}: {tl.likelihood}")

# Streaming usage (conceptual example with a SageMaker endpoint)
# stream = sagemaker_client.invoke_endpoint_with_response_stream(...)["Body"]
# streaming_gens = StreamingGenerations(stream, mode=Mode.SAGEMAKER)
# for text_chunk in streaming_gens:
#     print(text_chunk.text, end="")
# print()
# print(f"Finish reason: {streaming_gens.finish_reason}")

Related Pages

Environment:Cohere_ai_Cohere_python_AWS_Integration_Dependencies
