Overview
Implements text generation result structures and a streaming response handler for Cohere generation models deployed on AWS.
Description
The AwsGeneration module provides the TokenLikelihood, Generation, Generations, StreamingText, and StreamingGenerations classes for handling text generation outputs from Cohere models running on Amazon SageMaker or Amazon Bedrock. The Generations class includes a from_dict factory method that deserializes raw API responses into structured objects. The StreamingGenerations class consumes chunked streaming responses from both SageMaker and Bedrock endpoints. It buffers partial JSON payloads until each one forms a complete object, and it yields StreamingText named tuples as text fragments arrive.
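The buffering step can be illustrated with a small standalone sketch. It is not the SDK's code. The function name `iter_json_objects` is hypothetical, and the sketch assumes that a chunk may end mid-object but never carries the end of one object together with the start of the next.

```python
import json
from typing import Any, Iterable, Iterator


def iter_json_objects(chunks: Iterable[bytes]) -> Iterator[Any]:
    """Yield one decoded JSON object each time the buffered bytes parse.

    Assumes a chunk may end mid-object but never holds the tail of one
    object together with the head of the next.
    """
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        try:
            obj = json.loads(buffer)  # json.loads accepts bytes/bytearray
        except ValueError:
            # Incomplete object (JSONDecodeError) or a multi-byte UTF-8
            # character cut in half (UnicodeDecodeError): wait for more bytes.
            continue
        # A complete object was decoded, so start a fresh buffer.
        buffer = bytearray()
        yield obj
```

Catching `ValueError` rather than only `json.JSONDecodeError` is a deliberate choice in this sketch. `json.loads` decodes bytes before parsing, so a multi-byte UTF-8 character split across two chunks surfaces as `UnicodeDecodeError`, which is also a `ValueError` subclass.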
Usage
Use these classes when consuming text generation results from Cohere models deployed on AWS. The non-streaming classes (Generation, Generations) are used for synchronous invocations, while StreamingGenerations is used when streaming is enabled, allowing incremental processing of generated text as it arrives from the endpoint.
Code Reference
Source Location
- Repository: Cohere Python SDK
- File: src/cohere/manually_maintained/cohere_aws/generation.py
Signature
```python
class TokenLikelihood(CohereObject):
    def __init__(self, token: str, likelihood: float) -> None: ...

class Generation(CohereObject):
    def __init__(self, text: str, token_likelihoods: List[TokenLikelihood]) -> None: ...

class Generations(CohereObject):
    def __init__(self, generations: List[Generation]) -> None: ...

    @classmethod
    def from_dict(cls, response: Dict[str, Any]) -> "Generations": ...

    def __iter__(self) -> iter: ...
    def __next__(self) -> next: ...

StreamingText = NamedTuple("StreamingText", [
    ("index", Optional[int]),
    ("text", str),
    ("is_finished", bool),
])

class StreamingGenerations(CohereObject):
    def __init__(self, stream, mode: Mode) -> None: ...
    def _make_response_item(self, streaming_item) -> Optional[StreamingText]: ...
    def __iter__(self) -> Generator[StreamingText, None, None]: ...
```
Import
from cohere.manually_maintained.cohere_aws.generation import (
TokenLikelihood,
Generation,
Generations,
StreamingText,
StreamingGenerations,
)
I/O Contract
TokenLikelihood
| Parameter | Type | Description |
|---|---|---|
| token | str | The text token. |
| likelihood | float | The log-likelihood score for this token. |
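Since `likelihood` is a log-likelihood, token scores combine by addition rather than multiplication. The sketch below uses a hypothetical helper, `sequence_scores`, and assumes natural-log scores (the table does not state the base). It sums the per-token values into a sequence log-likelihood and derives the sequence probability and perplexity from that sum.

```python
import math
from typing import Dict, Sequence


def sequence_scores(log_likelihoods: Sequence[float]) -> Dict[str, float]:
    """Combine per-token log-likelihoods (assumed natural log) into sequence scores."""
    if not log_likelihoods:
        raise ValueError("need at least one token")
    # log P(sequence) is the sum of the conditional token log-probabilities.
    total = sum(log_likelihoods)
    # Average log-likelihood per token, used for perplexity.
    mean = total / len(log_likelihoods)
    return {
        "total_log_likelihood": total,
        "probability": math.exp(total),   # P(sequence)
        "perplexity": math.exp(-mean),    # exp of the negative mean log-likelihood
    }


scores = sequence_scores([-0.5, -0.3])
print(scores["total_log_likelihood"], scores["probability"], scores["perplexity"])
```

Summing in log space avoids the underflow that multiplying many small probabilities would cause on long generations.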
Generation
| Parameter | Type | Description |
|---|---|---|
| text | str | The generated text content. |
| token_likelihoods | List[TokenLikelihood] or None | Per-token likelihood scores for the generated text; None when likelihoods were not requested. |
Generations
| Parameter | Type | Description |
|---|---|---|
| generations | List[Generation] | A list of Generation objects. |

| Method | Return Type | Description |
|---|---|---|
| from_dict(response) | Generations | Class method that parses a raw API response dictionary (with a "generations" key) into a Generations instance. |
| __iter__() | Iterator | Returns an iterator over the contained Generation objects. |
| __next__() | Generation | Returns the next Generation from the internal iterator. |
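The following sketch gives a rough idea of what `from_dict` has to do: walk the `"generations"` list and build typed objects, leaving `token_likelihoods` as None when a generation carries no likelihoods. It uses hypothetical stand-in dataclasses (`TokenLikelihoodSketch`, `GenerationSketch`) and a hypothetical `parse_generations` function so it runs without the SDK. It assumes a missing `"token_likelihoods"` key means likelihoods were not requested, and it is not the SDK's actual implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class TokenLikelihoodSketch:
    token: str
    likelihood: float


@dataclass
class GenerationSketch:
    text: str
    token_likelihoods: Optional[List[TokenLikelihoodSketch]]


def parse_generations(response: Dict[str, Any]) -> List[GenerationSketch]:
    """Turn {"generations": [...]} into typed objects.

    token_likelihoods stays None when a generation has no
    "token_likelihoods" key (assumed: likelihoods were not requested).
    """
    parsed: List[GenerationSketch] = []
    for gen in response["generations"]:
        raw = gen.get("token_likelihoods")
        likelihoods = (
            None
            if raw is None
            else [TokenLikelihoodSketch(t["token"], t["likelihood"]) for t in raw]
        )
        parsed.append(GenerationSketch(gen["text"], likelihoods))
    return parsed
```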
StreamingText (NamedTuple)
| Field | Type | Description |
|---|---|---|
| index | Optional[int] | The index of the generation stream this text belongs to. |
| text | str | The text fragment received from the stream. |
| is_finished | bool | Whether this is the final text chunk in the stream. |
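If a request produces several generations, their fragments may share one stream, and `index` says which generation each fragment belongs to. The following minimal sketch regroups them and assumes that fragments for a given index arrive in order. The `StreamingText` definition below is a local copy with the same fields so the example runs without the SDK. `collect_by_index` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional


class StreamingText(NamedTuple):
    # Local copy with the same fields as the SDK's StreamingText,
    # so this sketch runs without the cohere package installed.
    index: Optional[int]
    text: str
    is_finished: bool


def collect_by_index(chunks: Iterable[StreamingText]) -> Dict[Optional[int], str]:
    """Concatenate text fragments per generation index, in arrival order."""
    parts: Dict[Optional[int], List[str]] = defaultdict(list)
    for chunk in chunks:
        parts[chunk.index].append(chunk.text)
    return {index: "".join(texts) for index, texts in parts.items()}
```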
StreamingGenerations
| Parameter | Type | Description |
|---|---|---|
| stream | iterable (not annotated) | The raw streaming response from the AWS endpoint (SageMaker or Bedrock). |
| mode | Mode | The deployment mode, either Mode.SAGEMAKER or Mode.BEDROCK. It determines the payload and byte keys used to parse stream chunks. |

| Attribute | Type | Description |
|---|---|---|
| id | str or None | The response ID, populated once the stream completes. |
| generations | Generations or None | The full Generations object, populated from the final stream item. |
| finish_reason | str or None | The reason the generation stream ended (e.g., "COMPLETE"). |
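The `mode` argument matters because the two AWS services wrap each streamed chunk differently. In boto3, SageMaker Runtime's `invoke_endpoint_with_response_stream` yields events such as `{"PayloadPart": {"Bytes": b"..."}}`, while Bedrock Runtime's `invoke_model_with_response_stream` yields events such as `{"chunk": {"bytes": b"..."}}`. The sketch below assumes these are the payload and byte keys that StreamingGenerations selects. The `Mode` enum is a local stand-in, and `chunk_bytes` is a hypothetical helper.

```python
from enum import Enum
from typing import Any, Dict, Iterable, Iterator, Tuple


class Mode(Enum):
    # Local stand-in for cohere_aws's Mode; the values here are arbitrary.
    SAGEMAKER = "sagemaker"
    BEDROCK = "bedrock"


# (outer event key, inner bytes key) for each service's stream events.
EVENT_KEYS: Dict[Mode, Tuple[str, str]] = {
    Mode.SAGEMAKER: ("PayloadPart", "Bytes"),
    Mode.BEDROCK: ("chunk", "bytes"),
}


def chunk_bytes(events: Iterable[Dict[str, Any]], mode: Mode) -> Iterator[bytes]:
    """Yield the raw payload bytes carried by each stream event.

    Events without the expected key (e.g. error events) raise KeyError.
    """
    payload_key, bytes_key = EVENT_KEYS[mode]
    for event in events:
        yield event[payload_key][bytes_key]
```

Keeping the key pairs in one table means the rest of the parsing loop (buffering and JSON decoding) can stay identical for both services.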
Usage Examples
```python
from cohere.manually_maintained.cohere_aws.generation import (
    Generations,
    StreamingGenerations,
)
from cohere.manually_maintained.cohere_aws.mode import Mode

# Parse a synchronous generation response
response_dict = {
    "generations": [
        {
            "text": "The capital of France is Paris.",
            "token_likelihoods": [
                {"token": "The", "likelihood": -0.5},
                {"token": " capital", "likelihood": -0.3},
            ],
        }
    ]
}
generations = Generations.from_dict(response_dict)
for gen in generations:
    print(gen.text)  # "The capital of France is Paris."
    # token_likelihoods may be None when likelihoods were not requested
    for tl in gen.token_likelihoods or []:
        print(f"  {tl.token}: {tl.likelihood}")

# Streaming usage (conceptual example with a SageMaker endpoint)
# sagemaker_client = boto3.client("sagemaker-runtime")
# stream = sagemaker_client.invoke_endpoint_with_response_stream(...)["Body"]
# streaming_gens = StreamingGenerations(stream, mode=Mode.SAGEMAKER)
# for text_chunk in streaming_gens:
#     print(text_chunk.text, end="")
# print()
# print(f"Finish reason: {streaming_gens.finish_reason}")
```
Related Pages