Implementation:Togethercomputer Together python ChatCompletionResponse Handling
| Attribute | Value |
|---|---|
| Implementation Name | ChatCompletionResponse_Handling |
| Overview | Pydantic models for non-streaming and streaming chat completion responses, including choices, usage data, and chunk deltas. |
| Source File | src/together/types/chat_completions.py |
| Lines | L163-211 (response types), L12-18 (common types imported from together.types.common) |
| Domain | NLP, API_Client, Inference |
| Repository | togethercomputer/together-python |
| Last Updated | 2026-02-15 16:00 GMT |
Code Reference
ChatCompletionResponse (L171-185)
class ChatCompletionResponse(BaseModel):
    # request id
    id: str | None = None
    # object type
    object: ObjectType | None = None
    # created timestamp
    created: int | None = None
    # model name
    model: str | None = None
    # choices list
    choices: List[ChatCompletionChoicesData] | None = None
    # prompt list
    prompt: List[PromptPart] | List[None] | None = None
    # token usage data
    usage: UsageData | None = None
ChatCompletionChoicesData (L163-168)
class ChatCompletionChoicesData(BaseModel):
    index: int | None = None
    logprobs: LogprobsPart | None = None
    seed: int | None = None
    finish_reason: FinishReason | None = None
    message: ChatCompletionMessage | None = None
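Because every field on ChatCompletionResponse and ChatCompletionChoicesData defaults to None, a direct chain such as response.choices[0].message.content can raise on a sparse payload. The sketch below is a null-safe accessor that checks each hop before dereferencing it. The helper name first_message_text is hypothetical (it is not part of the SDK), and the example uses types.SimpleNamespace stand-ins shaped like the models above rather than real SDK objects.

```python
from types import SimpleNamespace
from typing import Any, Optional


def first_message_text(response: Any, index: int = 0) -> Optional[str]:
    """Return choices[index].message.content, or None if any link is missing.

    Hypothetical helper: all response fields are Optional, so each
    attribute is checked before the next one is accessed.
    """
    choices = getattr(response, "choices", None)
    if not choices or index >= len(choices):
        return None
    message = getattr(choices[index], "message", None)
    if message is None:
        return None
    return getattr(message, "content", None)


# Stand-ins shaped like ChatCompletionResponse (not real SDK instances).
full = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="4"))]
)
sparse = SimpleNamespace(choices=None)

print(first_message_text(full))    # 4
print(first_message_text(sparse))  # None
```

Since the helper only uses attribute access, it should also accept real ChatCompletionResponse instances unchanged.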
ChatCompletionChunk (L196-210)
class ChatCompletionChunk(BaseModel):
    # request id
    id: str | None = None
    # object type
    object: ObjectType | None = None
    # created timestamp
    created: int | None = None
    # model name
    model: str | None = None
    # delta content
    choices: List[ChatCompletionChoicesChunk] | None = None
    # finish reason
    finish_reason: FinishReason | None = None
    # token usage data
    usage: UsageData | None = None
ChatCompletionChoicesChunk (L188-193)
class ChatCompletionChoicesChunk(BaseModel):
    index: int | None = None
    logprobs: float | None = None
    seed: int | None = None
    finish_reason: FinishReason | None = None
    delta: DeltaContent | None = None
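Each ChatCompletionChunk carries only a fragment of the output, so a consumer has to fold the chunks back together. The sketch below groups delta text by choice index, records the first non-empty choice-level finish_reason per index, and keeps the last usage object seen. collect_stream and _chunk are hypothetical names, the chunks are SimpleNamespace stand-ins, and, for simplicity, the chunk-level finish_reason field is ignored.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Any, Dict, Iterable, Optional, Tuple


def collect_stream(
    chunks: Iterable[Any],
) -> Tuple[Dict[int, str], Dict[int, str], Optional[Any]]:
    """Fold chunks into (text per choice index, finish reason per index, usage)."""
    texts: Dict[int, str] = defaultdict(str)
    finish: Dict[int, str] = {}
    usage = None
    for chunk in chunks:
        for choice in chunk.choices or []:
            idx = choice.index if choice.index is not None else 0
            if choice.delta is not None and choice.delta.content:
                texts[idx] += choice.delta.content
            # An empty-string finish reason (FinishReason.Null) is falsy, so it is skipped.
            if choice.finish_reason and idx not in finish:
                finish[idx] = choice.finish_reason
        if chunk.usage is not None:
            usage = chunk.usage  # keep the last usage seen; it may arrive on the final chunk
    return dict(texts), finish, usage


def _chunk(idx, content=None, finish=None, usage=None):
    # Stand-in shaped like a ChatCompletionChunk with a single choice.
    choice = SimpleNamespace(
        index=idx, delta=SimpleNamespace(content=content), finish_reason=finish
    )
    return SimpleNamespace(choices=[choice], usage=usage)


stream = [
    _chunk(0, "Hel", finish=""),
    _chunk(1, "Bon", finish=""),
    _chunk(0, "lo", finish="stop"),
    _chunk(1, "jour", finish="stop", usage=SimpleNamespace(total_tokens=12)),
]
texts, finish, usage = collect_stream(stream)
print(texts)   # {0: 'Hello', 1: 'Bonjour'}
print(finish)  # {0: 'stop', 1: 'stop'}
```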
Supporting Types (from together.types.common)
class FinishReason(str, Enum):
    Length = "length"
    StopSequence = "stop"
    EOS = "eos"
    ToolCalls = "tool_calls"
    Error = "error"
    Null = ""
class UsageData(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
class DeltaContent(BaseModel):
    content: str | None = None
class LogprobsPart(BaseModel):
    tokens: List[str | None] | None = None
    token_logprobs: List[float | None] | None = None
class PromptPart(BaseModel):
    text: str | None = None
    logprobs: LogprobsPart | None = None
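Because FinishReason mixes in str, its members behave like the raw strings the API returns. The sketch below copies the enum locally (so it runs without the SDK installed) and shows three consequences: members compare equal to plain strings, constructing by value maps a wire string back to its member, and Null is falsy because its value is the empty string.

```python
from enum import Enum


class FinishReason(str, Enum):
    # Local copy of the enum shown above, so this sketch runs without the SDK.
    Length = "length"
    StopSequence = "stop"
    EOS = "eos"
    ToolCalls = "tool_calls"
    Error = "error"
    Null = ""


# str mixin: members compare equal to the raw API strings.
print(FinishReason.ToolCalls == "tool_calls")                # True

# Lookup by value turns a wire string back into its member.
print(FinishReason("stop") is FinishReason.StopSequence)     # True

# Null wraps "", so a plain truthiness check treats it as "not finished yet".
print(bool(FinishReason.Null), bool(FinishReason.Length))    # False True

# .value yields the bare string when you need it for logging or JSON.
print(FinishReason.EOS.value)                                # eos
```

This is why the streaming examples can test finish_reason with a bare if and compare it against "tool_calls" directly.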
Import
from together.types import ChatCompletionResponse, ChatCompletionChunk
from together.types.chat_completions import (
    ChatCompletionChoicesData,
    ChatCompletionChoicesChunk,
)
from together.types.common import UsageData, FinishReason, DeltaContent, LogprobsPart
I/O Contract
ChatCompletionResponse Fields
| Field | Type | Description |
|---|---|---|
| id | str \| None | Unique request identifier. |
| object | ObjectType \| None | Object type, typically "chat.completion". |
| created | int \| None | Unix timestamp of response creation. |
| model | str \| None | Model identifier that generated the response. |
| choices | List[ChatCompletionChoicesData] \| None | List of generated completions. |
| prompt | List[PromptPart] \| List[None] \| None | Prompt data (when echo is enabled). |
| usage | UsageData \| None | Token usage statistics. |
ChatCompletionChoicesData Fields
| Field | Type | Description |
|---|---|---|
| index | int \| None | Choice index (0-based). |
| logprobs | LogprobsPart \| None | Token log probabilities (when requested). |
| seed | int \| None | Random seed used for this choice. |
| finish_reason | FinishReason \| None | Reason generation stopped: "stop", "length", "eos", "tool_calls", or "error". |
| message | ChatCompletionMessage \| None | The generated message with role, content, and optional tool_calls. |
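When logprobs are requested, LogprobsPart holds two parallel, optional lists whose entries may themselves be None. The sketch below pairs tokens with their log probabilities, drops None entries, and computes the joint probability of the returned tokens as exp(sum of logprobs). It assumes the values are natural logarithms; token_logprob_pairs and sequence_probability are hypothetical helper names, and the example uses a SimpleNamespace stand-in for LogprobsPart.

```python
import math
from types import SimpleNamespace
from typing import Any, List, Optional, Tuple


def token_logprob_pairs(logprobs: Any) -> List[Tuple[str, float]]:
    """Zip LogprobsPart.tokens with token_logprobs, dropping None entries."""
    if logprobs is None:
        return []
    tokens = logprobs.tokens or []
    values = logprobs.token_logprobs or []
    return [(t, lp) for t, lp in zip(tokens, values) if t is not None and lp is not None]


def sequence_probability(logprobs: Any) -> Optional[float]:
    """Joint probability of the returned tokens, assuming natural-log values."""
    pairs = token_logprob_pairs(logprobs)
    if not pairs:
        return None
    return math.exp(sum(lp for _, lp in pairs))


# Stand-in shaped like LogprobsPart.
part = SimpleNamespace(tokens=["The", " cat", None], token_logprobs=[-0.5, -1.0, None])
print(token_logprob_pairs(part))               # [('The', -0.5), (' cat', -1.0)]
print(round(sequence_probability(part), 4))    # 0.2231
```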
ChatCompletionChunk Fields
| Field | Type | Description |
|---|---|---|
| id | str \| None | Unique request identifier (same across all chunks in a stream). |
| object | ObjectType \| None | Object type, typically "chat.completion.chunk". |
| created | int \| None | Unix timestamp of chunk creation. |
| model | str \| None | Model identifier. |
| choices | List[ChatCompletionChoicesChunk] \| None | List of chunk choices with delta content. |
| finish_reason | FinishReason \| None | Finish reason (set on the final chunk). |
| usage | UsageData \| None | Token usage (may appear on the final chunk). |
ChatCompletionChoicesChunk Fields
| Field | Type | Description |
|---|---|---|
| index | int \| None | Choice index (0-based). |
| logprobs | float \| None | Log probability value for the token. |
| seed | int \| None | Random seed used. |
| finish_reason | FinishReason \| None | Finish reason (set on the final chunk for this choice). |
| delta | DeltaContent \| None | Incremental content, carried in its content string field. |
UsageData Fields
| Field | Type | Description |
|---|---|---|
| prompt_tokens | int | Number of tokens in the input prompt. |
| completion_tokens | int | Number of tokens generated in the completion. |
| total_tokens | int | Sum of prompt_tokens and completion_tokens. |
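A common use of UsageData is tracking token spend across many calls. The sketch below sums the three counters over a sequence of usage objects, skipping responses whose usage is None (the field is optional on both response models). The Usage dataclass is a hypothetical stand-in with the same three required int fields as UsageData; sum_usage relies only on attribute access, so it should accept real UsageData instances as well.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Usage:
    # Stand-in with the same three required int fields as UsageData.
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


def sum_usage(usages: Iterable[Optional[Usage]]) -> Usage:
    """Aggregate token counts over many calls, skipping missing usage data."""
    total = Usage(0, 0, 0)
    for u in usages:
        if u is None:
            continue
        total.prompt_tokens += u.prompt_tokens
        total.completion_tokens += u.completion_tokens
        total.total_tokens += u.total_tokens
    return total


calls = [Usage(10, 5, 15), None, Usage(7, 3, 10)]
agg = sum_usage(calls)
print(agg)  # Usage(prompt_tokens=17, completion_tokens=8, total_tokens=25)
```

Because total_tokens is defined as the sum of the other two counters, the aggregate preserves that relationship as long as each input does.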
Usage Examples
Accessing Non-Streaming Response Content
from together import Together
client = Together()
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
# Extract generated text
text = response.choices[0].message.content
print(f"Response: {text}")
# Check finish reason
finish_reason = response.choices[0].finish_reason
print(f"Finish reason: {finish_reason}") # e.g., "stop", "length", "eos"
# Read token usage (usage is Optional, so guard against None)
if response.usage is not None:
    print(f"Prompt tokens: {response.usage.prompt_tokens}")
    print(f"Completion tokens: {response.usage.completion_tokens}")
    print(f"Total tokens: {response.usage.total_tokens}")
# Access metadata
print(f"Request ID: {response.id}")
print(f"Model: {response.model}")
print(f"Created: {response.created}")
Iterating Over Streaming Chunks
from together import Together
client = Together()
stream = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True,
)
full_response = ""
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta and chunk.choices[0].delta.content:
        token = chunk.choices[0].delta.content
        full_response += token
        print(token, end="", flush=True)
    # Check for finish reason on final chunk
    if chunk.choices and chunk.choices[0].finish_reason:
        print(f"\nFinished: {chunk.choices[0].finish_reason}")
print(f"\nFull response: {full_response}")
Handling Tool Calls in Response
from together import Together
import json
client = Together()
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",
)
choice = response.choices[0]
if choice.finish_reason == "tool_calls" and choice.message.tool_calls:
    for tool_call in choice.message.tool_calls:
        func_name = tool_call.function.name
        func_args = json.loads(tool_call.function.arguments)
        print(f"Tool call: {func_name}({func_args})")
else:
    print(f"Text response: {choice.message.content}")
Handling Multiple Choices
from together import Together
client = Together()
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Give me a creative name for a cat."}],
    n=3,
    temperature=1.0,
)
for choice in response.choices:
    print(f"Choice {choice.index}: {choice.message.content}")
    print(f"  Finish reason: {choice.finish_reason}")
    # seed is Optional; compare to None so a seed of 0 is still reported
    if choice.seed is not None:
        print(f"  Seed: {choice.seed}")
Async Streaming
import asyncio
from together import AsyncTogether
async def stream_response():
    client = AsyncTogether()
    stream = await client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
        messages=[{"role": "user", "content": "Count to 10."}],
        stream=True,
    )
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()
asyncio.run(stream_response())
Key Implementation Details
- All fields on ChatCompletionResponse, ChatCompletionChoicesData, ChatCompletionChunk, and ChatCompletionChoicesChunk are optional (None defaults) so that partial or unexpected API responses are handled gracefully. UsageData is the exception: its three token counters are required ints.
- ChatCompletionResponse uses ChatCompletionChoicesData, which carries a full message field, while ChatCompletionChunk uses ChatCompletionChoicesChunk, which carries a delta field containing only the incremental content.
- The FinishReason enum includes a Null = "" variant to handle empty strings in streaming responses before generation completes.
- LogprobsPart in ChatCompletionChoicesData provides structured token-level log probabilities, while ChatCompletionChoicesChunk uses a single float for the logprob value.
- The DeltaContent model contains only a content: str | None field; it does not include role or tool_calls deltas.
- Response objects are constructed in src/together/resources/chat/completions.py: non-streaming responses are built as ChatCompletionResponse(**response.data), while streaming chunks are built as ChatCompletionChunk(**line.data) in a generator expression.
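Building streaming chunks in a generator expression has a practical consequence: the stream is lazy and single-pass. Nothing is constructed until you iterate, and a second loop over the same stream yields nothing. The sketch below reproduces that pattern with a minimal dataclass standing in for ChatCompletionChunk and a list of pre-decoded dicts standing in for line.data; Chunk, make_chunk, built, and payloads are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    # Minimal stand-in for ChatCompletionChunk: every field is optional.
    id: Optional[str] = None
    model: Optional[str] = None


built = []  # records each construction so we can see when it happens


def make_chunk(data: dict) -> Chunk:
    built.append(data["id"])
    return Chunk(**data)


payloads = [{"id": "req-1", "model": "m"}, {"id": "req-1", "model": "m"}]

# Same shape as building ChatCompletionChunk(**line.data) per line: nothing built yet.
stream = (make_chunk(p) for p in payloads)
print(len(built))  # 0

chunks = list(stream)  # the first pass constructs every chunk
print(len(built), len(chunks))  # 2 2
print(list(stream))  # [] because a generator cannot be replayed
```

If you need the chunks more than once (for example, to log them after printing), materialize them with list() on the first pass.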