Implementation:Togethercomputer Together python ChatCompletionResponse Handling

Attribute	Value
Implementation Name	ChatCompletionResponse_Handling
Overview	Pydantic models for non-streaming and streaming chat completion responses, including choices, usage data, and chunk deltas.
Source File	src/together/types/chat_completions.py
Lines	L163-211 (response types), L12-18 (common types imported from together.types.common)
Domain	NLP, API_Client, Inference
Repository	togethercomputer/together-python
Last Updated	2026-02-15 16:00 GMT

Code Reference

ChatCompletionResponse (L171-185)

class ChatCompletionResponse(BaseModel):
    # request id
    id: str | None = None
    # object type
    object: ObjectType | None = None
    # created timestamp
    created: int | None = None
    # model name
    model: str | None = None
    # choices list
    choices: List[ChatCompletionChoicesData] | None = None
    # prompt list
    prompt: List[PromptPart] | List[None] | None = None
    # token usage data
    usage: UsageData | None = None

ChatCompletionChoicesData (L163-168)

class ChatCompletionChoicesData(BaseModel):
    index: int | None = None
    logprobs: LogprobsPart | None = None
    seed: int | None = None
    finish_reason: FinishReason | None = None
    message: ChatCompletionMessage | None = None

ChatCompletionChunk (L196-210)

class ChatCompletionChunk(BaseModel):
    # request id
    id: str | None = None
    # object type
    object: ObjectType | None = None
    # created timestamp
    created: int | None = None
    # model name
    model: str | None = None
    # delta content
    choices: List[ChatCompletionChoicesChunk] | None = None
    # finish reason
    finish_reason: FinishReason | None = None
    # token usage data
    usage: UsageData | None = None

ChatCompletionChoicesChunk (L188-193)

class ChatCompletionChoicesChunk(BaseModel):
    index: int | None = None
    logprobs: float | None = None
    seed: int | None = None
    finish_reason: FinishReason | None = None
    delta: DeltaContent | None = None

Supporting Types (from together.types.common)

class FinishReason(str, Enum):
    Length = "length"
    StopSequence = "stop"
    EOS = "eos"
    ToolCalls = "tool_calls"
    Error = "error"
    Null = ""

class UsageData(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

class DeltaContent(BaseModel):
    content: str | None = None

class LogprobsPart(BaseModel):
    tokens: List[str | None] | None = None
    token_logprobs: List[float | None] | None = None

class PromptPart(BaseModel):
    text: str | None = None
    logprobs: LogprobsPart | None = None
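
Because FinishReason mixes in str, its members compare equal to plain strings, and the Null member (an empty string) is falsy. The sketch below uses a local copy of the enum so it runs without the SDK installed; `is_finished` is a hypothetical helper, not part of together-python.

```python
from enum import Enum


class FinishReason(str, Enum):
    # Local copy of the enum listed above, reproduced so the snippet is self-contained.
    Length = "length"
    StopSequence = "stop"
    EOS = "eos"
    ToolCalls = "tool_calls"
    Error = "error"
    Null = ""


def is_finished(reason):
    """Return True once a choice carries a non-empty finish reason."""
    # None (field absent) and FinishReason.Null (empty string) are both falsy.
    return bool(reason)


print(is_finished(None))                       # False
print(is_finished(FinishReason("")))           # False
print(is_finished(FinishReason("stop")))       # True
print(FinishReason.ToolCalls == "tool_calls")  # True
```

A plain truthiness check therefore covers both "field missing" and "generation still in progress" without a separate comparison against Null.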

Import

from together.types import ChatCompletionResponse, ChatCompletionChunk
from together.types.chat_completions import (
    ChatCompletionChoicesData,
    ChatCompletionChoicesChunk,
)
from together.types.common import UsageData, FinishReason, DeltaContent, LogprobsPart

I/O Contract

ChatCompletionResponse Fields

Field	Type	Description
`id`	`str | None`	Unique request identifier.
`object`	`ObjectType | None`	Object type, typically `"chat.completion"`.
`created`	`int | None`	Unix timestamp of response creation.
`model`	`str | None`	Model identifier that generated the response.
`choices`	`List[ChatCompletionChoicesData] | None`	List of generated completions.
`prompt`	`List[PromptPart] | List[None] | None`	Prompt data (when echo is enabled).
`usage`	`UsageData | None`	Token usage statistics.

ChatCompletionChoicesData Fields

Field	Type	Description
`index`	`int | None`	Choice index (0-based).
`logprobs`	`LogprobsPart | None`	Token log probabilities (when requested).
`seed`	`int | None`	Random seed used for this choice.
`finish_reason`	`FinishReason | None`	Reason generation stopped: "stop", "length", "eos", "tool_calls", or "error".
`message`	`ChatCompletionMessage | None`	The generated message with role, content, and optional tool_calls.

ChatCompletionChunk Fields

Field	Type	Description
`id`	`str | None`	Unique request identifier (same across all chunks in a stream).
`object`	`ObjectType | None`	Object type, typically `"chat.completion.chunk"`.
`created`	`int | None`	Unix timestamp of chunk creation.
`model`	`str | None`	Model identifier.
`choices`	`List[ChatCompletionChoicesChunk] | None`	List of chunk choices with delta content.
`finish_reason`	`FinishReason | None`	Finish reason (set on the final chunk).
`usage`	`UsageData | None`	Token usage (may appear on the final chunk).

ChatCompletionChoicesChunk Fields

Field	Type	Description
`index`	`int | None`	Choice index (0-based).
`logprobs`	`float | None`	Log probability value for the token.
`seed`	`int | None`	Random seed used.
`finish_reason`	`FinishReason | None`	Finish reason (set on the final chunk for this choice).
`delta`	`DeltaContent | None`	Incremental content with a `content` string field.

UsageData Fields

Field	Type	Description
`prompt_tokens`	`int`	Number of tokens in the input prompt.
`completion_tokens`	`int`	Number of tokens generated in the completion.
`total_tokens`	`int`	Sum of prompt_tokens and completion_tokens.
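
Since `usage` is optional on the response models, code that aggregates token counts across several calls needs to skip responses without it. The following is a minimal sketch under that assumption; `total_usage` is a hypothetical helper, and the SimpleNamespace objects are stand-ins shaped like ChatCompletionResponse rather than real SDK objects.

```python
from types import SimpleNamespace


def total_usage(responses):
    """Sum token counts across responses, skipping any whose usage is None."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for response in responses:
        usage = response.usage
        if usage is None:
            continue
        for name in totals:
            totals[name] += getattr(usage, name)
    return totals


# Stand-in responses (assumption: only the `usage` attribute matters here).
responses = [
    SimpleNamespace(usage=SimpleNamespace(prompt_tokens=12, completion_tokens=30, total_tokens=42)),
    SimpleNamespace(usage=None),
    SimpleNamespace(usage=SimpleNamespace(prompt_tokens=8, completion_tokens=5, total_tokens=13)),
]
totals = total_usage(responses)
print(totals)
```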

Usage Examples

Accessing Non-Streaming Response Content

from together import Together

client = Together()

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)

# Extract generated text
text = response.choices[0].message.content
print(f"Response: {text}")

# Check finish reason
finish_reason = response.choices[0].finish_reason
print(f"Finish reason: {finish_reason}")  # e.g., "stop", "length", "eos"

# Read token usage
print(f"Prompt tokens: {response.usage.prompt_tokens}")
print(f"Completion tokens: {response.usage.completion_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")

# Access metadata
print(f"Request ID: {response.id}")
print(f"Model: {response.model}")
print(f"Created: {response.created}")

Iterating Over Streaming Chunks

from together import Together

client = Together()

stream = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True,
)

full_response = ""
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta and chunk.choices[0].delta.content:
        token = chunk.choices[0].delta.content
        full_response += token
        print(token, end="", flush=True)

    # Check for finish reason on final chunk
    if chunk.choices and chunk.choices[0].finish_reason:
        print(f"\nFinished: {chunk.choices[0].finish_reason}")

print(f"\nFull response: {full_response}")
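
The accumulation loop above can be factored into a reusable function. This is a sketch, not SDK code: `collect_stream` and `fake_chunk` are hypothetical names, and the fake chunks are SimpleNamespace stand-ins that mimic only the attributes the loop reads.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Concatenate delta content and return (text, last finish_reason seen)."""
    parts = []
    finish_reason = None
    for chunk in chunks:
        if not chunk.choices:
            continue  # choices may be None or empty on some chunks
        choice = chunk.choices[0]
        if choice.delta and choice.delta.content:
            parts.append(choice.delta.content)
        if choice.finish_reason:
            finish_reason = choice.finish_reason
    return "".join(parts), finish_reason


def fake_chunk(content=None, finish_reason=None):
    # Hypothetical stand-in shaped like ChatCompletionChunk; not part of the SDK.
    delta = SimpleNamespace(content=content)
    choice = SimpleNamespace(index=0, delta=delta, finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])


stream = [
    fake_chunk("Once"),
    fake_chunk(" upon"),
    SimpleNamespace(choices=None),  # a chunk without choices is skipped
    fake_chunk(None, "stop"),
]
text, reason = collect_stream(stream)
print(repr(text), repr(reason))
```

Joining a list of parts once at the end avoids repeated string concatenation on long streams.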

Handling Tool Calls in Response

from together import Together
import json

client = Together()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",
)

choice = response.choices[0]

if choice.finish_reason == "tool_calls" and choice.message.tool_calls:
    for tool_call in choice.message.tool_calls:
        func_name = tool_call.function.name
        func_args = json.loads(tool_call.function.arguments)
        print(f"Tool call: {func_name}({func_args})")
else:
    print(f"Text response: {choice.message.content}")
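
The example above passes `tool_call.function.arguments` straight to `json.loads`, which raises if the model emits malformed JSON. A defensive variant might look like the sketch below; `parse_tool_arguments` is a hypothetical helper, and treating non-object JSON as invalid is an assumption about what a caller wants.

```python
import json


def parse_tool_arguments(raw):
    """Return the decoded arguments dict, or None if `raw` is not a JSON object."""
    if not raw:
        return None
    try:
        parsed = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        # Malformed JSON, or a value that is not str/bytes.
        return None
    return parsed if isinstance(parsed, dict) else None


print(parse_tool_arguments('{"city": "Paris"}'))  # {'city': 'Paris'}
print(parse_tool_arguments('{"city": '))          # None
print(parse_tool_arguments(None))                 # None
```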

Handling Multiple Choices

from together import Together

client = Together()

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Give me a creative name for a cat."}],
    n=3,
    temperature=1.0,
)

for choice in response.choices:
    print(f"Choice {choice.index}: {choice.message.content}")
    print(f"  Finish reason: {choice.finish_reason}")
    # Compare against None so that a seed of 0 is still printed
    if choice.seed is not None:
        print(f"  Seed: {choice.seed}")

Async Streaming

import asyncio
from together import AsyncTogether

async def stream_response():
    client = AsyncTogether()

    stream = await client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
        messages=[{"role": "user", "content": "Count to 10."}],
        stream=True,
    )

    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()

asyncio.run(stream_response())
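
To return the streamed text instead of printing it, the same `async for` pattern can collect the deltas. The sketch below runs offline: `collect_async_stream` and `fake_stream` are hypothetical names, and the async generator stands in for the stream returned by the client.

```python
import asyncio
from types import SimpleNamespace


async def collect_async_stream(stream):
    """Gather delta content from an async iterable of chunk-like objects."""
    parts = []
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)


async def fake_stream(tokens):
    # Hypothetical stand-in for the SDK stream; yields chunk-shaped objects.
    for token in tokens:
        delta = SimpleNamespace(content=token)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


result = asyncio.run(collect_async_stream(fake_stream(["1", ", 2", ", 3"])))
print(result)  # 1, 2, 3
```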

Key Implementation Details

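All fields on the four response models (ChatCompletionResponse, ChatCompletionChoicesData, ChatCompletionChunk, ChatCompletionChoicesChunk) are optional and default to None, so partial or unexpected API responses still parse. The exception is UsageData, whose three token counts are required ints.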
ChatCompletionResponse uses ChatCompletionChoicesData with a full message field, while ChatCompletionChunk uses ChatCompletionChoicesChunk with a delta field containing only incremental content.
The FinishReason enum includes a Null = "" variant to handle empty strings in streaming responses before generation completes.
LogprobsPart in ChatCompletionChoicesData provides structured token-level probabilities, while ChatCompletionChoicesChunk uses a simple float for the logprob value.
The DeltaContent model only contains a content: str | None field; it does not include role or tool_calls deltas.
Response objects are constructed in src/together/resources/chat/completions.py: non-streaming responses are built as ChatCompletionResponse(**response.data), while streaming chunks are built as ChatCompletionChunk(**line.data) in a generator expression.
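
Because every field on the response models may be None, chained access such as `response.choices[0].message.content` can fail on a partial response. A None-safe accessor could be sketched as follows; `first_choice_text` is a hypothetical helper, and the SimpleNamespace objects are stand-ins for real responses.

```python
from types import SimpleNamespace


def first_choice_text(response, default=""):
    """Return the first choice's message content, or `default` if any link is missing."""
    choices = getattr(response, "choices", None)
    if not choices:
        return default  # choices is None or an empty list
    message = choices[0].message
    if message is None or message.content is None:
        return default
    return message.content


# Stand-ins shaped like ChatCompletionResponse (assumption: attribute access only).
full = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="4"))]
)
partial = SimpleNamespace(choices=None)

print(first_choice_text(full))            # 4
print(first_choice_text(partial, "n/a"))  # n/a
```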
