Implementation:Togethercomputer Together python ChatCompletionResponse Handling

From Leeroopedia
Revision as of 13:55, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Togethercomputer_Together_python_ChatCompletionResponse_Handling.md)
Attribute            Value
Implementation Name  ChatCompletionResponse_Handling
Overview             Pydantic models for non-streaming and streaming chat completion responses, including choices, usage data, and chunk deltas.
Source File          src/together/types/chat_completions.py
Lines                L163-211 (response types), L12-18 (common types imported from together.types.common)
Domain               NLP, API_Client, Inference
Repository           togethercomputer/together-python
Last Updated         2026-02-15 16:00 GMT

Code Reference

ChatCompletionResponse (L171-185)

class ChatCompletionResponse(BaseModel):
    # request id
    id: str | None = None
    # object type
    object: ObjectType | None = None
    # created timestamp
    created: int | None = None
    # model name
    model: str | None = None
    # choices list
    choices: List[ChatCompletionChoicesData] | None = None
    # prompt list
    prompt: List[PromptPart] | List[None] | None = None
    # token usage data
    usage: UsageData | None = None

ChatCompletionChoicesData (L163-168)

class ChatCompletionChoicesData(BaseModel):
    index: int | None = None
    logprobs: LogprobsPart | None = None
    seed: int | None = None
    finish_reason: FinishReason | None = None
    message: ChatCompletionMessage | None = None

ChatCompletionChunk (L196-210)

class ChatCompletionChunk(BaseModel):
    # request id
    id: str | None = None
    # object type
    object: ObjectType | None = None
    # created timestamp
    created: int | None = None
    # model name
    model: str | None = None
    # delta content
    choices: List[ChatCompletionChoicesChunk] | None = None
    # finish reason
    finish_reason: FinishReason | None = None
    # token usage data
    usage: UsageData | None = None

ChatCompletionChoicesChunk (L188-193)

class ChatCompletionChoicesChunk(BaseModel):
    index: int | None = None
    logprobs: float | None = None
    seed: int | None = None
    finish_reason: FinishReason | None = None
    delta: DeltaContent | None = None

Supporting Types (from together.types.common)

class FinishReason(str, Enum):
    Length = "length"
    StopSequence = "stop"
    EOS = "eos"
    ToolCalls = "tool_calls"
    Error = "error"
    Null = ""

class UsageData(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int

class DeltaContent(BaseModel):
    content: str | None = None

class LogprobsPart(BaseModel):
    tokens: List[str | None] | None = None
    token_logprobs: List[float | None] | None = None

class PromptPart(BaseModel):
    text: str | None = None
    logprobs: LogprobsPart | None = None
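Because FinishReason subclasses both str and Enum, its members compare equal to the raw strings the API sends, and the empty-string Null member is falsy. The sketch below illustrates this with a local copy of the enum shown above, so it runs without the SDK installed; is_truncated is a hypothetical helper, not part of the library.

```python
from enum import Enum


# Local copy of the enum shown above, so the sketch runs without the SDK.
class FinishReason(str, Enum):
    Length = "length"
    StopSequence = "stop"
    EOS = "eos"
    ToolCalls = "tool_calls"
    Error = "error"
    Null = ""


def is_truncated(reason):
    """Hypothetical helper: True when generation stopped at the token limit."""
    return reason == FinishReason.Length


# A str-based Enum member compares equal to its raw string value.
stop_equals_raw = FinishReason.StopSequence == "stop"

# Looking a member up by value turns a wire string into the enum member.
parsed = FinishReason("tool_calls")

# The empty-string member is falsy, like the empty string itself.
null_is_falsy = not FinishReason.Null
```

This is why a plain truthiness check on finish_reason skips both None and the Null member while a stream is still generating.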

Import

from together.types import ChatCompletionResponse, ChatCompletionChunk
from together.types.chat_completions import (
    ChatCompletionChoicesData,
    ChatCompletionChoicesChunk,
)
from together.types.common import UsageData, FinishReason, DeltaContent, LogprobsPart

I/O Contract

ChatCompletionResponse Fields

Field    Type                                     Description
id       str | None                               Unique request identifier.
object   ObjectType | None                        Object type, typically "chat.completion".
created  int | None                               Unix timestamp of response creation.
model    str | None                               Model identifier that generated the response.
choices  List[ChatCompletionChoicesData] | None   List of generated completions.
prompt   List[PromptPart] | List[None] | None     Prompt data (when echo is enabled).
usage    UsageData | None                         Token usage statistics.

ChatCompletionChoicesData Fields

Field          Type                          Description
index          int | None                    Choice index (0-based).
logprobs       LogprobsPart | None           Token log probabilities (when requested).
seed           int | None                    Random seed used for this choice.
finish_reason  FinishReason | None           Reason generation stopped: "stop", "length", "eos", "tool_calls", or "error".
message        ChatCompletionMessage | None  The generated message with role, content, and optional tool_calls.
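When log probabilities are requested, LogprobsPart carries the parallel lists tokens and token_logprobs, and either list may contain None entries. As a minimal sketch, the per-token values can be summed into a sequence log probability. Here sequence_logprob is a hypothetical helper, and a SimpleNamespace with made-up values stands in for a real LogprobsPart instance.

```python
import math
from types import SimpleNamespace


def sequence_logprob(logprobs):
    """Hypothetical helper: sum token log probabilities, skipping None entries.

    Returns None when no log probabilities are available.
    """
    if logprobs is None or not logprobs.token_logprobs:
        return None
    values = [lp for lp in logprobs.token_logprobs if lp is not None]
    return sum(values) if values else None


# Stand-in for a LogprobsPart instance (values are made up for illustration).
part = SimpleNamespace(tokens=["4", None, "."], token_logprobs=[-0.25, None, -0.5])

total = sequence_logprob(part)
# Exponentiating the sum gives the joint probability of the scored tokens.
probability = math.exp(total)
```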

ChatCompletionChunk Fields

Field          Type                                     Description
id             str | None                               Unique request identifier (same across all chunks in a stream).
object         ObjectType | None                        Object type, typically "chat.completion.chunk".
created        int | None                               Unix timestamp of chunk creation.
model          str | None                               Model identifier.
choices        List[ChatCompletionChoicesChunk] | None  List of chunk choices with delta content.
finish_reason  FinishReason | None                      Finish reason (set on the final chunk).
usage          UsageData | None                         Token usage (may appear on the final chunk).

ChatCompletionChoicesChunk Fields

Field          Type                 Description
index          int | None           Choice index (0-based).
logprobs       float | None         Log probability value for the token.
seed           int | None           Random seed used.
finish_reason  FinishReason | None  Finish reason (set on the final chunk for this choice).
delta          DeltaContent | None  Incremental content with a content string field.

UsageData Fields

Field              Type  Description
prompt_tokens      int   Number of tokens in the input prompt.
completion_tokens  int   Number of tokens generated in the completion.
total_tokens       int   Sum of prompt_tokens and completion_tokens.
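Since usage is optional on the response models while the three counts inside UsageData are required, a running total across several calls only needs to guard the outer None. The following is a sketch under that assumption: add_usage is a hypothetical helper, and the SimpleNamespace objects with made-up numbers stand in for response.usage values.

```python
from types import SimpleNamespace


def add_usage(totals, usage):
    """Hypothetical helper: add one response's usage to a running total dict."""
    if usage is None:  # usage is optional on the response models
        return totals
    for key in ("prompt_tokens", "completion_tokens", "total_tokens"):
        totals[key] = totals.get(key, 0) + getattr(usage, key)
    return totals


# Stand-ins for response.usage values (numbers are made up).
usages = [
    SimpleNamespace(prompt_tokens=12, completion_tokens=8, total_tokens=20),
    None,  # a response that carried no usage data
    SimpleNamespace(prompt_tokens=30, completion_tokens=5, total_tokens=35),
]

totals = {}
for usage in usages:
    add_usage(totals, usage)
```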

Usage Examples

Accessing Non-Streaming Response Content

from together import Together

client = Together()

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)

# Extract generated text
text = response.choices[0].message.content
print(f"Response: {text}")

# Check finish reason
finish_reason = response.choices[0].finish_reason
print(f"Finish reason: {finish_reason}")  # e.g., "stop", "length", "eos"

# Read token usage (usage is optional on the model, so guard against None)
if response.usage is not None:
    print(f"Prompt tokens: {response.usage.prompt_tokens}")
    print(f"Completion tokens: {response.usage.completion_tokens}")
    print(f"Total tokens: {response.usage.total_tokens}")

# Access metadata
print(f"Request ID: {response.id}")
print(f"Model: {response.model}")
print(f"Created: {response.created}")

Iterating Over Streaming Chunks

from together import Together

client = Together()

stream = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Tell me a short story."}],
    stream=True,
)

full_response = ""
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta and chunk.choices[0].delta.content:
        token = chunk.choices[0].delta.content
        full_response += token
        print(token, end="", flush=True)

    # Check for finish reason on final chunk
    if chunk.choices and chunk.choices[0].finish_reason:
        print(f"\nFinished: {chunk.choices[0].finish_reason}")

print(f"\nFull response: {full_response}")
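The loop above can be factored into a reusable collector that can be tested without a network call. This is a minimal sketch, not part of the SDK: collect_stream and fake_chunk are hypothetical names, and the SimpleNamespace stand-ins only mimic the attribute names of ChatCompletionChunk and ChatCompletionChoicesChunk.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Hypothetical helper: join delta content, keep the last finish reason and usage."""
    parts = []
    finish_reason = None
    usage = None
    for chunk in chunks:
        if chunk.usage is not None:
            usage = chunk.usage
        if not chunk.choices:  # choices may be None or empty
            continue
        choice = chunk.choices[0]
        if choice.delta is not None and choice.delta.content:
            parts.append(choice.delta.content)
        if choice.finish_reason:  # skips None and the empty-string reason
            finish_reason = choice.finish_reason
    return "".join(parts), finish_reason, usage


def fake_chunk(content=None, finish_reason=None, usage=None):
    """Build a stand-in with the same attribute names as a streaming chunk."""
    delta = SimpleNamespace(content=content)
    choice = SimpleNamespace(index=0, delta=delta, finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice], usage=usage)


chunks = [
    fake_chunk("Once "),
    fake_chunk("upon a time."),
    fake_chunk(finish_reason="stop", usage=SimpleNamespace(total_tokens=9)),
    SimpleNamespace(choices=None, usage=None),  # a chunk without choices is skipped
]

text, reason, usage = collect_stream(chunks)
```

Collecting the pieces in a list and joining once avoids repeated string concatenation on long streams. With a real client, the same function would be called as collect_stream(stream).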

Handling Tool Calls in Response

from together import Together
import json

client = Together()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",
)

choice = response.choices[0]

if choice.finish_reason == "tool_calls" and choice.message.tool_calls:
    for tool_call in choice.message.tool_calls:
        func_name = tool_call.function.name
        func_args = json.loads(tool_call.function.arguments)
        print(f"Tool call: {func_name}({func_args})")
else:
    print(f"Text response: {choice.message.content}")
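After parsing, each tool call usually has to be routed to local code. The sketch below shows one way to do that with a name-to-function registry. TOOL_REGISTRY, run_tool_calls, and the local get_weather implementation are all hypothetical, and a SimpleNamespace stands in for the tool_call objects accessed above.

```python
import json
from types import SimpleNamespace


def get_weather(city):
    """Hypothetical local implementation of the get_weather tool."""
    return f"Sunny in {city}"


# Hypothetical registry mapping tool names to local callables.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(tool_calls):
    """Run each tool call and return a list of (name, result) pairs."""
    results = []
    for call in tool_calls or []:
        name = call.function.name
        handler = TOOL_REGISTRY.get(name)
        if handler is None:
            results.append((name, f"unknown tool: {name}"))
            continue
        try:
            # The arguments field is a JSON string produced by the model.
            args = json.loads(call.function.arguments)
        except json.JSONDecodeError as exc:
            results.append((name, f"invalid arguments: {exc.msg}"))
            continue
        results.append((name, handler(**args)))
    return results


# Stand-in with the attribute names used in the example above.
call = SimpleNamespace(
    function=SimpleNamespace(name="get_weather", arguments='{"city": "Paris"}')
)
results = run_tool_calls([call])
```

Model-generated arguments are not guaranteed to be valid JSON, so the sketch reports a parse failure as a result instead of raising.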

Handling Multiple Choices

from together import Together

client = Together()

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Give me a creative name for a cat."}],
    n=3,
    temperature=1.0,
)

for choice in response.choices:
    print(f"Choice {choice.index}: {choice.message.content}")
    print(f"  Finish reason: {choice.finish_reason}")
    # Compare against None so that a seed of 0 is still printed
    if choice.seed is not None:
        print(f"  Seed: {choice.seed}")

Async Streaming

import asyncio
from together import AsyncTogether

async def stream_response():
    client = AsyncTogether()

    stream = await client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
        messages=[{"role": "user", "content": "Count to 10."}],
        stream=True,
    )

    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)
    print()

asyncio.run(stream_response())

Key Implementation Details

  • All fields on the response, chunk, and choice models are optional (None defaults) to gracefully handle partial or unexpected API responses. UsageData is the exception: its three token counts are required ints.
  • ChatCompletionResponse uses ChatCompletionChoicesData with a full message field, while ChatCompletionChunk uses ChatCompletionChoicesChunk with a delta field containing only incremental content.
  • The FinishReason enum includes a Null = "" variant to handle empty strings in streaming responses before generation completes.
  • LogprobsPart in ChatCompletionChoicesData provides structured token-level probabilities, while ChatCompletionChoicesChunk uses a simple float for the logprob value.
  • The DeltaContent model only contains a content: str | None field -- it does not include role or tool_calls deltas.
  • Response objects are constructed in src/together/resources/chat/completions.py: non-streaming responses are built as ChatCompletionResponse(**response.data), while streaming chunks are built as ChatCompletionChunk(**line.data) in a generator expression.
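Because every level of the response can be None, production code may prefer a guarded accessor over chained attribute access such as response.choices[0].message.content. A minimal sketch, assuming only the attribute names documented above: first_message_text is a hypothetical helper, and SimpleNamespace objects stand in for real responses.

```python
from types import SimpleNamespace


def first_message_text(response, default=""):
    """Hypothetical helper: return choices[0].message.content or a default."""
    choices = getattr(response, "choices", None)
    if not choices:  # None or an empty list
        return default
    message = choices[0].message
    if message is None or message.content is None:
        return default
    return message.content


# Stand-ins covering a complete response and two partial ones.
complete = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="4"))]
)
empty = SimpleNamespace(choices=None)
no_message = SimpleNamespace(choices=[SimpleNamespace(message=None)])
```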
