Workflow:Groq Groq python Chat Completion

Knowledge Sources	Groq Python SDK Groq API Docs
Domains	LLMs, Inference, API_Client
Last Updated	2026-02-15 16:00 GMT

Overview

End-to-end process for generating chat completions from Groq-hosted LLMs using the synchronous Python client.

Description

This workflow covers the standard procedure for sending conversational prompts to Groq's inference API and receiving complete text responses. It uses the synchronous Groq client to authenticate, construct a message array (system, user, and optionally assistant messages), send the request to a specified model, and parse the returned completion. The workflow supports optional parameters for controlling generation behavior such as temperature, max tokens, top-p sampling, and stop sequences.

Usage

Execute this workflow when you need to generate a single, complete text response from a Groq-hosted language model (e.g., Mixtral, Llama) using a synchronous Python application. This is the most common integration pattern and is appropriate when you do not need incremental token streaming.

Execution Steps

Step 1: Client Initialization

Instantiate the Groq client with authentication credentials. The client reads the API key from the GROQ_API_KEY environment variable by default, or accepts it as an explicit parameter. Optional configuration includes base URL override, timeout settings, retry behavior, and custom HTTP client.

Key considerations:

The API key is required and should be stored securely (environment variable or .env file)
Default timeout is 60 seconds; configurable per-client or per-request
Default retry count is 2 with exponential backoff for transient errors (429, 5xx)

Step 2: Message Construction

Build the messages array that defines the conversation context. Each message is a dictionary with a role (system, user, or assistant) and content. The system message sets behavioral instructions, user messages contain the prompt, and assistant messages provide conversation history for multi-turn interactions.

Key considerations:

System messages are optional but recommended for controlling assistant behavior
Messages are processed in order; conversation history enables multi-turn dialogue
Content can be plain text or, for some models, structured content parts (text + images)

Step 3: Request Execution

Call the chat completions create endpoint with the messages array, model identifier, and optional generation parameters. The client serializes the request, sends it to the Groq API over HTTPS, handles any retries for transient failures, and blocks until the full response is received.

Key considerations:

Model must be a valid Groq-hosted model identifier (e.g., mixtral-8x7b-32768, llama-4-scout)
Temperature (0.0-2.0) controls randomness; lower values produce more deterministic output
max_tokens limits the response length; tokens are shared between prompt and completion
Stop sequences can be a string or array of strings that halt generation

Step 4: Response Parsing

Extract the generated text from the ChatCompletion response object. The response contains a choices array; the primary completion is at index 0. Each choice includes the message content, finish reason (stop, length, or tool_calls), and optional metadata. Token usage statistics are available in the usage field.

Key considerations:

Always access choices[0].message.content for the generated text
Check finish_reason to determine if the response was truncated (length) or complete (stop)
Usage statistics (prompt_tokens, completion_tokens, total_tokens) are useful for monitoring costs

Execution Diagram

GitHub URL

Workflow Repository