Principle:Spcl Graph of thoughts OpenAI Chat Integration

From Leeroopedia
Knowledge Sources
  • Domains: LLM_Integration, API_Design
  • Implementations: Implementation:Spcl_Graph_of_thoughts_ChatGPT
  • Last Updated: 2026-02-14

Overview

Integration pattern for connecting OpenAI's Chat Completion API as the language model backend in a graph-based reasoning framework.

The Graph of Thoughts framework requires a language model backend to generate, score, and refine thought states. This principle describes how the OpenAI Chat Completion API is integrated as one such backend, covering configuration loading, error resilience, cost accounting, response caching, and multi-response querying.

Core Concepts

Config-Based Initialization

The language model is initialized from a JSON configuration file rather than hardcoded parameters. The configuration specifies:

  • model_id -- the specific OpenAI model to use (e.g., gpt-4, gpt-3.5-turbo)
  • prompt_token_cost and response_token_cost -- cost per 1000 tokens for budget tracking
  • temperature -- controls randomness of the model's output
  • max_tokens -- maximum number of tokens to generate per completion
  • stop -- stop sequence(s) that terminate generation
  • organization -- OpenAI organization identifier
  • api_key -- API key (can also be sourced from the OPENAI_API_KEY environment variable)
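A minimal sketch of what such a configuration file might contain and how it is loaded. The field names come from the list above; the values shown (model, costs, limits) are illustrative only, not the repository's actual defaults:

```python
import json

# Hypothetical configuration file contents; field names follow the
# list above, values are illustrative placeholders.
config_text = """
{
  "model_id": "gpt-4",
  "prompt_token_cost": 0.03,
  "response_token_cost": 0.06,
  "temperature": 1.0,
  "max_tokens": 4096,
  "stop": null,
  "organization": "",
  "api_key": ""
}
"""

# Parse the JSON; "null" for stop becomes Python's None,
# meaning no explicit stop sequence is passed to the API.
config = json.loads(config_text)
```

Because every model-specific value lives in this file, pointing the framework at gpt-3.5-turbo instead of gpt-4 only requires editing `model_id` and the two cost fields.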

This approach lets the same codebase target different OpenAI models simply by swapping configuration files; no code changes are required.

Exponential Backoff on Errors

API calls to OpenAI can fail due to rate limits, server errors, or transient network issues. The integration uses exponential backoff to handle these failures gracefully:

  • On any OpenAIError, the system retries with exponentially increasing delays
  • Maximum retry time is capped at 10 seconds
  • Maximum number of retry attempts is 6
  • This is implemented via the backoff library's on_exception decorator
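The retry behavior described above can be sketched in pure standard-library Python. This is a stand-in for the backoff library's `on_exception` decorator with an exponential wait generator, not the framework's actual code; `TransientAPIError` stands in for `openai.OpenAIError`:

```python
import random
import time


class TransientAPIError(Exception):
    """Stand-in for openai.OpenAIError in this sketch."""


def retry_with_backoff(fn, max_tries=6, max_time=10.0, base=1.0):
    """Retry fn with exponentially growing, jittered delays.

    Mirrors backoff.on_exception(backoff.expo, OpenAIError,
    max_time=10, max_tries=6): give up after 6 attempts or once
    roughly 10 seconds of retrying have elapsed.
    """
    start = time.monotonic()
    for attempt in range(max_tries):
        try:
            return fn()
        except TransientAPIError:
            elapsed = time.monotonic() - start
            if attempt == max_tries - 1 or elapsed >= max_time:
                raise  # budget exhausted; propagate the failure
            # Exponential wait (base * 2**attempt) with full jitter,
            # clipped so we never sleep past the overall time budget.
            delay = min(random.uniform(0, base * 2 ** attempt),
                        max_time - elapsed)
            time.sleep(delay)
```

Jitter spreads out retries from concurrent clients, which matters when the original failure was a shared rate limit.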

This pattern ensures that transient failures do not crash the entire reasoning pipeline.

Token Cost Tracking

Every API call accumulates token usage statistics:

  • prompt_tokens -- total tokens sent to the model across all calls
  • completion_tokens -- total tokens received from the model across all calls
  • cost -- running monetary cost computed as: (prompt_tokens / 1000) * prompt_token_cost + (completion_tokens / 1000) * response_token_cost
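The accumulation step might look like the following sketch. `update_usage` is a hypothetical helper; it assumes the response exposes a `usage` object with `prompt_tokens` and `completion_tokens` fields, as the OpenAI client returns:

```python
def update_usage(stats, response, prompt_token_cost, response_token_cost):
    """Accumulate token counts and recompute the running cost.

    stats: dict with keys prompt_tokens, completion_tokens, cost.
    Costs are per 1000 tokens, matching the configuration fields.
    """
    stats["prompt_tokens"] += response.usage.prompt_tokens
    stats["completion_tokens"] += response.usage.completion_tokens
    # Running cost over all calls so far, per the formula above.
    stats["cost"] = (
        stats["prompt_tokens"] / 1000 * prompt_token_cost
        + stats["completion_tokens"] / 1000 * response_token_cost
    )
    return stats
```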

This enables budget-aware execution where the framework can monitor and limit spending during complex multi-step reasoning.

Response Caching

When caching is enabled, the system stores LLM responses keyed by the query string. Subsequent identical queries return the cached response without making an API call. This is particularly useful during:

  • Development and debugging of prompt templates
  • Repeated evaluation runs on the same dataset
  • Unit testing of downstream parsing logic
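A minimal sketch of query-string-keyed caching, with hypothetical names (`CachedLM`, `backend`); the real implementation wraps the OpenAI client rather than an arbitrary callable:

```python
class CachedLM:
    """Cache LLM responses keyed by the exact query string.

    backend: any callable mapping a query string to a response.
    When caching is enabled, identical queries are served from the
    in-memory dict instead of triggering another API call.
    """

    def __init__(self, backend, cache_enabled=True):
        self.backend = backend
        self.cache_enabled = cache_enabled
        self.response_cache = {}

    def query(self, query_str):
        if self.cache_enabled and query_str in self.response_cache:
            return self.response_cache[query_str]  # cache hit: no API call
        response = self.backend(query_str)
        if self.cache_enabled:
            self.response_cache[query_str] = response
        return response
```

Note that keying on the raw query string means any change to the prompt template, however small, produces a cache miss.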

Multi-Response Querying

The integration supports requesting multiple responses from a single prompt:

  • Single response (num_responses=1) -- makes one API call and returns a single ChatCompletion
  • Multiple responses (num_responses>1) -- attempts to get all responses in one call via the n parameter; if the API rejects the batch size, it halves the request and retries

This adaptive batching strategy maximizes throughput while gracefully handling API limits on the n parameter.
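The halving strategy can be sketched as follows. `api_call` is a hypothetical callable standing in for a Chat Completion request with the `n` parameter; here it raises `ValueError` when the batch is rejected, whereas the real API signals this through an error response:

```python
def query_n(api_call, prompt, num_responses):
    """Collect num_responses completions, halving the batch size
    whenever the API rejects the requested n.

    api_call(prompt, n) -> list of n completions, raising ValueError
    if n exceeds what the API allows (sketch-level assumption).
    """
    results = []
    remaining = num_responses
    batch = num_responses  # start optimistically: everything in one call
    while remaining > 0:
        try:
            take = min(batch, remaining)
            results.extend(api_call(prompt, take))
            remaining -= take
        except ValueError:
            batch = max(1, batch // 2)  # rejected: halve and retry
    return results
```

In the best case this costs a single API round trip; in the worst case it degrades to one completion per call.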

Interaction with the Framework

The OpenAI integration serves as a pluggable backend behind the AbstractLanguageModel interface. The framework's operations (Generate, Score, Improve, etc.) interact with it through two methods:

  1. query(query, num_responses) -- sends a text prompt and returns raw API response(s)
  2. get_response_texts(query_response) -- extracts plain text strings from the API response objects

This abstraction allows the framework to switch between OpenAI and other backends (e.g., local HuggingFace models) without modifying operation logic.
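The two-method contract above can be sketched as an abstract base class. The method names come from the text; the exact signatures and the toy `EchoLM` backend are assumptions for illustration:

```python
from abc import ABC, abstractmethod


class AbstractLanguageModel(ABC):
    """Pluggable backend interface used by the framework's operations."""

    @abstractmethod
    def query(self, query, num_responses=1):
        """Send a text prompt; return raw API response object(s)."""

    @abstractmethod
    def get_response_texts(self, query_response):
        """Extract plain text strings from the raw response(s)."""


class EchoLM(AbstractLanguageModel):
    """Toy backend standing in for the OpenAI-backed implementation."""

    def query(self, query, num_responses=1):
        # Fake "API responses": one dict per requested completion.
        return [{"text": query}] * num_responses

    def get_response_texts(self, query_response):
        return [r["text"] for r in query_response]
```

Because operations only ever call `query` and `get_response_texts`, swapping `EchoLM` for the real OpenAI backend (or a local HuggingFace model) requires no changes to operation logic.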

Design Rationale

The separation of configuration from code, combined with automatic cost tracking and error resilience, reflects the principle that LLM integrations in research frameworks must be both reproducible (same config yields same setup) and robust (transient failures do not invalidate long-running experiments).

Related Pages

GitHub URL

graph_of_thoughts/language_models/chatgpt.py
