Heuristic:Guardrails ai Guardrails Guard History Memory Management

Knowledge Sources	Guardrails AI Internal pattern
Domains	Optimization, Deployment
Last Updated	2026-02-14 12:00 GMT

Overview

Memory management strategy for Guard call history, which is stored exclusively in-memory and can grow unboundedly in long-running services.

Description

Every Guard instance maintains an in-memory `Stack[Call]` that records the full history of all validation calls, including raw LLM outputs, validation results, and metadata. In long-running services (API servers, background workers), this history accumulates without bound unless explicitly managed. The framework provides two mechanisms for controlling memory usage: the `history_max_length` constructor parameter and the `GUARD_HISTORY_ENABLED` environment variable for remote mode.

Usage

Apply this heuristic when:

Deploying Guardrails as a long-running API server where memory growth is a concern.
Running high-throughput validation pipelines with many Guard calls.
Using remote Guard mode (`Guard.load()`) where each call triggers a history fetch from the server.

The Insight (Rule of Thumb)

Action: For production servers running in remote mode, set `GUARD_HISTORY_ENABLED=false` to skip the extra history-fetch request made after each validation. In local mode, pass `history_max_length` when constructing a Guard to cap memory usage.
Value: `Guard(name="my-guard", history_max_length=10)` for bounded local history. `GUARD_HISTORY_ENABLED=false` for remote mode.
Trade-off: Both settings reduce memory usage, and disabling remote history also removes one API call per validation. The cost is debuggability. With `history_max_length=N`, `guard.history` retains only the N most recent calls. With remote history disabled, remote calls are not appended to `guard.history` at all, so past validation runs cannot be inspected programmatically.

Reasoning

The source code contains a TODO comment: "Support a sink for history so that it is not solely held in memory". This indicates the developers recognize the current in-memory-only approach as a limitation. Until an external sink (database, file, message queue) is implemented, the only mitigation is to limit or disable history.

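In remote mode (using `Guard.load()`), the problem is compounded: after each validation call, the client makes an additional request through `self._api_client.get_history()` to copy the server-side call log into the local history. This doubles the number of API calls per validation. The fetch runs only when `GUARD_HISTORY_ENABLED`, lowercased, equals `"true"`, and the variable defaults to `"true"` when unset, so setting `GUARD_HISTORY_ENABLED=false` eliminates the extra calls.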
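The gate can be reproduced in isolation to see which values actually disable the fetch. The sketch below copies only the comparison from the excerpt in the evidence section; the helper names `remote_history_enabled` and `requests_for` are hypothetical and are not part of Guardrails, and the request count assumes exactly one validation request plus one optional history request per call.

```python
import os
from typing import Mapping


def remote_history_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Same comparison as the excerpt: case-insensitive match against "true"."""
    return env.get("GUARD_HISTORY_ENABLED", "true").lower() == "true"


def requests_for(validations: int, env: Mapping[str, str] = os.environ) -> int:
    """Hypothetical estimate: one validation request, plus one history fetch if enabled."""
    per_call = 2 if remote_history_enabled(env) else 1
    return validations * per_call


# Unset or any casing of "true" keeps the fetch on.
print(remote_history_enabled({}))                                  # True
print(remote_history_enabled({"GUARD_HISTORY_ENABLED": "TRUE"}))   # True

# Anything else turns it off, including values that look truthy.
print(remote_history_enabled({"GUARD_HISTORY_ENABLED": "false"}))  # False
print(remote_history_enabled({"GUARD_HISTORY_ENABLED": "1"}))      # False

print(requests_for(1000, {}))                                      # 2000
print(requests_for(1000, {"GUARD_HISTORY_ENABLED": "false"}))      # 1000
```

Note that because the check is an equality test against `"true"`, a value such as `1` or `yes` disables history rather than enabling it.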

The `Stack` class implements a `max_length` parameter that automatically evicts the oldest entries when the limit is reached, providing a bounded-memory approach for local usage.
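The eviction behavior described above can be illustrated with the standard library. This is a minimal sketch using `collections.deque`, not the Guardrails `Stack` implementation; the class name `BoundedHistory` and its methods are assumptions made for the example.

```python
from collections import deque
from typing import Deque, Generic, List, Optional, TypeVar

T = TypeVar("T")


class BoundedHistory(Generic[T]):
    """Illustrative stand-in for a max_length-capped history (not the Guardrails Stack)."""

    def __init__(self, max_length: Optional[int] = None) -> None:
        # maxlen=None means unbounded; otherwise appending to a full deque
        # silently discards the oldest entry.
        self._items: Deque[T] = deque(maxlen=max_length)

    def append(self, item: T) -> None:
        self._items.append(item)

    @property
    def last(self) -> Optional[T]:
        # Most recent entry, or None if nothing has been recorded.
        return self._items[-1] if self._items else None

    def __len__(self) -> int:
        return len(self._items)

    def to_list(self) -> List[T]:
        return list(self._items)


history: BoundedHistory[str] = BoundedHistory(max_length=3)
for call_id in ["call-1", "call-2", "call-3", "call-4", "call-5"]:
    history.append(call_id)

print(history.to_list())  # ['call-3', 'call-4', 'call-5']
print(history.last)       # call-5
```

With a cap in place, memory held by the history is proportional to `max_length` rather than to the number of calls served. The size of each retained entry still matters, since a call record includes raw LLM output.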

Evidence from source:

History Stack initialization from `guardrails/guard.py:161-162`:

```python
# TODO: Support a sink for history so that it is not solely held in memory
history: Stack[Call] = Stack(max_length=history_max_length)
```

Remote history fetch gated by env var from `guardrails/guard.py:1017-1025`:

```python
if os.environ.get("GUARD_HISTORY_ENABLED", "true").lower() == "true":
    guard_history = self._api_client.get_history(
        self.name, validation_output.call_id
    )
    call_log = safe_get(
        [c for c in guard_history if c.id == validation_output.call_id], 0
    )
    self.history.append(Call.from_interface(call_log))
```
