Heuristic:Guardrails ai Guardrails Guard History Memory Management
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Deployment |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
A memory management strategy for Guard call history, which is held only in memory and grows without bound in long-running services unless it is capped or disabled.
Description
Every Guard instance maintains an in-memory `Stack[Call]` that records the full history of all validation calls, including raw LLM outputs, validation results, and metadata. In long-running services (API servers, background workers), this history accumulates without bound unless explicitly managed. The framework provides two mechanisms for controlling memory usage: the `history_max_length` constructor parameter and the `GUARD_HISTORY_ENABLED` environment variable for remote mode.
Usage
Apply this heuristic when:
- Deploying Guardrails as a long-running API server where memory growth is a concern.
- Running high-throughput validation pipelines with many Guard calls.
- Using remote Guard mode (`Guard.load()`) where each call triggers a history fetch from the server.
The Insight (Rule of Thumb)
- Action: For production servers, set `GUARD_HISTORY_ENABLED=false` in remote mode to eliminate unnecessary API calls. For local mode, use the `history_max_length` parameter when constructing Guards to cap memory usage.
- Value: `Guard(name="my-guard", history_max_length=10)` for bounded local history. `GUARD_HISTORY_ENABLED=false` for remote mode.
- Trade-off: Limiting or disabling history reduces memory usage and API calls at the cost of debuggability. With a cap, `guard.history` retains only the most recent calls; with history disabled in remote mode, past validation runs can no longer be inspected programmatically through `guard.history`.
Reasoning
The source code contains a TODO comment: "Support a sink for history so that it is not solely held in memory". This indicates the developers recognize the current in-memory-only approach as a limitation. Until an external sink (database, file, message queue) is implemented, the only mitigation is to limit or disable history.
In remote mode (using `Guard.load()`), the problem is compounded: each validation call triggers an additional HTTP request to `get_history()` to sync the server-side history to the client. This doubles the API call count per validation. Setting `GUARD_HISTORY_ENABLED=false` eliminates these extra calls.
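The check quoted from `guardrails/guard.py` lowercases the variable and compares it to the literal string `true`. Read that way, an unset variable leaves the fetch enabled, and any other value (not only `false`) turns it off. A minimal sketch of that comparison in isolation follows; `history_fetch_enabled` is a hypothetical helper name used for illustration and is not part of Guardrails.

```python
import os


def history_fetch_enabled(env=None):
    """Mirror the quoted check: only the string "true" (any case) enables the fetch.

    Hypothetical helper for illustration; not a Guardrails function.
    """
    env = os.environ if env is None else env
    return env.get("GUARD_HISTORY_ENABLED", "true").lower() == "true"


# Unset: the history fetch stays on by default.
print(history_fetch_enabled({}))
# Explicitly disabled, as suggested for production servers.
print(history_fetch_enabled({"GUARD_HISTORY_ENABLED": "false"}))
# Case is ignored, so "TRUE" still enables the fetch.
print(history_fetch_enabled({"GUARD_HISTORY_ENABLED": "TRUE"}))
# Any value other than "true" disables it, including "1".
print(history_fetch_enabled({"GUARD_HISTORY_ENABLED": "1"}))
```

One practical consequence, assuming the quoted check is the only gate: setting the variable to `1` or `yes` in a deployment manifest would switch history off rather than on.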
The `Stack` class implements a `max_length` parameter that automatically evicts the oldest entries when the limit is reached, providing a bounded-memory approach for local usage.
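The eviction behaviour described above can be sketched with a small stand-in class. `BoundedStack` is a hypothetical name and this is not the Guardrails `Stack` implementation; it only models the idea that the oldest entries are dropped once `max_length` is exceeded, while `max_length=None` keeps everything.

```python
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class BoundedStack(Generic[T]):
    """Illustrative stand-in for a history stack with a length cap.

    Not the Guardrails `Stack` class; only the eviction behaviour is modelled.
    """

    def __init__(self, max_length: Optional[int] = None) -> None:
        self.max_length = max_length
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)
        # Drop the oldest entries once the cap is exceeded.
        if self.max_length is not None and len(self._items) > self.max_length:
            del self._items[: len(self._items) - self.max_length]

    def items(self) -> List[T]:
        # Return a copy so callers cannot mutate the internal list.
        return list(self._items)


history = BoundedStack(max_length=3)
for call_id in range(1, 6):
    history.push(f"call-{call_id}")

print(history.items())  # ['call-3', 'call-4', 'call-5']
```

With a cap of 3, five pushes leave only the three most recent entries, so memory held by history stays proportional to the cap rather than to the number of calls served.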
Evidence from source:
History Stack initialization from `guardrails/guard.py:161-162`:
```python
# TODO: Support a sink for history so that it is not solely held in memory
history: Stack[Call] = Stack(max_length=history_max_length)
```
Remote history fetch gated by env var from `guardrails/guard.py:1017-1025`:
```python
if os.environ.get("GUARD_HISTORY_ENABLED", "true").lower() == "true":
    guard_history = self._api_client.get_history(
        self.name, validation_output.call_id
    )
    call_log = safe_get(
        [c for c in guard_history if c.id == validation_output.call_id], 0
    )
    self.history.append(Call.from_interface(call_log))
```