Principle:Openai Openai agents python Agent Level Guardrail Definition

From Leeroopedia

Overview

Agent-level guardrails operate on the agent's input (before or in parallel with the first LLM call) and on the agent's final output (after the response is produced). Unlike tool-level guardrails, which protect individual tool invocations, agent-level guardrails provide a broader validation layer that wraps the entire agent interaction. They come in two types: InputGuardrail and OutputGuardrail.

Core Theory

Input Guardrails

An InputGuardrail validates the user's input before or in parallel with the first LLM call. Input guardrails run only on the first turn and only for the first (starting) agent in a run. They do not re-execute on subsequent turns or when control is handed off to another agent.

The key design decisions around input guardrails are:

  • Parallel execution -- By default, run_in_parallel=True, meaning input guardrails execute concurrently with the first model call. This avoids adding latency to the critical path. If the guardrail triggers a tripwire, the model response is discarded.
  • Blocking execution -- Setting run_in_parallel=False forces the guardrail to complete before the model call begins. This adds the guardrail's latency to the first turn, so it is most useful when a discarded model call would be costly and you want to avoid spending model compute on inputs that will be rejected.
  • First-agent-only scope -- Input guardrails are intentionally limited to the starting agent. In multi-agent handoff chains, only the entry point validates the original user input. Downstream agents trust that input has already been vetted.

Output Guardrails

An OutputGuardrail validates the agent's final output after the agent has produced its response. Output guardrails run only for the final agent in a run -- the agent that actually produces the terminal response rather than handing off to another agent.

Output guardrails are useful for:

  • Ensuring the final response meets quality or safety standards.
  • Checking that structured output conforms to business rules beyond what the schema enforces.
  • Preventing the agent from producing responses that violate content policies.
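
The second use case, business rules that a schema cannot express, can be sketched in plain Python. This is a minimal illustration, not SDK code: CheckResult is a hypothetical stand-in for GuardrailFunctionOutput, and RefundDecision, MAX_AUTO_REFUND, and refund_policy_check are invented for the example. In the SDK, a function like this would presumably be wrapped as an OutputGuardrail and return a GuardrailFunctionOutput.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class CheckResult:
    """Hypothetical stand-in for GuardrailFunctionOutput (not the SDK class)."""
    output_info: Any
    tripwire_triggered: bool


@dataclass
class RefundDecision:
    """Hypothetical structured output; a schema would only check field types."""
    order_id: str
    amount: float
    currency: str


MAX_AUTO_REFUND = 500.0  # assumed business limit, for illustration only


def refund_policy_check(output: RefundDecision) -> CheckResult:
    """Flag outputs that are schema-valid but break business rules."""
    problems = []
    if output.amount <= 0:
        problems.append("amount must be positive")
    if output.amount > MAX_AUTO_REFUND:
        problems.append("amount exceeds automatic refund limit")
    if output.currency not in {"USD", "EUR"}:
        problems.append("unsupported currency")
    # Any violation trips the wire; the details travel in output_info.
    return CheckResult(output_info=problems, tripwire_triggered=bool(problems))
```

Every field of a 900.0 GBP refund is well-typed, so a schema accepts it, yet the check trips because two business rules are violated.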

Tripwire Mechanism

Both input and output guardrails use a tripwire mechanism rather than the allow/reject/raise pattern of tool guardrails. The guardrail function returns a GuardrailFunctionOutput with a tripwire_triggered boolean:

  • tripwire_triggered=False -- The input or output is acceptable. Execution continues normally.
  • tripwire_triggered=True -- The input or output is unacceptable. The framework raises InputGuardrailTripwireTriggered (for an input guardrail) or OutputGuardrailTripwireTriggered (for an output guardrail), which halts the run immediately.

The tripwire model is binary: either the agent proceeds or the run stops. There is no "soft rejection" equivalent to tool guardrails' reject_content. This reflects the different position in the pipeline -- agent-level guardrails protect the outermost boundary, where partial recovery is not meaningful.
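
The binary behavior can be modeled in a few lines. The sketch below is a toy model under stated assumptions, not the SDK's implementation: ToyGuardrailOutput and ToyTripwireTriggered are stand-ins for GuardrailFunctionOutput and the two tripwire exceptions, and enforce and no_secrets are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToyGuardrailOutput:
    """Stand-in for GuardrailFunctionOutput (not the SDK class)."""
    output_info: Any
    tripwire_triggered: bool


class ToyTripwireTriggered(Exception):
    """Stand-in for the SDK's tripwire exceptions."""

    def __init__(self, result: ToyGuardrailOutput) -> None:
        super().__init__("guardrail tripwire triggered")
        self.result = result


def enforce(guardrail: Callable[[str], ToyGuardrailOutput], text: str) -> str:
    """Pass text through unchanged, or halt by raising. No third outcome."""
    result = guardrail(text)
    if result.tripwire_triggered:
        raise ToyTripwireTriggered(result)
    return text


def no_secrets(text: str) -> ToyGuardrailOutput:
    """Hypothetical check: trip if the text mentions a password."""
    leaked = "password" in text.lower()
    return ToyGuardrailOutput(output_info={"leaked": leaked}, tripwire_triggered=leaked)
```

The caller either gets the original text back or catches the exception. Nothing in this model rewrites or partially accepts the content, which is the contrast with a soft rejection.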

Parallel vs. Sequential Execution

The run_in_parallel flag on InputGuardrail controls a trade-off between latency and wasted model compute:

  • Parallel (default) -- The guardrail and the model call start simultaneously. If the guardrail passes, the model response is used immediately with no added latency. If the guardrail fails, the model response is discarded and the exception is raised. This optimizes for the common case (guardrail passes) at the cost of wasted model compute in the rare case (guardrail triggers).
  • Sequential -- The guardrail must complete before the model call begins. This avoids wasted model compute but adds the guardrail's execution time to every request.

Output guardrails do not have this flag because they inherently run after the model has produced its response.
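
The two scheduling modes can be simulated with asyncio. This is a toy model rather than the SDK's runner: run_first_turn, fake_model, fake_guardrail, and ToyTripwire are hypothetical names. Cancelling the in-flight model task is also an assumption of this sketch, since the text above only says that the response is discarded.

```python
import asyncio
from typing import Any, Callable, Coroutine

model_calls: list[str] = []  # records every input the fake model was started with


class ToyTripwire(Exception):
    """Toy stand-in for InputGuardrailTripwireTriggered."""


async def fake_model(text: str) -> str:
    model_calls.append(text)
    await asyncio.sleep(0.01)  # pretend the model is slow
    return f"echo: {text}"


async def fake_guardrail(text: str) -> bool:
    await asyncio.sleep(0.005)  # pretend the check takes some time
    return "forbidden" in text  # True means the tripwire is triggered


async def run_first_turn(
    guardrail: Callable[[str], Coroutine[Any, Any, bool]],
    model_call: Callable[[str], Coroutine[Any, Any, str]],
    user_input: str,
    run_in_parallel: bool = True,
) -> str:
    if not run_in_parallel:
        # Sequential: the model call never starts if the guardrail trips.
        if await guardrail(user_input):
            raise ToyTripwire(user_input)
        return await model_call(user_input)

    # Parallel: start the model call first, then await the guardrail.
    model_task = asyncio.create_task(model_call(user_input))
    if await guardrail(user_input):
        model_task.cancel()  # drop the response (assumed behavior)
        raise ToyTripwire(user_input)
    return await model_task
```

With a rejected input, the parallel mode leaves one entry in model_calls (the wasted call), while the sequential mode leaves none. With an accepted input, both modes return the same response, but the sequential mode waits for the guardrail before the model call starts.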

Scope Rules

Understanding when guardrails execute is critical:

  • Input guardrails run on the first turn only, for the starting agent only. They do not re-run on subsequent turns in a multi-turn conversation, and an agent that receives control through a handoff does not run its own input guardrails.
  • Output guardrails run after the final agent produces output. If agent A hands off to agent B, only agent B's output guardrails run (assuming B is the terminal agent).

These scope rules prevent redundant validation and ensure guardrails run at the appropriate lifecycle points.
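
The scope rules reduce to a small selection function. The sketch below is a toy model, assuming a linear handoff chain in which every configured handoff is taken. ToyAgent and guardrails_that_run are hypothetical and do not mirror the SDK's Agent class.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyAgent:
    """Hypothetical agent record; guardrails are represented by their names."""
    name: str
    input_guardrails: list[str] = field(default_factory=list)
    output_guardrails: list[str] = field(default_factory=list)
    handoff_to: Optional["ToyAgent"] = None


def guardrails_that_run(starting_agent: ToyAgent) -> dict[str, list[str]]:
    """Return which guardrails fire for a run that begins at starting_agent."""
    # Input guardrails come from the starting agent only.
    selected = {"input": list(starting_agent.input_guardrails), "output": []}

    # Follow the handoff chain to the terminal agent.
    agent = starting_agent
    while agent.handoff_to is not None:
        agent = agent.handoff_to

    # Output guardrails come from the terminal agent only.
    selected["output"] = list(agent.output_guardrails)
    return selected
```

For a chain A -> B, the result contains A's input guardrails and B's output guardrails. B's input guardrails and A's output guardrails are never selected.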

Key Source References

  • Classes and decorators: src/agents/guardrail.py lines 72-185 (classes), lines 224-343 (decorators)

Import

from agents import InputGuardrail, OutputGuardrail, GuardrailFunctionOutput
from agents import input_guardrail, output_guardrail
