Principle:Openai Openai agents python Tool Output Guardrail Definition

From Leeroopedia
Revision as of 17:49, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Openai_Openai_agents_python_Tool_Output_Guardrail_Definition.md)

Overview

Tool output guardrails validate the values returned by tool functions after execution. Once a tool runs and produces its output, each configured output guardrail inspects that output and decides whether to pass it back to the LLM, replace it with an error message, or halt the entire run. This enables filtering of sensitive data, detection of problematic outputs, and content moderation after tool execution.

Core Theory

Post-Execution Validation

Tool output guardrails occupy the pipeline position between the completion of a tool function and the delivery of its result back to the LLM. When a tool finishes execution and returns a value, the framework routes that value through every output guardrail attached to the tool before incorporating it into the conversation. Only outputs that pass all guardrails are delivered to the model.

This design means:

  • Tool functions do not need to implement output sanitization or filtering logic themselves.
  • Sensitive data handling can be centralized in reusable guardrail functions.
  • The same tool can have different output guardrails in different contexts without changing the tool code.

Three Behavior Options

Like tool input guardrails, every tool output guardrail returns a ToolGuardrailFunctionOutput signaling one of three outcomes:

  • allow -- The tool output is safe and should be sent back to the LLM as-is. This is the normal case where the output passes validation.
  • reject_content -- The tool output is problematic and should not be sent to the LLM. Instead, an error message (specified in the rejection) is sent in place of the actual tool output. The LLM sees only the error message and can decide how to proceed.
  • raise_exception -- The tool output is so dangerous that the entire run must stop immediately. A ToolOutputGuardrailTripwireTriggered exception is raised.

The reject_content behavior is particularly useful for output guardrails because it prevents sensitive data from ever reaching the LLM's context window. If a tool accidentally returns personally identifiable information (PII) or confidential data, a rejection guardrail ensures the model never sees that data and cannot leak it in subsequent responses.
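The three outcomes above can be sketched in plain Python. This is a minimal model of the behavior, not the SDK's implementation: GuardrailResult, TripwireTriggered, and deliver_to_llm are hypothetical stand-ins for ToolGuardrailFunctionOutput, ToolOutputGuardrailTripwireTriggered, and the framework's internal handling.

```python
from dataclasses import dataclass
from typing import Any, Optional


class TripwireTriggered(Exception):
    """Hypothetical stand-in for ToolOutputGuardrailTripwireTriggered."""


@dataclass
class GuardrailResult:
    """Hypothetical stand-in for ToolGuardrailFunctionOutput."""
    behavior: str                  # "allow", "reject_content", or "raise_exception"
    message: Optional[str] = None  # replacement text, used only by reject_content


def deliver_to_llm(tool_output: Any, result: GuardrailResult) -> Any:
    """Return what the model would see for a given guardrail result."""
    if result.behavior == "allow":
        return tool_output          # output passes through unchanged
    if result.behavior == "reject_content":
        return result.message       # the model sees only the error message
    if result.behavior == "raise_exception":
        raise TripwireTriggered("tool output guardrail halted the run")
    raise ValueError(f"unknown behavior: {result.behavior!r}")


raw = "Customer SSN: 123-45-6789"
seen_ok = deliver_to_llm("Order shipped", GuardrailResult("allow"))
seen_rejected = deliver_to_llm(
    raw, GuardrailResult("reject_content", "Output withheld: contains PII.")
)
```

In the rejected case the raw string never reaches the value handed to the model, which is the property that makes reject_content useful for PII.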

ToolOutputGuardrailData

The output guardrail function receives a ToolOutputGuardrailData instance, which extends ToolInputGuardrailData with one additional field, output. Its fields are:

  • context -- A ToolContext object containing the tool call arguments and contextual information (inherited from ToolInputGuardrailData).
  • agent -- The Agent instance that triggered the tool call (inherited from ToolInputGuardrailData).
  • output -- The actual return value of the tool function. This is the value being validated.

Having access to both the input arguments and the output allows guardrails to implement contextual validation rules. For example, a guardrail could verify that the output is consistent with what was requested, or that a database query tool did not return more data than the query should have produced.
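A contextual check of this kind might look like the sketch below. OutputData is a hypothetical stand-in for ToolOutputGuardrailData, and it assumes the tool arguments are already available as a parsed dict. In the real framework the arguments would be read from the ToolContext, whose exact shape is not shown here.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class OutputData:
    """Hypothetical stand-in for ToolOutputGuardrailData."""
    tool_arguments: dict  # assumption: arguments already parsed from the context
    agent_name: str       # stands in for the Agent instance
    output: Any           # the tool's return value being validated


def row_limit_guardrail(data: OutputData) -> str:
    """Reject when a query tool returns more rows than the call requested."""
    requested = data.tool_arguments.get("limit")
    if requested is not None and len(data.output) > requested:
        return "reject_content"
    return "allow"


within_limit = OutputData({"limit": 2}, "support_agent", [{"id": 1}, {"id": 2}])
over_limit = OutputData(
    {"limit": 2}, "support_agent", [{"id": 1}, {"id": 2}, {"id": 3}]
)
```

The guardrail needs both halves of the call: the limit comes from the input arguments, and the row count comes from the output.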

Common Use Cases

Tool output guardrails are particularly well-suited for:

  • PII filtering -- Detecting and blocking outputs that contain Social Security numbers, credit card numbers, email addresses, or other personally identifiable information.
  • Content moderation -- Checking tool outputs for offensive, harmful, or inappropriate content before the LLM incorporates it into a response.
  • Output sanitization -- Ensuring tool outputs do not contain executable code, injection payloads, or other dangerous content that could affect downstream processing.
  • Data loss prevention -- Verifying that tools with access to sensitive systems do not inadvertently expose confidential information.
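As an illustration of the PII filtering case, the detection step inside a guardrail could be a simple pattern scan. The patterns and the find_pii helper below are illustrative assumptions only: they will miss many real formats and can produce false positives, so a production guardrail would more likely call a dedicated detection service.

```python
import re

# Illustrative patterns only; not exhaustive and not validated against real data.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13 to 16 digits, optionally separated by single spaces or hyphens
    "credit_card": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def find_pii(text: str) -> list:
    """Return the sorted names of the PII kinds detected in text."""
    return sorted(
        kind for kind, pattern in PII_PATTERNS.items() if pattern.search(text)
    )


def pii_verdict(tool_output: str) -> str:
    """Map detection to a guardrail behavior name."""
    return "reject_content" if find_pii(tool_output) else "allow"
```

Returning the detected kinds rather than a bare boolean lets the rejection message say what was withheld without repeating the sensitive value itself.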

Sync and Async Support

Output guardrails support both synchronous and asynchronous function signatures. Asynchronous guardrails are especially relevant for output validation because post-processing checks often involve external services (e.g., calling a content moderation API or checking a PII detection service).
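One common way for a runner to accept both signatures is to call the function and await the result only if it is awaitable. The sketch below shows that pattern with hypothetical guardrail functions; it is not taken from the framework's source, and asyncio.sleep(0) is a placeholder for a call to an external moderation service.

```python
import asyncio
import inspect


def sync_guardrail(output: str) -> str:
    """Synchronous check: a cheap local keyword scan."""
    return "reject_content" if "secret" in output.lower() else "allow"


async def async_guardrail(output: str) -> str:
    """Asynchronous check: placeholder for a remote moderation call."""
    await asyncio.sleep(0)  # stands in for network I/O
    return "reject_content" if "badword" in output.lower() else "allow"


async def run_guardrail(guardrail, output: str) -> str:
    """Run a guardrail whether it is a plain function or a coroutine function."""
    result = guardrail(output)
    if inspect.isawaitable(result):
        result = await result
    return result


async def main() -> list:
    return [
        await run_guardrail(sync_guardrail, "top SECRET plans"),
        await run_guardrail(async_guardrail, "hello"),
    ]


verdicts = asyncio.run(main())
```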

Stacking Multiple Guardrails

Multiple output guardrails can be attached to a single tool via the tool_output_guardrails list on FunctionTool. All guardrails are evaluated, and if any of them rejects the output or raises an exception, the tool output is blocked from reaching the LLM.
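The stacking rule can be modeled as follows. The sketch evaluates every guardrail and then applies the most severe verdict. That precedence, and the names HaltRun, no_ssn, no_private_key, and resolve, are assumptions made for illustration rather than documented framework behavior.

```python
import re
from typing import Callable, List, Tuple

Verdict = Tuple[str, str]  # (behavior, message)


class HaltRun(Exception):
    """Hypothetical stand-in for the tripwire exception."""


def no_ssn(output: str) -> Verdict:
    if re.search(r"\b\d{3}-\d{2}-\d{4}\b", output):
        return ("reject_content", "Output withheld: possible SSN detected.")
    return ("allow", "")


def no_private_key(output: str) -> Verdict:
    if "BEGIN PRIVATE KEY" in output:
        return ("raise_exception", "Private key material in tool output.")
    return ("allow", "")


def resolve(output: str, guardrails: List[Callable[[str], Verdict]]) -> str:
    """Evaluate every guardrail, then apply the most severe verdict."""
    verdicts = [guardrail(output) for guardrail in guardrails]
    for behavior, message in verdicts:
        if behavior == "raise_exception":
            raise HaltRun(message)   # assumption: halting outranks rejection
    for behavior, message in verdicts:
        if behavior == "reject_content":
            return message           # the model sees the error message instead
    return output                    # every guardrail allowed the output


guards = [no_ssn, no_private_key]
```

Because each guardrail is an independent function, the same list can be reused across tools or varied per tool without touching the tool code.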

Key Source References

  • Decorator definition: src/agents/tool_guardrails.py lines 264-279
  • Class definition: src/agents/tool_guardrails.py lines 181-206

Import

from agents import tool_output_guardrail, ToolOutputGuardrail
