Principle:Openai Openai agents python Tool Output Guardrail Definition
Overview
Tool output guardrails provide post-execution validation on the values returned by tool functions. After a tool runs and produces its output, each configured output guardrail inspects that output and decides whether to pass it back to the LLM, replace it with an error message, or halt the entire run. This enables filtering of sensitive data, detection of problematic outputs, and content moderation after tool execution.
Core Theory
Post-Execution Validation
Tool output guardrails occupy the pipeline position between the completion of a tool function and the delivery of its result back to the LLM. When a tool finishes execution and returns a value, the framework routes that value through every output guardrail attached to the tool before incorporating it into the conversation. Only outputs that pass all guardrails are delivered to the model.
This design means:
- Tool functions do not need to implement output sanitization or filtering logic themselves.
- Sensitive data handling can be centralized in reusable guardrail functions.
- The same tool can have different output guardrails in different contexts without changing the tool code.
Three Behavior Options
Just like input guardrails, every tool output guardrail returns a ToolGuardrailFunctionOutput signaling one of three outcomes:
- allow -- The tool output is safe and should be sent back to the LLM as-is. This is the normal case where the output passes validation.
- reject_content -- The tool output is problematic and should not be sent to the LLM. Instead, an error message (specified in the rejection) is sent in place of the actual tool output. The LLM sees only the error message and can decide how to proceed.
- raise_exception -- The tool output is so dangerous that the entire run must stop immediately. A ToolOutputGuardrailTripwireTriggered exception is raised.
The reject_content behavior is particularly useful for output guardrails because it prevents sensitive data from ever reaching the LLM's context window. If a tool accidentally returns personally identifiable information (PII) or confidential data, a rejection guardrail ensures the model never sees that data and cannot leak it in subsequent responses.
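The three outcomes can be modeled with a small standard-library sketch. GuardrailDecision and OutputTripwire are hypothetical stand-ins for ToolGuardrailFunctionOutput and ToolOutputGuardrailTripwireTriggered, and apply_decision is an illustrative helper, not a function of the SDK; the sketch only shows what the model would end up seeing in each case.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class GuardrailDecision:
    """Hypothetical stand-in for ToolGuardrailFunctionOutput."""
    behavior: str  # "allow", "reject_content", or "raise_exception"
    message: Optional[str] = None  # only meaningful for reject_content


class OutputTripwire(Exception):
    """Hypothetical stand-in for ToolOutputGuardrailTripwireTriggered."""


def apply_decision(tool_output: Any, decision: GuardrailDecision) -> Any:
    """Return what the model would see, or raise to halt the run."""
    if decision.behavior == "allow":
        return tool_output  # passed through unchanged
    if decision.behavior == "reject_content":
        return decision.message  # the model sees only the error message
    if decision.behavior == "raise_exception":
        raise OutputTripwire("tool output guardrail halted the run")
    raise ValueError(f"unknown behavior: {decision.behavior!r}")
```

With reject_content, the original tool output never appears in the returned value, which is the property that keeps sensitive data out of the model's context.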
ToolOutputGuardrailData
The output guardrail function receives a ToolOutputGuardrailData instance that extends ToolInputGuardrailData with an additional field:
- context -- A ToolContext object containing the tool call arguments and contextual information (inherited from ToolInputGuardrailData).
- agent -- The Agent instance that triggered the tool call (inherited from ToolInputGuardrailData).
- output -- The actual return value of the tool function. This is the value being validated.
Having access to both the input arguments and the output allows guardrails to implement contextual validation rules. For example, a guardrail could verify that the output is consistent with what was requested, or that a database query tool did not return more data than the query should have produced.
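As a rough illustration of contextual validation, the sketch below checks that a query tool returned no more rows than the call asked for. FakeToolContext and FakeOutputGuardrailData are hypothetical stand-ins built from plain dataclasses; the tool_arguments JSON string and the limit argument are assumptions of this sketch, not a statement about the real ToolContext layout.

```python
import json
from dataclasses import dataclass
from typing import Any


@dataclass
class FakeToolContext:
    """Hypothetical stand-in for ToolContext; holds the raw JSON arguments."""
    tool_arguments: str


@dataclass
class FakeOutputGuardrailData:
    """Hypothetical stand-in for ToolOutputGuardrailData."""
    context: FakeToolContext
    agent: Any
    output: Any


def rows_within_limit(data: FakeOutputGuardrailData) -> bool:
    """True when the tool returned no more rows than the call requested."""
    args = json.loads(data.context.tool_arguments)
    limit = args.get("limit")  # hypothetical argument of a query tool
    if limit is None:
        return True  # nothing to compare the output against
    return len(data.output) <= limit
```

A real guardrail would translate a False result into a rejection or an exception rather than returning a bare boolean.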
Common Use Cases
Tool output guardrails are particularly well-suited for:
- PII filtering -- Detecting and blocking outputs that contain Social Security numbers, credit card numbers, email addresses, or other personally identifiable information.
- Content moderation -- Checking tool outputs for offensive, harmful, or inappropriate content before the LLM incorporates it into a response.
- Output sanitization -- Ensuring tool outputs do not contain executable code, injection payloads, or other dangerous content that could affect downstream processing.
- Data loss prevention -- Verifying that tools with access to sensitive systems do not inadvertently expose confidential information.
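The PII filtering case might be approximated with regular expressions, as in the sketch below. The patterns, the find_pii and check_output helpers, and the (behavior, message) tuple are all illustrative assumptions; the patterns are deliberately simple and would miss many real-world formats, so a production check would likely rely on a dedicated detection service.

```python
import re
from typing import List, Optional, Tuple

# Deliberately simple patterns, for illustration only.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def find_pii(text: str) -> List[str]:
    """Return the names of the PII categories detected in the text."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(text)]


def check_output(output: object) -> Tuple[str, Optional[str]]:
    """Map a tool output to a (behavior, message) pair."""
    found = find_pii(str(output))
    if found:
        # The message names the categories but never echoes the data itself.
        return ("reject_content", "Tool output withheld: contains " + ", ".join(found))
    return ("allow", None)
```

Keeping the matched values out of the rejection message matters here, because that message is what the model receives in place of the tool output.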
Sync and Async Support
Output guardrails support both synchronous and asynchronous function signatures. Asynchronous guardrails are especially relevant for output validation because post-processing checks often involve external services (e.g., calling a content moderation API or checking a PII detection service).
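One common way for a runner to accept both styles is to call the function and await the result only when it is awaitable. The sketch below shows that pattern with the standard library; run_guardrail and the two example guardrails are hypothetical, and this is not claimed to be how the framework implements it.

```python
import asyncio
import inspect
from typing import Any, Awaitable, Callable, Union

GuardrailFn = Callable[[Any], Union[str, Awaitable[str]]]


def sync_guardrail(output: Any) -> str:
    """Synchronous check: a purely local string test."""
    return "reject_content" if "secret" in str(output) else "allow"


async def async_guardrail(output: Any) -> str:
    """Asynchronous check: the sleep stands in for an external service call."""
    await asyncio.sleep(0)
    return "reject_content" if "secret" in str(output) else "allow"


async def run_guardrail(fn: GuardrailFn, output: Any) -> str:
    """Invoke a guardrail, awaiting the result only if it is awaitable."""
    result = fn(output)
    if inspect.isawaitable(result):
        result = await result
    return result
```

Both kinds of guardrail can then be driven through the same entry point, for example asyncio.run(run_guardrail(sync_guardrail, "hello")).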
Stacking Multiple Guardrails
Multiple output guardrails can be attached to a single tool via the tool_output_guardrails list on FunctionTool. All guardrails are evaluated, and if any rejects or raises an exception, the tool output is blocked from reaching the LLM.
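The stacking rule can be sketched as follows. Every guardrail in the list is evaluated, and the output passes through only if all of them allow it. The precedence chosen here (an exception wins over a rejection, and the first rejection message is the one returned) is an assumption of this sketch, as are the names run_stack, no_secrets, no_malware, and OutputBlocked.

```python
from typing import Any, Callable, List, Optional, Tuple

Decision = Tuple[str, Optional[str]]  # (behavior, rejection message)
Guardrail = Callable[[Any], Decision]


class OutputBlocked(Exception):
    """Hypothetical stand-in for the tripwire exception."""


def no_secrets(output: Any) -> Decision:
    if "secret" in str(output):
        return ("reject_content", "Output withheld: contains a secret.")
    return ("allow", None)


def no_malware(output: Any) -> Decision:
    if "malware" in str(output):
        return ("raise_exception", None)
    return ("allow", None)


def run_stack(output: Any, guardrails: List[Guardrail]) -> Any:
    """Evaluate every guardrail, then apply the most severe decision."""
    decisions = [guardrail(output) for guardrail in guardrails]
    if any(behavior == "raise_exception" for behavior, _ in decisions):
        raise OutputBlocked("a guardrail halted the run")
    for behavior, message in decisions:
        if behavior == "reject_content":
            return message  # first rejection message replaces the output
    return output  # every guardrail allowed the output
```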
Key Source References
- Decorator definition: src/agents/tool_guardrails.py lines 264-279
- Class definition: src/agents/tool_guardrails.py lines 181-206
Import
from agents import tool_output_guardrail, ToolOutputGuardrail
See Also
- Implementation:Openai_Openai_agents_python_Tool_Output_Guardrail_Decorator
- Tool Input Guardrail Definition -- Pre-execution validation on tool inputs
- Agent Level Guardrail Definition -- Guardrails on agent input and output
- Guardrail Attachment -- How to wire guardrails to tools and agents
- Guardrail Execution -- The runtime pipeline that invokes guardrails