Workflow:Openai Openai agents python Guardrails Secured Agent

From Leeroopedia
Knowledge Sources
Domains AI_Agents, Safety, Guardrails
Last Updated 2026-02-11 14:00 GMT

Overview

End-to-end process for securing an AI agent with input guardrails, output guardrails, and tool-level guardrails that validate and filter content at multiple stages of execution.

Description

This workflow demonstrates the SDK's multi-layered guardrails system. Input guardrails validate user messages before they reach the model. Output guardrails validate the model's final response before it is returned. Tool-level guardrails (both input and output) intercept individual tool calls to block suspicious arguments or sensitive data in tool results. A tool-level guardrail can allow the content, reject it (sending a message back to the model so it can adjust), or raise an exception to halt execution entirely; agent-level guardrails halt the run by raising a tripwire exception. This layered approach enables defense-in-depth for production agent deployments.

Usage

Execute this workflow when deploying agents that handle sensitive data, interact with external systems, or operate in regulated environments. Use guardrails to prevent prompt injection, block sensitive data leakage, enforce content policies, and ensure tool calls comply with business rules.

Execution Steps

Step 1: Define Tool Input Guardrails

Create functions decorated with @tool_input_guardrail that inspect tool call arguments before the tool executes. The guardrail receives a ToolInputGuardrailData object containing the tool context and arguments. Return a ToolGuardrailFunctionOutput to approve, reject, or raise an exception.

Key considerations:

  • Use ToolGuardrailFunctionOutput(output_info=...) with its default behavior for approval (pass-through)
  • Use ToolGuardrailFunctionOutput.reject_content() to block the call and send a message back to the model
  • Use ToolGuardrailFunctionOutput.raise_exception() to halt execution entirely
  • Multiple input guardrails can be applied to a single tool
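A minimal sketch of Step 1, assuming the SDK is installed as the `agents` package and that the tool arguments arrive as a JSON string on `data.context.tool_arguments`. The word list and the names `find_blocked_word` and `build_tool_input_guardrail` are hypothetical; the policy check is kept as a plain function so it can be tested without the SDK, and the SDK import is deferred to the builder.

```python
import json

# Hypothetical policy list, for illustration only.
BLOCKED_WORDS = ("password", "exploit", "malware")


def find_blocked_word(arguments_json):
    """Return (argument_name, word) for the first blocked word found, else None."""
    try:
        args = json.loads(arguments_json) if arguments_json else {}
    except json.JSONDecodeError:
        # Unparseable arguments pass through here; a stricter policy could reject them.
        return None
    if not isinstance(args, dict):
        return None
    for key, value in args.items():
        text = str(value).lower()
        for word in BLOCKED_WORDS:
            if word in text:
                return key, word
    return None


def build_tool_input_guardrail():
    """Wrap the policy check in the SDK decorator (SDK imported lazily)."""
    # Assumption: these names are exported by the installed `agents` package.
    from agents import (
        ToolGuardrailFunctionOutput,
        ToolInputGuardrailData,
        tool_input_guardrail,
    )

    @tool_input_guardrail
    def reject_blocked_words(data: ToolInputGuardrailData) -> ToolGuardrailFunctionOutput:
        hit = find_blocked_word(data.context.tool_arguments)
        if hit is None:
            # Default behavior is pass-through.
            return ToolGuardrailFunctionOutput(output_info="input validated")
        argument, word = hit
        # Block the call and tell the model why, without halting the run.
        return ToolGuardrailFunctionOutput.reject_content(
            message=f"Tool call blocked: argument '{argument}' contains '{word}'.",
            output_info={"argument": argument, "blocked_word": word},
        )

    return reject_blocked_words
```

Swapping `reject_content(...)` for `raise_exception(output_info=...)` would turn the same check into a hard stop instead of feedback to the model.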

Step 2: Define Tool Output Guardrails

Create functions decorated with @tool_output_guardrail that inspect tool return values after execution but before results are sent to the model. The guardrail receives a ToolOutputGuardrailData object containing the tool output and context.

Key considerations:

  • Output guardrails run after the tool executes but before the model sees the result
  • Useful for detecting sensitive data (SSNs, credentials) in tool outputs
  • reject_content replaces the tool output with a message for the model
  • raise_exception halts execution and propagates ToolOutputGuardrailTripwireTriggered
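Step 2 can be sketched in the same way, assuming the tool result is available as `data.output` and the tool name as `data.context.tool_name`. The regular expression and the names `contains_ssn` and `build_tool_output_guardrail` are illustrative assumptions, and the pattern only catches SSNs written in the common `NNN-NN-NNNN` form.

```python
import re

# Matches US SSNs written as 123-45-6789; other formats are not covered.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def contains_ssn(tool_output) -> bool:
    """Return True if the stringified tool output contains an SSN-like value."""
    return SSN_PATTERN.search(str(tool_output)) is not None


def build_tool_output_guardrail():
    """Wrap the SSN check in the SDK decorator (SDK imported lazily)."""
    # Assumption: these names are exported by the installed `agents` package.
    from agents import (
        ToolGuardrailFunctionOutput,
        ToolOutputGuardrailData,
        tool_output_guardrail,
    )

    @tool_output_guardrail
    def block_ssn(data: ToolOutputGuardrailData) -> ToolGuardrailFunctionOutput:
        if contains_ssn(data.output):
            # Halt the run; the caller sees ToolOutputGuardrailTripwireTriggered.
            return ToolGuardrailFunctionOutput.raise_exception(
                output_info={"blocked_pattern": "SSN", "tool": data.context.tool_name},
            )
        return ToolGuardrailFunctionOutput(output_info="output validated")

    return block_ssn
```

Using `reject_content(message=...)` here instead would hide the raw tool output from the model while letting the run continue.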

Step 3: Define Agent-Level Guardrails (Optional)

Create InputGuardrail and OutputGuardrail instances for agent-level validation. Input guardrails run a secondary agent or function against the user input, concurrently with the main agent's first model call for efficiency. Output guardrails validate the agent's final response once it has been produced.

Key considerations:

  • Agent-level input guardrails run concurrently with the first model call
  • A guardrail agent can classify input as safe/unsafe
  • InputGuardrailTripwireTriggered and OutputGuardrailTripwireTriggered exceptions are raised on failure
  • Guardrails are defined on the Agent via input_guardrails and output_guardrails lists
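As a rough illustration of Step 3, the sketch below uses plain functions rather than a secondary guardrail agent. It assumes `InputGuardrail` and `OutputGuardrail` accept a `guardrail_function` that receives the run context, the agent, and the input or output, and returns a `GuardrailFunctionOutput`. The marker phrases and helper names are hypothetical, and stringifying a list-shaped input is a deliberate simplification.

```python
# Hypothetical phrases that suggest a prompt-injection attempt.
INJECTION_MARKERS = ("ignore previous instructions", "reveal your system prompt")


def find_injection_markers(text: str) -> list:
    """Return the marker phrases that appear in the text (case-insensitive)."""
    lowered = text.lower()
    return [marker for marker in INJECTION_MARKERS if marker in lowered]


def build_agent_guardrails():
    """Return (input_guardrail, output_guardrail); the SDK is imported lazily."""
    # Assumption: these names are exported by the installed `agents` package.
    from agents import GuardrailFunctionOutput, InputGuardrail, OutputGuardrail

    async def check_input(ctx, agent, user_input):
        # user_input may be a string or a list of input items; str() is a crude
        # but sufficient way to scan both shapes for marker phrases.
        hits = find_injection_markers(str(user_input))
        return GuardrailFunctionOutput(
            output_info={"markers": hits},
            tripwire_triggered=bool(hits),
        )

    async def check_output(ctx, agent, output):
        # Illustrative rule: trip if the reply appears to quote a system prompt.
        leaked = "system prompt:" in str(output).lower()
        return GuardrailFunctionOutput(
            output_info={"leaked_prompt": leaked},
            tripwire_triggered=leaked,
        )

    return (
        InputGuardrail(guardrail_function=check_input),
        OutputGuardrail(guardrail_function=check_output),
    )
```

A classifier agent could replace `find_injection_markers` by running it inside `check_input` and mapping its structured output to `tripwire_triggered`.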

Step 4: Attach Guardrails to Tools and Agent

Assign guardrails to the appropriate tools and agent. Set tool_input_guardrails and tool_output_guardrails properties on function tools. Set input_guardrails and output_guardrails on the Agent.

Key considerations:

  • Each tool can have independent guardrail sets
  • Agent-level and tool-level guardrails operate at different stages
  • Tool guardrails apply to every invocation of that specific tool
  • Agent guardrails apply to the overall agent run
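The wiring in Step 4 might look like the following sketch. It assumes function tools expose writable `tool_input_guardrails` and `tool_output_guardrails` attributes, as the text describes, and that `Agent` accepts `input_guardrails` and `output_guardrails` lists. The `send_email` tool, the agent name, and both helper functions are hypothetical.

```python
def attach_tool_guardrails(tool, input_guardrails=(), output_guardrails=()):
    """Set the two tool-level guardrail attributes on a tool object and return it."""
    tool.tool_input_guardrails = list(input_guardrails)
    tool.tool_output_guardrails = list(output_guardrails)
    return tool


def build_secured_agent(
    tool_input_guardrails=(),
    tool_output_guardrails=(),
    agent_input_guardrails=(),
    agent_output_guardrails=(),
):
    """Create one guarded tool and an agent that uses it (SDK imported lazily)."""
    # Assumption: these names are exported by the installed `agents` package.
    from agents import Agent, function_tool

    @function_tool
    def send_email(to: str, subject: str, body: str) -> str:
        """Send an email (stubbed for illustration)."""
        return f"Email sent to {to} with subject '{subject}'"

    # Tool-level guardrails apply to every invocation of this specific tool.
    attach_tool_guardrails(send_email, tool_input_guardrails, tool_output_guardrails)

    # Agent-level guardrails apply to the overall run.
    return Agent(
        name="Secured assistant",
        instructions="Help the user and call tools when needed.",
        tools=[send_email],
        input_guardrails=list(agent_input_guardrails),
        output_guardrails=list(agent_output_guardrails),
    )
```

Keeping the attachment in a small helper makes it easy to give each tool its own guardrail set while reusing the same guardrail objects across tools.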

Step 5: Execute and Handle Guardrail Events

Run the agent via Runner.run(). Guardrails execute automatically at their respective stages. If a tool guardrail rejects content, the rejection message is passed to the model in place of the tool result, and the model can adjust and continue. If a guardrail raises an exception, catch the specific exception type (ToolInputGuardrailTripwireTriggered, ToolOutputGuardrailTripwireTriggered, InputGuardrailTripwireTriggered, or OutputGuardrailTripwireTriggered) to handle the failure.

Key considerations:

  • Rejection guardrails are transparent to the caller; the model adapts its response
  • Exception guardrails require try/except handling
  • The exception objects carry the guardrail's output_info with details about what triggered the guardrail (for example, e.output.output_info on a tool guardrail exception)
  • Guardrail execution is logged in tracing spans for observability
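Exception handling for Step 5 could be sketched as below. It assumes the four tripwire exceptions are importable from `agents.exceptions`, that tool guardrail exceptions expose the guardrail result as `exc.output`, and that agent-level exceptions expose it as `exc.guardrail_result.output`; check these attribute names against the installed SDK version. `describe_block` and `run_guarded` are hypothetical helpers.

```python
def describe_block(stage: str, output_info) -> str:
    """Format a caller-facing message for a tripped guardrail."""
    return f"Request blocked at {stage} guardrail: {output_info!r}"


async def run_guarded(agent, user_input: str) -> str:
    """Run the agent and translate guardrail tripwires into plain messages."""
    # Assumption: these names exist in the installed SDK version.
    from agents import Runner
    from agents.exceptions import (
        InputGuardrailTripwireTriggered,
        OutputGuardrailTripwireTriggered,
        ToolInputGuardrailTripwireTriggered,
        ToolOutputGuardrailTripwireTriggered,
    )

    try:
        result = await Runner.run(agent, user_input)
    except ToolInputGuardrailTripwireTriggered as exc:
        return describe_block("tool input", exc.output.output_info)
    except ToolOutputGuardrailTripwireTriggered as exc:
        return describe_block("tool output", exc.output.output_info)
    except InputGuardrailTripwireTriggered as exc:
        return describe_block("agent input", exc.guardrail_result.output.output_info)
    except OutputGuardrailTripwireTriggered as exc:
        return describe_block("agent output", exc.guardrail_result.output.output_info)

    # Rejections never reach this handler: the model has already seen the
    # rejection message and produced its final output.
    return str(result.final_output)
```

From synchronous code this would be driven with `asyncio.run(run_guarded(agent, "..."))`, which requires the SDK and model credentials to be configured.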
