Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Langchain ai Langchain OpenAI Moderation Middleware

From Leeroopedia
Revision as of 11:24, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Langchain_ai_Langchain_OpenAI_Moderation_Middleware.md)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Knowledge Sources
Domains LLM Integration, Content Moderation, Agent Middleware
Last Updated 2026-02-11 00:00 GMT

Overview

Agent middleware that integrates OpenAI's moderation endpoint to check and filter messages for policy violations in LangChain agent pipelines.

Description

OpenAIModerationMiddleware is a class in the langchain-openai partner package that implements the AgentMiddleware protocol. It intercepts agent traffic (user inputs, model outputs, and tool results) and sends their text content to the OpenAI Moderation API for safety screening. When flagged content is detected, the middleware can raise an error, end the conversation with a violation message, or replace the offending message content, depending on the configured exit_behavior. The module also exports OpenAIModerationError, a custom exception raised when the exit behavior is set to "error".

Usage

Import OpenAIModerationMiddleware when building LangChain agent pipelines that need automated content moderation via the OpenAI Moderation API. Use OpenAIModerationError to catch moderation violations programmatically.

Code Reference

Source Location

  • Repository: Langchain_ai_Langchain
  • File: libs/partners/openai/langchain_openai/middleware/openai_moderation.py
  • Lines: 1-484

Signature

class OpenAIModerationError(RuntimeError):
    def __init__(
        self,
        *,
        content: str,
        stage: ViolationStage,
        result: Moderation,
        message: str,
    ) -> None: ...

class OpenAIModerationMiddleware(AgentMiddleware[AgentState[Any], Any]):
    def __init__(
        self,
        *,
        model: ModerationModel = "omni-moderation-latest",
        check_input: bool = True,
        check_output: bool = True,
        check_tool_results: bool = False,
        exit_behavior: Literal["error", "end", "replace"] = "end",
        violation_message: str | None = None,
        client: OpenAI | None = None,
        async_client: AsyncOpenAI | None = None,
    ) -> None: ...

Import

from langchain_openai.middleware.openai_moderation import (
    OpenAIModerationMiddleware,
    OpenAIModerationError,
)

I/O Contract

Inputs

Name Type Required Description
model ModerationModel No OpenAI moderation model to use. Defaults to "omni-moderation-latest".
check_input bool No Whether to check user input messages. Defaults to True.
check_output bool No Whether to check model output messages. Defaults to True.
check_tool_results bool No Whether to check tool result messages. Defaults to False.
exit_behavior Literal["error", "end", "replace"] No How to handle violations. "error" raises OpenAIModerationError, "end" jumps to end with a violation message, "replace" replaces flagged content in place. Defaults to "end".
violation_message str or None No Custom template for violation messages. Supports {categories}, {category_scores}, {original_content} placeholders.
client OpenAI or None No Optional pre-configured synchronous OpenAI client.
async_client AsyncOpenAI or None No Optional pre-configured asynchronous OpenAI client.

Outputs

Name Type Description
before_model return dict[str, Any] or None Updated state with moderated messages, a jump_to directive, or None if no changes.
after_model return dict[str, Any] or None Updated state with moderated output messages, a jump_to directive, or None if no changes.

Key Methods

Method Description
before_model(state, runtime) Synchronous hook that moderates user input and tool results before the model is called.
after_model(state, runtime) Synchronous hook that moderates model output after the model is called.
abefore_model(state, runtime) Async version of before_model.
aafter_model(state, runtime) Async version of after_model.

Usage Examples

Basic Usage

from langchain_openai.middleware.openai_moderation import OpenAIModerationMiddleware

# Create middleware with default settings (checks input and output, ends on violation)
moderation = OpenAIModerationMiddleware()

# Create middleware that raises an error on violation
moderation_strict = OpenAIModerationMiddleware(
    exit_behavior="error",
    check_tool_results=True,
)

# Create middleware with a custom violation message
moderation_custom = OpenAIModerationMiddleware(
    violation_message="Content flagged for: {categories}. Original: {original_content}",
)

Handling Moderation Errors

from langchain_openai.middleware.openai_moderation import (
    OpenAIModerationMiddleware,
    OpenAIModerationError,
)

moderation = OpenAIModerationMiddleware(exit_behavior="error")

try:
    # Use within an agent pipeline
    pass
except OpenAIModerationError as e:
    print(f"Violation at stage: {e.stage}")
    print(f"Flagged content: {e.content}")
    print(f"Moderation result: {e.result}")

Internal Behavior

The middleware locates the last relevant message in the conversation state (HumanMessage for input, AIMessage for output, ToolMessage for tool results) and extracts its text content. The text is sent to the OpenAI Moderation API endpoint, which returns a Moderation result indicating whether the content is flagged and which categories triggered the flag. When a violation is detected, the _apply_violation method formats a human-readable violation message and applies the configured exit behavior.

The violation message template supports three placeholders:

  • {categories}: Comma-separated list of flagged category names.
  • {category_scores}: JSON-formatted category scores.
  • {original_content}: The original content that was flagged.

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment