Implementation:Langchain ai Langchain OpenAI Moderation Middleware

Knowledge Sources	Langchain_ai_Langchain
Domains	LLM Integration, Content Moderation, Agent Middleware
Last Updated	2026-02-11 00:00 GMT

Overview

Agent middleware that integrates OpenAI's moderation endpoint to check and filter messages for policy violations in LangChain agent pipelines.

Description

OpenAIModerationMiddleware is a class in the langchain-openai partner package that implements the AgentMiddleware protocol. It intercepts agent traffic (user inputs, model outputs, and tool results) and sends their text content to the OpenAI Moderation API for safety screening. When flagged content is detected, the middleware can raise an error, end the conversation with a violation message, or replace the offending message content, depending on the configured exit_behavior. The module also exports OpenAIModerationError, a custom exception raised when the exit behavior is set to "error".

Usage

Import OpenAIModerationMiddleware when building LangChain agent pipelines that need automated content moderation via the OpenAI Moderation API. Use OpenAIModerationError to catch moderation violations programmatically.

Code Reference

Source Location

Repository: Langchain_ai_Langchain
File: libs/partners/openai/langchain_openai/middleware/openai_moderation.py
Lines: 1-484

Signature

class OpenAIModerationError(RuntimeError):
    def __init__(
        self,
        *,
        content: str,
        stage: ViolationStage,
        result: Moderation,
        message: str,
    ) -> None: ...

class OpenAIModerationMiddleware(AgentMiddleware[AgentState[Any], Any]):
    def __init__(
        self,
        *,
        model: ModerationModel = "omni-moderation-latest",
        check_input: bool = True,
        check_output: bool = True,
        check_tool_results: bool = False,
        exit_behavior: Literal["error", "end", "replace"] = "end",
        violation_message: str | None = None,
        client: OpenAI | None = None,
        async_client: AsyncOpenAI | None = None,
    ) -> None: ...

Import

from langchain_openai.middleware.openai_moderation import (
    OpenAIModerationMiddleware,
    OpenAIModerationError,
)

I/O Contract

Inputs

Name	Type	Required	Description
model	ModerationModel	No	OpenAI moderation model to use. Defaults to "omni-moderation-latest".
check_input	bool	No	Whether to check user input messages. Defaults to True.
check_output	bool	No	Whether to check model output messages. Defaults to True.
check_tool_results	bool	No	Whether to check tool result messages. Defaults to False.
exit_behavior	Literal["error", "end", "replace"]	No	How to handle violations. "error" raises OpenAIModerationError, "end" jumps to end with a violation message, "replace" replaces flagged content in place. Defaults to "end".
violation_message	str or None	No	Custom template for violation messages. Supports {categories}, {category_scores}, {original_content} placeholders.
client	OpenAI or None	No	Optional pre-configured synchronous OpenAI client.
async_client	AsyncOpenAI or None	No	Optional pre-configured asynchronous OpenAI client.

Outputs

Name	Type	Description
before_model return	dict[str, Any] or None	Updated state with moderated messages, a jump_to directive, or None if no changes.
after_model return	dict[str, Any] or None	Updated state with moderated output messages, a jump_to directive, or None if no changes.

Key Methods

Method	Description
before_model(state, runtime)	Synchronous hook that moderates user input and tool results before the model is called.
after_model(state, runtime)	Synchronous hook that moderates model output after the model is called.
abefore_model(state, runtime)	Async version of before_model.
aafter_model(state, runtime)	Async version of after_model.

Usage Examples

Basic Usage

from langchain_openai.middleware.openai_moderation import OpenAIModerationMiddleware

# Create middleware with default settings (checks input and output, ends on violation)
moderation = OpenAIModerationMiddleware()

# Create middleware that raises an error on violation
moderation_strict = OpenAIModerationMiddleware(
    exit_behavior="error",
    check_tool_results=True,
)

# Create middleware with a custom violation message
moderation_custom = OpenAIModerationMiddleware(
    violation_message="Content flagged for: {categories}. Original: {original_content}",
)

Handling Moderation Errors

from langchain_openai.middleware.openai_moderation import (
    OpenAIModerationMiddleware,
    OpenAIModerationError,
)

moderation = OpenAIModerationMiddleware(exit_behavior="error")

try:
    # Use within an agent pipeline
    pass
except OpenAIModerationError as e:
    print(f"Violation at stage: {e.stage}")
    print(f"Flagged content: {e.content}")
    print(f"Moderation result: {e.result}")

Internal Behavior

The middleware locates the last relevant message in the conversation state (HumanMessage for input, AIMessage for output, ToolMessage for tool results) and extracts its text content. The text is sent to the OpenAI Moderation API endpoint, which returns a Moderation result indicating whether the content is flagged and which categories triggered the flag. When a violation is detected, the _apply_violation method formats a human-readable violation message and applies the configured exit behavior.

The violation message template supports three placeholders:

{categories}: Comma-separated list of flagged category names.
{category_scores}: JSON-formatted category scores.
{original_content}: The original content that was flagged.

Related Pages

Environment:Langchain_ai_Langchain_OpenAI_API_Credentials

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment