Implementation:Langchain ai Langchain OpenAI Moderation Middleware
| Knowledge Sources | |
|---|---|
| Domains | LLM Integration, Content Moderation, Agent Middleware |
| Last Updated | 2026-02-11 00:00 GMT |
Overview
Agent middleware that integrates OpenAI's moderation endpoint to check and filter messages for policy violations in LangChain agent pipelines.
Description
OpenAIModerationMiddleware is a class in the langchain-openai partner package that subclasses AgentMiddleware. It intercepts agent traffic (user inputs, model outputs, and tool results) and sends the text content to the OpenAI Moderation API for safety screening. When flagged content is detected, the middleware raises an error, ends the conversation with a violation message, or replaces the offending message content, depending on the configured exit_behavior. The module also exports OpenAIModerationError, a custom exception raised when exit_behavior is set to "error".
Usage
Import OpenAIModerationMiddleware when building LangChain agent pipelines that need automated content moderation via the OpenAI Moderation API. Use OpenAIModerationError to catch moderation violations programmatically.
Code Reference
Source Location
- Repository: Langchain_ai_Langchain
- File: libs/partners/openai/langchain_openai/middleware/openai_moderation.py
- Lines: 1-484
Signature
class OpenAIModerationError(RuntimeError):
    def __init__(
        self,
        *,
        content: str,
        stage: ViolationStage,
        result: Moderation,
        message: str,
    ) -> None: ...

class OpenAIModerationMiddleware(AgentMiddleware[AgentState[Any], Any]):
    def __init__(
        self,
        *,
        model: ModerationModel = "omni-moderation-latest",
        check_input: bool = True,
        check_output: bool = True,
        check_tool_results: bool = False,
        exit_behavior: Literal["error", "end", "replace"] = "end",
        violation_message: str | None = None,
        client: OpenAI | None = None,
        async_client: AsyncOpenAI | None = None,
    ) -> None: ...
Import
from langchain_openai.middleware.openai_moderation import (
    OpenAIModerationMiddleware,
    OpenAIModerationError,
)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | ModerationModel | No | OpenAI moderation model to use. Defaults to "omni-moderation-latest". |
| check_input | bool | No | Whether to check user input messages. Defaults to True. |
| check_output | bool | No | Whether to check model output messages. Defaults to True. |
| check_tool_results | bool | No | Whether to check tool result messages. Defaults to False. |
| exit_behavior | Literal["error", "end", "replace"] | No | How to handle violations. "error" raises OpenAIModerationError, "end" jumps to end with a violation message, "replace" replaces flagged content in place. Defaults to "end". |
| violation_message | str or None | No | Custom template for violation messages. Supports {categories}, {category_scores}, {original_content} placeholders. |
| client | OpenAI or None | No | Optional pre-configured synchronous OpenAI client. |
| async_client | AsyncOpenAI or None | No | Optional pre-configured asynchronous OpenAI client. |
Outputs
| Name | Type | Description |
|---|---|---|
| before_model return | dict[str, Any] or None | Updated state with moderated messages, a jump_to directive, or None if no changes. |
| after_model return | dict[str, Any] or None | Updated state with moderated output messages, a jump_to directive, or None if no changes. |
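The return shapes in the table can be pictured with plain dictionaries. The following is a minimal sketch, not the library's code: `Msg`, `ModerationViolation`, and `apply_exit_behavior` are hypothetical stand-ins, and the exact keys of the state update are assumptions based on the descriptions above. When nothing is flagged, the hooks return None and this function would not be called.

```python
from dataclasses import dataclass, replace
from typing import Any, Literal


class ModerationViolation(RuntimeError):
    """Hypothetical stand-in for OpenAIModerationError."""


@dataclass(frozen=True)
class Msg:
    """Hypothetical stand-in for a LangChain message."""
    role: str
    content: str


def apply_exit_behavior(
    messages: list[Msg],
    index: int,
    violation_text: str,
    exit_behavior: Literal["error", "end", "replace"],
) -> dict[str, Any]:
    """Build a state update for the flagged message at `index`."""
    if exit_behavior == "error":
        # "error": surface the violation as an exception.
        raise ModerationViolation(violation_text)
    if exit_behavior == "end":
        # "end" (assumed shape): stop the run and reply with the violation text.
        return {"jump_to": "end", "messages": [Msg("ai", violation_text)]}
    if exit_behavior == "replace":
        # "replace" (assumed shape): overwrite the flagged content and continue.
        updated = list(messages)
        updated[index] = replace(messages[index], content=violation_text)
        return {"messages": updated}
    raise ValueError(f"unknown exit_behavior: {exit_behavior!r}")


history = [Msg("human", "hello"), Msg("human", "something flagged")]
end_update = apply_exit_behavior(history, 1, "Blocked.", "end")
replace_update = apply_exit_behavior(history, 1, "Blocked.", "replace")
```

In this sketch only the "end" branch carries a jump_to directive; "replace" leaves the control flow untouched, so the agent keeps running with the sanitized message.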
Key Methods
| Method | Description |
|---|---|
| before_model(state, runtime) | Synchronous hook that moderates user input and tool results before the model is called. |
| after_model(state, runtime) | Synchronous hook that moderates model output after the model is called. |
| abefore_model(state, runtime) | Async version of before_model. |
| aafter_model(state, runtime) | Async version of after_model. |
Usage Examples
Basic Usage
from langchain_openai.middleware.openai_moderation import OpenAIModerationMiddleware

# Create middleware with default settings (checks input and output, ends on violation)
moderation = OpenAIModerationMiddleware()

# Create middleware that raises an error on violation
moderation_strict = OpenAIModerationMiddleware(
    exit_behavior="error",
    check_tool_results=True,
)

# Create middleware with a custom violation message
moderation_custom = OpenAIModerationMiddleware(
    violation_message="Content flagged for: {categories}. Original: {original_content}",
)
Handling Moderation Errors
from langchain_openai.middleware.openai_moderation import (
    OpenAIModerationMiddleware,
    OpenAIModerationError,
)

moderation = OpenAIModerationMiddleware(exit_behavior="error")

try:
    # Use within an agent pipeline
    pass
except OpenAIModerationError as e:
    print(f"Violation at stage: {e.stage}")
    print(f"Flagged content: {e.content}")
    print(f"Moderation result: {e.result}")
Internal Behavior
The middleware locates the last relevant message in the conversation state (HumanMessage for input, AIMessage for output, ToolMessage for tool results) and extracts its text content. The text is sent to the OpenAI Moderation API endpoint, which returns a Moderation result indicating whether the content is flagged and which categories triggered the flag. When a violation is detected, the _apply_violation method formats a human-readable violation message and applies the configured exit behavior.
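The message lookup and text extraction described above can be sketched without any LangChain dependency. This is an illustration under assumptions: `Msg`, `last_message_of_kind`, and `extract_text` are hypothetical names, the "human"/"ai"/"tool" labels stand in for HumanMessage, AIMessage, and ToolMessage, and the handling of list-style content (keeping only text blocks) is an assumed convention rather than confirmed library behavior.

```python
from dataclasses import dataclass
from typing import Optional, Union

# Assumption: content is either a plain string or a list of blocks
# (strings, or dicts such as {"type": "text", "text": "..."}).
Content = Union[str, list]


@dataclass
class Msg:
    """Hypothetical stand-in for a LangChain message."""
    kind: str  # "human", "ai", or "tool"
    content: Content


def last_message_of_kind(messages: list[Msg], kind: str) -> Optional[tuple[int, Msg]]:
    """Return (index, message) for the most recent message of `kind`, or None."""
    for index in range(len(messages) - 1, -1, -1):
        if messages[index].kind == kind:
            return index, messages[index]
    return None


def extract_text(content: Content) -> str:
    """Flatten string or block-list content into the text to be moderated."""
    if isinstance(content, str):
        return content
    parts = []
    for block in content:
        if isinstance(block, str):
            parts.append(block)
        elif isinstance(block, dict) and block.get("type") == "text":
            parts.append(block.get("text", ""))
        # Non-text blocks (for example images) are skipped in this sketch.
    return "".join(parts)


conversation = [
    Msg("human", "first question"),
    Msg("ai", "first answer"),
    Msg("human", [
        {"type": "text", "text": "second "},
        {"type": "image", "url": "https://example.com/cat.png"},
        "question",
    ]),
]
found = last_message_of_kind(conversation, "human")
text = extract_text(found[1].content)
```

Scanning from the end keeps the check focused on the newest message of each type, which matches the "last relevant message" wording; earlier turns are assumed to have been screened on previous passes.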
The violation message template supports three placeholders:
- {categories}: Comma-separated list of flagged category names.
- {category_scores}: JSON-formatted category scores.
- {original_content}: The original content that was flagged.
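The placeholder substitution listed above behaves like ordinary Python string formatting. The sketch below is an assumption about how such a template could be rendered, not the middleware's actual code: `render_violation_message` is a hypothetical helper, and the choice of `str.format` plus `json.dumps` with sorted keys is illustrative.

```python
import json


def render_violation_message(
    template: str,
    flagged_categories: list[str],
    category_scores: dict[str, float],
    original_content: str,
) -> str:
    """Fill the three supported placeholders in a violation-message template."""
    return template.format(
        categories=", ".join(flagged_categories),
        # sort_keys keeps the rendered JSON stable across runs (a choice made for this sketch).
        category_scores=json.dumps(category_scores, sort_keys=True),
        original_content=original_content,
    )


message = render_violation_message(
    "Content flagged for: {categories}. Original: {original_content}",
    ["harassment", "violence"],
    {"violence": 0.91, "harassment": 0.87},
    "example text",
)
scores_only = render_violation_message(
    "Scores: {category_scores}",
    ["violence"],
    {"violence": 0.91, "harassment": 0.87},
    "example text",
)
```

A template does not have to use every placeholder, since `str.format` ignores unused keyword arguments. With this approach, literal braces in a template would need to be doubled (`{{` and `}}`).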