Workflow: Guardrails AI Streaming Validation

Knowledge Sources	Guardrails, Guardrails Docs
Domains	LLMs, Streaming, Validation
Last Updated	2026-02-14 12:00 GMT

Overview

End-to-end process for validating LLM output in real time as it streams, using Guardrails' streaming validation pipeline to process the output incrementally, chunk by chunk.

Description

This workflow demonstrates how to apply Guardrails validation to streaming LLM responses. Rather than waiting for the full response to complete, the streaming validation pipeline processes output incrementally as chunks arrive: the Guard accumulates text chunks and runs validation each time a sentence boundary is reached (or a custom chunk boundary, defined by overriding the chunking function). This enables early detection of policy violations and real-time feedback to users while preserving the low latency of streaming. Synchronous and asynchronous streaming are supported through StreamRunner and AsyncStreamRunner, respectively.
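The accumulate-then-validate loop described above can be modeled in plain Python. This is a minimal sketch under stated assumptions, not Guardrails' StreamRunner: `Outcome`, `split_at_sentence`, and `stream_validate` are hypothetical names, the sentence splitter is a naive regex, and a validator is assumed to be any function that returns None on success or an error message on failure.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional


@dataclass
class Outcome:
    """Hypothetical stand-in for a per-chunk validation result (not the Guardrails class)."""
    validated_output: str
    validation_passed: bool


# Assumption: a validator returns None on success or an error message on failure.
Validator = Callable[[str], Optional[str]]

# Naive sentence boundary: ".", "!" or "?" followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_at_sentence(buffer: str) -> tuple[str, str]:
    """Split buffer into (complete sentences, unfinished remainder)."""
    matches = list(_BOUNDARY.finditer(buffer))
    if not matches:
        return "", buffer
    cut = matches[-1].end()
    return buffer[:cut], buffer[cut:]


def _check(text: str, validators: list[Validator]) -> Outcome:
    errors = [err for v in validators if (err := v(text)) is not None]
    return Outcome(validated_output=text, validation_passed=not errors)


def stream_validate(chunks: Iterable[str], validators: list[Validator]) -> Iterator[Outcome]:
    buffer = ""
    for token in chunks:
        buffer += token  # accumulate; a bare token is never validated on its own
        ready, buffer = split_at_sentence(buffer)
        if ready:
            yield _check(ready, validators)
    if buffer:  # the LLM stream ended: validate whatever is left
        yield _check(buffer, validators)


def no_shouting(text: str) -> Optional[str]:
    return "all caps" if text.strip().isupper() else None


for out in stream_validate(["Hello the", "re. THIS IS", " LOUD. bye"], [no_shouting]):
    print(repr(out.validated_output), out.validation_passed)
# 'Hello there. ' True
# 'THIS IS LOUD. ' False
# 'bye' True
```

Note how the token boundaries ("the" / "re.") never reach the validator; only complete sentences do, plus the trailing remainder once the stream ends.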

Usage

Use this workflow when building interactive applications that display LLM output progressively (e.g., chatbots, writing assistants) and need to validate that output in real time. It is particularly valuable when you want to catch toxic language, PII, or other policy violations as they appear rather than after the full response completes. Streaming validation also suits long-form content generation, where waiting for the complete response before validating would introduce unacceptable latency.

Execution Steps

Step 1: Configure Guard with Validators

Install the required validators from Guardrails Hub, then create a Guard and attach the validators that support streaming validation, configuring each with appropriate parameters and an on-fail action. Not all on-fail action types are compatible with streaming; consult the error remediation documentation for supported combinations.

Key considerations:

Not all on-fail actions work with streaming (e.g., reask is not supported in stream mode)
Validators process accumulated text at chunk boundaries, not individual tokens
The default chunking strategy splits on sentence boundaries
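The constraint that some on-fail actions cannot run in stream mode is easiest to honor by rejecting them when the Guard is configured rather than discovering the problem mid-stream. The sketch below is hypothetical: `ValidatorSpec`, `StreamingGuardConfig`, and `UNSUPPORTED_IN_STREAM` are illustrative names, and the set contains only `reask`, the one incompatible action this page names; the real list may be longer. In Guardrails itself, validators are installed from Guardrails Hub and attached to a Guard instead.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Assumption: only "reask" is named on this page as unsupported in stream mode;
# consult the error remediation documentation for the full list.
UNSUPPORTED_IN_STREAM = {"reask"}


@dataclass
class ValidatorSpec:
    """Hypothetical description of one validator attached to a guard."""
    name: str
    check: Callable[[str], Optional[str]]  # returns None when the text passes
    on_fail: str = "noop"


@dataclass
class StreamingGuardConfig:
    """Hypothetical guard configuration that rejects stream-incompatible actions."""
    validators: list[ValidatorSpec] = field(default_factory=list)

    def use(self, spec: ValidatorSpec) -> "StreamingGuardConfig":
        # Fail at configuration time rather than partway through a stream.
        if spec.on_fail in UNSUPPORTED_IN_STREAM:
            raise ValueError(
                f"on_fail={spec.on_fail!r} is not supported in stream mode ({spec.name})"
            )
        self.validators.append(spec)
        return self  # allows chaining: config.use(a).use(b)


def no_email(text: str) -> Optional[str]:
    return "contains '@'" if "@" in text else None


config = StreamingGuardConfig().use(ValidatorSpec("no_email", no_email, on_fail="filter"))

try:
    config.use(ValidatorSpec("strict_email", no_email, on_fail="reask"))
except ValueError as err:
    print(err)  # the reask validator is rejected and not added
```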

Step 2: Invoke Guard with Stream Flag

Call the Guard with stream=True to activate the streaming validation pipeline, passing the model and messages as usual. Instead of returning a single ValidationOutcome, the Guard returns a generator that yields one ValidationOutcome per validated chunk.

Key considerations:

The stream=True flag activates the StreamRunner instead of the standard Runner
The returned generator is lazy; validation happens as you iterate
For async applications, use AsyncGuard with stream=True for non-blocking streaming
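The laziness noted above follows from how Python generators work: no validation runs until the caller starts iterating. The sketch below demonstrates this with a call counter, then shows an async counterpart consumed with `async for`, the general shape of non-blocking streaming. `fake_llm`, `validated_stream`, and `avalidated_stream` are hypothetical stand-ins, not StreamRunner or AsyncStreamRunner, and to stay short the sketch validates each incoming chunk directly instead of waiting for sentence boundaries.

```python
import asyncio
from typing import AsyncIterator, Iterator

calls = {"validate": 0}  # counts how many times validation actually ran


def validate(chunk: str) -> bool:
    calls["validate"] += 1
    return "badword" not in chunk


def fake_llm() -> Iterator[str]:
    # Hypothetical stand-in for a synchronous LLM token stream.
    yield from ["First sentence. ", "Second sentence. "]


def validated_stream() -> Iterator[tuple[str, bool]]:
    # Generator body does not run until the first next() call.
    for chunk in fake_llm():
        yield chunk, validate(chunk)


gen = validated_stream()
print(calls["validate"])  # 0: creating the generator validated nothing
first = next(gen)
print(calls["validate"])  # 1: validation ran for the first chunk only


async def fake_llm_async() -> AsyncIterator[str]:
    # Hypothetical stand-in for an asynchronous LLM stream.
    for chunk in ["Async one. ", "Async two. "]:
        await asyncio.sleep(0)  # hand control back to the event loop, like a network read
        yield chunk


async def avalidated_stream() -> AsyncIterator[tuple[str, bool]]:
    async for chunk in fake_llm_async():
        yield chunk, validate(chunk)


async def main() -> list[tuple[str, bool]]:
    return [item async for item in avalidated_stream()]


results = asyncio.run(main())
print(results)
```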

Step 3: Process Streamed Validation Chunks

Iterate over the generator returned by the Guard. Each yielded chunk is a ValidationOutcome containing the incrementally validated output. Display or process each chunk as it arrives, and check its validation status so you can stop early if needed.

Key considerations:

Each chunk's validated_output contains the validated portion of the accumulated text
Validation runs at sentence boundaries by default; customize this by overriding _chunking_function
The generator completes when the LLM stream ends and all accumulated text has been validated
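Overriding the chunking function changes where validation runs without touching the validation logic itself. The sketch below illustrates the idea with a hypothetical base class whose `_chunking_function` returns `[ready_chunk, remainder]`, or an empty list when no boundary has been reached yet; the class, this return contract, and `ParagraphChunker` are assumptions for illustration, not the Guardrails Validator signature. The consumer loop also shows early termination on the first failed chunk.

```python
from typing import Iterable, Iterator, Optional


class ChunkingValidator:
    """Hypothetical base class: accumulates text and validates complete chunks."""

    def _chunking_function(self, text: str) -> list[str]:
        # Default: split after the first period that is followed by a space.
        idx = text.find(". ")
        if idx == -1:
            return []  # no complete chunk yet; keep accumulating
        return [text[: idx + 1], text[idx + 2 :]]

    def check(self, chunk: str) -> Optional[str]:
        return None  # subclasses return an error message on failure

    def run(self, stream: Iterable[str]) -> Iterator[tuple[str, Optional[str]]]:
        buffer = ""
        for piece in stream:
            buffer += piece
            # Drain every complete chunk currently in the buffer.
            while parts := self._chunking_function(buffer):
                ready, buffer = parts
                yield ready, self.check(ready)
        if buffer:  # stream ended: validate the remainder
            yield buffer, self.check(buffer)


class ParagraphChunker(ChunkingValidator):
    """Hypothetical override: validate whole paragraphs instead of sentences."""

    def _chunking_function(self, text: str) -> list[str]:
        idx = text.find("\n\n")
        if idx == -1:
            return []
        return [text[:idx], text[idx + 2 :]]

    def check(self, chunk: str) -> Optional[str]:
        return "mentions a password" if "password" in chunk.lower() else None


stream = ["Intro line. More intro.\n", "\nYour Password is hunter2.\n\nTail."]
shown = []
for chunk, error in ParagraphChunker().run(stream):
    if error:
        print(f"stopping early: {error}")
        break  # early termination: nothing after the violation is displayed
    shown.append(chunk)
print(shown)
```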

Step 4: Handle Stream Validation Results

After the generator is exhausted, assess the overall validation outcome. The final chunk contains the complete validated output. Respond to any validation failures that occurred during streaming, for example by logging violations, notifying users of filtered content, or triggering follow-up actions.

Key considerations:

Streaming validation may apply fix actions incrementally, modifying output in-flight
Filter and refrain actions can remove content mid-stream
The complete validation history is available via guard.history after the stream completes
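The effect of fix and filter actions on streamed output, and a post-stream review of what happened, can be modeled as follows. This is a hypothetical sketch: `apply_on_fail`, `run_stream`, `HistoryEntry`, the email regex, and the `<EMAIL>` redaction string are illustrative assumptions, and the `history` list only plays the role that guard.history plays in Guardrails; it is not that object. A `fix` action rewrites a chunk in flight, a `filter` action drops it, and the summary after the loop collects violations for logging.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical PII check: a simple email pattern.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


@dataclass
class HistoryEntry:
    raw: str
    output: Optional[str]  # None when the chunk was filtered out
    violation: bool


def apply_on_fail(chunk: str, on_fail: str) -> tuple[Optional[str], bool]:
    """Return (output, violated) for one chunk under a hypothetical on-fail policy."""
    if not EMAIL.search(chunk):
        return chunk, False
    if on_fail == "fix":
        return EMAIL.sub("<EMAIL>", chunk), True  # rewrite the chunk in flight
    if on_fail == "filter":
        return None, True  # drop the chunk entirely
    raise ValueError(f"unhandled on_fail: {on_fail}")


def run_stream(chunks: list[str], on_fail: str) -> tuple[str, list[HistoryEntry]]:
    history: list[HistoryEntry] = []
    displayed = []
    for chunk in chunks:
        output, violated = apply_on_fail(chunk, on_fail)
        history.append(HistoryEntry(chunk, output, violated))
        if output is not None:
            displayed.append(output)  # what the user actually sees
    return "".join(displayed), history


chunks = ["Contact us. ", "Email bob@example.com today. ", "Thanks."]
text, history = run_stream(chunks, on_fail="fix")
print(text)

# After the stream: summarize violations for logging or user notices.
violations = [h.raw for h in history if h.violation]
print(f"{len(violations)} chunk(s) triggered the validator")

filtered_text, _ = run_stream(chunks, on_fail="filter")
print(filtered_text)
```

Keeping the raw chunk alongside the displayed output is what makes the post-stream review possible: once a chunk has been fixed or filtered, the user-facing text alone no longer shows what was removed.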
