Workflow:Guardrails ai Guardrails Streaming Validation
| Knowledge Sources | |
|---|---|
| Domains | LLMs, Streaming, Validation |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
End-to-end process for validating LLM output in real-time as it streams, using Guardrails' streaming validation pipeline with incremental chunk processing.
Description
This workflow demonstrates how to apply Guardrails validation to streaming LLM responses. Rather than waiting for the full response to complete, the streaming validation pipeline processes output incrementally as chunks arrive. The Guard accumulates text chunks and validates them at sentence boundaries (or custom chunk boundaries via an overridable chunking function). This enables early detection of policy violations and real-time feedback to users while maintaining the low-latency benefits of streaming. Both synchronous and asynchronous streaming are supported through StreamRunner and AsyncStreamRunner respectively.
Usage
Execute this workflow when building interactive applications that display LLM output progressively (e.g., chatbots, writing assistants) and need to validate that output in real-time. This is particularly valuable when you want to catch toxic language, PII, or policy violations as they appear rather than after the full response completes. Streaming validation is also useful for long-form content generation where waiting for the complete response before validation would introduce unacceptable latency.
Execution Steps
Step 1: Configure Guard with Validators
Create a Guard and attach validators that support streaming validation. Install required validators from Guardrails Hub and configure them with appropriate parameters and on-fail actions. Note that not all on-fail action types are compatible with streaming; consult the error remediation documentation for supported combinations.
Key considerations:
- Not all on-fail actions work with streaming (e.g., reask is not supported in stream mode)
- Validators process accumulated text at chunk boundaries, not individual tokens
- The default chunking strategy splits on sentence boundaries
Step 2: Invoke Guard with Stream Flag
Call the Guard with stream=True to activate the streaming validation pipeline. Pass the LLM model and messages as usual. Instead of returning a single ValidationOutcome, the Guard returns a generator that yields ValidationOutcome objects for each validated chunk.
Key considerations:
- The stream=True flag activates the StreamRunner instead of the standard Runner
- The returned generator is lazy; validation happens as you iterate
- For async applications, use AsyncGuard with stream=True for non-blocking streaming
Step 3: Process Streamed Validation Chunks
Iterate over the generator returned by the Guard. Each yielded chunk is a ValidationOutcome containing the incrementally validated output. Display or process each chunk as it arrives, checking the validation status for early termination if needed.
Key considerations:
- Each chunk's validated_output contains the validated portion of accumulated text
- Validation runs at sentence boundaries by default; customize with _chunking_function override
- The generator completes when the LLM stream ends and all accumulated text has been validated
Step 4: Handle Stream Validation Results
After the generator is exhausted, assess the overall validation outcome. The final chunk contains the complete validated output. Handle any validation failures that occurred during streaming, such as logging violations, notifying users of filtered content, or triggering follow-up actions.
Key considerations:
- Streaming validation may apply fix actions incrementally, modifying output in-flight
- Filter and refrain actions can remove content mid-stream
- The complete validation history is available via guard.history after the stream completes