Implementation:Guardrails ai Guardrails Validator Chunking Function
| Knowledge Sources | |
|---|---|
| Domains | Streaming, Validation |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
A concrete method interface, provided by the guardrails package, for defining custom chunk-boundary detection in streaming validators.
Description
The Validator._chunking_function method defines how accumulated streaming text is split into segments for validation. The default implementation delegates to split_sentence_word_tokenizers_jl_separator, which detects sentence boundaries. Custom validators override the method to implement application-specific segmentation. This page is a Pattern Doc because it documents an interface that users can override.
Usage
Override this method in custom validators that need non-default chunking behavior. Return an empty list if more text is needed, or [segment_to_validate, remaining_text] when a boundary is detected.
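The return contract is easiest to see from the caller's side. The following is a minimal standalone sketch, not the guardrails streaming code: `toy_chunker` and `drive` are hypothetical names introduced for illustration, and the final flush of leftover text is an assumption about how a caller might treat the end of a stream.

```python
from typing import Callable, Iterable, List


def toy_chunker(chunk: str) -> List[str]:
    """Hypothetical chunker: split just after the first period."""
    idx = chunk.find(".")
    if idx == -1:
        return []  # no boundary yet; keep buffering
    return [chunk[: idx + 1], chunk[idx + 1 :]]


def drive(stream: Iterable[str], chunker: Callable[[str], List[str]]) -> List[str]:
    """Feed streamed pieces through a chunker and collect the segments."""
    buffer = ""
    segments: List[str] = []
    for piece in stream:
        buffer += piece
        result = chunker(buffer)
        if not result:
            continue  # empty list: wait for more text
        segment, buffer = result  # the remainder becomes the new buffer
        segments.append(segment)
    if buffer:
        segments.append(buffer)  # assumed end-of-stream flush
    return segments


print(drive(["Hello wor", "ld. How a", "re you", "?"], toy_chunker))
# ['Hello world.', ' How are you?']
```

In this sketch the chunker is called once per incoming piece, so at most one segment is released per piece; a caller that expects several boundaries in a single piece would need to loop until the chunker returns an empty list.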
Code Reference
Source Location
- Repository: guardrails
- File: guardrails/validator_base.py
- Lines: L252-262
Interface Specification
def _chunking_function(self, chunk: str) -> List[str]:
    """The strategy used for chunking accumulated text input into
    validation sets.

    Args:
        chunk (str): The accumulated text to chunk.

    Returns:
        list[str]: Empty list if not enough text, or
            [text_to_validate, remaining_text] if boundary found.
    """
    return split_sentence_word_tokenizers_jl_separator(chunk)
Import
from guardrails.validator_base import Validator
# Override _chunking_function in your Validator subclass
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| chunk | str | Yes | Accumulated text from streaming LLM chunks |
Outputs
| Name | Type | Description |
|---|---|---|
| [] | List (empty) | Not enough text accumulated yet; continue buffering |
| [segment, remainder] | List[str] (2 elements) | segment: text ready for validation; remainder: incomplete text to carry forward |
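When writing a custom chunking function, it can help to assert the two output shapes from the table in a unit test. The helper below is a hypothetical sketch (`check_chunk_result` is not part of guardrails) that checks only the shape of a return value, not whether the boundary itself is sensible.

```python
from typing import Any


def check_chunk_result(result: Any) -> bool:
    """Return True if result matches one of the documented output shapes."""
    if not isinstance(result, list):
        return False
    if len(result) == 0:
        return True  # empty list: keep buffering
    # Otherwise expect exactly [segment, remainder], both strings.
    return len(result) == 2 and all(isinstance(part, str) for part in result)


print(check_chunk_result([]))                 # True
print(check_chunk_result(["Done.", " more"]))  # True
print(check_chunk_result(["only one"]))       # False
```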
Usage Examples
Default Sentence-Based Chunking
# The default implementation splits at sentence boundaries,
# so no override is needed for sentence-level validation.
from guardrails.validator_base import Validator, register_validator

@register_validator(name="my_org/sentence_check", data_type="string")
class SentenceCheck(Validator):
    def _validate(self, value, metadata):
        # value will be a complete sentence
        ...

    # Uses default _chunking_function (sentence boundaries)
Custom Paragraph-Based Chunking
from typing import List

from guardrails.validator_base import Validator, register_validator

@register_validator(name="my_org/paragraph_check", data_type="string")
class ParagraphCheck(Validator):
    def _chunking_function(self, chunk: str) -> List[str]:
        """Chunk at paragraph boundaries (double newline)."""
        if "\n\n" in chunk:
            # Split at the first paragraph break only; the rest is carried forward
            head, tail = chunk.split("\n\n", 1)
            return [head, tail]
        return []  # Not enough text yet

    def _validate(self, value, metadata):
        # value will be a complete paragraph
        ...
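To see what the paragraph rule returns for a few inputs, the same logic can be exercised as a plain function without installing guardrails. `paragraph_chunk` is a hypothetical standalone copy of the method body, written only for experimentation.

```python
from typing import List


def paragraph_chunk(chunk: str) -> List[str]:
    """Standalone copy of the paragraph rule, for experimentation."""
    if "\n\n" in chunk:
        head, tail = chunk.split("\n\n", 1)
        return [head, tail]
    return []


print(paragraph_chunk("First line\nsecond line"))       # []
print(paragraph_chunk("Para one.\n\nPara two starts"))  # ['Para one.', 'Para two starts']
print(paragraph_chunk("A\n\nB\n\nC"))                   # ['A', 'B\n\nC']
```

Two properties are worth noting: the double-newline separator is dropped from both parts, and only the first boundary is consumed per call, so later paragraphs stay in the remainder until the function is called again.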
Fixed-Length Chunking
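from typing import List

from guardrails.validator_base import Validator, register_validator

@register_validator(name="my_org/fixed_chunk", data_type="string")
class FixedChunkValidator(Validator):
    CHUNK_SIZE = 200  # characters per validated segment

    def _chunking_function(self, chunk: str) -> List[str]:
        """Chunk at fixed character boundaries."""
        if len(chunk) >= self.CHUNK_SIZE:
            return [chunk[:self.CHUNK_SIZE], chunk[self.CHUNK_SIZE:]]
        return []  # Fewer than CHUNK_SIZE characters buffered so far

    def _validate(self, value, metadata):
        ...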
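A short worked example shows how the fixed-length rule drains a buffer. This is a standalone sketch under the same CHUNK_SIZE of 200; `fixed_chunk` is a hypothetical function mirroring the method body, and the loop stands in for whatever code repeatedly calls the chunking function.

```python
from typing import List

CHUNK_SIZE = 200  # same value as in the validator example


def fixed_chunk(chunk: str) -> List[str]:
    """Standalone copy of the fixed-length rule."""
    if len(chunk) >= CHUNK_SIZE:
        return [chunk[:CHUNK_SIZE], chunk[CHUNK_SIZE:]]
    return []


buffer = "x" * 450
sizes = []
while True:
    result = fixed_chunk(buffer)
    if not result:
        break  # fewer than CHUNK_SIZE characters remain
    segment, buffer = result
    sizes.append(len(segment))

print(sizes, len(buffer))  # [200, 200] 50
```

The trailing 50 characters never reach the threshold, so a fixed-length chunker depends on the caller to decide what happens to leftover text when the stream ends.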