Implementation:Guardrails ai Guardrails Validator Chunking Function

Knowledge Sources	Guardrails
Domains	Streaming, Validation
Last Updated	2026-02-14 00:00 GMT

Overview

Concrete method interface for defining custom chunk boundary detection in streaming validators provided by the guardrails package.

Description

The Validator._chunking_function method defines how accumulated streaming text is split into segments for validation. The default implementation uses split_sentence_word_tokenizers_jl_separator for sentence boundary detection. Custom validators override this to implement application-specific segmentation. This is a Pattern Doc as it defines the interface users can override.

Usage

Override this method in custom validators that need non-default chunking behavior. Return an empty list if more text is needed, or [segment_to_validate, remaining_text] when a boundary is detected.

Code Reference

Source Location

Repository: guardrails
File: guardrails/validator_base.py
Lines: L252-262

Interface Specification

def _chunking_function(self, chunk: str) -> List[str]:
    """The strategy used for chunking accumulated text input into
    validation sets.

    Args:
        chunk (str): The accumulated text to chunk.

    Returns:
        list[str]: Empty list if not enough text, or
                   [text_to_validate, remaining_text] if boundary found.
    """
    return split_sentence_word_tokenizers_jl_separator(chunk)

Import

from guardrails.validator_base import Validator
# Override _chunking_function in your Validator subclass

I/O Contract

Inputs

Name	Type	Required	Description
chunk	str	Yes	Accumulated text from streaming LLM chunks

Outputs

Name	Type	Description
[]	List (empty)	Not enough text accumulated yet; continue buffering
[segment, remainder]	List[str] (2 elements)	segment: text ready for validation; remainder: incomplete text to carry forward

Usage Examples

Default Sentence-Based Chunking

# The default implementation splits at sentence boundaries
# No override needed for sentence-level validation
@register_validator(name="my_org/sentence_check", data_type="string")
class SentenceCheck(Validator):
    def _validate(self, value, metadata):
        # value will be a complete sentence
        ...
    # Uses default _chunking_function (sentence boundaries)

Custom Paragraph-Based Chunking

@register_validator(name="my_org/paragraph_check", data_type="string")
class ParagraphCheck(Validator):
    def _chunking_function(self, chunk: str):
        """Chunk at paragraph boundaries (double newline)."""
        if "\n\n" in chunk:
            parts = chunk.split("\n\n", 1)
            return [parts[0], parts[1]]
        return []  # Not enough text yet

    def _validate(self, value, metadata):
        # value will be a complete paragraph
        ...

Fixed-Length Chunking

@register_validator(name="my_org/fixed_chunk", data_type="string")
class FixedChunkValidator(Validator):
    CHUNK_SIZE = 200

    def _chunking_function(self, chunk: str):
        """Chunk at fixed character boundaries."""
        if len(chunk) >= self.CHUNK_SIZE:
            return [chunk[:self.CHUNK_SIZE], chunk[self.CHUNK_SIZE:]]
        return []

    def _validate(self, value, metadata):
        ...

Related Pages

Implements Principle

Principle:Guardrails_ai_Guardrails_Validator_Streaming_Support

Uses Heuristic

Heuristic:Guardrails_ai_Guardrails_Sentence_Tokenizer_Optimization

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment