Implementation:Guardrails ai Guardrails Validator Chunking Function

From Leeroopedia
Knowledge Sources
Domains Streaming, Validation
Last Updated 2026-02-14 00:00 GMT

Overview

Concrete method interface, provided by the guardrails package, for defining custom chunk-boundary detection in streaming validators.

Description

The Validator._chunking_function method defines how accumulated streaming text is split into segments for validation. The default implementation delegates to split_sentence_word_tokenizers_jl_separator, which detects sentence boundaries. Custom validators override the method to implement application-specific segmentation. This page is a Pattern Doc because it documents an interface that users are expected to override.
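
To illustrate the shape of a sentence-boundary chunker, here is a simplified stand-in. It is not the implementation of split_sentence_word_tokenizers_jl_separator: the function name naive_sentence_chunk is hypothetical, and the boundary rule (terminal punctuation followed by whitespace) is an assumption chosen for brevity.

```python
import re
from typing import List

# Assumed boundary rule: ".", "!" or "?" followed by at least one whitespace character.
_SENTENCE_END = re.compile(r"[.!?]\s+")


def naive_sentence_chunk(accumulated: str) -> List[str]:
    """Return [] until a sentence boundary appears, else [sentence, remainder]."""
    match = _SENTENCE_END.search(accumulated)
    if match is None:
        return []  # no boundary yet; keep buffering
    # Keep the punctuation with the sentence; drop the whitespace that follows it.
    end_of_punct = match.start() + 1
    return [accumulated[:end_of_punct], accumulated[match.end():]]


no_boundary = naive_sentence_chunk("Hello world")
with_boundary = naive_sentence_chunk("Hello world. How are")
```

Requiring whitespace after the punctuation means that text ending in "." is held back until the next piece arrives, which avoids splitting inside a token such as "3.14" that is still being streamed.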

Usage

Override this method in custom validators that need non-default chunking behavior. Return an empty list if more text is needed, or [segment_to_validate, remaining_text] when a boundary is detected.
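
The return contract can be exercised with a small stand-alone driver. This is a minimal sketch, not the guardrails streaming loop: drain_stream and paragraph_chunker are hypothetical names. It assumes the caller keeps the remainder as the new buffer, retries until the chunker returns an empty list, and flushes any leftover text at the end of the stream.

```python
from typing import Callable, Iterable, List


def paragraph_chunker(accumulated: str) -> List[str]:
    """Hypothetical chunker: split at the first blank line."""
    if "\n\n" in accumulated:
        segment, remainder = accumulated.split("\n\n", 1)
        return [segment, remainder]
    return []


def drain_stream(
    chunks: Iterable[str], chunker: Callable[[str], List[str]]
) -> List[str]:
    """Accumulate streamed text and collect every segment the chunker emits."""
    buffer = ""
    segments: List[str] = []
    for piece in chunks:
        buffer += piece
        # One incoming piece may complete several segments, so keep splitting.
        while True:
            result = chunker(buffer)
            if not result:
                break  # not enough text yet; wait for the next piece
            segment, buffer = result
            segments.append(segment)
    if buffer:
        # Assumption: leftover text is flushed as a final segment at end of stream.
        segments.append(buffer)
    return segments


streamed = ["First para", "graph.\n\nSecond ", "paragraph.\n\nTail"]
segments = drain_stream(streamed, paragraph_chunker)
```

In this sketch the empty list is the only signal to keep buffering, which is why a chunker should return it whenever no boundary has been seen.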

Code Reference

Source Location

  • Repository: guardrails
  • File: guardrails/validator_base.py
  • Lines: L252-262

Interface Specification

def _chunking_function(self, chunk: str) -> List[str]:
    """The strategy used for chunking accumulated text input into
    validation sets.

    Args:
        chunk (str): The accumulated text to chunk.

    Returns:
        list[str]: Empty list if not enough text, or
                   [text_to_validate, remaining_text] if boundary found.
    """
    return split_sentence_word_tokenizers_jl_separator(chunk)

Import

from guardrails.validator_base import Validator, register_validator
# Override _chunking_function in your Validator subclass;
# register_validator is the decorator used in the examples below

I/O Contract

Inputs

Name | Type | Required | Description
chunk | str | Yes | Accumulated text from streaming LLM chunks

Outputs

Name | Type | Description
[] | List (empty) | Not enough text accumulated yet; continue buffering
[segment, remainder] | List[str] (2 elements) | segment: text ready for validation; remainder: incomplete text to carry forward
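
When developing an override, the two output shapes can be checked mechanically. The following is a minimal sketch; check_chunk_result is a hypothetical helper for local testing and is not part of guardrails.

```python
from typing import List


def check_chunk_result(result: List[str]) -> str:
    """Classify a chunking result against the two documented output shapes."""
    if not isinstance(result, list):
        raise TypeError("chunking result must be a list")
    if len(result) == 0:
        return "buffer"  # not enough text yet; continue buffering
    if len(result) == 2 and all(isinstance(item, str) for item in result):
        return "validate"  # [segment, remainder]
    raise ValueError(f"unexpected chunking result: {result!r}")


empty_case = check_chunk_result([])
split_case = check_chunk_result(["A sentence.", " rest"])
```

A one-element list or a list with three or more elements matches neither row of the Outputs table, so the helper rejects it instead of guessing an interpretation.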

Usage Examples

Default Sentence-Based Chunking

# The default implementation splits at sentence boundaries
# No override needed for sentence-level validation
@register_validator(name="my_org/sentence_check", data_type="string")
class SentenceCheck(Validator):
    def _validate(self, value, metadata):
        # value will be a complete sentence
        ...
    # Uses default _chunking_function (sentence boundaries)

Custom Paragraph-Based Chunking

@register_validator(name="my_org/paragraph_check", data_type="string")
class ParagraphCheck(Validator):
    def _chunking_function(self, chunk: str):
        """Chunk at paragraph boundaries (double newline)."""
        if "\n\n" in chunk:
            parts = chunk.split("\n\n", 1)
            return [parts[0], parts[1]]
        return []  # Not enough text yet

    def _validate(self, value, metadata):
        # value will be a complete paragraph
        ...

Fixed-Length Chunking

@register_validator(name="my_org/fixed_chunk", data_type="string")
class FixedChunkValidator(Validator):
    CHUNK_SIZE = 200

    def _chunking_function(self, chunk: str):
        """Chunk at fixed character boundaries."""
        if len(chunk) >= self.CHUNK_SIZE:
            return [chunk[:self.CHUNK_SIZE], chunk[self.CHUNK_SIZE:]]
        return []

    def _validate(self, value, metadata):
        ...
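
To see how the fixed-length rule behaves as text accumulates, the same logic can be run as a plain function outside any validator. This is a sketch for experimentation: fixed_length_chunk is a hypothetical stand-alone copy of the method body above.

```python
from typing import List

CHUNK_SIZE = 200  # same value as FixedChunkValidator.CHUNK_SIZE


def fixed_length_chunk(accumulated: str, size: int = CHUNK_SIZE) -> List[str]:
    """Stand-alone copy of the fixed-length rule for experimentation."""
    if len(accumulated) >= size:
        return [accumulated[:size], accumulated[size:]]
    return []  # fewer than `size` characters buffered so far


short = fixed_length_chunk("x" * 150)   # below the threshold
exact = fixed_length_chunk("x" * 200)   # exactly one full chunk
longer = fixed_length_chunk("x" * 450)  # more than two chunks' worth
```

A single call emits at most one segment. With 450 buffered characters the remainder is 250 characters long and still contains another full chunk. Whether that chunk is picked up immediately depends on how the caller re-invokes the function, which this page does not specify.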
