
Heuristic: ProtectAI LLM Guard TokenLimit Early Guard

From Leeroopedia
Knowledge Sources
Domains Optimization, Security
Last Updated 2026-02-14 12:00 GMT

Overview

Pipeline ordering heuristic: place the TokenLimit scanner first in the pipeline to reject oversized prompts before wasting compute on expensive ML-based scanners.

Description

The TokenLimit scanner uses tiktoken (a fast BPE tokenizer) to count tokens and truncate prompts that exceed a configured limit. It is computationally cheap compared to ML-based scanners like PromptInjection or Toxicity, which require full transformer inference. Placing TokenLimit early in the scanner pipeline acts as a gatekeeper that prevents oversized inputs from consuming expensive compute resources downstream.

Usage

Use this heuristic when designing scanner pipelines that include both lightweight scanners (TokenLimit, BanSubstrings, Regex, InvisibleText) and heavyweight ML-based scanners (PromptInjection, Toxicity, BanTopics, Anonymize). Order cheap scanners first, especially when fail_fast is enabled.
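The ordering pattern can be sketched in plain Python. This is a minimal illustration, not the llm-guard API: `count_tokens` is a crude whitespace stand-in for tiktoken's BPE encoding, and `expensive_ml_scan` is a placeholder for a transformer-based scanner. Only the `(prompt, is_valid, risk_score)` tuple shape mirrors the real scanners.

```python
def count_tokens(prompt: str) -> int:
    # Whitespace approximation of a tokenizer, for illustration only;
    # a real pipeline would use tiktoken.get_encoding("cl100k_base").
    return len(prompt.split())

def token_limit_scan(prompt: str, limit: int = 4096) -> tuple[str, bool, float]:
    """Cheap O(n) check mirroring TokenLimit's (prompt, is_valid, risk) shape."""
    if count_tokens(prompt) < limit:
        return prompt, True, -1.0
    return prompt, False, 1.0

def expensive_ml_scan(prompt: str) -> tuple[str, bool, float]:
    # Placeholder for a heavyweight scanner such as PromptInjection.
    return prompt, True, 0.0

def scan_pipeline(prompt: str) -> bool:
    # TokenLimit runs first, so oversized prompts never reach the ML scanner.
    for scanner in (token_limit_scan, expensive_ml_scan):
        prompt, is_valid, _ = scanner(prompt)
        if not is_valid:
            return False
    return True
```

The gatekeeper property comes purely from list order: any scanner that rejects the prompt short-circuits everything after it.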

The Insight (Rule of Thumb)

  • Action: Place TokenLimit, BanSubstrings, Regex, InvisibleText, and Secrets scanners before PromptInjection, Toxicity, BanTopics, Anonymize, and other ML-based scanners in the pipeline configuration.
  • Value: TokenLimit defaults to a 4096-token limit with the cl100k_base encoding.
  • Trade-off: None. This is a pure optimization with no accuracy impact, since scanner order does not affect individual scanner results.
  • Combination: Pairs well with fail_fast=True for maximum latency reduction on invalid inputs.
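The fail_fast interaction can be made concrete by counting which scanners actually execute on a rejected prompt under each ordering. The scanner stubs below are illustrative stand-ins, not llm-guard internals:

```python
calls = []

def cheap_token_limit(prompt):
    calls.append("TokenLimit")  # O(n) token count
    return prompt, len(prompt.split()) < 4096, -1.0

def expensive_prompt_injection(prompt):
    calls.append("PromptInjection")  # stands in for a transformer forward pass
    return prompt, True, 0.0

def run(scanners, prompt, fail_fast=True):
    # Mimics fail_fast semantics: stop at the first invalid verdict.
    for scan in scanners:
        prompt, ok, _ = scan(prompt)
        if fail_fast and not ok:
            break

oversized = "tok " * 10_000

run([cheap_token_limit, expensive_prompt_injection], oversized)
cheap_first = list(calls)        # only TokenLimit ran

calls.clear()
run([expensive_prompt_injection, cheap_token_limit], oversized)
expensive_first = list(calls)    # the ML scanner ran before rejection
```

With the cheap scanner first, the oversized prompt is rejected before the expensive stub ever runs; reversed, the expensive pass is spent for nothing.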

Reasoning

ML-based scanners run transformer models with O(n^2) attention complexity relative to input length. Processing a 100K-token prompt through PromptInjection (DeBERTa, max_length=512) is wasteful if the prompt will be rejected anyway for exceeding the token limit. The TokenLimit scanner uses tiktoken's O(n) BPE encoding which is orders of magnitude faster than a transformer forward pass.

# From llm_guard/input_scanners/token_limit.py:61-80
def scan(self, prompt: str) -> tuple[str, bool, float]:
    if prompt.strip() == "":
        return prompt, True, -1.0
    chunks, num_tokens = self._split_text_on_tokens(text=prompt)
    if num_tokens < self._limit:
        LOGGER.debug("Prompt fits the maximum tokens", ...)
        return prompt, True, -1.0
    LOGGER.warning("Prompt is too big. Splitting into chunks", ...)
    return chunks[0], False, 1.0
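The `chunks[0]` return in the excerpt is the prompt truncated to the limit. A toy sketch of the splitting step (with a one-token-per-word stand-in for the cl100k_base codec, not the library's actual `_split_text_on_tokens`) shows why the first chunk is exactly the limit-sized prefix:

```python
def encode(text):
    # Toy tokenizer: one token per whitespace-separated word.
    return text.split()

def decode(tokens):
    return " ".join(tokens)

def split_text_on_tokens(text, limit):
    # Encode once, then cut the token stream into limit-sized chunks.
    tokens = encode(text)
    chunks = [decode(tokens[i:i + limit]) for i in range(0, len(tokens), limit)]
    return chunks, len(tokens)

chunks, num_tokens = split_text_on_tokens("one two three four five", limit=2)
# chunks == ["one two", "three four", "five"]; num_tokens == 5
```

So a failing scan returns a valid, in-limit prefix of the prompt alongside `is_valid=False`, letting callers choose between rejecting outright and proceeding with the truncation.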
