
Implementation:Huggingface Transformers Pipeline Preprocess

From Leeroopedia
Revision as of 13:06, 16 February 2026 by Admin (auto-imported from implementations/Huggingface_Transformers_Pipeline_Preprocess.md)
Knowledge Sources
Domains NLP, Inference
Last Updated 2026-02-13 00:00 GMT

Overview

Concrete tool, provided by the Hugging Face Transformers library, for converting raw text or chat inputs into dictionaries of tokenized tensors.

Description

The TextGenerationPipeline.preprocess() method transforms raw text prompts or chat message histories into the tensor format required by the model's generate() method. It handles two input modalities:

  • Plain text: Prepends any configured prefix, then tokenizes using the pipeline's tokenizer, producing input_ids and attention_mask tensors.
  • Chat messages: Applies the tokenizer's chat template via apply_chat_template(), rendering the message list into tokenized tensors with proper turn delimiters and generation prompts.
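The choice between the two paths can be sketched in plain Python. This is a simplified model under stated assumptions, not the library code: the helper name `select_encoding_path` is hypothetical, and no Transformers API is called. It reproduces two behaviors of preprocess(): the prefix is applied only to plain-text prompts, and when continue_final_message is None it is inferred from whether the last message has the "assistant" role.

```python
from typing import Any, Dict, List, Optional, Union

Message = Dict[str, str]


def select_encoding_path(
    prompt: Union[str, List[Message]],
    prefix: str = "",
    continue_final_message: Optional[bool] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: describe how preprocess() would encode a prompt."""
    if isinstance(prompt, str):
        # Plain text: the prefix is concatenated, then the tokenizer is called.
        return {"path": "tokenizer", "text": prefix + prompt}

    # Chat input: every message must carry "role" and "content" keys.
    for message in prompt:
        if "role" not in message or "content" not in message:
            raise ValueError("Each chat message needs 'role' and 'content' keys.")

    # Auto-detect prefill: a chat that ends with an assistant turn is
    # continued instead of receiving a fresh generation prompt.
    if continue_final_message is None:
        continue_final_message = prompt[-1]["role"] == "assistant"

    return {
        "path": "apply_chat_template",
        "add_generation_prompt": not continue_final_message,
        "continue_final_message": continue_final_message,
    }


print(select_encoding_path("Once upon a time", prefix="Story: "))
print(select_encoding_path([
    {"role": "user", "content": "Name a French city."},
    {"role": "assistant", "content": "One French city is"},
]))
```

The second call ends with an assistant turn, so the sketch reports a prefill (continue_final_message=True, add_generation_prompt=False).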

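The method also implements the "hole" strategy for inputs that would exceed the model's maximum context length. When handle_long_generation="hole" is set and the prompt length plus the number of new tokens exceeds the tokenizer's model_max_length, it keeps only the last model_max_length - new_tokens tokens of input_ids (and attention_mask). This truncates from the left, preserving the most recent context while leaving room for generation. The number of new tokens is taken from max_new_tokens, or otherwise derived from max_length minus the prompt length; a ValueError is raised if it cannot be inferred or if no room remains for the prompt.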
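The arithmetic of the "hole" strategy can be sketched on a single sequence of token ids held in a Python list. The function `hole_truncate` is hypothetical and simplified: the library falls back to the generation config's max_length when neither max_new_tokens nor max_length is supplied, whereas this sketch requires one of the two.

```python
from typing import Dict, List, Optional


def hole_truncate(
    input_ids: List[int],
    model_max_length: int,
    max_new_tokens: Optional[int] = None,
    max_length: Optional[int] = None,
    attention_mask: Optional[List[int]] = None,
) -> Dict[str, List[int]]:
    """Hypothetical sketch of left truncation for the "hole" strategy."""
    cur_len = len(input_ids)

    # Determine how many tokens generation is expected to add.
    if max_new_tokens is not None:
        new_tokens = max_new_tokens
    elif max_length is not None:
        new_tokens = max_length - cur_len
        if new_tokens < 0:
            raise ValueError("Cannot infer how many new tokens are expected")
    else:
        # Simplification: the library would use its generation config here.
        raise ValueError("Pass max_new_tokens or max_length")

    out = {"input_ids": list(input_ids)}
    if attention_mask is not None:
        out["attention_mask"] = list(attention_mask)

    # Only truncate when prompt + generation would not fit.
    if cur_len + new_tokens > model_max_length:
        keep_length = model_max_length - new_tokens
        if keep_length <= 0:
            raise ValueError("The requested new tokens alone exceed the max length")
        # Keep the most recent tokens by slicing from the right end.
        out["input_ids"] = out["input_ids"][-keep_length:]
        if "attention_mask" in out:
            out["attention_mask"] = out["attention_mask"][-keep_length:]
    return out


result = hole_truncate(list(range(2001)), model_max_length=1024, max_new_tokens=100)
print(len(result["input_ids"]))  # 924
print(result["input_ids"][0])    # 1077
```

With a 1024-token limit and 100 new tokens, 1024 - 100 = 924 prompt tokens survive, so the oldest 1077 of the 2001 ids are dropped.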

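The tokenizer parameters add_special_tokens, truncation, padding, and max_length are forwarded to the tokenizer only when they are not None, so unset options fall back to the tokenizer's own defaults; add_special_tokens is dropped for chat inputs. Any remaining keyword arguments are treated as generation parameters: preprocess() only reads max_new_tokens or max_length from them (for the "hole" strategy), while the pipeline itself forwards them to generate() in the forward step.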
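The filtering of tokenizer options can be sketched as follows. The helper name `build_tokenizer_kwargs` and its is_chat flag are hypothetical; the sketch only models which options would reach the tokenizer, and does not cover tokenizer_encode_kwargs, tools, or documents.

```python
from typing import Any, Dict, Optional


def build_tokenizer_kwargs(
    is_chat: bool,
    add_special_tokens: Optional[bool] = None,
    truncation: Optional[Any] = None,
    padding: Optional[Any] = None,
    max_length: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical sketch: collect the options passed on to the tokenizer."""
    kwargs = {
        "add_special_tokens": add_special_tokens,
        "truncation": truncation,
        "padding": padding,
        "max_length": max_length,
    }
    # Drop unset options (None) so the tokenizer keeps its own defaults;
    # explicit False values are kept.
    kwargs = {key: value for key, value in kwargs.items() if value is not None}
    if is_chat:
        # The chat template already decides special tokens, so ignore the flag.
        kwargs.pop("add_special_tokens", None)
    return kwargs


print(build_tokenizer_kwargs(False, add_special_tokens=False, max_length=32))
print(build_tokenizer_kwargs(True, add_special_tokens=True, truncation=True))
```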

Usage

This method is called automatically by the pipeline's __call__ method during inference. Use it directly when you need fine-grained control over the preprocessing step, such as:

  • Inspecting the tokenized representation before model invocation.
  • Applying custom chat templates with tool definitions or document context.
  • Debugging tokenization issues with specific inputs.

Code Reference

Source Location

  • Repository: transformers
  • File: src/transformers/pipelines/text_generation.py (lines 295-361)

Signature

def preprocess(
    self,
    prompt_text,
    prefix="",
    handle_long_generation=None,
    add_special_tokens=None,
    truncation=None,
    padding=None,
    max_length=None,
    continue_final_message=None,
    tokenizer_encode_kwargs=None,
    tools=None,
    documents=None,
    **generate_kwargs,
):

Import

from transformers import pipeline

# preprocess is a method on TextGenerationPipeline instances:
generator = pipeline("text-generation", model="gpt2")
# Access: generator.preprocess(prompt_text, ...)

I/O Contract

Inputs

  • prompt_text (str or Chat, required): The raw text prompt, or a Chat object wrapping a list of message dictionaries with "role" and "content" keys.
  • prefix (str, optional): A string prepended to a plain-text prompt before tokenization. Defaults to "".
  • handle_long_generation (str or None, optional): Strategy for inputs that would exceed the model's maximum length. "hole" truncates from the left; None applies no special handling.
  • add_special_tokens (bool or None, optional): Whether to add special tokens (e.g., BOS/EOS) during tokenization. Ignored for chat inputs.
  • truncation (bool, str, or None, optional): Truncation strategy passed to the tokenizer.
  • padding (bool, str, or None, optional): Padding strategy passed to the tokenizer.
  • max_length (int or None, optional): Maximum sequence length for tokenization.
  • continue_final_message (bool or None, optional): For chat inputs, whether to continue the last message (prefill) instead of adding a generation prompt. If None, it is set to True when the last message has the "assistant" role.
  • tokenizer_encode_kwargs (dict or None, optional): Additional keyword arguments passed to the tokenizer's encoding method.
  • tools (list or None, optional): Tool definitions passed to apply_chat_template() for function-calling models.
  • documents (list or None, optional): Document context passed to apply_chat_template() for retrieval-augmented models.
  • **generate_kwargs (optional): Generation parameters such as max_new_tokens or temperature. preprocess() reads only max_new_tokens or max_length from them (for the "hole" strategy); the pipeline forwards them to generate() during the forward step.

Outputs

  • inputs (dict-like BatchEncoding): Contains input_ids (torch.Tensor of shape [1, seq_len]), attention_mask (torch.Tensor of shape [1, seq_len]), and prompt_text (the original input, kept for postprocessing).

Usage Examples

Basic Usage

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Preprocess a plain text prompt
inputs = generator.preprocess("Once upon a time")
print(inputs["input_ids"].shape)        # torch.Size([1, 4]); GPT-2 adds no BOS token
print(inputs["attention_mask"].shape)   # torch.Size([1, 4])
print(inputs["prompt_text"])            # Once upon a time

Chat Input with Template

from transformers import pipeline
from transformers.utils.chat_template_utils import Chat

generator = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")

messages = [
    {"role": "user", "content": "What is the capital of France?"},
]
chat = Chat(messages)
inputs = generator.preprocess(chat)

Handling Long Inputs

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

long_prompt = "word " * 2000  # about 2,000 tokens, well past gpt2's model_max_length of 1024
inputs = generator.preprocess(
    long_prompt,
    handle_long_generation="hole",
    max_new_tokens=100,
)
# input_ids is left-truncated to model_max_length - max_new_tokens = 1024 - 100 = 924 tokens

Related Pages

Implements Principle
