Implementation:Huggingface Transformers Pipeline Preprocess

Knowledge Sources	Transformers, Transformers Docs
Domains	NLP, Inference
Last Updated	2026-02-13 00:00 GMT

Overview

A concrete tool, provided by the HuggingFace Transformers library, for converting raw text or chat inputs into dictionaries of tokenized tensors.

Description

The TextGenerationPipeline.preprocess() method transforms raw text prompts or chat message histories into the tensor format required by the model's generate() method. It handles two input modalities:

Plain text: Prepends any configured prefix, then tokenizes using the pipeline's tokenizer, producing input_ids and attention_mask tensors.
Chat messages: Applies the tokenizer's chat template via apply_chat_template(), rendering the message list into tokenized tensors with proper turn delimiters and generation prompts.
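The two input paths can be sketched in plain Python. This is an illustrative toy, not the library's code. `ToyChat`, `toy_encode` (a whitespace tokenizer) and `toy_render_chat` (a made-up role-tag template) are hypothetical stand-ins for `Chat`, the real tokenizer, and `apply_chat_template()`. The sketch always adds a generation prompt and does not apply the prefix to chats.

```python
from dataclasses import dataclass, field


@dataclass
class ToyChat:
    """Hypothetical stand-in for the pipeline's Chat wrapper: it only holds the message list."""
    messages: list = field(default_factory=list)


def toy_encode(text, vocab):
    """Hypothetical whitespace tokenizer: each new word gets the next free integer ID."""
    ids = [vocab.setdefault(word, len(vocab)) for word in text.split()]
    # Batch dimension of 1, mirroring the [1, seq_len] shape of the real output.
    return {"input_ids": [ids], "attention_mask": [[1] * len(ids)]}


def toy_render_chat(messages, add_generation_prompt=True):
    """Hypothetical chat template: tag each turn with its role, then open an assistant turn."""
    parts = [f"<|{m['role']}|> {m['content']}" for m in messages]
    if add_generation_prompt:
        parts.append("<|assistant|>")
    return " ".join(parts)


def toy_preprocess(prompt, vocab, prefix=""):
    if isinstance(prompt, ToyChat):
        # Chat path: render through the template (this sketch does not apply `prefix`).
        text = toy_render_chat(prompt.messages)
    else:
        # Plain-text path: prepend the prefix, then tokenize.
        text = prefix + prompt
    inputs = toy_encode(text, vocab)
    inputs["prompt_text"] = prompt  # original input, kept for postprocessing
    return inputs
```

For example, `toy_preprocess("time", {}, prefix="Once upon a ")` tokenizes the four words of `"Once upon a time"` while keeping `"time"` as `prompt_text`.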

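The method also implements the "hole" strategy for inputs that exceed the model's maximum context length. When handle_long_generation="hole" is set, it truncates tokens from the left of the input so that the prompt plus the requested number of new tokens fits within the tokenizer's model_max_length, preserving the most recent context. If the requested new tokens alone would exceed that limit, it raises a ValueError instead.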
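A minimal pure-Python sketch of the "hole" arithmetic, operating on lists of token IDs instead of tensors. `hole_truncate` is a hypothetical name, and requiring either `max_new_tokens` or `max_length` is an assumption of this sketch (the pipeline itself may fall back to a configured default).

```python
def hole_truncate(input_ids, model_max_length, max_new_tokens=None, max_length=None):
    """Hypothetical sketch: left-truncate each row so prompt + new tokens fit the limit."""
    cur_len = len(input_ids[0])
    if max_new_tokens is not None:
        new_tokens = max_new_tokens
    else:
        if max_length is None:
            raise ValueError("this sketch needs max_new_tokens or max_length")
        # max_length counts prompt + generated tokens, so subtract the prompt length.
        new_tokens = max_length - cur_len
        if new_tokens < 0:
            raise ValueError("cannot infer how many new tokens are expected")
    if cur_len + new_tokens <= model_max_length:
        return input_ids  # already fits, nothing to drop
    keep_length = model_max_length - new_tokens
    if keep_length <= 0:
        raise ValueError("the requested new tokens alone exceed model_max_length")
    # Keep only the most recent keep_length tokens of each row (drop from the left).
    return [row[-keep_length:] for row in input_ids]
```

With GPT-2's 1024-token limit and `max_new_tokens=100`, a 2000-token prompt keeps only its last 924 tokens.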

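Additional tokenizer parameters (add_special_tokens, truncation, padding, max_length) can be passed through to control the encoding behavior; only values that are explicitly set are forwarded, so unset options fall back to the tokenizer's defaults. Generation keyword arguments are also accepted via **generate_kwargs, but inside preprocess() they are only read by the "hole" strategy (max_new_tokens or max_length); the pipeline passes them on to generate() separately.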
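The parameter pass-through can be sketched as below, assuming that only options the caller explicitly sets are forwarded (so `None` means "use the tokenizer default") and that `add_special_tokens` is dropped for chat inputs because the chat template controls special tokens. `build_tokenizer_kwargs` is a hypothetical helper, not a Transformers API.

```python
def build_tokenizer_kwargs(add_special_tokens=None, truncation=None, padding=None,
                           max_length=None, is_chat=False):
    """Hypothetical helper: collect only the tokenizer options the caller actually set."""
    candidates = {
        "add_special_tokens": add_special_tokens,
        "truncation": truncation,
        "padding": padding,
        "max_length": max_length,
    }
    # Filter on `is not None` (not truthiness) so explicit False values survive.
    kwargs = {key: value for key, value in candidates.items() if value is not None}
    if is_chat:
        # Assumed behavior: the chat template decides special tokens, so drop the option.
        kwargs.pop("add_special_tokens", None)
    return kwargs
```

Filtering on `is not None` matters here: `add_special_tokens=False` is a real request and must not be confused with "unset".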

Usage

This method is called automatically by the pipeline's __call__ method during inference. Use it directly when you need fine-grained control over the preprocessing step, such as:

Inspecting the tokenized representation before model invocation.
Applying custom chat templates with tool definitions or document context.
Debugging tokenization issues with specific inputs.

Code Reference

Source Location

Repository: transformers
File: src/transformers/pipelines/text_generation.py (lines 295-361)

Signature

def preprocess(
    self,
    prompt_text,
    prefix="",
    handle_long_generation=None,
    add_special_tokens=None,
    truncation=None,
    padding=None,
    max_length=None,
    continue_final_message=None,
    tokenizer_encode_kwargs=None,
    tools=None,
    documents=None,
    **generate_kwargs,
):

Import

from transformers import pipeline

# preprocess is a method on TextGenerationPipeline instances:
generator = pipeline("text-generation", model="gpt2")
# Access: generator.preprocess(prompt_text, ...)

I/O Contract

Inputs

Name	Type	Required	Description
prompt_text	`str` or `Chat`	Yes	The raw text prompt or a `Chat` object containing a list of message dictionaries with "role" and "content" keys.
prefix	`str`	No	A prefix string prepended to the prompt before tokenization. Defaults to `""`.
handle_long_generation	`str` or `None`	No	Strategy for inputs exceeding model max length. `"hole"` truncates from the left. `None` applies no special handling.
add_special_tokens	`bool` or `None`	No	Whether to add special tokens (e.g., BOS/EOS) during tokenization. Ignored for chat inputs.
truncation	`bool` or `str` or `None`	No	Truncation strategy passed to the tokenizer.
padding	`bool` or `str` or `None`	No	Padding strategy passed to the tokenizer.
max_length	`int` or `None`	No	Maximum sequence length for tokenization.
continue_final_message	`bool` or `None`	No	For chat inputs: whether to continue the last assistant message (prefill) instead of adding a generation prompt. Auto-detected if `None`: set to `True` when the last message has the `"assistant"` role.
tokenizer_encode_kwargs	`dict` or `None`	No	Additional keyword arguments passed to the tokenizer's encoding method.
tools	`list` or `None`	No	Tool definitions passed to `apply_chat_template()` for function-calling models.
documents	`list` or `None`	No	Document context passed to `apply_chat_template()` for retrieval-augmented models.
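**generate_kwargs	`dict`	No	Generation parameters (e.g., `max_new_tokens`, `temperature`). Inside `preprocess()` they are only read by the `"hole"` strategy to size the token budget; the pipeline forwards them to `generate()` separately.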
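The interplay between `continue_final_message` and the generation prompt can be sketched as follows. `resolve_chat_flags` is a hypothetical helper; it assumes that auto-detection checks whether the last message has the `"assistant"` role, and, as the table states, that continuing the final message and adding a generation prompt are alternatives.

```python
def resolve_chat_flags(messages, continue_final_message=None):
    """Hypothetical helper: return (add_generation_prompt, continue_final_message)."""
    if continue_final_message is None:
        # Assumed auto-detection: a trailing assistant message is treated as a prefill.
        continue_final_message = messages[-1]["role"] == "assistant"
    # Either open a new assistant turn or extend the last one, never both.
    return (not continue_final_message, continue_final_message)
```

An explicit `continue_final_message=False` overrides the detection, so a trailing assistant message is then closed and a fresh assistant turn is opened.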

Outputs

Name	Type	Description
inputs	`dict`	A dictionary containing: `input_ids` (`torch.Tensor` of shape `[1, seq_len]`), `attention_mask` (`torch.Tensor` of shape `[1, seq_len]`), and `prompt_text` (the original input for postprocessing).

Usage Examples

Basic Usage

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Preprocess a plain text prompt
inputs = generator.preprocess("Once upon a time")
print(inputs["input_ids"].shape)        # torch.Size([1, 4])  (GPT-2 adds no BOS token)
print(inputs["attention_mask"].shape)   # torch.Size([1, 4])
print(inputs["prompt_text"])            # Once upon a time

Chat Input with Template

from transformers import pipeline
from transformers.utils.chat_template_utils import Chat

generator = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")

messages = [
    {"role": "user", "content": "What is the capital of France?"},
]
chat = Chat(messages)
inputs = generator.preprocess(chat)

Handling Long Inputs

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

long_prompt = "word " * 2000  # Exceeds model max length
inputs = generator.preprocess(
    long_prompt,
    handle_long_generation="hole",
    max_new_tokens=100,
)
# input_ids is truncated from the left to model_max_length - max_new_tokens tokens (1024 - 100 = 924 for GPT-2)

Related Pages

Implements Principle

Principle:Huggingface_Transformers_Input_Preprocessing
