Implementation:Huggingface Transformers Pipeline Preprocess
| Knowledge Sources | |
|---|---|
| Domains | NLP, Inference |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Concrete tool, provided by the HuggingFace Transformers library, for converting raw text or chat inputs into tokenized tensor dictionaries.
Description
The TextGenerationPipeline.preprocess() method transforms raw text prompts or chat message histories into the tensor format required by the model's generate() method. It handles two input modalities:
- Plain text: Prepends any configured prefix, then tokenizes using the pipeline's tokenizer, producing input_ids and attention_mask tensors.
- Chat messages: Applies the tokenizer's chat template via apply_chat_template(), rendering the message list into tokenized tensors with proper turn delimiters and generation prompts.
The method also implements the "hole" strategy for handling inputs that exceed the model's maximum context length. When enabled, it truncates from the left side of the input, preserving the most recent context while leaving sufficient room for the requested number of new tokens.
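The left-truncation arithmetic described above can be sketched without loading a model. This is a minimal illustration on a plain Python list, not the library's code: the helper name truncate_for_generation is hypothetical, and the sketch assumes the number of new tokens is given explicitly (as with max_new_tokens).

```python
def truncate_for_generation(input_ids, model_max_length, max_new_tokens):
    """Left-truncate token ids so that prompt + new tokens fits the context.

    Hypothetical helper for illustration only; it mirrors the behaviour
    described for handle_long_generation="hole".
    """
    cur_len = len(input_ids)
    if cur_len + max_new_tokens <= model_max_length:
        return input_ids  # already fits, nothing to drop
    keep_length = model_max_length - max_new_tokens
    if keep_length <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    # Keep only the most recent tokens (the right-hand end of the sequence).
    return input_ids[-keep_length:]


ids = list(range(2000))  # stand-in for 2000 token ids
kept = truncate_for_generation(ids, model_max_length=1024, max_new_tokens=100)
print(len(kept), kept[0], kept[-1])  # 924 1076 1999
```

With a 1024-token context and 100 requested new tokens, 924 prompt tokens survive, and they are the last 924 of the original sequence.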
Additional tokenizer parameters (add_special_tokens, truncation, padding, max_length) can be passed through to control the encoding behavior. Extra keyword arguments not consumed by preprocessing are forwarded as generation parameters.
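One plausible way to assemble the tokenizer arguments is to drop every parameter left at None and then merge tokenizer_encode_kwargs on top. The sketch below illustrates that idea only; the helper name build_tokenizer_kwargs is hypothetical, and the merge order (explicit extras override the named parameters) is an assumption rather than a statement about the library's implementation.

```python
def build_tokenizer_kwargs(
    add_special_tokens=None,
    truncation=None,
    padding=None,
    max_length=None,
    tokenizer_encode_kwargs=None,
):
    """Collect only the tokenizer options the caller actually set.

    Hypothetical helper; the filtering and merge order are assumptions.
    """
    candidates = {
        "add_special_tokens": add_special_tokens,
        "truncation": truncation,
        "padding": padding,
        "max_length": max_length,
    }
    # Unset options are omitted so the tokenizer's own defaults apply.
    kwargs = {key: value for key, value in candidates.items() if value is not None}
    kwargs.update(tokenizer_encode_kwargs or {})
    return kwargs


example = build_tokenizer_kwargs(
    truncation=True,
    max_length=512,
    tokenizer_encode_kwargs={"return_token_type_ids": False},
)
print(example)
```

Filtering out None keeps the call to the tokenizer free of explicit overrides the user never asked for.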
Usage
This method is called automatically by the pipeline's __call__ method during inference. Use it directly when you need fine-grained control over the preprocessing step, such as:
- Inspecting the tokenized representation before model invocation.
- Applying custom chat templates with tool definitions or document context.
- Debugging tokenization issues with specific inputs.
Code Reference
Source Location
- Repository: transformers
- File: src/transformers/pipelines/text_generation.py (lines 295-361)
Signature
def preprocess(
    self,
    prompt_text,
    prefix="",
    handle_long_generation=None,
    add_special_tokens=None,
    truncation=None,
    padding=None,
    max_length=None,
    continue_final_message=None,
    tokenizer_encode_kwargs=None,
    tools=None,
    documents=None,
    **generate_kwargs,
):
Import
from transformers import pipeline
# preprocess is a method on TextGenerationPipeline instances:
generator = pipeline("text-generation", model="gpt2")
# Access: generator.preprocess(prompt_text, ...)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| prompt_text | str or Chat | Yes | The raw text prompt or a Chat object containing a list of message dictionaries with "role" and "content" keys. |
| prefix | str | No | A prefix string prepended to the prompt before tokenization. Defaults to "". |
| handle_long_generation | str or None | No | Strategy for inputs exceeding model max length. "hole" truncates from the left. None applies no special handling. |
| add_special_tokens | bool or None | No | Whether to add special tokens (e.g., BOS/EOS) during tokenization. Ignored for chat inputs. |
| truncation | bool or str or None | No | Truncation strategy passed to the tokenizer. |
| padding | bool or str or None | No | Padding strategy passed to the tokenizer. |
| max_length | int or None | No | Maximum sequence length for tokenization. |
| continue_final_message | bool or None | No | For chat inputs: whether to continue the last assistant message (prefill) instead of adding a generation prompt. Auto-detected if None. |
| tokenizer_encode_kwargs | dict or None | No | Additional keyword arguments passed to the tokenizer's encoding method. |
| tools | list or None | No | Tool definitions passed to apply_chat_template() for function-calling models. |
| documents | list or None | No | Document context passed to apply_chat_template() for retrieval-augmented models. |
| **generate_kwargs | dict | No | Additional generation parameters (e.g., max_new_tokens, temperature) forwarded to the forward pass. |
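The auto-detection of continue_final_message noted in the table can be pictured as a check on the role of the last message. The sketch below assumes that rule (last role equal to "assistant" means prefill) and assumes the resolved flag is the inverse of add_generation_prompt when calling apply_chat_template(); the helper name resolve_chat_flags is hypothetical.

```python
def resolve_chat_flags(messages, continue_final_message=None):
    """Decide between prefilling the last turn and opening a new assistant turn.

    Hypothetical helper; the detection rule is an assumption for illustration.
    """
    if continue_final_message is None:
        # A trailing assistant message is treated as a partial reply to extend.
        continue_final_message = messages[-1]["role"] == "assistant"
    return {
        "continue_final_message": continue_final_message,
        "add_generation_prompt": not continue_final_message,
    }


question_only = [{"role": "user", "content": "What is the capital of France?"}]
with_prefill = question_only + [{"role": "assistant", "content": "The capital is"}]

print(resolve_chat_flags(question_only))
print(resolve_chat_flags(with_prefill))
```

Passing continue_final_message explicitly bypasses the detection, which is useful when a trailing assistant message should be treated as a finished turn.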
Outputs
| Name | Type | Description |
|---|---|---|
| inputs | dict | A dictionary containing: input_ids (torch.Tensor of shape [1, seq_len]), attention_mask (torch.Tensor of shape [1, seq_len]), and prompt_text (the original input for postprocessing). |
Usage Examples
Basic Usage
from transformers import pipeline
generator = pipeline("text-generation", model="gpt2")
# Preprocess a plain text prompt
inputs = generator.preprocess("Once upon a time")
print(inputs["input_ids"].shape) # torch.Size([1, 4])
print(inputs["attention_mask"].shape) # torch.Size([1, 4])
print(inputs["prompt_text"]) # "Once upon a time"
Chat Input with Template
from transformers import pipeline
from transformers.utils.chat_template_utils import Chat
generator = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")
messages = [
    {"role": "user", "content": "What is the capital of France?"},
]
chat = Chat(messages)
inputs = generator.preprocess(chat)
Handling Long Inputs
from transformers import pipeline
generator = pipeline("text-generation", model="gpt2")
long_prompt = "word " * 2000 # Exceeds model max length
inputs = generator.preprocess(
    long_prompt,
    handle_long_generation="hole",
    max_new_tokens=100,
)
# input_ids truncated from the left to fit within model_max_length - 100