Implementation:HuggingFace Transformers Pipeline Forward
| Knowledge Sources | |
|---|---|
| Domains | NLP, Inference, Deep Learning |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
Concrete tool in the HuggingFace Transformers library for executing autoregressive text generation on preprocessed inputs.
Description
The TextGenerationPipeline._forward() method is the core inference step that takes preprocessed tensor inputs and invokes the model's generate() method to produce token sequences. It orchestrates the following operations:
- Input extraction: Retrieves `input_ids`, `attention_mask`, and `prompt_text` from the model inputs dictionary.
- Empty prompt handling: Supports unconditional generation by setting `input_ids` to `None` when the prompt is empty.
- Prefix length adjustment: When a prefix was prepended during preprocessing, adjusts `max_length` and `min_length` to compensate for the extra prefix tokens, ensuring the user's intended generation length is preserved.
- Generation config management: Applies the pipeline's default `GenerationConfig` unless the user provides a custom one in the keyword arguments.
- Model invocation: Calls `self.model.generate()` with the prepared inputs and generation parameters.
- Output reshaping: Reshapes the generated sequences from a flat batch `[out_b, seq_len]` to `[in_b, num_return_sequences, seq_len]`.
- Auxiliary output collection: Extracts and reshapes any additional model outputs (attention weights, hidden states, scores) from the `ModelOutput` object.
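The output-reshaping step can be illustrated standalone: `generate()` returns a flat batch of `out_b = in_b * num_return_sequences` sequences, ordered so that each input's return sequences are contiguous, and `_forward` regroups them per input. A minimal sketch of that regrouping, with plain Python lists standing in for torch tensors (the function name is illustrative, not part of the library):

```python
def regroup_sequences(flat_sequences, in_b):
    """Regroup a flat batch [out_b, seq_len] into [in_b, num_return_sequences, seq_len].

    flat_sequences: out_b sequences, ordered so that all return sequences
    for input 0 come first, then those for input 1, and so on.
    """
    out_b = len(flat_sequences)
    assert out_b % in_b == 0, "out_b must be a multiple of in_b"
    n = out_b // in_b  # num_return_sequences
    return [flat_sequences[i * n:(i + 1) * n] for i in range(in_b)]

# 2 inputs, 3 return sequences each -> flat batch of 6 sequences
flat = [[100 + k, 200 + k] for k in range(6)]
grouped = regroup_sequences(flat, in_b=2)
print(len(grouped), len(grouped[0]))  # 2 3
```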
Usage
This method is called automatically by the pipeline's __call__ method between preprocessing and postprocessing. Use it directly when you need:
- Fine-grained control over the generation step without postprocessing.
- Access to raw generated token IDs before decoding.
- Inspection of auxiliary model outputs (attention, hidden states).
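The chaining that `__call__` performs around `_forward` can be sketched as a small wrapper. This is an illustrative composition, not library code (the helper name `run_pipeline` is an assumption), useful when you want raw token IDs alongside the decoded output:

```python
def run_pipeline(pipe, text, **generate_kwargs):
    """Mirror the pipeline's preprocess -> _forward -> postprocess chain,
    additionally returning the raw generated token IDs from _forward.

    `pipe` is assumed to be a TextGenerationPipeline-like object.
    """
    model_inputs = pipe.preprocess(text)
    model_outputs = pipe._forward(model_inputs, **generate_kwargs)
    # Raw IDs, shape [in_b, num_return_sequences, seq_len], before decoding
    raw_ids = model_outputs["generated_sequence"]
    return raw_ids, pipe.postprocess(model_outputs)
```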
Code Reference
Source Location
- Repository: transformers
- File: `src/transformers/pipelines/text_generation.py` (lines 363-424)
Signature
```python
def _forward(self, model_inputs, **generate_kwargs):
```
Import
```python
from transformers import pipeline

# _forward is a method on TextGenerationPipeline instances:
generator = pipeline("text-generation", model="gpt2")
# Access: generator._forward(model_inputs, **generate_kwargs)
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_inputs | dict | Yes | Dictionary produced by preprocess() containing: `input_ids` (torch.Tensor), `attention_mask` (torch.Tensor), and `prompt_text` (str or Chat). |
| `**generate_kwargs` | dict | No | Generation parameters forwarded to model.generate(). Common keys include `max_new_tokens`, `temperature`, `top_k`, `top_p`, `do_sample`, `num_beams`, `num_return_sequences`, `generation_config`. |
Outputs
| Name | Type | Description |
|---|---|---|
| model_outputs | dict | Dictionary containing: `generated_sequence` (torch.Tensor of shape [in_b, num_return_sequences, seq_len]), `input_ids` (torch.Tensor or None), `prompt_text` (str or Chat). Optionally includes `additional_outputs` (dict of reshaped auxiliary outputs such as attention scores or hidden states). |
Usage Examples
Basic Usage
```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Step 1: Preprocess
inputs = generator.preprocess("The quick brown fox")

# Step 2: Forward pass
model_outputs = generator._forward(inputs, max_new_tokens=20)

print(model_outputs["generated_sequence"].shape)
# torch.Size([1, 1, seq_len]) -- 1 input, 1 return sequence

print(model_outputs["prompt_text"])
# "The quick brown fox"
```
Multiple Return Sequences
```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

inputs = generator.preprocess("Once upon a time")
model_outputs = generator._forward(
    inputs,
    max_new_tokens=30,
    num_return_sequences=3,
    do_sample=True,
    temperature=0.8,
)

print(model_outputs["generated_sequence"].shape)
# torch.Size([1, 3, seq_len]) -- 1 input, 3 return sequences
```
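For decoder-only models such as GPT-2, the sequences returned by `_forward` still include the prompt tokens at the front, since `generate()` echoes the input. A minimal sketch of trimming them before decoding, with plain lists standing in for tensors (the helper name is illustrative, not part of the library):

```python
def new_tokens_only(generated_sequence, input_ids):
    """Drop the leading prompt tokens from each generated sequence,
    keeping only the newly generated token IDs.

    generated_sequence: nested list [in_b][num_return_sequences][seq_len]
    input_ids: list [in_b][prompt_len], or None for unconditional generation
    """
    prompt_len = len(input_ids[0]) if input_ids is not None else 0
    return [[seq[prompt_len:] for seq in per_input]
            for per_input in generated_sequence]

# Prompt of 2 tokens; one input with two return sequences
gen = [[[5, 6, 7, 8], [5, 6, 9, 10]]]
print(new_tokens_only(gen, [[5, 6]]))  # [[[7, 8], [9, 10]]]
```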
Related Pages
Implements Principle