Implementation:Huggingface Transformers Pipeline Postprocess

Knowledge Sources	Transformers, Transformers Docs
Domains	NLP, Inference
Last Updated	2026-02-13 00:00 GMT

Overview

Concrete tool, provided by the HuggingFace Transformers library, for decoding generated token sequences into human-readable text or structured chat messages.

Description

The TextGenerationPipeline.postprocess() method converts raw model outputs from the forward pass into structured result dictionaries. It handles three return types:

Full text (ReturnType.FULL_TEXT): Decodes the generated token sequence, strips the prompt by slicing off as many characters as the decoded prompt occupies, then prepends the original prompt text to produce a complete output string.
New text (ReturnType.NEW_TEXT): Applies the same prompt stripping and returns only the newly generated text.
Tensors (ReturnType.TENSORS): Returns the raw generated token IDs without decoding.
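The prompt stripping works on decoded characters rather than on token positions. The sketch below illustrates that idea under stated assumptions: `toy_decode`, the tiny vocabulary, and `postprocess_sketch` are hypothetical stand-ins for the real tokenizer and method, and the local `ReturnType` enum only mimics the library's. In the sketch, the full sequence and the prompt are decoded with the same settings so that the character offsets line up.

```python
from enum import Enum


class ReturnType(Enum):
    # Local stand-in for transformers.pipelines.text_generation.ReturnType
    TENSORS = 0
    NEW_TEXT = 1
    FULL_TEXT = 2


# Hypothetical toy vocabulary; id 0 plays the role of an EOS special token.
TOY_VOCAB = {0: "<eos>", 1: "Hello", 2: " world", 3: " and", 4: " welcome"}
SPECIAL_IDS = {0}


def toy_decode(ids, skip_special_tokens=True):
    """Stand-in for tokenizer.decode: join token strings, optionally dropping specials."""
    return "".join(
        TOY_VOCAB[i] for i in ids if not (skip_special_tokens and i in SPECIAL_IDS)
    )


def postprocess_sketch(sequence, prompt_ids, prompt_text, return_type):
    if return_type == ReturnType.TENSORS:
        # Raw ids are returned untouched, with no decoding at all.
        return {"generated_token_ids": list(sequence)}
    text = toy_decode(sequence)
    # Character length of the decoded prompt, or 0 when there was no prompt.
    prompt_length = 0 if prompt_ids is None else len(toy_decode(prompt_ids))
    new_text = text[prompt_length:]
    if return_type == ReturnType.FULL_TEXT:
        # Re-attach the caller's original prompt string, not its decoded form.
        return {"generated_text": prompt_text + new_text}
    return {"generated_text": new_text}


seq = [1, 2, 3, 4, 0]
print(postprocess_sketch(seq, [1, 2], "Hello world", ReturnType.FULL_TEXT))
# {'generated_text': 'Hello world and welcome'}
print(postprocess_sketch(seq, [1, 2], "Hello world", ReturnType.NEW_TEXT))
# {'generated_text': ' and welcome'}
print(postprocess_sketch(seq, [1, 2], "Hello world", ReturnType.TENSORS))
# {'generated_token_ids': [1, 2, 3, 4, 0]}
```

Re-attaching `prompt_text` instead of the decoded prompt means FULL_TEXT output keeps the caller's prompt string exactly as it was passed in.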

For chat-formatted inputs returned as ReturnType.FULL_TEXT (the default), the method reconstructs the conversation structure:

If continue_final_message is true (by default, whenever the chat ends with an assistant message), the generated text is appended to the content of that final message.
Otherwise, the generated text is packaged as a new assistant message and appended to the message list.
When the tokenizer defines a response_schema, the generated text is parsed into structured fields (e.g., separating tool call JSON from plain content).
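A minimal sketch of the two chat branches, assuming plain `{"role", "content"}` message dicts and omitting `response_schema` parsing. `rebuild_chat` is a hypothetical helper, not a library function; it also shows the auto-detection applied when `continue_final_message` is `None`.

```python
def rebuild_chat(messages, generated_text, continue_final_message=None):
    """Hypothetical sketch of chat reconstruction for FULL_TEXT output."""
    if continue_final_message is None:
        # Auto-detect: a chat ending in an assistant message is treated as a prefill.
        continue_final_message = messages[-1]["role"] == "assistant"
    if continue_final_message:
        last = messages[-1]
        # Extend the final message instead of adding a new one.
        return list(messages[:-1]) + [
            {"role": last["role"], "content": last["content"] + generated_text}
        ]
    # Otherwise the generation becomes a brand-new assistant turn.
    return list(messages) + [{"role": "assistant", "content": generated_text}]


chat = [{"role": "user", "content": "What is 2+2?"}]
print(rebuild_chat(chat, "2+2 equals 4."))
# [{'role': 'user', 'content': 'What is 2+2?'}, {'role': 'assistant', 'content': '2+2 equals 4.'}]

prefill = chat + [{"role": "assistant", "content": "The answer is"}]
print(rebuild_chat(prefill, " 4."))
# [{'role': 'user', 'content': 'What is 2+2?'}, {'role': 'assistant', 'content': 'The answer is 4.'}]
```

Building new lists rather than mutating `messages` keeps the caller's input chat unchanged.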

The method also distributes any auxiliary model outputs (attention scores, hidden states) across the per-sequence result dictionaries.
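The sketch below shows the idea with plain Python lists instead of tensors, assuming an auxiliary output is split only when it has exactly one entry per returned sequence. `attach_auxiliary` and the example `scores` values are hypothetical.

```python
def attach_auxiliary(records, additional_outputs):
    """Hypothetical sketch: copy per-sequence auxiliary outputs into each record."""
    n = len(records)
    # Keep only outputs that have exactly one entry per returned sequence.
    split = {
        key: values
        for key, values in additional_outputs.items()
        if isinstance(values, list) and len(values) == n
    }
    for idx, record in enumerate(records):
        for key, values in split.items():
            # Record idx receives the idx-th slice of every split output.
            record[key] = values[idx]
    return records


records = [{"generated_text": "a"}, {"generated_text": "b"}]
aux = {"scores": [[0.1, 0.9], [0.7, 0.3]], "unrelated": [1, 2, 3]}
print(attach_auxiliary(records, aux))
# [{'generated_text': 'a', 'scores': [0.1, 0.9]}, {'generated_text': 'b', 'scores': [0.7, 0.3]}]
```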

Usage

This method is called automatically by the pipeline's __call__ method as the final step. Use it directly when:

You have raw model outputs from _forward() and want to control how they are decoded into text.
You want to switch between return types (full text, new text, tensors) without re-running inference.
You need to customize special-token handling or tokenization cleanup behavior.

Code Reference

Source Location

Repository: transformers
File: src/transformers/pipelines/text_generation.py (lines 426-500)

Signature

def postprocess(
    self,
    model_outputs,
    return_type=ReturnType.FULL_TEXT,
    clean_up_tokenization_spaces=True,
    continue_final_message=None,
    skip_special_tokens=None,
):

Import

from transformers import pipeline
from transformers.pipelines.text_generation import ReturnType

# postprocess is a method on TextGenerationPipeline instances:
generator = pipeline("text-generation", model="gpt2")
# Access: generator.postprocess(model_outputs, ...)

I/O Contract

Inputs

Name	Type	Required	Description
model_outputs	`dict`	Yes	Dictionary produced by `_forward()` containing: `generated_sequence` (`torch.Tensor`, shaped `(1, num_return_sequences, sequence_length)` for a single prompt), `input_ids` (`torch.Tensor`, or `None` when the prompt was empty), `prompt_text` (`str` or `Chat`), and optionally `additional_outputs` (`dict`).
return_type	`ReturnType`	No	Output format: `ReturnType.FULL_TEXT` (default), `ReturnType.NEW_TEXT`, or `ReturnType.TENSORS`.
clean_up_tokenization_spaces	`bool`	No	Whether to remove extra spaces introduced by subword tokenization. Defaults to `True`.
continue_final_message	`bool` or `None`	No	For chat inputs: whether the final message is a prefill that the generated text should continue. If `None`, it is set to `True` when the last message in the chat has the `assistant` role.
skip_special_tokens	`bool` or `None`	No	Whether to remove special tokens (BOS, EOS, PAD) during decoding. Defaults to `True` if not specified.

Outputs

Name	Type	Description
records	`list[dict]`	A list of dictionaries, one per return sequence. Each dictionary contains either `generated_text` (`str` or `list[dict]` for chat) or `generated_token_ids` (`list[int]`). May also contain keys from auxiliary model outputs.
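Callers often just want the reply text from a record, whichever form it takes. A minimal sketch, assuming chat records use plain message dicts whose last entry is the assistant turn; `reply_text` is a hypothetical helper name.

```python
def reply_text(record):
    """Hypothetical helper: return the text portion of one postprocess record."""
    if "generated_token_ids" in record:
        # ReturnType.TENSORS records carry ids only; decode them with the tokenizer instead.
        raise ValueError("record holds token ids, not text")
    text = record["generated_text"]
    if isinstance(text, list):
        # Chat output: the last message is the new or continued assistant turn.
        return text[-1].get("content", "")
    # Plain prompt output is already a string.
    return text


print(reply_text({"generated_text": "Hello world and welcome"}))
print(reply_text({"generated_text": [
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "2+2 equals 4."},
]}))
```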

Usage Examples

Basic Usage

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Run the full pipeline
result = generator("The capital of France is", max_new_tokens=20)
print(result)
# [{'generated_text': 'The capital of France is Paris, which is...'}]

Manual Postprocessing with Different Return Types

from transformers import pipeline
from transformers.pipelines.text_generation import ReturnType

generator = pipeline("text-generation", model="gpt2")

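# Preprocess and run the model; forward() wraps _forward() with device
# placement and moves the outputs back to CPU, as postprocess() expects
inputs = generator.preprocess("Hello world")
model_outputs = generator.forward(inputs, max_new_tokens=10)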

# Postprocess as full text
full = generator.postprocess(model_outputs, return_type=ReturnType.FULL_TEXT)
print(full)
# [{'generated_text': 'Hello world and welcome to...'}]

# Postprocess as new text only
new = generator.postprocess(model_outputs, return_type=ReturnType.NEW_TEXT)
print(new)
# [{'generated_text': ' and welcome to...'}]

# Postprocess as raw token IDs
tensors = generator.postprocess(model_outputs, return_type=ReturnType.TENSORS)
print(tensors)
# [{'generated_token_ids': [15496, 995, 290, ...]}]

Chat Output Formatting

from transformers import pipeline

generator = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")

messages = [
    {"role": "user", "content": "What is 2+2?"},
]
result = generator(messages, max_new_tokens=50)
# result[0]["generated_text"] is a list of message dicts:
# [
#   {"role": "user", "content": "What is 2+2?"},
#   {"role": "assistant", "content": "2+2 equals 4."},
# ]
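Because the chat output already contains the whole conversation, it can be extended with a new user message and passed back to the pipeline. A minimal sketch: `next_turn` is a hypothetical helper, and the hard-coded `result` stands in for the pipeline's return value so the snippet runs without downloading a model.

```python
def next_turn(result, user_message):
    """Hypothetical helper: build the next chat prompt from a pipeline result."""
    history = result[0]["generated_text"]
    if not isinstance(history, list):
        # A plain string cannot be extended as a conversation.
        raise TypeError("expected chat output (a list of message dicts)")
    # Build a new list so the original result is left unchanged.
    return list(history) + [{"role": "user", "content": user_message}]


# Stand-in for the pipeline output shown above
result = [
    {
        "generated_text": [
            {"role": "user", "content": "What is 2+2?"},
            {"role": "assistant", "content": "2+2 equals 4."},
        ]
    }
]
messages = next_turn(result, "And 3+3?")
print(len(messages))  # 3
```

With a real pipeline, `generator(next_turn(result, "And 3+3?"), max_new_tokens=50)` would then request the following reply.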

Related Pages

Implements Principle

Principle:Huggingface_Transformers_Output_Postprocessing
