Implementation:Huggingface Transformers Pipeline Postprocess
| Knowledge Sources | |
|---|---|
| Domains | NLP, Inference |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
A concrete tool, provided by the HuggingFace Transformers library, for decoding generated token sequences into human-readable text or structured chat messages.
Description
The TextGenerationPipeline.postprocess() method converts raw model outputs from the forward pass into structured result dictionaries. It handles three return types:
- Full text (ReturnType.FULL_TEXT): Decodes the generated token sequence, extracts the newly generated portion by subtracting the prompt, then prepends the original prompt text to produce a complete output string.
- New text (ReturnType.NEW_TEXT): Returns only the newly generated text without the prompt.
- Tensors (ReturnType.TENSORS): Returns the raw generated token IDs without decoding.
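The three return types can be illustrated with a small, dependency-free sketch. This is not the library's code: `ToyReturnType`, `TOY_VOCAB`, `toy_decode`, and `toy_postprocess` are hypothetical names, and a four-entry lookup table stands in for a real tokenizer. It only shows the prompt-subtraction step described above, under the assumption that the prompt is removed by character length after decoding.

```python
from enum import Enum


class ToyReturnType(Enum):
    """Hypothetical stand-in for the pipeline's ReturnType enum."""
    TENSORS = 0
    NEW_TEXT = 1
    FULL_TEXT = 2


# Hypothetical toy vocabulary standing in for a real tokenizer.
TOY_VOCAB = {0: "Hello", 1: " world", 2: " and", 3: " welcome"}


def toy_decode(token_ids):
    """Concatenate the toy vocabulary entries for the given IDs."""
    return "".join(TOY_VOCAB[t] for t in token_ids)


def toy_postprocess(generated_ids, prompt_ids, prompt_text, return_type):
    """Build one result record for a single generated sequence."""
    if return_type is ToyReturnType.TENSORS:
        # No decoding at all: hand back the raw token IDs.
        return {"generated_token_ids": list(generated_ids)}

    # Decode everything, then drop as many characters as the decoded prompt has.
    text = toy_decode(generated_ids)
    prompt_length = len(toy_decode(prompt_ids)) if prompt_ids is not None else 0
    new_text = text[prompt_length:]

    if return_type is ToyReturnType.FULL_TEXT:
        # Prepend the original prompt string rather than the re-decoded prompt.
        return {"generated_text": prompt_text + new_text}
    return {"generated_text": new_text}


generated = [0, 1, 2, 3]  # prompt tokens followed by two new tokens
prompt = [0, 1]
print(toy_postprocess(generated, prompt, "Hello world", ToyReturnType.FULL_TEXT))
print(toy_postprocess(generated, prompt, "Hello world", ToyReturnType.NEW_TEXT))
print(toy_postprocess(generated, prompt, "Hello world", ToyReturnType.TENSORS))
```

Prepending the original prompt string, instead of keeping the re-decoded prompt, means the full-text output preserves the caller's exact prompt even when decoding does not round-trip it perfectly.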
For chat-formatted inputs, the method reconstructs the conversation structure:
- If continue_final_message is true, the generated text is appended to the last assistant message's content.
- Otherwise, the generated text is packaged as a new assistant message and appended to the message list.
- When the tokenizer defines a response_schema, the generated text is parsed into structured fields (e.g., separating tool call JSON from plain content).
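The first two branches can be sketched in plain Python. `toy_attach_reply` is a hypothetical helper, not a function from the library, and it works on ordinary lists of message dicts rather than the pipeline's chat object. The auto-detection rule used when `continue_final_message` is `None` (treat a trailing assistant message as a prefill) is an assumption made for this sketch. Parsing with `response_schema` is left out.

```python
import copy


def toy_attach_reply(messages, generated_text, continue_final_message=None):
    """Return a new message list with the generated text attached."""
    # Work on a copy so the caller's conversation is left untouched.
    messages = copy.deepcopy(messages)

    if continue_final_message is None:
        # Assumption: a chat that ends in an assistant message is a prefill.
        continue_final_message = messages[-1]["role"] == "assistant"

    if continue_final_message:
        # Extend the existing assistant message in place.
        messages[-1]["content"] += generated_text
    else:
        # Package the generated text as a fresh assistant turn.
        messages.append({"role": "assistant", "content": generated_text})
    return messages


chat = [{"role": "user", "content": "What is 2+2?"}]
print(toy_attach_reply(chat, "2+2 equals 4."))

prefilled = chat + [{"role": "assistant", "content": "The answer is"}]
print(toy_attach_reply(prefilled, " 4."))
```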
The method also distributes any auxiliary model outputs (attention scores, hidden states) across the per-sequence result dictionaries.
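A rough sketch of that distribution step follows, with plain lists standing in for tensors. `toy_distribute` and the `scores` and `note` keys are hypothetical, and the rule that only outputs with one entry per generated sequence are split across records is an assumption of this sketch, not a statement about the library's exact behavior.

```python
def toy_distribute(texts, additional_outputs):
    """Attach per-sequence auxiliary values to each result record."""
    n = len(texts)

    # Assumption: only outputs with exactly one entry per sequence are split.
    per_sequence = {
        key: values
        for key, values in additional_outputs.items()
        if isinstance(values, (list, tuple)) and len(values) == n
    }

    records = []
    for idx, text in enumerate(texts):
        record = {"generated_text": text}
        for key, values in per_sequence.items():
            record[key] = values[idx]  # the slice belonging to sequence idx
        records.append(record)
    return records


records = toy_distribute(
    ["first sample", "second sample"],
    {"scores": [[0.1, 0.9], [0.4, 0.6]], "note": "not per-sequence"},
)
print(records)
```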
Usage
This method is called automatically by the pipeline's __call__ method as the final step. Use it directly when:
- You have raw model outputs from _forward() and want to control the decoding parameters.
- You want to switch between return types (full text, new text, tensors) without re-running inference.
- You need to customize special-token handling or tokenization cleanup behavior.
Code Reference
Source Location
- Repository: transformers
- File: src/transformers/pipelines/text_generation.py (lines 426-500)
Signature
def postprocess(
self,
model_outputs,
return_type=ReturnType.FULL_TEXT,
clean_up_tokenization_spaces=True,
continue_final_message=None,
skip_special_tokens=None,
):
Import
from transformers import pipeline
from transformers.pipelines.text_generation import ReturnType
# postprocess is a method on TextGenerationPipeline instances:
generator = pipeline("text-generation", model="gpt2")
# Access: generator.postprocess(model_outputs, ...)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_outputs | dict | Yes | Dictionary produced by _forward() containing: generated_sequence (torch.Tensor), input_ids (torch.Tensor or None), prompt_text (str or Chat), and optionally additional_outputs (dict). |
| return_type | ReturnType | No | Output format: ReturnType.FULL_TEXT (default), ReturnType.NEW_TEXT, or ReturnType.TENSORS. |
| clean_up_tokenization_spaces | bool | No | Whether to remove extra spaces introduced by subword tokenization. Defaults to True. |
| continue_final_message | bool or None | No | For chat inputs: whether the last assistant message was a prefill that should be continued. Auto-detected from the chat if None. |
| skip_special_tokens | bool or None | No | Whether to remove special tokens (BOS, EOS, PAD) during decoding. Defaults to True if not specified. |
Outputs
| Name | Type | Description |
|---|---|---|
| records | list[dict] | A list of dictionaries, one per return sequence. Each dictionary contains either generated_text (str or list[dict] for chat) or generated_token_ids (list[int]). May also contain keys from auxiliary model outputs. |
Usage Examples
Basic Usage
from transformers import pipeline
generator = pipeline("text-generation", model="gpt2")
# Run the full pipeline
result = generator("The capital of France is", max_new_tokens=20)
print(result)
# [{'generated_text': 'The capital of France is Paris, which is...'}]
Manual Postprocessing with Different Return Types
from transformers import pipeline
from transformers.pipelines.text_generation import ReturnType
generator = pipeline("text-generation", model="gpt2")
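# Preprocess and forward. _forward() is called directly here, which is fine on CPU.
# On another device, generator.forward(...) is the safer entry point: it wraps
# _forward() with device placement and a no-grad inference context.
inputs = generator.preprocess("Hello world")
model_outputs = generator._forward(inputs, max_new_tokens=10)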
# Postprocess as full text
full = generator.postprocess(model_outputs, return_type=ReturnType.FULL_TEXT)
print(full)
# [{'generated_text': 'Hello world and welcome to...'}]
# Postprocess as new text only
new = generator.postprocess(model_outputs, return_type=ReturnType.NEW_TEXT)
print(new)
# [{'generated_text': ' and welcome to...'}]
# Postprocess as raw token IDs
tensors = generator.postprocess(model_outputs, return_type=ReturnType.TENSORS)
print(tensors)
# [{'generated_token_ids': [15496, 995, 290, ...]}]
Chat Output Formatting
from transformers import pipeline
generator = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")
messages = [
{"role": "user", "content": "What is 2+2?"},
]
result = generator(messages, max_new_tokens=50)
# result[0]["generated_text"] is a list of message dicts:
# [
# {"role": "user", "content": "What is 2+2?"},
# {"role": "assistant", "content": "2+2 equals 4."},
# ]