

Implementation:Huggingface Transformers Pipeline Postprocess

From Leeroopedia
Knowledge Sources
Domains: NLP, Inference
Last Updated: 2026-02-13 00:00 GMT

Overview

Concrete tool, provided by the HuggingFace Transformers library, for decoding generated token sequences into human-readable text or structured chat messages.

Description

The TextGenerationPipeline.postprocess() method converts raw model outputs from the forward pass into structured result dictionaries. It handles three return types:

  • Full text (ReturnType.FULL_TEXT): Decodes the generated token sequence, isolates the newly generated portion by stripping the decoded prompt from the front, then prepends the original prompt text to produce a complete output string.
  • New text (ReturnType.NEW_TEXT): Returns only the newly generated text without the prompt.
  • Tensors (ReturnType.TENSORS): Returns the raw generated token IDs without decoding.
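
The three return types can be illustrated with a minimal, library-free sketch. It is not the pipeline's actual code: build_record is a hypothetical helper, the local ReturnType enum is a stand-in for the one in transformers.pipelines.text_generation (its numeric values here are an assumption), and the decoded strings are passed in directly instead of coming from a tokenizer.

```python
from enum import Enum


class ReturnType(Enum):
    # Local stand-in for the enum named on this page; values are assumed.
    TENSORS = 0
    NEW_TEXT = 1
    FULL_TEXT = 2


def build_record(token_ids, decoded_sequence, decoded_prompt, prompt_text, return_type):
    """Hypothetical helper: build one result dict for a plain-string prompt."""
    if return_type is ReturnType.TENSORS:
        # No decoding: hand back the raw token IDs.
        return {"generated_token_ids": token_ids}
    # Drop the decoded prompt prefix to isolate the newly generated text.
    new_text = decoded_sequence[len(decoded_prompt):]
    if return_type is ReturnType.FULL_TEXT:
        # Re-attach the original prompt string in front of the new text.
        return {"generated_text": prompt_text + new_text}
    return {"generated_text": new_text}


record = build_record(
    [1, 2, 3], "Hello world and more", "Hello world", "Hello world", ReturnType.NEW_TEXT
)
print(record)  # {'generated_text': ' and more'}
```

Passing the prompt both as decoded text and as the original string reflects the description above: the decoded form is only used to find where the new text starts, while the original string is what gets prepended.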

For chat-formatted inputs, the method reconstructs the conversation structure:

  • If continue_final_message is true, the generated text is appended to the last assistant message's content.
  • Otherwise, the generated text is packaged as a new assistant message and appended to the message list.
  • When the tokenizer defines a response_schema, the generated text is parsed into structured fields (e.g., separating tool call JSON from plain content).
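
The first two chat rules above can be sketched in plain Python. merge_chat is a hypothetical helper, not part of the library, and it assumes that auto-detection (continue_final_message left as None) means checking whether the last message has the assistant role. Parsing with response_schema is left out of this sketch.

```python
def merge_chat(messages, generated_text, continue_final_message=None):
    """Hypothetical helper: fold generated text back into a list of message dicts."""
    if continue_final_message is None:
        # Assumption: a trailing assistant message is treated as a prefill.
        continue_final_message = messages[-1]["role"] == "assistant"
    if continue_final_message:
        last = messages[-1]
        # Copy the last message so the caller's input is not mutated.
        extended = dict(last, content=last["content"] + generated_text)
        return list(messages[:-1]) + [extended]
    # Otherwise the generated text becomes a new assistant turn.
    return list(messages) + [{"role": "assistant", "content": generated_text}]


chat = [{"role": "user", "content": "What is 2+2?"}]
print(merge_chat(chat, "2+2 equals 4."))

prefilled = chat + [{"role": "assistant", "content": "The answer is"}]
print(merge_chat(prefilled, " 4."))
```

In the first call a new assistant message is appended; in the second, the generated text is concatenated onto the prefilled assistant message, so the conversation keeps two messages.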

The method also distributes any auxiliary model outputs (attention scores, hidden states) across the per-sequence result dictionaries.
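
As a rough illustration of that distribution step, the sketch below copies per-sequence values into each record. distribute_outputs is a hypothetical helper that works on plain lists rather than tensors, and the rule of skipping values whose length does not match the number of sequences is an assumption of this sketch.

```python
def distribute_outputs(records, additional_outputs):
    """Hypothetical helper: copy per-sequence auxiliary values into each record."""
    num_sequences = len(records)
    # Work on shallow copies so the input records stay untouched.
    results = [dict(record) for record in records]
    for key, values in additional_outputs.items():
        # Only values with one entry per generated sequence can be split;
        # anything else is skipped in this sketch.
        if len(values) != num_sequences:
            continue
        for result, value in zip(results, values):
            result[key] = value
    return results


records = [{"generated_text": "a"}, {"generated_text": "b"}]
extras = {"sequence_scores": [0.1, 0.9], "unrelated": [1, 2, 3]}
print(distribute_outputs(records, extras))
```

Here sequence_scores has one entry per record and is split across them, while unrelated has a mismatched length and is ignored. Both key names are illustrative only.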

Usage

This method is called automatically by the pipeline's __call__ method as the final step. Use it directly when:

  • You have raw model outputs from _forward() and want to control the decoding parameters.
  • You want to switch between return types (full text, new text, tensors) without re-running inference.
  • You need to customize special-token handling or tokenization cleanup behavior.

Code Reference

Source Location

  • Repository: transformers
  • File: src/transformers/pipelines/text_generation.py (lines 426-500)

Signature

def postprocess(
    self,
    model_outputs,
    return_type=ReturnType.FULL_TEXT,
    clean_up_tokenization_spaces=True,
    continue_final_message=None,
    skip_special_tokens=None,
):

Import

from transformers import pipeline
from transformers.pipelines.text_generation import ReturnType

# postprocess is a method on TextGenerationPipeline instances:
generator = pipeline("text-generation", model="gpt2")
# Access: generator.postprocess(model_outputs, ...)

I/O Contract

Inputs

Name | Type | Required | Description
model_outputs | dict | Yes | Dictionary produced by _forward() containing: generated_sequence (torch.Tensor), input_ids (torch.Tensor or None), prompt_text (str or Chat), and optionally additional_outputs (dict).
return_type | ReturnType | No | Output format: ReturnType.FULL_TEXT (default), ReturnType.NEW_TEXT, or ReturnType.TENSORS.
clean_up_tokenization_spaces | bool | No | Whether to remove extra spaces introduced by subword tokenization. Defaults to True.
continue_final_message | bool or None | No | For chat inputs: whether the last assistant message was a prefill that should be continued. Auto-detected from the chat if None.
skip_special_tokens | bool or None | No | Whether to remove special tokens (BOS, EOS, PAD) during decoding. Defaults to True if not specified.

Outputs

Name | Type | Description
records | list[dict] | A list of dictionaries, one per return sequence. Each dictionary contains either generated_text (str or list[dict] for chat) or generated_token_ids (list[int]). May also contain keys from auxiliary model outputs.
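
Because generated_text can be either a string or a list of message dicts, calling code usually has to branch on its type. The sketch below shows one way to do that. extract_reply is a hypothetical helper, and treating the last message as the assistant's reply is an assumption that holds for the chat behavior described on this page.

```python
def extract_reply(record):
    """Hypothetical helper: pull the generated payload out of one result record."""
    if "generated_token_ids" in record:
        # Tensor-style output: return the token IDs unchanged.
        return record["generated_token_ids"]
    generated = record["generated_text"]
    if isinstance(generated, str):
        # Plain-text output.
        return generated
    # Chat output: the last message holds the assistant's (possibly continued) reply.
    return generated[-1]["content"]


print(extract_reply({"generated_text": "Hello world and more"}))
print(extract_reply({"generated_text": [
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "2+2 equals 4."},
]}))
```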

Usage Examples

Basic Usage

from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Run the full pipeline
result = generator("The capital of France is", max_new_tokens=20)
print(result)
# Output has this shape (the generated text itself will vary):
# [{'generated_text': 'The capital of France is Paris, which is...'}]

Manual Postprocessing with Different Return Types

from transformers import pipeline
from transformers.pipelines.text_generation import ReturnType

generator = pipeline("text-generation", model="gpt2")

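# Preprocess, then run the model. forward() wraps _forward() and also handles
# device placement and no-grad inference, so its outputs land back on the CPU.
inputs = generator.preprocess("Hello world")
model_outputs = generator.forward(inputs, max_new_tokens=10)

# The same model_outputs can be postprocessed repeatedly without re-running
# inference. The generated text in the comments below is illustrative.

# Postprocess as full text (prompt + continuation)
full = generator.postprocess(model_outputs, return_type=ReturnType.FULL_TEXT)
print(full)
# [{'generated_text': 'Hello world and welcome to...'}]

# Postprocess as new text only (continuation without the prompt)
new = generator.postprocess(model_outputs, return_type=ReturnType.NEW_TEXT)
print(new)
# [{'generated_text': ' and welcome to...'}]

# Postprocess as raw token IDs (no decoding)
tensors = generator.postprocess(model_outputs, return_type=ReturnType.TENSORS)
print(tensors)
# [{'generated_token_ids': [15496, 995, 290, ...]}]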

Chat Output Formatting

from transformers import pipeline

generator = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")

messages = [
    {"role": "user", "content": "What is 2+2?"},
]
result = generator(messages, max_new_tokens=50)
# result[0]["generated_text"] is a list of message dicts:
# [
#   {"role": "user", "content": "What is 2+2?"},
#   {"role": "assistant", "content": "2+2 equals 4."},
# ]

Related Pages

Implements Principle
