Implementation:Pytorch Serve Scriptable Tokenizer Handler

Overview

Scriptable_Tokenizer_Handler implements text classification inference using a TorchScript-compatible tokenizer. The CustomTextClassifier class extends BaseHandler and ABC: it removes HTML tags and lowercases text in preprocessing, applies softmax-based classification in inference, and maps class indices to labels in postprocessing. It loads a scripted model that bundles the tokenizer, so the deployment is fully self-contained and needs no separate tokenizer files.

Field	Value
Page Type	Implementation
Implementation Type	API Doc
Domains	Text_Classification, NLP
Knowledge Sources	Pytorch_Serve
Workflow	Text_Classification_Pipeline
Last Updated	2026-02-13 18:52 GMT

Description

This handler provides a complete text classification pipeline with a scriptable tokenizer. Unlike standard handlers that require separate tokenizer files, this approach bundles the tokenizer into the TorchScript model itself, simplifying deployment. The handler performs text cleaning (HTML removal, lowercasing), tokenization via the scripted model's embedded tokenizer, softmax classification, and maps numeric predictions to human-readable labels.

Key Responsibilities

Text Cleaning: Removes HTML tags and converts text to lowercase before tokenization
Scriptable Model Loading: Loads a TorchScript model that includes an embedded tokenizer
Softmax Classification: Applies softmax to model outputs for probability-based classification
Label Mapping: Maps predicted class indices to human-readable labels using map_class_to_label
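The text-cleaning responsibility can be sketched in a few lines of Python. This is a minimal illustration, not the handler's exact code: the regex pattern `<[^>]+>` and the helper name `clean_text` are assumptions, and the handler's own `remove_html_tags()` may use a different pattern.

```python
import re

# Assumed pattern: matches '<', then any run of non-'>' characters, then '>'.
# The handler's actual regex may differ.
_TAG_RE = re.compile(r"<[^>]+>")


def remove_html_tags(text: str) -> str:
    """Strip HTML tags by replacing each tag with an empty string."""
    return _TAG_RE.sub("", text)


def clean_text(text: str) -> str:
    """Hypothetical helper: HTML removal followed by lowercasing."""
    return remove_html_tags(text).lower()


print(clean_text("<p>This is a <b>test</b> review.</p>"))  # this is a test review.
```

Replacing tags with an empty string can join words that were separated only by a tag (`one<br>two` becomes `onetwo`); substituting a space instead is a common alternative if that matters for the tokenizer.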

Code Reference

Source Location

File	Lines	Description
`examples/text_classification_with_scriptable_tokenizer/handler.py`	L1-113	Full handler module
`examples/text_classification_with_scriptable_tokenizer/handler.py`	L19-24	`remove_html_tags()` utility function
`examples/text_classification_with_scriptable_tokenizer/handler.py`	L27-113	`CustomTextClassifier` class

Signature

def remove_html_tags(text):
    """
    Remove HTML tags from a string using regex.

    Args:
        text (str): Input text potentially containing HTML tags.

    Returns:
        str: Cleaned text with all HTML tags removed.
    """
    ...

class CustomTextClassifier(BaseHandler, ABC):
    """
    Text classification handler with scriptable tokenizer support.

    Extends BaseHandler to provide custom preprocessing with HTML
    tag removal, softmax-based inference, and label mapping.
    """

    def preprocess(self, data):
        """
        Clean and prepare text input for classification.

        Removes HTML tags, converts to lowercase, and prepares
        text for the scriptable tokenizer.

        Args:
            data (list): List of request input dictionaries.

        Returns:
            list: Cleaned text strings ready for tokenization.
        """
        ...

    def inference(self, data, *args, **kwargs):
        """
        Run classification with softmax output.

        Tokenizes input using the scripted model's embedded tokenizer,
        runs forward pass, and applies softmax for probability scores.

        Args:
            data (list): Preprocessed text inputs.

        Returns:
            torch.Tensor: Softmax probability scores per class.
        """
        ...

    def postprocess(self, data):
        """
        Map prediction scores to human-readable labels.

        Uses map_class_to_label to convert numeric predictions
        to labeled output.

        Args:
            data (torch.Tensor): Softmax output from inference.

        Returns:
            list: List of label-score mappings.
        """
        ...

    def _load_torchscript_model(self, model_pt_path):
        """
        Load a TorchScript model with embedded tokenizer.

        Overrides BaseHandler's model loading to use a scripted
        model that bundles the tokenizer.

        Args:
            model_pt_path (str): Path to the .pt TorchScript file.

        Returns:
            torch.jit.ScriptModule: Loaded scripted model with tokenizer.
        """
        ...
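The numeric core of classification is a softmax over the model's logits followed by choosing the highest-scoring class. The sketch below reproduces that arithmetic in plain Python so it runs without PyTorch; in the handler this is done on tensors with `torch.nn.functional.softmax`. The `top_prediction` helper and the label names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits: List[float], index_to_name: Dict[str, str]) -> Tuple[str, float]:
    """Hypothetical helper: return (label, probability) for the argmax class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    # Label maps loaded from JSON have string keys, so look up str(best);
    # fall back to the bare index when no name is available.
    return index_to_name.get(str(best), str(best)), probs[best]


labels = {"0": "Negative", "1": "Positive"}  # hypothetical label map
print(top_prediction([-1.2, 2.3], labels))
```

Subtracting the maximum logit leaves the result unchanged mathematically but prevents `math.exp` from overflowing on large logits.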

Import

from handler import CustomTextClassifier

# External dependencies:
import re
import torch
import torch.nn.functional as F
from abc import ABC
from ts.torch_handler.base_handler import BaseHandler
from ts.utils.util import map_class_to_label

I/O Contract

Method	Input	Output	Notes
`remove_html_tags(text)`	`text`: str with potential HTML	`str`: cleaned text	Lines 19-24; uses regex substitution
`preprocess(data)`	`data`: list of request dicts	`list`: cleaned text strings	Removes HTML tags, lowercases text
`inference(data)`	`data`: list of text strings	`torch.Tensor`: softmax probabilities	Uses scripted model with embedded tokenizer
`postprocess(data)`	`data`: `torch.Tensor` softmax scores	`list`: label-score mappings	Uses `map_class_to_label` for mapping
`_load_torchscript_model(model_pt_path)`	`model_pt_path`: str path to .pt file	`torch.jit.ScriptModule`	Loads model with bundled tokenizer
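The `postprocess` row above returns a list of label-score mappings. The following stdlib-only sketch shows that output shape, one `{label: probability}` dict per input row. `map_probs_to_labels` is a hypothetical stand-in written for illustration, not TorchServe's `map_class_to_label`, whose exact signature and edge-case handling may differ.

```python
from typing import Dict, List, Optional


def map_probs_to_labels(
    probs: List[List[float]], mapping: Optional[Dict[str, str]] = None
) -> List[Dict[str, float]]:
    """Hypothetical stand-in: one {label: probability} dict per input row."""
    results = []
    for row in probs:
        results.append(
            {
                # JSON label maps use string keys, so index them with str(i);
                # without a mapping, the stringified index is the label.
                (mapping[str(i)] if mapping is not None else str(i)): p
                for i, p in enumerate(row)
            }
        )
    return results


softmax_rows = [[0.03, 0.97]]  # example probabilities for a single request
print(map_probs_to_labels(softmax_rows, {"0": "Negative", "1": "Positive"}))
# [{'Negative': 0.03, 'Positive': 0.97}]
```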

Usage Examples

Example 1: Handler Registration in Model Archive

# Create model archive with the scriptable tokenizer handler
torch-model-archiver --model-name text_classifier \
    --version 1.0 \
    --serialized-file model.pt \
    --handler handler.py \
    --extra-files "index_to_name.json"
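The `--extra-files` argument ships `index_to_name.json`, which the label-mapping step uses to turn class indices into names. Its contents depend on the trained model; the sketch below generates one from an ordered list of class names, assuming the common TorchServe layout of string indices mapped to names (`{"0": ..., "1": ...}`). The class names are placeholders.

```python
import json
from typing import Dict, List

# Hypothetical class names: list them in the same order as the model's
# output logits, so position i becomes key "i".
CLASS_NAMES = ["Negative", "Positive"]


def build_index_to_name(class_names: List[str]) -> str:
    """Return index_to_name.json content; JSON object keys are always strings."""
    mapping: Dict[str, str] = {str(i): name for i, name in enumerate(class_names)}
    return json.dumps(mapping, indent=2)


print(build_index_to_name(CLASS_NAMES))
```

Redirect the printed JSON into `index_to_name.json` before running `torch-model-archiver`.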

Example 2: Preprocessing Pipeline

# The preprocess method cleans raw HTML text:
# Input: [{"data": "<p>This is a <b>test</b> review.</p>"}]
# After HTML removal and lowercasing: "this is a test review."
# The cleaned text is then passed to the scriptable tokenizer
# embedded in the TorchScript model during inference.
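The preprocessing walk-through can be run on the example request with the stdlib-only sketch below. It assumes each request dict holds its text under a `"data"` or `"body"` key, possibly as UTF-8 bytes (the usual TorchServe request convention), and uses an assumed tag regex; `preprocess_requests` is a hypothetical name, not the handler's `preprocess()`.

```python
import re
from typing import Any, Dict, List

_TAG_RE = re.compile(r"<[^>]+>")  # assumed tag pattern


def preprocess_requests(requests: List[Dict[str, Any]]) -> List[str]:
    """Sketch of the preprocessing contract: request dicts in, cleaned strings out.

    Assumes each request dict carries its text under "data" or "body",
    either as str or as UTF-8 encoded bytes.
    """
    cleaned = []
    for row in requests:
        text = row.get("data") or row.get("body")
        # Raw request payloads may arrive as bytes; decode them first.
        if isinstance(text, (bytes, bytearray)):
            text = text.decode("utf-8")
        # Remove HTML tags, then lowercase.
        cleaned.append(_TAG_RE.sub("", text).lower())
    return cleaned


print(preprocess_requests([{"data": "<p>This is a <b>test</b> review.</p>"}]))
# ['this is a test review.']
```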

Example 3: Inference Request

# Send a text classification request
curl -X POST http://localhost:8080/predictions/text_classifier \
    -H "Content-Type: application/json" \
    -d '{"data": "This movie was absolutely wonderful and entertaining."}'
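For Python clients, the same request as the `curl` call above can be built with the standard library. The sketch only constructs the request, so it runs without a server; the URL assumes TorchServe's default inference port 8080 and the model name registered in Example 1, and `build_request` is a hypothetical helper.

```python
import json
import urllib.request

# Assumes the default inference address and the model name from Example 1;
# adjust both for your deployment.
URL = "http://localhost:8080/predictions/text_classifier"


def build_request(text: str) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    body = json.dumps({"data": text}).encode("utf-8")
    return urllib.request.Request(
        URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("This movie was absolutely wonderful and entertaining.")
# To send it against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```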

Related Pages

Principle:Pytorch_Serve_Text_Classification - The text classification principle this handler implements
Implementation:Pytorch_Serve_Spm_Dataset - Alternative text classification approach using SentencePiece tokenization
Implementation:Pytorch_Serve_BaseHandler - Base handler class that CustomTextClassifier extends
Implementation:Pytorch_Serve_Generate_Model_Archive - Packages the handler into a .mar archive
