
Implementation:Pytorch Serve Scriptable Tokenizer Handler

From Leeroopedia

Overview

Scriptable_Tokenizer_Handler implements text classification inference using a TorchScript-compatible tokenizer. The CustomTextClassifier class extends BaseHandler and ABC. It removes HTML tags and lowercases text in preprocessing, applies softmax-based classification in inference, and maps class indices to labels in postprocessing. Because it loads a scripted model that bundles the tokenizer, the deployment is fully self-contained.

Page Type: Implementation
Implementation Type: API Doc
Domains: Text_Classification, NLP
Knowledge Sources: Pytorch_Serve
Workflow: Text_Classification_Pipeline
Last Updated: 2026-02-13 18:52 GMT

Description

This handler provides a complete text classification pipeline with a scriptable tokenizer. Unlike standard handlers that require separate tokenizer files, this approach bundles the tokenizer into the TorchScript model itself, simplifying deployment. The handler performs text cleaning (HTML removal, lowercasing), tokenization via the scripted model's embedded tokenizer, softmax classification, and maps numeric predictions to human-readable labels.

Key Responsibilities

  • Text Cleaning: Removes HTML tags and converts text to lowercase before tokenization
  • Scriptable Model Loading: Loads a TorchScript model that includes an embedded tokenizer
  • Softmax Classification: Applies softmax to model outputs for probability-based classification
  • Label Mapping: Maps predicted class indices to human-readable labels using map_class_to_label
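The central idea can be illustrated in plain Python: the model accepts raw strings because it carries its own tokenizer, and the handler then applies softmax and label mapping. This is a sketch under stated assumptions, not the example's code. `ToyBundledModel` is a hypothetical stand-in for the TorchScript model, and its vocabulary, weights, and labels are invented. Text cleaning is assumed to have already happened.

```python
import math
from typing import Dict, List


class ToyBundledModel:
    """Hypothetical stand-in for a TorchScript model with an embedded tokenizer.

    Callers pass raw strings; tokenization happens inside the model, so no
    separate vocabulary or tokenizer file has to ship alongside it.
    """

    def __init__(self, vocab: Dict[str, int], weights: List[List[float]]):
        self.vocab = vocab      # token -> id, owned by the model itself
        self.weights = weights  # weights[token_id][class_id], invented values

    def _tokenize(self, text: str) -> List[int]:
        # Whitespace tokenizer; tokens missing from the vocabulary are dropped.
        return [self.vocab[tok] for tok in text.split() if tok in self.vocab]

    def __call__(self, texts: List[str]) -> List[List[float]]:
        num_classes = len(self.weights[0])
        logits = []
        for text in texts:
            row = [0.0] * num_classes
            for token_id in self._tokenize(text):
                for c in range(num_classes):
                    row[c] += self.weights[token_id][c]
            logits.append(row)
        return logits


def softmax(row: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


labels = {0: "negative", 1: "positive"}  # hypothetical index-to-label mapping
model = ToyBundledModel(
    vocab={"wonderful": 0, "boring": 1},
    weights=[[-1.0, 2.0], [1.5, -1.0]],
)

texts = ["this movie was wonderful", "a boring film"]  # already cleaned
for text, logits in zip(texts, model(texts)):
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    print(text, "->", labels[best], round(probs[best], 3))
```

Because the caller only ever passes strings, swapping in a retrained model with a different vocabulary requires no change on the handler side. This is the property the scriptable tokenizer provides.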

Code Reference

Source Location

File: examples/text_classification_with_scriptable_tokenizer/handler.py

  • Lines 1-113: full handler module
  • Lines 19-24: remove_html_tags() utility function
  • Lines 27-113: CustomTextClassifier class

Signature

def remove_html_tags(text):
    """
    Remove HTML tags from a string using regex.

    Args:
        text (str): Input text potentially containing HTML tags.

    Returns:
        str: Cleaned text with all HTML tags removed.
    """
    ...

class CustomTextClassifier(BaseHandler, ABC):
    """
    Text classification handler with scriptable tokenizer support.

    Extends BaseHandler to provide custom preprocessing with HTML
    tag removal, softmax-based inference, and label mapping.
    """

    def preprocess(self, data):
        """
        Clean and prepare text input for classification.

        Removes HTML tags, converts to lowercase, and prepares
        text for the scriptable tokenizer.

        Args:
            data (list): List of request input dictionaries.

        Returns:
            list: Cleaned text strings ready for tokenization.
        """
        ...

    def inference(self, data, *args, **kwargs):
        """
        Run classification with softmax output.

        Tokenizes input using the scripted model's embedded tokenizer,
        runs forward pass, and applies softmax for probability scores.

        Args:
            data (list): Preprocessed text inputs.

        Returns:
            torch.Tensor: Softmax probability scores per class.
        """
        ...

    def postprocess(self, data):
        """
        Map prediction scores to human-readable labels.

        Uses map_class_to_label to convert numeric predictions
        to labeled output.

        Args:
            data (torch.Tensor): Softmax output from inference.

        Returns:
            list: List of label-score mappings.
        """
        ...

    def _load_torchscript_model(self, model_pt_path):
        """
        Load a TorchScript model with embedded tokenizer.

        Overrides BaseHandler's model loading to use a scripted
        model that bundles the tokenizer.

        Args:
            model_pt_path (str): Path to the .pt TorchScript file.

        Returns:
            torch.jit.ScriptModule: Loaded scripted model with tokenizer.
        """
        ...
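The signature above documents only the contract of remove_html_tags(). Below is a minimal regex-based sketch. It assumes a `<[^>]+>` pattern that matches any tag; the pattern actually used in handler.py may differ.

```python
import re

# Matches any tag such as <p>, </b>, <br/> or <a href="...">.
# This pattern is an assumption, not necessarily the one in handler.py.
_TAG_RE = re.compile(r"<[^>]+>")


def remove_html_tags(text: str) -> str:
    """Return text with every HTML tag stripped out."""
    return _TAG_RE.sub("", text)


print(remove_html_tags("<p>This is a <b>test</b> review.</p>"))
```

Replacing tags with an empty string can join words that are separated only by a tag: `one<br>two` becomes `onetwo`. A common alternative is to substitute a space and then collapse repeated whitespace.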

Import

from handler import CustomTextClassifier

# Dependencies used by handler.py:
import re
import torch
import torch.nn.functional as F
from ts.torch_handler.base_handler import BaseHandler
from ts.utils.util import map_class_to_label
from abc import ABC

I/O Contract

  • remove_html_tags(text): takes text (str, may contain HTML) and returns the cleaned str. Defined at lines 19-24; uses regex substitution.
  • preprocess(data): takes a list of request dicts and returns a list of cleaned text strings. Removes HTML tags and lowercases text.
  • inference(data): takes a list of text strings and returns a torch.Tensor of softmax probabilities. Uses the scripted model's embedded tokenizer.
  • postprocess(data): takes a torch.Tensor of softmax scores and returns a list of label-score mappings. Uses map_class_to_label for the mapping.
  • _load_torchscript_model(model_pt_path): takes the path (str) to a .pt file and returns a torch.jit.ScriptModule. Loads the model together with its bundled tokenizer.
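The postprocess contract turns per-class scores into label-score mappings. The sketch below is not TorchServe's map_class_to_label. It is a hypothetical pure-Python helper, `map_scores_to_labels`, that produces the same kind of output: one {label: probability} dict per input. It assumes the mapping uses string keys, as an index_to_name.json file would.

```python
from typing import Dict, List, Optional


def map_scores_to_labels(
    probs: List[List[float]],
    mapping: Optional[Dict[str, str]] = None,
) -> List[Dict[str, float]]:
    """Hypothetical helper: one {label: probability} dict per input row.

    Keys of `mapping` are class indices as strings; without a mapping,
    the stringified index itself is used as the label.
    """
    results = []
    for row in probs:
        results.append(
            {(mapping[str(i)] if mapping is not None else str(i)): p
             for i, p in enumerate(row)}
        )
    return results


scores = [[0.25, 0.75], [0.9, 0.1]]  # e.g. softmax output converted to lists
print(map_scores_to_labels(scores, {"0": "negative", "1": "positive"}))
```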

Usage Examples

Example 1: Handler Registration in Model Archive

# Create model archive with the scriptable tokenizer handler
torch-model-archiver --model-name text_classifier \
    --version 1.0 \
    --serialized-file model.pt \
    --handler handler.py \
    --extra-files "index_to_name.json"
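The archive command bundles index_to_name.json as an extra file. TorchServe's BaseHandler typically looks for a file with this name to map class indices to names. The file is a JSON object whose keys are class indices written as strings. Below is a minimal sketch that generates it; the two sentiment labels are hypothetical and should be replaced with the model's actual classes.

```python
import json
from pathlib import Path

# Hypothetical labels: replace with the classes the model was trained on.
index_to_name = {"0": "negative", "1": "positive"}

path = Path("index_to_name.json")
path.write_text(json.dumps(index_to_name, indent=2))

# JSON object keys are always strings, so class index i is looked up as str(i).
loaded = json.loads(path.read_text())
print(loaded[str(1)])
```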

Example 2: Preprocessing Pipeline

# The preprocess method cleans raw HTML text:
# Input: [{"data": "<p>This is a <b>test</b> review.</p>"}]
# After HTML removal and lowercasing: "this is a test review."
# The cleaned text is then passed to the scriptable tokenizer
# embedded in the TorchScript model during inference.
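The cleaning step in Example 2 can be reproduced with a short hypothetical re-implementation. This is a sketch, not the handler's exact code. It assumes each request dict carries its text under "data" or "body", possibly as raw bytes, and it reuses the same assumed tag pattern as the remove_html_tags() sketch.

```python
import re
from typing import Dict, List, Union

_TAG_RE = re.compile(r"<[^>]+>")  # assumed tag pattern


def preprocess(data: List[Dict[str, Union[str, bytes]]]) -> List[str]:
    """Hypothetical cleaning step: strip HTML tags, then lowercase."""
    cleaned = []
    for request in data:
        text = request.get("data") or request.get("body")
        # Payloads may arrive as bytes; decode them before using regex.
        if isinstance(text, (bytes, bytearray)):
            text = text.decode("utf-8")
        text = _TAG_RE.sub("", text)  # remove HTML tags
        cleaned.append(text.lower())  # normalize case
    return cleaned


print(preprocess([{"data": "<p>This is a <b>test</b> review.</p>"}]))
```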

Example 3: Inference Request

# Send a text classification request
curl -X POST http://localhost:8080/predictions/text_classifier \
    -H "Content-Type: application/json" \
    -d '{"data": "This movie was absolutely wonderful and entertaining."}'
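Python clients can send the same request as the curl call using the standard library's urllib.request. This sketch assumes the host, port, and model name from the curl example. `build_request` and `classify` are hypothetical helper names, and the response body is assumed to be JSON.

```python
import json
import urllib.request

# Endpoint from the curl example: TorchServe inference API on port 8080,
# model registered as "text_classifier" (assumed to be running).
URL = "http://localhost:8080/predictions/text_classifier"


def build_request(text: str) -> urllib.request.Request:
    """Build the same POST request that the curl command sends."""
    body = json.dumps({"data": text}).encode("utf-8")
    return urllib.request.Request(
        URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(text: str, timeout: float = 10.0):
    """Send the request and decode the JSON response (needs a live server)."""
    with urllib.request.urlopen(build_request(text), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server running, `classify("This movie was absolutely wonderful and entertaining.")` returns the decoded prediction. Without a server, urlopen raises `urllib.error.URLError`.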
