Overview
Scriptable_Tokenizer_Handler implements text classification inference using a TorchScript-compatible tokenizer. The CustomTextClassifier class extends BaseHandler and ABC, providing HTML tag removal and lowercasing in preprocessing, softmax-based classification in inference, and label mapping in postprocessing. It loads a scripted model that bundles the tokenizer directly, so no separate tokenizer files are needed at deployment time.
Description
This handler provides a complete text classification pipeline with a scriptable tokenizer. Unlike standard handlers that require separate tokenizer files, this approach bundles the tokenizer into the TorchScript model itself, simplifying deployment. The handler performs text cleaning (HTML removal, lowercasing), tokenization via the scripted model's embedded tokenizer, softmax classification, and maps numeric predictions to human-readable labels.
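To make the bundling idea concrete, here is a minimal sketch, assuming a toy whitespace tokenizer and a hypothetical vocabulary; the real example's tokenizer and model architecture are not shown on this page. It scripts a module whose `forward` takes a raw string, saves it, and reloads it with `torch.jit.load`, so the tokenizer travels inside the single `.pt` file.

```python
import os
import tempfile
from typing import Dict, List

import torch


class WhitespaceTokenizer(torch.nn.Module):
    """Hypothetical toy tokenizer: splits on whitespace and looks up ids."""

    vocab: Dict[str, int]

    def __init__(self, vocab: Dict[str, int]):
        super().__init__()
        self.vocab = vocab

    def forward(self, text: str) -> List[int]:
        ids: List[int] = []
        for token in text.split():
            # Unknown tokens map to id 0 (assumed reserved for <unk>).
            ids.append(self.vocab.get(token, 0))
        return ids


class BundledClassifier(torch.nn.Module):
    """Hypothetical classifier whose forward() accepts raw text."""

    def __init__(self, vocab: Dict[str, int], num_classes: int, dim: int = 8):
        super().__init__()
        self.tokenizer = WhitespaceTokenizer(vocab)
        # +1 so that id 0 (<unk>) has its own embedding row.
        self.embedding = torch.nn.EmbeddingBag(len(vocab) + 1, dim, mode="mean")
        self.fc = torch.nn.Linear(dim, num_classes)

    def forward(self, text: str) -> torch.Tensor:
        ids = torch.tensor(self.tokenizer(text), dtype=torch.long)
        offsets = torch.zeros(1, dtype=torch.long)  # one "bag" per call
        return self.fc(self.embedding(ids, offsets))  # shape: (1, num_classes)


vocab = {"this": 1, "is": 2, "a": 3, "test": 4, "review.": 5}
scripted = torch.jit.script(BundledClassifier(vocab, num_classes=2))

# Save and reload: the tokenizer is serialized together with the weights.
path = os.path.join(tempfile.mkdtemp(), "model.pt")
scripted.save(path)
loaded = torch.jit.load(path, map_location="cpu")
logits = loaded("this is a test review.")
print(logits.shape)
```

Because tokenization happens inside `forward`, a handler can pass cleaned strings straight to the model. The trade-off is that the tokenizer has to be written in the subset of Python that TorchScript can compile.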
Key Responsibilities
- Text Cleaning: Removes HTML tags and converts text to lowercase before tokenization
- Scriptable Model Loading: Loads a TorchScript model that includes an embedded tokenizer
- Softmax Classification: Applies softmax to model outputs for probability-based classification
- Label Mapping: Maps predicted class indices to human-readable labels using map_class_to_label
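The last two responsibilities can be illustrated without TorchServe. The sketch below is dependency-free and hypothetical: `softmax` and `map_to_labels` stand in for `F.softmax` and `map_class_to_label`, and the `{label: probability}` output shape is an assumption about the handler's "label-score mappings", not a copy of TorchServe's implementation.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def map_to_labels(batch_logits: List[List[float]],
                  index_to_name: Dict[str, str]) -> List[Dict[str, float]]:
    """Hypothetical helper: top-1 class per row, keyed by its label."""
    results = []
    for row in batch_logits:
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)
        # index_to_name.json conventionally uses string keys ("0", "1", ...).
        results.append({index_to_name[str(best)]: probs[best]})
    return results


mapping = {"0": "Negative", "1": "Positive"}  # hypothetical labels
print(map_to_labels([[0.2, 1.7], [2.0, -1.0]], mapping))
```

Subtracting the row maximum before exponentiating does not change the result but avoids overflow for large logits.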
Code Reference
Source Location
| File | Lines | Description |
|---|---|---|
| examples/text_classification_with_scriptable_tokenizer/handler.py | L1-113 | Full handler module |
| examples/text_classification_with_scriptable_tokenizer/handler.py | L19-24 | remove_html_tags() utility function |
| examples/text_classification_with_scriptable_tokenizer/handler.py | L27-113 | CustomTextClassifier class |
Signature
```python
def remove_html_tags(text):
    """
    Remove HTML tags from a string using regex.

    Args:
        text (str): Input text potentially containing HTML tags.

    Returns:
        str: Cleaned text with all HTML tags removed.
    """
    ...


class CustomTextClassifier(BaseHandler, ABC):
    """
    Text classification handler with scriptable tokenizer support.

    Extends BaseHandler to provide custom preprocessing with HTML
    tag removal, softmax-based inference, and label mapping.
    """

    def preprocess(self, data):
        """
        Clean and prepare text input for classification.

        Removes HTML tags, converts to lowercase, and prepares
        text for the scriptable tokenizer.

        Args:
            data (list): List of request input dictionaries.

        Returns:
            list: Cleaned text strings ready for tokenization.
        """
        ...

    def inference(self, data, *args, **kwargs):
        """
        Run classification with softmax output.

        Tokenizes input using the scripted model's embedded tokenizer,
        runs the forward pass, and applies softmax for probability scores.

        Args:
            data (list): Preprocessed text inputs.

        Returns:
            torch.Tensor: Softmax probability scores per class.
        """
        ...

    def postprocess(self, data):
        """
        Map prediction scores to human-readable labels.

        Uses map_class_to_label to convert numeric predictions
        to labeled output.

        Args:
            data (torch.Tensor): Softmax output from inference.

        Returns:
            list: List of label-score mappings.
        """
        ...

    def _load_torchscript_model(self, model_pt_path):
        """
        Load a TorchScript model with embedded tokenizer.

        Overrides BaseHandler's model loading to use a scripted
        model that bundles the tokenizer.

        Args:
            model_pt_path (str): Path to the .pt TorchScript file.

        Returns:
            torch.jit.ScriptModule: Loaded scripted model with tokenizer.
        """
        ...
```
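TorchServe hands `preprocess` a list with one dictionary per request, with the payload usually under the `"data"` or `"body"` key and often as raw bytes. A minimal sketch of that extraction step, assuming UTF-8 payloads; `extract_texts` is a hypothetical helper, not part of the handler.

```python
from typing import Any, Dict, List


def extract_texts(data: List[Dict[str, Any]]) -> List[str]:
    """Hypothetical helper: pull the text payload out of each request dict."""
    texts = []
    for request in data:
        payload = request.get("data") or request.get("body")
        # Payloads commonly arrive as bytes; decode them to str (UTF-8 assumed).
        if isinstance(payload, (bytes, bytearray)):
            payload = payload.decode("utf-8")
        if not isinstance(payload, str):
            raise ValueError(f"expected text payload, got {type(payload).__name__}")
        texts.append(payload)
    return texts


batch = [{"data": b"Great film!"}, {"body": bytearray(b"Terrible plot.")}]
print(extract_texts(batch))
```

The type check is a design choice of this sketch: it fails fast with a clear message instead of letting a non-string payload reach the regex cleaning step.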
Import
```python
from handler import CustomTextClassifier

# External dependencies:
import re
from abc import ABC

import torch
import torch.nn.functional as F
from ts.torch_handler.base_handler import BaseHandler
from ts.utils.util import map_class_to_label
```
I/O Contract
| Method | Input | Output | Notes |
|---|---|---|---|
| remove_html_tags(text) | text: str, possibly containing HTML | str: cleaned text | Lines 19-24; uses regex substitution |
| preprocess(data) | data: list of request dicts | list: cleaned text strings | Removes HTML tags, lowercases text |
| inference(data) | data: list of text strings | torch.Tensor: softmax probabilities | Uses scripted model with embedded tokenizer |
| postprocess(data) | data: torch.Tensor of softmax scores | list: label-score mappings | Uses map_class_to_label for mapping |
| _load_torchscript_model(model_pt_path) | model_pt_path: str path to .pt file | torch.jit.ScriptModule | Loads model with bundled tokenizer |
Usage Examples
Example 1: Handler Registration in Model Archive
```bash
# Create model archive with the scriptable tokenizer handler
torch-model-archiver --model-name text_classifier \
  --version 1.0 \
  --serialized-file model.pt \
  --handler handler.py \
  --extra-files "index_to_name.json"
```
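The `--extra-files` argument ships `index_to_name.json`, which supplies the labels used during postprocessing. A sketch for generating it, assuming a two-class sentiment task; the label names are placeholders, and the string-keyed layout follows the usual TorchServe `index_to_name.json` convention.

```python
import json
from pathlib import Path

# Placeholder labels for a hypothetical binary sentiment model.
labels = ["Negative", "Positive"]

# Keys are class indices as strings; values are human-readable labels.
index_to_name = {str(i): name for i, name in enumerate(labels)}

Path("index_to_name.json").write_text(json.dumps(index_to_name, indent=2))
print(index_to_name)
```

Run this before `torch-model-archiver` so the file exists in the working directory; the label order must match the class indices the model was trained with.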
Example 2: Preprocessing Pipeline
```python
# The preprocess method cleans raw HTML text:
# Input: [{"data": "<p>This is a <b>test</b> review.</p>"}]
# After HTML removal and lowercasing: "this is a test review."
# The cleaned text is then passed to the scriptable tokenizer
# embedded in the TorchScript model during inference.
```
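The walkthrough above can be reproduced in plain Python. This is a minimal sketch: the `<[^>]+>` pattern is an assumption, since the page only says that `remove_html_tags` uses regex substitution, and `clean_text` is a hypothetical name for the two cleaning steps `preprocess` applies.

```python
import re

# Assumed pattern: any "<...>" run is treated as a tag and dropped.
TAG_RE = re.compile(r"<[^>]+>")


def remove_html_tags(text: str) -> str:
    """Strip HTML tags with a regex substitution (sketch)."""
    return TAG_RE.sub("", text)


def clean_text(text: str) -> str:
    """Hypothetical helper: HTML removal followed by lowercasing."""
    return remove_html_tags(text).lower()


raw = "<p>This is a <b>test</b> review.</p>"
print(clean_text(raw))
```

A regex like this is adequate for simple markup such as review text, but it does not parse HTML: a literal comparison such as `a < b > c` loses its middle, and entities like `&amp;` are left untouched.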
Example 3: Inference Request
```bash
# Send a text classification request
curl -X POST http://localhost:8080/predictions/text_classifier \
  -H "Content-Type: application/json" \
  -d '{"data": "This movie was absolutely wonderful and entertaining."}'
```
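The same request can be built from Python with only the standard library. A sketch, assuming the server address and the `text_classifier` model name used in the curl command; `build_request` and `classify` are hypothetical helpers.

```python
import json
import urllib.request

URL = "http://localhost:8080/predictions/text_classifier"


def build_request(text: str) -> urllib.request.Request:
    """Mirror the curl call: POST a JSON body with a "data" field."""
    body = json.dumps({"data": text}).encode("utf-8")
    return urllib.request.Request(
        URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(text: str, timeout: float = 10.0) -> str:
    """Send the request to a running TorchServe instance; return the raw reply."""
    with urllib.request.urlopen(build_request(text), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


req = build_request("This movie was absolutely wonderful and entertaining.")
print(req.get_method(), req.full_url)
```

`classify` is not called above, so the snippet runs without a server; with TorchServe running, `classify("...")` returns the handler's response body as a string.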
Related Pages