Implementation:Pytorch Serve NMT Translation Handler

Overview

LanguageTranslationHandler is a TorchServe handler for neural machine translation using fairseq's TransformerModel. It extends BaseHandler and provides beam search translation with configurable BPE settings loaded from setup_config.json. The handler accepts text input, translates using beam search with beam size 5, and returns JSON output pairing the original input with its translation.

Field	Value
Implementation Name	NMT_Translation_Handler
Type	Example Handler
Workflow	Neural_Machine_Translation
Domains	NLP, Machine_Translation
Knowledge Sources	Pytorch_Serve
Last Updated	2026-02-13 18:52 GMT

Description

The LanguageTranslationHandler class implements the full inference lifecycle for sequence-to-sequence neural machine translation. During initialization, it reads the BPE configuration from setup_config.json in the model directory, loads a fairseq TransformerModel with the Moses tokenizer, moves the model to the selected device (a CUDA device when one is available and a gpu_id is provided, otherwise the CPU), and switches it to evaluation mode. Translation uses beam search with a fixed beam width of 5.

Key Responsibilities

Configuration Loading: Reads setup_config.json for the BPE setting and the name of the translated output field
Model Loading: Loads fairseq TransformerModel.from_pretrained() with Moses tokenizer and configured BPE
Text Preprocessing: Extracts text from request data and decodes bytes to UTF-8
Beam Search Translation: Calls model.translate() with beam=5 under torch.no_grad()
JSON Output: Returns a list of JSON strings, one per input, each pairing the input text with its translation under the configured output field name

Usage

from model_handler_generalized import LanguageTranslationHandler

The handler requires a setup_config.json in the model directory:

{
    "bpe": "fastbpe",
    "translated_output": "french_output"
}
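
Because initialize() reads both keys from this file, it can be useful to check the file before packaging the model archive. The following is a minimal sketch, not part of the handler: `load_setup_config` and `REQUIRED_KEYS` are hypothetical names, and treating both keys as required is an assumption based on how the handler uses them.

```python
import json
import os

# Keys the handler reads from setup_config.json (assumed to be required).
REQUIRED_KEYS = ("bpe", "translated_output")


def load_setup_config(model_dir):
    """Hypothetical helper: load setup_config.json and check its keys."""
    path = os.path.join(model_dir, "setup_config.json")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Missing {path}")
    with open(path, encoding="utf-8") as config_file:
        config = json.load(config_file)
    # Fail early with a clear message instead of a KeyError during serving.
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"setup_config.json is missing keys: {missing}")
    return config
```

Running such a check ahead of deployment surfaces a missing or incomplete file with a clear message, rather than as a failure while the worker loads the model.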

Code Reference

Source Location

File	Lines	Description
`examples/nmt_transformer/model_handler_generalized.py`	L1-74	Full handler module
`examples/nmt_transformer/model_handler_generalized.py`	L10-74	`LanguageTranslationHandler` class definition
`examples/nmt_transformer/model_handler_generalized.py`	L18-49	`initialize(context)` -- config loading, fairseq model setup
`examples/nmt_transformer/model_handler_generalized.py`	L51-57	`preprocess(data)` -- text extraction and UTF-8 decoding
`examples/nmt_transformer/model_handler_generalized.py`	L59-70	`inference(data)` -- beam search translation with JSON formatting
`examples/nmt_transformer/model_handler_generalized.py`	L72-73	`postprocess(data)` -- identity passthrough

Signature

class LanguageTranslationHandler(BaseHandler):

    def __init__(self):
        self._context = None
        self.initialized = False
        self.model = None
        self.device = None

    def initialize(self, context):
        """
        Load fairseq TransformerModel with Moses tokenizer and BPE config.

        Reads setup_config.json from model_dir for BPE settings.
        Loads TransformerModel.from_pretrained() with checkpoint file
        'model.pt' and Moses tokenizer.

        Args:
            context: TorchServe context with system_properties and manifest.
        """
        ...

    def preprocess(self, data):
        """
        Extract and decode text inputs from request data.

        Args:
            data (list): List of dicts with "data" or "body" keys
                         containing bytes-encoded text.

        Returns:
            list[str]: List of decoded UTF-8 text strings.
        """
        ...

    def inference(self, data, *args, **kwargs):
        """
        Translate input texts using beam search.

        Calls model.translate() with beam=5 under torch.no_grad().
        Returns JSON strings pairing input text with translation.

        Args:
            data (list[str]): List of source language text strings.

        Returns:
            list[str]: List of JSON strings with input and translation.
        """
        ...

    def postprocess(self, data):
        """
        Return inference output unchanged.

        Args:
            data (list): JSON string list from inference.

        Returns:
            list: Same as input.
        """
        ...

Import

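# Handler imports
import json
import logging
import os

import torch
from fairseq.models.transformer import TransformerModel
from ts.torch_handler.base_handler import BaseHandler

# Module-level logger used by initialize() for the missing-config warning
logger = logging.getLogger(__name__)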

I/O Contract

Method	Input	Output	Notes
`initialize(context)`	Context with `system_properties["model_dir"]` containing `setup_config.json` and `model.pt`	None (sets `self.model`, `self.setup_config`, `self.initialized = True`)	Logs a warning if `setup_config.json` is missing; the later `self.setup_config["bpe"]` lookup then fails because `self.setup_config` was never set
`preprocess(data)`	`list[dict]` with `"data"`/`"body"` containing bytes	`list[str]` -- UTF-8 decoded text	Calls `.decode('utf-8')` on each input
`inference(data)`	`list[str]` -- source language texts	`list[str]` -- JSON strings with input/translation pairs	Uses `beam=5`; output key from `setup_config["translated_output"]`
`postprocess(data)`	`list[str]` from inference	`list[str]`	Identity passthrough
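
The preprocess() row of the contract can be illustrated outside TorchServe. This is a sketch under assumptions rather than the handler's own code: `decode_requests` is a hypothetical name, trying "data" before "body" is an assumed order, and accepting already-decoded strings is an added convenience that the contract above does not promise.

```python
def decode_requests(data):
    """Standalone sketch of the decoding step described for preprocess()."""
    texts = []
    for row in data:
        # Each request row carries its payload under "data" or "body".
        payload = row.get("data") or row.get("body")
        # Assumption: tolerate str payloads in addition to bytes.
        if isinstance(payload, (bytes, bytearray)):
            payload = payload.decode("utf-8")
        texts.append(payload)
    return texts
```

The returned list has one string per request row, in the same order, which is the shape inference() expects.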

Request/Response Format

// Request (single text input)
{
    "data": "Hello, how are you?"
}

// Response (JSON string)
{
    "input": "Hello, how are you?",
    "french_output": "Bonjour, comment allez-vous?"
}
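
On the client side, each response item is a JSON string that has to be parsed before the translation can be read. A minimal sketch, assuming the output key is "french_output" as in the sample setup_config.json; `extract_translation` is a hypothetical helper name.

```python
import json


def extract_translation(response_text, output_key="french_output"):
    """Parse one response string and return (input_text, translation)."""
    record = json.loads(response_text)
    return record["input"], record[output_key]
```

Since json.dumps escapes non-ASCII characters by default, accented characters may appear as \uXXXX sequences in the raw response; json.loads restores them.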

Usage Examples

Example 1: Initialization with BPE Configuration

# From model_handler_generalized.py L18-49: initialize() loads config and model
def initialize(self, context):
    self._context = context
    self.initialized = True
    self.manifest = context.manifest

    properties = context.system_properties
    model_dir = properties.get("model_dir")

    self.device = torch.device(
        "cuda:" + str(properties.get("gpu_id"))
        if torch.cuda.is_available() and properties.get("gpu_id") is not None
        else "cpu"
    )

    # Read BPE config from setup_config.json
    setup_config_path = os.path.join(model_dir, "setup_config.json")
    if os.path.isfile(setup_config_path):
        with open(setup_config_path) as setup_config_file:
            self.setup_config = json.load(setup_config_file)
    else:
        logger.warning('Missing the setup_config.json file.')

    # Load fairseq TransformerModel with Moses tokenizer
    self.model = TransformerModel.from_pretrained(
        model_dir,
        checkpoint_file='model.pt',
        data_name_or_path=model_dir,
        tokenizer='moses',
        bpe=self.setup_config["bpe"]
    )
    self.model.to(self.device)
    self.model.eval()
    self.initialized = True
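
The device expression in initialize() can be isolated into a pure function to make its three outcomes easier to see and test. This is an illustrative sketch; `select_device_name` is a hypothetical helper that returns the string which would be passed to torch.device().

```python
def select_device_name(cuda_available, gpu_id):
    """Mirror the handler's device choice as a plain string."""
    # A CUDA device is used only when CUDA is available AND a gpu_id was given.
    # "is not None" matters because gpu_id 0 is a valid device index.
    if cuda_available and gpu_id is not None:
        return "cuda:" + str(gpu_id)
    return "cpu"
```

In the handler this would correspond to torch.device(select_device_name(torch.cuda.is_available(), properties.get("gpu_id"))).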

Example 2: Beam Search Inference

# From model_handler_generalized.py L59-70: inference() with beam=5
def inference(self, data, *args, **kwargs):
    inference_output = []
    with torch.no_grad():
        translation = self.model.translate(data, beam=5)
    for i in range(0, len(data)):
        output = {
            "input": data[i],
            self.setup_config["translated_output"]: translation[i]
        }
        inference_output.append(json.dumps(output))
    return inference_output
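
The pairing logic in inference() can be exercised without fairseq by substituting a stand-in model. In the sketch below, `StubTranslator` and `pair_translations` are hypothetical names; the stub simply upper-cases its input and is not a translator, so only the output structure is meaningful.

```python
import json


class StubTranslator:
    """Stand-in for the fairseq model; it does NOT translate anything."""

    def translate(self, sentences, beam=5):
        # Return one output per input, as the real translate() call is used above.
        return [sentence.upper() for sentence in sentences]


def pair_translations(model, texts, output_key):
    """Pair each input with its translation and serialize to JSON strings."""
    translations = model.translate(texts, beam=5)
    return [
        json.dumps({"input": source, output_key: target})
        for source, target in zip(texts, translations)
    ]
```

Using zip instead of indexing by position is a stylistic choice in this sketch; it produces the same pairs as the loop in inference() when both lists have equal length.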

Related Pages

Principle:Pytorch_Serve_Neural_Machine_Translation -- principle for serving sequence-to-sequence translation models
Implementation:Pytorch_Serve_BaseHandler -- parent class providing the handle() orchestration
