Implementation:Pytorch Serve NMT Translation Handler
Overview
LanguageTranslationHandler is a TorchServe handler for neural machine translation using fairseq's TransformerModel. It extends BaseHandler and provides beam search translation with configurable BPE settings loaded from setup_config.json. The handler accepts text input, translates using beam search with beam size 5, and returns JSON output pairing the original input with its translation.
| Field | Value |
|---|---|
| Implementation Name | NMT_Translation_Handler |
| Type | Example Handler |
| Workflow | Neural_Machine_Translation |
| Domains | NLP, Machine_Translation |
| Knowledge Sources | Pytorch_Serve |
| Last Updated | 2026-02-13 18:52 GMT |
Description
The LanguageTranslationHandler class implements the full inference lifecycle for sequence-to-sequence neural machine translation. During initialization, it reads the BPE configuration from setup_config.json in the model directory, loads a fairseq TransformerModel with the Moses tokenizer, and moves the model to the selected device: a CUDA device when CUDA is available and a gpu_id is assigned, otherwise the CPU. Translation uses beam search with a fixed beam width of 5.
Key Responsibilities
- Configuration Loading: Reads setup_config.json for BPE settings and the translated output field name
- Model Loading: Loads the fairseq model via TransformerModel.from_pretrained() with the Moses tokenizer and the configured BPE
- Text Preprocessing: Extracts text from request data and decodes bytes to UTF-8
- Beam Search Translation: Calls model.translate() with beam=5 under torch.no_grad()
- JSON Output: Returns a list of JSON strings with the input text and translated output, using the configurable output field name
Usage
from model_handler_generalized import LanguageTranslationHandler
The handler requires a setup_config.json in the model directory:
{
  "bpe": "fastbpe",
  "translated_output": "french_output"
}
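Checking the config before packaging the model archive can catch a missing key early. The following is a minimal sketch, not part of the handler: load_setup_config and REQUIRED_KEYS are hypothetical names, and the sketch assumes that the two keys shown above are the only ones the handler needs.

```python
import json
import os
import tempfile

# Keys the handler reads from setup_config.json (assumed from the example above).
REQUIRED_KEYS = ("bpe", "translated_output")


def load_setup_config(model_dir):
    """Load setup_config.json from model_dir and check the expected keys."""
    path = os.path.join(model_dir, "setup_config.json")
    with open(path, encoding="utf-8") as config_file:
        config = json.load(config_file)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"setup_config.json is missing keys: {missing}")
    return config


# Usage: write the example config to a temporary directory and load it back.
with tempfile.TemporaryDirectory() as model_dir:
    config_path = os.path.join(model_dir, "setup_config.json")
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump({"bpe": "fastbpe", "translated_output": "french_output"}, f)
    setup_config = load_setup_config(model_dir)
```

Failing at packaging time is a design choice for this sketch; the handler itself only logs a warning when the file is absent.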
Code Reference
Source Location
| File | Lines | Description |
|---|---|---|
| examples/nmt_transformer/model_handler_generalized.py | L1-74 | Full handler module (73 lines) |
| examples/nmt_transformer/model_handler_generalized.py | L10-74 | LanguageTranslationHandler class definition |
| examples/nmt_transformer/model_handler_generalized.py | L18-49 | initialize(context) -- config loading, fairseq model setup |
| examples/nmt_transformer/model_handler_generalized.py | L51-57 | preprocess(data) -- text extraction and UTF-8 decoding |
| examples/nmt_transformer/model_handler_generalized.py | L59-70 | inference(data) -- beam search translation with JSON formatting |
| examples/nmt_transformer/model_handler_generalized.py | L72-73 | postprocess(data) -- identity passthrough |
Signature
class LanguageTranslationHandler(BaseHandler):
    def __init__(self):
        self._context = None
        self.initialized = False
        self.model = None
        self.device = None

    def initialize(self, context):
        """
        Load fairseq TransformerModel with Moses tokenizer and BPE config.

        Reads setup_config.json from model_dir for BPE settings.
        Loads TransformerModel.from_pretrained() with checkpoint file
        'model.pt' and Moses tokenizer.

        Args:
            context: TorchServe context with system_properties and manifest.
        """
        ...

    def preprocess(self, data):
        """
        Extract and decode text inputs from request data.

        Args:
            data (list): List of dicts with "data" or "body" keys
                containing bytes-encoded text.

        Returns:
            list[str]: List of decoded UTF-8 text strings.
        """
        ...

    def inference(self, data, *args, **kwargs):
        """
        Translate input texts using beam search.

        Calls model.translate() with beam=5 under torch.no_grad().
        Returns JSON strings pairing input text with translation.

        Args:
            data (list[str]): List of source language text strings.

        Returns:
            list[str]: List of JSON strings with input and translation.
        """
        ...

    def postprocess(self, data):
        """
        Return inference output unchanged.

        Args:
            data (list): JSON string list from inference.

        Returns:
            list: Same as input.
        """
        ...
Import
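# Handler imports
import json
import logging
import os

import torch
from fairseq.models.transformer import TransformerModel
from ts.torch_handler.base_handler import BaseHandler

# Module-level logger used by initialize() for the missing-config warning
logger = logging.getLogger(__name__)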
I/O Contract
| Method | Input | Output | Notes |
|---|---|---|---|
| initialize(context) | Context with system_properties["model_dir"] containing setup_config.json and model.pt | None (sets self.model, self.setup_config, self.initialized = True) | Logs a warning if setup_config.json is missing; model loading then fails, because self.setup_config is never assigned |
| preprocess(data) | list[dict] with "data"/"body" containing bytes | list[str] -- UTF-8 decoded text | Calls .decode('utf-8') on each input |
| inference(data) | list[str] -- source language texts | list[str] -- JSON strings with input/translation pairs | Uses beam=5; output key from setup_config["translated_output"] |
| postprocess(data) | list[str] from inference | list[str] | Identity passthrough |
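The preprocess contract in the table can be sketched as a standalone function. This illustrates the described behavior and is not a copy of the module's code. Accepting str values in addition to bytes is an assumption added here for convenience.

```python
def preprocess(data):
    """Extract text from each request row and decode bytes to str."""
    text_inputs = []
    for row in data:
        # The payload arrives under either the "data" or the "body" key.
        text = row.get("data") or row.get("body")
        # Assumption: tolerate already-decoded str values; the contract only mentions bytes.
        if isinstance(text, (bytes, bytearray)):
            text = text.decode("utf-8")
        text_inputs.append(text)
    return text_inputs


# Usage: a batch of two requests, one per supported key.
batch = [{"data": b"Hello, how are you?"}, {"body": "Guten Tag".encode("utf-8")}]
decoded = preprocess(batch)
```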
Request/Response Format
// Request (single text input)
{
  "data": "Hello, how are you?"
}
// Response (JSON string)
{
  "input": "Hello, how are you?",
  "french_output": "Bonjour, comment allez-vous?"
}
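Because each element of the handler's output is a JSON string rather than a dict, a client has to parse it before reading the translation. The sketch below assumes the response shown above and the output key "french_output" from the example setup_config.json. The variable names are illustrative.

```python
import json

# One element of the handler's output list, as a JSON string.
response_text = (
    '{"input": "Hello, how are you?", '
    '"french_output": "Bonjour, comment allez-vous?"}'
)

# Assumption: this key matches "translated_output" in setup_config.json.
translated_key = "french_output"

result = json.loads(response_text)
source_text = result["input"]
translation = result[translated_key]

# json.dumps escapes non-ASCII characters by default, so accented output
# appears as \uXXXX in the raw string and is restored by json.loads.
escaped = json.dumps({"french_output": "Ça va"})
restored = json.loads(escaped)["french_output"]
```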
Usage Examples
Example 1: Initialization with BPE Configuration
# From model_handler_generalized.py L18-49: initialize() loads config and model
def initialize(self, context):
    self._context = context
    self.initialized = True
    self.manifest = context.manifest
    properties = context.system_properties
    model_dir = properties.get("model_dir")
    self.device = torch.device(
        "cuda:" + str(properties.get("gpu_id"))
        if torch.cuda.is_available() and properties.get("gpu_id") is not None
        else "cpu"
    )

    # Read BPE config from setup_config.json
    setup_config_path = os.path.join(model_dir, "setup_config.json")
    if os.path.isfile(setup_config_path):
        with open(setup_config_path) as setup_config_file:
            self.setup_config = json.load(setup_config_file)
    else:
        logger.warning('Missing the setup_config.json file.')

    # Load fairseq TransformerModel with Moses tokenizer
    self.model = TransformerModel.from_pretrained(
        model_dir,
        checkpoint_file='model.pt',
        data_name_or_path=model_dir,
        tokenizer='moses',
        bpe=self.setup_config["bpe"]
    )
    self.model.to(self.device)
    self.model.eval()
    self.initialized = True
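The device expression in initialize() can be isolated into a pure function to make its three outcomes easy to check. select_device_name is a hypothetical helper, not part of the handler. It mirrors the condition in the excerpt and returns the string that would be passed to torch.device().

```python
def select_device_name(cuda_available, gpu_id):
    """Return the device string chosen by the handler's device expression."""
    # "is not None" matters: gpu_id 0 is falsy but is a valid device index.
    if cuda_available and gpu_id is not None:
        return "cuda:" + str(gpu_id)
    return "cpu"


gpu_zero = select_device_name(True, 0)         # "cuda:0"
no_gpu_id = select_device_name(True, None)     # "cpu"
no_cuda = select_device_name(False, 1)         # "cpu"
```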
Example 2: Beam Search Inference
# From model_handler_generalized.py L59-70: inference() with beam=5
def inference(self, data, *args, **kwargs):
    inference_output = []
    with torch.no_grad():
        translation = self.model.translate(data, beam=5)
    for i in range(0, len(data)):
        output = {
            "input": data[i],
            self.setup_config["translated_output"]: translation[i]
        }
        inference_output.append(json.dumps(output))
    return inference_output
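The output formatting in inference() can be exercised without torch or fairseq by swapping in a stub. In the sketch below, StubTranslator and format_translations are hypothetical names. The stub ignores the beam argument and returns canned strings, so it checks only the JSON pairing logic and not translation quality.

```python
import json


class StubTranslator:
    """Hypothetical stand-in for the fairseq model; returns canned translations."""

    def __init__(self, table):
        self.table = table

    def translate(self, sentences, beam=5):
        # The real model runs beam search; the stub just looks up a dict.
        return [self.table[sentence] for sentence in sentences]


def format_translations(model, setup_config, data):
    """Pair each input with its translation and serialize each pair to JSON."""
    translations = model.translate(data, beam=5)
    key = setup_config["translated_output"]
    return [
        json.dumps({"input": source, key: target})
        for source, target in zip(data, translations)
    ]


stub = StubTranslator({"Hello": "Bonjour", "Thank you": "Merci"})
outputs = format_translations(
    stub, {"translated_output": "french_output"}, ["Hello", "Thank you"]
)
```

Each element of outputs is a separate JSON string, one per input sentence, which matches the list[str] return type in the I/O contract.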
Related Pages
- Principle:Pytorch_Serve_Neural_Machine_Translation -- principle for serving sequence-to-sequence translation models
- Implementation:Pytorch_Serve_BaseHandler -- parent class providing the handle() orchestration