Overview
DLRM_Handler implements the TorchServe handler for DLRM (Deep Learning Recommendation Model) inference. The TorchRecDLRMHandler class extends BaseHandler and ABC and covers four stages: model initialization with GPU management, JSON-to-tensor preprocessing (dense features become a float tensor, sparse features become a KeyedJaggedTensor), the model forward pass, and postprocessing of prediction scores.
Description
The TorchRecDLRMHandler is a custom TorchServe handler designed specifically for serving torchrec-based DLRM models. It handles the unique data format requirements of recommendation models, where input consists of both dense (continuous) features and sparse (categorical) features. The sparse features are converted into torchrec's KeyedJaggedTensor format, which efficiently represents variable-length sparse feature vectors.
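To make the jagged layout concrete, the following pure-Python sketch unpacks a flat `values` list into per-key ID lists using `lengths`. It does not use torchrec; the helper name `unpack_jagged` is hypothetical, and the sketch assumes the key-major ordering in which `lengths` lists every sample for the first key, then every sample for the second key, and so on.

```python
from typing import Dict, List, Sequence


def unpack_jagged(
    keys: Sequence[str],
    values: Sequence[int],
    lengths: Sequence[int],
) -> Dict[str, List[List[int]]]:
    """Split a flat values list into per-key, per-sample ID lists.

    Hypothetical helper for illustration only. Assumes a key-major layout:
    lengths holds the entries for the first key across all samples,
    then the second key, and so on.
    """
    if len(keys) == 0 or len(lengths) % len(keys) != 0:
        raise ValueError("len(lengths) must be a multiple of len(keys)")
    if sum(lengths) != len(values):
        raise ValueError("sum(lengths) must equal len(values)")

    batch_size = len(lengths) // len(keys)
    result: Dict[str, List[List[int]]] = {}
    offset = 0
    for k, key in enumerate(keys):
        per_sample: List[List[int]] = []
        for b in range(batch_size):
            n = lengths[k * batch_size + b]
            # Each sample owns the next n IDs of the flat values list.
            per_sample.append(list(values[offset:offset + n]))
            offset += n
        result[key] = per_sample
    return result


example = unpack_jagged(
    keys=["feature_0", "feature_1", "feature_2"],
    values=[101, 205, 302, 407, 510],
    lengths=[2, 1, 2],
)
# example == {"feature_0": [[101, 205]], "feature_1": [[302]], "feature_2": [[407, 510]]}
```

Because only `lengths` records where one feature ends and the next begins, features with different numbers of IDs can share a single flat buffer without padding.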
Key Responsibilities
- Model Initialization: Loads the DLRM model with proper GPU configuration and device placement
- Dense Feature Processing: Converts JSON input arrays into dense float tensors
- Sparse Feature Processing: Transforms JSON sparse feature data into the KeyedJaggedTensor format required by torchrec
- Inference Execution: Runs the model forward pass on combined dense and sparse inputs
- Score Extraction: Extracts and formats prediction scores from model output
Code Reference
Source Location
| File | Lines | Description |
| examples/torchrec_dlrm/dlrm_handler.py | L1-161 | Full handler module |
| examples/torchrec_dlrm/dlrm_handler.py | L18-161 | TorchRecDLRMHandler class |
Signature
class TorchRecDLRMHandler(BaseHandler, ABC):
    """
    TorchServe handler for DLRM recommendation model inference.
    Processes mixed dense and sparse feature inputs and serves
    recommendation predictions using a torchrec-sharded model.
    """

    def initialize(self, context):
        """
        Load the DLRM model and configure GPU resources.
        Sets up CUDA device, loads the sharded DLRM model,
        and prepares for inference.

        Args:
            context: TorchServe context with system_properties
                and model_yaml_config.
        """
        ...

    def preprocess(self, data):
        """
        Convert JSON request data to dense tensors and KeyedJaggedTensor.
        Parses the request body to extract dense feature arrays and
        sparse feature indices. Dense features become float tensors,
        while sparse features are packed into a KeyedJaggedTensor
        with appropriate keys, values, and lengths.

        Args:
            data (list): List of request input dictionaries containing
                JSON with 'dense' and 'sparse' feature fields.

        Returns:
            tuple: (dense_tensor, kjt) where dense_tensor is a
                torch.Tensor of float features and kjt is a
                KeyedJaggedTensor of sparse features.
        """
        ...

    def inference(self, data, *args, **kwargs):
        """
        Execute model forward pass on dense and sparse inputs.
        Runs the DLRM model with the preprocessed dense tensor
        and KeyedJaggedTensor under inference mode.

        Args:
            data (tuple): (dense_tensor, kjt) from preprocess.

        Returns:
            torch.Tensor: Model prediction output scores.
        """
        ...

    def postprocess(self, data):
        """
        Extract prediction scores from model output.
        Converts raw model output tensor to a serializable
        list of prediction scores.

        Args:
            data (torch.Tensor): Raw prediction output.

        Returns:
            list: Prediction scores as Python list.
        """
        ...
Import
from dlrm_handler import TorchRecDLRMHandler
# External dependencies:
import torch
from torchrec.sparse.jagged_tensor import KeyedJaggedTensor
from ts.torch_handler.base_handler import BaseHandler
from abc import ABC
I/O Contract
| Method | Input | Output | Notes |
| initialize(context) | context: TorchServe Context object | None (sets self.model, self.device) | Loads DLRM model with GPU config |
| preprocess(data) | data: list of request dicts with JSON body | tuple: (dense_tensor, KeyedJaggedTensor) | Converts JSON to dense + sparse tensors |
| inference(data) | data: tuple of (dense_tensor, kjt) | torch.Tensor: prediction scores | Forward pass under inference mode |
| postprocess(data) | data: torch.Tensor prediction output | list: prediction scores | Converts tensor to serializable list |
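The contract above can be sketched as a chain in which each stage consumes the previous stage's output, in the order TorchServe's BaseHandler.handle invokes them (preprocess, inference, postprocess). The stand-in below is not the real handler: `SketchHandler` is a hypothetical class, plain Python lists replace tensors, and the "model" is a placeholder that always returns a single fixed score.

```python
import json
from typing import Any, Dict, List, Tuple


class SketchHandler:
    """Hypothetical stand-in mirroring the four-method contract without torch."""

    def initialize(self, context: Any) -> None:
        # A real handler would load the DLRM model and pick a CUDA device here.
        self.device = "cpu"
        # Placeholder model (assumption): one constant score per request.
        self.model = lambda dense, sparse: [0.5]

    def preprocess(self, data: List[Dict[str, Any]]) -> Tuple[List[float], Dict[str, Any]]:
        body = data[0].get("body")
        if isinstance(body, (bytes, bytearray)):
            body = body.decode("utf-8")
        if isinstance(body, str):
            body = json.loads(body)
        dense = [float(x) for x in body["dense"]]
        return dense, body["sparse"]

    def inference(self, data: Tuple[List[float], Dict[str, Any]]) -> List[float]:
        dense, sparse = data
        return self.model(dense, sparse)

    def postprocess(self, data: List[float]) -> List[float]:
        # Return a plain, JSON-serializable list of scores.
        return [float(score) for score in data]

    def handle(self, data: List[Dict[str, Any]], context: Any = None) -> List[float]:
        return self.postprocess(self.inference(self.preprocess(data)))


handler = SketchHandler()
handler.initialize(context=None)
request = [{
    "body": json.dumps({
        "dense": [0.1, 0.5, 0.3, 0.8, 0.2],
        "sparse": {
            "keys": ["feature_0", "feature_1", "feature_2"],
            "values": [101, 205, 302, 407, 510],
            "lengths": [2, 1, 2],
        },
    })
}]
scores = handler.handle(request)
```

The output list has one entry per request in the batch, which is the shape TorchServe expects back from postprocess.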
Request Format
{
  "dense": [0.1, 0.5, 0.3, 0.8, 0.2],
  "sparse": {
    "keys": ["feature_0", "feature_1", "feature_2"],
    "values": [101, 205, 302, 407, 510],
    "lengths": [2, 1, 2]
  }
}
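In the request above, the three `lengths` entries sum to five, which matches the number of `values`. A client or test harness could check that invariant before sending a request. The sketch below is one way to do so with the standard library only; `validate_request` is a hypothetical helper, not part of the handler, and it assumes a single-sample request with exactly one `lengths` entry per key, as in the example.

```python
import json
from typing import Any, Dict


def validate_request(body: str) -> Dict[str, Any]:
    """Parse a request body and check its sparse fields are consistent.

    Hypothetical helper. Assumes one sample per request, so lengths
    carries exactly one entry per key.
    """
    payload = json.loads(body)
    dense = payload["dense"]
    sparse = payload["sparse"]
    keys, values, lengths = sparse["keys"], sparse["values"], sparse["lengths"]

    # bool is a subclass of int in Python, so exclude it explicitly.
    if not all(isinstance(x, (int, float)) and not isinstance(x, bool) for x in dense):
        raise ValueError("dense must contain only numbers")
    if not keys or len(lengths) != len(keys):
        raise ValueError("expected one lengths entry per key")
    if any(n < 0 for n in lengths):
        raise ValueError("lengths must be non-negative")
    if sum(lengths) != len(values):
        raise ValueError("sum(lengths) must equal len(values)")
    return payload


good = json.dumps({
    "dense": [0.1, 0.5, 0.3, 0.8, 0.2],
    "sparse": {
        "keys": ["feature_0", "feature_1", "feature_2"],
        "values": [101, 205, 302, 407, 510],
        "lengths": [2, 1, 2],
    },
})
payload = validate_request(good)
```

A payload whose `lengths` claim more IDs than `values` supplies raises `ValueError` instead of reaching the model with a malformed jagged layout.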
Usage Examples
Example 1: Handler Registration
# Create model archive with DLRM handler
torch-model-archiver --model-name dlrm \
    --version 1.0 \
    --handler dlrm_handler.py \
    --extra-files "dlrm_factory.py" \
    --config-file model-config.yaml
Example 2: Preprocessing Flow
# The preprocess method converts JSON to torchrec-compatible tensors.
# Excerpt from inside preprocess(); self.device is set in initialize().
# Module-level imports the excerpt relies on:
import json
import torch
from torchrec.sparse.jagged_tensor import KeyedJaggedTensor

# 1. Parse the JSON body of the first request
body = json.loads(data[0].get("body"))
# 2. Convert dense features to a float tensor on the target device
dense_tensor = torch.tensor(body["dense"], dtype=torch.float32).to(self.device)
# 3. Pack sparse features into a KeyedJaggedTensor
kjt = KeyedJaggedTensor(
    keys=body["sparse"]["keys"],
    values=torch.tensor(body["sparse"]["values"], dtype=torch.long),
    lengths=torch.tensor(body["sparse"]["lengths"], dtype=torch.int),
)
Example 3: Inference Request
# Send a recommendation request
curl -X POST http://localhost:8080/predictions/dlrm \
    -H "Content-Type: application/json" \
    -d '{
      "dense": [0.1, 0.5, 0.3, 0.8, 0.2],
      "sparse": {
        "keys": ["feature_0", "feature_1"],
        "values": [101, 205, 302],
        "lengths": [2, 1]
      }
    }'
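The same request can be issued from Python. The sketch below builds the POST with the standard library only and leaves the actual network call commented out, since it needs a running TorchServe instance. `build_prediction_request` is a hypothetical helper; the URL and payload are taken from the curl example.

```python
import json
import urllib.request
from typing import Any, Dict


def build_prediction_request(
    payload: Dict[str, Any],
    url: str = "http://localhost:8080/predictions/dlrm",
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST for the prediction endpoint."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "dense": [0.1, 0.5, 0.3, 0.8, 0.2],
    "sparse": {
        "keys": ["feature_0", "feature_1"],
        "values": [101, 205, 302],
        "lengths": [2, 1],
    },
}
request = build_prediction_request(payload)

# To send it against a running server:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.loads(response.read()))
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without a server.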
Related Pages