Implementation:Pytorch Serve DLRM Handler

Overview

DLRM_Handler is the TorchServe handler for DLRM (Deep Learning Recommendation Model) inference. Its TorchRecDLRMHandler class inherits from BaseHandler and ABC and covers four stages: model initialization with GPU device management, preprocessing of JSON requests into a dense float tensor plus a KeyedJaggedTensor of sparse features, the model forward pass, and postprocessing of prediction scores into a serializable list.

Field	Value
Page Type	Implementation
Implementation Type	API Doc
Domains	Recommendation_Systems, Model_Serving
Knowledge Sources	Pytorch_Serve
Workflow	Recommendation_Model_Deployment
Last Updated	2026-02-13 18:52 GMT

Description

The TorchRecDLRMHandler is a custom TorchServe handler designed specifically for serving torchrec-based DLRM models. It handles the unique data format requirements of recommendation models, where input consists of both dense (continuous) features and sparse (categorical) features. The sparse features are converted into torchrec's KeyedJaggedTensor format, which efficiently represents variable-length sparse feature vectors.
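
To make the jagged layout concrete, the following sketch unpacks a keys/values/lengths triple in plain Python, without torchrec. It assumes a single-sample batch, so `lengths` holds exactly one entry per key. The helper name `unpack_jagged` is hypothetical and is not part of the handler.

```python
from typing import Dict, List


def unpack_jagged(
    keys: List[str], values: List[int], lengths: List[int]
) -> Dict[str, List[int]]:
    """Split a flat values list into one variable-length list per key.

    Assumes batch size 1, so lengths[i] is the number of ids for keys[i].
    """
    if len(keys) != len(lengths):
        raise ValueError("expected one length per key for a single-sample batch")
    if sum(lengths) != len(values):
        raise ValueError("sum(lengths) must equal len(values)")

    result: Dict[str, List[int]] = {}
    offset = 0
    for key, length in zip(keys, lengths):
        # Each key owns the next `length` ids of the flat values list.
        result[key] = values[offset:offset + length]
        offset += length
    return result


example = unpack_jagged(
    keys=["feature_0", "feature_1", "feature_2"],
    values=[101, 205, 302, 407, 510],
    lengths=[2, 1, 2],
)
print(example)
# {'feature_0': [101, 205], 'feature_1': [302], 'feature_2': [407, 510]}
```

The flat `values` list plus a `lengths` vector is what lets a single tensor hold features with different numbers of ids, with no padding.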

Key Responsibilities

Model Initialization: Loads the DLRM model with proper GPU configuration and device placement
Dense Feature Processing: Converts JSON input arrays into dense float tensors
Sparse Feature Processing: Transforms JSON sparse feature data into KeyedJaggedTensor format required by torchrec
Inference Execution: Runs the model forward pass on combined dense and sparse inputs
Score Extraction: Extracts and formats prediction scores from model output
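
The device-placement part of initialization usually comes down to choosing between the GPU assigned to the worker and the CPU. Below is a minimal sketch of that choice. It assumes the worker's GPU index arrives as `gpu_id` in `context.system_properties` (as it does for BaseHandler), and it takes CUDA availability as a plain flag so the snippet runs without torch. `select_device` is a hypothetical helper, not a function from the handler.

```python
from typing import Optional


def select_device(gpu_id: Optional[int], cuda_available: bool) -> str:
    """Return a device string such as "cuda:0", falling back to "cpu"."""
    # Use the GPU only when CUDA is usable AND the worker was assigned one.
    if cuda_available and gpu_id is not None:
        return f"cuda:{gpu_id}"
    return "cpu"


print(select_device(gpu_id=0, cuda_available=True))
print(select_device(gpu_id=None, cuda_available=True))
```

In a real handler the flag would typically come from `torch.cuda.is_available()` and the resulting string would be passed to `torch.device`.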

Code Reference

Source Location

File	Lines	Description
`examples/torchrec_dlrm/dlrm_handler.py`	L1-161	Full handler module
`examples/torchrec_dlrm/dlrm_handler.py`	L18-161	`TorchRecDLRMHandler` class

Signature

class TorchRecDLRMHandler(BaseHandler, ABC):
    """
    TorchServe handler for DLRM recommendation model inference.

    Processes mixed dense and sparse feature inputs and serves
    recommendation predictions using a torchrec-sharded model.
    """

    def initialize(self, context):
        """
        Load the DLRM model and configure GPU resources.

        Sets up CUDA device, loads the sharded DLRM model,
        and prepares for inference.

        Args:
            context: TorchServe context with system_properties
                     and model_yaml_config.
        """
        ...

    def preprocess(self, data):
        """
        Convert JSON request data to dense tensors and KeyedJaggedTensor.

        Parses the request body to extract dense feature arrays and
        sparse feature indices. Dense features become float tensors,
        while sparse features are packed into a KeyedJaggedTensor
        with appropriate keys, values, and lengths.

        Args:
            data (list): List of request input dictionaries containing
                         JSON with 'dense' and 'sparse' feature fields.

        Returns:
            tuple: (dense_tensor, kjt) where dense_tensor is a
                   torch.Tensor of float features and kjt is a
                   KeyedJaggedTensor of sparse features.
        """
        ...

    def inference(self, data, *args, **kwargs):
        """
        Execute model forward pass on dense and sparse inputs.

        Runs the DLRM model with the preprocessed dense tensor
        and KeyedJaggedTensor under inference mode.

        Args:
            data (tuple): (dense_tensor, kjt) from preprocess.

        Returns:
            torch.Tensor: Model prediction output scores.
        """
        ...

    def postprocess(self, data):
        """
        Extract prediction scores from model output.

        Converts raw model output tensor to a serializable
        list of prediction scores.

        Args:
            data (torch.Tensor): Raw prediction output.

        Returns:
            list: Prediction scores as Python list.
        """
        ...

Import

from dlrm_handler import TorchRecDLRMHandler

# Dependencies used by the handler module:
from abc import ABC

import torch
from torchrec.sparse.jagged_tensor import KeyedJaggedTensor
from ts.torch_handler.base_handler import BaseHandler

I/O Contract

Method	Input	Output	Notes
`initialize(context)`	`context`: TorchServe Context object	None (sets `self.model`, `self.device`)	Loads DLRM model with GPU config
`preprocess(data)`	`data`: list of request dicts with JSON body	`tuple`: (dense_tensor, KeyedJaggedTensor)	Converts JSON to dense + sparse tensors
`inference(data)`	`data`: tuple of (dense_tensor, kjt)	`torch.Tensor`: prediction scores	Forward pass under inference mode
`postprocess(data)`	`data`: `torch.Tensor` prediction output	`list`: prediction scores	Converts tensor to serializable list
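
The contract above chains directly: each method's output is the next method's input, which is the order in which BaseHandler's `handle()` entry point calls them. The sketch below shows that chain with a torch-free stand-in class. The class name, the list-based "tensors", and the placeholder model (it only averages the dense features) are all illustrative assumptions, not the handler's actual logic.

```python
class PipelineSketch:
    """Stand-in that mirrors the preprocess -> inference -> postprocess chain."""

    def preprocess(self, data):
        # Pull the dense list and the sparse dict out of the first request body.
        body = data[0]["body"]
        return body["dense"], body["sparse"]

    def inference(self, data):
        dense, _sparse = data
        # Placeholder "model": average the dense features into a single score.
        return [sum(dense) / len(dense)]

    def postprocess(self, data):
        # Return plain Python floats so the response is JSON-serializable.
        return [float(score) for score in data]

    def handle(self, data):
        # Same call order as the I/O contract table.
        return self.postprocess(self.inference(self.preprocess(data)))


request = [{"body": {"dense": [0.25, 0.75],
                     "sparse": {"keys": [], "values": [], "lengths": []}}}]
print(PipelineSketch().handle(request))  # [0.5]
```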

Request Format

{
    "dense": [0.1, 0.5, 0.3, 0.8, 0.2],
    "sparse": {
        "keys": ["feature_0", "feature_1", "feature_2"],
        "values": [101, 205, 302, 407, 510],
        "lengths": [2, 1, 2]
    }
}
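
A payload in this format is only well formed when its `sparse` fields agree with one another: `lengths` needs one entry per key per sample, and the lengths must sum to the number of `values`. The sketch below checks those invariants before any tensor is built. Whether the handler performs such checks is not stated here, so `validate_request` is a hypothetical helper.

```python
from typing import Any, Dict


def validate_request(body: Dict[str, Any]) -> None:
    """Raise ValueError if the payload is internally inconsistent."""
    dense = body.get("dense")
    sparse = body.get("sparse")
    if not isinstance(dense, list) or not dense:
        raise ValueError("'dense' must be a non-empty list of numbers")
    if not isinstance(sparse, dict):
        raise ValueError("'sparse' must be an object")

    keys, values, lengths = (sparse.get(k) for k in ("keys", "values", "lengths"))
    if not all(isinstance(x, list) for x in (keys, values, lengths)):
        raise ValueError("'keys', 'values' and 'lengths' must all be lists")
    # One length per key per sample: len(lengths) == len(keys) * batch_size.
    if not keys or len(lengths) % len(keys) != 0:
        raise ValueError("len(lengths) must be a multiple of len(keys)")
    # The flat values list must be fully accounted for by the lengths.
    if any(n < 0 for n in lengths) or sum(lengths) != len(values):
        raise ValueError("sum(lengths) must equal len(values)")


validate_request({
    "dense": [0.1, 0.5, 0.3, 0.8, 0.2],
    "sparse": {
        "keys": ["feature_0", "feature_1", "feature_2"],
        "values": [101, 205, 302, 407, 510],
        "lengths": [2, 1, 2],
    },
})  # passes silently: 2 + 1 + 2 == 5 values
```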

Usage Examples

Example 1: Packaging the Handler

# Create model archive with DLRM handler
torch-model-archiver --model-name dlrm \
    --version 1.0 \
    --handler dlrm_handler.py \
    --extra-files "dlrm_factory.py" \
    --config-file model-config.yaml

Example 2: Preprocessing Flow

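# Sketch of the preprocess flow: JSON request -> torchrec-compatible tensors.
# Excerpt from inside preprocess(self, data); module-level imports shown first.
import json

import torch
from torchrec.sparse.jagged_tensor import KeyedJaggedTensor

# 1. Read the request body. TorchServe may deliver it already parsed as a
#    dict (application/json) or as raw bytes/str, so handle both cases.
body = data[0].get("body")
if not isinstance(body, dict):
    body = json.loads(body)

# 2. Convert dense features to a float tensor on the target device
dense_tensor = torch.tensor(body["dense"], dtype=torch.float32).to(self.device)

# 3. Pack sparse features into a KeyedJaggedTensor on the same device
kjt = KeyedJaggedTensor(
    keys=body["sparse"]["keys"],
    values=torch.tensor(body["sparse"]["values"], dtype=torch.long),
    lengths=torch.tensor(body["sparse"]["lengths"], dtype=torch.int),
).to(self.device)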

Example 3: Inference Request

# Send a recommendation request
curl -X POST http://localhost:8080/predictions/dlrm \
    -H "Content-Type: application/json" \
    -d '{
        "dense": [0.1, 0.5, 0.3, 0.8, 0.2],
        "sparse": {
            "keys": ["feature_0", "feature_1"],
            "values": [101, 205, 302],
            "lengths": [2, 1]
        }
    }'
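
The same request can be issued from Python. The sketch below builds the equivalent POST with only the standard library. It reuses the endpoint from the curl example and assumes a TorchServe instance is listening there, so the actual send is left commented out. `build_request` is a hypothetical helper.

```python
import json
import urllib.request


def build_request(
    payload: dict, url: str = "http://localhost:8080/predictions/dlrm"
) -> urllib.request.Request:
    """Build a JSON POST request equivalent to the curl command."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "dense": [0.1, 0.5, 0.3, 0.8, 0.2],
    "sparse": {
        "keys": ["feature_0", "feature_1"],
        "values": [101, 205, 302],
        "lengths": [2, 1],
    },
}
request = build_request(payload)

# Sending requires a running TorchServe instance:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```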

Related Pages

Principle:Pytorch_Serve_Recommendation_Model_Serving - The recommendation model serving principle this handler implements
Implementation:Pytorch_Serve_DLRMFactory - Factory class used to construct the DLRM model
Implementation:Pytorch_Serve_BaseHandler - Base handler class that TorchRecDLRMHandler extends
Implementation:Pytorch_Serve_Service_Predict - The Service.predict() method that invokes handle()
