Implementation:Pytorch Serve DLRM Handler

Overview

DLRM_Handler implements the TorchServe handler for DLRM (Deep Learning Recommendation Model) inference. Its TorchRecDLRMHandler class subclasses BaseHandler and ABC and covers four stages: model initialization with GPU device selection, preprocessing that converts JSON dense features into a float tensor and sparse features into a KeyedJaggedTensor, the model forward pass, and postprocessing of the prediction scores.

Field | Value
Page Type | Implementation
Implementation Type | API Doc
Domains | Recommendation_Systems, Model_Serving
Knowledge Sources | Pytorch_Serve
Workflow | Recommendation_Model_Deployment
Last Updated | 2026-02-13 18:52 GMT

Description

The TorchRecDLRMHandler is a custom TorchServe handler designed specifically for serving torchrec-based DLRM models. It handles the unique data format requirements of recommendation models, where input consists of both dense (continuous) features and sparse (categorical) features. The sparse features are converted into torchrec's KeyedJaggedTensor format, which efficiently represents variable-length sparse feature vectors.
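
To make the variable-length layout concrete, here is a minimal pure-Python sketch of how a flat values list and a lengths list map back to per-key ID lists. It does not use torchrec; the function name unpack_keyed_jagged is hypothetical, and the sketch assumes the key-major ordering in which lengths holds len(keys) * batch_size entries, with all samples for the first key listed before the second key.

```python
from typing import Dict, List, Sequence


def unpack_keyed_jagged(
    keys: Sequence[str],
    values: Sequence[int],
    lengths: Sequence[int],
) -> Dict[str, List[List[int]]]:
    """Split a flat values list into per-key, per-sample ID lists.

    Hypothetical helper for illustration only; it mimics the keyed
    jagged layout and is not part of torchrec.
    """
    if not keys:
        raise ValueError("keys must not be empty")
    if len(lengths) % len(keys) != 0:
        raise ValueError("len(lengths) must be a multiple of len(keys)")
    if sum(lengths) != len(values):
        raise ValueError("sum(lengths) must equal len(values)")

    batch_size = len(lengths) // len(keys)
    result: Dict[str, List[List[int]]] = {}
    offset = 0  # read position in the flat values list
    for k, key in enumerate(keys):
        per_sample: List[List[int]] = []
        for b in range(batch_size):
            n = lengths[k * batch_size + b]  # number of IDs for (key, sample)
            per_sample.append(list(values[offset:offset + n]))
            offset += n
        result[key] = per_sample
    return result


# One sample with three sparse features of different lengths.
layout = unpack_keyed_jagged(
    keys=["feature_0", "feature_1", "feature_2"],
    values=[101, 205, 302, 407, 510],
    lengths=[2, 1, 2],
)
print(layout)
```

Each key keeps its own list of IDs per sample, so a feature with two IDs and a feature with one ID share the same flat buffer without padding.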

Key Responsibilities

  • Model Initialization: Loads the DLRM model with proper GPU configuration and device placement
  • Dense Feature Processing: Converts JSON input arrays into dense float tensors
  • Sparse Feature Processing: Transforms JSON sparse feature data into KeyedJaggedTensor format required by torchrec
  • Inference Execution: Runs the model forward pass on combined dense and sparse inputs
  • Score Extraction: Extracts and formats prediction scores from model output

Code Reference

Source Location

File | Lines | Description
examples/torchrec_dlrm/dlrm_handler.py | L1-161 | Full handler module
examples/torchrec_dlrm/dlrm_handler.py | L18-161 | TorchRecDLRMHandler class

Signature

class TorchRecDLRMHandler(BaseHandler, ABC):
    """
    TorchServe handler for DLRM recommendation model inference.

    Processes mixed dense and sparse feature inputs and serves
    recommendation predictions using a torchrec-sharded model.
    """

    def initialize(self, context):
        """
        Load the DLRM model and configure GPU resources.

        Sets up CUDA device, loads the sharded DLRM model,
        and prepares for inference.

        Args:
            context: TorchServe context with system_properties
                     and model_yaml_config.
        """
        ...

    def preprocess(self, data):
        """
        Convert JSON request data to dense tensors and KeyedJaggedTensor.

        Parses the request body to extract dense feature arrays and
        sparse feature indices. Dense features become float tensors,
        while sparse features are packed into a KeyedJaggedTensor
        with appropriate keys, values, and lengths.

        Args:
            data (list): List of request input dictionaries containing
                         JSON with 'dense' and 'sparse' feature fields.

        Returns:
            tuple: (dense_tensor, kjt) where dense_tensor is a
                   torch.Tensor of float features and kjt is a
                   KeyedJaggedTensor of sparse features.
        """
        ...

    def inference(self, data, *args, **kwargs):
        """
        Execute model forward pass on dense and sparse inputs.

        Runs the DLRM model with the preprocessed dense tensor
        and KeyedJaggedTensor under inference mode.

        Args:
            data (tuple): (dense_tensor, kjt) from preprocess.

        Returns:
            torch.Tensor: Model prediction output scores.
        """
        ...

    def postprocess(self, data):
        """
        Extract prediction scores from model output.

        Converts raw model output tensor to a serializable
        list of prediction scores.

        Args:
            data (torch.Tensor): Raw prediction output.

        Returns:
            list: Prediction scores as Python list.
        """
        ...

Import

from dlrm_handler import TorchRecDLRMHandler

# External dependencies:
import torch
from torchrec.sparse.jagged_tensor import KeyedJaggedTensor
from ts.torch_handler.base_handler import BaseHandler
from abc import ABC

I/O Contract

Method | Input | Output | Notes
initialize(context) | context: TorchServe Context object | None (sets self.model, self.device) | Loads DLRM model with GPU config
preprocess(data) | data: list of request dicts with JSON body | tuple: (dense_tensor, KeyedJaggedTensor) | Converts JSON to dense + sparse tensors
inference(data) | data: tuple of (dense_tensor, kjt) | torch.Tensor: prediction scores | Forward pass under inference mode
postprocess(data) | data: torch.Tensor prediction output | list: prediction scores | Converts tensor to serializable list
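
The contract above chains each method's output into the next one's input. The following torch-free sketch shows that flow with a hypothetical StubHandler class: plain Python lists stand in for tensors, and the "model" is a made-up function that averages the dense features and ignores the sparse ones. It illustrates only the data flow, not the real handler's logic.

```python
from typing import Any, Callable, Dict, List, Tuple

Request = Dict[str, Any]
Features = Tuple[List[List[float]], List[Dict[str, Any]]]


class StubHandler:
    """Hypothetical stand-in that mirrors the four-method contract."""

    def initialize(self, context: Dict[str, Any]) -> None:
        # Stand-in "model": one score per sample, the mean of its dense features.
        self.model: Callable[[List[List[float]]], List[float]] = (
            lambda dense: [sum(row) / len(row) for row in dense]
        )
        self.initialized = True

    def preprocess(self, data: List[Request]) -> Features:
        # One entry per request in the batch; bodies are assumed already parsed.
        dense = [row["body"]["dense"] for row in data]
        sparse = [row["body"]["sparse"] for row in data]
        return dense, sparse

    def inference(self, data: Features) -> List[float]:
        dense, _sparse = data  # the stub model ignores sparse features
        return self.model(dense)

    def postprocess(self, data: List[float]) -> List[float]:
        # Must yield one serializable entry per request in the batch.
        return list(data)

    def handle(self, data: List[Request]) -> List[float]:
        return self.postprocess(self.inference(self.preprocess(data)))


handler = StubHandler()
handler.initialize(context={})
batch = [
    {"body": {"dense": [1.0, 3.0], "sparse": {"keys": [], "values": [], "lengths": []}}},
    {"body": {"dense": [0.0, 0.5], "sparse": {"keys": [], "values": [], "lengths": []}}},
]
scores = handler.handle(batch)
print(scores)
```

The point to carry over is that postprocess returns a list with one entry per request, since TorchServe matches responses to the requests in a batch by position.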

Request Format

{
    "dense": [0.1, 0.5, 0.3, 0.8, 0.2],
    "sparse": {
        "keys": ["feature_0", "feature_1", "feature_2"],
        "values": [101, 205, 302, 407, 510],
        "lengths": [2, 1, 2]
    }
}
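
The fields in this format constrain one another: lengths should sum to the number of values, and the number of lengths should be a multiple of the number of keys. A minimal sketch of a client-side check is shown below. The function find_request_problems and its helpers are hypothetical, not part of the handler, and the checks assume the 'dense' and 'sparse' field names shown above.

```python
from typing import Any, List


def _is_int(x: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(x, int) and not isinstance(x, bool)


def _is_number(x: Any) -> bool:
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def find_request_problems(payload: Any) -> List[str]:
    """Return readable problems; an empty list means the payload looks valid."""
    if not isinstance(payload, dict):
        return ["payload must be a JSON object"]

    problems: List[str] = []

    dense = payload.get("dense")
    if not isinstance(dense, list) or not all(_is_number(x) for x in dense):
        problems.append("'dense' must be a list of numbers")

    sparse = payload.get("sparse")
    if not isinstance(sparse, dict):
        problems.append("'sparse' must be an object")
        return problems

    keys = sparse.get("keys")
    values = sparse.get("values")
    lengths = sparse.get("lengths")
    if not isinstance(keys, list) or not keys:
        problems.append("'sparse.keys' must be a non-empty list")
    if not isinstance(values, list) or not all(_is_int(x) for x in values):
        problems.append("'sparse.values' must be a list of integers")
    if not isinstance(lengths, list) or not all(
        _is_int(x) and x >= 0 for x in lengths
    ):
        problems.append("'sparse.lengths' must be a list of non-negative integers")
        return problems  # the cross-field checks below need valid lengths

    if isinstance(keys, list) and keys and len(lengths) % len(keys) != 0:
        problems.append("len(lengths) must be a multiple of len(keys)")
    if isinstance(values, list) and sum(lengths) != len(values):
        problems.append("sum(lengths) must equal len(values)")
    return problems


example = {
    "dense": [0.1, 0.5, 0.3, 0.8, 0.2],
    "sparse": {
        "keys": ["feature_0", "feature_1", "feature_2"],
        "values": [101, 205, 302, 407, 510],
        "lengths": [2, 1, 2],
    },
}
print(find_request_problems(example))
```

Running a check like this before sending a request surfaces malformed payloads early, rather than as a tensor construction error inside the handler.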

Usage Examples

Example 1: Handler Registration

# Create model archive with DLRM handler
torch-model-archiver --model-name dlrm \
    --version 1.0 \
    --handler dlrm_handler.py \
    --extra-files "dlrm_factory.py" \
    --config-file model-config.yaml

Example 2: Preprocessing Flow

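# The preprocess method converts JSON to torchrec-compatible tensors.
# This fragment runs inside the handler, so `data` and `self.device` are in scope.
import json

import torch
from torchrec.sparse.jagged_tensor import KeyedJaggedTensor

# 1. Parse the JSON body; TorchServe may already have decoded it to a dict
body = data[0].get("body")
if isinstance(body, (bytes, bytearray, str)):
    body = json.loads(body)

# 2. Convert dense features to a float tensor on the model device
dense_tensor = torch.tensor(body["dense"], dtype=torch.float32).to(self.device)

# 3. Pack sparse features into a KeyedJaggedTensor on the same device
kjt = KeyedJaggedTensor(
    keys=body["sparse"]["keys"],
    values=torch.tensor(body["sparse"]["values"], dtype=torch.long),
    lengths=torch.tensor(body["sparse"]["lengths"], dtype=torch.int),
).to(self.device)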

Example 3: Inference Request

# Send a recommendation request
curl -X POST http://localhost:8080/predictions/dlrm \
    -H "Content-Type: application/json" \
    -d '{
        "dense": [0.1, 0.5, 0.3, 0.8, 0.2],
        "sparse": {
            "keys": ["feature_0", "feature_1"],
            "values": [101, 205, 302],
            "lengths": [2, 1]
        }
    }'
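
The same request can be sent from Python. Below is a minimal sketch using only the standard library. The helper names build_prediction_request and predict are hypothetical, and the URL assumes the default local inference address and the model name dlrm used in the curl command above.

```python
import json
import urllib.request
from typing import Any, Dict

DEFAULT_URL = "http://localhost:8080/predictions/dlrm"  # assumed local endpoint


def build_prediction_request(
    payload: Dict[str, Any], url: str = DEFAULT_URL
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the prediction endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(
    payload: Dict[str, Any], url: str = DEFAULT_URL, timeout: float = 10.0
) -> Any:
    """Send the request and decode the JSON response; needs a running server."""
    request = build_prediction_request(payload, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


payload = {
    "dense": [0.1, 0.5, 0.3, 0.8, 0.2],
    "sparse": {
        "keys": ["feature_0", "feature_1"],
        "values": [101, 205, 302],
        "lengths": [2, 1],
    },
}
request = build_prediction_request(payload)
print(request.get_method(), request.full_url)
# With TorchServe running and the model registered: scores = predict(payload)
```

Separating request construction from sending keeps the payload easy to inspect or log before any network call is made.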
