Implementation:Pytorch Serve DLRMFactory

Overview

DLRMFactory is a factory class that constructs a fully configured DLRM (Deep Learning Recommendation Model) instance by overriding __new__. Its __new__ method handles GPU setup, embedding configuration, quantization, sharding plan generation, and distributed allocation using torchrec. A companion function, create_default_model_config(), provides default configuration values for the model.

Field	Value
Page Type	Implementation
Implementation Type	API Doc
Domains	Recommendation_Systems, Distributed_Computing
Knowledge Sources	Pytorch_Serve
Workflow	Recommendation_Model_Deployment
Last Updated	2026-02-13 18:52 GMT

Description

The DLRMFactory class overrides __new__ so that it acts as a factory rather than a traditional class. Calling DLRMFactory(model_config) does not return a DLRMFactory instance; it returns a fully constructed and sharded DLRM model. This design encapsulates the multi-step process of creating a distributed recommendation model: configuring embeddings, applying quantization, generating a sharding plan, and allocating the model across the available GPUs.
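The idea of using __new__ as a factory can be illustrated without torchrec. The sketch below is not the module's code: `FakeModel` and `ModelFactory` are hypothetical stand-ins that only show how calling a class can hand back an object of a different type. Because the returned object is not an instance of the factory class, Python skips the factory's `__init__`.

```python
class FakeModel:
    """Hypothetical stand-in for the sharded DLRM model."""

    def __init__(self, config):
        self.config = config


class ModelFactory:
    """Hypothetical factory: calling the class returns a FakeModel."""

    def __new__(cls, config):
        # Build and return a different object instead of a ModelFactory.
        return FakeModel(config)

    def __init__(self, config):
        # Never runs: __new__ did not return a ModelFactory instance.
        raise AssertionError("unreachable")


model = ModelFactory({"embedding_dim": 64})
print(type(model).__name__)  # FakeModel
```

One consequence of this design is that `isinstance(model, ModelFactory)` is false, so callers should treat the factory purely as a constructor function.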

Key Responsibilities

GPU Setup: Configures CUDA device and determines the number of available GPUs
Embedding Configuration: Creates EmbeddingBagConfig entries for each embedding table specified in the model config
DLRM Construction: Instantiates the DLRM model with the specified architecture (dense layers, over layers, embedding configs)
Quantization: Applies quantization to the embedding bag collection using QuantEmbeddingBagCollectionSharder
Sharding Plan: Uses EmbeddingShardingPlanner to generate an optimal distribution plan across GPUs
Distributed Allocation: Allocates the sharded model using ShardedQuantEmbeddingBagCollection
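To get a feel for why the quantization and sharding steps matter, it can help to estimate how much memory the embedding tables need. The sketch below is a back-of-the-envelope calculation, not part of the factory: `TableSpec` is a hypothetical stand-in for torchrec's EmbeddingBagConfig so the snippet runs without torchrec, the config values are illustrative, and the int8 figure assumes one byte per weight while ignoring any per-row scale and offset metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSpec:
    """Hypothetical stand-in for torchrec's EmbeddingBagConfig."""
    name: str
    embedding_dim: int
    num_embeddings: int


def build_table_specs(model_config):
    """Create one TableSpec per entry in num_embeddings_per_feature."""
    return [
        TableSpec(
            name=f"table_{i}",
            embedding_dim=model_config["embedding_dim"],
            num_embeddings=num_emb,
        )
        for i, num_emb in enumerate(model_config["num_embeddings_per_feature"])
    ]


def weight_bytes(specs, bytes_per_weight):
    """Total bytes for all table weights at the given precision."""
    return sum(s.num_embeddings * s.embedding_dim * bytes_per_weight for s in specs)


specs = build_table_specs(
    {"embedding_dim": 128, "num_embeddings_per_feature": [1000, 2000, 5000]}
)
fp32_bytes = weight_bytes(specs, 4)  # unquantized float32 weights
int8_bytes = weight_bytes(specs, 1)  # assumed int8 weights, metadata ignored
print(len(specs), fp32_bytes, int8_bytes)  # 3 4096000 1024000
```

Production recommendation tables are typically far larger than this toy example, which is where splitting them across GPUs becomes necessary.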

Code Reference

Source Location

File	Lines	Description
`examples/torchrec_dlrm/dlrm_factory.py`	L1-135	Full factory module
`examples/torchrec_dlrm/dlrm_factory.py`	L26-62	`create_default_model_config()` function
`examples/torchrec_dlrm/dlrm_factory.py`	L65-135	`DLRMFactory` class with `__new__`

Signature

def create_default_model_config():
    """
    Create a default model configuration dictionary for DLRM.

    Returns:
        dict: Configuration with keys for embedding dimensions,
              dense architecture, over architecture, number of
              embeddings per feature, and other DLRM parameters.
    """
    ...

class DLRMFactory:
    """
    Factory for creating a fully configured, sharded DLRM model.

    Overrides __new__ to return a DLRM model instance
    rather than a DLRMFactory instance.
    """

    def __new__(cls, model_config):
        """
        Construct and return a sharded DLRM model.

        1. Sets up CUDA device and GPU count
        2. Creates EmbeddingBagConfig for each embedding table
        3. Instantiates DLRM with dense/over arch and embeddings
        4. Applies quantization via QuantEmbeddingBagCollectionSharder
        5. Generates sharding plan with EmbeddingShardingPlanner
        6. Allocates model across GPUs

        Args:
            model_config (dict): Model configuration dictionary
                (from create_default_model_config or custom).

        Returns:
            torch.nn.Module: A fully configured, quantized, and
                sharded DLRM model ready for inference.
        """
        ...

Import

from dlrm_factory import DLRMFactory, create_default_model_config

# External dependencies (torchrec):
from torchrec.models.dlrm import DLRM
from torchrec.modules.embedding_configs import EmbeddingBagConfig
from torchrec.distributed.planner import EmbeddingShardingPlanner
from torchrec.distributed.quant_embeddingbag import QuantEmbeddingBagCollectionSharder

I/O Contract

Function/Method	Input	Output	Notes
`create_default_model_config()`	None	`dict`: default DLRM configuration	Lines 26-62; returns embedding dims, arch config, feature counts
`DLRMFactory.__new__(cls, model_config)`	`model_config`: dict with DLRM parameters	Sharded, quantized DLRM model	Lines 65-135; performs GPU setup, embedding config, quantization, sharding
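Since the factory takes a plain dict, a cheap pre-flight check can catch configuration mistakes before any GPU work starts. The following is a minimal sketch under stated assumptions: `check_model_config` is a hypothetical helper, the required keys are simply the ones that appear in the usage examples on this page, and the final check reflects the expectation in torchrec's DLRM that the last dense layer width matches the embedding dimension. The sample values are illustrative only.

```python
REQUIRED_KEYS = (
    "embedding_dim",
    "num_embeddings_per_feature",
    "dense_in_features",
    "dense_arch_layer_sizes",
    "over_arch_layer_sizes",
)


def check_model_config(model_config):
    """Return a list of problems found in a DLRM config dict (empty if none)."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in model_config]
    if problems:
        return problems
    if any(n <= 0 for n in model_config["num_embeddings_per_feature"]):
        problems.append("num_embeddings_per_feature must be positive")
    # Assumption: the dense arch output width must match embedding_dim.
    if model_config["dense_arch_layer_sizes"][-1] != model_config["embedding_dim"]:
        problems.append("dense_arch_layer_sizes[-1] must equal embedding_dim")
    return problems


config = {
    "embedding_dim": 128,
    "num_embeddings_per_feature": [1000, 2000, 5000],
    "dense_in_features": 13,
    "dense_arch_layer_sizes": [512, 256, 64],
    "over_arch_layer_sizes": [512, 256, 1],
}
print(check_model_config(config))
# ['dense_arch_layer_sizes[-1] must equal embedding_dim']
```

Returning a list rather than raising on the first error lets a caller report every problem in one pass.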

Usage Examples

Example 1: Create DLRM with Default Config

from dlrm_factory import DLRMFactory, create_default_model_config

# Create default configuration
config = create_default_model_config()

# Factory returns a fully sharded and quantized model
model = DLRMFactory(config)
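Because the factory sets up a CUDA device as its first step, it may be worth checking for a visible GPU before calling it. The sketch below is an optional guard, not something the module itself provides: `cuda_device_count` is a hypothetical helper that returns 0 when torch is not installed or no CUDA device is visible.

```python
import importlib.util


def cuda_device_count():
    """Return the number of visible CUDA devices, or 0 if torch is unavailable."""
    if importlib.util.find_spec("torch") is None:
        return 0
    import torch

    return torch.cuda.device_count() if torch.cuda.is_available() else 0


if cuda_device_count() == 0:
    print("No CUDA device visible; DLRMFactory would likely fail here.")
```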

Example 2: Custom Model Configuration

from dlrm_factory import DLRMFactory, create_default_model_config

# Start with defaults and override specific parameters
config = create_default_model_config()
config["embedding_dim"] = 128
config["num_embeddings_per_feature"] = [1000, 2000, 5000]

# Create the model with custom config
model = DLRMFactory(config)

Example 3: Factory Pattern Internals

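import torch
from torchrec.models.dlrm import DLRM
from torchrec.modules.embedding_configs import EmbeddingBagConfig

# The __new__ method executes the following pipeline
# (model_config is the dict passed to DLRMFactory):

# Step 1: GPU setup
device = torch.device("cuda", 0)
num_gpus = torch.cuda.device_count()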

# Step 2: Embedding configuration
eb_configs = [
    EmbeddingBagConfig(
        name=f"table_{i}",
        embedding_dim=model_config["embedding_dim"],
        num_embeddings=num_emb,
    )
    for i, num_emb in enumerate(model_config["num_embeddings_per_feature"])
]

# Step 3: Model construction
dlrm_model = DLRM(
    embedding_bag_collection=...,
    dense_in_features=model_config["dense_in_features"],
    dense_arch_layer_sizes=model_config["dense_arch_layer_sizes"],
    over_arch_layer_sizes=model_config["over_arch_layer_sizes"],
)

# Step 4: Quantization and sharding
# ... quantize, plan, and distribute across GPUs

Related Pages

Principle:Pytorch_Serve_Recommendation_Model_Serving - The recommendation model serving principle this factory supports
Implementation:Pytorch_Serve_DLRM_Handler - The handler that uses DLRMFactory to load and serve the model
Implementation:Pytorch_Serve_BaseHandler - Base handler class for the inference pipeline
