

Implementation:Pytorch Serve DLRMFactory


Overview

DLRMFactory is a factory class that constructs a fully configured DLRM (Deep Learning Recommendation Model) instance using a metaclass pattern. Its __new__ method handles GPU setup, embedding configuration, quantization, sharding plan generation, and distributed allocation using torchrec. A companion function create_default_model_config() provides default configuration values for the model.

Page Type: Implementation
Implementation Type: API Doc
Domains: Recommendation_Systems, Distributed_Computing
Knowledge Sources: Pytorch_Serve
Workflow: Recommendation_Model_Deployment
Last Updated: 2026-02-13 18:52 GMT

Description

The DLRMFactory class uses a __new__-based (metaclass-style) pattern so that it acts as a factory rather than a conventional class. Calling DLRMFactory(config) does not return a DLRMFactory instance; it returns a fully constructed and sharded DLRM model. Because that object is not a DLRMFactory instance, Python never calls __init__. This design encapsulates the multi-step process of creating a distributed recommendation model: configuring embeddings, applying quantization, generating a sharding plan, and allocating the model across the available GPUs.
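The return-value behavior described above is ordinary Python: when __new__ returns an object that is not an instance of the class, __init__ is skipped and the caller receives that object directly. A minimal, dependency-free sketch, using hypothetical ToyModelFactory and ToyModel classes in place of DLRMFactory and DLRM:

```python
class ToyModel:
    """Hypothetical stand-in for the DLRM module the real factory returns."""

    def __init__(self, embedding_dim: int) -> None:
        self.embedding_dim = embedding_dim


class ToyModelFactory:
    """Hypothetical factory mirroring the DLRMFactory.__new__ idea."""

    def __new__(cls, model_config: dict) -> ToyModel:
        # All construction work happens here; no ToyModelFactory object is ever created.
        return ToyModel(embedding_dim=model_config["embedding_dim"])

    def __init__(self, model_config: dict) -> None:
        # Never runs: Python only calls __init__ when __new__ returns an instance of cls.
        raise RuntimeError("unreachable")


model = ToyModelFactory({"embedding_dim": 64})
print(type(model).__name__, model.embedding_dim)  # ToyModel 64
```

One consequence of this design is that isinstance(obj, DLRMFactory) is never true for the result, and static type checkers may not infer the returned type; a plain builder function would give callers the same behavior.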

Key Responsibilities

  • GPU Setup: Configures CUDA device and determines the number of available GPUs
  • Embedding Configuration: Creates EmbeddingBagConfig entries for each embedding table specified in the model config
  • DLRM Construction: Instantiates the DLRM model with the specified architecture (dense layers, over layers, embedding configs)
  • Quantization: Quantizes the embedding bag collection; QuantEmbeddingBagCollectionSharder is the sharder that handles the resulting quantized module
  • Sharding Plan: Uses EmbeddingShardingPlanner to generate a plan that distributes the embedding tables across the available GPUs
  • Distributed Allocation: Shards the model according to the plan, so that its quantized embedding bag collection becomes a ShardedQuantEmbeddingBagCollection
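Quantizing before sharding matters because embedding tables dominate DLRM memory. A back-of-the-envelope estimate: a table holds num_embeddings × embedding_dim weights, stored at 4 bytes each in fp32 and 1 byte each in int8. The sketch below assumes 8-bit quantization (the bit width the factory uses is not stated on this page), ignores the small per-row metadata such as scale and bias that quantized formats typically add, and uses hypothetical table sizes rather than values from create_default_model_config().

```python
def table_bytes(num_embeddings: int, embedding_dim: int, bytes_per_weight: int) -> int:
    """Approximate storage for one embedding table, ignoring per-row metadata."""
    return num_embeddings * embedding_dim * bytes_per_weight


# Hypothetical table sizes (rows per sparse feature) and embedding dimension.
num_embeddings_per_feature = [1_000_000, 500_000, 10_000]
embedding_dim = 128

# fp32 stores 4 bytes per weight; int8 stores 1 byte per weight.
fp32_total = sum(table_bytes(n, embedding_dim, 4) for n in num_embeddings_per_feature)
int8_total = sum(table_bytes(n, embedding_dim, 1) for n in num_embeddings_per_feature)

print(f"fp32: {fp32_total / 2**20:.1f} MiB")
print(f"int8: {int8_total / 2**20:.1f} MiB")
```

The roughly 4× reduction is what lets more tables fit on each GPU before the sharding planner assigns them.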

Code Reference

Source Location

  • examples/torchrec_dlrm/dlrm_factory.py, lines 1-135: full factory module
  • examples/torchrec_dlrm/dlrm_factory.py, lines 26-62: create_default_model_config() function
  • examples/torchrec_dlrm/dlrm_factory.py, lines 65-135: DLRMFactory class with __new__

Signature

def create_default_model_config():
    """
    Create a default model configuration dictionary for DLRM.

    Returns:
        dict: Configuration with keys for embedding dimensions,
              dense architecture, over architecture, number of
              embeddings per feature, and other DLRM parameters.
    """
    ...

class DLRMFactory:
    """
    Factory for creating a fully configured, sharded DLRM model.

    Uses __new__ metaclass pattern to return a DLRM model instance
    rather than a DLRMFactory instance.
    """

    def __new__(cls, model_config):
        """
        Construct and return a sharded DLRM model.

        1. Sets up the CUDA device and GPU count
        2. Creates an EmbeddingBagConfig for each embedding table
        3. Instantiates DLRM with the dense/over arch and embeddings
        4. Quantizes the embedding tables
        5. Generates a sharding plan with EmbeddingShardingPlanner,
           using QuantEmbeddingBagCollectionSharder for the quantized tables
        6. Shards the model across GPUs

        Args:
            model_config (dict): Model configuration dictionary
                (from create_default_model_config or custom).

        Returns:
            torch.nn.Module: The DLRM model with quantized embeddings, whose
                embedding bag collection is sharded across GPUs (as a
                ShardedQuantEmbeddingBagCollection), ready for inference.
        """
        ...

Import

from dlrm_factory import DLRMFactory, create_default_model_config

# External dependencies (torchrec):
from torchrec.models.dlrm import DLRM
from torchrec.modules.embedding_configs import EmbeddingBagConfig
from torchrec.distributed.planner import EmbeddingShardingPlanner
from torchrec.distributed.quant_embeddingbag import QuantEmbeddingBagCollectionSharder

I/O Contract

  • create_default_model_config() (lines 26-62)
    Input: none
    Output: dict with the default DLRM configuration (embedding dimensions, architecture layer sizes, per-feature embedding counts)
  • DLRMFactory.__new__(cls, model_config) (lines 65-135)
    Input: model_config, a dict of DLRM parameters
    Output: sharded, quantized DLRM model
    Notes: performs GPU setup, embedding configuration, quantization, and sharding

Usage Examples

Example 1: Create DLRM with Default Config

from dlrm_factory import DLRMFactory, create_default_model_config

# Create default configuration
config = create_default_model_config()

# Factory returns a fully sharded and quantized model
model = DLRMFactory(config)

Example 2: Custom Model Configuration

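from dlrm_factory import DLRMFactory, create_default_model_config

# Start with defaults and override specific parameters
config = create_default_model_config()
config["embedding_dim"] = 128
# torchrec's DLRM requires the last dense-arch layer size to equal embedding_dim
config["dense_arch_layer_sizes"] = list(config["dense_arch_layer_sizes"][:-1]) + [128]
# One row count per embedding table (sparse feature)
config["num_embeddings_per_feature"] = [1000, 2000, 5000]

# Create the model with custom config
model = DLRMFactory(config)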
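Overridden values must still satisfy torchrec's DLRM constructor. As of recent torchrec releases, it requires the last entry of dense_arch_layer_sizes to equal the embedding dimension and the over arch to have at least two layers, and it sizes the over arch input as embedding_dim + C(F, 2) + F for F sparse features. The dependency-free check below is a sketch under those assumptions; check_dlrm_config is a hypothetical helper, it uses the dict keys from this page's examples, and the values shown are illustrative, not the actual defaults.

```python
from math import comb


def check_dlrm_config(config: dict) -> int:
    """Validate a DLRM config dict and return the over arch input width.

    Hypothetical helper; encodes the DLRM constraints described in the lead-in.
    """
    embedding_dim = config["embedding_dim"]
    num_sparse = len(config["num_embeddings_per_feature"])
    if num_sparse == 0:
        raise ValueError("at least one embedding table is required")
    if config["dense_arch_layer_sizes"][-1] != embedding_dim:
        raise ValueError("last dense_arch_layer_sizes entry must equal embedding_dim")
    if len(config["over_arch_layer_sizes"]) < 2:
        raise ValueError("over_arch_layer_sizes needs at least two layers")
    # Dense output + sparse-sparse pairwise dot products + dense-sparse dot products.
    return embedding_dim + comb(num_sparse, 2) + num_sparse


# Illustrative config mirroring Example 2's overrides (values are hypothetical).
config = {
    "embedding_dim": 128,
    "num_embeddings_per_feature": [1000, 2000, 5000],
    "dense_in_features": 13,
    "dense_arch_layer_sizes": [512, 256, 128],
    "over_arch_layer_sizes": [1024, 1024, 512, 256, 1],
}
print(check_dlrm_config(config))
```

Running a check like this before calling DLRMFactory surfaces shape mismatches without allocating GPU memory.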

Example 3: Factory Pattern Internals

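# The __new__ method executes roughly the following pipeline (a sketch;
# model_config is the dict passed to DLRMFactory):
import torch
from torchrec.models.dlrm import DLRM
from torchrec.modules.embedding_configs import EmbeddingBagConfig
from torchrec.modules.embedding_modules import EmbeddingBagCollection

# Step 1: GPU setup
device = torch.device("cuda", 0)
torch.cuda.set_device(device)
num_gpus = torch.cuda.device_count()

# Step 2: Embedding configuration, one table per sparse feature
eb_configs = [
    EmbeddingBagConfig(
        name=f"table_{i}",
        embedding_dim=model_config["embedding_dim"],
        num_embeddings=num_emb,
        feature_names=[f"feature_{i}"],  # assumed feature naming, for illustration
    )
    for i, num_emb in enumerate(model_config["num_embeddings_per_feature"])
]

# Step 3: Model construction (assumed: the table configs are wrapped in an
# EmbeddingBagCollection, which DLRM takes as its sparse module)
dlrm_model = DLRM(
    embedding_bag_collection=EmbeddingBagCollection(tables=eb_configs),
    dense_in_features=model_config["dense_in_features"],
    dense_arch_layer_sizes=model_config["dense_arch_layer_sizes"],
    over_arch_layer_sizes=model_config["over_arch_layer_sizes"],
)

# Step 4: Quantization and sharding
# ... quantize, plan, and distribute across GPUs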
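Step 4 is left elided in the pipeline above. As a rough intuition for what a sharding plan is, the sketch below places whole tables onto GPUs greedily, largest first, each onto whichever GPU currently holds the fewest bytes. This is a deliberately simplified stand-in, not torchrec's algorithm: EmbeddingShardingPlanner also weighs other sharding types (such as row-wise and column-wise) and cost estimates. greedy_table_wise_plan and the table sizes are hypothetical.

```python
from typing import Dict, List


def greedy_table_wise_plan(table_bytes: Dict[str, int], num_gpus: int) -> Dict[str, int]:
    """Assign each whole table to one GPU, balancing total bytes per GPU.

    Returns a mapping from table name to GPU index. Illustrative only.
    """
    load: List[int] = [0] * num_gpus
    plan: Dict[str, int] = {}
    # Place the largest tables first so they spread out before small ones fill gaps.
    for name, size in sorted(table_bytes.items(), key=lambda kv: kv[1], reverse=True):
        gpu = min(range(num_gpus), key=lambda g: load[g])  # least-loaded GPU
        plan[name] = gpu
        load[gpu] += size
    return plan


# Hypothetical int8 table sizes in bytes (rows * embedding_dim * 1 byte).
tables = {
    "table_0": 1_000_000 * 128,
    "table_1": 500_000 * 128,
    "table_2": 400_000 * 128,
    "table_3": 10_000 * 128,
}
print(greedy_table_wise_plan(tables, num_gpus=2))
```

With these sizes and two GPUs, table_0 lands alone on GPU 0 and the three smaller tables share GPU 1, leaving the two GPUs within about 10% of each other in total bytes.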
