
Implementation:Predibase Lorax LoRA Weights Load

From Leeroopedia


Knowledge Sources
Domains Parameter_Efficient_Finetuning, Model_Serving
Last Updated 2026-02-08 02:00 GMT

Overview

A concrete tool for loading LoRA adapter weights into GPU memory, provided by the LoraConfig and LoraWeights classes.

Description

LoraConfig loads adapter configuration from HuggingFace (via peft.LoraConfig) and maps weight names to module positions. LoraWeights.load() iterates over model layers, loads the LoRA A and B matrices from safetensors, applies scaling, transposes for kernel compatibility, and pads ranks for SGMV/BGMV kernel requirements.
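The per-layer steps above (scale, transpose, pad) can be sketched as follows. This is an illustrative stand-in, not the actual LoRAX code: the function name, the choice to fold scaling into A, and the pad_multiple of 8 are assumptions for the example.

```python
import math
import torch

def preprocess_lora_pair(
    a: torch.Tensor,        # LoRA A matrix, shape (r, hidden_size) as stored by PEFT
    b: torch.Tensor,        # LoRA B matrix, shape (hidden_size, r)
    lora_alpha: int,
    r: int,
    use_rslora: bool = False,
    pad_multiple: int = 8,  # hypothetical rank granularity required by SGMV/BGMV kernels
):
    """Scale, transpose, and rank-pad one layer's LoRA pair (illustrative sketch)."""
    # Standard LoRA scales the update BA by alpha / r; rsLoRA uses alpha / sqrt(r).
    scale = lora_alpha / math.sqrt(r) if use_rslora else lora_alpha / r
    a = a * scale  # fold the scaling into A once, so kernels skip it at decode time

    # Transpose for kernel-friendly layout.
    a_t = a.T.contiguous()  # (hidden_size, r)
    b_t = b.T.contiguous()  # (r, hidden_size)

    # Round the rank up to the kernel's granularity, zero-padding so the extra
    # rows/columns contribute nothing to the output.
    padded_r = ((r + pad_multiple - 1) // pad_multiple) * pad_multiple
    if padded_r != r:
        a_t = torch.nn.functional.pad(a_t, (0, padded_r - r))        # pad rank (last dim)
        b_t = torch.nn.functional.pad(b_t, (0, 0, 0, padded_r - r))  # pad rank (first dim)
    return a_t, b_t
```

Folding the scale into A means the serving kernels compute a plain x @ A @ B without a per-token multiply; the zero padding keeps the padded rank mathematically inert.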

Usage

Called internally when a new adapter is requested; end users do not call it directly. The router's adapter-loading pipeline triggers it via gRPC.

Code Reference

Source Location

  • Repository: LoRAX
  • File: server/lorax_server/adapters/lora.py
  • Lines: 29-210

Signature

@dataclass
class LoraConfig(AdapterConfig):
    r: int
    target_modules: Optional[Union[List[str], str]]
    fan_in_fan_out: bool
    lora_alpha: int
    use_rslora: bool

    @classmethod
    def load(cls, adapter_id: str, api_token: str) -> "LoraConfig":
        """Load LoRA config from HuggingFace Hub."""

class LoraWeights(AdapterWeights):
    def __init__(
        self,
        weights_a: List[torch.Tensor],
        weights_b: List[torch.Tensor],
        adapter_config: LoraConfig,
    ):

    @classmethod
    def load(
        cls,
        config: LoraConfig,
        model: "Model",
        module_map: Dict[str, Dict],
        layer_type: str,
        unused_weight_names: Set[str],
    ) -> Optional[AdapterWeights]:
        """Load LoRA weights for all layers of a given type."""

Import

from lorax_server.adapters.lora import LoraConfig, LoraWeights

I/O Contract

Inputs

  • adapter_id (str, required): HuggingFace adapter ID or local path
  • api_token (str, optional): Auth token for private adapters
  • config (LoraConfig, required): Adapter config (rank, target_modules, alpha)
  • model (Model, required): Base model instance for layer mapping
  • module_map (Dict[str, Dict], required): Weight-name-to-tensor mapping
  • layer_type (str, required): Layer type to load (e.g., "q_proj")

Outputs

  • weights (Optional[LoraWeights]): Stacked LoRA A/B tensors, or None if the adapter has no weights for the requested layer type
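The "stacked" output can be sketched as a simple batching step: per-layer A and B matrices are concatenated along a new leading layer dimension so kernels can index them by layer. This is a hypothetical helper; the real LoraWeights class carries additional metadata.

```python
import torch

def stack_layer_weights(weights_a, weights_b):
    """Stack per-layer LoRA tensors into single batched tensors (illustrative sketch).

    weights_a: list of (hidden_size, r) tensors, one per transformer layer
    weights_b: list of (r, hidden_size) tensors, one per transformer layer
    """
    stacked_a = torch.stack(weights_a)  # (num_layers, hidden_size, r)
    stacked_b = torch.stack(weights_b)  # (num_layers, r, hidden_size)
    return stacked_a, stacked_b
```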

Usage Examples

Internal Adapter Loading

# Called internally during adapter resolution
config = LoraConfig.load("my-org/my-adapter", api_token="hf_xxx")
# config.r = 16, config.lora_alpha = 32, config.target_modules = ["q_proj", "v_proj"]

# Load weights for each layer type
weights = LoraWeights.load(
    config=config,
    model=model,
    module_map=module_map,
    layer_type="q_proj",
    unused_weight_names=unused,
)
# weights.weights_a shape: [num_layers, hidden_size, r]
# weights.weights_b shape: [num_layers, r, hidden_size]
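Because load() returns None for layer types the adapter does not target (e.g. target_modules = ["q_proj", "v_proj"] leaves k_proj and o_proj unloaded), a caller collects only the layer types that resolve. A minimal sketch with a stand-in loader; the real call is LoraWeights.load as shown above:

```python
def collect_adapter_weights(layer_types, load_fn):
    """Keep only the layer types for which the loader returned weights."""
    loaded = {}
    for layer_type in layer_types:
        weights = load_fn(layer_type)  # real code: LoraWeights.load(..., layer_type=layer_type)
        if weights is not None:        # None means the adapter does not target this module
            loaded[layer_type] = weights
    return loaded

# Stand-in loader: pretend the adapter only targets q_proj and v_proj.
fake_load = lambda lt: {"name": lt} if lt in {"q_proj", "v_proj"} else None
result = collect_adapter_weights(["q_proj", "k_proj", "v_proj", "o_proj"], fake_load)
```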

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
