Principle:Vllm project Vllm LoRA Request Creation

Knowledge Sources	vLLM, vLLM LoRA Docs
Domains	LLM Serving, Model Adaptation, Request Management
Last Updated	2026-02-08 13:00 GMT

Overview

LoRA request creation is the process of constructing a lightweight descriptor object that identifies which LoRA adapter should be applied to a specific inference request.

Description

In a multi-LoRA serving system, each inference request may target a different fine-tuned adapter. A LoRA request object encapsulates the metadata needed to identify and load the correct adapter: a human-readable name, a unique integer identifier, and the filesystem path to the adapter weights. This object acts as a handle that the engine uses to look up, load, and apply the appropriate adapter weights during the forward pass.

The request object is intentionally lightweight: it carries only the information needed to locate and identify the adapter, not the adapter weights themselves. This separation of concerns lets the engine cache and swap adapter weights independently of the request lifecycle.
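
The separation between descriptor and weights can be sketched as follows. This is a minimal illustration using only the standard library, not vLLM's implementation: `AdapterHandle`, `AdapterCache`, the adapter name, and the path `/adapters/sql` are hypothetical stand-ins, and the "weights" are a placeholder list.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AdapterHandle:
    """Hypothetical stand-in for a LoRA request: metadata only, no weights."""
    lora_name: str
    lora_int_id: int
    lora_path: str


class AdapterCache:
    """Hypothetical engine-side cache that owns the (placeholder) weights."""

    def __init__(self) -> None:
        self._weights: Dict[int, List[float]] = {}
        self.load_count = 0

    def get(self, handle: AdapterHandle) -> List[float]:
        # Load on first use only; later requests reuse the cached entry.
        if handle.lora_int_id not in self._weights:
            self.load_count += 1
            # Placeholder for reading real weights from handle.lora_path.
            self._weights[handle.lora_int_id] = [0.0] * 4
        return self._weights[handle.lora_int_id]


cache = AdapterCache()
first = AdapterHandle("sql-adapter", 1, "/adapters/sql")
second = AdapterHandle("sql-adapter", 1, "/adapters/sql")  # new request, same adapter

weights_a = cache.get(first)
weights_b = cache.get(second)
```

Two separate request objects resolve to a single cached entry, so creating and discarding requests costs nothing in terms of weight loading.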

Usage

Use LoRA request creation when:

Associating a specific LoRA adapter with an inference request
Building a multi-LoRA serving pipeline where different prompts target different adapters
Constructing request batches that mix base-model and adapter-augmented requests
Defining multiple adapter identities that reference the same underlying weights (for testing or A/B comparison)
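
The last two cases above can be sketched together: a batch that pairs each prompt with an optional adapter descriptor, including two identities that point at the same weights. `AdapterHandle`, the adapter names, and the paths are hypothetical; the sketch only models the pairing and does not call the vLLM engine.

```python
from typing import List, NamedTuple, Optional, Tuple


class AdapterHandle(NamedTuple):
    """Hypothetical stand-in for a LoRA request descriptor."""
    lora_name: str
    lora_int_id: int
    lora_path: str


sql_a = AdapterHandle("sql-adapter-a", 1, "/adapters/sql")
# Second identity over the same weights, e.g. for an A/B comparison.
sql_b = AdapterHandle("sql-adapter-b", 2, "/adapters/sql")

Batch = List[Tuple[str, Optional[AdapterHandle]]]

batch: Batch = [
    ("Translate to SQL: list all users", sql_a),
    ("Translate to SQL: list all users", sql_b),
    ("What is the capital of France?", None),  # None means base model only
]


def adapter_ids(items: Batch) -> List[int]:
    """Map each entry to its adapter ID, using 0 for the base model."""
    return [0 if handle is None else handle.lora_int_id for _, handle in items]


ids = adapter_ids(batch)
```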

Theoretical Basis

The LoRA request object embodies several important design decisions:

Identity via Name: Equality comparison and hashing are based solely on lora_name, not on the integer ID or path. This means two LoRA request objects with the same name are considered equivalent even if they point to different paths. This design enables consistent identification of adapters across distributed engine components.
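
Name-based identity can be illustrated with a small stand-in class. `NamedAdapter` and the example paths are hypothetical; the sketch only reproduces the equality and hashing rule described above, not the rest of the real object.

```python
class NamedAdapter:
    """Hypothetical stand-in whose identity depends only on lora_name."""

    def __init__(self, lora_name: str, lora_int_id: int, lora_path: str) -> None:
        self.lora_name = lora_name
        self.lora_int_id = lora_int_id
        self.lora_path = lora_path

    def __eq__(self, other: object) -> bool:
        # The integer ID and the path are deliberately ignored.
        return isinstance(other, NamedAdapter) and self.lora_name == other.lora_name

    def __hash__(self) -> int:
        # Must agree with __eq__ so instances behave in sets and dicts.
        return hash(self.lora_name)


v1 = NamedAdapter("sql-adapter", 1, "/adapters/sql-v1")
v2 = NamedAdapter("sql-adapter", 2, "/adapters/sql-v2")
other = NamedAdapter("chat-adapter", 3, "/adapters/chat")

unique = {v1, v2, other}
```

A practical consequence is that two descriptors sharing a name collapse into one entry in a set or dictionary, even when their IDs and paths differ, so adapter names need to be chosen with care.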

Positive Integer IDs: The lora_int_id must be strictly greater than zero. This constraint exists because the ID is used internally to index into adapter weight arrays, where index 0 is reserved for the base model (no adapter). Within a running engine, each adapter must have its own integer ID that no other adapter shares.

Non-Empty Path: The lora_path is validated to be non-empty. This ensures that every LoRA request object names a location for its adapter weights. The check applies to the string only; it does not by itself confirm that the location exists or contains valid weights.
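
The two validation rules above (positive ID, non-empty path) can be sketched as construction-time checks. `ValidatedAdapter`, `is_valid`, and the error messages are hypothetical, and the real object may report failures differently.

```python
from dataclasses import dataclass


@dataclass
class ValidatedAdapter:
    """Hypothetical stand-in that validates its fields on construction."""
    lora_name: str
    lora_int_id: int
    lora_path: str

    def __post_init__(self) -> None:
        # ID 0 is reserved for the base model, so adapters start at 1.
        if self.lora_int_id < 1:
            raise ValueError(f"lora_int_id must be > 0, got {self.lora_int_id}")
        # Only emptiness is checked; the path is not opened or resolved.
        if not self.lora_path:
            raise ValueError("lora_path cannot be empty")


def is_valid(name: str, int_id: int, path: str) -> bool:
    """Return True if a descriptor with these fields can be constructed."""
    try:
        ValidatedAdapter(name, int_id, path)
    except ValueError:
        return False
    return True
```

Failing at construction time means a malformed descriptor never reaches the engine, where the error would be harder to attribute to a specific request.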

Struct-Based Design: The object is implemented as a msgspec.Struct with array_like=True, so instances are encoded as positional arrays rather than keyed maps. This keeps serialized requests compact and cheap to pass across process boundaries in multi-process engine configurations.
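
The effect of array-like encoding can be illustrated with the standard library alone. This sketch uses `json` and a `NamedTuple` as stand-ins rather than msgspec itself, and `AdapterHandle` and its values are hypothetical; it only shows why a positional layout is smaller than a keyed one.

```python
import json
from typing import NamedTuple


class AdapterHandle(NamedTuple):
    """Hypothetical stand-in for a LoRA request descriptor."""
    lora_name: str
    lora_int_id: int
    lora_path: str


handle = AdapterHandle("sql-adapter", 1, "/adapters/sql")

# Keyed encoding repeats every field name in every message.
keyed = json.dumps(handle._asdict())

# Positional encoding relies on a field order known to both sides.
positional = json.dumps(list(handle))

# The receiver rebuilds the object from position alone.
restored = AdapterHandle(*json.loads(positional))
```

The trade-off is that sender and receiver must agree on field order, which holds when both sides share the same class definition.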

Related Pages

Implemented By

Implementation:Vllm_project_Vllm_LoRARequest_Init
