Principle:Vllm project Vllm LoRA Request Creation
| Knowledge Sources | |
|---|---|
| Domains | LLM Serving, Model Adaptation, Request Management |
| Last Updated | 2026-02-08 13:00 GMT |
Overview
LoRA request creation is the process of constructing a lightweight descriptor object that identifies which LoRA adapter should be applied to a specific inference request.
Description
In a multi-LoRA serving system, each inference request may target a different fine-tuned adapter. A LoRA request object encapsulates the metadata needed to identify and load the correct adapter: a human-readable name, a unique integer identifier, and the filesystem path to the adapter weights. This object acts as a handle that the engine uses to look up, load, and apply the appropriate adapter weights during the forward pass.
The request object is intentionally lightweight: it does not contain the adapter weights themselves, only the information needed to locate and identify them. This separation of concerns allows the engine to manage adapter weight caching and swapping independently of the request lifecycle.
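The separation between descriptor and weights can be sketched as follows. This is a minimal illustration, not vLLM's code: `LoRARequestSketch`, `AdapterCacheSketch`, the adapter name, and the `/path/to/...` location are all hypothetical stand-ins, and the "weights" are a placeholder list.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LoRARequestSketch:
    """Hypothetical stand-in for the descriptor: metadata only, no weights."""
    lora_name: str
    lora_int_id: int
    lora_path: str


class AdapterCacheSketch:
    """Hypothetical engine-side cache that owns the adapter weights."""

    def __init__(self) -> None:
        self._weights: Dict[int, List[float]] = {}
        self.load_count = 0

    def get(self, request: LoRARequestSketch) -> List[float]:
        # Load on first use; later requests with the same ID reuse the entry.
        if request.lora_int_id not in self._weights:
            self.load_count += 1
            # Placeholder for reading real weights from request.lora_path.
            self._weights[request.lora_int_id] = [0.0, 0.0]
        return self._weights[request.lora_int_id]


cache = AdapterCacheSketch()
first = LoRARequestSketch("sql-adapter", 1, "/path/to/sql-adapter")
second = LoRARequestSketch("sql-adapter", 1, "/path/to/sql-adapter")
cache.get(first)
cache.get(second)
print(cache.load_count)  # 1: the second request reused the cached weights
```

Because the descriptor carries no tensors, creating or discarding request objects has no effect on what the cache holds.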
Usage
Use LoRA request creation when:
- Associating a specific LoRA adapter with an inference request
- Building a multi-LoRA serving pipeline where different prompts target different adapters
- Constructing request batches that mix base-model and adapter-augmented requests
- Defining multiple adapter identities that reference the same underlying weights (for testing or A/B comparison)
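The last two cases above might look like the sketch below, which pairs each prompt with either a descriptor or `None` (meaning base model only). `LoRARequestSketch`, the adapter names, and the shared path are hypothetical and stand in for the real request class.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LoRARequestSketch:
    """Hypothetical stand-in for the LoRA request descriptor."""
    lora_name: str
    lora_int_id: int
    lora_path: str


# Two identities that point at the same weights, e.g. for an A/B comparison.
shared_path = "/path/to/adapter"  # placeholder location
variant_a = LoRARequestSketch("variant-a", 1, shared_path)
variant_b = LoRARequestSketch("variant-b", 2, shared_path)

# A batch mixing base-model and adapter-augmented requests.
batch: List[Tuple[str, Optional[LoRARequestSketch]]] = [
    ("Summarize this report.", None),
    ("Translate to SQL: list all users.", variant_a),
    ("Translate to SQL: list all users.", variant_b),
]


def adapter_label(request: Optional[LoRARequestSketch]) -> str:
    """Return the adapter name, or 'base' when no adapter is attached."""
    return "base" if request is None else request.lora_name


labels = [adapter_label(request) for _, request in batch]
print(labels)  # ['base', 'variant-a', 'variant-b']
```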
Theoretical Basis
The LoRA request object embodies several important design decisions:
Identity via Name: Equality comparison and hashing are based solely on lora_name, not on the integer ID or path. This means two LoRA request objects with the same name are considered equivalent even if they point to different paths. This design enables consistent identification of adapters across distributed engine components.
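A minimal sketch of name-based identity, using a hypothetical `LoRARequestSketch` class rather than the real one, shows the practical consequence: two descriptors that share a name collapse into a single set or dictionary entry even when their IDs and paths differ.

```python
class LoRARequestSketch:
    """Hypothetical stand-in whose identity depends only on lora_name."""

    def __init__(self, lora_name: str, lora_int_id: int, lora_path: str) -> None:
        self.lora_name = lora_name
        self.lora_int_id = lora_int_id
        self.lora_path = lora_path

    def __eq__(self, other: object) -> bool:
        # The integer ID and the path are deliberately ignored.
        return (isinstance(other, LoRARequestSketch)
                and self.lora_name == other.lora_name)

    def __hash__(self) -> int:
        return hash(self.lora_name)


a = LoRARequestSketch("sql-adapter", 1, "/path/to/v1")
b = LoRARequestSketch("sql-adapter", 2, "/path/to/v2")

print(a == b)       # True: same name, different ID and path
print(len({a, b}))  # 1: both hash to the same set entry
```

One implication is that reusing a name for different weights makes the two descriptors indistinguishable to any code that relies on equality or hashing.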
Positive Integer IDs: The lora_int_id must be strictly greater than zero. This constraint exists because the ID is used internally to index into adapter weight arrays, where index 0 is reserved for the base model (no adapter). The integer ID must be globally unique for a given adapter within a running engine.
Non-Empty Path: The lora_path is validated to be non-empty. This ensures that every LoRA request object can be traced to an actual adapter weight directory on the filesystem.
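The two constraints (positive ID, non-empty path) can be sketched as construction-time checks. This is an illustration under assumptions: the class name, the helper `try_create`, and the choice of `ValueError` with these messages are hypothetical, and the real class may signal failures differently.

```python
from dataclasses import dataclass


@dataclass
class LoRARequestSketch:
    """Hypothetical stand-in that validates its fields on construction."""
    lora_name: str
    lora_int_id: int
    lora_path: str

    def __post_init__(self) -> None:
        # ID 0 is reserved for the base model, so adapters start at 1.
        if self.lora_int_id < 1:
            raise ValueError(f"lora_int_id must be > 0, got {self.lora_int_id}")
        # Every descriptor must point at some adapter location.
        if not self.lora_path:
            raise ValueError("lora_path cannot be empty")


def try_create(name: str, int_id: int, path: str) -> str:
    """Return 'ok' if construction succeeds, else the validation message."""
    try:
        LoRARequestSketch(name, int_id, path)
        return "ok"
    except ValueError as exc:
        return str(exc)


print(try_create("sql-adapter", 1, "/path/to/sql-adapter"))  # ok
print(try_create("sql-adapter", 0, "/path/to/sql-adapter"))  # lora_int_id must be > 0, got 0
print(try_create("sql-adapter", 1, ""))                      # lora_path cannot be empty
```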
Struct-Based Design: The object is implemented as a msgspec.Struct with array_like=True, so instances are encoded as compact positional arrays rather than field-keyed maps. This keeps serialization cheap when requests cross process boundaries in multi-process engine configurations.
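To show what an array-like layout means in practice, the sketch below compares a field-keyed encoding with a positional one. It uses the standard-library `json` module purely as an illustration, not msgspec itself, and the field values are hypothetical.

```python
import json

fields = ("lora_name", "lora_int_id", "lora_path")
values = ("sql-adapter", 1, "/path/to/sql-adapter")  # placeholder values

# Map-style encoding repeats every field name in the payload.
map_encoded = json.dumps(dict(zip(fields, values)))

# Array-style encoding sends only the values, in declaration order.
array_encoded = json.dumps(list(values))

# The receiver rebuilds the fields from their positions.
decoded = dict(zip(fields, json.loads(array_encoded)))

print(map_encoded)
print(array_encoded)
print(len(array_encoded) < len(map_encoded))  # True: the array form is shorter
```

The trade-off is that the positional form depends on both sides agreeing on field order, so reordering fields changes the wire format.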