Implementation:Predibase Lorax Adapter Scheduler Next Batch

Knowledge Sources	LoRAX
Domains	Inference_Optimization, Scheduling
Last Updated	2026-02-08 02:00 GMT

Overview

Concrete tool for adapter-aware continuous batch scheduling, provided by the AdapterScheduler in the LoRAX Rust router.

Description

The AdapterScheduler manages the request queue and batch formation in the LoRAX router. Its process() method enqueues an incoming request under its adapter, and next_batch() forms the next batch subject to the prefill and total token budgets and the set of adapters already in use. The batch-formation logic lives in AdapterSchedulerState::next_batch(), which groups requests by adapter and respects the prefill/decode token limits.
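The budget and adapter checks described above can be illustrated with a minimal, self-contained sketch. It is not the LoRAX implementation: SketchEntry, next_batch_sketch, and the max_active_adapters limit are hypothetical names introduced here, and the adapter is reduced to a plain String. The sketch assumes a FIFO queue in which each entry is charged its prompt length against the prefill budget and its prompt length plus decode allowance against the total budget.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical, simplified request entry (not the router's `Entry` type).
#[derive(Debug, Clone, PartialEq)]
struct SketchEntry {
    id: u64,
    adapter: String,     // stand-in for the router's `Adapter`
    input_tokens: u32,   // prompt length, charged against the prefill budget
    max_new_tokens: u32, // decode allowance, charged against the total budget
}

/// Pops entries in FIFO order while they fit both budgets.
/// Entries whose adapter would exceed `max_active_adapters` (an assumed limit)
/// are skipped and returned to the front of the queue in their original order.
fn next_batch_sketch(
    queue: &mut VecDeque<SketchEntry>,
    adapters_in_use: &HashSet<String>,
    max_active_adapters: usize,
    prefill_token_budget: u32,
    token_budget: u32,
) -> Option<Vec<SketchEntry>> {
    let mut batch = Vec::new();
    let mut active: HashSet<String> = adapters_in_use.clone();
    let mut deferred: VecDeque<SketchEntry> = VecDeque::new();
    let mut prefill_tokens = 0u32;
    let mut total_tokens = 0u32;

    while let Some(entry) = queue.pop_front() {
        // Defer requests that would activate one adapter too many.
        if !active.contains(&entry.adapter) && active.len() >= max_active_adapters {
            deferred.push_back(entry);
            continue;
        }
        let prefill = prefill_tokens.saturating_add(entry.input_tokens);
        let total = total_tokens
            .saturating_add(entry.input_tokens)
            .saturating_add(entry.max_new_tokens);
        if prefill > prefill_token_budget || total > token_budget {
            // Budget exhausted: return the entry to the queue and stop.
            queue.push_front(entry);
            break;
        }
        prefill_tokens = prefill;
        total_tokens = total;
        active.insert(entry.adapter.clone());
        batch.push(entry);
    }

    // Restore deferred entries ahead of the remaining queue, oldest first.
    while let Some(entry) = deferred.pop_back() {
        queue.push_front(entry);
    }

    if batch.is_empty() { None } else { Some(batch) }
}
```

With four queued requests for adapters a, b, c, a and a limit of two active adapters, the sketch batches the first, second, and fourth request and leaves the request for adapter c queued for a later call.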

Usage

Used internally by the router's batching task. Not called directly by end users.

Code Reference

Source Location

Repository: LoRAX
File: router/src/scheduler.rs
Lines: 26-546

Signature

pub(crate) struct AdapterScheduler {
    sender: mpsc::UnboundedSender<SchedulerCommand>,
}

impl AdapterScheduler {
    pub(crate) fn new(/* ... */) -> Self;

    pub(crate) fn process(
        &self,
        adapter: Adapter,
        entry: Entry,
    );

    pub(crate) async fn next_batch(
        &self,
        adapters_in_use: HashSet<Adapter>,
        min_size: Option<usize>,
        prefill_token_budget: u32,
        token_budget: u32,
    ) -> Option<NextBatch>;
}
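
The single sender field in the signature suggests a command-channel design, in which the public methods forward commands to a task that owns the queue. The following is only a sketch of that general pattern under stated assumptions: it swaps Tokio's channels for std::sync::mpsc and a thread so that it runs without external crates, and SketchScheduler, SketchCommand, its variants, and max_entries are hypothetical names, not the actual SchedulerCommand definition.

```rust
use std::collections::VecDeque;
use std::sync::mpsc;
use std::thread;

/// Hypothetical command set; the real `SchedulerCommand` is not shown on this page.
enum SketchCommand {
    /// Enqueue one request (here just a label).
    Append(String),
    /// Ask for up to `max_entries` queued requests; the answer goes to `reply`.
    NextBatch {
        max_entries: usize,
        reply: mpsc::Sender<Option<Vec<String>>>,
    },
}

/// Handle that only holds the sending half of the command channel.
struct SketchScheduler {
    sender: mpsc::Sender<SketchCommand>,
}

impl SketchScheduler {
    fn new() -> Self {
        let (sender, receiver) = mpsc::channel::<SketchCommand>();
        // The worker owns the queue, so no lock is needed around it.
        thread::spawn(move || {
            let mut queue: VecDeque<String> = VecDeque::new();
            // The loop ends once every sender has been dropped.
            for command in receiver {
                match command {
                    SketchCommand::Append(entry) => queue.push_back(entry),
                    SketchCommand::NextBatch { max_entries, reply } => {
                        let n = max_entries.min(queue.len());
                        let batch: Vec<String> = queue.drain(..n).collect();
                        let answer = if batch.is_empty() { None } else { Some(batch) };
                        // Ignore the error if the caller stopped waiting.
                        let _ = reply.send(answer);
                    }
                }
            }
        });
        Self { sender }
    }

    /// Fire-and-forget enqueue, mirroring the shape of `process`.
    fn process(&self, entry: String) {
        self.sender
            .send(SketchCommand::Append(entry))
            .expect("scheduler worker stopped");
    }

    /// Request a batch and block until the worker replies.
    fn next_batch(&self, max_entries: usize) -> Option<Vec<String>> {
        let (reply, response) = mpsc::channel();
        self.sender
            .send(SketchCommand::NextBatch { max_entries, reply })
            .expect("scheduler worker stopped");
        response.recv().expect("scheduler worker dropped the reply channel")
    }
}
```

Because commands from one sender arrive in order, a next_batch call issued after several process calls sees all of those requests. An async version would await the reply instead of blocking on recv.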

Import

use crate::scheduler::AdapterScheduler;

I/O Contract

Inputs

Name	Type	Required	Description
adapter	Adapter	Yes	Adapter identifier for the request (passed to process())
entry	Entry	Yes	Request entry with prompt and parameters (passed to process())
adapters_in_use	HashSet<Adapter>	Yes	Adapters currently active in the running batch
min_size	Option<usize>	No	Minimum batch size; None imposes no minimum
prefill_token_budget	u32	Yes	Maximum tokens for the prefill phase
token_budget	u32	Yes	Maximum total tokens in the batch

Outputs

Name	Type	Description
next_batch	Option<NextBatch>	Batch of entries to process, or None if the queue is empty

Usage Examples

Router Batching Task

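// In router/src/infer.rs - batching_task
// The scheduler, client, budgets, and adapter set are set up by the surrounding task.
loop {
    // Form the next batch from queued requests.
    // The set is cloned because next_batch takes it by value on every iteration.
    let batch = scheduler.next_batch(
        adapters_in_use.clone(),
        min_size,
        prefill_token_budget,
        token_budget,
    ).await;

    if let Some(next_batch) = batch {
        // Send the batch to the shards for prefill
        let result = client.prefill(next_batch.batch).await;
        // Process results...
    }
}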

Related Pages

Implements Principle

Principle:Predibase_Lorax_Continuous_Batching_Inference

Requires Environment

Environment:Predibase_Lorax_CUDA_GPU_Runtime
