Heuristic:Predibase Lorax Warning Deprecated BitsAndBytes 8bit

Knowledge Sources	Predibase_Lorax
Domains	Quantization, Deprecation
Last Updated	2026-02-08 00:00 GMT

Overview

Deprecation warning: BitsAndBytes 8-bit quantization (Linear8bitLt) is deprecated in LoRAX. Use EETQ as a drop-in replacement with better performance.

Description

The BitsAndBytes 8-bit quantization path (Linear8bitLt) is explicitly deprecated in the LoRAX codebase. When 8-bit BitsAndBytes quantization is selected, the warn_deprecate_bnb() function emits a logger warning at runtime advising users to switch to EETQ, which provides equivalent INT8 quantization and, per the warning text, much better inference performance.

The deprecation applies specifically to the 8-bit path (bitsandbytes quantize flag). The 4-bit paths (bitsandbytes-nf4 and bitsandbytes-fp4 via Linear4bit) are not deprecated and remain supported.
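
The split between the deprecated and supported paths can be summarized as a small lookup. This is an illustrative sketch, not LoRAX code: the helper name is_deprecated_quantize and the two set names are hypothetical, and only the flag values are taken from the text above.

```python
# Hypothetical helper for illustration; only the flag values come from this page.
DEPRECATED_QUANTIZE = {"bitsandbytes"}  # 8-bit path (Linear8bitLt)
SUPPORTED_BNB_4BIT = {"bitsandbytes-nf4", "bitsandbytes-fp4"}  # 4-bit paths (Linear4bit)


def is_deprecated_quantize(value: str) -> bool:
    """Return True only for the deprecated 8-bit BitsAndBytes flag."""
    return value in DEPRECATED_QUANTIZE


for flag in ("bitsandbytes", "bitsandbytes-nf4", "bitsandbytes-fp4", "eetq"):
    status = "deprecated" if is_deprecated_quantize(flag) else "ok"
    print(f"{flag}: {status}")
```

Exact string matching is deliberate here: a prefix check on "bitsandbytes" would wrongly flag the 4-bit variants.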

Additionally, the memory_efficient_backward parameter on Linear8bitLt has been deprecated since bitsandbytes 0.37.0 and is scheduled for removal in 0.39.0.

Usage

This heuristic applies when you hit CUDA out-of-memory (OOM) errors or slow inference and are considering 8-bit quantization to reduce memory use. If the current configuration uses --quantize bitsandbytes, switch to --quantize eetq for equivalent functionality with better performance.

The Insight (Rule of Thumb)

Action: Replace --quantize bitsandbytes with --quantize eetq in server startup.
Value: EETQ is a direct drop-in replacement requiring no model re-quantization or code changes.
Trade-off: No known downside. EETQ provides equivalent or better quality with faster inference than BitsAndBytes 8-bit.
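
The action above amounts to a one-token change in the startup command. The following is a minimal sketch of that substitution over an argument list; the function name migrate_quantize_args, the launcher command line, and the model id are placeholders chosen for illustration, and support for the --quantize=value spelling is an assumption.

```python
from typing import List


def migrate_quantize_args(argv: List[str]) -> List[str]:
    """Return a copy of argv with the 8-bit bitsandbytes value swapped for eetq.

    Handles `--quantize bitsandbytes` and `--quantize=bitsandbytes`.
    The 4-bit values (bitsandbytes-nf4, bitsandbytes-fp4) are left untouched.
    """
    out = []
    prev_was_quantize = False
    for arg in argv:
        if prev_was_quantize and arg == "bitsandbytes":
            out.append("eetq")
        elif arg == "--quantize=bitsandbytes":
            out.append("--quantize=eetq")
        else:
            out.append(arg)
        # Remember whether the next token is the value of --quantize.
        prev_was_quantize = arg == "--quantize"
    return out


# Placeholder command line; the model id is not a real model.
old_cmd = ["lorax-launcher", "--model-id", "my-org/my-model", "--quantize", "bitsandbytes"]
print(" ".join(migrate_quantize_args(old_cmd)))
```

Because both options quantize at load time, no other arguments or model files need to change in this sketch.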

Reasoning

The LoRAX codebase explicitly marks BitsAndBytes 8-bit as deprecated via a cached warning function:

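from functools import lru_cache

from loguru import logger  # assumed: the logger used by the LoRAX server code


@lru_cache(1)  # caches the single no-argument call, so the warning is logged only once
def warn_deprecate_bnb():
    logger.warning(
        "Bitsandbytes 8bit is deprecated, using `eetq` is a drop-in replacement, "
        "and has much better performnce"
    )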

EETQ uses optimized CUDA kernels specifically designed for INT8 inference, while BitsAndBytes uses more generic CUDA implementations. Both quantize at runtime (no pre-quantized weights needed), but EETQ achieves better throughput through kernel-level optimizations.
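
The caching on warn_deprecate_bnb() matters because a quantized model constructs many linear layers, and each construction could otherwise repeat the warning. The sketch below reproduces the warn-once pattern using only the standard library; warn_once, the demo logger, and the calls list are illustrative names, not part of LoRAX.

```python
import logging
from functools import lru_cache

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("demo")

calls = []  # records how many times the function body actually runs


@lru_cache(1)
def warn_once():
    # With no arguments there is a single cache key, so after the first call
    # the cached result is returned and this body is skipped.
    calls.append(1)
    log.warning("8-bit path is deprecated; consider eetq")


for _ in range(3):  # stands in for one call per quantized layer
    warn_once()

print(len(calls))
```

The body runs on the first call only, so the log contains a single warning regardless of how many layers trigger it.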

Related Pages

Implementation:Predibase_Lorax_BitsAndBytes_Layers
