
Implementation:Bitsandbytes foundation Bitsandbytes BitsAndBytesConfig 8bit

From Leeroopedia


Metadata

Field Value
Sources Repo: bitsandbytes, Doc: HuggingFace Transformers, Paper: LLM.int8()
Domains Quantization, NLP
Type Wrapper Doc (External Library)
Last updated 2026-02-07 14:00 GMT

Overview

Concrete tool for configuring 8-bit LLM.int8() quantization parameters provided by the HuggingFace Transformers library.

Description

BitsAndBytesConfig with load_in_8bit=True configures models to use Linear8bitLt layers with INT8 quantization. When passed to a model loading function such as AutoModelForCausalLM.from_pretrained(), all eligible linear layers in the model are replaced with bitsandbytes.nn.Linear8bitLt layers. These layers store weights in INT8 precision and use the LLM.int8() mixed-precision decomposition during inference.

The configuration object encapsulates three key parameters:

  • load_in_8bit: Enables 8-bit quantization mode.
  • llm_int8_threshold: Sets the outlier detection threshold for the mixed-precision decomposition.
  • llm_int8_has_fp16_weight: Controls whether FP16 weight copies are retained for fine-tuning support.
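The decomposition these parameters control can be sketched in plain NumPy. This is an illustrative simplification, not the bitsandbytes CUDA kernel: the function name `llm_int8_matmul` and the exact per-row/per-column absmax scaling are assumptions made here for clarity.

```python
import numpy as np

def llm_int8_matmul(X, W, threshold=6.0):
    """Illustrative sketch of the LLM.int8() mixed-precision decomposition
    (not the actual bitsandbytes implementation)."""
    # Outlier feature dimensions: columns of X whose max magnitude exceeds the threshold.
    outliers = np.abs(X).max(axis=0) > threshold
    # Outlier features stay in full precision.
    out_fp = X[:, outliers] @ W[outliers, :]
    Xs, Ws = X[:, ~outliers], W[~outliers, :]
    if Xs.shape[1] == 0:
        return out_fp
    # Absmax quantization to INT8: per-row scales for X, per-column scales for W.
    sx = np.abs(Xs).max(axis=1, keepdims=True) / 127.0
    sw = np.abs(Ws).max(axis=0, keepdims=True) / 127.0
    Xq = np.round(Xs / np.maximum(sx, 1e-12)).astype(np.int8)
    Wq = np.round(Ws / np.maximum(sw, 1e-12)).astype(np.int8)
    # INT8 matmul with int32 accumulation, then dequantize and recombine.
    out_int8 = (Xq.astype(np.int32) @ Wq.astype(np.int32)) * sx * sw
    return out_fp + out_int8
```

Raising the threshold routes fewer features through the full-precision path; lowering it reduces quantization error at the cost of speed.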

Code Reference

  • Source: External (transformers library)
  • Import:
from transformers import BitsAndBytesConfig
  • Signature:
transformers.BitsAndBytesConfig(
    load_in_8bit: bool = False,
    llm_int8_threshold: float = 6.0,
    llm_int8_has_fp16_weight: bool = False,
)

I/O Contract

Inputs

Parameter Type Required Default Description
load_in_8bit bool No False Set to True to enable 8-bit LLM.int8() quantization (defaults to False, i.e. no quantization).
llm_int8_threshold float No 6.0 Outlier detection threshold. Features with magnitudes exceeding this value are computed in FP16.
llm_int8_has_fp16_weight bool No False If True, retains FP16 weight copies for fine-tuning. If False, only INT8 weights are stored (inference-only mode).

Outputs

Output Type Description
config BitsAndBytesConfig Configuration object to pass to model loading functions.
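To build intuition for llm_int8_threshold, the sketch below counts how many feature dimensions would be routed to the FP16 path at different thresholds. The activations are synthetic; the four injected outlier columns are an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated hidden-state activations: 32 tokens x 512 features,
# with 4 artificially large "outlier" feature dimensions.
X = rng.normal(0.0, 1.0, size=(32, 512))
X[:, :4] *= 20.0

for threshold in (4.0, 6.0, 8.0):
    n_fp16 = int((np.abs(X).max(axis=0) > threshold).sum())
    print(f"threshold={threshold}: {n_fp16} feature dims on the FP16 path")
```

Lower thresholds move more features onto the slower full-precision path but reduce quantization error; 6.0 is the default recommended in the LLM.int8() paper.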

Usage Examples

Load a model with LLM.int8() quantization:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Configure 8-bit quantization with default settings
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_threshold=6.0,
    llm_int8_has_fp16_weight=False,
)

# Load model with 8-bit quantization
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=quantization_config,
    device_map="auto",
)

Load a model for 8-bit fine-tuning:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Keep FP16 weights for fine-tuning
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_has_fp16_weight=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=quantization_config,
    device_map="auto",
)
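As a rough back-of-envelope check (weights only, ignoring activations, any layers kept in higher precision, and CUDA overhead), halving weight precision from FP16 to INT8 roughly halves the weight memory of a 7B-parameter model:

```python
n_params = 7_000_000_000  # ~7B parameters (e.g. Llama-2-7B)

fp16_gb = n_params * 2 / 1e9  # 2 bytes per FP16 weight
int8_gb = n_params * 1 / 1e9  # 1 byte per INT8 weight

print(f"FP16 weights: ~{fp16_gb:.0f} GB, INT8 weights: ~{int8_gb:.0f} GB")
```

After loading, model.get_memory_footprint() reports the actual figure, which will differ somewhat from this estimate.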
