Implementation:FMInference FlexLLMGen CompressionConfig

Field | Value
Sources | Repo: FlexLLMGen
Domains | Quantization, Memory_Optimization
Last Updated | 2026-02-09 00:00 GMT

Overview

CompressionConfig is the concrete configuration object that the FlexLLMGen library provides for group-wise tensor quantization.

Description

CompressionConfig is a dataclass that parameterizes group-wise quantization for weights or KV cache tensors. It specifies the number of quantization bits, the group size, the tensor dimension to group along, whether quantization is symmetric, and whether compression is enabled.
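To make the parameters concrete, here is a minimal pure-Python sketch of group-wise quantization on a flat list of floats. It is an illustration of the general technique, not FlexLLMGen's implementation: the CompressionConfig class below is a local stand-in that mirrors the documented fields, and quantize_group, dequantize_group, and quantize_1d are hypothetical helper names.

```python
import dataclasses


@dataclasses.dataclass
class CompressionConfig:
    """Local stand-in mirroring the documented fields (not the library class)."""
    num_bits: int
    group_size: int
    group_dim: int
    symmetric: bool
    enabled: bool = True


def quantize_group(values, config):
    """Quantize one group of floats. Returns (codes, scale, offset).

    Assumes num_bits >= 2 so that the symmetric bound is non-zero.
    """
    if config.symmetric:
        # Signed codes in [-bound, bound]; the range is centered on zero.
        bound = 2 ** (config.num_bits - 1) - 1
        max_abs = max(abs(v) for v in values)
        scale = max_abs / bound if max_abs > 0 else 1.0
        offset = 0.0
        low, high = -bound, bound
    else:
        # Unsigned codes in [0, bound]; the range follows the group's min and max.
        bound = 2 ** config.num_bits - 1
        offset = min(values)
        spread = max(values) - offset
        scale = spread / bound if spread > 0 else 1.0
        low, high = 0, bound
    codes = [min(high, max(low, round((v - offset) / scale))) for v in values]
    return codes, scale, offset


def dequantize_group(codes, scale, offset):
    """Map integer codes back to approximate float values."""
    return [c * scale + offset for c in codes]


def quantize_1d(values, config):
    """Split a flat list into groups of config.group_size and quantize each."""
    return [
        quantize_group(values[start:start + config.group_size], config)
        for start in range(0, len(values), config.group_size)
    ]
```

Because each group carries its own scale (and, in the asymmetric case, its own offset), a group of small values is not forced onto the coarse grid that a group of large values would need.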

The configuration is used in two places within the Policy dataclass:

  • comp_weight_config -- Controls quantization of model weights when compress_weight=True.
  • comp_cache_config -- Controls quantization of KV cache tensors when compress_cache=True.

Each tensor type typically requires different grouping dimensions:

  • Weights are grouped along dimension 0 (output features).
  • KV cache tensors are grouped along dimension 2 (sequence length).
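The effect of group_dim on a tensor's layout can be sketched with shapes alone. The helpers below assume one common approach, in which the grouped dimension is padded up to a multiple of group_size and then split into (num_groups, group_size); this page does not confirm that FlexLLMGen does exactly this. The function names and the example shapes are hypothetical.

```python
import math


def padding_needed(shape, group_size, group_dim):
    """Elements to append along group_dim to reach a multiple of group_size."""
    return -shape[group_dim] % group_size


def grouped_shape(shape, group_size, group_dim):
    """Shape after splitting group_dim into (num_groups, group_size)."""
    num_groups = math.ceil(shape[group_dim] / group_size)
    return shape[:group_dim] + (num_groups, group_size) + shape[group_dim + 1:]


# Hypothetical 2-D weight matrix, grouped along dimension 0.
weight_groups = grouped_shape((4096, 4096), group_size=64, group_dim=0)

# Hypothetical 3-D cache tensor, grouped along dimension 2.
cache_groups = grouped_shape((512, 32, 128), group_size=64, group_dim=2)
```

Each slice of length group_size along the new inner axis is quantized independently, so the choice of group_dim decides which elements share a scale.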

Usage

Create CompressionConfig instances and pass them to Policy's comp_weight_config and comp_cache_config fields. Both fields must be supplied even when compress_weight or compress_cache is False; in that case pass a placeholder instance, optionally with enabled=False.
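The wiring described above can be sketched as follows. PolicySketch is a hypothetical, trimmed stand-in that carries only the four compression-related fields named on this page; the real Policy dataclass has additional fields that are not shown here. CompressionConfig is likewise a local stand-in so that the snippet runs without FlexLLMGen installed.

```python
import dataclasses


@dataclasses.dataclass
class CompressionConfig:
    """Local stand-in mirroring the documented fields (not the library class)."""
    num_bits: int
    group_size: int
    group_dim: int
    symmetric: bool
    enabled: bool = True


@dataclasses.dataclass
class PolicySketch:
    """Hypothetical subset of Policy: only the compression-related fields."""
    compress_weight: bool
    comp_weight_config: CompressionConfig
    compress_cache: bool
    comp_cache_config: CompressionConfig


policy = PolicySketch(
    # Weight compression is on, so this config is actually used.
    compress_weight=True,
    comp_weight_config=CompressionConfig(
        num_bits=4, group_size=64, group_dim=0, symmetric=False),
    # Cache compression is off, but a placeholder config is still supplied.
    compress_cache=False,
    comp_cache_config=CompressionConfig(
        num_bits=4, group_size=64, group_dim=2, symmetric=False, enabled=False),
)
```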

Code Reference

Field | Value
Repository | FlexLLMGen
File | flexllmgen/compression.py
Lines | 11-18

Signature:

@dataclasses.dataclass
class CompressionConfig:
    num_bits: int
    group_size: int
    group_dim: int
    symmetric: bool
    enabled: bool = True

Import:

from flexllmgen.compression import CompressionConfig

I/O Contract

Inputs

Parameter | Type | Required | Description
num_bits | int | Yes | Number of quantization bits (e.g., 4)
group_size | int | Yes | Number of elements per quantization group (e.g., 64)
group_dim | int | Yes | Tensor dimension to group along: 0 for weights, 2 for KV cache
symmetric | bool | Yes | True for symmetric quantization, False for asymmetric
enabled | bool | No | Whether compression is active (default: True)
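The interaction between num_bits and group_size can be illustrated with a rough storage estimate. The sketch below assumes that each group stores its metadata as 16-bit values, two per group for asymmetric quantization (minimum and scale) and one per group for symmetric quantization (scale only). These storage details are assumptions for illustration, not something this page states about FlexLLMGen, and bits_per_element is a hypothetical helper.

```python
def bits_per_element(num_bits, group_size, symmetric, meta_bits=16):
    """Average storage per element, including per-group metadata.

    Assumes `meta_bits` bits per metadata value: one value per group when
    symmetric (scale), two when asymmetric (minimum and scale).
    """
    meta_values = 1 if symmetric else 2
    return num_bits + meta_values * meta_bits / group_size


# 4-bit asymmetric quantization with groups of 64 elements.
asymmetric_cost = bits_per_element(num_bits=4, group_size=64, symmetric=False)

# Compression ratio relative to an uncompressed 16-bit element.
ratio_vs_16_bit = 16 / asymmetric_cost
```

Under these assumptions, smaller groups track local value ranges more closely but raise the metadata overhead per element, while larger groups do the opposite.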

Outputs

Output | Type | Description
CompressionConfig | dataclass instance | Configuration object used by Policy for weight or cache compression

Usage Examples

Example 1: Weight compression configuration

from flexllmgen.compression import CompressionConfig

# 4-bit asymmetric quantization for model weights
# group_dim=0 groups along the output feature dimension
weight_config = CompressionConfig(
    num_bits=4,
    group_size=64,
    group_dim=0,
    symmetric=False,
)
# weight_config.enabled defaults to True

Example 2: KV cache compression configuration

from flexllmgen.compression import CompressionConfig

# 4-bit asymmetric quantization for KV cache
# group_dim=2 groups along the sequence length dimension
cache_config = CompressionConfig(
    num_bits=4,
    group_size=64,
    group_dim=2,
    symmetric=False,
)

Example 3: Disabled compression (placeholder)

from flexllmgen.compression import CompressionConfig

# Placeholder config when compression is not used
no_compress = CompressionConfig(
    num_bits=4,
    group_size=64,
    group_dim=0,
    symmetric=False,
    enabled=False,
)
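Since CompressionConfig is a dataclass, a disabled placeholder can also be derived from an existing configuration with the standard-library function dataclasses.replace, which copies the instance and overrides only the named fields. The class below is a local stand-in mirroring the documented signature so that the snippet runs without FlexLLMGen installed.

```python
import dataclasses


@dataclasses.dataclass
class CompressionConfig:
    """Local stand-in mirroring the documented fields (not the library class)."""
    num_bits: int
    group_size: int
    group_dim: int
    symmetric: bool
    enabled: bool = True


weight_config = CompressionConfig(
    num_bits=4, group_size=64, group_dim=0, symmetric=False)

# Copy the config, changing only `enabled`; the original is left untouched.
no_compress = dataclasses.replace(weight_config, enabled=False)

# Dataclass instances compare field by field, so the two configs now differ.
configs_differ = no_compress != weight_config
```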
