Implementation:FMInference FlexLLMGen CompressionConfig
| Field | Value |
|---|---|
| Sources | Repo: FlexLLMGen |
| Domains | Quantization, Memory_Optimization |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete tool for configuring group-wise tensor quantization provided by the FlexLLMGen library.
Description
CompressionConfig is a dataclass that parameterizes group-wise quantization for weights or KV cache tensors. It specifies the number of quantization bits, group size, dimension to group along, and whether to use symmetric quantization.
The configuration is used in two places within the Policy dataclass:
- comp_weight_config -- Controls quantization of model weights when compress_weight=True.
- comp_cache_config -- Controls quantization of KV cache tensors when compress_cache=True.
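The two tensor types typically use different grouping dimensions:
- Weights are grouped along dimension 0 (output features).
- KV cache tensors are grouped along dimension 2, the last axis of the cache tensor. In FlexLLMGen's cache layout of (sequence length, batch size × heads, head dimension), this is the per-head hidden dimension.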
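To make the roles of num_bits and group_size concrete, here is a minimal pure-Python sketch of asymmetric (min/max) group-wise quantization on a flat list. It is an illustration only and does not attempt to reproduce FlexLLMGen's tensor-based implementation. The function names (quantize_group, dequantize_group, quantize_groupwise) are hypothetical, and because the input is one-dimensional, the choice of group_dim is implicit.

```python
from typing import List, Tuple

Group = Tuple[List[int], float, float]  # (integer codes, group minimum, scale)


def quantize_group(values: List[float], num_bits: int) -> Group:
    """Asymmetric min/max quantization of one group to num_bits-bit codes."""
    levels = 2 ** num_bits - 1            # e.g. 15 for 4-bit codes
    lo, hi = min(values), max(values)
    # Guard against a constant group, where hi == lo would give a zero scale.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantize_group(codes: List[int], lo: float, scale: float) -> List[float]:
    """Map integer codes back to approximate floating-point values."""
    return [lo + c * scale for c in codes]


def quantize_groupwise(values: List[float], num_bits: int, group_size: int) -> List[Group]:
    """Split a flat sequence into groups and quantize each one independently."""
    return [
        quantize_group(values[start:start + group_size], num_bits)
        for start in range(0, len(values), group_size)
    ]


# Two groups with very different ranges: each gets its own minimum and scale,
# so the small values are not crushed by the large ones.
row = [0.0, 0.5, 1.0, 1.5, 100.0, 101.0, 102.0, 103.0]
packed = quantize_groupwise(row, num_bits=4, group_size=4)
restored = [v for codes, lo, scale in packed for v in dequantize_group(codes, lo, scale)]
```

Quantizing per group rather than per tensor keeps the rounding error proportional to each group's own value range, which is the motivation for choosing a grouping dimension separately for weights and for the KV cache.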
Usage
Create CompressionConfig instances and pass them to Policy's comp_weight_config and comp_cache_config fields. The settings take effect when the corresponding flag, compress_weight or compress_cache, is True. A CompressionConfig instance must still be provided for each field when compression is disabled in the Policy; in that case it serves only as a placeholder and can be marked with enabled=False.
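The pairing of flags and configs can be sketched as follows. To keep the snippet runnable without the library installed, it defines a local stand-in for CompressionConfig that mirrors the documented signature, plus a hypothetical PolicySketch holding only the four compression-related fields named above. The real Policy dataclass has additional fields that are not shown here.

```python
import dataclasses


@dataclasses.dataclass
class CompressionConfig:
    """Local stand-in mirroring the documented signature (not the library class)."""
    num_bits: int
    group_size: int
    group_dim: int
    symmetric: bool
    enabled: bool = True


@dataclasses.dataclass(frozen=True)
class PolicySketch:
    """Hypothetical reduced Policy: only the compression-related fields."""
    compress_weight: bool
    comp_weight_config: CompressionConfig
    compress_cache: bool
    comp_cache_config: CompressionConfig


weight_cfg = CompressionConfig(num_bits=4, group_size=64, group_dim=0, symmetric=False)
cache_cfg = CompressionConfig(num_bits=4, group_size=64, group_dim=2, symmetric=False)

# Weights are compressed, the cache is not. The cache field still receives a
# config; here it is a copy marked as disabled so it reads as a placeholder.
policy = PolicySketch(
    compress_weight=True,
    comp_weight_config=weight_cfg,
    compress_cache=False,
    comp_cache_config=dataclasses.replace(cache_cfg, enabled=False),
)
```

Using dataclasses.replace leaves the original cache_cfg untouched, so the same base configuration can be reused if cache compression is switched on later.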
Code Reference
| Field | Value |
|---|---|
| Repository | FlexLLMGen |
| File | flexllmgen/compression.py |
| Lines | 11-18 |
Signature:
@dataclasses.dataclass
class CompressionConfig:
    num_bits: int
    group_size: int
    group_dim: int
    symmetric: bool
    enabled: bool = True
Import:
from flexllmgen.compression import CompressionConfig
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| num_bits | int | Yes | Number of quantization bits (e.g., 4) |
| group_size | int | Yes | Number of elements per quantization group (e.g., 64) |
| group_dim | int | Yes | Tensor dimension to group along: 0 for weights, 2 for KV cache |
| symmetric | bool | Yes | True for symmetric quantization, False for asymmetric (min/max-based) quantization |
| enabled | bool | No | Whether compression is active (default: True) |
Outputs
| Output | Type | Description |
|---|---|---|
| CompressionConfig | dataclass instance | Configuration object used by Policy for weight or cache compression |
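As a rough guide to what these parameters mean for memory, the sketch below estimates the effective bits per element and the resulting compression ratio relative to 16-bit storage. It assumes that each group stores two 16-bit metadata values (for example a minimum and a scale, as asymmetric quantization requires). That storage layout is an assumption made for illustration, not a statement about FlexLLMGen's exact format, and effective_bits_per_element is a hypothetical helper.

```python
def effective_bits_per_element(
    num_bits: int,
    group_size: int,
    metadata_values_per_group: int = 2,  # assumption: e.g. minimum and scale
    metadata_bits: int = 16,             # assumption: metadata kept in 16-bit floats
) -> float:
    """Payload bits plus per-group metadata amortized over the group."""
    return num_bits + metadata_values_per_group * metadata_bits / group_size


# 4-bit codes with groups of 64: 4 + (2 * 16) / 64 = 4.5 bits per element.
bits = effective_bits_per_element(num_bits=4, group_size=64)

# Relative to an uncompressed 16-bit tensor this is roughly a 3.56x reduction.
ratio = 16 / bits
```

Under these assumptions a smaller group_size improves accuracy, because each group covers a narrower value range, but it raises the metadata overhead: with group_size=16 the same calculation gives 6 bits per element.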
Usage Examples
Example 1: Weight compression configuration
from flexllmgen.compression import CompressionConfig

# 4-bit asymmetric quantization for model weights
# group_dim=0 groups along the output feature dimension
weight_config = CompressionConfig(
    num_bits=4,
    group_size=64,
    group_dim=0,
    symmetric=False,
)
# weight_config.enabled defaults to True
Example 2: KV cache compression configuration
from flexllmgen.compression import CompressionConfig

# 4-bit asymmetric quantization for KV cache
# group_dim=2 groups along the last cache axis (the per-head hidden dimension)
cache_config = CompressionConfig(
    num_bits=4,
    group_size=64,
    group_dim=2,
    symmetric=False,
)
Example 3: Disabled compression (placeholder)
from flexllmgen.compression import CompressionConfig

# Placeholder config when compression is not used
no_compress = CompressionConfig(
    num_bits=4,
    group_size=64,
    group_dim=0,
    symmetric=False,
    enabled=False,
)