Implementation:FMInference FlexLLMGen DeepSpeed Compression Config

From Leeroopedia


Field | Value
Sources | Repo: FlexLLMGen
Domains | Model_Compression, Configuration_Management
Last Updated | 2026-02-09 00:00 GMT

Overview

Vendored DeepSpeed compression configuration parser that reads the compression training section of a DeepSpeed JSON config and produces structured configuration dictionaries for weight quantization, activation quantization, sparse pruning, row pruning, head pruning, channel pruning, and layer reduction.

Description

config.py provides the configuration parsing layer for DeepSpeed's compression training feature. It reads a nested dictionary from the DeepSpeed JSON configuration (under the compression_training key) and normalizes it into a structured output with validated parameters and applied defaults.

The module handles seven compression techniques. Six of them (every technique except layer reduction, which keeps its parameters in a flat dictionary) are organized with the same two-level structure:

  • shared_parameters -- Global settings for the technique (enabled flag, method, schedule offset, quantization type).
  • different_groups -- Per-module-group settings that specify which model layers receive the compression and with what parameters (bit-widths, density ratios, module scopes).
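The two-level layout can be illustrated with a small sketch for sparse pruning. The group names (sp1, sp2), the module scopes, and the key name dense_ratio are assumptions made for illustration, and summarize_groups is a hypothetical helper, not a function in config.py.

```python
# Hypothetical config fragment; group names, module scopes and the
# "dense_ratio" key are illustrative assumptions.
compression_training = {
    "sparse_pruning": {
        # Global settings for the whole technique.
        "shared_parameters": {
            "enabled": True,
            "schedule_offset": 1000,
            "method": "l1",
        },
        # Per-module-group settings.
        "different_groups": {
            "sp1": {
                "params": {"dense_ratio": 0.5},
                "modules": ["attention.self"],
            },
            "sp2": {
                "params": {"dense_ratio": 0.3},
                "modules": ["intermediate.dense"],
            },
        },
    }
}


def summarize_groups(technique_cfg):
    """Return (group_name, params, modules) tuples for one technique.

    Returns an empty list when the technique is not enabled.
    """
    if not technique_cfg["shared_parameters"].get("enabled", False):
        return []
    return [
        (name, group["params"], group["modules"])
        for name, group in technique_cfg["different_groups"].items()
    ]


summary = summarize_groups(compression_training["sparse_pruning"])
for name, params, modules in summary:
    print(name, params, modules)
```

Keeping the enabled flag in shared_parameters means one switch turns the technique on or off, while each group carries only the values that differ between layers.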

Supported compression techniques:

  • Weight quantization -- Configures bit-width (start and target), quantization type (symmetric/asymmetric), rounding mode (nearest/stochastic), number of groups, FP16 mixed quantization, and schedule offset.
  • Activation quantization -- Configures bit-width, quantization type, range calibration mode (static/dynamic), and schedule offset.
  • Sparse pruning -- Configures density ratio and method (L1/TopK).
  • Row pruning -- Configures density ratio and method (L1/TopK).
  • Head pruning -- Configures density ratio, method (L1/TopK), and number of attention heads.
  • Channel pruning -- Configures density ratio and method (L1/TopK).
  • Layer reduction -- Configures layer elimination parameters.

Each getter function validates required fields (e.g., start_bits and target_bits must be specified for weight quantization groups), asserts enumeration constraints (e.g., quantization type must be 'symmetric' or 'asymmetric'), and applies default values from the constants module.
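The validate-then-default pattern described above might look roughly like the following. This is a minimal sketch, not the vendored code: the function names, the default values, and the ["*"] module scope are assumptions standing in for whatever the constants module actually defines.

```python
# Hypothetical stand-ins for values that would come from the constants module.
ALLOWED_QUANTIZATION_TYPES = ("symmetric", "asymmetric")
DEFAULT_QUANTIZATION_TYPE = "symmetric"
DEFAULT_QUANTIZATION_PERIOD = 1


def parse_shared_parameters(shared):
    """Apply defaults to shared parameters and check the enumeration constraint."""
    out = {
        "enabled": shared.get("enabled", False),
        "quantization_type": shared.get(
            "quantization_type", DEFAULT_QUANTIZATION_TYPE
        ),
    }
    assert out["quantization_type"] in ALLOWED_QUANTIZATION_TYPES, (
        f"quantization_type must be one of {ALLOWED_QUANTIZATION_TYPES}, "
        f"got {out['quantization_type']!r}"
    )
    return out


def parse_group(name, group):
    """Check required fields of one group, then fill in optional ones."""
    params = dict(group.get("params", {}))  # copy so the input is not mutated
    for required in ("start_bits", "target_bits"):
        assert required in params, (
            f"{required} must be specified for weight quantization group {name}"
        )
    params.setdefault("quantization_period", DEFAULT_QUANTIZATION_PERIOD)
    return {"params": params, "modules": group.get("modules", ["*"])}


shared = parse_shared_parameters({"enabled": True})
parsed = parse_group("wq1", {"params": {"start_bits": 12, "target_bits": 8}})
print(shared)
print(parsed)
```

Failing fast with an assertion at parse time surfaces a misconfigured group before training starts, rather than partway through a compression schedule.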

This is AUTO_KEEP vendored code from DeepSpeed.

Code Reference

Field | Value
Repository | FlexLLMGen
File | benchmark/third_party/DeepSpeed/deepspeed/compression/config.py
Lines | 1-490

Key Functions:

def get_compression_config(param_dict): ...
def get_weight_quantization(param_dict): ...
def get_activation_quantization(param_dict): ...
def get_sparse_pruning(param_dict): ...
def get_row_pruning(param_dict): ...
def get_head_pruning(param_dict): ...
def get_channel_pruning(param_dict): ...
def get_layer_reduction(param_dict): ...
def get_quantize_enabled(param_dict): ...

I/O Contract

Inputs

Parameter | Type | Required | Description
param_dict | dict | Yes | Top-level DeepSpeed configuration dictionary; the compression_training key is optional

Outputs

Output | Type | Description
output | dict | Structured compression config with keys: weight_quantization, activation_quantization, sparse_pruning, row_pruning, head_pruning, channel_pruning, layer_reduction. Each quantization and pruning entry contains shared_parameters and different_groups sub-dicts.
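The contract above can be sketched as a top-level function that tolerates a missing compression_training key and always returns all seven entries. The names build_compression_config and parse_technique are hypothetical, and the stand-in getter treats every technique uniformly, which is a simplification of the per-technique getters listed under Key Functions.

```python
TECHNIQUES = (
    "weight_quantization",
    "activation_quantization",
    "sparse_pruning",
    "row_pruning",
    "head_pruning",
    "channel_pruning",
    "layer_reduction",
)


def parse_technique(section):
    """Hypothetical stand-in for a per-technique getter.

    Only the "enabled" default is modelled; the real getters apply many more.
    """
    shared = {"enabled": False}
    shared.update(section.get("shared_parameters", {}))
    return {
        "shared_parameters": shared,
        "different_groups": dict(section.get("different_groups", {})),
    }


def build_compression_config(param_dict):
    """Read the optional compression_training section and normalize it."""
    sub_param_dict = param_dict.get("compression_training", {})
    return {
        name: parse_technique(sub_param_dict.get(name, {}))
        for name in TECHNIQUES
    }


empty = build_compression_config({})
enabled = build_compression_config(
    {"compression_training": {"row_pruning": {"shared_parameters": {"enabled": True}}}}
)
print(sorted(empty))
print(enabled["row_pruning"]["shared_parameters"])
```

Returning every key even for an empty input lets callers index the output without first checking which techniques were configured.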

Configuration Structure

{
    "weight_quantization": {
        "shared_parameters": {
            "enabled": bool, "schedule_offset": int, "quantize_groups": int,
            "quantize_verbose": bool, "quantization_type": "symmetric"|"asymmetric",
            "quantize_weight_in_forward": bool, "rounding": "nearest"|"stochastic",
            "fp16_mixed_quantize": {"enabled": bool, "quantize_change_ratio": float}
        },
        "different_groups": {
            "group_name": {
                "params": {"start_bits": int, "target_bits": int, "quantization_period": int},
                "modules": [...], "related_modules": [...]
            }
        }
    },
    "sparse_pruning": { ... },
    "row_pruning": { ... },
    "head_pruning": { ... },
    "channel_pruning": { ... },
    "activation_quantization": { ... },
    "layer_reduction": { ... }
}
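As a concrete instance of the schema above, the sketch below loads a JSON config with one weight quantization group and walks its groups. All values (bit-widths, period, module names, the group name wq1) are illustrative assumptions, not recommended settings.

```python
import json

# Illustrative JSON config; every value here is an assumption for the example.
config_text = """
{
  "compression_training": {
    "weight_quantization": {
      "shared_parameters": {
        "enabled": true,
        "schedule_offset": 0,
        "quantize_groups": 1,
        "quantize_verbose": false,
        "quantization_type": "symmetric",
        "quantize_weight_in_forward": false,
        "rounding": "nearest",
        "fp16_mixed_quantize": {"enabled": false, "quantize_change_ratio": 0.001}
      },
      "different_groups": {
        "wq1": {
          "params": {"start_bits": 12, "target_bits": 8, "quantization_period": 50},
          "modules": ["attention.self", "intermediate"]
        }
      }
    }
  }
}
"""

param_dict = json.loads(config_text)
wq = param_dict["compression_training"]["weight_quantization"]

# Walk the per-group settings and report the bit-width schedule of each group.
for group_name, group in wq["different_groups"].items():
    bits = group["params"]
    print(
        f"{group_name}: {bits['start_bits']} -> {bits['target_bits']} bits "
        f"on modules {group['modules']}"
    )
```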
