Implementation:FMInference FlexLLMGen DeepSpeed Compression Config

Field	Value
Sources	Repo: FlexLLMGen
Domains	Model_Compression, Configuration_Management
Last Updated	2026-02-09 00:00 GMT

Overview

Vendored DeepSpeed compression configuration parser that reads the compression training section of a DeepSpeed JSON config and produces structured configuration dictionaries for weight quantization, activation quantization, sparse pruning, row pruning, head pruning, channel pruning, and layer reduction.

Description

config.py provides the configuration parsing layer for DeepSpeed's compression training feature. It reads a nested dictionary from the DeepSpeed JSON configuration (under the compression_training key) and normalizes it into a structured output with validated parameters and applied defaults.
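As a rough illustration of the lookup described above, the sketch below parses a JSON config string and pulls out the compression_training section, falling back to an empty dict when the key is absent. The helper name compression_section and the sample values are assumptions made for this example; they are not taken from config.py.

```python
import json


def compression_section(param_dict):
    """Return the compression_training sub-dict, or {} if it is absent.

    Hypothetical helper that only mirrors the lookup described in the text.
    """
    return param_dict.get("compression_training", {})


# Sample DeepSpeed-style JSON; the values are made up for this example.
raw = """
{
  "train_batch_size": 8,
  "compression_training": {
    "weight_quantization": {"shared_parameters": {"enabled": true}}
  }
}
"""

config = json.loads(raw)
section = compression_section(config)
print(sorted(section))                               # ['weight_quantization']
print(compression_section({"train_batch_size": 8}))  # {}
```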

Six of the seven supported techniques (the two quantization and four pruning techniques) are organized with the same two-level structure:

shared_parameters -- Global settings for the technique (enabled flag, method, schedule offset, quantization type).
different_groups -- Per-module-group settings that specify which model layers receive the compression and with what parameters (bit-widths, density ratios, module scopes).

Supported compression techniques:

Weight quantization -- Configures bit-width (start and target), quantization type (symmetric/asymmetric), rounding mode (nearest/stochastic), number of groups, FP16 mixed quantization, and schedule offset.
Activation quantization -- Configures bit-width, quantization type, range calibration mode (static/dynamic), and schedule offset.
Sparse pruning -- Configures density ratio and method (L1/TopK).
Row pruning -- Configures density ratio and method (L1/TopK).
Head pruning -- Configures density ratio, method (L1/TopK), and number of attention heads.
Channel pruning -- Configures density ratio and method (L1/TopK).
Layer reduction -- Configures an enabled flag plus the parameters that control which layers are kept.

Each getter function validates required fields (e.g., start_bits and target_bits must be specified for weight quantization groups), asserts enumeration constraints (e.g., quantization type must be 'symmetric' or 'asymmetric'), and applies default values from the constants module.
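The validate-then-default pattern described above can be sketched in a few lines. This is a standalone approximation, not the vendored code: the function names, the DEFAULTS table, and its values are placeholders chosen for illustration, and the real defaults live in the constants module and may differ.

```python
# Placeholder defaults for this sketch; the real values come from the
# vendored constants module and may differ.
DEFAULTS = {
    "enabled": False,
    "schedule_offset": 0,
    "quantization_type": "symmetric",
    "rounding": "nearest",
}
ALLOWED_TYPES = ("symmetric", "asymmetric")
ALLOWED_ROUNDING = ("nearest", "stochastic")


def parse_shared_parameters(section):
    """Merge user-supplied shared_parameters over defaults and check enums."""
    shared = {**DEFAULTS, **section.get("shared_parameters", {})}
    assert shared["quantization_type"] in ALLOWED_TYPES, (
        f"quantization_type must be one of {ALLOWED_TYPES}")
    assert shared["rounding"] in ALLOWED_ROUNDING, (
        f"rounding must be one of {ALLOWED_ROUNDING}")
    return shared


def parse_groups(section):
    """Require start_bits and target_bits in every group's params."""
    groups = {}
    for name, group in section.get("different_groups", {}).items():
        params = group.get("params", {})
        for key in ("start_bits", "target_bits"):
            assert key in params, f"{key} must be specified for group {name}"
        groups[name] = group
    return groups


# Hypothetical weight quantization section with one module group.
section = {
    "shared_parameters": {"enabled": True, "quantization_type": "asymmetric"},
    "different_groups": {
        "wq1": {"params": {"start_bits": 8, "target_bits": 4},
                "modules": ["attention.self"]},
    },
}
shared = parse_shared_parameters(section)
groups = parse_groups(section)
print(shared["rounding"])  # nearest (filled in from DEFAULTS)
```

A group that omits start_bits or target_bits fails fast with an AssertionError in this sketch, which matches the required-field behavior described above.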

This is AUTO_KEEP vendored code from DeepSpeed.

Code Reference

Field	Value
Repository	FlexLLMGen
File	benchmark/third_party/DeepSpeed/deepspeed/compression/config.py
Lines	1-490

Key Functions:

def get_compression_config(param_dict): ...
def get_weight_quantization(param_dict): ...
def get_activation_quantization(param_dict): ...
def get_sparse_pruning(param_dict): ...
def get_row_pruning(param_dict): ...
def get_head_pruning(param_dict): ...
def get_channel_pruning(param_dict): ...
def get_layer_reduction(param_dict): ...
def get_quantize_enabled(param_dict): ...

I/O Contract

Inputs

Parameter	Type	Required	Description
param_dict	dict	Yes	Top-level DeepSpeed configuration dictionary, as passed to get_compression_config; it may contain a compression_training key

Outputs

Output	Type	Description
output	dict	Structured compression config with keys: weight_quantization, activation_quantization, sparse_pruning, row_pruning, head_pruning, channel_pruning, layer_reduction. The six quantization and pruning entries each contain shared_parameters and different_groups sub-dicts; layer_reduction holds its enabled flag and parameters directly.
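One way a caller might consume this output is to list the techniques that are switched on. The sketch below is hypothetical: enabled_techniques is not a function in config.py, the parsed dict is a hand-written stand-in rather than real parser output, and the top-level enabled key on layer_reduction is an assumption that the helper tolerates either way.

```python
def enabled_techniques(output):
    """List technique names whose 'enabled' flag is truthy.

    Looks under shared_parameters when that sub-dict exists and falls back
    to a top-level 'enabled' key otherwise.
    """
    names = []
    for name, cfg in output.items():
        flags = cfg.get("shared_parameters", cfg)
        if flags.get("enabled", False):
            names.append(name)
    return names


# Hand-written stand-in for a parsed result; not produced by config.py.
parsed = {
    "weight_quantization": {"shared_parameters": {"enabled": True},
                            "different_groups": {}},
    "sparse_pruning": {"shared_parameters": {"enabled": False},
                       "different_groups": {}},
    "layer_reduction": {"enabled": True},  # assumed flat layout
}
print(enabled_techniques(parsed))  # ['weight_quantization', 'layer_reduction']
```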

Configuration Structure

{
    "weight_quantization": {
        "shared_parameters": {
            "enabled": bool, "schedule_offset": int, "quantize_groups": int,
            "quantize_verbose": bool, "quantization_type": "symmetric"|"asymmetric",
            "quantize_weight_in_forward": bool, "rounding": "nearest"|"stochastic",
            "fp16_mixed_quantize": {"enabled": bool, "quantize_change_ratio": float}
        },
        "different_groups": {
            "group_name": {
                "params": {"start_bits": int, "target_bits": int, "quantization_period": int},
                "modules": [...], "related_modules": [...]
            }
        }
    },
    "sparse_pruning": { ... },
    "row_pruning": { ... },
    "head_pruning": { ... },
    "channel_pruning": { ... },
    "activation_quantization": { ... },
    "layer_reduction": { ... }
}
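To make the placeholder types above concrete, here is a filled-in weight_quantization section written as a Python dict and nested under compression_training, as it would appear in a DeepSpeed JSON config. Every value, the group name wq1, and the module scope string are illustrative assumptions, and only a subset of the documented keys is shown.

```python
import json

# Illustrative values only; choose bit-widths and module scopes for your model.
ds_config = {
    "compression_training": {
        "weight_quantization": {
            "shared_parameters": {
                "enabled": True,
                "schedule_offset": 0,
                "quantize_groups": 1,
                "quantize_verbose": False,
                "quantization_type": "symmetric",
                "quantize_weight_in_forward": False,
                "rounding": "nearest",
                "fp16_mixed_quantize": {
                    "enabled": False,
                    "quantize_change_ratio": 0.001,
                },
            },
            "different_groups": {
                "wq1": {  # hypothetical group name
                    "params": {
                        "start_bits": 8,
                        "target_bits": 4,
                        "quantization_period": 50,
                    },
                    "modules": ["attention.self"],  # hypothetical scope
                }
            },
        }
    }
}

# The dict survives a JSON round trip, so it can be saved as a config file.
text = json.dumps(ds_config, indent=2)
assert json.loads(text) == ds_config

wq = ds_config["compression_training"]["weight_quantization"]
summary = [
    f"{name}: {g['params']['start_bits']} -> {g['params']['target_bits']} bits"
    for name, g in wq["different_groups"].items()
]
print(summary)  # ['wq1: 8 -> 4 bits']
```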

Related Pages

Principle:FMInference_FlexLLMGen_Model_Compression_Configuration
