Principle: LaurentMazare tch-rs Precision Control
| Knowledge Sources | |
|---|---|
| Domains | Deep_Learning, Memory_Optimization |
| Last Updated | 2026-02-08 14:00 GMT |
Overview
Technique for converting model parameters to reduced-precision floating-point formats to decrease memory usage and increase inference throughput.
Description
Large language models often use float16 or bfloat16 instead of float32 to reduce memory footprint by 50% while maintaining acceptable numerical accuracy. Precision control casts all VarStore variables to the target type and sets the default kind for new variables, ensuring consistent precision across the entire model.
Usage
Call it before loading LLM weights so the VarStore allocates parameters in the target precision. For LLM inference this is typically Kind::Half (float16) or Kind::BFloat16.
Theoretical Basis
Precision Formats:
Float (f32): 32 bits, ~7 decimal digits, 4 bytes/param
Half (f16): 16 bits, ~3.3 decimal digits, 2 bytes/param; max finite value is 65504, so overflow is a risk
BFloat16 (bf16): 16 bits, same exponent range as f32, ~2.4 decimal digits, 2 bytes/param
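The "same range as f32" property of bfloat16 follows from its layout: bf16 is the top 16 bits of an IEEE-754 f32 (sign, the full 8-bit exponent, and 7 mantissa bits). A dependency-free sketch of a truncating conversion (real implementations round to nearest, which this omits):

```rust
// bf16 <-> f32 by bit truncation/extension; the 8-bit exponent is
// preserved, so the representable range matches f32.
fn f32_to_bf16_bits(x: f32) -> u16 {
    (x.to_bits() >> 16) as u16
}

fn bf16_bits_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

fn main() {
    let x = 3.14159_f32;
    let y = bf16_bits_to_f32(f32_to_bf16_bits(x));
    // Only ~2.4 decimal digits survive: 3.14159 round-trips to 3.140625.
    println!("{x} -> {y}");
    assert!((x - y).abs() / x < 0.01);
}
```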
Memory Savings for LLaMA-7B (~7B parameters):
f32: ~28 GB
f16 or bf16: ~14 GB
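These figures are just parameter count times bytes per parameter; a quick check of the arithmetic:

```rust
fn main() {
    let params: u64 = 7_000_000_000; // ~7B parameters
    let f32_gb = (params * 4) as f64 / 1e9; // 4 bytes/param
    let f16_gb = (params * 2) as f64 / 1e9; // 2 bytes/param (f16 or bf16)
    assert_eq!(f32_gb, 28.0);
    assert_eq!(f16_gb, 14.0);
    println!("f32: {f32_gb} GB, f16/bf16: {f16_gb} GB");
}
```

Note this counts weights only; activations, KV cache, and optimizer state (if training) add to the total.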
VarStore::set_kind casts all existing variables to the target kind and makes it the default for variables created afterwards, so weights loaded into the store arrive in the reduced precision.