Implementation:Vllm project Vllm Scalar Type

Knowledge Sources	vllm
Domains	Quantization, Type_System, Core
Last Updated	2026-02-08 00:00 GMT

Overview

Defines the ScalarType class, which represents a wide range of floating-point and integer types, including sub-byte data types that torch.dtype does not support.

Description

The vllm::ScalarType class models a numeric type by its exponent bits, mantissa bits, signedness, bias, and NaN representation. Static factory methods (int_, uint, float_IEEE754, float_) construct the common formats. Because C++17 does not allow class types as non-type template parameters, the class also exposes a constexpr id() that encodes the type as a single integer, which can be used as a template argument and converted back with from_id(). The header also defines predefined scalar type constants covering standard types (Half, BFloat16), quantized integer types (int4, uint4, uint4b8, uint8b128), and low-precision floating-point formats (FP4 e2m1, FP6 e3m2, FP8 e4m3fn, FP8 e5m2).
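The id mechanism can be illustrated with a small stand-in. The sketch below is not vLLM's code: MiniType, BitsOf, check_round_trip, and the bit layout used by id() are all hypothetical, chosen only to show how a type description can round-trip through an integer and be used as a template argument in C++17.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for illustration only; this is not vllm::ScalarType.
struct MiniType {
  uint8_t exponent;
  uint8_t mantissa;
  bool is_signed;
  int32_t bias;

  using Id = int64_t;

  // Pack every field into one integer. This bit layout is an assumption
  // made for the sketch: bits 0-7 exponent, 8-15 mantissa, 16 sign flag,
  // 17-48 the bias reinterpreted as unsigned.
  constexpr Id id() const {
    return static_cast<Id>(exponent) |
           (static_cast<Id>(mantissa) << 8) |
           (static_cast<Id>(is_signed) << 16) |
           (static_cast<Id>(static_cast<uint32_t>(bias)) << 17);
  }

  // Inverse of id(): rebuild the description from the packed integer.
  static constexpr MiniType from_id(Id id) {
    return MiniType{
        static_cast<uint8_t>(id & 0xFF),
        static_cast<uint8_t>((id >> 8) & 0xFF),
        ((id >> 16) & 1) != 0,
        static_cast<int32_t>(static_cast<uint32_t>((id >> 17) & 0xFFFFFFFF))};
  }

  constexpr int64_t size_bits() const {
    return exponent + mantissa + (is_signed ? 1 : 0);
  }
};

// A template parameterized on the integer id rather than on the class,
// since C++17 does not accept class-type non-type template parameters.
template <MiniType::Id kId>
struct BitsOf {
  static constexpr int64_t value = MiniType::from_id(kId).size_bits();
};

constexpr MiniType kMiniU4B8{0, 4, false, 8};  // 4-bit unsigned, bias 8
constexpr MiniType kMiniFE4M3{4, 3, true, 0};  // 8-bit float, e4m3

static_assert(BitsOf<kMiniU4B8.id()>::value == 4, "uint4b8 is 4 bits wide");
static_assert(BitsOf<kMiniFE4M3.id()>::value == 8, "e4m3 is 8 bits wide");

// Runtime round-trip check for arbitrary descriptions.
inline void check_round_trip(const MiniType& t) {
  const MiniType back = MiniType::from_id(t.id());
  assert(back.exponent == t.exponent && back.mantissa == t.mantissa);
  assert(back.is_signed == t.is_signed && back.bias == t.bias);
}
```

Keying templates on the integer keeps the dispatch compile-time while staying within C++17; the real header may pack its fields differently.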

Usage

This header is included throughout the vLLM C++ codebase wherever quantized or custom numeric types are needed. It is the C++ counterpart of vllm/scalar_type.py and is used by quantization kernels, type dispatch logic, and model weight loading code.

Code Reference

Source Location

Repository: vllm
File: csrc/core/scalar_type.hpp
Lines: 1-352

Signature

namespace vllm {

class ScalarType {
public:
  enum NanRepr : uint8_t { NAN_NONE, NAN_IEEE_754, NAN_EXTD_RANGE_MAX_MIN };

  constexpr ScalarType(uint8_t exponent, uint8_t mantissa, bool signed_,
                       int32_t bias, bool finite_values_only = false,
                       NanRepr nan_repr = NAN_IEEE_754);

  static constexpr ScalarType int_(uint8_t size_bits, int32_t bias = 0);
  static constexpr ScalarType uint(uint8_t size_bits, int32_t bias = 0);
  static constexpr ScalarType float_IEEE754(uint8_t exponent, uint8_t mantissa);
  static constexpr ScalarType float_(uint8_t exponent, uint8_t mantissa,
                                     bool finite_values_only, NanRepr nan_repr);

  using Id = int64_t;  // integer encoding of a ScalarType
  constexpr Id id() const;
  static constexpr ScalarType from_id(Id id);

  constexpr int64_t size_bits() const;
  constexpr bool is_signed() const;
  constexpr bool is_integer() const;
  constexpr bool is_floating_point() const;
  constexpr bool is_ieee_754() const;
  constexpr std::variant<int64_t, double> max() const;
  constexpr std::variant<int64_t, double> min() const;
  std::string str() const;
};

} // namespace vllm
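Since max() and min() return std::variant<int64_t, double>, callers need to unwrap the result. The sketch below shows one way to do that with standard library calls only. The Limit alias and all function names here are hypothetical; the two example functions stand in for the results of max() on a signed 4-bit integer (7) and on FP8 e4m3fn (448).

```cpp
#include <cassert>
#include <cstdint>
#include <variant>

// Same shape as the return type of max() and min() in the signature.
using Limit = std::variant<int64_t, double>;

// Hypothetical stand-ins for max() on an integer type and a float type.
inline Limit int4_max_example() { return int64_t{7}; }
inline Limit fp8_e4m3fn_max_example() { return 448.0; }

// True when the limit came from an integer type.
inline bool limit_is_integer(const Limit& v) {
  return std::holds_alternative<int64_t>(v);
}

// Collapse either alternative to double, e.g. for clamping float data.
inline double limit_as_double(const Limit& v) {
  return std::visit([](auto x) { return static_cast<double>(x); }, v);
}

// Extract the integer alternative; the caller must know the type is integral.
inline int64_t limit_as_int(const Limit& v) {
  assert(std::holds_alternative<int64_t>(v));
  return std::get<int64_t>(v);
}
```

With the real class, the same helpers would take the result of a call such as vllm::kInt4.max() in place of the stand-in functions.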

Import

#include "core/scalar_type.hpp"

I/O Contract

Inputs

Name	Type	Required	Description
exponent	`uint8_t`	Yes	Number of exponent bits (0 for integer types)
mantissa	`uint8_t`	Yes	Number of mantissa bits (or integer bits excluding sign)
signed_	`bool`	Yes	Whether the type supports negative numbers
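bias	`int32_t`	No	Offset added before storage (stored = value + bias, so value = stored - bias); used for quantized types. Required by the constructor; defaults to 0 in int_ and uint
finite_values_only	`bool`	No	If true, the type has no +/-inf values. Defaults to false
nan_repr	`NanRepr`	No	How NaNs are encoded (NAN_NONE, NAN_IEEE_754, NAN_EXTD_RANGE_MAX_MIN). Defaults to NAN_IEEE_754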
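The bias relation in the table (value = stored - bias) can be worked through for the two biased types named on this page, uint4b8 and uint8b128. This is a minimal sketch; decode_biased and encode_biased are hypothetical helper names, not functions from the header.

```cpp
#include <cassert>
#include <cstdint>

// value = stored - bias, as stated in the Inputs table.
constexpr int32_t decode_biased(uint32_t stored, int32_t bias) {
  return static_cast<int32_t>(stored) - bias;
}

// Inverse direction: stored = value + bias, checked against the bit width.
inline uint32_t encode_biased(int32_t value, int32_t bias, uint32_t size_bits) {
  const int64_t stored = static_cast<int64_t>(value) + bias;
  assert(stored >= 0 && stored < (int64_t{1} << size_bits));
  return static_cast<uint32_t>(stored);
}

// uint4b8: stored codes 0..15 represent values -8..7.
static_assert(decode_biased(0, 8) == -8, "lowest uint4b8 code");
static_assert(decode_biased(8, 8) == 0, "uint4b8 zero point");
static_assert(decode_biased(15, 8) == 7, "highest uint4b8 code");

// uint8b128: stored codes 0..255 represent values -128..127.
static_assert(decode_biased(0, 128) == -128, "lowest uint8b128 code");
static_assert(decode_biased(255, 128) == 127, "highest uint8b128 code");
```

A biased unsigned type therefore covers the same value range as the signed type of equal width, but stores every value as a non-negative code.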

Outputs

Name	Type	Description
ScalarType instance	`ScalarType`	Immutable object representing the numeric type with query methods for size, range, and properties
id	`int64_t`	Unique compile-time identifier for template specialization
str	`std::string`	Human-readable string representation following ml_dtypes naming conventions
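The naming pattern behind strings such as "uint4b8" and "float8_e4m3fn" can be approximated as follows. This is a sketch inferred from the names that appear on this page, not a copy of str(): both function names are hypothetical, a single sign bit is assumed for floats, and the trailing "f" and "n" are assumed to mark finite-only values and a non-IEEE NaN encoding respectively.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Integer names: "int" or "uint", the bit width, then "b<bias>" if biased.
inline std::string int_type_name(bool is_signed, int size_bits, int32_t bias) {
  assert(size_bits > 0);
  std::string s = (is_signed ? "int" : "uint") + std::to_string(size_bits);
  if (bias != 0) s += "b" + std::to_string(bias);
  return s;
}

// Float names: "float<bits>_e<exponent>m<mantissa>" plus optional flags.
// Assumes one sign bit, so bits = 1 + exponent + mantissa.
inline std::string float_type_name(int exponent, int mantissa,
                                   bool finite_values_only,
                                   bool non_ieee_nan) {
  std::string s = "float" + std::to_string(1 + exponent + mantissa) +
                  "_e" + std::to_string(exponent) +
                  "m" + std::to_string(mantissa);
  if (finite_values_only) s += "f";  // no +/-inf encodings
  if (non_ieee_nan) s += "n";        // NaN encoded outside the IEEE 754 scheme
  return s;
}
```

For example, int_type_name(false, 4, 8) yields "uint4b8" and float_type_name(4, 3, true, true) yields "float8_e4m3fn", matching the names used on this page.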

Usage Examples

#include "core/scalar_type.hpp"

// Use predefined types
auto fp8_type = vllm::kFloat8_e4m3fn;
auto int4_type = vllm::kInt4;

// Query type properties
int64_t bits = fp8_type.size_bits();        // 8 (size_bits() returns int64_t)
bool is_fp = fp8_type.is_floating_point();  // true
std::string name = fp8_type.str();          // "float8_e4m3fn"

// Create custom quantized integer type
auto custom_uint = vllm::ScalarType::uint(4, 8);  // uint4 with bias=8

// Use id for template specialization
constexpr auto type_id = vllm::kFloat8_e4m3fn.id();

Related Pages

Environment:Vllm_project_Vllm_CUDA_GPU_Runtime
