
Implementation:Rapidsai Cuml SMO Solver

From Leeroopedia


Knowledge Sources
Domains Machine_Learning, Support_Vector_Machines
Last Updated 2026-02-08 12:00 GMT

Overview

A GPU-accelerated Sequential Minimal Optimization (SMO) solver that trains Support Vector Machine (SVM) models using a two-level decomposition of the dual quadratic program.

Description

smosolver.h implements the core SVM training algorithm using a hierarchical decomposition strategy. The general approach follows Osuna's decomposition method, where a subset of training examples is selected at each iteration and the quadratic programming (QP) sub-problem is solved for that subset.

The implementation uses two levels of decomposition:

  • Outer level (q1=1024) -- Selects a working set of up to 1024 training vectors and solves the QP sub-problem (QP1). This is implemented in the Solve method.
  • Inner level (q2=2, SMO) -- Within each working set, pairs of variables are optimized using the classic SMO algorithm via SmoBlockSolve.
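
The two-level structure can be illustrated with a small CPU sketch. This is not cuML's GPU implementation: the function name select_working_set is hypothetical, and the sort-based half-low/half-high split is a simplification of the most-violating-sample selection the real solver performs on the device.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Hypothetical sketch of outer-level working-set selection: rank all samples
// by their optimality indicator f and keep the q1 most extreme entries,
// half from the low end and half from the high end.
std::vector<int> select_working_set(const std::vector<double>& f, int q1)
{
  std::vector<int> order(f.size());
  std::iota(order.begin(), order.end(), 0);
  std::sort(order.begin(), order.end(),
            [&](int a, int b) { return f[a] < f[b]; });

  std::vector<int> ws;
  int half = q1 / 2;
  for (int i = 0; i < half && i < (int)order.size(); ++i)
    ws.push_back(order[i]);                            // smallest f values
  for (int i = 0; i < half && half + i < (int)order.size(); ++i)
    ws.push_back(order[(int)order.size() - 1 - i]);    // largest f values
  return ws;
}
```

Inside each such working set, the inner SMO level then repeatedly optimizes one pair of dual variables at a time until the sub-problem converges.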

The SmoSolver class template supports both SVM Classification (SVC) and epsilon-SVR regression. Key methods include:

  • Solve -- The main entry point that iterates outer and inner loops until convergence. It manages the working set, kernel cache, and convergence tracking. Output includes dual coefficients, support vectors, their indices, and the bias term.
  • Initialize -- Sets up the optimization problem, zero-initializing dual coefficients and computing the initial optimality indicator vector f.
  • SvcInit -- Initializes the f vector for classification: f_i = -y_i (since alpha starts at zero).
  • SvrInit -- Initializes for epsilon-SVR by doubling the problem size (an alpha+ and an alpha- variable per training sample) and setting f_i = yc_i * epsilon - yr_i, where yc_i = ±1 is the synthesized classification label and yr_i is the regression target.
  • UpdateF -- Updates the optimality indicator vector after each block solve step using the kernel matrix tile and change in dual coefficients.
  • CheckStoppingCondition -- Monitors convergence by tracking the optimality gap, detecting stalls (unchanged diff over consecutive steps), and warning about non-monotonic convergence.
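
Of these steps, UpdateF is the easiest to write down. A minimal CPU sketch, assuming the kernel tile is stored as tile[i][j] = K(x_i, x_{ws_j}) for working-set column j (function and variable names are illustrative, not cuML's):

```cpp
#include <vector>

// After a block solve changes the working-set duals by delta_alpha, every
// entry of the optimality indicator f absorbs the corresponding kernel
// column: f_i += sum_j delta_alpha_j * K(x_i, x_{ws_j}).
void update_f(std::vector<double>& f,
              const std::vector<double>& delta_alpha,
              const std::vector<std::vector<double>>& tile)
{
  for (std::size_t i = 0; i < f.size(); ++i)
    for (std::size_t j = 0; j < delta_alpha.size(); ++j)
      f[i] += delta_alpha[j] * tile[i][j];
}
```

On the GPU this update amounts to a matrix-vector product over the cached kernel tile.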

The solver uses RMM device buffers for all GPU memory management and supports both precomputed kernel matrices and on-the-fly kernel evaluation via the cuVS GramMatrix interface.
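
The two kernel sources can be thought of as one accessor with two paths. A hedged sketch (kernel_entry is a hypothetical helper, not part of the cuML or cuVS API):

```cpp
#include <functional>
#include <vector>

// Hypothetical dispatcher over the two kernel sources the solver supports:
// a precomputed Gram matrix that is simply indexed, or on-the-fly evaluation
// through a kernel functor (standing in for the cuVS GramMatrix interface).
double kernel_entry(int i, int j,
                    const std::vector<std::vector<double>>* precomputed,
                    const std::function<double(int, int)>& evaluate)
{
  if (precomputed != nullptr) return (*precomputed)[i][j];  // precomputed path
  return evaluate(i, j);                                    // on-the-fly path
}
```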

References:

  • Joachims, T. (1998). Making Large-Scale SVM Learning Practical.
  • Vanek, J., Michalek, J., Psutka, J. (2017). A GPU-Architecture Optimized Hierarchical Decomposition Algorithm for Support Vector Machine Training.
  • Wen, Z., Shi, J., Li, Q., He, B., Chen, J. (2018). ThunderSVM: A Fast SVM Library on GPUs and CPUs.

Usage

This class is used internally by the cuML SVM implementation. It is not typically called directly by users; instead, use the Python cuml.svm.SVC or cuml.svm.SVR classes.

Code Reference

Source Location

Signature

namespace ML {
namespace SVM {

template <typename math_t>
class SmoSolver {
 public:
  SmoSolver(const raft::handle_t& handle,
            SvmParameter param,
            cuvs::distance::kernels::KernelType kernel_type,
            cuvs::distance::kernels::GramMatrixBase<math_t>* kernel,
            bool is_precomputed = false);

  template <typename MatrixViewType>
  void Solve(MatrixViewType matrix, int n_rows, int n_cols,
             math_t* y, const math_t* sample_weight,
             math_t** dual_coefs, int* n_support,
             SupportStorage<math_t>* support_matrix,
             int** idx, math_t* b,
             int max_iter = -1, int max_outer_iter = -1,
             int max_inner_iter = 10000);

  void Initialize(math_t** y, const math_t* sample_weight, int n_rows, int n_cols);
  void SvcInit(const math_t* y);
  void SvrInit(const math_t* yr, int n_rows, math_t* yc, math_t* f);
  void UpdateF(math_t* f, int n_rows, const math_t* delta_alpha,
               int n_ws, const math_t* cacheTile);
  int GetNIter();
};

} // namespace SVM
} // namespace ML

Import

#include "smosolver.h"
// Dependencies:
#include <cuml/svm/svm_model.h>
#include <raft/core/handle.hpp>
#include <cuvs/distance/grammian.hpp>

I/O Contract

Inputs

Name Type Required Description
handle raft::handle_t Yes RAFT handle with CUDA stream and cuBLAS context
param SvmParameter Yes SVM parameters including C, tol, epsilon, cache_size, svmType, verbosity
kernel_type KernelType Yes Type of kernel function (RBF, linear, polynomial, etc.)
kernel GramMatrixBase<math_t>* Yes Kernel function evaluator
matrix MatrixViewType Yes Training data matrix [n_rows x n_cols]
y math_t* Yes Labels (+/-1 for SVC, target values for SVR), size [n_rows]
sample_weight math_t* No Per-sample weights (nullptr for uniform)
max_outer_iter int No Maximum outer iterations (default: 100 * n_rows)
max_inner_iter int No Maximum inner SMO iterations per block (default: 10000)

Outputs

Name Type Description
dual_coefs math_t** Device array of dual coefficients, size [n_support]
n_support int* Number of support vectors found
support_matrix SupportStorage<math_t>* Support vectors in matrix format [n_support x n_cols]
idx int** Original training set indices of support vectors, size [n_support]
b math_t* Bias term for the decision function
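
Taken together, these outputs define the decision function. A sketch assuming, as is conventional for SVM solvers, that dual_coefs already carry the y_i * alpha_i sign, with an RBF kernel chosen for concreteness (helper names are illustrative):

```cpp
#include <cmath>
#include <vector>

// RBF kernel: K(a, b) = exp(-gamma * ||a - b||^2).
double rbf(const std::vector<double>& a, const std::vector<double>& b,
           double gamma)
{
  double d2 = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
    d2 += (a[i] - b[i]) * (a[i] - b[i]);
  return std::exp(-gamma * d2);
}

// Decision value built from the solver outputs:
//   f(x) = sum_k dual_coefs[k] * K(sv_k, x) + b
double decision(const std::vector<std::vector<double>>& support_vectors,
                const std::vector<double>& dual_coefs, double b,
                const std::vector<double>& x, double gamma)
{
  double s = b;
  for (std::size_t k = 0; k < support_vectors.size(); ++k)
    s += dual_coefs[k] * rbf(support_vectors[k], x, gamma);
  return s;
}
```

For SVC the predicted class is the sign of this value; for SVR it is the regression estimate itself.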

Usage Examples

// Internal usage within cuML SVM implementation
raft::handle_t handle;
SvmParameter param;
param.C = 1.0;
param.tol = 1e-3;
param.svmType = C_SVC;

auto* kernel = new cuvs::distance::kernels::RBFKernel<float>(gamma);

SmoSolver<float> solver(handle, param,
                        cuvs::distance::kernels::KernelType::RBF, kernel);

float* dual_coefs = nullptr;
int n_support = 0;
SupportStorage<float> support_matrix;
int* idx = nullptr;
float b;

solver.Solve(matrix_view, n_rows, n_cols, y, nullptr,
             &dual_coefs, &n_support, &support_matrix, &idx, &b);
