Implementation:Rapidsai Cuml SVM WorkingSet

Knowledge Sources	Rapidsai_Cuml
Domains	Machine_Learning, Support_Vector_Machines
Last Updated	2026-02-08 12:00 GMT

Overview

The working set selection module for the SMO-based SVM solver, responsible for choosing which training vectors to optimize at each outer iteration.

Description

workingset.h implements the WorkingSet class template, which manages the subset of training vectors selected for optimization at each outer iteration of the SMO solver. By default the working set holds min(1024, n_train) elements. This is the size of the sub-problem solved at the outer decomposition level.
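A minimal sketch of that default. It assumes, as the I/O contract states, that passing n_ws = 0 selects min(1024, n_train). The function name ws_size_sketch is hypothetical. Clamping an explicit n_ws to n_train is also an assumption, not a confirmed detail of SetSize.

```cpp
#include <algorithm>

// Hypothetical helper: working set size for n_train training vectors.
// n_ws == 0 requests the default, min(1024, n_train).
// Assumption: an explicit n_ws is only clamped to n_train here.
int ws_size_sketch(int n_train, int n_ws = 0) {
  if (n_ws <= 0) return std::min(1024, n_train);
  return std::min(n_ws, n_train);
}
```

Under this rule, a problem with at most 1024 training vectors gets a default working set that covers the whole training set. Only larger problems are actually split into sub-problems.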

The class implements two selection strategies:

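SimpleSelect -- Follows Joachims' (1998) strategy. Of the n slots to be filled, n/2 go to elements of the upper set with the largest values of the optimality indicator f, and the remaining n/2 go to elements of the lower set with the smallest f. It is used for the initial selection and to fill the slots left open after retention. When n_already_selected slots are already occupied, only n = n_ws - n_already_selected new elements are chosen.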

Select (with retention) -- To prevent training vectors from oscillating in and out of the working set, this method retains half of the previous working set and fills only the other half with new elements. Two retention policies are supported:

FIFO (default, tested) -- Keeps the newer half of the previous working set, following the ThunderSVM approach (Wen et al., 2018).
Priority-based -- Keeps elements based on how long they have been in the working set, preferring newer elements. Follows Serafini & Zanni's gradient-projection decomposition approach.

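PrioritySelect -- Implements the priority-based policy. It sorts the previous working set by priority in ascending order and keeps up to nc of its elements. Free vectors (0 < alpha < C) are taken first, followed by vectors at the lower or upper bound (alpha = 0 or alpha = C).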
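The two strategies can be illustrated with a host-side C++ sketch that contains no GPU code. It makes three assumptions:

* The caller has already computed boolean masks for the upper and lower sets.
* The previous working set is stored oldest entry first.
* The direction of selection (largest f from the upper set, smallest f from the lower set) follows the description on this page.

The function names are hypothetical. A local std::vector<bool> plays the role of the available buffer, so a vector that belongs to both sets cannot be picked twice.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical sketch of SimpleSelect (Joachims-style selection).
// Assumption: in_upper / in_lower masks are computed by the caller.
// Entries already in `ws` are kept and never selected twice.
std::vector<int> simple_select_sketch(const std::vector<float>& f,
                                      const std::vector<bool>& in_upper,
                                      const std::vector<bool>& in_lower,
                                      std::size_t n_ws,
                                      std::vector<int> ws = {}) {
  // Indices sorted by ascending f (the role of f_idx / f_idx_sorted).
  std::vector<int> order(f.size());
  std::iota(order.begin(), order.end(), 0);
  std::stable_sort(order.begin(), order.end(),
                   [&](int a, int b) { return f[a] < f[b]; });

  // available[i] turns false once i is in the working set.
  std::vector<bool> available(f.size(), true);
  for (int i : ws) available[i] = false;

  // Half of the free slots: upper-set vectors, largest f first.
  std::size_t upper_quota = n_ws > ws.size() ? (n_ws - ws.size()) / 2 : 0;
  for (auto it = order.rbegin(); it != order.rend() && upper_quota > 0; ++it) {
    if (in_upper[*it] && available[*it]) {
      ws.push_back(*it);
      available[*it] = false;
      --upper_quota;
    }
  }
  // Remaining slots: lower-set vectors, smallest f first.
  for (int i : order) {
    if (ws.size() >= n_ws) break;
    if (in_lower[i] && available[i]) {
      ws.push_back(i);
      available[i] = false;
    }
  }
  return ws;
}

// Hypothetical sketch of Select with FIFO retention.
// Assumption: prev_ws is ordered oldest first, so its second half is newer.
std::vector<int> fifo_select_sketch(const std::vector<int>& prev_ws,
                                    const std::vector<float>& f,
                                    const std::vector<bool>& in_upper,
                                    const std::vector<bool>& in_lower,
                                    std::size_t n_ws) {
  if (prev_ws.empty()) {  // first iteration: plain SimpleSelect
    return simple_select_sketch(f, in_upper, in_lower, n_ws);
  }
  // Keep the newer half, then fill the other half with new elements.
  const auto keep = static_cast<std::ptrdiff_t>(prev_ws.size() / 2);
  std::vector<int> kept(prev_ws.end() - keep, prev_ws.end());
  return simple_select_sketch(f, in_upper, in_lower, n_ws, kept);
}
```

Newly selected elements are appended after the retained ones. As a result, the next call to fifo_select_sketch keeps exactly the elements added in this call, which gives the FIFO behaviour: each newly added vector survives one further outer iteration before it has to be re-selected.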

The class manages several GPU buffers for sorting, selection, and priority tracking:

idx -- Current working set indices
f_idx / f_idx_sorted -- Index arrays for sorting by f values
available -- Flag vector marking vectors available for selection
ws_priority -- Priority scores for retention decisions
ws_idx_save -- Saved working set for retention across iterations
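The ws_idx_save and ws_priority buffers hold the inputs of the priority-based retention path. The host-side sketch below mimics PrioritySelect as described above. It sorts the saved working set by ascending priority, then keeps up to nc elements, taking free vectors first and bound vectors second. It assumes that priority[k] belongs to prev_ws[k] and that a vector counts as free when 0 < alpha < C. The function name is hypothetical, and how the priorities are updated between iterations is not shown.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical host-side sketch of priority-based retention.
// prev_ws:  saved working set (the role of ws_idx_save)
// priority: priority of each prev_ws entry (the role of ws_priority)
// alpha, C: indexed by training-vector index
// Returns at most nc retained indices: free vectors first, then bound ones.
std::vector<int> priority_select_sketch(const std::vector<int>& prev_ws,
                                        const std::vector<int>& priority,
                                        const std::vector<float>& alpha,
                                        const std::vector<float>& C,
                                        std::size_t nc) {
  // Positions of the old working set, sorted by ascending priority.
  std::vector<std::size_t> pos(prev_ws.size());
  std::iota(pos.begin(), pos.end(), std::size_t{0});
  std::stable_sort(pos.begin(), pos.end(), [&](std::size_t a, std::size_t b) {
    return priority[a] < priority[b];
  });

  // Assumption: "free" means strictly between the bounds 0 and C.
  auto is_free = [&](int i) { return alpha[i] > 0.0f && alpha[i] < C[i]; };

  std::vector<int> kept;
  for (bool want_free : {true, false}) {  // pass 1: free, pass 2: at a bound
    for (std::size_t p : pos) {
      if (kept.size() >= nc) return kept;
      const int i = prev_ws[p];
      if (is_free(i) == want_free) kept.push_back(i);
    }
  }
  return kept;
}
```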

For epsilon-SVR, the number of training vectors is doubled (alpha+ and alpha- for each sample), which the class handles by setting n_train = n_rows * 2.
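A sketch of the index bookkeeping this doubling implies. The layout (alpha+ entries first, then alpha- entries) and the +1/-1 labels of the two halves are assumptions for illustration. SvmTypeSketch is a hypothetical stand-in for SvmType, and the helper names are also hypothetical.

```cpp
// Hypothetical stand-in for the SvmType values named on this page.
enum class SvmTypeSketch { C_SVC, EPSILON_SVR };

// Number of training vectors handled by the working set: doubled for
// epsilon-SVR, where each sample contributes an alpha+ and an alpha- entry.
int n_train_sketch(int n_rows, SvmTypeSketch type) {
  return type == SvmTypeSketch::EPSILON_SVR ? 2 * n_rows : n_rows;
}

// Assumed layout: entries [0, n_rows) hold alpha+, entries [n_rows, 2 * n_rows)
// hold alpha-, so training vector k refers to sample k % n_rows.
int sample_of_sketch(int k, int n_rows) { return k % n_rows; }

// Assumed labels of the doubled problem: +1 for the alpha+ half, -1 for the
// alpha- half.
float label_of_sketch(int k, int n_rows) { return k < n_rows ? 1.0f : -1.0f; }
```

Under the assumed layout, a working set index such as k = 150 with n_rows = 100 refers to the alpha- coefficient of sample 50. Kernel rows can therefore still be looked up in the original n_rows-row data matrix.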

Usage

This class is used internally by SmoSolver::Solve and is not called directly by users. It is instantiated once per solver invocation and its Select method is called at each outer iteration.

Code Reference

Source Location

Repository: Rapidsai_Cuml
File: cpp/src/svm/workingset.h

Signature

namespace ML {
namespace SVM {

template <typename math_t>
class WorkingSet {
 public:
  bool FIFO_strategy = true;

  WorkingSet(const raft::handle_t& handle, cudaStream_t stream,
             int n_rows = 0, int n_ws = 0, SvmType svmType = C_SVC);

  void SetSize(int n_train, int n_ws = 0);
  int GetSize();
  int* GetIndices();

  void Select(math_t* f, math_t* alpha, math_t* y, const math_t* C);

  void SimpleSelect(math_t* f, math_t* alpha, math_t* y, const math_t* C,
                    int n_already_selected = 0);

  int PrioritySelect(math_t* alpha, const math_t* C, int nc);
};

} // namespace SVM
} // namespace ML

Import

#include "workingset.h"
// Dependencies:
#include <cuml/svm/svm_parameter.h>
#include <raft/core/handle.hpp>

I/O Contract

Inputs

Name	Type	Required	Description
handle	raft::handle_t	Yes	RAFT handle for GPU operations
stream	cudaStream_t	Yes	CUDA stream for working set operations
n_rows	int	Yes	Number of original training vectors
n_ws	int	No	Working set size (default: min(1024, n_train))
svmType	SvmType	No	SVM type: C_SVC or EPSILON_SVR (default: C_SVC)
f	math_t*	Yes (for Select)	Optimality indicator vector, size [n_train]
alpha	math_t*	Yes (for Select)	Dual coefficients, size [n_train]
y	math_t*	Yes (for Select)	Class labels (+/-1), size [n_train]
C	const math_t*	Yes (for Select)	Penalty parameter vector, size [n_train]

Outputs

Name	Type	Description
Working set indices	int* (via GetIndices())	Device array of selected training vector indices, size [n_ws]
Working set size	int (via GetSize())	Number of elements in the working set

Usage Examples

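// Internal usage within SmoSolver::Solve (WorkingSet is not a public API).
// n_rows and the device buffers f_ptr, alpha_ptr, y_ptr and C_vec_ptr, each of
// length n_train (n_rows, or 2 * n_rows for EPSILON_SVR), are owned and
// updated by the solver.
raft::handle_t handle;
cudaStream_t stream = handle.get_stream();

// Created once per Solve call; requests a working set of up to 1024 elements.
WorkingSet<float> ws(handle, stream, n_rows, 1024, C_SVC);

// At each outer iteration:
ws.Select(f_ptr, alpha_ptr, y_ptr, C_vec_ptr);

// Device array of selected indices, used to compute the kernel cache tile
int* ws_indices = ws.GetIndices();
int ws_size = ws.GetSize();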

Related Pages

Environment:Rapidsai_Cuml_CUDA_GPU
