

Implementation:Rapidsai Cuml SVM WorkingSet

From Leeroopedia


Knowledge Sources
Domains Machine_Learning, Support_Vector_Machines
Last Updated 2026-02-08 12:00 GMT

Overview

The working set selection module for the SMO-based SVM solver, responsible for choosing which training vectors to optimize at each outer iteration.

Description

workingset.h implements the WorkingSet class template that manages the subset of training vectors selected for optimization at each outer iteration of the SMO solver. By default, the working set contains up to 1024 elements, which is the sub-problem size for the outer decomposition level.

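The class implements two selection strategies, SimpleSelect and Select, plus a PrioritySelect helper that supports the priority-based retention policy: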

SimpleSelect -- Follows Joachims' strategy (1998) of selecting the top n/2 elements from the upper set (where the optimality indicator f is largest) and the bottom n/2 from the lower set (where f is smallest). This is used for initial selection and to fill remaining slots.
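The upper and lower sets are not defined on this page. As a minimal sketch, assuming the usual SMO convention (a vector is in the upper set when y*alpha can still increase, and in the lower set when it can still decrease), membership could be tested as follows. The helper names `in_upper` and `in_lower` are illustrative and are not taken from workingset.h.

```cpp
// Hypothetical host-side helpers; names are illustrative, not the WorkingSet API.
// Assumed convention: "upper" means y*alpha can still increase,
// "lower" means y*alpha can still decrease, with 0 <= alpha <= C.
template <typename math_t>
bool in_upper(math_t alpha, math_t y, math_t C) {
  return (y > 0 && alpha < C) || (y < 0 && alpha > 0);
}

template <typename math_t>
bool in_lower(math_t alpha, math_t y, math_t C) {
  return (y > 0 && alpha > 0) || (y < 0 && alpha < C);
}
```

Under this convention a free vector (0 < alpha < C) belongs to both sets, while a vector at a bound belongs to exactly one of them.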

Select (with retention) -- To prevent training vectors from oscillating in and out of the working set, this method retains half of the previous working set and fills only the other half with new elements. Two retention policies are supported:

  • FIFO (default, tested) -- Keeps the newer half of the previous working set, following the ThunderSVM approach (Wen et al., 2018).
  • Priority-based -- Keeps elements based on how long they have been in the working set, preferring newer elements. Follows Serafini & Zanni's gradient-projection decomposition approach.
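The FIFO retention idea can be illustrated on the host with plain vectors. This is only a sketch under stated assumptions, not the GPU code in workingset.h: the function name `fifo_select` is hypothetical, the previous working set is assumed to be stored oldest-first (so its second half is the newer half), and `candidates` stands in for whatever ranked list the selection pass would produce.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical host-side sketch of FIFO retention (not the cuML device code).
// Assumption: prev_ws is ordered oldest-first, so its second half is "newer".
// candidates: training-vector indices in order of preference.
std::vector<int> fifo_select(const std::vector<int>& prev_ws,
                             const std::vector<int>& candidates) {
  const std::size_t n_ws = prev_ws.size();
  // Keep the newer half of the previous working set.
  std::vector<int> ws(prev_ws.begin() + n_ws / 2, prev_ws.end());
  // Fill the remaining slots with candidates that are not already selected.
  for (int c : candidates) {
    if (ws.size() == n_ws) break;
    if (std::find(ws.begin(), ws.end(), c) == ws.end()) ws.push_back(c);
  }
  return ws;
}
```

Because the retained elements end up at the front of the new working set, they become the older half on the next call and are the first to be dropped, which is what gives the policy its first-in, first-out behaviour.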

PrioritySelect -- Sorts the previous working set by priority (ascending) and selects elements from free vectors first, then from lower/upper bound vectors.
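A host-side sketch of the ordering described above may help. It is not the device implementation: the name `priority_select`, the `n_keep` parameter, and the choice to visit lower-bound vectors (alpha at 0) before upper-bound vectors (alpha at C) are all assumptions made for illustration, and the exact meaning of the `nc` argument in the real signature is not documented on this page.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical host-side sketch; not the cuML device implementation.
// prev_ws[k] is a training-vector index and priority[k] its priority score.
// alpha and C are indexed by training-vector index.
std::vector<int> priority_select(const std::vector<int>& prev_ws,
                                 const std::vector<int>& priority,
                                 const std::vector<float>& alpha,
                                 const std::vector<float>& C,
                                 std::size_t n_keep) {
  // Order the positions of the previous working set by ascending priority.
  std::vector<std::size_t> order(prev_ws.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) { return priority[a] < priority[b]; });

  std::vector<int> kept;
  // Pass 0: free vectors (0 < alpha < C). Pass 1: alpha at 0. Pass 2: alpha at C.
  // Assumption: lower-bound vectors are visited before upper-bound vectors.
  for (int pass = 0; pass < 3; ++pass) {
    for (std::size_t k : order) {
      if (kept.size() == n_keep) return kept;
      const int i = prev_ws[k];
      const bool is_free = alpha[i] > 0 && alpha[i] < C[i];
      const bool at_zero = alpha[i] <= 0;
      const int category = is_free ? 0 : (at_zero ? 1 : 2);
      if (category == pass) kept.push_back(i);
    }
  }
  return kept;
}
```

Free vectors are taken first because their dual coefficients can still move in either direction, so keeping them in the working set is the most likely to let the sub-problem make progress.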

The class manages several GPU buffers for sorting, selection, and priority tracking:

  • idx -- Current working set indices
  • f_idx / f_idx_sorted -- Index arrays for sorting by f values
  • available -- Flag vector marking vectors available for selection
  • ws_priority -- Priority scores for retention decisions
  • ws_idx_save -- Saved working set for retention across iterations

For epsilon-SVR, the number of training vectors is doubled (alpha+ and alpha- for each sample), which the class handles by setting n_train = n_rows * 2.
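The size bookkeeping can be sketched in a few lines. The `sketch` namespace, its stand-in `SvmType` enum (only the two values named on this page), and both helper functions are hypothetical; the default of min(1024, n_train) comes from the I/O contract on this page, while clamping an explicitly requested n_ws to n_train is an assumption.

```cpp
#include <algorithm>

namespace sketch {

// Stand-in for the SvmType enum from cuml/svm/svm_parameter.h so that the
// example compiles on its own; only the values named on this page are listed.
enum SvmType { C_SVC, EPSILON_SVR };

// Number of dual variables the working set selects from.
// epsilon-SVR has alpha+ and alpha- for every sample, hence the factor of 2.
inline int n_train_for(SvmType type, int n_rows) {
  return type == EPSILON_SVR ? 2 * n_rows : n_rows;
}

// Documented default: n_ws = min(1024, n_train) when n_ws is left at 0.
// Assumption: an explicitly requested n_ws is clamped to n_train as well.
inline int working_set_size(int n_train, int n_ws = 0) {
  if (n_ws == 0) n_ws = 1024;
  return std::min(n_ws, n_train);
}

}  // namespace sketch
```

For example, an epsilon-SVR problem with 300 rows gives n_train = 600, and with the default n_ws the working set would cover all 600 dual variables.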

Usage

This class is used internally by SmoSolver::Solve and is not called directly by users. It is instantiated once per solver invocation and its Select method is called at each outer iteration.

Code Reference

Source Location

Signature

namespace ML {
namespace SVM {

template <typename math_t>
class WorkingSet {
 public:
  bool FIFO_strategy = true;

  WorkingSet(const raft::handle_t& handle, cudaStream_t stream,
             int n_rows = 0, int n_ws = 0, SvmType svmType = C_SVC);

  void SetSize(int n_train, int n_ws = 0);
  int GetSize();
  int* GetIndices();

  void Select(math_t* f, math_t* alpha, math_t* y, const math_t* C);

  void SimpleSelect(math_t* f, math_t* alpha, math_t* y, const math_t* C,
                    int n_already_selected = 0);

  int PrioritySelect(math_t* alpha, const math_t* C, int nc);
};

} // namespace SVM
} // namespace ML

Import

#include "workingset.h"
// Dependencies:
#include <cuml/svm/svm_parameter.h>
#include <raft/core/handle.hpp>

I/O Contract

Inputs

Name | Type | Required | Description
handle | raft::handle_t | Yes | RAFT handle for GPU operations
stream | cudaStream_t | Yes | CUDA stream for working set operations
n_rows | int | Yes | Number of original training vectors
n_ws | int | No | Working set size (default: min(1024, n_train))
svmType | SvmType | No | SVM type: C_SVC or EPSILON_SVR (default: C_SVC)
f | math_t* | Yes (for Select) | Optimality indicator vector, size [n_train]
alpha | math_t* | Yes (for Select) | Dual coefficients, size [n_train]
y | math_t* | Yes (for Select) | Class labels (+/-1), size [n_train]
C | const math_t* | Yes (for Select) | Penalty parameter vector, size [n_train]

Outputs

Name | Type | Description
Working set indices | int* (via GetIndices()) | Device array of selected training vector indices, size [n_ws]
Working set size | int (via GetSize()) | Number of elements in the working set

Usage Examples

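// Internal usage within SmoSolver::Solve (inside namespace ML::SVM)
raft::handle_t handle;
cudaStream_t stream = handle.get_stream();

// n_rows: number of original training vectors; 1024 is the requested working set size
WorkingSet<float> ws(handle, stream, n_rows, 1024, C_SVC);

// At each outer iteration:
// f_ptr, alpha_ptr, y_ptr and C_vec_ptr are arrays of size n_train (see I/O Contract)
ws.Select(f_ptr, alpha_ptr, y_ptr, C_vec_ptr);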

// Get indices for the kernel cache tile
int* ws_indices = ws.GetIndices();
int ws_size = ws.GetSize();

Related Pages
