Implementation:Rapidsai Cuml SVM WorkingSet
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Support_Vector_Machines |
| Last Updated | 2026-02-08 12:00 GMT |
Overview
This module selects the working set for the SMO-based SVM solver: it chooses which training vectors to optimize at each outer iteration.
Description
workingset.h implements the WorkingSet class template that manages the subset of training vectors selected for optimization at each outer iteration of the SMO solver. By default, the working set contains up to 1024 elements, which is the sub-problem size for the outer decomposition level.
The class implements two selection strategies, SimpleSelect and Select, plus the PrioritySelect helper used for priority-based retention:
SimpleSelect -- Follows Joachims' strategy (1998) of selecting the top n/2 elements from the upper set (where the optimality indicator f is largest) and the bottom n/2 from the lower set (where f is smallest). This is used for initial selection and to fill remaining slots.
Select (with retention) -- To prevent training vectors from oscillating in and out of the working set, this method retains half of the previous working set and fills only the other half with new elements. Two retention policies are supported:
- FIFO (default, tested) -- Keeps the newer half of the previous working set, following the ThunderSVM approach (Wen et al., 2018).
- Priority-based -- Keeps elements based on how long they have been in the working set, preferring newer elements. Follows Serafini & Zanni's gradient-projection decomposition approach.
PrioritySelect -- Sorts the previous working set by priority (ascending) and selects elements from free vectors first, then from lower/upper bound vectors.
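The SimpleSelect rule described above can be illustrated with a small host-side sketch. This is not the library's GPU code: `simple_select_sketch`, `in_upper`, and `in_lower` are hypothetical names, and the upper/lower set tests use the textbook SMO definitions, which this page does not spell out. The sketch follows the wording above (largest f from the upper set, smallest f from the lower set); the direction depends on the sign convention chosen for f.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Assumed textbook SMO set membership tests (not taken from this page).
inline bool in_upper(double a, double y, double C) {
  return (y > 0 && a < C) || (y < 0 && a > 0);
}
inline bool in_lower(double a, double y, double C) {
  return (y > 0 && a > 0) || (y < 0 && a < C);
}

// Hypothetical CPU analogue of SimpleSelect: take up to n_ws/2 upper-set
// indices with the largest f, then up to n_ws/2 lower-set indices with the
// smallest f. No index is selected twice.
std::vector<int> simple_select_sketch(const std::vector<double>& f,
                                      const std::vector<double>& alpha,
                                      const std::vector<double>& y,
                                      const std::vector<double>& C,
                                      int n_ws) {
  assert(alpha.size() == f.size() && y.size() == f.size() &&
         C.size() == f.size());
  const int n = static_cast<int>(f.size());

  // Indices sorted by ascending f (stable, so ties keep index order).
  std::vector<int> order(n);
  std::iota(order.begin(), order.end(), 0);
  std::stable_sort(order.begin(), order.end(),
                   [&](int l, int r) { return f[l] < f[r]; });

  std::vector<char> available(n, 1);  // 1 = not yet in the working set
  std::vector<int> ws;
  const int half = n_ws / 2;

  // Walk from the largest f downwards, taking upper-set members.
  int n_up = 0;
  for (int k = n - 1; k >= 0 && n_up < half; --k) {
    const int i = order[k];
    if (available[i] && in_upper(alpha[i], y[i], C[i])) {
      ws.push_back(i);
      available[i] = 0;
      ++n_up;
    }
  }
  // Walk from the smallest f upwards, taking lower-set members.
  int n_low = 0;
  for (int k = 0; k < n && n_low < half; ++k) {
    const int i = order[k];
    if (available[i] && in_lower(alpha[i], y[i], C[i])) {
      ws.push_back(i);
      available[i] = 0;
      ++n_low;
    }
  }
  return ws;
}
```

The result may hold fewer than n_ws indices when either set runs out of candidates; how the real class fills such gaps is not covered by this sketch.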
The class manages several GPU buffers for sorting, selection, and priority tracking:
- idx -- Current working set indices
- f_idx / f_idx_sorted -- Index arrays for sorting by f values
- available -- Flag vector marking vectors available for selection
- ws_priority -- Priority scores for retention decisions
- ws_idx_save -- Saved working set for retention across iterations
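To show how a saved working set and an availability flag vector could cooperate during FIFO retention, here is a minimal host-side sketch. The function name `fifo_select_sketch` and its parameters are hypothetical. It assumes new entries are always appended at the back of the working set, so the back half is the "newer" half, and that evicted entries stay eligible for re-selection; neither detail is confirmed by this page.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical FIFO retention step.
//   prev_ws    : previous working set (plays the role of ws_idx_save)
//   candidates : indices in order of preference, e.g. from a
//                SimpleSelect-like ranking
//   n_train    : total number of training vectors
std::vector<int> fifo_select_sketch(const std::vector<int>& prev_ws,
                                    const std::vector<int>& candidates,
                                    int n_train) {
  const std::size_t n_ws = prev_ws.size();
  std::vector<char> available(n_train, 1);  // 1 = may still be selected
  std::vector<int> ws;
  ws.reserve(n_ws);

  // Retain the back (assumed newer) half of the previous working set.
  for (std::size_t k = n_ws / 2; k < n_ws; ++k) {
    const int i = prev_ws[k];
    assert(i >= 0 && i < n_train);
    ws.push_back(i);
    available[i] = 0;
  }
  // Fill the remaining slots with the best candidates that are not
  // already retained. The evicted front half remains eligible here.
  for (int i : candidates) {
    if (ws.size() == n_ws) break;
    assert(i >= 0 && i < n_train);
    if (available[i]) {
      ws.push_back(i);
      available[i] = 0;
    }
  }
  return ws;
}
```

Marking retained indices as unavailable before the fill pass is what keeps an index from appearing twice in the new working set.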
For epsilon-SVR, the number of training vectors is doubled (alpha+ and alpha- for each sample), which the class handles by setting n_train = n_rows * 2.
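The sizing rules can be written out as a short sketch. `SvmTypeSketch`, `n_train_for`, `default_ws_size`, and `row_of` are hypothetical names introduced for illustration. The index layout in `row_of` (alpha+ entries first, alpha- entries second) is an assumption, since this page only states that the count is doubled.

```cpp
#include <algorithm>
#include <cassert>

// Local stand-in for the library's SvmType enum, so the sketch compiles
// on its own.
enum class SvmTypeSketch { C_SVC, EPSILON_SVR };

// Number of dual variables the working set selects from:
// doubled for epsilon-SVR (alpha+ and alpha- per sample).
inline int n_train_for(SvmTypeSketch type, int n_rows) {
  return type == SvmTypeSketch::EPSILON_SVR ? 2 * n_rows : n_rows;
}

// Default working set size as documented on this page: min(1024, n_train).
inline int default_ws_size(int n_train) { return std::min(1024, n_train); }

// Assumed layout: entries [0, n_rows) hold alpha+ and [n_rows, 2 * n_rows)
// hold alpha-, so both halves map back to the same data row.
inline int row_of(int train_idx, int n_rows) {
  assert(n_rows > 0 && train_idx >= 0 && train_idx < 2 * n_rows);
  return train_idx % n_rows;
}
```

For example, 600 rows give n_train = 1200 under EPSILON_SVR, which is capped to a working set of 1024, whereas the same data under C_SVC yields a working set of 600.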
Usage
This class is used internally by SmoSolver::Solve and is not called directly by users. It is instantiated once per solver invocation and its Select method is called at each outer iteration.
Code Reference
Source Location
- Repository: Rapidsai_Cuml
- File: cpp/src/svm/workingset.h
Signature
namespace ML {
namespace SVM {
template <typename math_t>
class WorkingSet {
 public:
  bool FIFO_strategy = true;
  WorkingSet(const raft::handle_t& handle, cudaStream_t stream,
             int n_rows = 0, int n_ws = 0, SvmType svmType = C_SVC);
  void SetSize(int n_train, int n_ws = 0);
  int GetSize();
  int* GetIndices();
  void Select(math_t* f, math_t* alpha, math_t* y, const math_t* C);
  void SimpleSelect(math_t* f, math_t* alpha, math_t* y, const math_t* C,
                    int n_already_selected = 0);
  int PrioritySelect(math_t* alpha, const math_t* C, int nc);
};
}  // namespace SVM
}  // namespace ML
Import
#include "workingset.h"
// Dependencies:
#include <cuml/svm/svm_parameter.h>
#include <raft/core/handle.hpp>
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| handle | raft::handle_t | Yes | RAFT handle for GPU operations |
| stream | cudaStream_t | Yes | CUDA stream for working set operations |
| n_rows | int | Yes | Number of original training vectors |
| n_ws | int | No | Working set size (default: min(1024, n_train)) |
| svmType | SvmType | No | SVM type: C_SVC or EPSILON_SVR (default: C_SVC) |
| f | math_t* | Yes (for Select) | Optimality indicator vector, size [n_train] |
| alpha | math_t* | Yes (for Select) | Dual coefficients, size [n_train] |
| y | math_t* | Yes (for Select) | Class labels (+/-1), size [n_train] |
| C | const math_t* | Yes (for Select) | Penalty parameter vector, size [n_train] |
Outputs
| Name | Type | Description |
|---|---|---|
| Working set indices | int* (via GetIndices()) | Device array of selected training vector indices, size [n_ws] |
| Working set size | int (via GetSize()) | Number of elements in the working set |
Usage Examples
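// Internal usage within SmoSolver::Solve (inside namespace ML::SVM).
// n_rows, f_ptr, alpha_ptr, y_ptr and C_vec_ptr are placeholders for values
// owned by the solver; the pointers refer to device arrays of size n_train.
raft::handle_t handle;
cudaStream_t stream = handle.get_stream();
// Construct once per solver invocation
WorkingSet<float> ws(handle, stream, n_rows, 1024, C_SVC);
// At each outer iteration:
ws.Select(f_ptr, alpha_ptr, y_ptr, C_vec_ptr);
// Get indices for the kernel cache tile
int* ws_indices = ws.GetIndices();  // device pointer, size ws.GetSize()
int ws_size = ws.GetSize();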