Implementation:Rapidsai Cuml Kernel SHAP

Knowledge Sources	Rapidsai_Cuml
Domains	Machine_Learning, Explainability
Last Updated	2026-02-08 12:00 GMT

Overview

Generates, on the GPU, the perturbation (sample) dataset for the Kernel SHAP (SHapley Additive exPlanations) algorithm, enabling model-agnostic feature importance explanations.

Description

The ML::Explainer::kernel_dataset function constructs the combinatorial dataset required by the Kernel SHAP algorithm. It takes a binary mask matrix X, a background dataset, and a single observation row. A 1 in X takes the feature value from the observation; a 0 takes it from the background. The result is a "scattered" dataset in which every mask row is combined with every background row, so the output has nrows_X * nrows_background rows.
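The scattering rule can be illustrated with a small CPU reference. This is a minimal sketch, not the cuML implementation: scatter_reference is a hypothetical helper name, the data lives in host vectors rather than device memory, and the mask convention (1 = observation, 0 = background) and output row order (mask row i, background row j at row i * nrows_background + j) are assumed as described on this page.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference (not part of cuML) for the exact part of the
// Kernel SHAP dataset. All matrices are row-major.
std::vector<float> scatter_reference(const std::vector<float>& X,
                                     std::size_t nrows_X,
                                     std::size_t ncols,
                                     const std::vector<float>& background,
                                     std::size_t nrows_background,
                                     const std::vector<float>& observation)
{
  assert(X.size() == nrows_X * ncols);
  assert(background.size() == nrows_background * ncols);
  assert(observation.size() == ncols);

  std::vector<float> dataset(nrows_X * nrows_background * ncols);
  for (std::size_t i = 0; i < nrows_X; ++i) {             // one mask row
    for (std::size_t j = 0; j < nrows_background; ++j) {  // one background row
      const std::size_t out_row = i * nrows_background + j;
      for (std::size_t c = 0; c < ncols; ++c) {
        // Mask value 1 selects the observation, 0 selects the background.
        const bool from_observation = X[i * ncols + c] != 0.0f;
        dataset[out_row * ncols + c] =
          from_observation ? observation[c] : background[j * ncols + c];
      }
    }
  }
  return dataset;
}
```

For example, with background = [[0, 1, 2], [3, 4, 5]], observation = [100, 101, 102], and X = [[1, 0, 1], [0, 1, 1]], the sketch returns [[100, 1, 102], [100, 4, 102], [0, 101, 102], [3, 101, 102]].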

The function handles both the exact part of the Kernel SHAP dataset (where the caller fully specifies the mask rows) and the sampled part (where the function fills in the mask rows by randomly selecting features). For each sampled row, the matching entry of the nsamples array gives the number k of features to select at random, and maxsample is the largest of those values.
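The sampled part can be pictured as building one mask row with exactly k ones at random positions. The following is a hedged CPU sketch of that idea only: sample_mask_row is a hypothetical helper, and it uses std::mt19937_64 with std::shuffle, which is an assumption made for illustration. The GPU kernel uses its own random number generator and sampling scheme, so the rows produced here will not match cuML's output for the same seed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical CPU helper (not part of cuML): returns a mask row of length
// ncols containing exactly k ones, placed at randomly chosen positions.
std::vector<float> sample_mask_row(std::size_t ncols, std::size_t k, std::uint64_t seed)
{
  assert(k <= ncols);

  // Shuffle the feature indices and keep the first k of them.
  std::vector<std::size_t> idx(ncols);
  std::iota(idx.begin(), idx.end(), std::size_t{0});
  std::mt19937_64 rng(seed);
  std::shuffle(idx.begin(), idx.end(), rng);

  std::vector<float> row(ncols, 0.0f);
  for (std::size_t i = 0; i < k; ++i) {
    row[idx[i]] = 1.0f;  // this feature is taken from the observation
  }
  return row;
}
```

Calling sample_mask_row(4, 3, 42) yields a row of four entries with three ones and one zero; which position holds the zero depends on the generator.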

Each thread block in the GPU kernel handles one row of the mask X: it scatters the observation's feature values into the corresponding nrows_background rows of the output dataset, according to that mask row.

Usage

Use this function as part of a Kernel SHAP pipeline to generate the perturbation dataset on the GPU. After generating the dataset, pass it through the model to obtain predictions, then compute SHAP values from those predictions. Generating the dataset on the GPU speeds up what is typically an expensive sampling step in Kernel SHAP.
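One common step after prediction is to average the model output over the background rows that belong to each mask row, giving one expected prediction per feature coalition. The sketch below assumes the output row order used on this page (mask row i, background row j at row i * nrows_background + j). The helper name mean_prediction_per_mask is hypothetical, and the later weighted regression that turns these averages into SHAP values is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of cuML): averages predictions over the
// background rows of each mask row. predictions holds one value per dataset
// row, ordered as i * nrows_background + j.
std::vector<float> mean_prediction_per_mask(const std::vector<float>& predictions,
                                            std::size_t nrows_X,
                                            std::size_t nrows_background)
{
  assert(nrows_background > 0);
  assert(predictions.size() == nrows_X * nrows_background);

  std::vector<float> means(nrows_X, 0.0f);
  for (std::size_t i = 0; i < nrows_X; ++i) {
    float sum = 0.0f;
    for (std::size_t j = 0; j < nrows_background; ++j) {
      sum += predictions[i * nrows_background + j];
    }
    means[i] = sum / static_cast<float>(nrows_background);
  }
  return means;
}
```

With predictions = [1, 3, 10, 20], nrows_X = 2, and nrows_background = 2, the result is [2, 15].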

Code Reference

Source Location

Repository: Rapidsai_Cuml
File: cpp/include/cuml/explainer/kernel_shap.hpp

Signature

namespace ML {
namespace Explainer {

void kernel_dataset(const raft::handle_t& handle,
                    float* X,
                    int nrows_X,
                    int ncols,
                    float* background,
                    int nrows_background,
                    float* dataset,
                    float* observation,
                    int* nsamples,
                    int len_nsamples,
                    int maxsample,
                    uint64_t seed = 0ULL);

}  // namespace Explainer
}  // namespace ML

Import

#include <cuml/explainer/kernel_shap.hpp>

I/O Contract

Inputs

Name	Type	Required	Description
handle	const raft::handle_t&	Yes	cuML handle for GPU resource management
X	float*	Yes (inout)	Binary mask matrix on device [nrows_X x ncols], row-major; modified in-place for sampled rows
nrows_X	int	Yes	Number of rows in X (number of mask combinations)
ncols	int	Yes	Number of columns (features) shared by X, background, and dataset
background	float*	Yes	Background dataset on device [nrows_background x ncols]
nrows_background	int	Yes	Number of rows in the background dataset
observation	float*	Yes	The observation row to explain on device [ncols]
nsamples	int*	Yes	Array specifying number of features to randomly sample per mask row [len_nsamples]
len_nsamples	int	Yes	Number of entries in the nsamples array
maxsample	int	Yes	Largest entry of nsamples (the maximum number of features sampled for any row)
seed	uint64_t	No (default 0)	Seed for the random number generator

Outputs

Name	Type	Description
dataset	float*	Device pointer to the generated Kernel SHAP dataset [(nrows_X * nrows_background) x ncols], row-major; row i * nrows_background + j combines mask row i with background row j
X	float*	Modified binary mask matrix (updated in-place for sampled rows)
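The shapes in the tables above translate directly into buffer sizes. The following sketch computes element counts in std::size_t so that the product nrows_X * nrows_background * ncols does not overflow int for large inputs. The struct and function names are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper types (not part of cuML): element counts for each
// buffer passed to kernel_dataset, following the shapes in the I/O contract.
struct KernelDatasetSizes {
  std::size_t X;            // nrows_X * ncols
  std::size_t background;   // nrows_background * ncols
  std::size_t dataset;      // nrows_X * nrows_background * ncols
  std::size_t observation;  // ncols
  std::size_t nsamples;     // len_nsamples
};

KernelDatasetSizes kernel_dataset_sizes(int nrows_X,
                                        int ncols,
                                        int nrows_background,
                                        int len_nsamples)
{
  assert(nrows_X >= 0 && ncols >= 0 && nrows_background >= 0 && len_nsamples >= 0);

  // Widen before multiplying so the products are computed in size_t.
  const std::size_t rx = static_cast<std::size_t>(nrows_X);
  const std::size_t nc = static_cast<std::size_t>(ncols);
  const std::size_t nb = static_cast<std::size_t>(nrows_background);
  const std::size_t ln = static_cast<std::size_t>(len_nsamples);

  return KernelDatasetSizes{rx * nc, nb * nc, rx * nb * nc, nc, ln};
}
```

Multiply each count by sizeof(float) (or sizeof(int) for nsamples) to get the byte size for a device allocation.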

Usage Examples

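#include <cuml/explainer/kernel_shap.hpp>
#include <raft/core/handle.hpp>

#include <cuda_runtime.h>

#include <cstddef>
#include <vector>

void run_kernel_shap() {
    raft::handle_t handle;

    const int ncols = 4;
    const int nrows_X = 2;
    const int nrows_background = 2;

    // Element counts, computed in size_t to avoid int overflow on large inputs
    const std::size_t n_mask = static_cast<std::size_t>(nrows_X) * ncols;
    const std::size_t n_background = static_cast<std::size_t>(nrows_background) * ncols;
    const std::size_t n_dataset = n_mask * nrows_background;

    // Illustrative host data (values chosen for this example only).
    // len_nsamples == nrows_X below, so every mask row is a sampled row.
    std::vector<float> h_X(n_mask, 0.0f);
    std::vector<float> h_background = {0.0f, 1.0f, 2.0f, 3.0f, 5.0f, 6.0f, 7.0f, 8.0f};
    std::vector<float> h_observation = {100.0f, 101.0f, 102.0f, 103.0f};
    std::vector<int> h_nsamples = {3, 2};  // features to sample per mask row

    // Allocate device memory
    float* X;            // binary mask [nrows_X x ncols]
    float* background;   // background data [nrows_background x ncols]
    float* dataset;      // output [(nrows_X * nrows_background) x ncols]
    float* observation;  // single observation [ncols]
    int* nsamples;       // number of features to sample per row

    cudaMalloc(&X, n_mask * sizeof(float));
    cudaMalloc(&background, n_background * sizeof(float));
    cudaMalloc(&dataset, n_dataset * sizeof(float));
    cudaMalloc(&observation, ncols * sizeof(float));
    cudaMalloc(&nsamples, nrows_X * sizeof(int));

    // Copy the inputs to the device
    cudaMemcpy(X, h_X.data(), n_mask * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(background, h_background.data(), n_background * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(observation, h_observation.data(), ncols * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(nsamples, h_nsamples.data(), nrows_X * sizeof(int), cudaMemcpyHostToDevice);

    ML::Explainer::kernel_dataset(handle,
                                  X, nrows_X, ncols,
                                  background, nrows_background,
                                  dataset, observation,
                                  nsamples, nrows_X,  // len_nsamples
                                  3,                  // maxsample: largest entry of nsamples
                                  42ULL);             // seed

    // Wait for the kernels launched on the handle's stream to finish
    handle.sync_stream();

    // Pass dataset through model to get predictions, then compute SHAP values...

    cudaFree(X);
    cudaFree(background);
    cudaFree(dataset);
    cudaFree(observation);
    cudaFree(nsamples);
}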

Related Pages

Environment:Rapidsai_Cuml_CUDA_GPU
