Implementation:Rapidsai Cuml Kernel SHAP

From Leeroopedia


Knowledge Sources
Domains: Machine_Learning, Explainability
Last Updated: 2026-02-08 12:00 GMT

Overview

Generates, on the GPU, the sample dataset required by the Kernel SHAP (SHapley Additive exPlanations) algorithm, which produces model-agnostic feature importance explanations.

Description

The ML::Explainer::kernel_dataset function constructs the combinatorial dataset required by the Kernel SHAP algorithm. It takes a binary mask matrix X, a background dataset, and an observation row. Each mask entry selects the source of one feature value: a 1 takes the value from the observation and a 0 takes it from the background. The result is a "scattered" dataset in which every row combines observation and background feature values according to one mask row.

The function handles both the exact part of the Kernel SHAP dataset (where the caller fully specifies the mask rows) and the sampled part (where the function fills each mask row by randomly selecting k entries). The nsamples array gives k for each sampled row, that is, how many features are randomly selected for it, and maxsample is the largest value in nsamples.
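To make the sampled part concrete, here is a minimal host-side sketch of filling one mask row with k randomly chosen entries. The helper name sample_mask_row is hypothetical, and std::mt19937_64 merely stands in for the random number generator; the cuML GPU kernel uses its own generator, so the same seed will not reproduce the same mask.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical host-side helper, not part of cuML.
// Returns a mask row of length ncols with exactly k entries set to 1
// (k is clamped to the range [0, ncols]).
std::vector<float> sample_mask_row(int ncols, int k, std::uint64_t seed)
{
  // Start from the column indices 0..ncols-1 and shuffle them.
  std::vector<int> cols(ncols);
  std::iota(cols.begin(), cols.end(), 0);
  std::mt19937_64 rng(seed);
  std::shuffle(cols.begin(), cols.end(), rng);

  // The first k shuffled indices are the features taken from the observation.
  const int n_selected = std::max(0, std::min(k, ncols));
  std::vector<float> row(ncols, 0.0f);
  for (int j = 0; j < n_selected; ++j) {
    row[cols[j]] = 1.0f;
  }
  return row;
}
```

Shuffling the column indices guarantees k distinct columns without a retry loop, which keeps the sketch simple at the cost of O(ncols) work per row.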

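Each thread block in the GPU kernel processes one row of the mask X: it scatters the observation's values into nrows_background rows of the output dataset, one for each background row, taking each feature from the observation or the background as that mask row dictates.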
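The scatter step can be illustrated with a small CPU reference. This is a sketch under stated assumptions, not the cuML implementation: the function name scatter_reference is hypothetical, a mask value of 1 is assumed to select the observation, and the output rows for mask row i are assumed to occupy rows i * nrows_background through (i + 1) * nrows_background - 1 of the dataset.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, not part of cuML. All matrices are row-major.
// For each mask row i and background row b, writes one output row whose
// column j comes from the observation if X[i][j] != 0, else from background[b][j].
std::vector<float> scatter_reference(const std::vector<float>& X, int nrows_X, int ncols,
                                     const std::vector<float>& background, int nrows_background,
                                     const std::vector<float>& observation)
{
  std::vector<float> dataset(static_cast<std::size_t>(nrows_X) * nrows_background * ncols);
  for (int i = 0; i < nrows_X; ++i) {
    for (int b = 0; b < nrows_background; ++b) {
      // Assumed layout: rows for mask row i are stored contiguously.
      const std::size_t out_row = static_cast<std::size_t>(i) * nrows_background + b;
      for (int j = 0; j < ncols; ++j) {
        const bool from_observation = X[static_cast<std::size_t>(i) * ncols + j] != 0.0f;
        dataset[out_row * ncols + j] =
          from_observation ? observation[j] : background[static_cast<std::size_t>(b) * ncols + j];
      }
    }
  }
  return dataset;
}
```

With background rows {0, 1, 2} and {3, 4, 5}, observation {100, 101, 102}, and mask rows {1, 0, 1} and {0, 1, 1}, this sketch yields the four rows {100, 1, 102}, {100, 4, 102}, {0, 101, 102}, and {3, 101, 102}.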

Usage

Use this function as part of a Kernel SHAP pipeline to generate the perturbation dataset on the GPU. After generating the dataset, pass it through the model to obtain predictions, then compute SHAP values from the prediction differences. Building the dataset on the GPU speeds up what is typically an expensive sampling step in Kernel SHAP.
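As one illustration of the step that follows prediction, Kernel SHAP conventionally averages the model output over the background rows belonging to each mask row before fitting SHAP values. The sketch below shows only that averaging on the host. The helper mean_prediction_per_mask is hypothetical, and it assumes the predictions are ordered like the dataset rows, with the nrows_background rows for each mask row stored contiguously.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of cuML.
// predictions holds one model output per dataset row
// (nrows_X * nrows_background values). Returns one averaged
// prediction per mask row.
std::vector<float> mean_prediction_per_mask(const std::vector<float>& predictions,
                                            int nrows_X,
                                            int nrows_background)
{
  std::vector<float> means(nrows_X, 0.0f);
  for (int i = 0; i < nrows_X; ++i) {
    float sum = 0.0f;
    for (int b = 0; b < nrows_background; ++b) {
      // Assumed layout: predictions for mask row i are contiguous.
      sum += predictions[static_cast<std::size_t>(i) * nrows_background + b];
    }
    means[i] = sum / static_cast<float>(nrows_background);
  }
  return means;
}
```

The averaged values, together with the mask rows in X, would then feed whatever solver the pipeline uses to compute the SHAP values; that solver is outside the scope of kernel_dataset.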

Code Reference

Source Location

  • Repository: Rapidsai_Cuml
  • File: cpp/include/cuml/explainer/kernel_shap.hpp

Signature

namespace ML {
namespace Explainer {

void kernel_dataset(const raft::handle_t& handle,
                    float* X,
                    int nrows_X,
                    int ncols,
                    float* background,
                    int nrows_background,
                    float* dataset,
                    float* observation,
                    int* nsamples,
                    int len_nsamples,
                    int maxsample,
                    uint64_t seed = 0ULL);

}  // namespace Explainer
}  // namespace ML

Import

#include <cuml/explainer/kernel_shap.hpp>

I/O Contract

Inputs

Name Type Required Description
handle const raft::handle_t& Yes cuML handle for GPU resource management
X float* Yes (inout) Binary mask matrix on device [nrows_X x ncols], row-major; modified in-place for sampled rows
nrows_X int Yes Number of rows in X (number of mask combinations)
ncols int Yes Number of columns (features) shared by X, background, and dataset
background float* Yes Background dataset on device [nrows_background x ncols]
nrows_background int Yes Number of rows in the background dataset
observation float* Yes The observation row to explain on device [ncols]
nsamples int* Yes Array specifying number of features to randomly sample per mask row [len_nsamples]
len_nsamples int Yes Number of entries in the nsamples array
maxsample int Yes Size of the largest sample in nsamples
seed uint64_t No (default 0) Seed for the random number generator

Outputs

Name Type Description
dataset float* Device pointer to the generated Kernel SHAP dataset [(nrows_X * nrows_background) x ncols], row-major; the caller must allocate it before the call
X float* Modified binary mask matrix (updated in-place for sampled rows)

Usage Examples

#include <cuda_runtime.h>  // cudaMalloc, cudaFree

#include <cuml/explainer/kernel_shap.hpp>
#include <raft/core/handle.hpp>

void run_kernel_shap() {
    raft::handle_t handle;

    int ncols = 4;
    int nrows_X = 2;
    int nrows_background = 2;

    // Allocate and initialize device memory
    float* X;            // binary mask [nrows_X x ncols]
    float* background;   // background data [nrows_background x ncols]
    float* dataset;      // output [nrows_X * nrows_background x ncols]
    float* observation;  // single observation [ncols]
    int* nsamples;       // number of features to sample per row

    cudaMalloc(&X, nrows_X * ncols * sizeof(float));
    cudaMalloc(&background, nrows_background * ncols * sizeof(float));
    cudaMalloc(&dataset, nrows_X * nrows_background * ncols * sizeof(float));
    cudaMalloc(&observation, ncols * sizeof(float));
    cudaMalloc(&nsamples, nrows_X * sizeof(int));

    // Initialize X, background, observation, nsamples on device...

    ML::Explainer::kernel_dataset(handle,
                                  X, nrows_X, ncols,
                                  background, nrows_background,
                                  dataset, observation,
                                  nsamples, nrows_X,  // len_nsamples: here every mask row is sampled
                                  3,       // maxsample: largest value stored in nsamples
                                  42ULL);  // seed

    handle.sync_stream();

    // Pass dataset through model to get predictions, then compute SHAP values...

    cudaFree(X);
    cudaFree(background);
    cudaFree(dataset);
    cudaFree(observation);
    cudaFree(nsamples);
}
