Implementation:Interpretml Interpret CutWinsorized


Knowledge Sources
Domains: Machine_Learning, EBM_Core
Last Updated: 2026-02-07 12:00 GMT

Overview

CutWinsorized is a C++ routine that generates winsorized cut points for feature discretization: it trims outliers from both tails of the data distribution and then places evenly spaced cuts across the remaining range.

Description

The CutWinsorized function implements a winsorized binning strategy where extreme values are trimmed and cut points are placed within the resulting range. The algorithm works by:

  1. Copying and sorting the input feature values, dropping missing values (NaN) and replacing positive and negative infinity with the largest and lowest finite floating-point values.
  2. Determining the outer boundary values by stepping inward from each sorted extreme by (cSamples - 1) / cBins samples, an offset that shrinks as the number of bins grows.
  3. Finding transition points near the center of the data when only one cut is requested.
  4. For multiple cuts, finding the inner transition values and placing evenly spaced cuts between them.
  5. Using the ArithmeticMean helper to compute midpoints between transition values.
  6. Using FloatTickIncrement to ensure the upper boundary slightly exceeds the last observed value for proper lower-bound-inclusive bin assignment.
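
The steps above can be sketched in a few lines of Python. This is a simplified illustration, not the libebm code: the function name winsorized_cuts_sketch is hypothetical, the transition-value search of steps 3 and 4 is reduced to reading the sorted array directly, and it assumes the first and last cuts sit on the trimmed bounds with the remaining cuts spaced evenly between them.

```python
import math
import sys


def winsorized_cuts_sketch(values, n_cuts):
    """Simplified illustration of winsorized cut placement (not the libebm code)."""
    fmax = sys.float_info.max
    # Step 1: drop NaN (missing), clamp +/-inf to the largest finite doubles, sort.
    data = sorted(min(max(v, -fmax), fmax) for v in values if not math.isnan(v))
    if n_cuts <= 0 or not data or data[0] == data[-1]:
        return []  # nothing to cut: no samples, or only one distinct value
    # Step 2: step inward from each sorted extreme by (cSamples - 1) // cBins samples.
    n_bins = n_cuts + 1
    offset = (len(data) - 1) // n_bins
    low, high = data[offset], data[len(data) - 1 - offset]
    if n_cuts == 1 or low == high:
        # Steps 3 and 5, simplified: one cut at the midpoint of the central values.
        return [low / 2 + high / 2]
    # Step 4, simplified (assumption): outer cuts on the trimmed bounds, the rest
    # evenly spaced. Interpolating avoids forming (high - low), which can overflow.
    return [
        low * (1 - i / (n_cuts - 1)) + high * (i / (n_cuts - 1))
        for i in range(n_cuts)
    ]


# 100 typical values, one extreme outlier, and one missing value.
sample = [float(i) for i in range(1, 101)] + [1e6, float("nan")]
cuts = winsorized_cuts_sketch(sample, 3)
```

With this input the offset is 100 // 4 = 25, so the trimmed bounds are 26.0 and 76.0 and the outlier at 1e6 has no influence on where the cuts land.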

The algorithm handles edge cases including single-valued data, data with few distinct values, and numerical overflow in step size computation.
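
To make the numerical concerns concrete, here is a rough Python analogue of the two helpers named above. The functions arithmetic_mean_sketch and float_tick_increment_sketch are hypothetical stand-ins written for illustration; the actual C++ helpers may differ in detail. The sketch assumes Python 3.9 or later for math.nextafter.

```python
import math
import sys


def arithmetic_mean_sketch(low, high):
    """Midpoint of low < high that stays finite and never returns low."""
    # Halving each operand before adding keeps the sum finite even when
    # low + high would overflow to infinity.
    mid = low / 2 + high / 2
    # For adjacent doubles the midpoint can round down to low. A
    # lower-bound-inclusive cut must lie strictly above low, so use high.
    return mid if low < mid else high


def float_tick_increment_sketch(value):
    """Smallest double strictly greater than value."""
    return math.nextafter(value, math.inf)


big = sys.float_info.max
naive = (big * 0.5 + big) / 2                # the sum overflows before the division
safe = arithmetic_mean_sketch(big * 0.5, big)
```

Here naive evaluates to infinity while safe stays finite, which is the kind of overflow the step size computation has to guard against.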

Usage

This module is called during the feature discretization phase when the winsorized binning strategy is selected. It provides a more robust alternative to uniform binning by reducing the influence of extreme outliers on bin boundary placement.
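
A small, self-contained comparison shows why trimming the tails helps. The helper names uniform_cuts and trimmed_bounds are hypothetical and the logic is a simplification for illustration only, not the library's code.

```python
def uniform_cuts(data, n_cuts):
    """Evenly spaced cuts across the full observed range."""
    lo, hi = min(data), max(data)
    return [lo + (hi - lo) * i / (n_cuts + 1) for i in range(1, n_cuts + 1)]


def trimmed_bounds(data, n_cuts):
    """Range left after stepping (n - 1) // bins samples inward from each end."""
    ordered = sorted(data)
    offset = (len(ordered) - 1) // (n_cuts + 1)
    return ordered[offset], ordered[len(ordered) - 1 - offset]


# 99 typical values plus a single extreme outlier.
data = [float(i) for i in range(1, 100)] + [1_000_000.0]
n_cuts = 3

u_cuts = uniform_cuts(data, n_cuts)
below_first_uniform_cut = sum(v < u_cuts[0] for v in data)
low, high = trimmed_bounds(data, n_cuts)
```

In this example the first uniform cut lands near 250,000, so all 99 typical values collapse into a single bin, whereas the trimmed range runs from 25.0 to 76.0 and cuts placed inside it still separate the bulk of the data.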

Code Reference

Source Location

Signature

EBM_API_BODY ErrorEbm EBM_CALLING_CONVENTION CutWinsorized(
    IntEbm countSamples,
    const double* featureVals,
    IntEbm* countCutsInOut,
    double* cutsLowerBoundInclusiveOut);

I/O Contract

Inputs

Name                        Type           Required  Description
countSamples                IntEbm         Yes       Number of feature value samples
featureVals                 const double*  Yes       Array of feature values (may contain NaN for missing)
countCutsInOut              IntEbm*        Yes       Pointer to the desired number of cuts (updated on output)
cutsLowerBoundInclusiveOut  double*        Yes       Output buffer for cut points

Outputs

Name                        Type      Description
return value                ErrorEbm  Error code (Error_None on success)
countCutsInOut              IntEbm*   Updated with the actual number of cuts placed
cutsLowerBoundInclusiveOut  double*   Array of lower-bound-inclusive cut points
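
As an illustration of what "lower-bound-inclusive" means for a consumer of the output array, the sketch below assigns values to bins with Python's bisect module. The cut values and the bin_index helper are hypothetical, and missing values, which would need separate handling, are not covered.

```python
from bisect import bisect_right

# Hypothetical cut points, sorted ascending as a cut routine would return them.
cuts = [26.0, 51.0, 76.0]


def bin_index(value, cuts):
    """Index of the bin holding value; a value equal to a cut goes to the upper bin."""
    # bisect_right counts the cuts that are <= value, so each cut is the
    # inclusive lower bound of the bin directly above it.
    return bisect_right(cuts, value)


indices = [bin_index(v, cuts) for v in (25.9, 26.0, 60.0, 76.0, 1e6)]
```

Three cuts yield four bins, indexed 0 through 3; the value 26.0 falls in bin 1 rather than bin 0 because the cut itself belongs to the bin above it.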

Usage Examples

Pipeline Context

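# CutWinsorized is native code; Python reaches it through the native bindings
# during preprocessing, and only for features that use winsorized binning.
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
# Assumption: this interpret version accepts "winsorized" as a feature type.
ebm = ExplainableBoostingClassifier(feature_types=["winsorized"] * 4)
ebm.fit(X, y)  # binning these features is expected to go through CutWinsorized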
