Implementation:Dotnet Machinelearning Feature Engineering Transforms

From Leeroopedia


Knowledge Sources
Domains: Machine Learning, Feature Engineering, Data Preprocessing
Last Updated: 2026-02-09 00:00 GMT

Overview

Concrete tools for transforming raw categorical, text, and numeric columns into a unified numeric feature vector, provided by ML.NET.

Description

ML.NET provides a composable set of transform estimators, accessed through the TransformsCatalog exposed as MLContext.Transforms. These transforms follow the estimator-transformer pattern: each is an IEstimator<ITransformer> whose Fit method learns any data-dependent parameters and returns an ITransformer that applies those fixed parameters to new data.

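  • OneHotEncoding (under Transforms.Categorical) maps categorical columns to indicator vectors using a vocabulary learned from the training data.
  • FeaturizeText (under Transforms.Text) applies a text pipeline (normalization, tokenization, n-gram extraction, and n-gram weighting) to text columns and outputs a numeric vector. Stop-word removal and TF-IDF weighting can be configured through TextFeaturizingEstimator.Options.
  • NormalizeMinMax learns per-feature minimum and maximum values and rescales linearly. With fixZero set to false the output range is [0, 1]. With the default fixZero of true, zero stays at zero and values are divided by the largest absolute value, which gives [0, 1] for non-negative data and [-1, 1] otherwise.
  • Concatenate joins multiple columns into a single vector column. It has no data-dependent parameters to learn.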

These transforms are chained using the Append method to build a pipeline that is fitted and applied as a single unit.
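The fit-then-apply pattern described above can be sketched as follows. This is a minimal illustration rather than code from the ML.NET repository: the Row type, its Color and Size columns, and the sample values are hypothetical and exist only for this example.

```csharp
using System.Collections.Generic;
using Microsoft.ML;

// Hypothetical row type, used only for this illustration.
public class Row
{
    public string Color { get; set; }
    public float Size { get; set; }
}

public static class EstimatorPatternSketch
{
    public static IDataView FitThenTransform(MLContext mlContext)
    {
        // Made-up in-memory rows standing in for real training data.
        var rows = new List<Row>
        {
            new Row { Color = "red",  Size = 10f },
            new Row { Color = "blue", Size = 20f },
            new Row { Color = "red",  Size = 30f },
        };
        IDataView trainData = mlContext.Data.LoadFromEnumerable(rows);

        // Building the chain only describes the work; nothing is learned yet.
        var pipeline = mlContext.Transforms.Categorical
                .OneHotEncoding("ColorEncoded", "Color")
            .Append(mlContext.Transforms.NormalizeMinMax("SizeNorm", "Size"))
            .Append(mlContext.Transforms.Concatenate(
                "Features", "ColorEncoded", "SizeNorm"));

        // Fit learns the Color vocabulary and the Size scaling parameters.
        ITransformer fitted = pipeline.Fit(trainData);

        // Transform applies the learned parameters and adds the new columns.
        return fitted.Transform(trainData);
    }
}
```

Calling fitted.Transform on a different IDataView with the same input columns would reuse the vocabulary and scaling learned here instead of recomputing them.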

Usage

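Use these transforms after loading data and before appending a trainer. Select transforms based on the column types in your schema: OneHotEncoding for categorical columns, FeaturizeText for free text, and NormalizeMinMax for numeric columns. Finish with Concatenate to produce the single vector column that trainers read, which is named "Features" by default.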
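"Features" is the default value of the trainers' featureColumnName parameter, not a hard requirement. As a small sketch of that point, the following builds a pipeline whose concatenated column has a different name and passes that name to the trainer. The column name HouseFeatures and the choice of SdcaLogisticRegression are assumptions made for illustration; SquareFeet and LotSize are the numeric columns from the Basic Example on this page.

```csharp
using Microsoft.ML;

public static class FeatureColumnNameSketch
{
    public static IEstimator<ITransformer> BuildPipeline(MLContext mlContext)
    {
        // Concatenate into a column that is not named "Features" ...
        var features = mlContext.Transforms.Concatenate(
            "HouseFeatures", "SquareFeet", "LotSize");

        // ... and tell the trainer explicitly which column to read.
        return features.Append(
            mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(
                labelColumnName: "Label",
                featureColumnName: "HouseFeatures"));
    }
}
```

Keeping the default name is usually simpler, because the trainer call then needs no extra arguments.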

Code Reference

Source Location

  • Repository: ML.NET
  • File: src/Microsoft.ML.Transforms/CategoricalCatalog.cs:L36+
  • File: src/Microsoft.ML.Transforms/Text/TextCatalog.cs:L36-40
  • File: src/Microsoft.ML.Data/Transforms/NormalizationCatalog.cs:L51+
  • File: src/Microsoft.ML.Data/Transforms/ColumnConcatenatingEstimator.cs:L20+

Signature

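// Simplified signatures: these are extension methods on the MLContext.Transforms
// catalogs, and some overloads accept further optional parameters not shown here.

// One-hot encoding for categorical columns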
public OneHotEncodingEstimator OneHotEncoding(
    string outputColumnName,
    string inputColumnName = null)

// Text featurization (tokenize, n-gram, TF-IDF)
public TextFeaturizingEstimator FeaturizeText(
    string outputColumnName,
    string inputColumnName = null)

// Min-max normalization for numeric columns
public NormalizingEstimator NormalizeMinMax(
    string outputColumnName,
    string inputColumnName = null,
    long maximumExampleCount = 1000000000,
    bool fixZero = true)

// Concatenate multiple columns into a single vector
public ColumnConcatenatingEstimator Concatenate(
    string outputColumnName,
    params string[] inputColumnNames)

Import

using Microsoft.ML;

Package dependencies for the text and categorical transforms:

// NuGet: Microsoft.ML (includes the core transforms)
// Assembly: Microsoft.ML.Transforms (ships inside the Microsoft.ML package)

I/O Contract

Inputs

Name | Type | Required | Description
outputColumnName | string | Yes | Name of the output column to create.
inputColumnName | string | No | Name of the input column to transform. Defaults to outputColumnName if null.
inputColumnNames | string[] | Yes (Concatenate) | Array of column names to concatenate into the output vector.
maximumExampleCount | long | No (NormalizeMinMax) | Max rows to scan for min/max. Default: 1,000,000,000.
fixZero | bool | No (NormalizeMinMax) | Whether to map zero to zero. Default: true.
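The effect of fixZero is easiest to see with arithmetic. The sketch below is plain C# and does not call ML.NET; it mirrors the two scaling rules as I understand them from the ML.NET documentation (divide by the largest absolute value when fixZero is true, subtract the minimum and divide by the range otherwise). The class and method names are hypothetical, and the zero-range handling is a simplification chosen for this example.

```csharp
using System;
using System.Linq;

public static class MinMaxSketch
{
    // Rescales one feature column; an illustration, not the ML.NET implementation.
    public static float[] Rescale(float[] values, bool fixZero)
    {
        float min = values.Min();
        float max = values.Max();

        if (fixZero)
        {
            // Zero maps to zero: scale only, no offset.
            float largest = Math.Max(Math.Abs(min), Math.Abs(max));
            return values.Select(v => largest == 0f ? 0f : v / largest).ToArray();
        }

        // Shift by the minimum, then divide by the range.
        float range = max - min;
        return values.Select(v => range == 0f ? 0f : (v - min) / range).ToArray();
    }
}
```

For the values 10, 20, 30, the call with fixZero set to false returns 0, 0.5, 1, while fixZero set to true returns roughly 0.333, 0.667, 1, so the smallest value no longer lands on 0.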

Outputs

Name | Type | Description
(return) | IEstimator<ITransformer> | A composable estimator that can be appended to a pipeline. Produces the named output column when fitted and applied.

Usage Examples

Basic Example

using Microsoft.ML;
using Microsoft.ML.Data;

// Note: the FastTree trainer used below ships in the separate
// Microsoft.ML.FastTree NuGet package.

var mlContext = new MLContext(seed: 42);

// Load the CSV using the column mapping declared on HousingData
var data = mlContext.Data.LoadFromTextFile<HousingData>(
    "housing.csv", separatorChar: ',', hasHeader: true);

// Build feature engineering pipeline
var featurePipeline = mlContext.Transforms.Categorical
        .OneHotEncoding("NeighborhoodEncoded", "Neighborhood")
    .Append(mlContext.Transforms.Text
        .FeaturizeText("DescriptionFeatures", "Description"))
    .Append(mlContext.Transforms
        .NormalizeMinMax("SquareFeetNorm", "SquareFeet"))
    .Append(mlContext.Transforms
        .NormalizeMinMax("LotSizeNorm", "LotSize"))
    // Join every engineered column into the single "Features" vector
    .Append(mlContext.Transforms.Concatenate("Features",
        "NeighborhoodEncoded",
        "DescriptionFeatures",
        "SquareFeetNorm",
        "LotSizeNorm"));

// Append a trainer and fit the full pipeline on the training split only
var fullPipeline = featurePipeline
    .Append(mlContext.BinaryClassification.Trainers.FastTree());

var splitData = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);
var model = fullPipeline.Fit(splitData.TrainSet);

// In a single-file program, type declarations must follow the top-level statements
public class HousingData
{
    [LoadColumn(0)] public string Neighborhood { get; set; }
    [LoadColumn(1)] public string Description { get; set; }
    [LoadColumn(2)] public float SquareFeet { get; set; }
    [LoadColumn(3)] public float LotSize { get; set; }
    [LoadColumn(4), ColumnName("Label")] public bool IsLuxury { get; set; }
}
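A natural follow-up to the Basic Example is to score the held-out split with the fitted model. The helper below is a sketch of that step under a few assumptions: the class and method names are hypothetical, the label column is named "Label" as in the example, and the test split contains both classes so that AUC is defined.

```csharp
using System;
using Microsoft.ML;

public static class ModelChecks
{
    // Applies the fitted pipeline to unseen rows and prints two summary metrics.
    public static void PrintTestMetrics(
        MLContext mlContext, ITransformer model, IDataView testSet)
    {
        // The same fitted transforms run here; nothing is re-learned from testSet.
        IDataView predictions = model.Transform(testSet);

        var metrics = mlContext.BinaryClassification.Evaluate(
            predictions, labelColumnName: "Label");

        Console.WriteLine(
            $"Accuracy: {metrics.Accuracy:F3}, AUC: {metrics.AreaUnderRocCurve:F3}");
    }
}
```

With the variables from the Basic Example in scope, it would be called as ModelChecks.PrintTestMetrics(mlContext, model, splitData.TestSet).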

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
