Implementation:Dotnet Machinelearning Feature Engineering Transforms
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Feature Engineering, Data Preprocessing |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete tools for transforming raw categorical, text, and numeric columns into a unified numeric feature vector, provided by ML.NET.
Description
ML.NET provides a composable set of transform estimators reached through the TransformsCatalog on MLContext (mlContext.Transforms and its Categorical and Text sub-catalogs). These transforms follow the estimator-transformer pattern: each is an IEstimator<ITransformer> that learns any data-dependent parameters during Fit and produces an ITransformer that holds those parameters and applies the same transform to new data.
- OneHotEncoding maps categorical string columns to indicator vectors using a vocabulary learned from the training data.
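- FeaturizeText applies a text pipeline (normalization, tokenization, word and character n-gram extraction, vector normalization) to text columns. Stop-word removal and TF-IDF weighting are configured through the options overload and are not part of the defaults.
- NormalizeMinMax learns per-feature minimum and maximum values and rescales linearly. With fixZero set to true (the default), zero stays at zero and values land in [-1, 1], which is [0, 1] for non-negative data. With fixZero set to false, values land in [0, 1].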
- Concatenate joins multiple columns into a single vector column.
These transforms are chained using the Append method to build a pipeline that is fitted and applied as a single unit.
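The split between Fit and Transform described above can be sketched as follows. This is a minimal illustration and not code from the ML.NET repository: the `Listing` class, its column names, and the in-memory rows are hypothetical stand-ins for a real dataset.

```csharp
using System;
using Microsoft.ML;

var mlContext = new MLContext(seed: 0);

// Hypothetical in-memory rows standing in for a real training set.
var trainRows = new[]
{
    new Listing { City = "Oslo",   Area = 50f },
    new Listing { City = "Bergen", Area = 80f },
    new Listing { City = "Oslo",   Area = 120f },
};
var newRows = new[] { new Listing { City = "Bergen", Area = 65f } };

IDataView train = mlContext.Data.LoadFromEnumerable(trainRows);
IDataView unseen = mlContext.Data.LoadFromEnumerable(newRows);

// Estimator chain: nothing is computed until Fit is called.
var pipeline = mlContext.Transforms.Categorical.OneHotEncoding("CityEncoded", "City")
    .Append(mlContext.Transforms.NormalizeMinMax("AreaNorm", "Area"))
    .Append(mlContext.Transforms.Concatenate("Features", "CityEncoded", "AreaNorm"));

// Fit learns the category vocabulary and the min/max from the training rows only.
ITransformer fitted = pipeline.Fit(train);

// Transform reuses those learned parameters on rows that Fit never saw.
IDataView featurized = fitted.Transform(unseen);

// List the resulting columns and their types.
foreach (var column in featurized.Schema)
    Console.WriteLine($"{column.Name}: {column.Type}");

// Hypothetical row type used only for this sketch.
public class Listing
{
    public string City { get; set; }
    public float Area { get; set; }
}
```

Fitting on the training rows alone keeps statistics from evaluation or production data out of the learned vocabulary and scaling parameters.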
Usage
Use these transforms after loading data and before appending a trainer. Select transforms based on the column types in your schema. Finish with Concatenate to produce a single vector column; name it "Features" unless you pass a different feature column name to the trainer, since "Features" is the default that trainers look for.
Code Reference
Source Location
- Repository: ML.NET
- File: src/Microsoft.ML.Transforms/CategoricalCatalog.cs:L36+
- File: src/Microsoft.ML.Transforms/Text/TextCatalog.cs:L36-40
- File: src/Microsoft.ML.Data/Transforms/NormalizationCatalog.cs:L51+
- File: src/Microsoft.ML.Data/Transforms/ColumnConcatenatingEstimator.cs:L20+
Signature
// Simplified signatures: in the library these are extension methods on the
// catalog classes, and some overloads accept further optional parameters.

// One-hot encoding for categorical columns (mlContext.Transforms.Categorical)
public OneHotEncodingEstimator OneHotEncoding(
    string outputColumnName,
    string inputColumnName = null)

// Text featurization: tokenize, extract n-grams (mlContext.Transforms.Text)
public TextFeaturizingEstimator FeaturizeText(
    string outputColumnName,
    string inputColumnName = null)

// Min-max normalization for numeric columns (mlContext.Transforms)
public NormalizingEstimator NormalizeMinMax(
    string outputColumnName,
    string inputColumnName = null,
    long maximumExampleCount = 1000000000,
    bool fixZero = true)

// Concatenate multiple columns into a single vector (mlContext.Transforms)
public ColumnConcatenatingEstimator Concatenate(
    string outputColumnName,
    params string[] inputColumnNames)
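To show how the optional parameters in these signatures behave at the call site, here is a short sketch. The column names are borrowed from the housing example on this page, and the parameter values are illustrative assumptions, not recommendations.

```csharp
using Microsoft.ML;

var mlContext = new MLContext();

// inputColumnName omitted: reads "SquareFeet" and writes the scaled values
// back under the same name, so later steps see only the scaled column.
var inPlace = mlContext.Transforms.NormalizeMinMax("SquareFeet");

// inputColumnName given: the raw "SquareFeet" column stays available next to
// the new "SquareFeetNorm" column.
var sideBySide = mlContext.Transforms.NormalizeMinMax("SquareFeetNorm", "SquareFeet");

// Optional parameters are passed by name; these values are illustrative only.
var custom = mlContext.Transforms.NormalizeMinMax(
    "LotSizeNorm", "LotSize", maximumExampleCount: 10_000, fixZero: false);
```

Writing to a new output name costs one extra column but keeps the raw values available for debugging or for other transforms.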
Import
using Microsoft.ML;
Additional dependencies for text and categorical transforms:
// NuGet: Microsoft.ML (includes the core transforms)
// Assembly: Microsoft.ML.Transforms (ships inside the Microsoft.ML package)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| outputColumnName | string | Yes | Name of the output column to create. |
| inputColumnName | string | No | Name of the input column to transform. Defaults to outputColumnName if null. |
| inputColumnNames | string[] | Yes (Concatenate) | Array of column names to concatenate into the output vector. |
| maximumExampleCount | long | No (NormalizeMinMax) | Max rows to scan for min/max. Default: 1,000,000,000. |
| fixZero | bool | No (NormalizeMinMax) | Whether to map zero to zero. Default: true. |
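The effect of fixZero can be written out as a small worked equation. This is a sketch of the arithmetic, assuming the affine scale-and-offset form described in the ML.NET normalizer documentation; the symbols $x_{\min}$ and $x_{\max}$ denote the per-feature minimum and maximum learned during Fit, and the sample numbers are made up for illustration.

```latex
% fixZero = false: subtract the minimum, divide by the range
x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \in [0, 1]

% fixZero = true (default): no offset, so 0 maps to 0
x' = \frac{x}{\max(|x_{\min}|,\ |x_{\max}|)} \in [-1, 1]

% Worked example with x_{\min} = 500, x_{\max} = 4000, x = 2000:
%   fixZero = false: (2000 - 500) / (4000 - 500) = 1500 / 3500 \approx 0.4286
%   fixZero = true:  2000 / \max(500, 4000)      = 2000 / 4000 = 0.5
```

Leaving zero untouched matters mainly for sparse data, where an offset would turn every implicit zero into a stored non-zero value.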
Outputs
| Name | Type | Description |
|---|---|---|
| (return) | IEstimator<ITransformer> | A composable estimator that can be appended to a pipeline. Produces the named output column when fitted and applied. |
Usage Examples
Basic Example
using Microsoft.ML;
using Microsoft.ML.Data;

public class HousingData
{
    [LoadColumn(0)] public string Neighborhood { get; set; }
    [LoadColumn(1)] public string Description { get; set; }
    [LoadColumn(2)] public float SquareFeet { get; set; }
    [LoadColumn(3)] public float LotSize { get; set; }
    [LoadColumn(4), ColumnName("Label")] public bool IsLuxury { get; set; }
}

var mlContext = new MLContext(seed: 42);
var data = mlContext.Data.LoadFromTextFile<HousingData>(
    "housing.csv", separatorChar: ',', hasHeader: true);

// Build feature engineering pipeline
var featurePipeline = mlContext.Transforms.Categorical
        .OneHotEncoding("NeighborhoodEncoded", "Neighborhood")
    .Append(mlContext.Transforms.Text
        .FeaturizeText("DescriptionFeatures", "Description"))
    .Append(mlContext.Transforms
        .NormalizeMinMax("SquareFeetNorm", "SquareFeet"))
    .Append(mlContext.Transforms
        .NormalizeMinMax("LotSizeNorm", "LotSize"))
    // Join every engineered column into the single "Features" vector
    .Append(mlContext.Transforms.Concatenate("Features",
        "NeighborhoodEncoded",
        "DescriptionFeatures",
        "SquareFeetNorm",
        "LotSizeNorm"));

// Append a trainer and fit the full pipeline
// (FastTree requires the separate Microsoft.ML.FastTree NuGet package)
var fullPipeline = featurePipeline
    .Append(mlContext.BinaryClassification.Trainers.FastTree());

// Split first, then fit on the training portion only
var splitData = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);
var model = fullPipeline.Fit(splitData.TrainSet);