Implementation:Dotnet Machinelearning LoadFromTextFile and TrainTestSplit

Knowledge Sources	ML.NET, ML.NET API Reference
Domains	Machine Learning, Data Engineering, .NET
Last Updated	2026-02-09 00:00 GMT

Overview

Concrete tools for loading delimited text files into a lazy data view and splitting data into training and test partitions, provided by ML.NET.

Description

LoadFromTextFile<TInput> reads a delimited text file and maps its columns to the properties of a user-defined class TInput using column attributes. The method returns an IDataView, which is a lazy, cursorable representation of the data. No rows are materialized until a downstream consumer iterates over them.
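The lazy behavior described above can be observed with a minimal sketch. It assumes C# 9 or later (top-level statements) and the Microsoft.ML NuGet package; the HouseData class, the file name, and the sample values are hypothetical and exist only for illustration.

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Write a tiny tab-separated file so the sketch is self-contained (hypothetical data).
string path = Path.Combine(Path.GetTempPath(), "houses_sample.tsv");
File.WriteAllLines(path, new[]
{
    "Size\tPrice",
    "1.1\t1.2",
    "1.9\t2.3",
    "2.8\t3.0",
});

var mlContext = new MLContext();

// Builds the loader and the schema; rows are not pulled into memory at this point.
IDataView dataView = mlContext.Data.LoadFromTextFile<HouseData>(path, hasHeader: true);

// The schema is available without iterating over any rows.
foreach (var column in dataView.Schema)
    Console.WriteLine($"{column.Name}: {column.Type}");

// Enumerating the view is what triggers an actual pass over the file.
var rows = mlContext.Data
    .CreateEnumerable<HouseData>(dataView, reuseRowObject: false)
    .ToList();
Console.WriteLine($"Rows read: {rows.Count}");

// Hypothetical input class: LoadColumn maps file column indices to properties.
public class HouseData
{
    [LoadColumn(0)] public float Size { get; set; }
    [LoadColumn(1)] public float Price { get; set; }
}
```

CreateEnumerable is used here only to force materialization for inspection; a training pipeline would normally consume the IDataView directly.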

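TrainTestSplit takes an IDataView and produces a TrainTestData object containing two IDataView properties: TrainSet and TestSet. Rows are assigned to the two partitions at random, so testFraction is an approximate rather than exact proportion. The optional samplingKeyColumnName names a grouping column: rows that share the same value in that column are guaranteed to land in the same partition, which prevents related rows (for example, several records from one user) from leaking between train and test. It does not stratify by class, so label distributions are not guaranteed to be preserved.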
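The ML.NET API documentation describes samplingKeyColumnName as a grouping column: rows sharing a key value are kept in the same partition. The sketch below checks that property on in-memory data. It assumes C# 9 or later and the Microsoft.ML NuGet package; the Review class, the user IDs, and the ratings are hypothetical.

```csharp
using System;
using System.Linq;
using Microsoft.ML;

var mlContext = new MLContext(seed: 0);

// 100 hypothetical rows spread over 20 users, 5 rows per user.
var rows = Enumerable.Range(0, 100)
    .Select(i => new Review { UserId = $"user{i % 20}", Rating = i % 5 })
    .ToList();

IDataView data = mlContext.Data.LoadFromEnumerable(rows);

// Group by UserId so that no user contributes rows to both partitions.
var split = mlContext.Data.TrainTestSplit(
    data,
    testFraction: 0.3,
    samplingKeyColumnName: nameof(Review.UserId));

var trainUsers = mlContext.Data
    .CreateEnumerable<Review>(split.TrainSet, reuseRowObject: false)
    .Select(r => r.UserId)
    .ToHashSet();
var testUsers = mlContext.Data
    .CreateEnumerable<Review>(split.TestSet, reuseRowObject: false)
    .Select(r => r.UserId)
    .ToHashSet();

// With a sampling key, the two user sets are expected to be disjoint.
Console.WriteLine($"Users in both partitions: {trainUsers.Intersect(testUsers).Count()}");

// Hypothetical row type used only for this illustration.
public class Review
{
    public string UserId { get; set; }
    public float Rating { get; set; }
}
```

Because whole groups move together, the realized test share can deviate noticeably from testFraction when there are few distinct key values.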

Usage

Use LoadFromTextFile when source data is in CSV, TSV, or another delimited text format. Call TrainTestSplit immediately after loading, before any feature engineering, so that data-dependent transforms (normalization, vocabulary building, and the like) are fit on the training partition only and no information leaks from the test set.
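A minimal sketch of the split-first protocol follows, using min-max normalization as a stand-in for any data-dependent transform. It assumes C# 9 or later and the Microsoft.ML NuGet package; the Measurement class and its values are hypothetical, and NormalizeMinMax is just one possible choice of transform.

```csharp
using System;
using System.Linq;
using Microsoft.ML;

var mlContext = new MLContext(seed: 1);

// Hypothetical in-memory data: 50 rows with a single numeric feature.
var samples = Enumerable.Range(1, 50)
    .Select(i => new Measurement { Value = i })
    .ToList();
IDataView data = mlContext.Data.LoadFromEnumerable(samples);

// Step 1: split before any statistics are computed from the data.
var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);

// Step 2: fit the data-dependent transform on the training partition only.
var normalizer = mlContext.Transforms.NormalizeMinMax(nameof(Measurement.Value));
var fitted = normalizer.Fit(split.TrainSet);

// Step 3: apply the already-fitted transform to both partitions.
IDataView trainScaled = fitted.Transform(split.TrainSet);
IDataView testScaled = fitted.Transform(split.TestSet);

int trainCount = mlContext.Data
    .CreateEnumerable<Measurement>(trainScaled, reuseRowObject: false).Count();
int testCount = mlContext.Data
    .CreateEnumerable<Measurement>(testScaled, reuseRowObject: false).Count();
Console.WriteLine($"Train rows: {trainCount}, test rows: {testCount}");

// Hypothetical row type used only for this illustration.
public class Measurement
{
    public float Value { get; set; }
}
```

Fitting on split.TrainSet means the scaling parameters never see test rows; test values may therefore fall outside the training range after scaling.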

Code Reference

Source Location

Repository: ML.NET
File: src/Microsoft.ML.Data/DataLoadSave/Text/TextLoaderSaverCatalog.cs:L160-265
File: src/Microsoft.ML.Data/DataLoadSave/DataOperationsCatalog.cs:L411-438

Signature

// Load from text file with schema inferred from TInput
public static IDataView LoadFromTextFile<TInput>(
    this DataOperationsCatalog catalog,
    string path,
    char separatorChar = '\t',
    bool hasHeader = false,
    bool allowQuoting = false,
    bool trimWhitespace = false,
    bool allowSparse = false)

// Split into train and test sets
public TrainTestData TrainTestSplit(
    IDataView data,
    double testFraction = 0.1,
    string samplingKeyColumnName = null,
    int? seed = null)

Import

using Microsoft.ML;

I/O Contract

Inputs

LoadFromTextFile:

Name	Type	Required	Description
path	string	Yes	File path to the delimited text file (CSV, TSV, etc.).
separatorChar	char	No	Column delimiter character. Default: '\t' (tab).
hasHeader	bool	No	Whether the first line is a header row. Default: false.
allowQuoting	bool	No	Whether fields may be quoted. Default: false.
trimWhitespace	bool	No	Whether to trim whitespace from values. Default: false.
allowSparse	bool	No	Whether to allow sparse format. Default: false.

TrainTestSplit:

Name	Type	Required	Description
data	IDataView	Yes	The dataset to split.
testFraction	double	No	Fraction of data for the test set. Default: 0.1.
samplingKeyColumnName	string	No	Column used to group rows; rows sharing a value in this column are placed in the same partition. Default: null.
seed	int?	No	Random seed for reproducibility. Default: null.
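The role of the seed parameter can be sketched by splitting the same data twice with the same seed and comparing the resulting test partitions. This assumes C# 9 or later and the Microsoft.ML NuGet package; the Row class and the TestIds helper are hypothetical names introduced for the illustration.

```csharp
using System;
using System.Linq;
using Microsoft.ML;

var mlContext = new MLContext();

// Hypothetical data: 200 rows identified by an integer Id.
IDataView data = mlContext.Data.LoadFromEnumerable(
    Enumerable.Range(0, 200).Select(i => new Row { Id = i }).ToList());

// Helper (hypothetical): returns the sorted Ids that fall into the test partition.
int[] TestIds(int seed) =>
    mlContext.Data
        .CreateEnumerable<Row>(
            mlContext.Data.TrainTestSplit(data, testFraction: 0.25, seed: seed).TestSet,
            reuseRowObject: false)
        .Select(r => r.Id)
        .OrderBy(id => id)
        .ToArray();

int[] first = TestIds(7);
int[] second = TestIds(7);

// The same seed is expected to reproduce the same test partition.
Console.WriteLine($"Same test partition: {first.SequenceEqual(second)}");

// Hypothetical row type used only for this illustration.
public class Row
{
    public int Id { get; set; }
}
```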

Outputs

LoadFromTextFile:

Name	Type	Description
(return)	IDataView	Lazy, cursorable data view with schema inferred from TInput.

TrainTestSplit:

Name	Type	Description
(return)	TrainTestData	Object containing TrainSet (IDataView) and TestSet (IDataView).

Usage Examples

Basic Example

using Microsoft.ML;
using Microsoft.ML.Data;

// Initialize context with a fixed seed so the split is reproducible
var mlContext = new MLContext(seed: 42);

// Load data from a CSV file (the view is lazy; no rows are read yet)
IDataView dataView = mlContext.Data.LoadFromTextFile<SentimentData>(
    path: "sentiment_data.csv",
    separatorChar: ',',
    hasHeader: true,
    allowQuoting: true);

// Split into roughly 80% train and 20% test
var splitData = mlContext.Data.TrainTestSplit(
    dataView,
    testFraction: 0.2);

IDataView trainData = splitData.TrainSet;
IDataView testData = splitData.TestSet;

// trainData and testData are ready for pipeline construction

// Define the input data schema. In a single-file program with top-level
// statements, type declarations must come after the statements.
public class SentimentData
{
    // Column 0 of the file: the raw review text
    [LoadColumn(0)]
    public string SentimentText { get; set; }

    // Column 1 of the file, exposed in the IDataView under the name "Label"
    [LoadColumn(1), ColumnName("Label")]
    public bool Sentiment { get; set; }
}
