

Implementation:Dotnet Machinelearning LoadFromTextFile and TrainTestSplit

From Leeroopedia


Knowledge Sources
Domains Machine Learning, Data Engineering, .NET
Last Updated 2026-02-09 00:00 GMT

Overview

Concrete tools for loading delimited text files into a lazy data view and splitting data into training and test partitions, provided by ML.NET.

Description

LoadFromTextFile<TInput> reads a delimited text file and maps its columns to the fields or properties of a user-defined class TInput through [LoadColumn] attributes. It returns an IDataView, a lazy, cursorable view of the data: no rows are materialized until a downstream consumer iterates over them.
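A minimal sketch of the attribute mapping and the lazy behavior, assuming the Microsoft.ML NuGet package is referenced. The HousingData class, its columns, the temporary file name, and the LazyLoadingDemo wrapper are hypothetical; the sketch writes its own small TSV file so it can run on its own. It also shows LoadColumn(start, end) combined with VectorType, which gathers a contiguous range of columns into one vector column.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical schema: column 0 holds the target, columns 1-3 form one feature vector.
public class HousingData
{
    [LoadColumn(0)]
    public float Price { get; set; }

    // LoadColumn(start, end) reads the inclusive column range 1..3 into one vector column.
    [LoadColumn(1, 3), VectorType(3)]
    public float[] Features { get; set; }
}

public static class LazyLoadingDemo
{
    public static List<HousingData> Run()
    {
        // Write a tiny tab-separated file (hypothetical data) so the sketch is self-contained.
        string path = Path.Combine(Path.GetTempPath(), "housing_demo.tsv");
        File.WriteAllLines(path, new[]
        {
            "Price\tRooms\tAge\tDistance",
            "250000\t3\t20\t5.5",
            "410000\t4\t8\t2.1",
        });

        var mlContext = new MLContext(seed: 0);

        // '\t' is the default separator, so only hasHeader is set here.
        // This call sets up the loader; the data rows have not been parsed yet.
        IDataView view = mlContext.Data.LoadFromTextFile<HousingData>(path, hasHeader: true);

        // Enumerating the view opens a cursor, and only then are rows read and parsed.
        return mlContext.Data
            .CreateEnumerable<HousingData>(view, reuseRowObject: false)
            .ToList();
    }
}
```

The vector form is convenient when many numeric columns feed a single feature column, since the class does not need one property per column.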

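TrainTestSplit takes an IDataView and returns a TrainTestData object with two IDataView properties, TrainSet and TestSet. Each row is assigned to a partition at random, so the test set holds approximately, not exactly, testFraction of the rows; testFraction must lie strictly between 0 and 1. The optional samplingKeyColumnName groups rows rather than stratifying them: all rows that share a value in that column land in the same partition, which keeps related rows (for example, several records from the same user) from leaking across the train/test boundary. Passing a seed makes the split reproducible.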
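In ML.NET, a non-null samplingKeyColumnName guarantees that rows with the same key value end up in the same partition. The idea can be illustrated without ML.NET by mapping each key deterministically to a number in [0, 1) and sending rows whose number falls below testFraction to the test set. This is a simplified, standard-library-only sketch, not ML.NET's implementation; the FNV-1a hash and the GroupedSplitSketch names are assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

// Simplified illustration of key-grouped splitting; not ML.NET's implementation.
public static class GroupedSplitSketch
{
    // Deterministic 32-bit FNV-1a hash of the key, mixed with a seed.
    // string.GetHashCode is randomized per process on .NET Core and later, so it is avoided.
    static uint Fnv1a(string key, uint seed)
    {
        uint hash = 2166136261u ^ seed;
        foreach (char c in key)
        {
            hash ^= c;
            hash *= 16777619u;
        }
        return hash;
    }

    // Map a key to a number in [0, 1); rows with equal keys always get equal numbers.
    static double Bucket(string key, uint seed) => Fnv1a(key, seed) / 4294967296.0;

    // Rows whose key maps below testFraction go to the test list, all others to train.
    public static (List<(string Key, int Value)> Train, List<(string Key, int Value)> Test) Split(
        IEnumerable<(string Key, int Value)> rows, double testFraction, uint seed)
    {
        if (testFraction <= 0 || testFraction >= 1)
            throw new ArgumentOutOfRangeException(nameof(testFraction), "Must be in (0, 1).");

        var train = new List<(string Key, int Value)>();
        var test = new List<(string Key, int Value)>();
        foreach (var row in rows)
        {
            if (Bucket(row.Key, seed) < testFraction) test.Add(row);
            else train.Add(row);
        }
        return (train, test);
    }
}
```

Because whole groups move together, the realized test share depends on how many distinct keys there are: with only a few dozen keys it can differ from testFraction by much more than a per-row random assignment would.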

Usage

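Use LoadFromTextFile when the source data is CSV, TSV, or another delimited text format. Call TrainTestSplit right after loading and before any feature engineering, then fit the pipeline on TrainSet only; otherwise transforms that learn statistics from the data (normalizers, vocabularies, and so on) would see test rows and leak information into the evaluation.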
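A minimal sketch of the split-then-fit ordering, assuming the Microsoft.ML NuGet package is referenced. To keep it self-contained, LoadFromEnumerable stands in for LoadFromTextFile; the Measurement class, the "Scaled" column, and the SplitBeforeFitDemo wrapper are hypothetical, and NormalizeMinMax is used only as a representative transform that learns statistics from the data it is fitted on.

```csharp
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical single-feature row type used only for this sketch.
public class Measurement
{
    public float Value { get; set; }
}

public static class SplitBeforeFitDemo
{
    public static (int TrainCount, int TestCount, float[] ScaledTest) Run()
    {
        var mlContext = new MLContext(seed: 1);

        // In-memory stand-in for data that would normally come from LoadFromTextFile.
        var rows = Enumerable.Range(1, 100).Select(i => new Measurement { Value = i });
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // 1. Split while the data is still raw.
        var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2, seed: 1);

        // 2. Fit the transform on TrainSet only, so its scaling statistics
        //    are computed without ever seeing a test row.
        var pipeline = mlContext.Transforms.NormalizeMinMax("Scaled", nameof(Measurement.Value));
        ITransformer model = pipeline.Fit(split.TrainSet);

        // 3. Apply the already-fitted transform to TestSet.
        float[] scaledTest = model.Transform(split.TestSet)
            .GetColumn<float>("Scaled")
            .ToArray();

        int trainCount = split.TrainSet.GetColumn<float>(nameof(Measurement.Value)).Count();
        return (trainCount, scaledTest.Length, scaledTest);
    }
}
```

Had the normalizer been fitted on the full dataset before splitting, its scaling factor would have been influenced by the test rows, which is exactly the leakage the ordering avoids.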

Code Reference

Source Location

  • Repository: ML.NET
  • File: src/Microsoft.ML.Data/DataLoadSave/Text/TextLoaderSaverCatalog.cs:L160-265
  • File: src/Microsoft.ML.Data/DataLoadSave/DataOperationsCatalog.cs:L411-438

Signature

// Load from a text file; the schema comes from TInput's [LoadColumn] attributes
public static IDataView LoadFromTextFile<TInput>(
    this DataOperationsCatalog catalog,
    string path,
    char separatorChar = '\t',
    bool hasHeader = false,
    bool allowQuoting = false,
    bool trimWhitespace = false,
    bool allowSparse = false)

// Split into train and test sets
public TrainTestData TrainTestSplit(
    IDataView data,
    double testFraction = 0.1,
    string samplingKeyColumnName = null,
    int? seed = null)

Import

using Microsoft.ML;
using Microsoft.ML.Data; // [LoadColumn], [ColumnName], and other schema attributes

I/O Contract

Inputs

LoadFromTextFile:

Name | Type | Required | Description
path | string | Yes | Path to the delimited text file (CSV, TSV, etc.).
separatorChar | char | No | Column delimiter character. Default: '\t' (tab).
hasHeader | bool | No | Whether the first line is a header row rather than data. Default: false.
allowQuoting | bool | No | Whether fields may be enclosed in double quotes; when true, a separator inside quotes is treated as part of the value. Default: false.
trimWhitespace | bool | No | Whether to trim whitespace from values. Default: false.
allowSparse | bool | No | Whether rows may use the sparse format. Default: false.
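A minimal sketch of the separator and quoting options, assuming the Microsoft.ML NuGet package is referenced. The Review class, the file contents, and the QuotedCsvDemo wrapper are hypothetical. The first data row contains a comma inside a quoted text field; with allowQuoting: true that comma stays part of the text instead of starting a new column.

```csharp
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical two-column review schema.
public class Review
{
    [LoadColumn(0)]
    public string Text { get; set; }

    [LoadColumn(1)]
    public float Rating { get; set; }
}

public static class QuotedCsvDemo
{
    public static Review[] Run()
    {
        // Hypothetical CSV with a header and a quoted field that contains a comma.
        string path = Path.Combine(Path.GetTempPath(), "reviews_demo.csv");
        File.WriteAllLines(path, new[]
        {
            "Text,Rating",
            "\"Great value, would buy again\",5",
            "Too small,2",
        });

        var mlContext = new MLContext();

        // separatorChar overrides the tab default; allowQuoting keeps the quoted
        // comma inside the Text value rather than splitting the row.
        IDataView view = mlContext.Data.LoadFromTextFile<Review>(
            path, separatorChar: ',', hasHeader: true, allowQuoting: true);

        return mlContext.Data.CreateEnumerable<Review>(view, reuseRowObject: false).ToArray();
    }
}
```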

TrainTestSplit:

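Name | Type | Required | Description
data | IDataView | Yes | The dataset to split.
testFraction | double | No | Fraction of rows to place in the test set; must be strictly between 0 and 1. Default: 0.1.
samplingKeyColumnName | string | No | Column used to group rows: rows with the same value always land in the same partition (train or test). Default: null (no grouping).
seed | int? | No | Seed for the random row assignment; set it for a reproducible split. Default: null.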
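A minimal sketch of samplingKeyColumnName and seed together, assuming the Microsoft.ML NuGet package is referenced. The UserScore class, the generated data, and the GroupedSplitDemo wrapper are hypothetical; LoadFromEnumerable stands in for a file so the example is self-contained. It collects the distinct user IDs in each partition, which should never overlap.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

// Hypothetical rows: several scores per user.
public class UserScore
{
    public string UserId { get; set; }
    public float Score { get; set; }
}

public static class GroupedSplitDemo
{
    public static (HashSet<string> TrainUsers, HashSet<string> TestUsers) Run()
    {
        var mlContext = new MLContext(seed: 0);

        // 500 rows spread over 40 users (hypothetical data).
        var rows = Enumerable.Range(0, 500)
            .Select(i => new UserScore { UserId = $"user{i % 40}", Score = i % 5 });
        IDataView data = mlContext.Data.LoadFromEnumerable(rows);

        // Rows sharing a UserId always land in the same partition, and the
        // fixed seed makes the assignment repeatable across runs.
        var split = mlContext.Data.TrainTestSplit(
            data,
            testFraction: 0.25,
            samplingKeyColumnName: nameof(UserScore.UserId),
            seed: 123);

        // Collect the distinct user IDs present in one partition.
        HashSet<string> UsersIn(IDataView view) => mlContext.Data
            .CreateEnumerable<UserScore>(view, reuseRowObject: false)
            .Select(r => r.UserId)
            .ToHashSet();

        return (UsersIn(split.TrainSet), UsersIn(split.TestSet));
    }
}
```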

Outputs

LoadFromTextFile:

Name | Type | Description
(return) | IDataView | Lazy, cursorable data view whose schema is defined by TInput's column attributes.

TrainTestSplit:

Name | Type | Description
(return) | TrainTestData | Object exposing TrainSet (IDataView) and TestSet (IDataView).
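A minimal sketch of consuming the returned object, assuming the Microsoft.ML NuGet package is referenced. The IndexedRow class and the SplitSizeDemo wrapper are hypothetical. Since both partitions are lazy views, counting their rows means cursoring through them; the two counts add up to the input size, while the test count is close to, but usually not exactly, testFraction times the total.

```csharp
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical one-column row type.
public class IndexedRow
{
    public int Id { get; set; }
}

public static class SplitSizeDemo
{
    public static (int Train, int Test) Run()
    {
        var mlContext = new MLContext(seed: 0);
        IDataView data = mlContext.Data.LoadFromEnumerable(
            Enumerable.Range(0, 1000).Select(i => new IndexedRow { Id = i }));

        var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2, seed: 42);

        // Each partition is a lazy view; Count() walks a cursor over it.
        int train = split.TrainSet.GetColumn<int>(nameof(IndexedRow.Id)).Count();
        int test = split.TestSet.GetColumn<int>(nameof(IndexedRow.Id)).Count();
        return (train, test);
    }
}
```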

Usage Examples

Basic Example

using Microsoft.ML;
using Microsoft.ML.Data;

// Initialize the ML.NET context; a fixed seed makes its random operations deterministic
var mlContext = new MLContext(seed: 42);

// Load data from a CSV file with a header row and quoted text fields
IDataView dataView = mlContext.Data.LoadFromTextFile<SentimentData>(
    path: "sentiment_data.csv",
    separatorChar: ',',
    hasHeader: true,
    allowQuoting: true);

// Split into roughly 80% train and 20% test (rows are assigned randomly)
var splitData = mlContext.Data.TrainTestSplit(
    dataView,
    testFraction: 0.2);

IDataView trainData = splitData.TrainSet;
IDataView testData = splitData.TestSet;

// trainData and testData are ready for pipeline construction

// Define the input data schema
// (in a top-level program, type declarations must come after the top-level statements)
public class SentimentData
{
    [LoadColumn(0)]
    public string SentimentText { get; set; }

    [LoadColumn(1), ColumnName("Label")]
    public bool Sentiment { get; set; }
}

