Implementation:Dotnet Machinelearning LoadFromTextFile and TrainTestSplit
| Knowledge Sources | |
|---|---|
| Domains | Machine Learning, Data Engineering, .NET |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete tools for loading delimited text files into a lazy data view and splitting data into training and test partitions, provided by ML.NET.
Description
LoadFromTextFile<TInput> reads a delimited text file and maps its columns to the properties of a user-defined class TInput using column attributes. The method returns an IDataView, which is a lazy, cursorable representation of the data. No rows are materialized until a downstream consumer iterates over them.
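The lazy behavior described above can be illustrated with a minimal sketch. The temporary file, its contents, and the `ReviewRow` class are hypothetical and exist only to make the example self-contained; `Schema` and `CreateEnumerable` are standard ML.NET members.

```csharp
using System;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Write a tiny tab-separated file so the sketch is self-contained (hypothetical data).
string path = Path.Combine(Path.GetTempPath(), "lazy_demo.tsv");
File.WriteAllLines(path, new[]
{
    "Text\tLabel",
    "great product\ttrue",
    "terrible service\tfalse",
    "would buy again\ttrue"
});

var mlContext = new MLContext(seed: 0);

// Creating the view sets up the schema and the source; no data rows are materialized yet.
IDataView view = mlContext.Data.LoadFromTextFile<ReviewRow>(path, hasHeader: true);

// The schema can be inspected without iterating over the rows.
foreach (var column in view.Schema)
    Console.WriteLine($"{column.Name}: {column.Type}");

// Rows are parsed only when a consumer pulls them, here through CreateEnumerable.
var rows = mlContext.Data
    .CreateEnumerable<ReviewRow>(view, reuseRowObject: false)
    .ToList();
Console.WriteLine($"Rows read: {rows.Count}");

// Hypothetical row type; LoadColumn maps file column indices to properties.
public class ReviewRow
{
    [LoadColumn(0)]
    public string Text { get; set; }

    [LoadColumn(1)]
    public bool Label { get; set; }
}
```

Because the view is re-read on each enumeration, materializing with `ToList()` is only sensible for small data such as this.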
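TrainTestSplit takes an IDataView and returns a TrainTestData object with two IDataView properties, TrainSet and TestSet. Rows are assigned at random, so the test set holds approximately (not exactly) the requested test fraction. The optional samplingKeyColumnName names a grouping column: rows that share a value in that column always land in the same partition, which keeps related examples from leaking between train and test. It does not stratify the split by class.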
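A minimal sketch of a split with a sampling key follows. The `Visit` class and its `UserId` column are hypothetical, and `LoadFromEnumerable` is used instead of a file only to keep the example self-contained. The check at the end looks for user IDs that appear in both partitions.

```csharp
using System;
using System.Linq;
using Microsoft.ML;

var mlContext = new MLContext(seed: 1);

// Hypothetical in-memory data: 20 users with 5 rows each.
var rows = Enumerable.Range(0, 100)
    .Select(i => new Visit { UserId = $"user{i % 20}", Value = i })
    .ToList();
IDataView data = mlContext.Data.LoadFromEnumerable(rows);

// Rows with the same UserId are kept together in one partition.
var split = mlContext.Data.TrainTestSplit(
    data,
    testFraction: 0.25,
    samplingKeyColumnName: "UserId");

var trainRows = mlContext.Data
    .CreateEnumerable<Visit>(split.TrainSet, reuseRowObject: false)
    .ToList();
var testRows = mlContext.Data
    .CreateEnumerable<Visit>(split.TestSet, reuseRowObject: false)
    .ToList();

var trainUsers = trainRows.Select(v => v.UserId).ToHashSet();
var testUsers = testRows.Select(v => v.UserId).ToHashSet();

// True only if some user has rows on both sides of the split.
bool overlap = trainUsers.Overlaps(testUsers);
Console.WriteLine($"Train rows: {trainRows.Count}, test rows: {testRows.Count}");
Console.WriteLine($"Users present in both sets: {overlap}");

// Hypothetical row type for this sketch.
public class Visit
{
    public string UserId { get; set; }
    public float Value { get; set; }
}
```

Since whole groups move together, the realized test fraction can drift further from the requested value than it would in a row-level split, especially with few groups.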
Usage
Use LoadFromTextFile when the source data is in CSV, TSV, or another delimited text format. Call TrainTestSplit immediately after loading, before fitting any feature-engineering transforms, so that statistics computed from test rows cannot leak into training.
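The ordering recommended above can be sketched as follows: split first, fit a transform on the training partition only, then apply the fitted transform to both partitions. The `Measurement` class and the in-memory data are hypothetical, and min-max normalization merely stands in for any fitted feature-engineering step.

```csharp
using System;
using System.Linq;
using Microsoft.ML;

var mlContext = new MLContext(seed: 7);

// Hypothetical in-memory data, used only to keep the sketch self-contained.
var rows = Enumerable.Range(1, 50)
    .Select(i => new Measurement { Reading = i })
    .ToList();
IDataView data = mlContext.Data.LoadFromEnumerable(rows);

// 1. Split before anything is fitted.
var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);

// 2. Fit the normalizer on the training rows only.
var normalizer = mlContext.Transforms
    .NormalizeMinMax("Reading")
    .Fit(split.TrainSet);

// 3. Apply the already-fitted transform to both partitions.
IDataView trainScaled = normalizer.Transform(split.TrainSet);
IDataView testScaled = normalizer.Transform(split.TestSet);

int trainCount = mlContext.Data
    .CreateEnumerable<Measurement>(trainScaled, reuseRowObject: false)
    .Count();
int testCount = mlContext.Data
    .CreateEnumerable<Measurement>(testScaled, reuseRowObject: false)
    .Count();
Console.WriteLine($"Train rows: {trainCount}, test rows: {testCount}");

// Hypothetical row type for this sketch.
public class Measurement
{
    public float Reading { get; set; }
}
```

Fitting on the full data view instead would let the scaling range depend on test rows, which is the kind of leakage the split is meant to rule out.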
Code Reference
Source Location
- Repository: ML.NET
- File: src/Microsoft.ML.Data/DataLoadSave/Text/TextLoaderSaverCatalog.cs:L160-265
- File: src/Microsoft.ML.Data/DataLoadSave/DataOperationsCatalog.cs:L411-438
Signature
```csharp
// Load from text file with schema inferred from TInput
public static IDataView LoadFromTextFile<TInput>(
    this DataOperationsCatalog catalog,
    string path,
    char separatorChar = '\t',
    bool hasHeader = false,
    bool allowQuoting = false,
    bool trimWhitespace = false,
    bool allowSparse = false)

// Split into train and test sets
public TrainTestData TrainTestSplit(
    IDataView data,
    double testFraction = 0.1,
    string samplingKeyColumnName = null,
    int? seed = null)
```
Import
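```csharp
using Microsoft.ML;       // MLContext, IDataView, DataOperationsCatalog
using Microsoft.ML.Data;  // LoadColumn and ColumnName attributes on TInput
```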
I/O Contract
Inputs
LoadFromTextFile:
| Name | Type | Required | Description |
|---|---|---|---|
| path | string | Yes | File path to the delimited text file (CSV, TSV, etc.). |
| separatorChar | char | No | Column delimiter character. Default: '\t' (tab). |
| hasHeader | bool | No | Whether the first line is a header row. Default: false. |
| allowQuoting | bool | No | Whether fields may be quoted. Default: false. |
| trimWhitespace | bool | No | Whether to trim whitespace from values. Default: false. |
| allowSparse | bool | No | Whether to allow sparse format. Default: false. |
TrainTestSplit:
| Name | Type | Required | Description |
|---|---|---|---|
| data | IDataView | Yes | The dataset to split. |
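| testFraction | double | No | Approximate fraction of rows assigned to the test set; must be strictly between 0 and 1. Default: 0.1. |
| samplingKeyColumnName | string | No | Column used to group rows: rows sharing a value in this column are kept in the same partition. Default: null. |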
| seed | int? | No | Random seed for reproducibility. Default: null. |
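The effect of the seed parameter can be checked with a small sketch that splits the same data twice with the same seed and compares the resulting test rows. The `Item` class, the `TestIds` helper, and the seed value 123 are hypothetical and exist only for illustration.

```csharp
using System;
using System.Linq;
using Microsoft.ML;

var mlContext = new MLContext();

// Hypothetical in-memory data with a unique Id per row.
var rows = Enumerable.Range(0, 200)
    .Select(i => new Item { Id = i })
    .ToList();
IDataView data = mlContext.Data.LoadFromEnumerable(rows);

// Helper for this sketch only: split with the given seed and return the test-set Ids in order.
int[] TestIds(int seed) =>
    mlContext.Data
        .CreateEnumerable<Item>(
            mlContext.Data.TrainTestSplit(data, testFraction: 0.3, seed: seed).TestSet,
            reuseRowObject: false)
        .Select(item => item.Id)
        .ToArray();

int[] first = TestIds(123);
int[] second = TestIds(123);

Console.WriteLine($"Test rows: {first.Length} of {rows.Count}");
Console.WriteLine($"Same test rows on both runs: {first.SequenceEqual(second)}");

// Hypothetical row type for this sketch.
public class Item
{
    public int Id { get; set; }
}
```

Passing an explicit seed pins the split itself; a seed given to the MLContext constructor applies to the context as a whole.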
Outputs
LoadFromTextFile:
| Name | Type | Description |
|---|---|---|
| (return) | IDataView | Lazy, cursorable data view with schema inferred from TInput. |
TrainTestSplit:
| Name | Type | Description |
|---|---|---|
| (return) | TrainTestData | Object (nested type DataOperationsCatalog.TrainTestData) containing TrainSet (IDataView) and TestSet (IDataView). |
Usage Examples
Basic Example
```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

// Initialize the context with a fixed seed
var mlContext = new MLContext(seed: 42);

// Load data from a CSV file
IDataView dataView = mlContext.Data.LoadFromTextFile<SentimentData>(
    path: "sentiment_data.csv",
    separatorChar: ',',
    hasHeader: true,
    allowQuoting: true);

// Split into roughly 80% train and 20% test
var splitData = mlContext.Data.TrainTestSplit(
    dataView,
    testFraction: 0.2);

IDataView trainData = splitData.TrainSet;
IDataView testData = splitData.TestSet;
// trainData and testData are ready for pipeline construction

// Input data schema. In a single-file program with top-level statements,
// type declarations must come after the statements.
public class SentimentData
{
    [LoadColumn(0)]
    public string SentimentText { get; set; }

    [LoadColumn(1), ColumnName("Label")]
    public bool Sentiment { get; set; }
}
```