Implementation:Haifengl Smile Read Data


Overview

The Read interface in Smile provides static methods for loading data from files. It is the primary entry point for file-based data ingestion in the library. The readers for CSV, JSON, Parquet, Arrow (Feather), Avro, ARFF, and SAS return a DataFrame, while the libsvm reader returns a SparseDataset<Integer>. Read.data() detects the format automatically from the file extension, and an optional format argument allows a manual override.

API Summary

Method | Return Type | Description
Read.data(String path) | DataFrame | Auto-detect format from extension and read
Read.data(String path, String format) | DataFrame | Read with explicit format specification
Read.csv(String path) | DataFrame | Read CSV with default format
Read.csv(String path, String format) | DataFrame | Read CSV with format string (e.g., "delimiter=\t,header=true")
Read.csv(String path, CSVFormat format) | DataFrame | Read CSV with Apache Commons CSVFormat
Read.csv(String path, CSVFormat format, StructType schema) | DataFrame | Read CSV with explicit schema
Read.json(String path) | DataFrame | Read JSON (single-line mode)
Read.json(String path, JSON.Mode mode, StructType schema) | DataFrame | Read JSON with mode and schema
Read.parquet(String path) | DataFrame | Read Apache Parquet file
Read.arrow(String path) | DataFrame | Read Apache Arrow / Feather file
Read.arff(String path) | DataFrame | Read Weka ARFF file
Read.sas(String path) | DataFrame | Read SAS7BDAT file
Read.avro(String path, String schema) | DataFrame | Read Apache Avro with schema file path
Read.object(Path path) | Object | Deserialize a Java object from file
Read.libsvm(String path) | SparseDataset<Integer> | Read libsvm sparse format

Source Location

Property | Value
File | base/src/main/java/smile/io/Read.java
Lines | L44-590
Package | smile.io
Repository | github.com/haifengl/smile

Import

import smile.io.Read;
import smile.data.DataFrame;
import smile.data.type.StructType;
import org.apache.commons.csv.CSVFormat;

External Dependencies

Dependency | Usage
Apache Commons CSV | Parsing CSV/TSV files with configurable delimiters, quotes, and headers
Apache Parquet | Reading columnar Parquet files
Apache Arrow | Reading Arrow IPC / Feather files
Apache Avro | Reading Avro serialized files with external schema

Type: API Doc

Signature

public interface Read {
    // Auto-detection by file extension
    static DataFrame data(String path) throws Exception
    static DataFrame data(String path, String format) throws Exception

    // CSV readers
    static DataFrame csv(String path) throws IOException, URISyntaxException
    static DataFrame csv(String path, String format) throws IOException, URISyntaxException
    static DataFrame csv(String path, CSVFormat format) throws IOException, URISyntaxException
    static DataFrame csv(String path, CSVFormat format, StructType schema) throws IOException, URISyntaxException
    static DataFrame csv(Path path) throws IOException
    static DataFrame csv(Path path, CSVFormat format) throws IOException
    static DataFrame csv(Path path, CSVFormat format, StructType schema) throws IOException

    // JSON readers
    static DataFrame json(String path) throws IOException, URISyntaxException
    static DataFrame json(String path, JSON.Mode mode, StructType schema) throws IOException, URISyntaxException
    static DataFrame json(Path path) throws IOException
    static DataFrame json(Path path, JSON.Mode mode, StructType schema) throws IOException

    // Other format readers (Parquet, Arrow, ARFF, SAS, Avro)
    static DataFrame parquet(String uri) throws Exception
    static DataFrame parquet(Path path) throws Exception
    static DataFrame arrow(String path) throws IOException, URISyntaxException
    static DataFrame arrow(Path path) throws IOException
    static DataFrame arff(String path) throws IOException, ParseException, URISyntaxException
    static DataFrame arff(Path path) throws IOException, ParseException
    static DataFrame sas(String path) throws IOException, URISyntaxException
    static DataFrame sas(Path path) throws IOException
    static DataFrame avro(String path, String schema) throws IOException, URISyntaxException
    static DataFrame avro(String path, InputStream schema) throws IOException, URISyntaxException
    static DataFrame avro(Path path, InputStream schema) throws IOException
    static DataFrame avro(Path path, Path schema) throws IOException

    // Object deserialization
    static Object object(Path path) throws IOException, ClassNotFoundException

    // Sparse format
    static SparseDataset<Integer> libsvm(String path) throws IOException, URISyntaxException
    static SparseDataset<Integer> libsvm(Path path) throws IOException
    static SparseDataset<Integer> libsvm(BufferedReader reader) throws IOException
}

Inputs and Outputs

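Parameter | Type | Description
path | String or Path | File path or URI to the data file
format | String, CSVFormat, or JSON.Mode | Optional format specification
schema | StructType (CSV, JSON); String, Path, or InputStream (Avro) | Data schema (column names and types); optional for CSV and JSON, required for Avro
Returns | DataFrame | Unified in-memory tabular data structure; Read.libsvm returns SparseDataset<Integer> and Read.object returns Object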

Format Detection Logic

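The Read.data() method extracts the file extension and dispatches to the appropriate reader. The excerpt below is abridged: it does not show how the format and mode variables are derived or how unrecognized extensions are handled. The switch has no libsvm case, so libsvm files must be loaded with Read.libsvm() directly.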

// From Read.java -- format detection switch
String ext = path.substring(path.lastIndexOf(".") + 1);
switch (ext) {
    case "dat":
    case "txt":
    case "csv": return csv(path, format);
    case "arff": return arff(path);
    case "json": return json(path, mode, null);
    case "sas7bdat": return sas(path);
    case "avro": return avro(path, format);
    case "parquet": return parquet(path);
    case "feather": return arrow(path);
}
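
The extension-based dispatch above can be sketched as a standalone helper that needs no Smile dependency. This is only an illustration under stated assumptions: the class name FormatDetector and the method detect are hypothetical, and the case-insensitive comparison and the exception for unknown extensions are choices made here, not behavior confirmed for Read.data().

```java
import java.util.Locale;

// Hypothetical helper, not part of Smile: maps a file name to a format label
// using the same extension groups as the switch shown above.
class FormatDetector {
    static String detect(String path) {
        int dot = path.lastIndexOf('.');
        if (dot < 0 || dot == path.length() - 1) {
            throw new IllegalArgumentException("No file extension: " + path);
        }
        // Assumption: extensions are compared case-insensitively.
        String ext = path.substring(dot + 1).toLowerCase(Locale.ROOT);
        switch (ext) {
            case "dat":
            case "txt":
            case "csv":      return "csv";
            case "arff":     return "arff";
            case "json":     return "json";
            case "sas7bdat": return "sas";
            case "avro":     return "avro";
            case "parquet":  return "parquet";
            case "feather":  return "arrow";
            default:
                // Assumption: unknown extensions are rejected rather than guessed.
                throw new IllegalArgumentException("Unsupported extension: " + ext);
        }
    }

    public static void main(String[] args) {
        System.out.println(detect("data/iris.csv"));
        System.out.println(detect("data/sales.parquet"));
        System.out.println(detect("data/table.feather"));
    }
}
```

Returning a label rather than calling a reader keeps the sketch runnable without data files. A real dispatcher would call the matching Read method in each branch.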

Usage Examples

Example 1: Auto-detect format and load

import smile.io.Read;
import smile.data.DataFrame;

// Auto-detect: .csv extension -> CSV reader
DataFrame iris = Read.data("data/iris.csv");
System.out.println(iris.schema());
System.out.println(iris);

// Auto-detect: .arff extension -> ARFF reader
DataFrame weather = Read.data("data/weather.arff");

// Auto-detect: .parquet extension -> Parquet reader
DataFrame sales = Read.data("data/sales.parquet");

Example 2: CSV with custom format string

import smile.io.Read;
import smile.data.DataFrame;

// Tab-separated file with header row and comment lines
DataFrame data = Read.csv("data/gene_expression.tsv",
    "delimiter=\t,header=true,comment=#");

System.out.println("Columns: " + String.join(", ", data.names()));
System.out.println("Rows: " + data.nrow());
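
To make the structure of the format string concrete, the sketch below splits a "key=value,key=value" string into an options map. It is not Smile's parser: the class CsvOptions and its parse method are hypothetical, and the real Read.csv may treat whitespace, escapes, or unknown keys differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser, not Smile's own: splits "key=value,key=value" options.
class CsvOptions {
    static Map<String, String> parse(String format) {
        Map<String, String> options = new LinkedHashMap<>();
        for (String token : format.split(",")) {
            int eq = token.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value, got: " + token);
            }
            // Keys are trimmed; values are kept verbatim so a tab delimiter survives.
            options.put(token.substring(0, eq).trim(), token.substring(eq + 1));
        }
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> opts = parse("delimiter=\t,header=true,comment=#");
        System.out.println("header = " + opts.get("header"));
        System.out.println("comment = " + opts.get("comment"));
        System.out.println("delimiter is tab: " + "\t".equals(opts.get("delimiter")));
    }
}
```

Because this naive split uses the comma as the option separator, a comma cannot appear as an option value here. When that matters, the CSVFormat overload shown in the next example avoids the issue.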

Example 3: CSV with Apache Commons CSVFormat and explicit schema

import smile.io.Read;
import smile.data.DataFrame;
import smile.data.type.DataTypes;
import smile.data.type.StructField;
import smile.data.type.StructType;
import org.apache.commons.csv.CSVFormat;

// Define explicit schema
StructType schema = new StructType(
    new StructField("sepal_length", DataTypes.DoubleType),
    new StructField("sepal_width", DataTypes.DoubleType),
    new StructField("petal_length", DataTypes.DoubleType),
    new StructField("petal_width", DataTypes.DoubleType),
    new StructField("species", DataTypes.StringType)
);

CSVFormat format = CSVFormat.Builder.create()
    .setHeader()
    .setSkipHeaderRecord(true)
    .build();

DataFrame iris = Read.csv("data/iris.csv", format, schema);
System.out.println(iris.head(5));

Example 4: JSON in single-line and multi-line modes

import smile.io.Read;
import smile.io.JSON;
import smile.data.DataFrame;

// Single-line JSON (one JSON object per line)
DataFrame logs = Read.json("data/access_logs.json");

// Multi-line JSON (entire file is one JSON array)
DataFrame config = Read.json("data/config.json",
    JSON.Mode.MULTI_LINE, null);

Example 5: Reading Avro with external schema

import smile.io.Read;
import smile.data.DataFrame;

// Avro file requires a separate schema file
DataFrame events = Read.avro("data/events.avro",
    "data/events.avsc");

System.out.println("Schema: " + events.schema());
System.out.println("Records: " + events.nrow());

Example 6: Loading a libsvm sparse dataset

import smile.io.Read;
import smile.data.SparseDataset;

// libsvm format: <label> <index>:<value> ...
SparseDataset<Integer> dataset = Read.libsvm("data/svmguide1.txt");
System.out.println("Samples: " + dataset.size());
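
For readers unfamiliar with the libsvm text layout, the following sketch parses one line of the form <label> <index>:<value> ... into a label and a sparse feature map. It is an illustration only: the class LibsvmLine is hypothetical, integer labels are assumed to match SparseDataset<Integer>, and Smile's own reader may handle indices and malformed input differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the libsvm text layout; not Smile's parser.
class LibsvmLine {
    final int label;
    final Map<Integer, Double> features = new LinkedHashMap<>();

    LibsvmLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        // Assumption: the label is an integer class label.
        label = Integer.parseInt(tokens[0]);
        for (int i = 1; i < tokens.length; i++) {
            String[] pair = tokens[i].split(":");
            if (pair.length != 2) {
                throw new IllegalArgumentException("Expected index:value, got: " + tokens[i]);
            }
            // Indices are stored exactly as written in the file.
            features.put(Integer.parseInt(pair[0]), Double.parseDouble(pair[1]));
        }
    }

    public static void main(String[] args) {
        LibsvmLine row = new LibsvmLine("1 3:0.5 7:2.0");
        System.out.println("label = " + row.label);
        System.out.println("features = " + row.features);
    }
}
```

Only the listed indices are stored, which is what makes the format suitable for sparse data: every feature not mentioned on a line is implicitly zero.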

Implementation Details

The Read interface delegates to format-specific reader classes:

Format | Reader Class | Key Implementation Detail
CSV | smile.io.CSV | Wraps Apache Commons CSV; infers types by scanning values
JSON | smile.io.JSON | Supports single-line (JSON Lines) and multi-line (JSON array) modes
Parquet | smile.io.Parquet | Uses Apache Parquet library; reads columnar data natively
Arrow | smile.io.Arrow | Uses Apache Arrow IPC reader; zero-copy when possible
ARFF | smile.io.Arff | Parses Weka ARFF header for attribute types; supports nominal, numeric, string, date
SAS | smile.io.SAS | Reads SAS7BDAT binary format
Avro | smile.io.Avro | Requires external Avro schema (JSON format)

Each format-specific reader has both a String (path or URI) overload and a java.nio.file.Path overload. Read.data() takes only a String, and Read.object() takes only a Path. The String variants support classpath resources and URIs via the internal Input utility class.
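
As a rough picture of why a String argument can name more than a local file, the sketch below classifies a path string by its URI scheme. It is not the Input class: PathKind and classify are hypothetical names, and the "classpath" scheme is an assumed convention used only for illustration.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical classifier, not Smile's Input class.
class PathKind {
    static String classify(String path) {
        // URI.create throws IllegalArgumentException (wrapping URISyntaxException)
        // when the string is not a valid URI, e.g. when it contains a space.
        URI uri = URI.create(path);
        String scheme = uri.getScheme();
        if (scheme == null) {
            return "local file"; // plain relative or absolute path
        }
        switch (scheme.toLowerCase(Locale.ROOT)) {
            case "file":      return "local file";
            case "classpath": return "classpath resource"; // assumed scheme name
            default:          return "remote URI";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("data/iris.csv"));
        System.out.println(classify("file:///tmp/iris.csv"));
        System.out.println(classify("https://example.com/iris.csv"));
    }
}
```

Parsing the string as a URI is also a plausible reason the String overloads declare URISyntaxException while the Path overloads do not. The sketch does not handle Windows drive-letter paths, which are not valid URIs.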

Metadata

Property | Value
Type | API Doc
Language | Java
Library Version | 5.2.0
Last Updated 2026-02-08 22:00 GMT
