Overview
The Read interface in Smile provides static methods for loading data from files into DataFrames. It is the primary entry point for all file-based data ingestion in the Smile library. The interface supports CSV, JSON, Parquet, Arrow (Feather), Avro, ARFF, SAS, and libsvm formats. Format detection is automatic based on file extension, with optional manual override.
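For example, auto-detection can be combined with the explicit-format overload when a file's extension does not match its contents. The sketch below is illustrative: the file paths are placeholders, and the exact keys accepted in the format string are an assumption that should be checked against the CSV reader's documentation.

```java
import smile.data.DataFrame;
import smile.io.Read;

public class ReadEntryPoints {
    public static void main(String[] args) throws Exception {
        // Auto-detection: the .csv extension dispatches to the CSV reader.
        DataFrame auto = Read.data("data/iris.csv");
        System.out.println(auto.schema());

        // Explicit override: pass a format hint for ambiguous extensions
        // (format-string keys here are assumed, not verified).
        DataFrame tsv = Read.data("data/export.txt", "delimiter=\t,header=true");
        System.out.println(tsv.nrow());
    }
}
```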
API Summary
| Method | Return Type | Description |
|--------|-------------|-------------|
| Read.data(String path) | DataFrame | Auto-detect format from extension and read |
| Read.data(String path, String format) | DataFrame | Read with explicit format specification |
| Read.csv(String path) | DataFrame | Read CSV with default format |
| Read.csv(String path, String format) | DataFrame | Read CSV with format string (e.g., "delimiter=\t,header=true") |
| Read.csv(String path, CSVFormat format) | DataFrame | Read CSV with Apache Commons CSVFormat |
| Read.csv(String path, CSVFormat format, StructType schema) | DataFrame | Read CSV with explicit schema |
| Read.json(String path) | DataFrame | Read JSON (single-line mode) |
| Read.json(String path, JSON.Mode mode, StructType schema) | DataFrame | Read JSON with mode and schema |
| Read.parquet(String path) | DataFrame | Read Apache Parquet file |
| Read.arrow(String path) | DataFrame | Read Apache Arrow / Feather file |
| Read.arff(String path) | DataFrame | Read Weka ARFF file |
| Read.sas(String path) | DataFrame | Read SAS7BDAT file |
| Read.avro(String path, String schema) | DataFrame | Read Apache Avro with schema file path |
| Read.object(Path path) | Object | Deserialize a Java object from file |
| Read.libsvm(String path) | SparseDataset&lt;Integer&gt; | Read libsvm sparse format |
Source Location
| Property | Value |
|----------|-------|
| File | base/src/main/java/smile/io/Read.java |
| Lines | L44-590 |
| Package | smile.io |
| Repository | github.com/haifengl/smile |
Import
```java
import smile.io.Read;
import smile.data.DataFrame;
import smile.data.type.StructType;
import org.apache.commons.csv.CSVFormat;
```
External Dependencies
| Dependency | Usage |
|------------|-------|
| Apache Commons CSV | Parsing CSV/TSV files with configurable delimiters, quotes, and headers |
| Apache Parquet | Reading columnar Parquet files |
| Apache Arrow | Reading Arrow IPC / Feather files |
| Apache Avro | Reading Avro serialized files with external schema |
Signature
```java
public interface Read {
    // Auto-detection by file extension
    static DataFrame data(String path) throws Exception
    static DataFrame data(String path, String format) throws Exception

    // CSV readers
    static DataFrame csv(String path) throws IOException, URISyntaxException
    static DataFrame csv(String path, String format) throws IOException, URISyntaxException
    static DataFrame csv(String path, CSVFormat format) throws IOException, URISyntaxException
    static DataFrame csv(String path, CSVFormat format, StructType schema) throws IOException, URISyntaxException
    static DataFrame csv(Path path) throws IOException
    static DataFrame csv(Path path, CSVFormat format) throws IOException
    static DataFrame csv(Path path, CSVFormat format, StructType schema) throws IOException

    // JSON readers
    static DataFrame json(String path) throws IOException, URISyntaxException
    static DataFrame json(String path, JSON.Mode mode, StructType schema) throws IOException, URISyntaxException
    static DataFrame json(Path path) throws IOException
    static DataFrame json(Path path, JSON.Mode mode, StructType schema) throws IOException

    // Binary format readers
    static DataFrame parquet(String uri) throws Exception
    static DataFrame parquet(Path path) throws Exception
    static DataFrame arrow(String path) throws IOException, URISyntaxException
    static DataFrame arrow(Path path) throws IOException
    static DataFrame arff(String path) throws IOException, ParseException, URISyntaxException
    static DataFrame arff(Path path) throws IOException, ParseException
    static DataFrame sas(String path) throws IOException, URISyntaxException
    static DataFrame sas(Path path) throws IOException
    static DataFrame avro(String path, String schema) throws IOException, URISyntaxException
    static DataFrame avro(String path, InputStream schema) throws IOException, URISyntaxException
    static DataFrame avro(Path path, InputStream schema) throws IOException
    static DataFrame avro(Path path, Path schema) throws IOException

    // Object deserialization
    static Object object(Path path) throws IOException, ClassNotFoundException

    // Sparse format
    static SparseDataset<Integer> libsvm(String path) throws IOException, URISyntaxException
    static SparseDataset<Integer> libsvm(Path path) throws IOException
    static SparseDataset<Integer> libsvm(BufferedReader reader) throws IOException
}
```
Inputs and Outputs
| Parameter | Type | Description |
|-----------|------|-------------|
| path | String or Path | File path or URI to the data file |
| format | String, CSVFormat, or JSON.Mode | Optional format specification |
| schema | StructType or InputStream | Optional data schema (column names and types) |
| Returns | DataFrame | Unified in-memory tabular data structure |
Format Detection Logic
The Read.data() method extracts the file extension and dispatches to the appropriate reader:
```java
// From Read.java -- format detection switch
String ext = path.substring(path.lastIndexOf(".") + 1);
switch (ext) {
    case "dat":
    case "txt":
    case "csv":      return csv(path, format);
    case "arff":     return arff(path);
    case "json":     return json(path, mode, null);
    case "sas7bdat": return sas(path);
    case "avro":     return avro(path, format);
    case "parquet":  return parquet(path);
    case "feather":  return arrow(path);
}
```
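The dispatch logic can be exercised in isolation. The helper below is a standalone sketch that mirrors the extension-to-reader mapping shown above; the class and method names are mine, not part of Smile, and the real Read.data additionally forwards format and mode arguments.

```java
// Standalone illustration of extension-based format dispatch (not Smile code).
public class FormatDetect {
    static String detect(String path) {
        // Everything after the last dot, lower-cased, selects the reader.
        String ext = path.substring(path.lastIndexOf('.') + 1).toLowerCase();
        switch (ext) {
            case "dat":
            case "txt":
            case "csv":      return "csv";
            case "arff":     return "arff";
            case "json":     return "json";
            case "sas7bdat": return "sas";
            case "avro":     return "avro";
            case "parquet":  return "parquet";
            case "feather":  return "arrow";
            default:
                throw new IllegalArgumentException("Unsupported extension: " + ext);
        }
    }

    public static void main(String[] args) {
        System.out.println(detect("data/iris.csv"));      // prints csv
        System.out.println(detect("data/sales.parquet")); // prints parquet
    }
}
```

Note that unknown extensions raise an error here; whether Smile throws or falls back for unrecognized extensions should be confirmed against Read.java.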
Usage Examples
Example 1: Auto-detect format and load
```java
import smile.io.Read;
import smile.data.DataFrame;

// Auto-detect: .csv extension -> CSV reader
DataFrame iris = Read.data("data/iris.csv");
System.out.println(iris.schema());
System.out.println(iris);

// Auto-detect: .arff extension -> ARFF reader
DataFrame weather = Read.data("data/weather.arff");

// Auto-detect: .parquet extension -> Parquet reader
DataFrame sales = Read.data("data/sales.parquet");
```
Example 2: CSV with custom format string
```java
import smile.io.Read;
import smile.data.DataFrame;

// Tab-separated file with header row and comment lines
DataFrame data = Read.csv("data/gene_expression.tsv",
        "delimiter=\t,header=true,comment=#");
System.out.println("Columns: " + String.join(", ", data.names()));
System.out.println("Rows: " + data.nrow());
```
Example 3: CSV with Apache Commons CSVFormat and explicit schema
```java
import smile.io.Read;
import smile.data.DataFrame;
import smile.data.type.DataTypes;
import smile.data.type.StructField;
import smile.data.type.StructType;
import org.apache.commons.csv.CSVFormat;

// Define explicit schema
StructType schema = new StructType(
        new StructField("sepal_length", DataTypes.DoubleType),
        new StructField("sepal_width", DataTypes.DoubleType),
        new StructField("petal_length", DataTypes.DoubleType),
        new StructField("petal_width", DataTypes.DoubleType),
        new StructField("species", DataTypes.StringType)
);

CSVFormat format = CSVFormat.Builder.create()
        .setHeader()
        .setSkipHeaderRecord(true)
        .build();

DataFrame iris = Read.csv("data/iris.csv", format, schema);
System.out.println(iris.head(5));
```
Example 4: JSON in single-line and multi-line modes
```java
import smile.io.Read;
import smile.io.JSON;
import smile.data.DataFrame;

// Single-line JSON (one JSON object per line)
DataFrame logs = Read.json("data/access_logs.json");

// Multi-line JSON (entire file is one JSON array)
DataFrame config = Read.json("data/config.json",
        JSON.Mode.MULTI_LINE, null);
```
Example 5: Reading Avro with external schema
```java
import smile.io.Read;
import smile.data.DataFrame;

// Avro file requires a separate schema file
DataFrame events = Read.avro("data/events.avro",
        "data/events.avsc");
System.out.println("Schema: " + events.schema());
System.out.println("Records: " + events.nrow());
```
Example 6: Loading a libsvm sparse dataset
```java
import smile.io.Read;
import smile.data.SparseDataset;

// libsvm format: <label> <index>:<value> ...
SparseDataset<Integer> dataset = Read.libsvm("data/svmguide1.txt");
System.out.println("Samples: " + dataset.size());
```
Implementation Details
The Read interface delegates to format-specific reader classes:
| Format | Reader Class | Key Implementation Detail |
|--------|--------------|---------------------------|
| CSV | smile.io.CSV | Wraps Apache Commons CSV; infers types by scanning values |
| JSON | smile.io.JSON | Supports single-line (JSON Lines) and multi-line (JSON array) modes |
| Parquet | smile.io.Parquet | Uses Apache Parquet library; reads columnar data natively |
| Arrow | smile.io.Arrow | Uses Apache Arrow IPC reader; zero-copy when possible |
| ARFF | smile.io.Arff | Parses Weka ARFF header for attribute types; supports nominal, numeric, string, date |
| SAS | smile.io.SAS | Reads SAS7BDAT binary format |
| Avro | smile.io.Avro | Requires external Avro schema (JSON format) |
All readers accept both String (URI/path) and java.nio.file.Path overloads. The String variants support classpath resources and URIs via the internal Input utility class.
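The two overload families can be used interchangeably for local files, as sketched below. The file path is a placeholder; the claim that the String form also resolves URIs and classpath resources comes from the paragraph above and is not demonstrated here.

```java
import java.nio.file.Paths;
import smile.data.DataFrame;
import smile.io.Read;

public class OverloadDemo {
    public static void main(String[] args) throws Exception {
        // String overload: plain path (per the Input utility, URIs and
        // classpath resources are also accepted).
        DataFrame fromString = Read.csv("data/iris.csv");

        // Path overload: takes an already-resolved java.nio.file.Path.
        DataFrame fromPath = Read.csv(Paths.get("data/iris.csv"));

        // Both overloads read the same data.
        System.out.println(fromString.nrow() == fromPath.nrow());
    }
}
```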
Metadata
| Property | Value |
|----------|-------|
| Type | API Doc |
| Language | Java |
| Library Version | 5.2.0 |
| Last Updated | 2026-02-08 22:00 GMT |