Implementation:Apache Spark WriteAheadLog
| Knowledge Sources | |
|---|---|
| Domains | Streaming, Fault_Tolerance |
| Last Updated | 2026-02-08 22:00 GMT |
Overview
Abstract base class defining the write-ahead log (WAL) interface used by Spark Streaming for reliable storage of received data and metadata.
Description
⚠️ DEPRECATED: DStream-based Spark Streaming is deprecated. Use Structured Streaming for new applications. See Heuristic:Apache_Spark_Warning_Deprecated_DStream_Streaming.
WriteAheadLog is a Java abstract class annotated with `@DeveloperApi` that defines the contract for a journal (write-ahead log) used by Spark Streaming receivers. It provides methods to write records with time-indexed handles, read individual records or all unprocessed records, clean old entries, and close the log. Implementations must ensure that written data is durable and readable by the time `write()` returns. Users can plug in custom implementations for alternative storage backends beyond the default file-based implementation.
Usage
Extend this class when you need a custom write-ahead log backend for Spark Streaming (e.g., writing to a database or another distributed storage system instead of HDFS). Register the implementation via `spark.streaming.driver.writeAheadLog.class` (driver-side metadata log) and `spark.streaming.receiver.writeAheadLog.class` (receiver-side data log).
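As a sketch, both configuration keys can be set at submit time; `com.example.MyCustomWAL` and `my-streaming-app.jar` below are placeholder names, not part of Spark:

```shell
spark-submit \
  --conf spark.streaming.driver.writeAheadLog.class=com.example.MyCustomWAL \
  --conf spark.streaming.receiver.writeAheadLog.class=com.example.MyCustomWAL \
  --conf spark.streaming.receiver.writeAheadLog.enable=true \
  my-streaming-app.jar
```

Note that the receiver-side write-ahead log is only used when `spark.streaming.receiver.writeAheadLog.enable` is set to `true`.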
Code Reference
Source Location
- Repository: Apache_Spark
- File: streaming/.../WriteAheadLog.java
- Lines: 1-62
Signature
@DeveloperApi
public abstract class WriteAheadLog {
public abstract WriteAheadLogRecordHandle write(ByteBuffer record, long time);
public abstract ByteBuffer read(WriteAheadLogRecordHandle handle);
public abstract Iterator<ByteBuffer> readAll();
public abstract void clean(long threshTime, boolean waitForCompletion);
public abstract void close();
}
Import
import org.apache.spark.streaming.util.WriteAheadLog;
import org.apache.spark.streaming.util.WriteAheadLogRecordHandle;
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| record | ByteBuffer | Yes (write) | Binary data to persist |
| time | long | Yes (write) | Timestamp index for the record |
| handle | WriteAheadLogRecordHandle | Yes (read) | Handle referencing a previously written record |
| threshTime | long | Yes (clean) | Clean records older than this timestamp |
| waitForCompletion | boolean | Yes (clean) | Whether to block until cleanup finishes |
Outputs
| Source | Type | Description |
|---|---|---|
| write() | WriteAheadLogRecordHandle | Handle containing the information needed to read the record back |
| read() | ByteBuffer | The previously written record |
| readAll() | Iterator of ByteBuffer | All records not yet cleaned |
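The contract above can be exercised without Spark on the classpath. The following Spark-free sketch mirrors the WAL method shapes (the class name `WalContractDemo` and the use of a plain `long` as the handle are illustrative assumptions, not Spark API) to show the write → handle → read round-trip and the `clean()` threshold semantics:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.TreeMap;

// Spark-free stand-in mirroring the WAL contract; names are hypothetical.
public class WalContractDemo {
    // Time-indexed store; one record per timestamp for simplicity.
    private final TreeMap<Long, ByteBuffer> log = new TreeMap<>();

    // write() returns a handle; here the handle is just the time key.
    public long write(ByteBuffer record, long time) {
        log.put(time, record);
        return time;
    }

    // read() round-trips a single record via its handle.
    public ByteBuffer read(long handle) {
        return log.get(handle);
    }

    // readAll() yields all records not yet cleaned, oldest first.
    public Iterator<ByteBuffer> readAll() {
        return log.values().iterator();
    }

    // clean() drops records strictly older than the threshold time.
    public void clean(long threshTime) {
        log.headMap(threshTime).clear();
    }

    public static void main(String[] args) {
        WalContractDemo wal = new WalContractDemo();
        long h1 = wal.write(ByteBuffer.wrap("batch-1".getBytes(StandardCharsets.UTF_8)), 1000L);
        wal.write(ByteBuffer.wrap("batch-2".getBytes(StandardCharsets.UTF_8)), 2000L);
        // Read back the first record through its handle.
        System.out.println(StandardCharsets.UTF_8.decode(wal.read(h1).duplicate()));
        // Remove everything older than t=2000; only batch-2 survives.
        wal.clean(2000L);
        Iterator<ByteBuffer> it = wal.readAll();
        while (it.hasNext()) {
            System.out.println(StandardCharsets.UTF_8.decode(it.next().duplicate()));
        }
    }
}
```

In the real interface, the handle is an opaque `WriteAheadLogRecordHandle` rather than a `long`, so an implementation is free to encode a file path and offset, a database key, or any other storage reference.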
Usage Examples
Custom WriteAheadLog Implementation
import org.apache.spark.streaming.util.WriteAheadLog;
import org.apache.spark.streaming.util.WriteAheadLogRecordHandle;
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.concurrent.ConcurrentSkipListMap;

public class MyCustomWAL extends WriteAheadLog {
    // Handle that records where a record was stored; here, its timestamp key.
    private static class MyRecordHandle extends WriteAheadLogRecordHandle {
        final long time;
        MyRecordHandle(long time) { this.time = time; }
    }

    // In-memory map standing in for a durable backend (one record per
    // timestamp for simplicity). A real implementation must persist to
    // storage that survives failures, e.g. a database or distributed
    // filesystem.
    private final ConcurrentSkipListMap<Long, ByteBuffer> storage =
        new ConcurrentSkipListMap<>();

    @Override
    public WriteAheadLogRecordHandle write(ByteBuffer record, long time) {
        // Persist the record; it must be durable and readable by the time
        // this method returns.
        storage.put(time, record.duplicate());
        return new MyRecordHandle(time);
    }

    @Override
    public ByteBuffer read(WriteAheadLogRecordHandle handle) {
        // Read back the record referenced by the handle.
        return storage.get(((MyRecordHandle) handle).time).duplicate();
    }

    @Override
    public Iterator<ByteBuffer> readAll() {
        // Return all records not yet cleaned, oldest first.
        return storage.values().iterator();
    }

    @Override
    public void clean(long threshTime, boolean waitForCompletion) {
        // Delete records with timestamps older than threshTime.
        storage.headMap(threshTime).clear();
    }

    @Override
    public void close() {
        // Release resources held by the log.
        storage.clear();
    }
}