
Implementation:Apache Spark WriteAheadLog

From Leeroopedia


Knowledge Sources
Domains Streaming, Fault_Tolerance
Last Updated 2026-02-08 22:00 GMT

Overview

Abstract base class defining the write-ahead log (WAL) interface used by Spark Streaming for reliable storage of received data and metadata.

Description

⚠️ DEPRECATED: DStream-based Spark Streaming is deprecated. Use Structured Streaming for new applications. See Heuristic:Apache_Spark_Warning_Deprecated_DStream_Streaming.

WriteAheadLog is a Java abstract class annotated with `@DeveloperApi` that defines the contract for a journal (write-ahead log) used by Spark Streaming receivers. It provides methods to write records with time-indexed handles, read individual records or all unprocessed records, clean old entries, and close the log. Implementations must ensure that written data is durable and readable by the time `write()` returns. Users can plug in custom implementations for alternative storage backends beyond the default file-based implementation.

Usage

Extend this class when you need a custom write-ahead log backend for Spark Streaming (e.g., writing to a database or distributed storage system instead of HDFS). Register the implementation by setting `spark.streaming.driver.writeAheadLog.class` (the driver-side metadata WAL) and `spark.streaming.receiver.writeAheadLog.class` (the WAL for received data) to the fully qualified class name.
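A custom implementation could be registered through `SparkConf`, for example (the `com.example.MyCustomWAL` class name below is hypothetical):

```java
import org.apache.spark.SparkConf;

// Sketch: point both WAL configs at a hypothetical custom class.
// The driver setting covers the metadata WAL; the receiver setting
// covers the WAL for received data blocks.
SparkConf conf = new SparkConf()
    .set("spark.streaming.driver.writeAheadLog.class", "com.example.MyCustomWAL")
    .set("spark.streaming.receiver.writeAheadLog.class", "com.example.MyCustomWAL");
```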

Code Reference

Source Location

Signature

@DeveloperApi
public abstract class WriteAheadLog {

    public abstract WriteAheadLogRecordHandle write(ByteBuffer record, long time);

    public abstract ByteBuffer read(WriteAheadLogRecordHandle handle);

    public abstract Iterator<ByteBuffer> readAll();

    public abstract void clean(long threshTime, boolean waitForCompletion);

    public abstract void close();
}

Import

import org.apache.spark.streaming.util.WriteAheadLog;
import org.apache.spark.streaming.util.WriteAheadLogRecordHandle;

I/O Contract

Inputs

Name Type Required Description
record ByteBuffer Yes (write) Binary data to persist
time long Yes (write) Timestamp index for the record
handle WriteAheadLogRecordHandle Yes (read) Handle referencing a previously written record
threshTime long Yes (clean) Clean records older than this timestamp
waitForCompletion boolean Yes (clean) Whether to block until cleanup finishes

Outputs

Name Type Description
WriteAheadLogRecordHandle Handle Returned by write(); contains info to read back the record
ByteBuffer Data Returned by read(); the previously written record
Iterator of ByteBuffer Data sequence Returned by readAll(); all unprocessed records
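The write → handle → read → clean lifecycle described by this contract can be exercised with a self-contained stand-in (no Spark dependency; the class and method names below are local to this sketch, mirroring the WAL methods):

```java
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.concurrent.ConcurrentSkipListMap;

// Stand-in for a WAL: write() returns a handle (here, the time key),
// read() resolves it, clean() drops records older than a threshold.
public class WalLifecycleDemo {

    static class InMemoryWal {
        final ConcurrentSkipListMap<Long, ByteBuffer> store = new ConcurrentSkipListMap<>();

        long write(ByteBuffer record, long time) {
            store.put(time, record.duplicate());
            return time; // the "handle"
        }
        ByteBuffer read(long handle) { return store.get(handle); }
        Iterator<ByteBuffer> readAll() { return store.values().iterator(); }
        void clean(long threshTime) { store.headMap(threshTime).clear(); }
    }

    public static void main(String[] args) {
        InMemoryWal wal = new InMemoryWal();
        long h1 = wal.write(ByteBuffer.wrap("batch-1".getBytes()), 1000L);
        wal.write(ByteBuffer.wrap("batch-2".getBytes()), 2000L);

        // Read back the first record via its handle
        byte[] back = new byte[wal.read(h1).remaining()];
        wal.read(h1).duplicate().get(back);
        System.out.println(new String(back));

        // Clean everything older than t=1500; only batch-2 survives
        wal.clean(1500L);
        int remaining = 0;
        for (Iterator<ByteBuffer> it = wal.readAll(); it.hasNext(); it.next()) remaining++;
        System.out.println(remaining);
    }
}
```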

Usage Examples

Custom WriteAheadLog Implementation

import org.apache.spark.streaming.util.WriteAheadLog;
import org.apache.spark.streaming.util.WriteAheadLogRecordHandle;
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.concurrent.ConcurrentSkipListMap;

// Minimal sketch backed by an in-memory map. A real implementation must
// write to durable storage and guarantee the record is readable by the
// time write() returns.
public class MyCustomWAL extends WriteAheadLog {

    // Handle carrying the storage reference (here, the time key)
    // needed to read the record back.
    private static class MyRecordHandle extends WriteAheadLogRecordHandle {
        final long time;
        MyRecordHandle(long time) { this.time = time; }
    }

    // Time-indexed store standing in for the real storage backend.
    private final ConcurrentSkipListMap<Long, ByteBuffer> storage =
            new ConcurrentSkipListMap<>();

    @Override
    public WriteAheadLogRecordHandle write(ByteBuffer record, long time) {
        // Persist the record, then return a handle to read it back
        storage.put(time, record.duplicate());
        return new MyRecordHandle(time);
    }

    @Override
    public ByteBuffer read(WriteAheadLogRecordHandle handle) {
        // Read the record using the reference stored in the handle
        return storage.get(((MyRecordHandle) handle).time);
    }

    @Override
    public Iterator<ByteBuffer> readAll() {
        // Iterate over all records not yet cleaned
        return storage.values().iterator();
    }

    @Override
    public void clean(long threshTime, boolean waitForCompletion) {
        // Delete records older than threshTime; in-memory deletion is
        // synchronous, so waitForCompletion needs no special handling here
        storage.headMap(threshTime).clear();
    }

    @Override
    public void close() {
        // Release resources
        storage.clear();
    }
}
