
Implementation:Apache Spark WriteAheadLog

From Leeroopedia


Knowledge Sources
Domains Streaming, Fault_Tolerance
Last Updated 2026-02-08 22:00 GMT

Overview

Abstract base class defining the write-ahead log (WAL) interface used by Spark Streaming for reliable storage of received data and metadata.

Description

⚠️ DEPRECATED: DStream-based Spark Streaming is deprecated. Use Structured Streaming for new applications. See Heuristic:Apache_Spark_Warning_Deprecated_DStream_Streaming.

WriteAheadLog is a Java abstract class annotated with `@DeveloperApi` that defines the contract for a journal (write-ahead log) used by Spark Streaming receivers. It provides methods to write records with time-indexed handles, read individual records or all unprocessed records, clean old entries, and close the log. Implementations must ensure that written data is durable and readable by the time `write()` returns. Users can plug in custom implementations for alternative storage backends beyond the default file-based implementation.

Usage

Extend this class when you need a custom write-ahead log backend for Spark Streaming (e.g., writing to a database or distributed storage system instead of HDFS). Register the implementation by setting `spark.streaming.driver.writeAheadLog.class` (the driver-side metadata WAL) and `spark.streaming.receiver.writeAheadLog.class` (the WAL for received data) to the fully qualified class name.
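A custom implementation could be registered through `SparkConf`, for example (the `com.example.MyCustomWAL` class name below is hypothetical):

```java
import org.apache.spark.SparkConf;

// Sketch: point both WAL configs at a hypothetical custom class.
// The driver setting covers the metadata WAL; the receiver setting
// covers the WAL for received data blocks.
SparkConf conf = new SparkConf()
    .set("spark.streaming.driver.writeAheadLog.class", "com.example.MyCustomWAL")
    .set("spark.streaming.receiver.writeAheadLog.class", "com.example.MyCustomWAL");
```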

Code Reference

Source Location

Signature

@DeveloperApi
public abstract class WriteAheadLog {

    public abstract WriteAheadLogRecordHandle write(ByteBuffer record, long time);

    public abstract ByteBuffer read(WriteAheadLogRecordHandle handle);

    public abstract Iterator<ByteBuffer> readAll();

    public abstract void clean(long threshTime, boolean waitForCompletion);

    public abstract void close();
}

Import

import org.apache.spark.streaming.util.WriteAheadLog;
import org.apache.spark.streaming.util.WriteAheadLogRecordHandle;

I/O Contract

Inputs

Name Type Required Description
record ByteBuffer Yes (write) Binary data to persist
time long Yes (write) Timestamp index for the record
handle WriteAheadLogRecordHandle Yes (read) Handle referencing a previously written record
threshTime long Yes (clean) Clean records older than this timestamp
waitForCompletion boolean Yes (clean) Whether to block until cleanup finishes

Outputs

Name Type Description
WriteAheadLogRecordHandle Handle Returned by write(); contains info to read back the record
ByteBuffer Data Returned by read(); the previously written record
Iterator of ByteBuffer Data sequence Returned by readAll(); all unprocessed records
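The write → handle → read → clean lifecycle described by this contract can be exercised with a self-contained stand-in (no Spark dependency; the class and method names below are local to this sketch, mirroring the WAL methods):

```java
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.concurrent.ConcurrentSkipListMap;

// Stand-in for a WAL: write() returns a handle (here, the time key),
// read() resolves it, clean() drops records older than a threshold.
public class WalLifecycleDemo {

    static class InMemoryWal {
        final ConcurrentSkipListMap<Long, ByteBuffer> store = new ConcurrentSkipListMap<>();

        long write(ByteBuffer record, long time) {
            store.put(time, record.duplicate());
            return time; // the "handle"
        }
        ByteBuffer read(long handle) { return store.get(handle); }
        Iterator<ByteBuffer> readAll() { return store.values().iterator(); }
        void clean(long threshTime) { store.headMap(threshTime).clear(); }
    }

    public static void main(String[] args) {
        InMemoryWal wal = new InMemoryWal();
        long h1 = wal.write(ByteBuffer.wrap("batch-1".getBytes()), 1000L);
        wal.write(ByteBuffer.wrap("batch-2".getBytes()), 2000L);

        // Read back the first record via its handle
        byte[] back = new byte[wal.read(h1).remaining()];
        wal.read(h1).duplicate().get(back);
        System.out.println(new String(back));

        // Clean everything older than t=1500; only batch-2 survives
        wal.clean(1500L);
        int remaining = 0;
        for (Iterator<ByteBuffer> it = wal.readAll(); it.hasNext(); it.next()) remaining++;
        System.out.println(remaining);
    }
}
```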

Usage Examples

Custom WriteAheadLog Implementation

import org.apache.spark.streaming.util.WriteAheadLog;
import org.apache.spark.streaming.util.WriteAheadLogRecordHandle;
import java.nio.ByteBuffer;
import java.util.Iterator;
import java.util.concurrent.ConcurrentSkipListMap;

// Minimal sketch backed by an in-memory map. A real implementation must
// write to durable storage and guarantee the record is readable by the
// time write() returns.
public class MyCustomWAL extends WriteAheadLog {

    // Handle carrying the storage reference (here, the time key)
    // needed to read the record back.
    private static class MyRecordHandle extends WriteAheadLogRecordHandle {
        final long time;
        MyRecordHandle(long time) { this.time = time; }
    }

    // Time-indexed store standing in for the real storage backend.
    private final ConcurrentSkipListMap<Long, ByteBuffer> storage =
            new ConcurrentSkipListMap<>();

    @Override
    public WriteAheadLogRecordHandle write(ByteBuffer record, long time) {
        // Persist the record, then return a handle to read it back
        storage.put(time, record.duplicate());
        return new MyRecordHandle(time);
    }

    @Override
    public ByteBuffer read(WriteAheadLogRecordHandle handle) {
        // Read the record using the reference stored in the handle
        return storage.get(((MyRecordHandle) handle).time);
    }

    @Override
    public Iterator<ByteBuffer> readAll() {
        // Iterate over all records not yet cleaned
        return storage.values().iterator();
    }

    @Override
    public void clean(long threshTime, boolean waitForCompletion) {
        // Delete records older than threshTime; in-memory deletion is
        // synchronous, so waitForCompletion needs no special handling here
        storage.headMap(threshTime).clear();
    }

    @Override
    public void close() {
        // Release resources
        storage.clear();
    }
}
