
Implementation:Apache Hudi StreamerUtil ValidateWriteStatus

From Leeroopedia


Knowledge Sources
Domains Data_Lake, Stream_Processing
Last Updated 2026-02-08 00:00 GMT

Overview

A concrete tool, provided by Apache Hudi, for validating write-operation results and checking commit existence on the Hudi timeline.

Description

StreamerUtil.validateWriteStatus() inspects a list of WriteStatus objects returned from a Hudi write operation and, when fail-fast mode is enabled, immediately throws a HoodieException if any write errors are found. This early detection prevents data loss by failing the Flink checkpoint before the data buffer is cleared.

StreamerUtil.haveSuccessfulCommits() queries the Hudi active timeline to check whether any completed commit instants exist, providing a simple health check for the table.

Together, these methods form the verification layer of the streaming write pipeline:

  • validateWriteStatus() is called after each bucket flush within the StreamWriteFunction (or append write function) to validate write outcomes before the write metadata event is sent to the coordinator.
  • haveSuccessfulCommits() is used during startup and recovery to determine whether the table has committed data.
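The commit-existence check reduces to asking whether the commits timeline contains any completed instant. A minimal sketch under that assumption, with InstantStub as a hypothetical stand-in for a Hudi timeline instant (the real method filters the table's commits timeline, which also includes delta commits, for completed instants):

```java
import java.util.List;

// Sketch of the commit-existence check with a stand-in timeline; the real
// method inspects the Hudi active timeline via HoodieTableMetaClient.
public class CommitCheckSketch {

  // Hypothetical stand-in for a timeline instant: action name plus completion flag
  record InstantStub(String action, boolean completed) {}

  // True when at least one completed commit exists on the timeline.
  static boolean haveSuccessfulCommits(List<InstantStub> timeline) {
    return timeline.stream()
        .anyMatch(i -> i.completed() && "commit".equals(i.action()));
  }

  public static void main(String[] args) {
    List<InstantStub> timeline = List.of(
        new InstantStub("commit", false),   // inflight instant, does not count
        new InstantStub("commit", true));   // completed commit counts
    System.out.println(haveSuccessfulCommits(timeline)); // prints true
  }
}
```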

The validation logic:

  1. Checks whether WRITE_FAIL_FAST is enabled in the configuration.
  2. If enabled, streams through the WriteStatus list looking for any status with non-empty errors.
  3. On finding the first error, extracts the HoodieKey and the associated Throwable from the errors map.
  4. Throws a HoodieException with the key, instant time, and root cause, which causes the Flink task to fail and triggers checkpoint failure.
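The steps above can be sketched in plain Java with a hypothetical StatusStub standing in for Hudi's WriteStatus; the real method reads WRITE_FAIL_FAST from the Flink Configuration and throws HoodieException rather than RuntimeException:

```java
import java.util.List;
import java.util.Map;

// Sketch of the fail-fast validation logic with stand-in types; the real
// method operates on org.apache.hudi.client.WriteStatus objects.
public class FailFastValidatorSketch {

  // Hypothetical stand-in for WriteStatus: record key -> error cause
  record StatusStub(Map<String, Throwable> errors) {
    boolean hasErrors() { return !errors.isEmpty(); }
  }

  static void validate(boolean failFast, String instant, List<StatusStub> statuses) {
    if (!failFast) {
      return;                                   // step 1: fail-fast disabled, skip
    }
    for (StatusStub status : statuses) {        // step 2: scan the status list
      if (status.hasErrors()) {
        Map.Entry<String, Throwable> first =
            status.errors().entrySet().iterator().next();  // step 3: first error
        // step 4: surface the key, instant time, and root cause
        throw new RuntimeException(
            "Write failed for key " + first.getKey()
                + " at instant " + instant, first.getValue());
      }
    }
  }

  public static void main(String[] args) {
    // A clean status list passes silently
    validate(true, "20260208120000000", List.of(new StatusStub(Map.of())));
    System.out.println("clean batch validated");
  }
}
```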

Usage

Use validateWriteStatus() after every write operation in the streaming pipeline to detect failures early. Use haveSuccessfulCommits() at startup or during recovery to check table state.

Code Reference

Source Location

  • Repository: Apache Hudi
  • File: hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/StreamerUtil.java
  • Lines: 719-731 (validateWriteStatus), 614-616 (haveSuccessfulCommits)

Signature

public static void validateWriteStatus(
    Configuration config,
    String currentInstant,
    List<WriteStatus> writeStatusList) throws HoodieException

public static boolean haveSuccessfulCommits(HoodieTableMetaClient metaClient)

Import

import org.apache.flink.configuration.Configuration;
import org.apache.hudi.configuration.FlinkOptions;
import org.apache.hudi.util.StreamerUtil;
import org.apache.hudi.client.WriteStatus;
import org.apache.hudi.common.table.HoodieTableMetaClient;
import org.apache.hudi.exception.HoodieException;

I/O Contract

Inputs

Name Type Required Description
config Configuration Yes Flink Configuration; must contain WRITE_FAIL_FAST boolean option
currentInstant String Yes The current Hudi instant time string (e.g., "20260208120000000")
writeStatusList List<WriteStatus> Yes List of WriteStatus objects returned from the write client after flushing a batch of records
metaClient (for haveSuccessfulCommits) HoodieTableMetaClient Yes Meta client for the Hudi table to check timeline

Outputs

Name Type Description
(exception) HoodieException Thrown if WRITE_FAIL_FAST is true and any WriteStatus contains errors; includes the failing HoodieKey and root cause
return (haveSuccessfulCommits) boolean True if the table's commits timeline contains at least one completed instant; false otherwise

Usage Examples

// Validate write statuses after flushing a bucket
Configuration conf = new Configuration();
conf.set(FlinkOptions.WRITE_FAIL_FAST, true);

List<WriteStatus> statuses = writeClient.upsert(records, bucketInfo, instant);

// This will throw HoodieException immediately if any record failed to write
StreamerUtil.validateWriteStatus(conf, instant, statuses);

// Check if the table has any committed data
HoodieTableMetaClient metaClient = StreamerUtil.createMetaClient(conf);
boolean hasCommits = StreamerUtil.haveSuccessfulCommits(metaClient);
if (!hasCommits) {
    log.info("Table has no committed data yet; this is the first write.");
}
