Implementation:Apache Hudi StreamerUtil ValidateWriteStatus
| Knowledge Sources | |
|---|---|
| Domains | Data_Lake, Stream_Processing |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Utility methods in Apache Hudi's StreamerUtil for validating the results of a write operation and for checking whether any completed commits exist on the Hudi timeline.
Description
StreamerUtil.validateWriteStatus() inspects a list of WriteStatus objects returned from a Hudi write operation and, when fail-fast mode is enabled, immediately throws a HoodieException if any write errors are found. This early detection prevents data loss by failing the Flink checkpoint before the data buffer is cleared.
StreamerUtil.haveSuccessfulCommits() queries the Hudi active timeline to check whether any completed commit instants exist, providing a simple health check for the table.
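The check described above can be illustrated with a minimal sketch. It does not use the Hudi classes: `InstantStub` and the list-based timeline are hypothetical stand-ins for the instants that HoodieTableMetaClient exposes, so the example runs with the JDK alone and only shows the "at least one completed instant" rule.

```java
import java.util.List;

final class TimelineCheckSketch {

    // The three states an instant can be in on the timeline.
    enum State { REQUESTED, INFLIGHT, COMPLETED }

    // Hypothetical stand-in for a timeline instant: a timestamp plus its state.
    static final class InstantStub {
        final String timestamp;
        final State state;

        InstantStub(String timestamp, State state) {
            this.timestamp = timestamp;
            this.state = state;
        }
    }

    // True when at least one instant on the (stand-in) commits timeline is COMPLETED.
    static boolean haveSuccessfulCommits(List<InstantStub> commitsTimeline) {
        return commitsTimeline.stream().anyMatch(i -> i.state == State.COMPLETED);
    }

    public static void main(String[] args) {
        List<InstantStub> timeline = List.of(
            new InstantStub("20260208120000000", State.COMPLETED),
            new InstantStub("20260208120500000", State.INFLIGHT));
        System.out.println("has completed commits: " + haveSuccessfulCommits(timeline));
    }
}
```

A timeline that holds only requested or inflight instants counts as having no successful commits, which is the situation a freshly created table or an interrupted first write would be in.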
Together, these methods form the verification layer of the streaming write pipeline:
- validateWriteStatus() is called after each bucket flush within the StreamWriteFunction (or append write function) to validate write outcomes before the write metadata event is sent to the coordinator.
- haveSuccessfulCommits() is used during startup and recovery to determine whether the table has committed data.
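The ordering matters: validation sits between the write and the steps that would make a failure invisible. The sketch below is a simplified model of that ordering, not the StreamWriteFunction itself. The class, its fields, and the `writeHadErrors` flag are all hypothetical; the flag stands in for a write whose statuses contain errors.

```java
import java.util.ArrayList;
import java.util.List;

final class FlushOrderSketch {
    final List<String> buffer = new ArrayList<>();      // records awaiting flush
    final List<String> eventsSent = new ArrayList<>();  // stand-in for events sent to the coordinator

    // writeHadErrors simulates a write whose statuses contain errors.
    void flush(String instant, boolean writeHadErrors) {
        // 1. The (simulated) write of the buffered records happens here.
        // 2. Validate the outcome; a failure aborts the flush at this point.
        if (writeHadErrors) {
            throw new IllegalStateException("Write failure at instant " + instant);
        }
        // 3. Only after validation passes is the metadata event sent.
        eventsSent.add(instant);
        // 4. The buffer is cleared last, so a failed flush keeps its records.
        buffer.clear();
    }

    public static void main(String[] args) {
        FlushOrderSketch fn = new FlushOrderSketch();
        fn.buffer.add("record-1");
        try {
            fn.flush("20260208120000000", true);
        } catch (IllegalStateException e) {
            System.out.println("flush failed, buffered records kept: " + fn.buffer.size());
        }
    }
}
```

Because the exception is raised before steps 3 and 4, a failed flush neither reports success to the coordinator nor discards the records that failed to write.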
The validation logic:
- Checks whether WRITE_FAIL_FAST is enabled in the configuration.
- If enabled, streams through the WriteStatus list looking for any status with non-empty errors.
- On finding the first error, extracts the HoodieKey and the associated Throwable from the errors map.
- Throws a HoodieException with the key, instant time, and root cause, which causes the Flink task to fail and triggers checkpoint failure.
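The four steps above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the Hudi source: `StatusStub` is a hypothetical stand-in for WriteStatus, a String key replaces HoodieKey, a boolean parameter replaces the WRITE_FAIL_FAST lookup, and IllegalStateException replaces HoodieException so the example runs with the JDK alone.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class ValidateWriteStatusSketch {

    // Hypothetical stand-in for WriteStatus: maps a record key to the error that failed it.
    static final class StatusStub {
        final Map<String, Throwable> errors = new LinkedHashMap<>();

        StatusStub withError(String recordKey, Throwable cause) {
            errors.put(recordKey, cause);
            return this;
        }
    }

    static void validate(boolean failFast, String currentInstant, List<StatusStub> statuses) {
        // Step 1: do nothing unless fail-fast is enabled.
        if (!failFast) {
            return;
        }
        // Step 2: find the first status that carries any error.
        Optional<StatusStub> firstFailed = statuses.stream()
            .filter(s -> !s.errors.isEmpty())
            .findFirst();
        // Steps 3 and 4: extract the first key and cause, then throw with both attached.
        firstFailed.ifPresent(s -> {
            Map.Entry<String, Throwable> entry = s.errors.entrySet().iterator().next();
            throw new IllegalStateException(
                "Write failure for key " + entry.getKey() + " at instant " + currentInstant,
                entry.getValue());
        });
    }

    public static void main(String[] args) {
        List<StatusStub> statuses = List.of(
            new StatusStub(),
            new StatusStub().withError("id-42", new RuntimeException("disk full")));
        try {
            validate(true, "20260208120000000", statuses);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage() + " (cause: " + e.getCause().getMessage() + ")");
        }
    }
}
```

Only the first failing status is reported. With fail-fast disabled the method returns without inspecting the list at all, so errors in the statuses have to be handled elsewhere.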
Usage
Use validateWriteStatus() after every write operation in the streaming pipeline to detect failures early. Use haveSuccessfulCommits() at startup or during recovery to check table state.
Code Reference
Source Location
- Repository: Apache Hudi
- File: hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/StreamerUtil.java
- Lines: 719-731 (validateWriteStatus), 614-616 (haveSuccessfulCommits)
Signature
public static void validateWriteStatus(
Configuration config,
String currentInstant,
List<WriteStatus> writeStatusList) throws HoodieException
public static boolean haveSuccessfulCommits(HoodieTableMetaClient metaClient)
Import
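import org.apache.flink.configuration.Configuration;
import org.apache.hudi.client.WriteStatus;
import org.apache.hudi.common.table.HoodieTableMetaClient;
import org.apache.hudi.configuration.FlinkOptions;
import org.apache.hudi.exception.HoodieException;
import org.apache.hudi.util.StreamerUtil;

import java.util.List;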
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | Configuration | Yes | Flink Configuration; must contain WRITE_FAIL_FAST boolean option |
| currentInstant | String | Yes | The current Hudi instant time string (e.g., "20260208120000000") |
| writeStatusList | List<WriteStatus> | Yes | List of WriteStatus objects returned from the write client after flushing a batch of records |
| metaClient (for haveSuccessfulCommits) | HoodieTableMetaClient | Yes | Meta client for the Hudi table to check timeline |
Outputs
| Name | Type | Description |
|---|---|---|
| (exception) | HoodieException | Thrown if WRITE_FAIL_FAST is true and any WriteStatus contains errors; includes the failing HoodieKey and root cause |
| return (haveSuccessfulCommits) | boolean | True if the table's commits timeline contains at least one completed instant; false otherwise |
Usage Examples
// Validate write statuses after flushing a bucket.
// writeClient, records, bucketInfo, and instant come from the enclosing write function.
Configuration conf = new Configuration();
conf.set(FlinkOptions.WRITE_FAIL_FAST, true);
List<WriteStatus> statuses = writeClient.upsert(records, bucketInfo, instant);
// Throws HoodieException immediately if any record failed to write
StreamerUtil.validateWriteStatus(conf, instant, statuses);

// Check whether the table has any committed data.
// conf must also carry the table path (FlinkOptions.PATH) for the meta client to be created.
HoodieTableMetaClient metaClient = StreamerUtil.createMetaClient(conf);
boolean hasCommits = StreamerUtil.haveSuccessfulCommits(metaClient);
if (!hasCommits) {
    log.info("Table has no committed data yet; this is the first write.");
}