Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Apache Flink PendingSplitsCheckpoint FromCollectionSnapshot

From Leeroopedia
Revision as of 14:17, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Apache_Flink_PendingSplitsCheckpoint_FromCollectionSnapshot.md)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)


Knowledge Sources
Domains Fault_Tolerance, Stream_Processing
Last Updated 2026-02-09 00:00 GMT

Overview

Concrete tool for creating checkpoint snapshots of pending file source splits and discovered paths provided by the Apache Flink connector-files module.

Description

The PendingSplitsCheckpoint class captures the enumerators state at checkpoint time. The fromCollectionSnapshot factory method creates an immutable snapshot from the current collection of unassigned splits and optionally the set of already-processed paths (for continuous mode). The collections are defensively copied to ensure snapshot isolation. On recovery, the enumerator restores from this checkpoint to re-create its split pool.

Usage

This is an internal class used by StaticFileSplitEnumerator and ContinuousFileSplitEnumerator during checkpointing.

Code Reference

Source Location

  • Repository: Apache Flink
  • File: flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/PendingSplitsCheckpoint.java
  • Lines: L37-112

Signature

@PublicEvolving
public class PendingSplitsCheckpoint<SplitT extends FileSourceSplit> {

    protected PendingSplitsCheckpoint(
            Collection<SplitT> splits, Collection<Path> alreadyProcessedPaths);

    public Collection<SplitT> getSplits();
    public Collection<Path> getAlreadyProcessedPaths();

    public static <T extends FileSourceSplit> PendingSplitsCheckpoint<T> fromCollectionSnapshot(
            final Collection<T> splits);

    public static <T extends FileSourceSplit> PendingSplitsCheckpoint<T> fromCollectionSnapshot(
            final Collection<T> splits, final Collection<Path> alreadyProcessedPaths);
}

Import

import org.apache.flink.connector.file.src.PendingSplitsCheckpoint;

I/O Contract

Inputs

Name Type Required Description
splits Collection<SplitT> Yes Unassigned splits to checkpoint
alreadyProcessedPaths Collection<Path> No Paths already discovered (continuous mode)

Outputs

Name Type Description
checkpoint PendingSplitsCheckpoint<SplitT> Immutable snapshot for state persistence

Usage Examples

Checkpoint Creation

// In StaticFileSplitEnumerator.snapshotState():
Collection<FileSourceSplit> remainingSplits = splitAssigner.remainingSplits();
return PendingSplitsCheckpoint.fromCollectionSnapshot(remainingSplits);

// In ContinuousFileSplitEnumerator.snapshotState():
Collection<FileSourceSplit> remainingSplits = splitAssigner.remainingSplits();
Collection<Path> discoveredPaths = alreadyDiscoveredPaths;
return PendingSplitsCheckpoint.fromCollectionSnapshot(remainingSplits, discoveredPaths);

Related Pages

Implements Principle

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment