Principle: DataHub Emitter Lifecycle
| Property | Value |
|---|---|
| Principle Name | Emitter_Lifecycle |
| Category | Java_SDK_Metadata_Emission |
| Workflow | Java_SDK_Metadata_Emission |
| Repository | https://github.com/datahub-project/datahub |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
Description
Emitter Lifecycle is the principle of deterministically managing the creation, usage, and disposal of metadata transport connections. Every Emitter implementation in the DataHub Java SDK holds onto external resources -- HTTP connection pools, Kafka producers, file handles, temporary files, or S3 client sessions -- that must be explicitly released when the emitter is no longer needed. The Emitter interface extends java.io.Closeable, establishing a contract that every emitter must implement a close() method that releases all held resources.
Proper lifecycle management prevents resource leaks (unclosed connections, file descriptor exhaustion), ensures data integrity (FileEmitter must write the closing JSON bracket), and triggers deferred operations (S3Emitter uploads the temporary file to S3 during close).
Usage
The emitter lifecycle follows a three-phase pattern:
- Instantiation -- Create the emitter via factory method or constructor, allocating transport resources.
- Emission -- Call emit() one or more times to send metadata change proposals.
- Closure -- Call close() to release all resources. For FileEmitter, this writes the closing ] bracket to produce valid JSON. For S3Emitter, this triggers the S3 upload.
The recommended approach is to use Java's try-with-resources statement, which guarantees that close() is called even if an exception occurs during emission:
```java
try (RestEmitter emitter = RestEmitter.createWithDefaults()) {
    emitter.emit(mcpw).get();
} // close() called automatically
```
Theoretical Basis
Emitter Lifecycle draws on the following theoretical foundations:
Resource Management Pattern (Dispose Pattern) -- External resources such as network connections, file handles, and producer instances have finite availability and must be explicitly released. The Java Closeable interface formalizes this pattern by defining a single close() method that the resource holder must implement. The DataHub SDK's decision to have Emitter extend Closeable signals to callers that every emitter holds resources requiring cleanup and enables the use of try-with-resources for automatic disposal.
Try-With-Resources (Automatic Resource Management) -- Java's try-with-resources statement (try (Resource r = ...) { ... }) guarantees that close() is called on the resource when the block exits, whether normally or via exception. This eliminates the risk of forgetting to close the emitter and provides exception-safe cleanup. The pattern is especially important for the FileEmitter and S3Emitter, where failure to call close() results in invalid output (missing closing bracket) or lost data (file never uploaded to S3).
Deterministic Cleanup vs. Garbage Collection -- While Java's garbage collector can reclaim memory, it does not guarantee timely cleanup of external resources. An unclosed CloseableHttpAsyncClient may hold open TCP connections indefinitely, an unclosed KafkaProducer may retain threads and network sockets, and an unclosed BufferedWriter may leave data in unflushed buffers. Explicit close() calls provide deterministic, predictable resource release that does not depend on GC timing.
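The contrast can be illustrated with a small sketch. The Connection class below is hypothetical and not part of the SDK; its counter stands in for a finite external resource such as a TCP connection, and try-with-resources releases it at a deterministic point, with no dependence on GC timing:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a finite external resource (e.g. a TCP connection).
// Not part of the DataHub SDK; illustration only.
class Connection implements AutoCloseable {
    static final AtomicInteger OPEN = new AtomicInteger();

    Connection() {
        OPEN.incrementAndGet();       // resource acquired at construction
    }

    @Override
    public void close() {
        OPEN.decrementAndGet();       // resource released deterministically
    }

    public static void main(String[] args) {
        try (Connection c = new Connection()) {
            System.out.println("open connections: " + Connection.OPEN.get()); // 1
        }
        // close() has already run by this point; no GC involvement.
        System.out.println("open connections: " + Connection.OPEN.get()); // 0
    }
}
```

Relying on GC instead would leave the counter (and the real resource it represents) elevated for an unbounded time.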
Close-Triggered Side Effects -- For some emitter implementations, close() performs essential operations beyond simple resource release:
- FileEmitter.close() writes a newline and the closing ] bracket to produce a valid JSON array, then closes the BufferedWriter. Without this step, the output file is syntactically invalid.
- S3Emitter.close() first closes the internal FileEmitter (writing the closing bracket), then constructs a PutObjectRequest and uploads the temporary file to S3 using the configured bucket, path prefix, and optional file name. After a successful upload, it deletes the temporary file. If the upload fails, an IOException is thrown.
- KafkaEmitter.close() closes the underlying KafkaProducer, which flushes any buffered records and releases all threads.
- RestEmitter.close() closes the CloseableHttpAsyncClient, releasing the connection pool and IO reactor threads.
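A minimal sketch of the FileEmitter-style close behavior (the class name and internals below are illustrative, not the SDK source; a StringWriter stands in for the file): the opening [ is written at construction, and the output only becomes a valid JSON array once close() appends the closing ].

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.io.StringWriter;

// Illustrative sketch, not the actual FileEmitter implementation:
// the output is only a valid JSON array after close() writes "]".
class JsonArraySink implements Closeable {
    private final StringWriter out = new StringWriter();       // stands in for a file
    private final BufferedWriter writer = new BufferedWriter(out);
    private boolean first = true;

    JsonArraySink() throws IOException {
        writer.write("[");                 // opening bracket at creation
    }

    void emit(String jsonObject) throws IOException {
        if (!first) writer.write(",");     // comma-separate subsequent records
        writer.newLine();
        writer.write(jsonObject);
        first = false;
    }

    @Override
    public void close() throws IOException {
        writer.newLine();
        writer.write("]");                 // without this, contents() is invalid JSON
        writer.close();                    // flushes buffered data
    }

    String contents() {
        return out.toString();
    }
}
```

Skipping close() here leaves the buffer unflushed and the array unterminated, which mirrors why an unclosed FileEmitter produces a syntactically invalid file.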
Lifecycle State Tracking -- The FileEmitter tracks its closed state via an AtomicBoolean. After close() is called, subsequent emit() calls return a failure future with the message "File Emitter is already closed" rather than throwing an exception, providing graceful degradation.
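That graceful-degradation behavior can be sketched as follows. This is a simplified stand-in, not the FileEmitter source; only the failure message matches the one quoted above:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicBoolean;

// Simplified sketch of closed-state tracking; not the actual FileEmitter code.
class TrackedEmitter implements AutoCloseable {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    CompletableFuture<Boolean> emit(String payload) {
        if (closed.get()) {
            // Fail the returned future instead of throwing, so callers observe
            // the error through the normal async channel (graceful degradation).
            CompletableFuture<Boolean> failed = new CompletableFuture<>();
            failed.completeExceptionally(
                new IllegalStateException("File Emitter is already closed"));
            return failed;
        }
        return CompletableFuture.completedFuture(true);
    }

    @Override
    public void close() {
        closed.set(true);   // idempotent: repeated close() calls are harmless
    }
}
```

Using AtomicBoolean rather than a plain boolean keeps the closed flag visible across threads, which matters when emit() and close() may be called from different threads.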