Heuristic:Datahub project Datahub Warning Deprecated Spark Lineage Legacy
| Knowledge Sources | |
|---|---|
| Domains | Spark_Lineage, Deprecation |
| Last Updated | 2026-02-10 05:00 GMT |
Overview
Deprecation warning for the spark-lineage-legacy module, which has been superseded by the acryl-spark-lineage module, an implementation built on the OpenLineage standard.
Description
The spark-lineage-legacy module (located at metadata-integration/java/spark-lineage-legacy/) contains the original Spark lineage listener and dataset extractor implementations. These components directly parsed Spark logical plans to extract lineage information and emitted metadata to DataHub via custom LineageConsumer implementations.
This module has been fully replaced by the acryl-spark-lineage module, which uses the OpenLineage standard for event capture and provides broader coverage of Spark plan types including Structured Streaming, Delta Lake MERGE operations, and Databricks-specific plans.
Usage
Do NOT use the legacy module for new deployments. It remains in the codebase for backward compatibility with existing installations that have not yet migrated to the OpenLineage-based implementation.
The Insight (Rule of Thumb)
- Action: Migrate from datahub-spark-lineage (legacy) to acryl-spark-lineage (current).
- Value: The new module supports OpenLineage events, Structured Streaming, Delta Lake MERGE, Databricks, and Spark 3.x features.
- Trade-off: Migration requires updating the JAR dependency and Spark listener class configuration; existing lineage data is preserved.
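
The dependency half of the migration can be sketched as a small helper that rewrites a Spark package list. This is a minimal illustration under stated assumptions, not an official migration tool. It assumes the JAR is pulled in through a comma-separated list of `group:artifact:version` coordinates (as used by Spark's `spark.jars.packages` setting). The `io.acryl` group ID and the `OLD_VERSION` / `NEW_VERSION` strings are placeholders. The helper names `migrate_packages` and `uses_legacy` are hypothetical. Only the two artifact names come from the text above. The listener class setting is not touched here, because this page does not name the classes involved.

```python
# Artifact names taken from the rule of thumb above.
LEGACY_ARTIFACT = "datahub-spark-lineage"
CURRENT_ARTIFACT = "acryl-spark-lineage"


def migrate_packages(packages, new_version):
    """Replace the legacy artifact in a comma-separated coordinate list.

    Assumes each entry looks like "group:artifact:version"; entries that
    do not match the legacy artifact are passed through unchanged.
    """
    migrated = []
    for coord in packages.split(","):
        coord = coord.strip()
        if not coord:
            continue  # skip empty entries such as a trailing comma
        parts = coord.split(":")
        if len(parts) == 3 and parts[1] == LEGACY_ARTIFACT:
            # Keep the group ID, swap the artifact name and version.
            coord = ":".join([parts[0], CURRENT_ARTIFACT, new_version])
        migrated.append(coord)
    return ",".join(migrated)


def uses_legacy(packages):
    """Return True if any coordinate still references the legacy artifact."""
    return any(
        len(parts) >= 2 and parts[1] == LEGACY_ARTIFACT
        for parts in (c.strip().split(":") for c in packages.split(","))
    )


if __name__ == "__main__":
    # Placeholder group ID and versions; substitute the real ones.
    old = "io.acryl:datahub-spark-lineage:OLD_VERSION,org.example:other-lib:1.0"
    new = migrate_packages(old, "NEW_VERSION")
    print(new)
    print("still legacy:", uses_legacy(new))
```

A check like `uses_legacy` could run in a deployment script to flag jobs that have not yet migrated, which matches the guidance not to use the legacy module for new deployments.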
Reasoning
The legacy module was designed for Spark 2.x and has limited plan type coverage. The OpenLineage-based replacement provides:
- Standard event format compatible with other lineage tools
- Support for Spark 3.x plan types
- Databricks and cloud-native integrations
- Active maintenance and new feature development