Heuristic:Datahub project Datahub Validation Cross API
| Knowledge Sources | |
|---|---|
| Domains | Architecture, Validation |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
Never add validation in API-specific layers (GraphQL resolvers, REST controllers). Always implement AspectPayloadValidators that work across all APIs.
Description
DataHub exposes metadata through multiple API layers: GraphQL, OpenAPI/REST, and RestLI. A common mistake is adding validation logic in one specific API layer (e.g., a GraphQL resolver), which leaves the other APIs unprotected. The correct pattern is to implement AspectPayloadValidator classes that are invoked at the metadata storage layer, ensuring all APIs benefit from the same validation rules.
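The layering described above can be illustrated with a minimal, self-contained sketch. Every type in it (`PayloadValidator`, `MetadataStore`, `GraphQlEntryPoint`, `RestEntryPoint`) and the `ownerEmail` rule are hypothetical stand-ins invented for illustration. They are not DataHub's classes or signatures, so the real `AspectPayloadValidator` contract should be taken from the DataHub source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class CrossApiValidationSketch {
    // Hypothetical rule: an "ownership" aspect must carry a non-blank ownerEmail field.
    static PayloadValidator ownerEmailRequired() {
        return (aspectName, aspect) -> {
            if (!"ownership".equals(aspectName)) {
                return Optional.empty(); // rule does not apply to other aspects
            }
            String email = aspect.getOrDefault("ownerEmail", "");
            return email.isBlank()
                ? Optional.of("ownership aspect requires ownerEmail")
                : Optional.empty();
        };
    }

    // Runs a write and reports whether the storage layer rejected it.
    static boolean rejected(Runnable write) {
        try {
            write.run();
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        MetadataStore store = new MetadataStore(List.of(ownerEmailRequired()));
        GraphQlEntryPoint graphQl = new GraphQlEntryPoint(store);
        RestEntryPoint rest = new RestEntryPoint(store);
        Map<String, String> bad = Map.of("ownerEmail", "");

        System.out.println("GraphQL rejected: " + rejected(() -> graphQl.mutate("ownership", bad)));
        System.out.println("REST rejected: " + rejected(() -> rest.post("ownership", bad)));
    }
}

// Hypothetical stand-in for a validator contract; NOT DataHub's AspectPayloadValidator API.
interface PayloadValidator {
    // Returns an error message if the aspect is invalid, or empty if it passes.
    Optional<String> validate(String aspectName, Map<String, String> aspect);
}

// Hypothetical storage layer: the single write path that every API funnels into.
class MetadataStore {
    private final List<PayloadValidator> validators;
    private final List<Map<String, String>> rows = new ArrayList<>();

    MetadataStore(List<PayloadValidator> validators) {
        this.validators = validators;
    }

    void write(String aspectName, Map<String, String> aspect) {
        // Validation runs here, so no entry point can skip it.
        for (PayloadValidator validator : validators) {
            Optional<String> error = validator.validate(aspectName, aspect);
            if (error.isPresent()) {
                throw new IllegalArgumentException(error.get());
            }
        }
        rows.add(aspect);
    }

    int size() {
        return rows.size();
    }
}

// Hypothetical API layers: they translate requests and delegate, with no validation of their own.
class GraphQlEntryPoint {
    private final MetadataStore store;

    GraphQlEntryPoint(MetadataStore store) {
        this.store = store;
    }

    void mutate(String aspectName, Map<String, String> input) {
        store.write(aspectName, input);
    }
}

class RestEntryPoint {
    private final MetadataStore store;

    RestEntryPoint(MetadataStore store) {
        this.store = store;
    }

    void post(String aspectName, Map<String, String> body) {
        store.write(aspectName, body);
    }
}
```

Because the entry points only delegate, adding a third API in this sketch would inherit the same rule without any extra validation code.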
Usage
Apply this heuristic whenever implementing new validation logic for metadata aspects. Before writing validation code, ask: "Will this check run regardless of which API the user calls?" If the answer is no, refactor to use the AspectPayloadValidator pattern.
The Insight (Rule of Thumb)
- Action: Implement validators in metadata-io/src/main/java/com/linkedin/metadata/aspect/validation/
- Pattern: Create a class implementing AspectPayloadValidator, register it as a Spring bean in SpringStandardPluginConfiguration.java
- Reference implementations: SystemPolicyValidator.java, PolicyFieldTypeValidator.java
- Trade-off: Slightly more complex to implement than an inline check in a resolver, but guarantees consistency across all API surfaces.
Reasoning
DataHub users access metadata through different APIs depending on their use case: the UI uses GraphQL, automated pipelines often use REST/OpenAPI, and legacy systems may use RestLI. If validation exists only in GraphQL, a REST or RestLI client can bypass it entirely, which is both a security risk and a data integrity risk. The AspectPayloadValidator pattern hooks into the common metadata storage path, so a check cannot be skipped by choosing a different entry point.
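For contrast, the bypass can be reproduced in a few lines. This is a hedged sketch of the anti-pattern only: `UnvalidatedStore`, `GraphQlResolverWithCheck`, `RestControllerWithoutCheck`, and the blank-name rule are hypothetical names made up for the example and do not correspond to DataHub code.

```java
import java.util.ArrayList;
import java.util.List;

class ResolverOnlyBypassSketch {
    // Hypothetical rule: a policy name must not be blank.
    static void requireName(String policyName) {
        if (policyName == null || policyName.isBlank()) {
            throw new IllegalArgumentException("policy name must not be blank");
        }
    }

    public static void main(String[] args) {
        UnvalidatedStore store = new UnvalidatedStore();
        GraphQlResolverWithCheck graphQl = new GraphQlResolverWithCheck(store);
        RestControllerWithoutCheck rest = new RestControllerWithoutCheck(store);

        try {
            graphQl.createPolicy(" ");
        } catch (IllegalArgumentException e) {
            System.out.println("GraphQL rejected: " + e.getMessage());
        }

        // The same invalid payload is accepted here because the check never runs on this path.
        rest.createPolicy(" ");
        System.out.println("rows stored via REST: " + store.rows.size());
    }
}

// Hypothetical storage layer that performs no validation of its own.
class UnvalidatedStore {
    final List<String> rows = new ArrayList<>();

    void write(String policyName) {
        rows.add(policyName);
    }
}

// Anti-pattern: the rule lives only inside one API layer.
class GraphQlResolverWithCheck {
    private final UnvalidatedStore store;

    GraphQlResolverWithCheck(UnvalidatedStore store) {
        this.store = store;
    }

    void createPolicy(String policyName) {
        ResolverOnlyBypassSketch.requireName(policyName);
        store.write(policyName);
    }
}

// A second API layer that was never given the rule.
class RestControllerWithoutCheck {
    private final UnvalidatedStore store;

    RestControllerWithoutCheck(UnvalidatedStore store) {
        this.store = store;
    }

    void createPolicy(String policyName) {
        store.write(policyName);
    }
}
```

Moving `requireName` out of the resolver and into the store's write path closes the gap for both callers in this sketch, which is the role the heuristic assigns to the validator at the storage layer.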