

Implementation:Datahub project Datahub MLModel Entity

From Leeroopedia
Revision as of 14:43, 16 February 2026 by Admin (auto-imported from implementations/Datahub_project_Datahub_MLModel_Entity.md)


Domains: Java_SDK, Metadata_Management
Last Updated: 2026-02-10 00:00 GMT

Overview

MLModel is the Java SDK V2 entity class for DataHub's ML Model entity. It represents a machine learning model and supports training metrics, hyperparameters, model groups, training jobs, downstream jobs, and deployments.

Description

The MLModel class extends Entity and implements mixin interfaces (HasTags, HasGlossaryTerms, HasOwners, HasDomains, HasSubTypes, HasStructuredProperties). At 1086 lines, it is one of the largest entity classes, providing comprehensive ML-specific metadata operations.

Key characteristics:

  • Entity type: "mlModel"
  • URN format: urn:li:mlModel:(urn:li:dataPlatform:platform,modelName,FABRIC_TYPE) via MLModelUrn
  • ML-specific operations:
    • Training metrics: addTrainingMetric() (patch-based via accumulated MLModelPropertiesPatchBuilder), setTrainingMetrics() (direct aspect mutation), getTrainingMetrics()
    • Hyperparameters: addHyperParam() (patch-based), setHyperParams() (direct), getHyperParams()
    • Model group: setModelGroup(), getModelGroup() - links the model to an MLModelGroup
    • Training jobs: addTrainingJob(), removeTrainingJob(), getTrainingJobs() - tracks the jobs used to train the model
    • Downstream jobs: addDownstreamJob(), removeDownstreamJob(), getDownstreamJobs() - tracks jobs that consume the model
    • Deployments: addDeployment(), removeDeployment(), getDeployments() - tracks where the model is deployed
  • Patch transformation: A static transformPatchToFullAspect() method converts a patch into a full aspect for server compatibility. It uses optimistic locking via If-Version-Match headers and applies the full JSON Patch via applyPatchOperations(). Supported operations are ADD and REMOVE on paths including /description, /externalUrl, and /customProperties/{key}, plus array appends.
  • Builder: Requires platform and name. Environment defaults to "PROD" and must be a valid FabricType value. Optionally accepts displayName, description, externalUrl, trainingMetrics, hyperParams, modelGroup, and customProperties.
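The difference between the patch-based adders (addTrainingMetric(), addHyperParam()) and the direct setters (setTrainingMetrics(), setHyperParams()) can be illustrated with a small, self-contained model. This is a minimal sketch under stated assumptions, not the SDK's code: MetricsSketch, PatchOp, Metric, and the "/trainingMetrics/-" path are hypothetical stand-ins for MLModelPropertiesPatchBuilder and the MLMetric aspect field. The adder queues an ADD operation to be sent later, while the setter replaces the locally held list immediately.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: contrasts queued patch operations (addTrainingMetric)
// with direct aspect mutation (setTrainingMetrics). Not the DataHub SDK.
public class MetricsSketch {
    // Minimal JSON-Patch-style operation; the path format is an assumption.
    record PatchOp(String op, String path, String name, String value) {}

    // Minimal stand-in for an MLMetric (name/value pair).
    record Metric(String name, String value) {}

    private final List<PatchOp> pendingPatch = new ArrayList<>();
    private List<Metric> aspectMetrics = new ArrayList<>();

    // Patch-based: records an array append to be applied server-side later.
    public MetricsSketch addTrainingMetric(String name, String value) {
        pendingPatch.add(new PatchOp("add", "/trainingMetrics/-", name, value));
        return this;
    }

    // Direct: replaces the whole list on the locally held aspect.
    public MetricsSketch setTrainingMetrics(List<Metric> metrics) {
        aspectMetrics = new ArrayList<>(metrics);
        return this;
    }

    public List<PatchOp> pendingPatch() { return List.copyOf(pendingPatch); }
    public List<Metric> aspectMetrics() { return List.copyOf(aspectMetrics); }

    public static void main(String[] args) {
        MetricsSketch m = new MetricsSketch()
            .addTrainingMetric("accuracy", "0.94")
            .addTrainingMetric("f1_score", "0.91");
        System.out.println(m.pendingPatch().size());   // 2 queued ops
        System.out.println(m.aspectMetrics().size());  // 0: local aspect untouched

        m.setTrainingMetrics(List.of(new Metric("auc", "0.97")));
        System.out.println(m.aspectMetrics().size());  // 1
    }
}
```

The design point the sketch captures is that accumulated patches let several additions from different callers merge on the server, whereas a direct setter overwrites whatever list the aspect already held.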

Default aspects fetched: Ownership, GlobalTags, GlossaryTerms, Domains, Status, InstitutionalMemory, MLModelProperties, EditableMLModelProperties.

Usage

Use the MLModel entity to track machine learning models in DataHub with rich ML-specific metadata. It is especially valuable for ML platforms that need to record training metrics, hyperparameters, model versioning (via groups), and deployment information. Construct via its Builder and upsert through EntityClient.upsert(model).

Code Reference

Source Location

Signature

public class MLModel extends Entity
    implements HasTags<MLModel>, HasGlossaryTerms<MLModel>, HasOwners<MLModel>,
               HasDomains<MLModel>, HasSubTypes<MLModel>, HasStructuredProperties<MLModel> {

    // Factory
    public static Builder builder();

    // Identity
    public String getEntityType();           // returns "mlModel"
    public MLModelUrn getMLModelUrn();
    public MLModel mutable();

    // ML-specific operations
    public MLModel addTrainingMetric(String name, String value);
    public MLModel setTrainingMetrics(List<MLMetric> metrics);
    public List<MLMetric> getTrainingMetrics();
    public MLModel addHyperParam(String name, String value);
    public MLModel setHyperParams(List<MLHyperParam> params);
    public List<MLHyperParam> getHyperParams();
    public MLModel setModelGroup(String groupUrn);
    public String getModelGroup();
    public MLModel addTrainingJob(String jobUrn);
    public MLModel removeTrainingJob(String jobUrn);
    public List<String> getTrainingJobs();
    public MLModel addDownstreamJob(String jobUrn);
    public MLModel removeDownstreamJob(String jobUrn);
    public List<String> getDownstreamJobs();
    public MLModel addDeployment(String deployment);
    public MLModel removeDeployment(String deployment);
    public List<String> getDeployments();

    // Standard operations
    public MLModel setDisplayName(String name);
    public String getDisplayName();
    public MLModel setDescription(String description);
    public String getDescription();
    public MLModel setExternalUrl(String url);
    public String getExternalUrl();
    public MLModel addCustomProperty(String key, String value);
    public MLModel setCustomProperties(Map<String, String> properties);
    public Map<String, String> getCustomProperties();

    // Server compatibility
    public static MetadataChangeProposal transformPatchToFullAspect(
        MetadataChangeProposal patch, EntityClient entityClient);
}
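The server-compatibility method at the end of the signature combines a version check with JSON Patch application. The following is a minimal sketch of that idea under stated assumptions, not DataHub's implementation: PatchApplySketch, Aspect, Op, the "/trainingJobs/-" append path, and the exception types are all hypothetical, and a plain expectedVersion argument stands in for the If-Version-Match header.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of patch-to-full-aspect conversion with a version check.
// Names, paths, and the conflict behavior are assumptions, not DataHub's code.
public class PatchApplySketch {
    record Op(String op, String path, String value) {}

    // Simplified stand-in for a properties aspect plus its stored version.
    static class Aspect {
        String description;
        String externalUrl;
        final Map<String, String> customProperties = new HashMap<>();
        final List<String> trainingJobs = new ArrayList<>();
        long version;
    }

    // Applies ops only if the caller's expected version matches (optimistic locking).
    static Aspect apply(Aspect current, long expectedVersion, List<Op> ops) {
        if (current.version != expectedVersion) {
            throw new IllegalStateException(
                "version mismatch: expected " + expectedVersion + ", found " + current.version);
        }
        for (Op o : ops) {
            boolean add = o.op().equals("add");
            if (o.path().equals("/description")) {
                current.description = add ? o.value() : null;
            } else if (o.path().equals("/externalUrl")) {
                current.externalUrl = add ? o.value() : null;
            } else if (o.path().startsWith("/customProperties/")) {
                String key = o.path().substring("/customProperties/".length());
                if (add) current.customProperties.put(key, o.value());
                else current.customProperties.remove(key);
            } else if (o.path().equals("/trainingJobs/-") && add) {
                current.trainingJobs.add(o.value()); // array append
            } else {
                throw new IllegalArgumentException("unsupported: " + o.op() + " " + o.path());
            }
        }
        return current;
    }

    public static void main(String[] args) {
        Aspect a = new Aspect();
        a.version = 3;
        a.customProperties.put("owner_team", "ml");
        apply(a, 3, List.of(
            new Op("add", "/description", "Churn model"),
            new Op("remove", "/customProperties/owner_team", null),
            new Op("add", "/trainingJobs/-",
                "urn:li:dataJob:(urn:li:dataFlow:(airflow,ml_pipeline,prod),train_model)")));
        System.out.println(a.description);                // Churn model
        System.out.println(a.customProperties.isEmpty()); // true
        System.out.println(a.trainingJobs.size());        // 1
    }
}
```

Rejecting the whole patch on a version mismatch, rather than applying it to a newer aspect, is what makes the locking optimistic: a conflicting writer must re-read and retry.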

Import

import datahub.client.v2.entity.MLModel;

I/O Contract

Builder Inputs

Parameter        | Type                | Required | Description
platform         | String              | Yes      | ML platform (e.g., "tensorflow", "pytorch", "sklearn", "mlflow")
name             | String              | Yes      | Model name/identifier
env              | String              | No       | Environment; defaults to "PROD". Must be a valid FabricType value
displayName      | String              | No       | Human-readable display name
description      | String              | No       | Model description
externalUrl      | String              | No       | External URL (e.g., an MLflow experiment link)
trainingMetrics  | List<MLMetric>      | No       | Training metrics (accuracy, F1, etc.)
hyperParams      | List<MLHyperParam>  | No       | Hyperparameters (learning rate, etc.)
modelGroup       | String              | No       | URN of the parent model group
customProperties | Map<String, String> | No       | Custom key-value properties
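The builder rules in the table (required platform and name, env defaulting to "PROD" and validated against FabricType) and the URN format from the overview can be sketched as follows. This is a hypothetical illustration, not the SDK's Builder: MLModelUrnSketch and buildUrn() are invented names, FABRIC_TYPES is only a subset of the real FabricType enum, and the exception types are assumptions.

```java
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of the builder rules described above: platform and name
// are required, env defaults to "PROD" and must be a FabricType value.
public class MLModelUrnSketch {
    // Assumption: a subset of FabricType names, for illustration only.
    static final Set<String> FABRIC_TYPES =
        Set.of("DEV", "TEST", "QA", "UAT", "STG", "PROD", "CORP");

    static final class Builder {
        private String platform;
        private String name;
        private String env = "PROD"; // default per the builder contract

        Builder platform(String p) { this.platform = p; return this; }
        Builder name(String n) { this.name = n; return this; }
        Builder env(String e) { this.env = e; return this; }

        // Returns the model URN string instead of a full entity object.
        String buildUrn() {
            Objects.requireNonNull(platform, "platform is required");
            Objects.requireNonNull(name, "name is required");
            if (!FABRIC_TYPES.contains(env)) {
                throw new IllegalArgumentException("invalid env (not a FabricType): " + env);
            }
            return "urn:li:mlModel:(urn:li:dataPlatform:" + platform + "," + name + "," + env + ")";
        }
    }

    public static void main(String[] args) {
        String urn = new Builder().platform("mlflow").name("churn_predictor_v3").buildUrn();
        System.out.println(urn);
        // urn:li:mlModel:(urn:li:dataPlatform:mlflow,churn_predictor_v3,PROD)
    }
}
```

Validating env at build time means an invalid environment fails before any network call, instead of surfacing later as a malformed URN on the server.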

Outputs

Method               | Return Type        | Description
build()              | MLModel            | New MLModel entity identified by an MLModelUrn
getTrainingMetrics() | List<MLMetric>     | Training metrics list
getHyperParams()     | List<MLHyperParam> | Hyperparameters list
getDeployments()     | List<String>       | Deployment URN strings

Usage Examples

import datahub.client.v2.entity.MLModel;
import com.linkedin.common.OwnershipType;

// Create an ML model
MLModel model = MLModel.builder()
    .platform("mlflow")
    .name("churn_predictor_v3")
    .env("PROD")
    .displayName("Customer Churn Predictor v3")
    .description("XGBoost model for predicting customer churn")
    .build();

// Add ML-specific metadata via fluent API
model.addTrainingMetric("accuracy", "0.94")
     .addTrainingMetric("f1_score", "0.91")
     .addHyperParam("learning_rate", "0.01")
     .addHyperParam("max_depth", "6")
     .addHyperParam("n_estimators", "500");

// Set model group and relationships
model.setModelGroup("urn:li:mlModelGroup:(urn:li:dataPlatform:mlflow,churn_predictor,PROD)");
model.addTrainingJob("urn:li:dataJob:(urn:li:dataFlow:(airflow,ml_pipeline,prod),train_model)");
model.addDeployment("urn:li:mlModelDeployment:(urn:li:dataPlatform:sagemaker,churn_predictor_endpoint_v3,PROD)");

// Add standard metadata
model.addTag("production");
model.addOwner("urn:li:corpuser:ml_team", OwnershipType.TECHNICAL_OWNER);
model.setDomain("urn:li:domain:MachineLearning");

// Upsert to DataHub (client is an already-configured DataHub V2 client)
client.entities().upsert(model);
