Implementation:Datahub project Datahub MLModel Entity
| Knowledge Sources | |
|---|---|
| Domains | Java_SDK, Metadata_Management |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
MLModel is the Java SDK V2 entity class for a DataHub ML Model entity. It represents a machine learning model and supports training metrics, hyperparameters, model groups, training jobs, downstream jobs, and deployments.
Description
The MLModel class extends Entity and implements mixin interfaces (HasTags, HasGlossaryTerms, HasOwners, HasDomains, HasSubTypes, HasStructuredProperties). At 1086 lines, it is one of the largest entity classes, providing comprehensive ML-specific metadata operations.
Key characteristics:
- Entity type: "mlModel"
- URN format: urn:li:mlModel:(urn:li:dataPlatform:platform,modelName,FABRIC_TYPE) via MLModelUrn
- ML-specific operations:
  - Training metrics: addTrainingMetric() (patch-based via accumulated MLModelPropertiesPatchBuilder), setTrainingMetrics() (direct aspect mutation), getTrainingMetrics()
  - Hyperparameters: addHyperParam() (patch-based), setHyperParams() (direct), getHyperParams()
  - Model group: setModelGroup(), getModelGroup() - links the model to an MLModelGroup
  - Training jobs: addTrainingJob(), removeTrainingJob(), getTrainingJobs() - tracks the jobs used to train the model
  - Downstream jobs: addDownstreamJob(), removeDownstreamJob(), getDownstreamJobs() - tracks jobs that consume the model
  - Deployments: addDeployment(), removeDeployment(), getDeployments() - tracks where the model is deployed
- Patch transformation: Includes a static transformPatchToFullAspect() with optimistic locking via If-Version-Match headers and full JSON Patch application via applyPatchOperations(). Supports ADD/REMOVE on paths including /description, /externalUrl, /customProperties/{key}, and array appends.
- Builder: Requires platform and name. Environment defaults to "PROD" and must be a valid FabricType value. Optionally accepts displayName, description, externalUrl, trainingMetrics, hyperParams, modelGroup, and customProperties.
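The difference between the patch-based add methods and the direct set methods listed above can be pictured with a small stand-alone sketch. It does not use the DataHub SDK: the class name MetricHolderSketch, the nested PatchOp type, and the /trainingMetrics/{name} path are assumptions made for illustration, and the real MLModelPropertiesPatchBuilder may record operations differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an entity that mixes patch-based and direct writes. */
final class MetricHolderSketch {

    /** One recorded patch operation (assumed shape, not the SDK's type). */
    static final class PatchOp {
        final String op;
        final String path;
        final String value;

        PatchOp(String op, String path, String value) {
            this.op = op;
            this.path = path;
            this.value = value;
        }
    }

    private final Map<String, String> metrics = new LinkedHashMap<>(); // local aspect copy
    private final List<PatchOp> pendingPatches = new ArrayList<>();    // accumulated patch ops

    /** Patch-based: records one operation and leaves the local aspect copy untouched. */
    MetricHolderSketch addTrainingMetric(String name, String value) {
        pendingPatches.add(new PatchOp("add", "/trainingMetrics/" + name, value));
        return this;
    }

    /** Direct mutation: replaces the whole metric set in the local aspect copy. */
    MetricHolderSketch setTrainingMetrics(Map<String, String> replacement) {
        metrics.clear();
        metrics.putAll(replacement);
        return this;
    }

    List<PatchOp> pendingPatches() {
        return List.copyOf(pendingPatches);
    }

    Map<String, String> metrics() {
        return Map.copyOf(metrics);
    }

    public static void main(String[] args) {
        MetricHolderSketch model = new MetricHolderSketch()
            .addTrainingMetric("accuracy", "0.94")
            .addTrainingMetric("f1_score", "0.91");
        System.out.println(model.pendingPatches().size() + " pending patch operations");

        model.setTrainingMetrics(Map.of("auc", "0.97"));
        System.out.println("direct aspect now holds " + model.metrics());
    }
}
```

In this simplified model a patch only describes the change to one entry, so it can be sent without knowing the other metrics, whereas the direct setter overwrites everything it holds. How the real class reconciles pending patches with a later direct set is not covered here.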
Default aspects fetched: Ownership, GlobalTags, GlossaryTerms, Domains, Status, InstitutionalMemory, MLModelProperties, EditableMLModelProperties.
Usage
Use the MLModel entity to track machine learning models in DataHub with rich ML-specific metadata. It is especially valuable for ML platforms that need to record training metrics, hyperparameters, model versioning (via groups), and deployment information. Construct via its Builder and upsert through EntityClient.upsert(model).
Code Reference
Source Location
- Repository: Datahub_project_Datahub
- File: metadata-integration/java/datahub-client/src/main/java/datahub/client/v2/entity/MLModel.java
Signature
```java
public class MLModel extends Entity
    implements HasTags<MLModel>, HasGlossaryTerms<MLModel>, HasOwners<MLModel>,
               HasDomains<MLModel>, HasSubTypes<MLModel>, HasStructuredProperties<MLModel> {

    // Factory
    public static Builder builder();

    // Identity
    public String getEntityType();  // returns "mlModel"
    public MLModelUrn getMLModelUrn();
    public MLModel mutable();

    // ML-specific operations
    public MLModel addTrainingMetric(String name, String value);
    public MLModel setTrainingMetrics(List<MLMetric> metrics);
    public List<MLMetric> getTrainingMetrics();
    public MLModel addHyperParam(String name, String value);
    public MLModel setHyperParams(List<MLHyperParam> params);
    public List<MLHyperParam> getHyperParams();
    public MLModel setModelGroup(String groupUrn);
    public String getModelGroup();
    public MLModel addTrainingJob(String jobUrn);
    public MLModel removeTrainingJob(String jobUrn);
    public List<String> getTrainingJobs();
    public MLModel addDownstreamJob(String jobUrn);
    public MLModel removeDownstreamJob(String jobUrn);
    public List<String> getDownstreamJobs();
    public MLModel addDeployment(String deployment);
    public MLModel removeDeployment(String deployment);
    public List<String> getDeployments();

    // Standard operations
    public MLModel setDisplayName(String name);
    public String getDisplayName();
    public MLModel setDescription(String description);
    public String getDescription();
    public MLModel setExternalUrl(String url);
    public String getExternalUrl();
    public MLModel addCustomProperty(String key, String value);
    public MLModel setCustomProperties(Map<String, String> properties);
    public Map<String, String> getCustomProperties();

    // Server compatibility
    public static MetadataChangeProposal transformPatchToFullAspect(
        MetadataChangeProposal patch, EntityClient entityClient);
}
```
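The idea behind transformPatchToFullAspect() (read the current aspect, apply the patch operations, write the full aspect back only if the version is unchanged) can be sketched without the SDK. Everything below is a hypothetical in-memory model: the Store, Aspect, and Op classes, the /trainingJobs/- append path, and the numeric version counter are assumptions, and the applyPatchOperations method here only borrows the name of the real one. The actual method works on MetadataChangeProposal objects and If-Version-Match headers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: read an aspect, apply patch operations, write back if the version still matches. */
final class PatchToFullAspectSketch {

    /** One patch operation; op is "add" or "remove". */
    static final class Op {
        final String op;
        final String path;
        final String value;

        Op(String op, String path, String value) {
            this.op = op;
            this.path = path;
            this.value = value;
        }
    }

    /** Hypothetical aspect shape: scalar fields, custom properties, and one array. */
    static final class Aspect {
        final Map<String, String> fields = new HashMap<>();
        final Map<String, String> customProperties = new HashMap<>();
        final List<String> trainingJobs = new ArrayList<>();

        Aspect copy() {
            Aspect c = new Aspect();
            c.fields.putAll(fields);
            c.customProperties.putAll(customProperties);
            c.trainingJobs.addAll(trainingJobs);
            return c;
        }
    }

    /** In-memory stand-in for the server: one aspect plus a version counter. */
    static final class Store {
        private Aspect current = new Aspect();
        private long version = 0;

        long version() { return version; }

        Aspect read() { return current.copy(); }

        /** Mimics a version-match precondition: reject the write if the version moved. */
        boolean writeIfVersionMatches(long expectedVersion, Aspect updated) {
            if (expectedVersion != version) {
                return false;
            }
            current = updated.copy();
            version++;
            return true;
        }
    }

    /** Applies ADD/REMOVE operations to a copy of the base aspect. */
    static Aspect applyPatchOperations(Aspect base, List<Op> ops) {
        Aspect result = base.copy();
        for (Op o : ops) {
            boolean add = "add".equals(o.op);
            if (!add && !"remove".equals(o.op)) {
                throw new IllegalArgumentException("unsupported op: " + o.op);
            }
            if (o.path.startsWith("/customProperties/")) {
                String key = o.path.substring("/customProperties/".length());
                if (add) result.customProperties.put(key, o.value);
                else result.customProperties.remove(key);
            } else if (o.path.equals("/trainingJobs/-")) {
                if (!add) throw new IllegalArgumentException("append path only supports add");
                result.trainingJobs.add(o.value); // array append
            } else if (o.path.equals("/description") || o.path.equals("/externalUrl")) {
                String field = o.path.substring(1);
                if (add) result.fields.put(field, o.value);
                else result.fields.remove(field);
            } else {
                throw new IllegalArgumentException("unsupported path: " + o.path);
            }
        }
        return result;
    }

    /** Read, patch, and write back guarded by the version observed at read time. */
    static boolean transformAndWrite(Store store, List<Op> ops) {
        long seenVersion = store.version();
        Aspect full = applyPatchOperations(store.read(), ops);
        return store.writeIfVersionMatches(seenVersion, full);
    }

    public static void main(String[] args) {
        Store store = new Store();
        boolean accepted = transformAndWrite(store, List.of(
            new Op("add", "/description", "Customer churn model"),
            new Op("add", "/customProperties/team", "ml"),
            new Op("add", "/trainingJobs/-", "train_model")));
        System.out.println("write accepted: " + accepted + ", version: " + store.version());
        // A write that still assumes version 0 is now stale and is rejected.
        System.out.println("stale write accepted: " + store.writeIfVersionMatches(0, store.read()));
    }
}
```

The version check is what makes the read-modify-write safe under concurrency in this model: a writer that loses the race gets a rejection and can re-read and retry instead of silently overwriting the other change.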
Import
```java
import datahub.client.v2.entity.MLModel;
```
I/O Contract
Builder Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| platform | String | Yes | ML platform (e.g., "tensorflow", "pytorch", "sklearn", "mlflow") |
| name | String | Yes | Model name/identifier |
| env | String | No | Environment, defaults to "PROD". Must be a valid FabricType |
| displayName | String | No | Human-readable display name |
| description | String | No | Model description |
| externalUrl | String | No | External URL (e.g., MLflow experiment link) |
| trainingMetrics | List<MLMetric> | No | Training metrics (accuracy, F1, etc.) |
| hyperParams | List<MLHyperParam> | No | Hyperparameters (learning rate, etc.) |
| modelGroup | String | No | Parent model group URN |
| customProperties | Map<String, String> | No | Custom key-value properties |
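How the three identity inputs (platform, name, env) combine into the URN format given earlier can be sketched as follows. This is a stand-alone illustration, not the SDK's Builder: the class name ModelBuilderSketch, the buildUrn() method, and the Fabric enum (a subset of values assumed to exist in FabricType) are hypothetical, and the real builder may validate or normalize its inputs differently.

```java
/** Hypothetical builder showing required fields, the "PROD" default, and URN assembly. */
final class ModelBuilderSketch {

    /** Assumed subset of FabricType values; the real enum defines more. */
    enum Fabric { PROD, DEV, TEST, QA, STG }

    private String platform;
    private String name;
    private String env = "PROD"; // default when env(...) is never called

    ModelBuilderSketch platform(String platform) {
        this.platform = platform;
        return this;
    }

    ModelBuilderSketch name(String name) {
        this.name = name;
        return this;
    }

    ModelBuilderSketch env(String env) {
        this.env = env;
        return this;
    }

    /** Validates the inputs and returns the URN string in the documented format. */
    String buildUrn() {
        if (platform == null || name == null) {
            throw new IllegalStateException("platform and name are required");
        }
        // valueOf throws IllegalArgumentException for values outside the enum.
        Fabric fabric = Fabric.valueOf(env);
        return "urn:li:mlModel:(urn:li:dataPlatform:" + platform + "," + name + "," + fabric + ")";
    }

    public static void main(String[] args) {
        String urn = new ModelBuilderSketch()
            .platform("mlflow")
            .name("churn_predictor_v3")
            .buildUrn();
        System.out.println(urn);
    }
}
```

Failing at build time keeps an invalid environment string from ever reaching the URN, which is the behavior the table implies with "Must be a valid FabricType".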
Outputs
| Method | Return Type | Description |
|---|---|---|
| build() | MLModel | New MLModel entity with MLModelUrn |
| getTrainingMetrics() | List<MLMetric> | Training metrics list |
| getHyperParams() | List<MLHyperParam> | Hyperparameters list |
| getDeployments() | List<String> | Deployment URN strings |
Usage Examples
```java
import com.linkedin.common.OwnershipType;
import datahub.client.v2.entity.MLModel;

// `client` is assumed to be an already-configured SDK V2 client.

// Create an ML model
MLModel model = MLModel.builder()
    .platform("mlflow")
    .name("churn_predictor_v3")
    .env("PROD")
    .displayName("Customer Churn Predictor v3")
    .description("XGBoost model for predicting customer churn")
    .build();

// Add ML-specific metadata via fluent API
model.addTrainingMetric("accuracy", "0.94")
    .addTrainingMetric("f1_score", "0.91")
    .addHyperParam("learning_rate", "0.01")
    .addHyperParam("max_depth", "6")
    .addHyperParam("n_estimators", "500");

// Set model group and relationships
model.setModelGroup("urn:li:mlModelGroup:(urn:li:dataPlatform:mlflow,churn_predictor,PROD)");
model.addTrainingJob("urn:li:dataJob:(urn:li:dataFlow:(airflow,ml_pipeline,prod),train_model)");
model.addDeployment("urn:li:dataProcessInstance:serving_endpoint_v3");

// Add standard metadata
model.addTag("production");
model.addOwner("urn:li:corpuser:ml_team", OwnershipType.TECHNICAL_OWNER);
model.setDomain("urn:li:domain:MachineLearning");

// Upsert to DataHub
client.entities().upsert(model);
```