Implementation:Datahub project Datahub MLModelGroup Entity
| Knowledge Sources | |
|---|---|
| Domains | Java_SDK, Metadata_Management |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
MLModelGroup is a Java SDK V2 entity class representing a DataHub ML Model Group: a collection of versioned ML models that belong to the same model family. It supports training jobs, downstream jobs, and mode-aware description editing.
Description
The MLModelGroup class extends Entity and implements mixin interfaces (HasTags, HasGlossaryTerms, HasOwners, HasDomains, HasSubTypes, HasStructuredProperties).
Key characteristics:
- Entity type: "mlModelGroup"
- URN format: urn:li:mlModelGroup:(urn:li:dataPlatform:platform,groupId,FABRIC_TYPE) via MlModelGroupUrn
- Model versioning: Groups related ML models (e.g., different versions of a recommendation model) under a single logical entity.
- Mode-aware descriptions: The setDescription() method writes to the editableMLModelGroupProperties aspect (for SDK/user mode) using EditableMLModelGroupPropertiesPatchBuilder.
- Training and downstream jobs: Supports managing lists of training job URNs and downstream job URNs. Unlike patch-based operations, these use direct aspect mutation with UrnArray and SetMode.IGNORE_NULL. Operations include add, remove, set (replace all), and get for both job types.
- Timestamp support: Provides getters for created and lastModified TimeStamp values from MLModelGroupProperties.
- Server compatibility: Includes a static transformPatchToFullAspect() method for backward compatibility with older DataHub servers, converting editableMLModelGroupProperties patch MCPs to full aspect replacement MCPs via read-modify-write.
- Builder: Requires platform and groupId. Environment defaults to "PROD". Optionally accepts name, description, externalUrl, customProperties, created/lastModified timestamps, trainingJobs, and downstreamJobs.
Default aspects fetched: Ownership, GlobalTags, GlossaryTerms, Domains, Status, InstitutionalMemory, MLModelGroupProperties, EditableMLModelGroupProperties.
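As a rough illustration of the URN format listed in the key characteristics, the sketch below assembles the string by hand. The class MlModelGroupUrnSketch and its format method are hypothetical names introduced only for this example; the SDK itself builds the URN through MlModelGroupUrn, and the validation shown here is an assumption.

```java
class MlModelGroupUrnSketch {

    // Hypothetical helper: joins the three URN parts in the documented order
    // (data platform, group id, fabric type).
    static String format(String platform, String groupId, String fabricType) {
        if (platform == null || platform.isEmpty()) {
            throw new IllegalArgumentException("platform must not be empty");
        }
        if (groupId == null || groupId.isEmpty()) {
            throw new IllegalArgumentException("groupId must not be empty");
        }
        if (fabricType == null || fabricType.isEmpty()) {
            throw new IllegalArgumentException("fabricType must not be empty");
        }
        return "urn:li:mlModelGroup:(urn:li:dataPlatform:" + platform
            + "," + groupId + "," + fabricType + ")";
    }

    public static void main(String[] args) {
        // Same platform, group id, and environment as the usage example on this page.
        System.out.println(format("mlflow", "recommender-model", "PROD"));
    }
}
```

The resulting string has the same shape as the URN passed to client.entities().get(...) in the usage example.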
Usage
Use the MLModelGroup entity to represent a family of related ML models in DataHub, providing a parent grouping for versioned MLModel entities. It tracks which jobs train and consume models in this group. Construct via its Builder and upsert through EntityClient.upsert(modelGroup).
Code Reference
Source Location
- Repository: Datahub_project_Datahub
- File: metadata-integration/java/datahub-client/src/main/java/datahub/client/v2/entity/MLModelGroup.java
Signature
public class MLModelGroup extends Entity
implements HasTags<MLModelGroup>, HasGlossaryTerms<MLModelGroup>, HasOwners<MLModelGroup>,
HasDomains<MLModelGroup>, HasSubTypes<MLModelGroup>,
HasStructuredProperties<MLModelGroup> {
// Factory
public static Builder builder();
// Identity
public String getEntityType(); // returns "mlModelGroup"
public MlModelGroupUrn getMlModelGroupUrn();
public MLModelGroup mutable();
// Description (mode-aware)
public MLModelGroup setDescription(String description);
public String getDescription();
// Read-only properties
public String getName();
public String getExternalUrl();
public Map<String, String> getCustomProperties();
public TimeStamp getCreated();
public TimeStamp getLastModified();
// Training jobs
public MLModelGroup setTrainingJobs(List<String> trainingJobUrns);
public MLModelGroup addTrainingJob(String trainingJobUrn);
public MLModelGroup removeTrainingJob(String trainingJobUrn);
public List<String> getTrainingJobs();
// Downstream jobs
public MLModelGroup setDownstreamJobs(List<String> downstreamJobUrns);
public MLModelGroup addDownstreamJob(String downstreamJobUrn);
public MLModelGroup removeDownstreamJob(String downstreamJobUrn);
public List<String> getDownstreamJobs();
// Server compatibility
public static MetadataChangeProposal transformPatchToFullAspect(
MetadataChangeProposal patch, EntityClient client);
}
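The add, remove, set, and get operations for job URNs can be pictured with a plain in-memory list. This is a minimal sketch, not the SDK code: JobListSketch is a hypothetical class, it stores strings rather than a UrnArray, and skipping duplicates on add is an assumption that the source does not confirm.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class JobListSketch {

    // Stand-in for the job URN list held in the properties aspect.
    private final List<String> trainingJobs = new ArrayList<>();

    // set: replaces the whole list.
    JobListSketch setTrainingJobs(List<String> urns) {
        trainingJobs.clear();
        trainingJobs.addAll(urns);
        return this;
    }

    // add: appends one URN (assumption: duplicates are skipped).
    JobListSketch addTrainingJob(String urn) {
        if (!trainingJobs.contains(urn)) {
            trainingJobs.add(urn);
        }
        return this;
    }

    // remove: drops one URN if present, otherwise does nothing.
    JobListSketch removeTrainingJob(String urn) {
        trainingJobs.remove(urn);
        return this;
    }

    // get: returns a read-only copy so callers cannot mutate internal state.
    List<String> getTrainingJobs() {
        return Collections.unmodifiableList(new ArrayList<>(trainingJobs));
    }

    public static void main(String[] args) {
        JobListSketch group = new JobListSketch()
            .addTrainingJob("urn:li:dataProcessInstance:training_run_001")
            .addTrainingJob("urn:li:dataProcessInstance:training_run_002")
            .removeTrainingJob("urn:li:dataProcessInstance:training_run_001");
        System.out.println(group.getTrainingJobs());
    }
}
```

Returning this from each mutator mirrors the fluent style of the signature above, where the setters return MLModelGroup.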
Import
import datahub.client.v2.entity.MLModelGroup;
I/O Contract
Builder Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| platform | String | Yes | ML platform (e.g., "mlflow", "sagemaker", "vertexai") |
| groupId | String | Yes | Model group identifier |
| env | String | No | Environment, defaults to "PROD" |
| name | String | No | Display name |
| description | String | No | Group description |
| externalUrl | String | No | External URL |
| customProperties | Map<String, String> | No | Custom key-value properties |
| created | TimeStamp | No | Created timestamp |
| lastModified | TimeStamp | No | Last modified timestamp |
| trainingJobs | List<String> | No | Training job URNs |
| downstreamJobs | List<String> | No | Downstream job URNs |
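The required and defaulted inputs in the table can be sketched with a small stand-alone builder. GroupBuilderSketch is a hypothetical class written for this example: it covers only platform, groupId, and env, returns the URN string instead of an MLModelGroup, and the use of IllegalStateException for a missing required field is an assumption about behavior the source does not describe.

```java
class GroupBuilderSketch {

    private String platform;      // required
    private String groupId;       // required
    private String env = "PROD";  // optional, defaults to "PROD" as in the table

    GroupBuilderSketch platform(String platform) {
        this.platform = platform;
        return this;
    }

    GroupBuilderSketch groupId(String groupId) {
        this.groupId = groupId;
        return this;
    }

    GroupBuilderSketch env(String env) {
        this.env = env;
        return this;
    }

    // Validates the required inputs, then returns the URN string
    // that identifies the group.
    String build() {
        if (platform == null) {
            throw new IllegalStateException("platform is required");
        }
        if (groupId == null) {
            throw new IllegalStateException("groupId is required");
        }
        return "urn:li:mlModelGroup:(urn:li:dataPlatform:" + platform
            + "," + groupId + "," + env + ")";
    }

    public static void main(String[] args) {
        // env is omitted here, so the default "PROD" applies.
        System.out.println(new GroupBuilderSketch()
            .platform("sagemaker")
            .groupId("churn-model")
            .build());
    }
}
```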
Outputs
| Method | Return Type | Description |
|---|---|---|
| build() | MLModelGroup | New MLModelGroup entity with MlModelGroupUrn |
| getTrainingJobs() | List<String> | Training job URN strings |
| getDownstreamJobs() | List<String> | Downstream job URN strings |
| getCreated() | TimeStamp | Creation timestamp |
Usage Examples
import com.linkedin.common.OwnershipType;
import datahub.client.v2.entity.MLModelGroup;
import java.util.List;

// Assumes `client` is an already-configured SDK V2 client that exposes entities().

// Create an ML model group (platform and groupId are required; env defaults to "PROD")
MLModelGroup group = MLModelGroup.builder()
    .platform("mlflow")
    .groupId("recommender-model")
    .env("PROD")
    .name("Recommender Model Family")
    .description("Customer product recommendation models")
    .build();

// Manage training and downstream jobs
group.addTrainingJob("urn:li:dataProcessInstance:training_run_001");
group.addTrainingJob("urn:li:dataProcessInstance:training_run_002");
group.addDownstreamJob("urn:li:dataJob:(urn:li:dataFlow:(airflow,inference,prod),serve_model)");

// Add metadata
group.addTag("recommendation");
group.addOwner("urn:li:corpuser:ml_team", OwnershipType.TECHNICAL_OWNER);
// Writes to the editableMLModelGroupProperties aspect
group.setDescription("Family of product recommendation models based on collaborative filtering");
group.setDomain("urn:li:domain:MachineLearning");

// Upsert to DataHub
client.entities().upsert(group);

// Read the group back from the server by URN
MLModelGroup fetched = client.entities().get(
    "urn:li:mlModelGroup:(urn:li:dataPlatform:mlflow,recommender-model,PROD)",
    MLModelGroup.class);
List<String> jobs = fetched.getTrainingJobs();