Implementation:Datahub project Datahub MLModelGroup Entity

From Leeroopedia
Revision as of 14:43, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Datahub_project_Datahub_MLModelGroup_Entity.md)


Knowledge Sources
Domains: Java_SDK, Metadata_Management
Last Updated: 2026-02-10 00:00 GMT

Overview

MLModelGroup is the Java SDK V2 entity class for a DataHub ML Model Group: a collection of versioned ML models that belong to the same model family. It supports training jobs, downstream jobs, and mode-aware description editing.

Description

The MLModelGroup class extends Entity and implements mixin interfaces (HasTags, HasGlossaryTerms, HasOwners, HasDomains, HasSubTypes, HasStructuredProperties).

Key characteristics:

  • Entity type: "mlModelGroup"
  • URN format: urn:li:mlModelGroup:(urn:li:dataPlatform:<platform>,<groupId>,<env>), where <env> is the fabric type (for example PROD); represented by MlModelGroupUrn
  • Model versioning: Groups related ML models (e.g., different versions of a recommendation model) under a single logical entity.
  • Mode-aware descriptions: The setDescription() method writes to the editableMLModelGroupProperties aspect (for SDK/user mode) using EditableMLModelGroupPropertiesPatchBuilder.
  • Training and downstream jobs: Supports managing lists of training job URNs and downstream job URNs. Unlike patch-based operations, these use direct aspect mutation with UrnArray and SetMode.IGNORE_NULL. Operations include add, remove, set (replace all), and get for both job types.
  • Timestamp support: Provides getters for created and lastModified TimeStamp values from MLModelGroupProperties.
  • Server compatibility: Includes a static transformPatchToFullAspect() method for backward compatibility with older DataHub servers, converting editableMLModelGroupProperties patch MCPs to full aspect replacement MCPs via read-modify-write.
  • Builder: Requires platform and groupId. Environment defaults to "PROD". Optionally accepts name, description, externalUrl, customProperties, created/lastModified timestamps, trainingJobs, and downstreamJobs.
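
The URN layout listed above can be illustrated with plain string handling. This is a minimal sketch, not the SDK's MlModelGroupUrn class: the names MlModelGroupUrnSketch, format, and parse are hypothetical, and the sketch assumes that the platform, group ID, and environment contain no commas or parentheses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, NOT the SDK's MlModelGroupUrn: it only mirrors the
// documented string layout of an ML model group URN.
final class MlModelGroupUrnSketch {

    // Assumption: none of the three components contains ',', '(' or ')'.
    private static final Pattern URN = Pattern.compile(
        "urn:li:mlModelGroup:\\(urn:li:dataPlatform:([^,()]+),([^,()]+),([^,()]+)\\)");

    // Builds urn:li:mlModelGroup:(urn:li:dataPlatform:<platform>,<groupId>,<env>).
    static String format(String platform, String groupId, String env) {
        return "urn:li:mlModelGroup:(urn:li:dataPlatform:"
            + platform + "," + groupId + "," + env + ")";
    }

    // Splits a URN string into {platform, groupId, env}; rejects any other shape.
    static String[] parse(String urn) {
        Matcher m = URN.matcher(urn);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an mlModelGroup URN: " + urn);
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        String urn = format("mlflow", "recommender-model", "PROD");
        System.out.println(urn);
        System.out.println(String.join(" / ", parse(urn)));
    }
}
```

In real code the SDK builder produces the URN, so a helper like this is mainly useful for reasoning about the string that appears in logs or in calls that take a URN as text.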

Default aspects fetched: Ownership, GlobalTags, GlossaryTerms, Domains, Status, InstitutionalMemory, MLModelGroupProperties, EditableMLModelGroupProperties.
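
The read-modify-write conversion mentioned for transformPatchToFullAspect() can be pictured with plain maps. The sketch below is an illustration under stated assumptions, not the SDK's implementation: the aspect is modeled as a Map<String, String>, the patch as a map of field names to new values, and a null value is assumed to mean "remove this field". The class and method names are hypothetical; the real method works on MetadataChangeProposal objects and reads the current aspect through an EntityClient.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the read-modify-write idea only.
final class ReadModifyWriteSketch {

    // current: aspect fields as last read from the server (null if the aspect is absent).
    // patch:   fields to change; a null value means "remove this field" (an assumption).
    // Returns a new map representing the full aspect to send as a replacement.
    static Map<String, String> toFullAspect(Map<String, String> current,
                                            Map<String, String> patch) {
        Map<String, String> full = new HashMap<>();
        if (current != null) {
            full.putAll(current);                                 // read
        }
        for (Map.Entry<String, String> e : patch.entrySet()) {    // modify
            if (e.getValue() == null) {
                full.remove(e.getKey());
            } else {
                full.put(e.getKey(), e.getValue());
            }
        }
        return full;                                              // caller writes the whole aspect
    }

    public static void main(String[] args) {
        Map<String, String> current = new HashMap<>();
        current.put("description", "Old description");
        Map<String, String> patch = new HashMap<>();
        patch.put("description", "New description");
        System.out.println(toFullAspect(current, patch));
    }
}
```

The trade-off of this pattern is that a full replacement can overwrite changes made by another writer between the read and the write, which is presumably why it is reserved for servers that cannot apply the patch directly.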

Usage

Use the MLModelGroup entity to represent a family of related ML models in DataHub; it serves as the parent grouping for versioned MLModel entities. It also records which jobs train the models in the group and which jobs consume them. Construct an instance with MLModelGroup.builder() and write it to DataHub with EntityClient.upsert(modelGroup).

Code Reference

Source Location

Signature

public class MLModelGroup extends Entity
    implements HasTags<MLModelGroup>, HasGlossaryTerms<MLModelGroup>, HasOwners<MLModelGroup>,
               HasDomains<MLModelGroup>, HasSubTypes<MLModelGroup>,
               HasStructuredProperties<MLModelGroup> {

    // Factory
    public static Builder builder();

    // Identity
    public String getEntityType();           // returns "mlModelGroup"
    public MlModelGroupUrn getMlModelGroupUrn();
    public MLModelGroup mutable();

    // Description (mode-aware)
    public MLModelGroup setDescription(String description);
    public String getDescription();

    // Read-only properties
    public String getName();
    public String getExternalUrl();
    public Map<String, String> getCustomProperties();
    public TimeStamp getCreated();
    public TimeStamp getLastModified();

    // Training jobs
    public MLModelGroup setTrainingJobs(List<String> trainingJobUrns);
    public MLModelGroup addTrainingJob(String trainingJobUrn);
    public MLModelGroup removeTrainingJob(String trainingJobUrn);
    public List<String> getTrainingJobs();

    // Downstream jobs
    public MLModelGroup setDownstreamJobs(List<String> downstreamJobUrns);
    public MLModelGroup addDownstreamJob(String downstreamJobUrn);
    public MLModelGroup removeDownstreamJob(String downstreamJobUrn);
    public List<String> getDownstreamJobs();

    // Server compatibility
    public static MetadataChangeProposal transformPatchToFullAspect(
        MetadataChangeProposal patch, EntityClient client);
}
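
The set, add, remove, and get operations declared above for training and downstream jobs behave like edits on a list of URN strings. The following sketch models that with a plain List<String>. It is not the SDK's code: JobUrnListSketch is a hypothetical name, and two behaviors are assumptions the source does not confirm, namely that add() skips a URN that is already present and that get() returns a read-only copy.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of the job-list operations; the real class mutates an
// aspect backed by a UrnArray rather than a plain list of strings.
final class JobUrnListSketch {
    private final List<String> jobs = new ArrayList<>();

    // set: replace the whole list.
    JobUrnListSketch set(List<String> urns) {
        jobs.clear();
        jobs.addAll(urns);
        return this;
    }

    // add: append one URN (assumption: duplicates are ignored).
    JobUrnListSketch add(String urn) {
        if (!jobs.contains(urn)) {
            jobs.add(urn);
        }
        return this;
    }

    // remove: drop one URN; a URN that is not present is a no-op.
    JobUrnListSketch remove(String urn) {
        jobs.remove(urn);
        return this;
    }

    // get: read-only snapshot of the current list (assumption).
    List<String> get() {
        return Collections.unmodifiableList(new ArrayList<>(jobs));
    }

    public static void main(String[] args) {
        JobUrnListSketch training = new JobUrnListSketch()
            .add("urn:li:dataProcessInstance:training_run_001")
            .add("urn:li:dataProcessInstance:training_run_002")
            .remove("urn:li:dataProcessInstance:training_run_001");
        System.out.println(training.get());
    }
}
```

Each mutator returns the object itself, mirroring the fluent MLModelGroup return type in the signature.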

Import

import datahub.client.v2.entity.MLModelGroup;

I/O Contract

Builder Inputs

| Parameter | Type | Required | Description |
|---|---|---|---|
| platform | String | Yes | ML platform (e.g., "mlflow", "sagemaker", "vertexai") |
| groupId | String | Yes | Model group identifier |
| env | String | No | Environment, defaults to "PROD" |
| name | String | No | Display name |
| description | String | No | Group description |
| externalUrl | String | No | External URL |
| customProperties | Map<String, String> | No | Custom key-value properties |
| created | TimeStamp | No | Created timestamp |
| lastModified | TimeStamp | No | Last modified timestamp |
| trainingJobs | List<String> | No | Training job URNs |
| downstreamJobs | List<String> | No | Downstream job URNs |
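
The split between required and optional builder inputs can be sketched as a builder that fails fast when a required field is missing and applies the "PROD" default otherwise. This is an illustration only: GroupSpecSketch is a hypothetical class covering a subset of the fields, and the exception type thrown by the real Builder is not stated in the source, so IllegalArgumentException here is an assumption.

```java
// Hypothetical builder illustrating required fields and the env default.
final class GroupSpecSketch {
    final String platform;
    final String groupId;
    final String env;
    final String name;

    private GroupSpecSketch(Builder b) {
        this.platform = b.platform;
        this.groupId = b.groupId;
        this.env = b.env;
        this.name = b.name;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private String platform;
        private String groupId;
        private String env = "PROD";   // documented default
        private String name;           // optional, may stay null

        Builder platform(String v) { this.platform = v; return this; }
        Builder groupId(String v) { this.groupId = v; return this; }
        Builder env(String v) { this.env = v; return this; }
        Builder name(String v) { this.name = v; return this; }

        GroupSpecSketch build() {
            // Assumption: missing required inputs are rejected at build() time.
            if (platform == null || groupId == null) {
                throw new IllegalArgumentException("platform and groupId are required");
            }
            return new GroupSpecSketch(this);
        }
    }

    public static void main(String[] args) {
        GroupSpecSketch spec = GroupSpecSketch.builder()
            .platform("mlflow")
            .groupId("recommender-model")
            .build();
        System.out.println(spec.platform + " " + spec.groupId + " " + spec.env);
    }
}
```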

Outputs

| Method | Return Type | Description |
|---|---|---|
| build() | MLModelGroup | New MLModelGroup entity with MlModelGroupUrn |
| getTrainingJobs() | List<String> | Training job URN strings |
| getDownstreamJobs() | List<String> | Downstream job URN strings |
| getCreated() | TimeStamp | Creation timestamp |

Usage Examples

import com.linkedin.common.OwnershipType;   // assumed package for the OwnershipType enum
import datahub.client.v2.entity.MLModelGroup;
import java.util.List;

// `client` is assumed to be an already-configured SDK V2 client whose
// entities() accessor exposes upsert() and get(), as used below.

// Create an ML model group (platform and groupId are required; env defaults to "PROD")
MLModelGroup group = MLModelGroup.builder()
    .platform("mlflow")
    .groupId("recommender-model")
    .env("PROD")
    .name("Recommender Model Family")
    .description("Customer product recommendation models")
    .build();

// Manage training and downstream jobs (each argument is a job URN string)
group.addTrainingJob("urn:li:dataProcessInstance:training_run_001");
group.addTrainingJob("urn:li:dataProcessInstance:training_run_002");
group.addDownstreamJob("urn:li:dataJob:(urn:li:dataFlow:(airflow,inference,prod),serve_model)");

// Add metadata through the mixin interfaces
group.addTag("recommendation");
group.addOwner("urn:li:corpuser:ml_team", OwnershipType.TECHNICAL_OWNER);
// setDescription() writes to the editableMLModelGroupProperties aspect
group.setDescription("Family of product recommendation models based on collaborative filtering");
group.setDomain("urn:li:domain:MachineLearning");

// Upsert to DataHub
client.entities().upsert(group);

// Read the group back from the server by URN
MLModelGroup fetched = client.entities().get(
    "urn:li:mlModelGroup:(urn:li:dataPlatform:mlflow,recommender-model,PROD)",
    MLModelGroup.class);
List<String> jobs = fetched.getTrainingJobs();
