Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Principle:DataExpert io Data engineer handbook Graph Vertex Generation

From Leeroopedia


Overview

Graph Vertex Generation is the process of converting relational data into a vertex format suitable for graph databases and graph-based analytics. Each vertex represents an entity with an identifier, a type label, and a flexible property map constructed from relational columns.

Theory

Transforming relational data into graph vertices involves several key operations:

  • Deduplication via ROW_NUMBER() Window Function - Relational source tables often contain duplicate rows for the same entity (e.g., a team appearing in multiple seasons). The ROW_NUMBER() window function, partitioned by the entity's natural key and ordered deterministically, assigns a sequence number to each duplicate. Filtering to ROW_NUMBER() = 1 retains exactly one representative row per entity.
  • Map Construction from Multiple Columns - Graph vertex properties are stored as a key-value map rather than fixed columns. The SQL MAP() function (or equivalent) constructs this map from multiple relational columns, enabling flexible property storage without schema rigidity.
  • Type Labeling - Each vertex receives a type label (e.g., 'team', 'player') that categorizes it within the graph schema, enabling type-specific traversals and queries.

When to Apply

This pattern is appropriate when:

  • Building graph databases from relational sources (e.g., converting NBA team data into graph vertices)
  • Source data contains duplicates that must be resolved before vertex creation
  • The target graph schema requires flexible property maps rather than fixed columns
  • Multiple entity types need to be unified under a common vertex format (identifier, type, properties)

Theoretical Basis

The transformation relies on:

  • ROW_NUMBER() Window Function - deterministic deduplication by assigning ordinal positions within each partition
  • MAP() Construction - converting named columns into a single key-value property map for schema-flexible storage
  • Common Vertex Schema - standardizing diverse entities into a uniform (identifier, type, properties) format for graph ingestion
-- Conceptual pattern
WITH entities_deduped AS (
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY entity_id ORDER BY some_column) AS row_num
    FROM raw_entities
)
SELECT
    entity_id AS identifier,
    'entity_type' AS type,
    MAP('prop1', col1, 'prop2', col2) AS properties
FROM entities_deduped
WHERE row_num = 1

Related Pages

Knowledge Sources

Metadata

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment