Principle:DataExpert io Data engineer handbook Graph Vertex Generation
Appearance
Overview
Graph Vertex Generation is the process of converting relational data into a vertex format suitable for graph databases and graph-based analytics. Each vertex represents an entity with an identifier, a type label, and a flexible property map constructed from relational columns.
Theory
Transforming relational data into graph vertices involves several key operations:
- Deduplication via ROW_NUMBER() Window Function - Relational source tables often contain duplicate rows for the same entity (e.g., a team appearing in multiple seasons). The
ROW_NUMBER()window function, partitioned by the entity's natural key and ordered deterministically, assigns a sequence number to each duplicate. Filtering toROW_NUMBER() = 1retains exactly one representative row per entity. - Map Construction from Multiple Columns - Graph vertex properties are stored as a key-value map rather than fixed columns. The SQL
MAP()function (or equivalent) constructs this map from multiple relational columns, enabling flexible property storage without schema rigidity. - Type Labeling - Each vertex receives a type label (e.g.,
'team','player') that categorizes it within the graph schema, enabling type-specific traversals and queries.
When to Apply
This pattern is appropriate when:
- Building graph databases from relational sources (e.g., converting NBA team data into graph vertices)
- Source data contains duplicates that must be resolved before vertex creation
- The target graph schema requires flexible property maps rather than fixed columns
- Multiple entity types need to be unified under a common vertex format (identifier, type, properties)
Theoretical Basis
The transformation relies on:
- ROW_NUMBER() Window Function - deterministic deduplication by assigning ordinal positions within each partition
- MAP() Construction - converting named columns into a single key-value property map for schema-flexible storage
- Common Vertex Schema - standardizing diverse entities into a uniform
(identifier, type, properties)format for graph ingestion
-- Conceptual pattern
WITH entities_deduped AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY entity_id ORDER BY some_column) AS row_num
FROM raw_entities
)
SELECT
entity_id AS identifier,
'entity_type' AS type,
MAP('prop1', col1, 'prop2', col2) AS properties
FROM entities_deduped
WHERE row_num = 1
Related Pages
Knowledge Sources
Metadata
Page Connections
Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment