Implementation:MaterializeInc Materialize CLI Workload Anonymize
| Knowledge Sources | |
|---|---|
| Domains | CLI, Privacy, Testing |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
The CLI Workload Anonymize tool strips sensitive identifiers and literals from captured workload YAML files while preserving structural properties needed for replay testing.
Description
This module provides the mz-workload-anonymize CLI, which reads a captured workload YAML file and replaces user-defined identifiers (database, schema, table, and column names) and literal values with anonymous substitutes. SQL keywords and known Materialize system object names are left unchanged; they are loaded from src/sql-lexer/src/keywords.txt and test/workload-replay/objects.txt. Identifier and literal anonymization can be toggled independently via the --identifiers and --literals flags. The mapping is consistent within a single run: the same original name always maps to the same anonymous name.
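The consistent-mapping behavior described above can be illustrated with a minimal sketch. It is not the tool's actual implementation: the `Anonymizer` class, the `ident_N` naming scheme, and the small `PRESERVED` set are hypothetical stand-ins for whatever the module does with the contents of keywords.txt and objects.txt, and the regex tokenization is deliberately naive (it does not distinguish string literals from identifiers).

```python
import re

# Hypothetical stand-in for the names loaded from keywords.txt and objects.txt.
PRESERVED = {"select", "from", "where", "mz_catalog", "mz_tables"}


class Anonymizer:
    """Maps each user-defined name to a stable anonymous name (illustrative only)."""

    def __init__(self, preserved):
        # Compare case-insensitively so SELECT and select are both preserved.
        self.preserved = {p.lower() for p in preserved}
        self.mapping = {}

    def identifier(self, name):
        # Keywords and known system objects pass through unchanged.
        if name.lower() in self.preserved:
            return name
        # First sighting allocates a new substitute; later sightings reuse it.
        if name not in self.mapping:
            self.mapping[name] = f"ident_{len(self.mapping) + 1}"
        return self.mapping[name]

    def sql(self, text):
        # Naive tokenization: treat every word-like token as a candidate identifier.
        return re.sub(
            r"[A-Za-z_][A-Za-z0-9_]*",
            lambda m: self.identifier(m.group(0)),
            text,
        )


anon = Anonymizer(PRESERVED)
result = anon.sql("SELECT customer_id FROM orders WHERE customer_id = orders.owner")
print(result)
```

Because the mapping lives on one `Anonymizer` instance, every occurrence of `customer_id` or `orders` within that run receives the same substitute, which is the property that keeps joins and references structurally intact for replay.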
Usage
Use this tool to anonymize production workload captures before sharing them across teams or committing them to the repository, ensuring customer data privacy while maintaining workload replay fidelity.
Code Reference
Source Location
Signature
def keywords() -> set[str]: ...
def main() -> int: ...
Import
from materialize.cli.mz_workload_anonymize import main
I/O Contract
| Input | Type | Description |
|---|---|---|
| file | str | Path to input workload.yml to anonymize |
| -o / --output | str | Output path (defaults to overwriting input file) |
| --identifiers | bool | Anonymize identifiers (default: True) |
| --literals | bool | Anonymize literal values (default: True) |
| Output | Type | Description |
|---|---|---|
| workload.yml | YAML file | Anonymized workload with mz_workload_version: "1.0.0" |
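One plausible way to declare the inputs listed above is with argparse. This is a sketch under assumptions, not the module's confirmed parser: `build_parser` is a hypothetical helper, and the use of `argparse.BooleanOptionalAction` (Python 3.9+) is inferred only from the fact that the usage example passes `--no-literals`.

```python
import argparse


def build_parser():
    # Hypothetical helper; the real module may structure its parser differently.
    parser = argparse.ArgumentParser(prog="mz-workload-anonymize")
    parser.add_argument("file", type=str, help="Path to input workload.yml to anonymize")
    parser.add_argument(
        "-o", "--output", type=str, default=None,
        help="Output path (defaults to overwriting input file)",
    )
    # BooleanOptionalAction generates a paired --no-<flag> form for each option.
    parser.add_argument(
        "--identifiers", action=argparse.BooleanOptionalAction, default=True,
        help="Anonymize identifiers",
    )
    parser.add_argument(
        "--literals", action=argparse.BooleanOptionalAction, default=True,
        help="Anonymize literal values",
    )
    return parser


args = build_parser().parse_args(["--no-literals", "-o", "anonymized.yml", "workload.yml"])
# With no -o given, the output path falls back to the input file (in-place overwrite).
output_path = args.output or args.file
defaults = build_parser().parse_args(["workload.yml"])
```

Under this arrangement, omitting both toggles anonymizes identifiers and literals, and omitting `-o` overwrites the input file, matching the defaults in the table.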
Usage Examples
# Anonymize a workload capture (overwrite in place)
mz-workload-anonymize workload.yml
# Write to a new file, anonymizing identifiers only (literals are left as-is)
mz-workload-anonymize --no-literals -o anonymized.yml workload.yml