Principle:Spotify Luigi URI Target Resolution

From Leeroopedia


Knowledge Sources
Domains: File_System, Abstraction
Last Updated: 2026-02-10 08:00 GMT

Overview

URI target resolution automatically maps the scheme of a URI to the appropriate target implementation, so that targets can be created without tying the calling code to a particular storage protocol.

Description

URI target resolution is the practice of using Uniform Resource Identifier (URI) strings to create the appropriate target object for a given storage location without the caller needing to know which specific target class to instantiate. A URI such as hdfs:///data/output.csv, s3://bucket/key, or file:///tmp/result.txt encodes both the storage protocol and the path. The resolution system parses the URI scheme (hdfs, s3, file, ftp, etc.) and maps it to the corresponding target implementation that knows how to interact with that storage backend. This creates a level of indirection that decouples pipeline task definitions from specific storage technologies, enabling the same task logic to work with different storage backends by simply changing the URI.
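
As a minimal sketch of the parsing step, Python's standard urllib.parse.urlsplit separates the scheme from the rest of the URI. The helper name split_uri is hypothetical and used only for illustration; it is not part of Luigi.

```python
from urllib.parse import urlsplit


def split_uri(uri):
    """Return (scheme, authority, path) for a URI string."""
    parts = urlsplit(uri)
    return parts.scheme, parts.netloc, parts.path


# The three example URIs from the description above.
for uri in ("hdfs:///data/output.csv", "s3://bucket/key", "file:///tmp/result.txt"):
    print(split_uri(uri))
```

For s3://bucket/key the authority is bucket and the path is /key. For the hdfs and file examples the authority is empty and the whole location sits in the path.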

Usage

Use URI target resolution when pipeline configurations need to specify target locations as simple strings (in configuration files, command-line arguments, or databases), when the same pipeline should work across different storage backends without code changes, or when building reusable task libraries that should not be coupled to a specific storage implementation.

Theoretical Basis

URI target resolution implements a registry-based factory pattern driven by URI scheme parsing:

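1. URI Parsing -- The input URI string is decomposed into its standard components. RFC 3986 defines the generic syntax as:
   URI = scheme ":" hier-part ["?" query] ["#" fragment]
   For the storage URIs discussed here, hier-part takes the form "//" authority path, where the authority may be empty (as in file:///tmp/result.txt).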
   The scheme component (hdfs, s3, file, ftp, etc.) determines which target factory to invoke.
2. Scheme Registry -- A registry maps URI schemes to target factory functions or classes:
   registry = {
     "hdfs" -> HdfsTargetFactory,
     "s3"   -> S3TargetFactory,
     "file" -> LocalTargetFactory,
     "ftp"  -> FtpTargetFactory,
     ...
   }
   The registry is extensible: new schemes can be registered at runtime to support additional storage backends.
3. Factory Dispatch -- Given the parsed scheme, the resolution system looks up the corresponding factory in the registry and invokes it with the remaining URI components (authority, path, query parameters):
   target = registry[scheme].create(authority, path, query_params)
4. Parameter Extraction -- Additional configuration may be encoded in the URI's query string or authority component:
   * s3://bucket/key?region=us-east-1 -- region parameter passed to S3 target
   * hdfs://namenode:8020/path -- namenode host and port extracted from authority
5. Default Scheme -- If no scheme is specified, the system applies a configurable default (typically file:// for local filesystem), maintaining backward compatibility with plain path strings.
6. Opener Integration -- The resolution system may integrate with file opener libraries that provide a unified open() interface across storage backends, allowing targets to leverage existing multi-protocol file access implementations.
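
The steps above (parsing, registry lookup, factory dispatch, parameter extraction, and the default scheme) can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not Luigi's implementation: Target, register, resolve, and DEFAULT_SCHEME are hypothetical names, and the factories only record what they were given instead of talking to a real backend.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict
from urllib.parse import parse_qsl, urlsplit


@dataclass
class Target:
    """Hypothetical stand-in for a storage target (not a Luigi class)."""
    kind: str
    authority: str
    path: str
    params: Dict[str, str] = field(default_factory=dict)


Factory = Callable[[str, str, Dict[str, str]], Target]

_REGISTRY: Dict[str, Factory] = {}
DEFAULT_SCHEME = "file"  # assumption: plain paths mean the local filesystem


def register(scheme: str, factory: Factory) -> None:
    """Add or replace the factory for a scheme at runtime."""
    _REGISTRY[scheme.lower()] = factory


def resolve(uri: str) -> Target:
    """Parse the URI, look up the factory for its scheme, and invoke it."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower() or DEFAULT_SCHEME
    try:
        factory = _REGISTRY[scheme]
    except KeyError:
        raise ValueError(f"no target registered for scheme {scheme!r}") from None
    return factory(parts.netloc, parts.path, dict(parse_qsl(parts.query)))


# Register one placeholder factory per scheme named in the registry above.
for name in ("hdfs", "s3", "file", "ftp"):
    register(name, lambda a, p, q, _kind=name: Target(_kind, a, p, q))

print(resolve("s3://bucket/key?region=us-east-1"))
print(resolve("hdfs://namenode:8020/path"))
print(resolve("/tmp/plain.txt"))
```

In this sketch the query string becomes a plain dictionary and the authority is passed through unchanged. A backend-specific factory could split it further; urlsplit also exposes hostname and port attributes for that purpose. An unregistered scheme raises ValueError rather than silently falling back to the default.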

The fundamental design principle is late binding: the decision of which target implementation to use is deferred from code-writing time to configuration time, enabling greater flexibility and reusability in pipeline definitions.
