Workflow: Kornia Image Feature Matching
| Knowledge Sources | |
|---|---|
| Domains | Computer_Vision, Feature_Matching, 3D_Reconstruction |
| Last Updated | 2026-02-09 15:00 GMT |
Overview
End-to-end process for finding pixel correspondences between two images using local feature detection, description, and matching with Kornia's feature module.
Description
This workflow covers the complete pipeline for image matching: loading a pair of images, detecting keypoints, computing descriptors, matching features, and filtering matches with geometric verification. Kornia provides classical detectors (SIFT, Harris, GFTT), learned detectors and descriptors (DISK, DeDoDe, KeyNet), and learned matchers (LoFTR, LightGlue). The detector-free LoFTR matcher operates directly on image pairs without explicit keypoint detection. All operations are differentiable and can run on the GPU, making them suitable for end-to-end training of matching pipelines. The output is a set of verified point correspondences that can be used for homography estimation, fundamental matrix computation, 3D reconstruction, or visual localization.
Usage
Execute this workflow when you have two images of the same scene taken from different viewpoints, times, or sensors, and need to establish pixel-level correspondences between them. Common applications include Structure from Motion (SfM), visual SLAM, image stitching, image registration, and augmented reality pose estimation.
Execution Steps
Step 1: Load and Preprocess Image Pair
Load two images and convert them to the tensor format required by Kornia's feature module. Most feature detectors and matchers operate on grayscale images, so convert from RGB to grayscale. Ensure images are float tensors with values in [0, 1].
Key considerations:
- LoFTR and most matchers expect grayscale input
- Images should be float32 tensors in [0, 1] range
- Consider resizing large images to reduce computation time
- Both images must be on the same device (CPU or CUDA)
Step 2: Select Feature Detection Strategy
Choose between two approaches: a detect-then-describe pipeline (SIFT, DISK, KeyNet+HardNet, GFTT+HardNet) that first detects keypoints and then computes descriptors, or the detector-free LoFTR matcher that finds correspondences directly from an image pair. LightGlue is a learned matcher that operates on keypoints and descriptors produced by a detector such as DISK, rather than being detector-free itself. The detect-then-describe approach provides Local Affine Frames (LAFs) for each keypoint.
Key considerations:
- LoFTR is recommended for its balance of accuracy and ease of use
- DISK and DeDoDe are learned detectors with strong performance
- Classical detectors (Harris, GFTT, Hessian) are faster but less robust
- ScaleSpaceDetector enables multi-scale detection with configurable response functions
Step 3: Detect Keypoints and Compute Descriptors
For detect-then-describe pipelines, run the detector on each image to obtain keypoints represented as Local Affine Frames (LAFs). Then compute descriptors (HardNet, SOSNet, TFeat, HyNet, or SIFT) at each keypoint location. Descriptors are normalized vectors that encode the local image patch appearance.
Key considerations:
- LAFs encode position, scale, and orientation of each keypoint
- Descriptors are typically 128-dimensional normalized vectors
- Use the integrated LocalFeature pipeline for convenience
- For LoFTR, this step is implicit in the matching step
Step 4: Match Features Between Images
Establish correspondences by matching descriptors from the two images. Available strategies include nearest neighbor matching, mutual nearest neighbor (both images must agree), ratio test (Lowe's ratio), and geometrically-aware matching (AdaLAM, LightGlue). For LoFTR, pass both images directly to get correspondences with confidence scores.
Key considerations:
- Mutual nearest neighbor matching reduces false matches
- Ratio test (threshold ~0.8) filters ambiguous matches
- AdaLAM uses local affine consistency for robust matching
- LightGlue provides end-to-end learned matching with attention
Step 5: Geometric Verification with RANSAC
Filter correspondences by estimating a geometric model (homography or fundamental matrix) using RANSAC. This removes outlier matches that are not consistent with the underlying geometric relationship between images. RANSAC iteratively samples minimal point sets, estimates the model, and identifies the largest consensus set.
Key considerations:
- Use homography model for planar scenes or dominant planes
- Use fundamental matrix model for general 3D scenes
- RANSAC returns both the estimated model and inlier mask
- Adjust inlier threshold based on image resolution and expected accuracy
Step 6: Extract Verified Correspondences
Apply the inlier mask from RANSAC to retain only geometrically verified correspondences. These filtered matches represent reliable point-to-point correspondences between the two images, suitable for downstream tasks like pose estimation, 3D reconstruction, or image alignment.
Key considerations:
- Verified correspondences are the final output of this workflow
- The estimated geometric model (homography/fundamental matrix) is also a useful output
- Correspondences can be visualized by drawing lines between matched points
- Quality can be assessed by the number of inliers and reprojection error
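Extracting the verified correspondences is plain boolean indexing; in this sketch, random points and a random boolean mask stand in for the matched keypoints and RANSAC inlier mask from the previous step:

```python
import torch

torch.manual_seed(0)
# Stand-ins: matched keypoints and a hypothetical inlier mask.
pts0 = torch.rand(100, 2)
pts1 = torch.rand(100, 2)
inliers = torch.rand(100) > 0.3  # boolean mask, as returned by RANSAC

# Keep only geometrically verified correspondences.
verified0 = pts0[inliers]
verified1 = pts1[inliers]

# A simple quality metric: the inlier ratio.
inlier_ratio = inliers.float().mean().item()
```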