Principle:Tencent Ncnn Model Merging

Knowledge Sources	Tencent_Ncnn
Domains	Model Optimization, Deployment Engineering
Last Updated	2026-02-09 19:00 GMT

Overview

The process of combining multiple separate neural network model files into a single unified model file, renaming layers and blobs with namespace prefixes to prevent name collisions, so that several models can be run from one loaded artifact.

Description

Model merging is a deployment optimization technique that consolidates multiple independent neural network models into a single pair of model files (a parameter file describing the graph topology and a binary file containing the weights). This is particularly relevant in inference frameworks that represent models as flat graphs of named layers, where loading multiple models separately incurs overhead in file I/O, memory allocation, and model management complexity.

The core challenge in merging is layer name collision. When two independently trained models are combined, they frequently share identical layer names (e.g., both may have layers named "conv1", "relu1", "fc"). To resolve this, model merging applies namespace prefixes to every layer name within each constituent model. For example, layers from a face detection model might be prefixed with "detect_" while layers from a recognition model receive the prefix "recognize_", transforming "conv1" into "detect_conv1" and "recognize_conv1" respectively.
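A minimal Python sketch of the collision and its fix. It assumes each model is reduced to a set of layer names. The helper `add_prefix` and the example name sets are hypothetical and only illustrate the idea.

```python
def add_prefix(names, prefix):
    """Return a new set with `prefix` prepended to every name."""
    return {prefix + n for n in names}

# Hypothetical layer-name sets from two independently trained models.
detect_layers = {"data", "conv1", "relu1", "fc"}
recognize_layers = {"data", "conv1", "relu1", "fc", "norm"}

# Without prefixes, the two models share several names.
collisions = detect_layers & recognize_layers

# With distinct prefixes, the two namespaces become disjoint.
merged = add_prefix(detect_layers, "detect_") | add_prefix(recognize_layers, "recognize_")
```

Distinct prefixes are not quite enough on their own. No prefix may be a leading substring of another. With `a` and `ab`, the names `a` + `bc` and `ab` + `c` both become `abc`. Ending every prefix with a separator such as `_` that never appears inside a prefix avoids this.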

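Beyond simple concatenation, the merging process must also update all internal references between layers. Each layer's input and output blob names must be renamed with the same prefix as the layer itself, so that the graph connectivity within each original model is preserved. The merged model's parameter file contains the union of all layers from all input models, each with updated names, and its header's layer and blob counts are the sums of the per-model counts. The binary weight file is the concatenation of the input weight files, in the same order as their layers appear in the merged parameter file. Because ncnn reads weights sequentially in layer order and pads each weight blob to a 4-byte boundary, this concatenation preserves both the pairing of layers with their weights and the alignment.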
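In ncnn's plain-text `.param` format, each layer occupies one line of the form `type name bottom_count top_count bottom_blobs... top_blobs... key=value...`. The following Python sketch renames one such line. The function name `prefix_layer_line` is hypothetical. It assumes whitespace-separated tokens and names without embedded spaces.

```python
def prefix_layer_line(line: str, prefix: str) -> str:
    """Prefix the layer name and every bottom/top blob name of one ncnn layer line.

    The layer type, the two counts and the key=value parameters are left untouched.
    """
    tokens = line.split()
    layer_type, name = tokens[0], tokens[1]
    bottom_count, top_count = int(tokens[2]), int(tokens[3])
    blob_end = 4 + bottom_count + top_count
    # Bottom (input) blobs come first, then top (output) blobs; both get the prefix.
    blobs = [prefix + b for b in tokens[4:blob_end]]
    params = tokens[blob_end:]
    return " ".join([layer_type, prefix + name, tokens[2], tokens[3], *blobs, *params])
```

A producer's top blob and a consumer's bottom blob carry the same original name. Prefixing both with the same string therefore keeps every edge inside a model connected, while no edge can cross into another model's namespace.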

The resulting merged model can be loaded once and used to run multiple inference subgraphs by specifying the appropriate namespaced input and output blob names, eliminating the need to manage multiple model objects in application code.
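Application code needs the namespaced names to pass to the extractor. Those names can be recovered from the merged parameter file. The Python sketch below is a heuristic under stated assumptions. It treats the top blobs of `Input` layers as entry points and blobs that no layer consumes as candidate outputs. `find_io_blobs` is a hypothetical helper. ncnn can also extract intermediate blobs, so the output list is only a starting point.

```python
def find_io_blobs(param_text: str):
    """Heuristically list entry and exit blobs from ncnn plain-text .param content.

    Returns (inputs, outputs): inputs are the top blobs of Input layers,
    outputs are blobs that some layer produces but no layer consumes.
    """
    lines = [l for l in param_text.strip().splitlines() if l.strip()]
    inputs, produced, consumed = [], [], set()
    for line in lines[2:]:  # skip the magic-number line and the "layer_count blob_count" line
        tokens = line.split()
        bottom_count, top_count = int(tokens[2]), int(tokens[3])
        bottoms = tokens[4:4 + bottom_count]
        tops = tokens[4 + bottom_count:4 + bottom_count + top_count]
        consumed.update(bottoms)
        produced.extend(tops)
        if tokens[0] == "Input":
            inputs.extend(tops)
    outputs = [b for b in produced if b not in consumed]
    return inputs, outputs
```

Filtering the results by prefix, for example `[b for b in inputs if b.startswith("detect_")]`, gives the names for one subgraph.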

Usage

This principle applies in deployment scenarios involving multiple cooperating models:

Multi-stage pipelines: Merging detection and classification models used in sequence.
Ensemble inference: Combining multiple models whose outputs are aggregated for improved accuracy.
Resource-constrained devices: Reducing the number of file handles, memory mappings, and initialization calls on embedded platforms.
Simplified deployment: Distributing a single model file instead of managing multiple model artifacts.

Theoretical Basis

The model merging algorithm in pseudo-code:

function merge_models(models_with_prefixes):
    merged_layers = []
    merged_weights = ByteBuffer()
    total_layer_count = 0
    total_blob_count = 0

    for (model, prefix) in models_with_prefixes:
        param = parse_param_file(model.param)
        bin_data = read_binary_file(model.bin)

        // Prefixed names are disjoint across models, so the counts simply add up
        total_layer_count += param.layer_count
        total_blob_count += param.blob_count

        for layer in param.layers:
            // Rename layer with namespace prefix
            new_layer = copy(layer)
            new_layer.name = prefix + layer.name

            // Rename all input blob references
            for i in range(len(new_layer.inputs)):
                new_layer.inputs[i] = prefix + layer.inputs[i]

            // Rename all output blob references
            for i in range(len(new_layer.outputs)):
                new_layer.outputs[i] = prefix + layer.outputs[i]

            merged_layers.append(new_layer)

        // Weights are read sequentially in layer order, so this model's whole
        // binary is appended right after the previous model's weights
        merged_weights.append(bin_data)

    // Write merged parameter file (header counts cover all models)
    write_param(merged_layers, total_layer_count, total_blob_count)

    // Write merged binary weights
    write_binary(merged_weights)
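The pseudocode can be made concrete for ncnn's plain-text `.param` format. The Python sketch below is minimal and assumes each model arrives as in-memory `.param` text, `.bin` bytes and a caller-chosen prefix. `merge_ncnn` is a hypothetical helper, not ncnn's own `ncnnmerge` tool, whose command-line interface and naming scheme may differ. It treats weights as opaque bytes and does not handle ncnn's binary `.param.bin` format.

```python
NCNN_PARAM_MAGIC = "7767517"

def merge_ncnn(models):
    """Merge ncnn models given as (param_text, bin_bytes, prefix) tuples.

    Returns (merged_param_text, merged_bin_bytes).
    """
    total_layers, total_blobs = 0, 0
    layer_lines = []
    weights = bytearray()
    for param_text, bin_bytes, prefix in models:
        lines = [l for l in param_text.strip().splitlines() if l.strip()]
        if lines[0].strip() != NCNN_PARAM_MAGIC:
            raise ValueError("not an ncnn text .param file (bad magic number)")
        if len(bin_bytes) % 4 != 0:
            raise ValueError("expected a .bin length that is a multiple of 4 bytes")
        layer_count, blob_count = map(int, lines[1].split())
        # Prefixed names never collide across models, so the counts simply add up.
        total_layers += layer_count
        total_blobs += blob_count
        for line in lines[2:2 + layer_count]:
            tokens = line.split()
            end = 4 + int(tokens[2]) + int(tokens[3])  # end of the bottom/top blob names
            renamed = [tokens[0], prefix + tokens[1], tokens[2], tokens[3]]
            renamed += [prefix + blob for blob in tokens[4:end]]
            renamed += tokens[end:]  # key=value parameters are copied unchanged
            layer_lines.append(" ".join(renamed))
        # ncnn reads weights sequentially in layer order, so appending each .bin in
        # the same order as its layers keeps every layer paired with its own weights.
        weights += bin_bytes
    header = [NCNN_PARAM_MAGIC, f"{total_layers} {total_blobs}"]
    return "\n".join(header + layer_lines) + "\n", bytes(weights)
```

Write the returned text to `merged.param` and the bytes to `merged.bin`, then load them with ncnn's `Net::load_param` and `Net::load_model`. The 4-byte length check encodes the assumption that each input `.bin` already ends on an aligned boundary.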

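Using the merged model at inference time with ncnn's C++ API (blob names are illustrative):

ncnn::Net net;
net.load_param("merged.param");
net.load_model("merged.bin");

// Run detection subgraph (image_data: an ncnn::Mat prepared by the caller)
ncnn::Extractor extractor = net.create_extractor();
extractor.input("detect_input", image_data);
ncnn::Mat detection_result;
extractor.extract("detect_output", detection_result);

// Run recognition subgraph (aligned_face: an ncnn::Mat prepared by the caller)
ncnn::Extractor extractor2 = net.create_extractor();
extractor2.input("recognize_input", aligned_face);
ncnn::Mat embedding;
extractor2.extract("recognize_output", embedding);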

Related Pages

Implementation:Tencent_Ncnn_Ncnnmerge
