Principle:Tencent Ncnn Model Merging
| Knowledge Sources | |
|---|---|
| Domains | Model Optimization, Deployment Engineering |
| Last Updated | 2026-02-09 19:00 GMT |
Overview
Model merging combines multiple separate neural network model files into a single unified model file. Layers are renamed with namespace prefixes to prevent name collisions, so that several models can be run for inference from one loaded artifact.
Description
Model merging is a deployment optimization technique that consolidates multiple independent neural network models into a single pair of model files (a parameter file describing the graph topology and a binary file containing the weights). This is particularly relevant in inference frameworks that represent models as flat graphs of named layers, where loading multiple models separately incurs overhead in file I/O, memory allocation, and model management complexity.
The core challenge in merging is layer name collision. When two independently trained models are combined, they frequently share identical layer names (e.g., both may have layers named "conv1", "relu1", "fc"). To resolve this, model merging applies namespace prefixes to every layer name within each constituent model. For example, layers from a face detection model might be prefixed with "detect_" while layers from a recognition model receive the prefix "recognize_", transforming "conv1" into "detect_conv1" and "recognize_conv1" respectively.
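As a minimal sketch of the collision problem and the prefix fix, the snippet below uses plain Python lists. The layer names and the helper names `find_collisions` and `add_prefix` are hypothetical and chosen only for illustration; they are not part of any framework API.

```python
def find_collisions(names_a, names_b):
    """Return the layer names that appear in both models, sorted."""
    return sorted(set(names_a) & set(names_b))


def add_prefix(names, prefix):
    """Return a new list with the namespace prefix applied to every name."""
    return [prefix + name for name in names]


# Hypothetical layer names for two independently trained models.
detect_layers = ["input", "conv1", "relu1", "fc"]
recognize_layers = ["input", "conv1", "relu1", "embedding"]

# Without prefixes, three names clash.
print(find_collisions(detect_layers, recognize_layers))
# ['conv1', 'input', 'relu1']

# With prefixes, every name in the combined list is unique.
merged = add_prefix(detect_layers, "detect_") + add_prefix(recognize_layers, "recognize_")
print(len(merged) == len(set(merged)))
# True
```

The prefix only has to be unique per constituent model; any string that cannot already occur at the start of an existing layer name would serve.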
Beyond simple concatenation, the merging process must also update all internal references between layers. Each layer's input and output blob names must be consistently renamed so that the graph connectivity within each original model is preserved. The merged model's parameter file contains the union of all layers from all input models, each with updated names, while the binary weight file is the concatenation of all weight data with proper alignment.
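To illustrate the reference update, here is a small sketch that rewrites one layer line of a text parameter file. It assumes a whitespace-separated layout of layer type, layer name, input count, output count, input blob names, output blob names, and then layer-specific `key=value` parameters. The function name `prefix_param_line` and the sample line are hypothetical.

```python
def prefix_param_line(line, prefix):
    """Prefix the layer name and every blob name in one layer line.

    Assumed layout (whitespace separated):
    type name input_count output_count inputs... outputs... key=value...
    """
    tokens = line.split()
    layer_type, name = tokens[0], tokens[1]
    n_in, n_out = int(tokens[2]), int(tokens[3])
    blob_end = 4 + n_in + n_out
    # Input and output blob names are renamed with the same prefix,
    # so connections between layers of the same model stay intact.
    blobs = [prefix + b for b in tokens[4:blob_end]]
    params = tokens[blob_end:]  # layer-specific parameters stay untouched
    return " ".join([layer_type, prefix + name, str(n_in), str(n_out)] + blobs + params)


line = "Convolution conv1 1 1 data conv1 0=64 1=3"
print(prefix_param_line(line, "detect_"))
# Convolution detect_conv1 1 1 detect_data detect_conv1 0=64 1=3
```

Because the layer type and the trailing parameters are copied verbatim, only names change; the topology and hyperparameters of each constituent model are left as they were.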
The resulting merged model can be loaded once and used to run multiple inference subgraphs by specifying the appropriate namespaced input and output blob names, eliminating the need to manage multiple model objects in application code.
Usage
This principle applies in deployment scenarios involving multiple cooperating models:
- Multi-stage pipelines: Merging detection and classification models used in sequence.
- Ensemble inference: Combining multiple models whose outputs are aggregated for improved accuracy.
- Resource-constrained devices: Reducing the number of file handles, memory mappings, and initialization calls on embedded platforms.
- Simplified deployment: Distributing a single model file instead of managing multiple model artifacts.
Theoretical Basis
The model merging algorithm in pseudo-code:
    function merge_models(models_with_prefixes):
        merged_layers = []
        merged_weights = ByteBuffer()
        total_blob_count = 0
        for (model, prefix) in models_with_prefixes:
            param = parse_param_file(model.param)
            bin_data = read_binary_file(model.bin)
            // Read position within this model's own weight file
            weight_offset = 0
            for layer in param.layers:
                // Rename layer with namespace prefix
                new_layer = copy(layer)
                new_layer.name = prefix + layer.name
                // Rename all input blob references
                for i in range(len(new_layer.inputs)):
                    new_layer.inputs[i] = prefix + layer.inputs[i]
                // Rename all output blob references
                for i in range(len(new_layer.outputs)):
                    new_layer.outputs[i] = prefix + layer.outputs[i]
                // Each blob is produced by exactly one layer output
                total_blob_count += len(new_layer.outputs)
                // Copy weight data
                layer_weight_size = get_weight_size(layer)
                new_layer.weight_offset = len(merged_weights)
                merged_weights.append(bin_data[weight_offset : weight_offset + layer_weight_size])
                weight_offset += layer_weight_size
                merged_layers.append(new_layer)
        total_layer_count = len(merged_layers)
        // Write merged parameter file
        write_param(merged_layers, total_blob_count, total_layer_count)
        // Write merged binary weights
        write_binary(merged_weights)
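The pseudo-code can be exercised on in-memory data with a short Python sketch. The `Layer` and `Model` classes are hypothetical stand-ins for parsed parameter and weight files, and weight alignment is deliberately ignored, so this is an illustration of the bookkeeping rather than a working merge tool.

```python
from dataclasses import dataclass, replace


@dataclass
class Layer:
    name: str
    inputs: list
    outputs: list
    weight_size: int = 0    # bytes this layer owns in its model's weights
    weight_offset: int = 0  # position in the merged weights, set by merging


@dataclass
class Model:
    layers: list
    weights: bytes


def merge_models(models_with_prefixes):
    merged_layers = []
    merged_weights = bytearray()
    for model, prefix in models_with_prefixes:
        cursor = 0  # read position inside this model's own weights
        for layer in model.layers:
            merged_layers.append(replace(
                layer,
                name=prefix + layer.name,
                inputs=[prefix + b for b in layer.inputs],
                outputs=[prefix + b for b in layer.outputs],
                weight_offset=len(merged_weights),
            ))
            merged_weights += model.weights[cursor:cursor + layer.weight_size]
            cursor += layer.weight_size
    # Every blob is produced by exactly one layer output.
    blob_count = sum(len(layer.outputs) for layer in merged_layers)
    return merged_layers, bytes(merged_weights), len(merged_layers), blob_count


detect = Model([Layer("input", [], ["data"]),
                Layer("conv1", ["data"], ["out"], 4)], b"\x01\x02\x03\x04")
recog = Model([Layer("input", [], ["data"]),
               Layer("fc", ["data"], ["out"], 2)], b"\x0a\x0b")

layers, weights, n_layers, n_blobs = merge_models(
    [(detect, "detect_"), (recog, "recognize_")])
print([layer.name for layer in layers])
# ['detect_input', 'detect_conv1', 'recognize_input', 'recognize_fc']
print(n_layers, n_blobs, len(weights))
# 4 4 6
```

Note that the read cursor restarts at zero for each input model, while the write offset keeps growing across models; mixing the two up is the easiest way to corrupt the merged weights.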
Using the merged model at inference time:
    net = load_model("merged.param", "merged.bin")

    // Run detection subgraph
    extractor = net.create_extractor()
    extractor.input("detect_input", image_data)
    extractor.extract("detect_output", detection_result)

    // Run recognition subgraph
    extractor2 = net.create_extractor()
    extractor2.input("recognize_input", aligned_face)
    extractor2.extract("recognize_output", embedding)
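One way to see why the two extractions stay independent is to walk the merged graph backward from a requested output blob. The sketch below assumes a hypothetical merged graph stored as a list of dictionaries; `layers_needed` is an illustrative helper, not a framework call, and a real runtime may schedule layers differently.

```python
def layers_needed(layers, output_blob):
    """Walk backward from output_blob and collect the layers that feed it."""
    producer = {blob: layer for layer in layers for blob in layer["outputs"]}
    needed, seen, stack = [], set(), [output_blob]
    while stack:
        layer = producer.get(stack.pop())
        if layer is None or layer["name"] in seen:
            continue
        seen.add(layer["name"])
        needed.append(layer["name"])
        stack.extend(layer["inputs"])
    return sorted(needed)


# Hypothetical merged graph holding two prefixed models.
merged = [
    {"name": "detect_input", "inputs": [], "outputs": ["detect_input"]},
    {"name": "detect_conv1", "inputs": ["detect_input"], "outputs": ["detect_output"]},
    {"name": "recognize_input", "inputs": [], "outputs": ["recognize_input"]},
    {"name": "recognize_fc", "inputs": ["recognize_input"], "outputs": ["recognize_output"]},
]

print(layers_needed(merged, "detect_output"))
# ['detect_conv1', 'detect_input']
```

Since no blob name is shared across prefixes, the backward walk from a `detect_` output never reaches a `recognize_` layer, and vice versa.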