Implementation:Pytorch Serve SAM Fast Handler
Overview
SegmentAnythingFastHandler is a TorchServe handler for serving the Segment Anything Fast model for automatic mask generation. It extends BaseHandler and provides image-to-mask inference with COCO RLE output encoding. The handler accepts images as base64-encoded strings or raw bytes, converts them to BGR arrays via OpenCV, generates segmentation masks, and returns the results pickle-serialized and base64-encoded.
| Field | Value |
|---|---|
| Implementation Name | SAM_Fast_Handler |
| Type | Example Handler |
| Workflow | Instance_Segmentation_Serving |
| Domains | Computer_Vision, Instance_Segmentation |
| Knowledge Sources | Pytorch_Serve |
| Last Updated | 2026-02-13 18:52 GMT |
Description
The SegmentAnythingFastHandler class implements the full inference lifecycle for the Segment Anything Fast model. During initialization, it selects a CUDA or CPU device, loads the SAM model from a checkpoint through sam_model_fast_registry, and configures SamAutomaticMaskGenerator with COCO RLE output mode. The preprocess, inference, and postprocess methods are decorated with @timed for performance instrumentation; initialize is not.
Key Responsibilities
- Model Loading: Loads SAM checkpoint via sam_model_fast_registry[model_type](checkpoint=sam_checkpoint) and moves to device
- Mask Generator Setup: Creates SamAutomaticMaskGenerator with configurable process_batch_size and output_mode="coco_rle"
- Image Preprocessing: Accepts base64 strings, raw bytes, or list inputs; converts to BGR numpy arrays via OpenCV
- Mask Generation: Currently supports batch size of 1 per SAM limitation; calls mask_generator.generate()
- Serialization: Pickle-serializes mask data and base64-encodes for transport
Usage
from custom_handler import SegmentAnythingFastHandler
The handler is configured through a model YAML config:
# model-config.yaml for SAM Fast
handler:
    model_type: "vit_h"
    sam_checkpoint: "sam_vit_h_4b8939.pth"
    process_batch_size: 4
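The three keys under handler are the ones initialize() reads from ctx.model_yaml_config. The following is a minimal sketch of a pre-flight check for that section. The helper name validate_handler_config and the rule that process_batch_size must be a positive integer are assumptions made for illustration; neither comes from custom_handler.py.

```python
# Hypothetical helper, not part of custom_handler.py.
REQUIRED_HANDLER_KEYS = ("model_type", "sam_checkpoint", "process_batch_size")


def validate_handler_config(model_yaml_config):
    """Return the 'handler' section after checking the keys initialize() reads."""
    handler_cfg = model_yaml_config.get("handler")
    if not isinstance(handler_cfg, dict):
        raise ValueError("model config needs a 'handler' mapping")
    missing = [key for key in REQUIRED_HANDLER_KEYS if key not in handler_cfg]
    if missing:
        raise ValueError("missing handler keys: " + ", ".join(missing))
    batch = handler_cfg["process_batch_size"]
    # Assumption: a usable batch size is a positive integer (bool is excluded).
    if isinstance(batch, bool) or not isinstance(batch, int) or batch < 1:
        raise ValueError("process_batch_size must be a positive integer")
    return handler_cfg


# This dict mirrors what the YAML above parses to.
config = {
    "handler": {
        "model_type": "vit_h",
        "sam_checkpoint": "sam_vit_h_4b8939.pth",
        "process_batch_size": 4,
    }
}
handler_cfg = validate_handler_config(config)
```

Running a check like this before packaging the model archive would surface a missing key as a readable error instead of a KeyError inside initialize().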
Code Reference
Source Location
| File | Lines | Description |
|---|---|---|
| examples/large_models/segment_anything_fast/custom_handler.py | L1-91 | Full handler module (90 lines) |
| examples/large_models/segment_anything_fast/custom_handler.py | L19-91 | SegmentAnythingFastHandler class definition |
| examples/large_models/segment_anything_fast/custom_handler.py | L25-52 | initialize(ctx) -- model loading and mask generator setup |
| examples/large_models/segment_anything_fast/custom_handler.py | L54-73 | preprocess(data) -- base64/bytes to BGR OpenCV array |
| examples/large_models/segment_anything_fast/custom_handler.py | L75-80 | inference(data) -- mask generation (batch size 1) |
| examples/large_models/segment_anything_fast/custom_handler.py | L82-90 | postprocess(data) -- pickle + base64 encoding |
Signature
class SegmentAnythingFastHandler(BaseHandler):
    def __init__(self):
        super().__init__()
        self.mask_generator = None
        self.initialized = False

    def initialize(self, ctx):
        """
        Load SAM model and create mask generator.

        Reads model_type, sam_checkpoint, process_batch_size from
        ctx.model_yaml_config["handler"]. Loads model via
        sam_model_fast_registry and creates SamAutomaticMaskGenerator.

        Args:
            ctx: TorchServe context with system_properties and model_yaml_config.
        """
        ...

    @timed
    def preprocess(self, data):
        """
        Convert input images to BGR numpy arrays.

        Accepts base64 strings (decoded), raw bytes (opened via PIL),
        or list data (converted to FloatTensor). All converted to
        BGR via cv2.cvtColor.

        Args:
            data (list): List of dicts with "data" or "body" keys.

        Returns:
            list: List of BGR numpy arrays.
        """
        ...

    @timed
    def inference(self, data):
        """
        Generate segmentation masks.

        Currently enforces batch size of 1 via assertion.
        Calls self.mask_generator.generate() on the single image.

        Args:
            data (list): Single-element list of BGR numpy arrays.

        Returns:
            list: COCO RLE encoded mask annotations.
        """
        ...

    @timed
    def postprocess(self, data):
        """
        Serialize mask data for transport.

        Pickle-serializes the mask annotations and base64-encodes
        the result for JSON-safe transport.

        Args:
            data: Mask annotation data from inference.

        Returns:
            list: Single-element list with base64-encoded pickled string.
        """
        ...
Import
# Handler imports
import base64
import io
import os
import pickle
import cv2
import numpy as np
import torch
from PIL import Image
from segment_anything_fast import SamAutomaticMaskGenerator, sam_model_fast_registry
from ts.handler_utils.timer import timed
from ts.torch_handler.base_handler import BaseHandler
I/O Contract
| Method | Input | Output | Notes |
|---|---|---|---|
| initialize(ctx) | Context with handler.model_type, handler.sam_checkpoint, handler.process_batch_size | None (sets self.model, self.mask_generator, self.initialized = True) | Auto-selects CUDA or CPU |
| preprocess(data) | list[dict] with "data"/"body" containing base64 string, bytes, or list | list[numpy.ndarray] -- BGR images | Converts RGB to BGR via cv2.cvtColor |
| inference(data) | list[numpy.ndarray] (must be length 1) | COCO RLE mask annotations | Asserts len(data) == 1 |
| postprocess(data) | Mask annotations from inference | list[str] -- single base64-encoded pickled string | Uses pickle.dumps then base64.b64encode |
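The inference row of the contract can be sketched as follows. The body of inference() is not reproduced on this page, so this is a reconstruction from the docstring (an assertion on batch size, then a single generate() call), not the handler's actual code. StubMaskGenerator and run_inference are hypothetical names, and the stub's return value is made up so the sketch runs without a SAM checkpoint.

```python
class StubMaskGenerator:
    """Hypothetical stand-in for SamAutomaticMaskGenerator (illustration only)."""

    def generate(self, image):
        # Placeholder annotation; real output comes from the SAM model.
        return [{"segmentation": {"size": [2, 2], "counts": "stub"}, "area": 4}]


def run_inference(mask_generator, data):
    """Enforce the batch-size-1 contract, then generate masks for that image."""
    assert len(data) == 1, "this handler supports a batch size of 1"
    return mask_generator.generate(data[0])


# A nested list stands in for a 2x2 BGR image array.
fake_image = [[[0, 0, 0], [255, 255, 255]], [[0, 0, 0], [255, 255, 255]]]
masks = run_inference(StubMaskGenerator(), [fake_image])
```

Because the contract is enforced with an assertion, a request batch larger than one fails inside the handler; keeping the model's batch size at 1 in the TorchServe configuration avoids hitting that path.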
Usage Examples
Example 1: Model Initialization
# From custom_handler.py L25-52: initialize() loads SAM model
def initialize(self, ctx):
    properties = ctx.system_properties
    model_dir = properties.get("model_dir")
    self.device = "cpu"
    if torch.cuda.is_available() and properties.get("gpu_id") is not None:
        self.map_location = "cuda"
        self.device = torch.device(
            self.map_location + ":" + str(properties.get("gpu_id"))
        )
        torch.cuda.set_device(self.device)

    model_type = ctx.model_yaml_config["handler"]["model_type"]
    sam_checkpoint = os.path.join(
        model_dir, ctx.model_yaml_config["handler"]["sam_checkpoint"]
    )
    process_batch_size = ctx.model_yaml_config["handler"]["process_batch_size"]

    self.model = sam_model_fast_registry[model_type](checkpoint=sam_checkpoint)
    self.model.to(self.device)
    self.mask_generator = SamAutomaticMaskGenerator(
        self.model, process_batch_size=process_batch_size, output_mode="coco_rle"
    )
    self.initialized = True
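The device branch in initialize() can be isolated as a pure function to make its behavior easier to see. This is a sketch only: select_device is a hypothetical helper, and it returns a device string rather than a torch.device so it runs without PyTorch installed.

```python
def select_device(cuda_available, gpu_id):
    """Mirror initialize(): use CUDA only if it is available and a gpu_id was assigned."""
    if cuda_available and gpu_id is not None:
        return "cuda:" + str(gpu_id)
    return "cpu"


print(select_device(True, 0))     # cuda:0
print(select_device(True, None))  # cpu
print(select_device(False, 0))    # cpu
```

The explicit `is not None` test matters because gpu_id 0 is falsy in Python; a plain truthiness check would send the first GPU's worker to the CPU.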
Example 2: Image Preprocessing Pipeline
# From custom_handler.py L54-73: preprocess() handles multiple input formats
@timed
def preprocess(self, data):
    images = []
    for row in data:
        image = row.get("data") or row.get("body")
        if isinstance(image, str):
            image = base64.b64decode(image)
        if isinstance(image, (bytearray, bytes)):
            image = Image.open(io.BytesIO(image))
        else:
            image = torch.FloatTensor(image)
        image = cv2.cvtColor(np.array(image), cv2.COLOR_RGB2BGR)
        images.append(image)
    return images
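The input dispatch in preprocess() (string, then bytes, else list) can be exercised without PIL, OpenCV, or PyTorch. The sketch below keeps only the branching; normalize_payload is a hypothetical helper, and the returned tag ("bytes" or "list") is an illustration device rather than something the handler produces.

```python
import base64


def normalize_payload(row):
    """Follow preprocess()'s branch order and report which path the payload takes."""
    payload = row.get("data") or row.get("body")
    if isinstance(payload, str):
        # A string is treated as base64 text and decoded to bytes first.
        payload = base64.b64decode(payload)
    if isinstance(payload, (bytearray, bytes)):
        # The handler would open these bytes with PIL at this point.
        return "bytes", bytes(payload)
    # The handler would wrap anything else in torch.FloatTensor.
    return "list", payload


raw = b"\x89PNG placeholder bytes"  # not a real image, just sample bytes
as_b64 = base64.b64encode(raw).decode("utf-8")

kind_from_str, bytes_from_str = normalize_payload({"data": as_b64})
kind_from_body, bytes_from_body = normalize_payload({"body": raw})
kind_from_list, list_payload = normalize_payload({"data": [[1, 2, 3]]})
```

A base64 string and the equivalent raw bytes end up on the same path, which is why a client can send either form.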
Example 3: Postprocess Serialization
# From custom_handler.py L82-90: postprocess() serializes mask data
@timed
def postprocess(self, data):
    serialized_data = pickle.dumps(data)
    base64_encoded_data = base64.b64encode(serialized_data).decode("utf-8")
    return [base64_encoded_data]
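On the client side, the response has to be unwrapped in the reverse order: base64-decode, then unpickle. The sketch below shows that round trip. decode_masks is a hypothetical helper, and sample_masks is an illustrative stand-in for real mask annotations; its keys and values are assumptions, not captured handler output.

```python
import base64
import pickle


def decode_masks(response_text):
    """Reverse postprocess(): base64-decode the text, then unpickle the result."""
    return pickle.loads(base64.b64decode(response_text))


# Illustrative annotation list; real entries come from mask_generator.generate().
sample_masks = [
    {"segmentation": {"size": [480, 640], "counts": "illustrative"}, "area": 1234}
]

# Same two steps postprocess() applies on the server.
encoded = base64.b64encode(pickle.dumps(sample_masks)).decode("utf-8")
decoded = decode_masks(encoded)
assert decoded == sample_masks
```

pickle.loads can execute arbitrary code while deserializing, so a helper like this should only be pointed at responses from a server you control.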
Related Pages
- Principle:Pytorch_Serve_Instance_Segmentation -- principle for serving instance segmentation models with mask generation
- Implementation:Pytorch_Serve_BaseHandler -- parent class providing the handle() orchestration