Implementation:Pytorch Serve Management API

From Leeroopedia

Overview

Management API is the REST-based control plane in TorchServe for dynamic model lifecycle management. It runs on port 8081 by default and provides endpoints for registering, scaling, describing, and unregistering models on a running server. TorchServe also provides Python helper functions (register_model and register_model_with_params) in the ts.launcher module for programmatic access.

Field                Value
Implementation Name  Management API
Type                 External Tool Doc
Workflow             Model_Deployment
Domains              Model_Serving, API_Design
Knowledge Sources    TorchServe
Last Updated         2026-02-13 00:00 GMT

Description

The Management API is served by the Java frontend (Netty server) and provides RESTful endpoints following the gRPC ManagementAPIsService proto definition. It requires the --enable-model-api flag to be set at server startup for model registration to work.

Endpoints

Endpoint                        Method  Description                           Default Port
/models                         POST    Register a new model                  8081
/models                         GET     List all registered models            8081
/models/{model_name}            GET     Describe model status and workers     8081
/models/{model_name}            PUT     Scale workers for a model             8081
/models/{model_name}            DELETE  Unregister a model                    8081
/models/{model_name}/{version}  PUT     Scale workers for a specific version  8081
/models/{model_name}/{version}  DELETE  Unregister a specific version         8081
/models/{model_name}/all        GET     Describe all versions of a model      8081

Python Helper Functions

The ts.launcher module provides two convenience functions that wrap HTTP calls to the Management API:

  • register_model(model_name, url): Registers a model with 1 initial worker synchronously.
  • register_model_with_params(params): Registers a model with arbitrary parameters.
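Under the hood, both helpers issue a plain HTTP POST against the Management API. As a rough sketch (not the actual `ts.launcher` internals, which may differ), the URL that `register_model` effectively issues can be built with the standard library:

```python
from urllib.parse import urlencode

def build_register_url(model_name, url, base="http://localhost:8081"):
    """Build the POST /models URL that register_model roughly issues:
    one initial worker, registered synchronously."""
    params = {
        "model_name": model_name,
        "url": url,
        "initial_workers": 1,
        "synchronous": "true",
    }
    return f"{base}/models?{urlencode(params)}"

# e.g. build_register_url("squeezenet", "squeezenet1_1.mar")
```

Any HTTP client (curl, `requests`) can then POST to the resulting URL.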

Usage

from ts.launcher import register_model, register_model_with_params

Or use curl / requests directly against the REST endpoints.

Code Reference

Source Location

File                    Lines     Description              Repository
docs/management_api.md  L27-188   REST API documentation   pytorch/serve
ts/launcher.py          L107-119  Python helper functions  pytorch/serve

Signature

REST Endpoints

POST /models
  Parameters:
    url (str):              Required. Model archive URL or local .mar filename.
    model_name (str):       Optional. Override model name from manifest.
    handler (str):          Optional. Override handler from manifest.
    runtime (str):          Optional. Runtime type. Default: "PYTHON".
    batch_size (int):       Optional. Inference batch size. Default: 1.
    max_batch_delay (int):  Optional. Max batch wait in ms. Default: 100.
    initial_workers (int):  Optional. Initial worker count. Default: 0.
    synchronous (bool):     Optional. Wait for workers. Default: false.
    response_timeout (int): Optional. Worker response timeout in seconds. Default: 120.
    startup_timeout (int):  Optional. Model load timeout in seconds. Default: 120.

PUT /models/{model_name}
  Parameters:
    min_worker (int):       Optional. Minimum workers. Default: 1.
    max_worker (int):       Optional. Maximum workers. Default: same as min_worker.
    synchronous (bool):     Optional. Wait for scaling. Default: false.
    timeout (int):          Optional. Worker drain timeout in seconds. Default: -1.
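The scale-workers call is the same pattern with a PUT and the model name (and optionally version) in the path. A minimal sketch of the URL construction, assuming only the parameters documented above:

```python
from urllib.parse import urlencode

def build_scale_url(model_name, min_worker, max_worker=None,
                    synchronous=True, version=None,
                    base="http://localhost:8081"):
    """Build the PUT /models/{name}[/{version}] URL for scaling workers."""
    params = {"min_worker": min_worker,
              "synchronous": str(synchronous).lower()}
    if max_worker is not None:
        params["max_worker"] = max_worker
    path = f"/models/{model_name}" + (f"/{version}" if version else "")
    return f"{base}{path}?{urlencode(params)}"

# e.g. build_scale_url("noop", 3)
# e.g. build_scale_url("noop", 3, version="2.0")
```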

DELETE /models/{model_name}

GET /models/{model_name}

GET /models

Python Helpers

def register_model(model_name: str, url: str) -> requests.Response:
    """
    Register a model with 1 initial worker, synchronous.

    Sends POST to http://localhost:8081/models with params:
      model_name, url, initial_workers=1, synchronous=true

    Args:
        model_name (str): Name for the model.
        url (str): URL or local filename of the .mar archive.

    Returns:
        requests.Response: HTTP response from the Management API.
    """
    ...


def register_model_with_params(params) -> requests.Response:
    """
    Register a model with arbitrary parameters.

    Sends POST to http://localhost:8081/models with the given params.

    Args:
        params: dict or list of tuples of query parameters.

    Returns:
        requests.Response: HTTP response from the Management API.
    """
    ...

Import

from ts.launcher import register_model, register_model_with_params

I/O Contract

POST /models (Register)

Input             Type                 Description
Query parameters  See signature above  Model registration parameters

Code  Condition                           Body
200   Synchronous registration success    {"status": "Model \"{name}\" Version: {ver} registered with {n} initial workers"}
202   Asynchronous registration accepted  {"status": "Processing worker updates..."}
400   Invalid parameters                  Error message
409   Model already registered            Error message
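Since a 202 means registration was accepted but workers are still spinning up, an asynchronous caller typically polls the describe endpoint until every worker reports READY. A minimal sketch, where `fetch_describe` is a stand-in for something like `lambda: requests.get("http://localhost:8081/models/<name>").json()`:

```python
import time

def wait_for_workers(fetch_describe, attempts=30, delay=1.0):
    """Poll a describe-model callable until every worker reports READY.

    Returns True once all workers are READY, False if attempts run out.
    """
    for _ in range(attempts):
        info = fetch_describe()            # list with one entry per version
        workers = info[0].get("workers", [])
        if workers and all(w["status"] == "READY" for w in workers):
            return True
        time.sleep(delay)
    return False
```

Injecting the fetch callable keeps the helper testable without a running server.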

PUT /models/{model_name} (Scale)

Code  Condition                      Body
200   Synchronous scaling success    {"status": "Workers scaled to {n} for model: {name}"}
202   Asynchronous scaling accepted  {"status": "Processing worker updates..."}
404   Model not found                Error message

DELETE /models/{model_name} (Unregister)

Code  Condition           Body
200   Model unregistered  {"status": "Model \"{name}\" unregistered"}
404   Model not found     Error message

GET /models/{model_name} (Describe)

Code  Body fields
200   modelName, modelVersion, modelUrl, runtime, minWorkers, maxWorkers, batchSize,
      maxBatchDelay, workers[] (id, startTime, status, gpu, memoryUsage),
      jobQueueStatus (remainingCapacity, pendingRequests)

register_model()

Parameter   Type  Required  Description
model_name  str   Yes       Name for the model
url         str   Yes       URL or local .mar filename

Return    Type               Description
response  requests.Response  HTTP response from the Management API

Usage Examples

Example 1: Register a model from a remote URL

curl -X POST "http://localhost:8081/models?url=https://torchserve.pytorch.org/mar_files/squeezenet1_1.mar"

# Response (202):
{
  "status": "Model \"squeezenet_v1.1\" Version: 1.0 registered with 0 initial workers. Use scale workers API to add workers for the model."
}

Example 2: Register with workers synchronously

curl -v -X POST "http://localhost:8081/models?initial_workers=1&synchronous=true&url=https://torchserve.pytorch.org/mar_files/squeezenet1_1.mar"

# Response (200):
{
  "status": "Model \"squeezenet1_1\" Version: 1.0 registered with 1 initial workers"
}

Example 3: Scale workers

# Scale to 3 workers synchronously
curl -v -X PUT "http://localhost:8081/models/noop?min_worker=3&synchronous=true"

# Response (200):
{
  "status": "Workers scaled to 3 for model: noop"
}

# Scale a specific version
curl -v -X PUT "http://localhost:8081/models/noop/2.0?min_worker=3&synchronous=true"

# Response (200):
{
  "status": "Workers scaled to 3 for model: noop, version: 2.0"
}

Example 4: Describe model status

curl http://localhost:8081/models/noop

# Response (200):
[
  {
    "modelName": "noop",
    "modelVersion": "1.0",
    "modelUrl": "noop.mar",
    "engine": "Torch",
    "runtime": "python",
    "minWorkers": 1,
    "maxWorkers": 1,
    "batchSize": 1,
    "maxBatchDelay": 100,
    "workers": [
      {
        "id": "9000",
        "startTime": "2018-10-02T13:44:53.034Z",
        "status": "READY",
        "gpu": false,
        "memoryUsage": 89247744
      }
    ],
    "jobQueueStatus": {
      "remainingCapacity": 100,
      "pendingRequests": 0
    }
  }
]
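The describe payload is a JSON list with one entry per model version, so fields such as worker status and memory usage are straightforward to extract. A small sketch against a trimmed copy of the response above:

```python
import json

# Trimmed describe-model payload (same shape as the response above).
describe_body = """
[{"modelName": "noop",
  "workers": [{"id": "9000", "status": "READY",
               "gpu": false, "memoryUsage": 89247744}]}]
"""

info = json.loads(describe_body)
ready = all(w["status"] == "READY" for w in info[0]["workers"])
mem_mb = sum(w["memoryUsage"] for w in info[0]["workers"]) / 2**20
print(ready, round(mem_mb, 1))  # prints: True 85.1
```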

Example 5: Python programmatic registration

from ts.launcher import register_model, register_model_with_params

# Simple registration: 1 worker, synchronous
response = register_model("squeezenet", "squeezenet1_1.mar")
print(response.status_code)  # 200
print(response.json())       # {"status": "Model ..."}

# Registration with custom parameters
params = {
    "model_name": "bert",
    "url": "bert.mar",
    "initial_workers": "4",
    "batch_size": "16",
    "max_batch_delay": "200",
    "synchronous": "true",
    "response_timeout": "300",
}
response = register_model_with_params(params)
print(response.status_code)  # 200

Example 6: Full lifecycle workflow

import requests

BASE = "http://localhost:8081"

# 1. Register model
requests.post(f"{BASE}/models", params={
    "url": "resnet18.mar",
    "initial_workers": "2",
    "synchronous": "true",
})

# 2. Check status
status = requests.get(f"{BASE}/models/resnet18").json()
print(f"Workers: {len(status[0]['workers'])}")

# 3. Scale up for peak traffic
requests.put(f"{BASE}/models/resnet18", params={
    "min_worker": "8",
    "synchronous": "true",
})

# 4. Scale down after peak
requests.put(f"{BASE}/models/resnet18", params={
    "min_worker": "2",
    "synchronous": "true",
})

# 5. Unregister when no longer needed
requests.delete(f"{BASE}/models/resnet18")
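The lifecycle calls above ignore failures; in practice each response code should be checked (e.g. 404 for a missing model, 409 for a duplicate registration). A small hedged helper mapping the codes from the I/O contract above to outcomes:

```python
def mgmt_outcome(status_code):
    """Map Management API response codes to outcomes (per the I/O contract)."""
    outcomes = {
        200: "ok",
        202: "accepted (async operation still in progress)",
        400: "invalid parameters",
        404: "model not found",
        409: "model already registered",
    }
    return outcomes.get(status_code, f"unexpected status {status_code}")

# e.g. mgmt_outcome(response.status_code) after each requests call
```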
