Implementation: SqueezeAILab ETS SGLang Launch Reward Server
| Knowledge Sources | |
|---|---|
| Domains | Inference, Model_Serving, Reward_Modeling |
| Last Updated | 2026-02-14 02:00 GMT |
Overview
Concrete tool for launching an SGLang HTTP server hosting the Process Reward Model (PRM) on a dedicated GPU.
Description
This shell script launches the SGLang inference server for the reward model. It differs from the policy server by using --mem-fraction-static 0.85, which reserves 15% of GPU memory for a collocated SentenceTransformer embedding model used for trajectory diversity scoring. The reward server is typically deployed on GPU 1 (separate from the policy model on GPU 0).
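To make the split concrete, here is a back-of-the-envelope sketch of the memory budget. The 0.85 fraction comes from the script; the 80 GB capacity is an illustrative assumption (e.g. an A100-80GB), so substitute your own card's memory.

```python
# Illustrative memory budget for --mem-fraction-static 0.85 on one GPU.
# GPU_MEMORY_GB is an assumption, not a value from the script.
GPU_MEMORY_GB = 80.0
MEM_FRACTION_STATIC = 0.85

# SGLang statically reserves this slice for the PRM weights + KV cache.
prm_budget_gb = GPU_MEMORY_GB * MEM_FRACTION_STATIC
# The remainder is headroom for the collocated SentenceTransformer.
embedding_budget_gb = GPU_MEMORY_GB - prm_budget_gb

print(f"PRM (SGLang static): {prm_budget_gb:.0f} GB")        # 68 GB
print(f"Embedding headroom:  {embedding_budget_gb:.0f} GB")  # 12 GB
```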
Usage
Run this script in a separate terminal or background process before starting the ETS tree search; it must remain running alongside the policy model server for the duration of the search.
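Because the reward server loads its weights asynchronously, it helps to block until the port answers before launching the search. The helper below is an illustrative sketch, not part of the repository, and the polled URL (the server root or a health route) is an assumption to verify against your SGLang version.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url: str, timeout_s: float = 120.0) -> bool:
    """Poll `url` until it responds with any HTTP reply, or give up.

    Illustrative readiness check: the ETS tree search should only start
    once the reward server on GPU 1 has finished loading the PRM.
    """
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5):
                return True  # server is up and answering
        except (urllib.error.URLError, OSError):
            time.sleep(2.0)  # not ready yet; retry until the deadline
    return False
```

For example, `wait_for_server("http://localhost:30020/")` would gate the search on the reward server from this script being reachable.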
Code Reference
Source Location
- Repository: ETS
- File: scripts/run_reward.sh
- Lines: 1-11
Signature
```sh
CUDA_VISIBLE_DEVICES=1 python3 -m sglang.launch_server \
  --model-path $MODEL_REPO \
  --port $PORT \
  --tp-size $tensor_parellel_size \
  --trust-remote-code \
  --mem-fraction-static 0.85
```
Import
```python
# Client-side connection from rebase.py:
from sglang import RuntimeEndpoint

reward_endpoint = RuntimeEndpoint("http://localhost:30020")
```
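For clients that talk to the server over raw HTTP rather than through `RuntimeEndpoint`, a scoring payload might be assembled as below. This is a hypothetical sketch: the field names follow SGLang's `/generate` request schema (`max_new_tokens: 0` with `return_logprob` to score text without generating), but the exact schema should be checked against the deployed SGLang version.

```python
import json


def build_reward_request(trajectory: str) -> bytes:
    """Assemble a hypothetical PRM scoring request body.

    Assumption: asking for zero new tokens plus log-probabilities makes the
    server score the supplied trajectory instead of continuing it. Verify
    field names against the SGLang version actually serving the PRM.
    """
    payload = {
        "text": trajectory,
        "sampling_params": {"max_new_tokens": 0},
        "return_logprob": True,
    }
    return json.dumps(payload).encode("utf-8")
```

The resulting bytes would be POSTed to the reward server's generate endpoint on port 30020.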
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| MODEL_REPO | str | Yes | Path to PRM weights (HuggingFace model ID or local directory) |
| PORT | int | Yes | HTTP port to serve on (default: 30020) |
| tensor_parellel_size | int | Yes | Number of GPUs for tensor parallelism (default: 1) |
| CUDA_VISIBLE_DEVICES | str | Yes | GPU device ID (default: "1") |
| --mem-fraction-static | float | Yes | Fraction of GPU memory reserved for the PRM (hardcoded to 0.85 in this script; the rest is left for the collocated embedding model) |
Outputs
| Name | Type | Description |
|---|---|---|
| HTTP Server | Service | Running SGLang server on specified port accepting scoring requests |
| Score backend | API | PRM scoring accessible via SGLang's set_score_backend mechanism |
Usage Examples
Default Configuration
```sh
# Launch reward model on GPU 1, port 30020
MODEL_REPO="path/to/prm-model"
PORT=30020
tensor_parellel_size=1

# Reserve 15% of GPU memory for the collocated embedding model
CUDA_VISIBLE_DEVICES=1 python3 -m sglang.launch_server \
  --model-path $MODEL_REPO \
  --port $PORT \
  --tp-size $tensor_parellel_size \
  --trust-remote-code \
  --mem-fraction-static 0.85
```