
Implementation:SqueezeAILab ETS Sglang Launch Reward Server

From Leeroopedia
Knowledge Sources
Domains Inference, Model_Serving, Reward_Modeling
Last Updated 2026-02-14 02:00 GMT

Overview

Concrete tool for launching an SGLang HTTP server hosting the Process Reward Model (PRM) on a dedicated GPU.

Description

This shell script launches the SGLang inference server for the reward model. It differs from the policy server launch by using --mem-fraction-static 0.85, which caps SGLang's static GPU memory allocation at 85% and leaves the remaining 15% free for a collocated SentenceTransformer embedding model used for trajectory diversity scoring. The reward server is typically deployed on GPU 1, separate from the policy model on GPU 0.
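The collocated SentenceTransformer produces embeddings for trajectory diversity scoring, but the page does not show the metric itself. A minimal sketch of one plausible choice, mean pairwise cosine distance over precomputed trajectory embeddings, is below; the metric actually used by ETS is an assumption here:

```python
import numpy as np

def diversity_score(embeddings: np.ndarray) -> float:
    """Mean pairwise cosine distance between trajectory embeddings.

    Returns 0.0 when all trajectories embed identically and grows as
    they diverge. `embeddings` is an (n_trajectories, dim) array.
    """
    # Normalize rows so the dot product below is cosine similarity.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    cosine = normed @ normed.T
    # Average over the strict upper triangle (each unordered pair once).
    i, j = np.triu_indices(len(embeddings), k=1)
    return float(np.mean(1.0 - cosine[i, j]))
```

Scoring diversity on embeddings rather than raw text is what makes the 15% memory reservation worthwhile: the embedding model stays resident next to the PRM instead of being reloaded per batch.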

Usage

Run this script in a separate terminal or as a background process before starting the ETS tree search; it must remain running alongside the policy model server for the duration of the search.
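Because the search fails if it starts before the reward server finishes loading, it can help to block until the server answers HTTP requests. A hedged readiness check is sketched below; it assumes SGLang's lightweight /get_model_info status route, but any endpoint that returns HTTP 200 once the server is up would do:

```python
import time
import urllib.error
import urllib.request

def wait_for_server(base_url: str, timeout: float = 120.0) -> bool:
    """Poll an HTTP server until it responds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Assumed status route; swap in any path the server serves.
            with urllib.request.urlopen(base_url + "/get_model_info", timeout=5):
                return True
        except (urllib.error.URLError, OSError):
            time.sleep(2)  # server still loading weights; retry
    return False
```

Typical use: launch run_reward.sh in the background, then call wait_for_server("http://localhost:30020") before kicking off the tree search.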

Code Reference

Source Location

  • Repository: ETS
  • File: scripts/run_reward.sh
  • Lines: 1-11

Signature

CUDA_VISIBLE_DEVICES=1 python3 -m sglang.launch_server \
    --model-path $MODEL_REPO \
    --port $PORT \
    --tp-size $tensor_parellel_size \
    --trust-remote-code \
    --mem-fraction-static 0.85

Import

# Client-side connection from rebase.py:
from sglang import RuntimeEndpoint
reward_endpoint = RuntimeEndpoint("http://localhost:30020")

I/O Contract

Inputs

Name                   Type   Required  Description
MODEL_REPO             str    Yes       Path to PRM weights (HuggingFace model ID or local directory)
PORT                   int    Yes       HTTP port to serve on (default: 30020)
tensor_parellel_size   int    Yes       Number of GPUs for tensor parallelism (default: 1)
CUDA_VISIBLE_DEVICES   str    Yes       GPU device ID (default: "1")
--mem-fraction-static  float  Yes       Fraction of GPU memory for model (default: 0.85)
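The --mem-fraction-static row is what carves out headroom for the embedding model, and the resulting split on a given card is simple arithmetic. The 80 GB figure below is an illustrative A100; the source does not specify the GPU:

```python
def gpu_memory_split(total_gb: float, mem_fraction_static: float = 0.85):
    """Return (GB statically allocated to the PRM, GB left free)."""
    prm_gb = total_gb * mem_fraction_static
    return prm_gb, total_gb - prm_gb

# e.g. on an 80 GB card: ~68 GB for the PRM, ~12 GB for the embedding model
```

If the embedding model needs more than the leftover fraction on your hardware, lowering --mem-fraction-static is the knob to turn.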

Outputs

Name               Type     Description
HTTP Server        Service  Running SGLang server on the specified port accepting scoring requests
Score backend API  API      PRM scoring accessible via SGLang's set_score_backend mechanism

Usage Examples

Default Configuration

# Launch reward model on GPU 1, port 30020
MODEL_REPO="path/to/prm-model"
PORT=30020
tensor_parellel_size=1

# Reserve 15% GPU memory for collocated embedding model
CUDA_VISIBLE_DEVICES=1 python3 -m sglang.launch_server \
    --model-path $MODEL_REPO \
    --port $PORT \
    --tp-size $tensor_parellel_size \
    --trust-remote-code \
    --mem-fraction-static 0.85

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
