Implementation:Facebookresearch Habitat lab SingleAgentAccessMgr

Knowledge Sources	Facebookresearch_Habitat_lab
Domains	Embodied_AI, Reinforcement_Learning, PPO
Last Updated	2026-02-15 00:00 GMT

Overview

SingleAgentAccessMgr is the default agent access manager for single-agent PPO training, responsible for creating and managing the policy, updater, rollout storage, and learning rate schedule for a single agent.

Description

The SingleAgentAccessMgr class implements the AgentAccessMgr interface and is registered in the baseline registry via @baseline_registry.register_agent_access_mgr. It serves as the central manager that wires together the core RL training components:

Policy creation: Instantiates the actor-critic policy from the registry based on the configuration. Supports loading pretrained weights (full model or encoder-only) and optionally freezing the visual encoder. Can reset the critic head with orthogonal initialization.

Updater creation: Creates the PPO updater (or its distributed variant for DD-PPO) from the registry. The updater wraps the actor-critic and provides the optimization step.

Learning rate scheduling: Supports linear learning rate decay via a LambdaLR scheduler. The schedule function takes the fraction of training completed (0.0 to 1.0, as returned by percent_done_fn) and returns a multiplier applied to the initial learning rate. A custom schedule function can be provided; the default is linear_lr_schedule, which returns 1 - percent_done.

Rollout storage: Created via the post_init method, which accepts an optional factory function. The default behavior uses get_rollout_obs_space to handle frozen visual encoders by adding an entry for pre-computed visual features to the observation space used by the storage.

State management: Provides methods for saving and loading checkpoints (get_resume_state, get_save_state, load_state_dict, load_ckpt_state_dict), switching between train/eval modes, and performing post-update operations (LR scheduling, clip parameter decay).
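The learning rate scheduling and post-update steps listed above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the class's own code: scaled_lr and decayed_clip are hypothetical helpers, the base values are illustrative, and the assumption that the clip parameter decays with the same 1 - percent_done factor is not confirmed by this page. LambdaLR itself sets the learning rate to the initial value times whatever the schedule function returns, which scaled_lr imitates.

```python
from typing import Callable


def linear_lr_schedule(percent_done: float) -> float:
    """Default schedule described above: multiplier falls linearly from 1 to 0."""
    return 1 - percent_done


def scaled_lr(
    base_lr: float,
    percent_done: float,
    schedule_fn: Callable[[float], float] = linear_lr_schedule,
) -> float:
    """Hypothetical helper: imitates LambdaLR (initial LR times the multiplier)."""
    return base_lr * schedule_fn(percent_done)


def decayed_clip(base_clip: float, percent_done: float) -> float:
    """Hypothetical helper: linear decay of the PPO clip parameter (assumed form)."""
    return base_clip * (1 - percent_done)


if __name__ == "__main__":
    # Illustrative base values only; real values come from the configuration.
    for pct in (0.0, 0.5, 1.0):
        print(pct, scaled_lr(2.5e-4, pct), decayed_clip(0.2, pct))
```

Both quantities reach zero when percent_done_fn reports 1.0, so the progress function should be bounded by the total training budget.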

The module also includes the helper function get_rollout_obs_space, which augments the observation space with pre-computed visual features when the visual encoder is frozen (static encoder mode).
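The augmentation performed by get_rollout_obs_space can be illustrated with a simplified stand-in. The sketch below models observation spaces as plain name-to-shape dictionaries rather than gym spaces; augment_obs_space, VISUAL_FEATURES_KEY, and the train_encoder flag are hypothetical names chosen for illustration and are not the library's API.

```python
from typing import Dict, Tuple

# Hypothetical key for the pre-computed features; the real constant lives in
# the Habitat code base and may be named differently.
VISUAL_FEATURES_KEY = "visual_features"

Shape = Tuple[int, ...]


def augment_obs_space(
    obs_space: Dict[str, Shape],
    encoder_output_shape: Shape,
    train_encoder: bool,
) -> Dict[str, Shape]:
    """Return the observation space the rollout storage should allocate.

    Spaces are modelled as {name: shape} dicts instead of gym spaces.
    """
    if train_encoder:
        # Encoder is trainable: no cached features, space is returned unchanged.
        return obs_space
    # Encoder is frozen: add an entry for the cached features and keep the
    # existing entries. A new dict is built, so the input is not mutated.
    return {VISUAL_FEATURES_KEY: encoder_output_shape, **obs_space}
```

For example, calling augment_obs_space({"rgb": (256, 256, 3)}, (2048,), train_encoder=False) yields a space with both the original rgb entry and the added feature entry.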

Usage

Use this class for standard single-agent PPO or DD-PPO training. As the default access manager, it is the one used when the configuration describes a single agent; multi-agent scenarios use a different access manager. The class is typically instantiated by the PPO trainer during setup rather than constructed by hand.

Code Reference

Source Location

Repository: Facebookresearch_Habitat_lab
File: habitat-baselines/habitat_baselines/rl/ppo/single_agent_access_mgr.py
Lines: 1-319

Signature

@baseline_registry.register_agent_access_mgr
class SingleAgentAccessMgr(AgentAccessMgr):
    def __init__(
        self,
        config: "DictConfig",
        env_spec: EnvironmentSpec,
        is_distrib: bool,
        device,
        num_envs: int,
        percent_done_fn: Callable[[], float],
        resume_state: Optional[Dict[str, Any]] = None,
        lr_schedule_fn: Optional[Callable[[float], float]] = None,
        agent_name=None,
    ): ...

def get_rollout_obs_space(obs_space, actor_critic, config): ...

def linear_lr_schedule(percent_done: float) -> float: ...

Import

from habitat_baselines.rl.ppo.single_agent_access_mgr import (
    SingleAgentAccessMgr,
    get_rollout_obs_space,
    linear_lr_schedule,
)

I/O Contract

Inputs

Name	Type	Required	Description
config	DictConfig	Yes	Full Habitat baselines configuration
env_spec	EnvironmentSpec	Yes	Environment specification containing observation space, action space, and original action space
is_distrib	bool	Yes	Whether training is distributed (DD-PPO)
device	torch.device	Yes	Device to place the policy and storage on
num_envs	int	Yes	Number of parallel environments
percent_done_fn	Callable[[], float]	Yes	Function returning the fraction of training completed (0.0 to 1.0)
resume_state	Optional[Dict[str, Any]]	No	State dict for resuming training from a checkpoint
lr_schedule_fn	Optional[Callable[[float], float]]	No	Custom learning rate schedule function; defaults to linear_lr_schedule
agent_name	Optional[str]	No	Name of the agent; inferred from config if single-agent

Outputs

Name	Type	Description
actor_critic	NetPolicy	The initialized actor-critic policy (accessible via property)
updater	Updater	The PPO updater wrapping the policy (accessible via property)
rollouts	Storage	The rollout storage for experience collection (accessible via property, after post_init)

Usage Examples

Basic Usage

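import torch

from habitat_baselines.rl.ppo.single_agent_access_mgr import SingleAgentAccessMgr

# Typically created by the PPO trainer, which supplies config, env_spec,
# and the step counters (current_step, total_steps) used below.
agent_access_mgr = SingleAgentAccessMgr(
    config=config,
    env_spec=env_spec,
    is_distrib=False,
    device=torch.device("cuda:0"),
    num_envs=4,
    # Fraction of training completed, in the range 0.0 to 1.0
    percent_done_fn=lambda: current_step / total_steps,
)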

# Initialize rollout storage
agent_access_mgr.post_init()

# Access components
policy = agent_access_mgr.actor_critic
updater = agent_access_mgr.updater
rollouts = agent_access_mgr.rollouts

# Training loop
agent_access_mgr.train()
# ... collect experience, update ...
agent_access_mgr.after_update()

# Save checkpoint
save_state = agent_access_mgr.get_save_state()
resume_state = agent_access_mgr.get_resume_state()
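
Custom Learning Rate Schedule

Because lr_schedule_fn accepts any function from the fraction of training completed to a multiplier, a non-linear schedule can be supplied in place of linear_lr_schedule. The cosine schedule below is a hypothetical example, not part of Habitat; whether the schedule is actually stepped may also depend on the PPO configuration.

```python
import math


def cosine_lr_schedule(percent_done: float) -> float:
    """Hypothetical schedule: multiplier follows half a cosine from 1 down to 0."""
    # Clamp so a slightly overshooting progress value stays in range.
    percent_done = min(max(percent_done, 0.0), 1.0)
    return 0.5 * (1.0 + math.cos(math.pi * percent_done))


# Would be passed to the constructor as:
#   SingleAgentAccessMgr(..., lr_schedule_fn=cosine_lr_schedule)
```

Compared with the linear default, this multiplier decays slowly at the start and end of training and fastest around the midpoint.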
