Implementation:Facebookresearch Habitat lab SingleAgentAccessMgr

From Leeroopedia
Knowledge Sources
Domains Embodied_AI, Reinforcement_Learning, PPO
Last Updated 2026-02-15 00:00 GMT

Overview

SingleAgentAccessMgr is the default agent access manager for single-agent PPO training, responsible for creating and managing the policy, updater, rollout storage, and learning rate schedule for a single agent.

Description

The SingleAgentAccessMgr class implements the AgentAccessMgr interface and is registered in the baseline registry via @baseline_registry.register_agent_access_mgr. It serves as the central manager that wires together the core RL training components:

  • Policy creation: Instantiates the actor-critic policy from the registry based on the configuration. Supports loading pretrained weights (full model or encoder-only) and optionally freezing the visual encoder. Can reset the critic head with orthogonal initialization.
  • Updater creation: Creates the PPO updater (or its distributed variant for DD-PPO) from the registry. The updater wraps the actor-critic and provides the optimization step.
  • Learning rate scheduling: Supports linear learning rate decay via a LambdaLR scheduler. The schedule function takes the fraction of training completed (0.0 to 1.0) and returns a multiplier that is applied to the base learning rate. A custom schedule function can be provided; the default is linear_lr_schedule, which returns 1 - percent_done.
  • Rollout storage: Created via the post_init method, which accepts an optional factory function. By default, the storage is built from the observation space returned by get_rollout_obs_space, which accounts for frozen visual encoders by adding pre-computed visual features to the observation space.
  • State management: Provides methods for saving and loading checkpoints (get_resume_state, get_save_state, load_state_dict, load_ckpt_state_dict), switching between train/eval modes, and performing post-update operations (LR scheduling, clip parameter decay).
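The learning rate scheduling described above can be sketched without PyTorch. The block below re-implements the stated 1 - percent_done rule locally and applies the result as a multiplier on a base learning rate, which is how a LambdaLR-style scheduler consumes such a function. BASE_LR and effective_lr are illustrative names and values, not part of Habitat.

```python
def linear_lr_schedule(percent_done: float) -> float:
    """Local stand-in for the default schedule: multiplier falls from 1 to 0."""
    return 1.0 - percent_done


BASE_LR = 2.5e-4  # hypothetical base learning rate, chosen only for illustration


def effective_lr(percent_done: float, schedule=linear_lr_schedule) -> float:
    """LambdaLR-style scaling: base learning rate times the schedule multiplier."""
    return BASE_LR * schedule(percent_done)


# Print the learning rate at the start, midpoint, and end of training.
for frac in (0.0, 0.5, 1.0):
    print(f"{frac:.1f} -> {effective_lr(frac):.2e}")
```

Passing a different callable as schedule mirrors what the lr_schedule_fn argument allows: any function from the completed fraction to a multiplier can stand in for the linear default.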

The module also includes the helper function get_rollout_obs_space, which augments the observation space with pre-computed visual features when the visual encoder is frozen (static encoder mode).
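As a rough illustration of that augmentation, the sketch below models an observation space as a plain dict from key to shape instead of a Gym space. The function name, the VISUAL_FEATURES_KEY value, and the choice to keep the raw observation keys alongside the new entry are all assumptions made for this example, not the library's actual code.

```python
from typing import Dict, Tuple

Shape = Tuple[int, ...]

# Hypothetical key for the cached encoder output; the real key is defined by Habitat.
VISUAL_FEATURES_KEY = "visual_features"


def sketch_rollout_obs_space(
    obs_space: Dict[str, Shape],
    encoder_output_shape: Shape,
    encoder_is_frozen: bool,
) -> Dict[str, Shape]:
    """Return the space that rollout storage should allocate buffers for."""
    if not encoder_is_frozen:
        # Trainable encoder: features change every update, so nothing is cached.
        return dict(obs_space)
    # Frozen encoder: add a slot for the pre-computed features.
    return {VISUAL_FEATURES_KEY: encoder_output_shape, **obs_space}


space = {"rgb": (256, 256, 3), "pointgoal": (2,)}
print(sketch_rollout_obs_space(space, (2048,), encoder_is_frozen=True))
```

Caching the encoder output is only safe because a frozen encoder produces the same features for the same observation throughout training.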

Usage

Use this class for standard single-agent PPO or DD-PPO training. It is automatically selected when there is a single agent in the configuration. For multi-agent scenarios, a different access manager is used. The class is typically instantiated by the PPO trainer during setup.
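The registration and lookup pattern behind this can be sketched with a toy registry. MiniRegistry, ToyAccessMgr, and the lookup-by-class-name convention below are hypothetical stand-ins written for this example; they are not Habitat's baseline_registry and only show how a decorator-registered class can later be built from a name.

```python
from typing import Dict, Type


class MiniRegistry:
    """Toy registry for illustration; not Habitat's baseline_registry."""

    def __init__(self) -> None:
        self._agent_access_mgrs: Dict[str, Type] = {}

    def register_agent_access_mgr(self, cls: Type) -> Type:
        # Used as a bare decorator: store the class under its own name.
        self._agent_access_mgrs[cls.__name__] = cls
        return cls

    def get_agent_access_mgr(self, name: str) -> Type:
        return self._agent_access_mgrs[name]


mini_registry = MiniRegistry()


@mini_registry.register_agent_access_mgr
class ToyAccessMgr:
    def __init__(self, num_envs: int) -> None:
        self.num_envs = num_envs


# A trainer can then build the manager from a name held in its configuration.
mgr_cls = mini_registry.get_agent_access_mgr("ToyAccessMgr")
mgr = mgr_cls(num_envs=4)
```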

Code Reference

Source Location

Signature

@baseline_registry.register_agent_access_mgr
class SingleAgentAccessMgr(AgentAccessMgr):
    def __init__(
        self,
        config: "DictConfig",
        env_spec: EnvironmentSpec,
        is_distrib: bool,
        device,
        num_envs: int,
        percent_done_fn: Callable[[], float],
        resume_state: Optional[Dict[str, Any]] = None,
        lr_schedule_fn: Optional[Callable[[float], float]] = None,
        agent_name=None,
    ): ...

def get_rollout_obs_space(obs_space, actor_critic, config): ...

def linear_lr_schedule(percent_done: float) -> float: ...

Import

from habitat_baselines.rl.ppo.single_agent_access_mgr import (
    SingleAgentAccessMgr,
    get_rollout_obs_space,
    linear_lr_schedule,
)

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| config | DictConfig | Yes | Full Habitat baselines configuration |
| env_spec | EnvironmentSpec | Yes | Environment specification containing observation space, action space, and original action space |
| is_distrib | bool | Yes | Whether training is distributed (DD-PPO) |
| device | torch.device | Yes | Device to place the policy and storage on |
| num_envs | int | Yes | Number of parallel environments |
| percent_done_fn | Callable[[], float] | Yes | Function returning the fraction of training completed (0.0 to 1.0) |
| resume_state | Optional[Dict[str, Any]] | No | State dict for resuming training from a checkpoint |
| lr_schedule_fn | Optional[Callable[[float], float]] | No | Custom learning rate schedule function; defaults to linear_lr_schedule |
| agent_name | Optional[str] | No | Name of the agent; inferred from config if single-agent |
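Because percent_done_fn takes no arguments, it has to read live training state each time it is called. One possible way to supply it is a bound method on a small progress object, sketched below. ProgressTracker and its fields are hypothetical names for this example, not part of Habitat.

```python
class ProgressTracker:
    """Hypothetical step counter owned by a training loop."""

    def __init__(self, total_steps: int) -> None:
        if total_steps <= 0:
            raise ValueError("total_steps must be positive")
        self.total_steps = total_steps
        self.current_step = 0

    def percent_done(self) -> float:
        # Clamp so overshooting the step budget never yields a value above 1.0.
        return min(self.current_step / self.total_steps, 1.0)


tracker = ProgressTracker(total_steps=1000)
percent_done_fn = tracker.percent_done  # zero-argument callable, as the contract requires

tracker.current_step = 250
print(percent_done_fn())
```

The bound method always reflects the latest counter value, so a schedule evaluated after each update sees the current progress rather than a value frozen at construction time.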

Outputs

| Name | Type | Description |
|---|---|---|
| actor_critic | NetPolicy | The initialized actor-critic policy (accessible via property) |
| updater | Updater | The PPO updater wrapping the policy (accessible via property) |
| rollouts | Storage | The rollout storage for experience collection (accessible via property, after post_init) |

Usage Examples

Basic Usage

import torch

from habitat_baselines.rl.ppo.single_agent_access_mgr import SingleAgentAccessMgr

# Typically created by the PPO trainer, which supplies config, env_spec,
# and the step counters (current_step, total_steps) read by percent_done_fn:
agent_access_mgr = SingleAgentAccessMgr(
    config=config,
    env_spec=env_spec,
    is_distrib=False,
    device=torch.device("cuda:0"),
    num_envs=4,
    percent_done_fn=lambda: current_step / total_steps,
)

# Initialize rollout storage
agent_access_mgr.post_init()

# Access components
policy = agent_access_mgr.actor_critic
updater = agent_access_mgr.updater
rollouts = agent_access_mgr.rollouts

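# Training loop: switch to train mode before collecting experience
agent_access_mgr.train()
# ... collect experience, update ...
# Post-update operations (LR scheduling, clip parameter decay)
agent_access_mgr.after_update()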

# Save checkpoint
save_state = agent_access_mgr.get_save_state()
resume_state = agent_access_mgr.get_resume_state()
