Implementation:Facebookresearch Habitat lab SingleAgentAccessMgr
| Knowledge Sources | |
|---|---|
| Domains | Embodied_AI, Reinforcement_Learning, PPO |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
SingleAgentAccessMgr is the default agent access manager for single-agent PPO training, responsible for creating and managing the policy, updater, rollout storage, and learning rate schedule for a single agent.
Description
The SingleAgentAccessMgr class implements the AgentAccessMgr interface and is registered in the baseline registry via @baseline_registry.register_agent_access_mgr. It serves as the central manager that wires together the core RL training components:
- Policy creation: Instantiates the actor-critic policy from the registry based on the configuration. Supports loading pretrained weights (full model or encoder-only) and optionally freezing the visual encoder. Can reset the critic head with orthogonal initialization.
- Updater creation: Creates the PPO updater (or its distributed variant for DD-PPO) from the registry. The updater wraps the actor-critic and provides the optimization step.
- Learning rate scheduling: Supports linear learning rate decay via a LambdaLR scheduler. The schedule function takes the fraction of training completed (0.0 to 1.0) and returns a multiplier for the learning rate. A custom schedule function can be provided; the default is linear_lr_schedule, which returns 1 - percent_done.
- Rollout storage: Created via the post_init method, which accepts an optional factory function. The default behavior uses get_rollout_obs_space to handle frozen visual encoders (adding an entry for pre-computed visual features to the observation space).
- State management: Provides methods for saving and loading checkpoints (get_resume_state, get_save_state, load_state_dict, load_ckpt_state_dict), switching between train/eval modes, and performing post-update operations (LR scheduling, clip parameter decay).
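The learning rate bullet above can be illustrated with a minimal sketch, assuming the scheduler scales the base learning rate by the returned multiplier (as PyTorch's LambdaLR does with the optimizer's initial learning rate). The first function mirrors the documented behavior of linear_lr_schedule; `scaled_lr` and the base rate of 2.5e-4 are hypothetical and used only for illustration.

```python
def linear_lr_schedule(percent_done: float) -> float:
    """Linear decay multiplier, as described above: 1 - percent_done."""
    return 1.0 - percent_done


def scaled_lr(base_lr: float, percent_done: float) -> float:
    """Hypothetical helper: the rate a multiplicative scheduler would apply."""
    return base_lr * linear_lr_schedule(percent_done)


if __name__ == "__main__":
    base_lr = 2.5e-4  # illustrative value, not taken from any config
    for frac in (0.0, 0.5, 1.0):
        # The multiplier falls from 1.0 at the start of training to 0.0 at the end.
        print(frac, scaled_lr(base_lr, frac))
```

A custom lr_schedule_fn would take the place of linear_lr_schedule here; any callable mapping the completed fraction to a multiplier fits the same shape.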
The module also includes the helper function get_rollout_obs_space, which augments the observation space with pre-computed visual features when the visual encoder is frozen (static encoder mode).
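A simplified sketch of that idea follows, using plain dictionaries of shapes in place of Gym spaces. The key name `visual_features`, the `train_encoder` flag, and the function name `augment_obs_space` are assumptions made for illustration, not the library's API.

```python
from typing import Dict, Tuple

Shape = Tuple[int, ...]

# Hypothetical key under which cached encoder outputs would be stored.
VISUAL_FEATURES_KEY = "visual_features"


def augment_obs_space(
    obs_space: Dict[str, Shape],
    encoder_output_shape: Shape,
    train_encoder: bool,
) -> Dict[str, Shape]:
    """Return the space the rollout storage should allocate buffers for."""
    if train_encoder:
        # The encoder is still being trained, so no feature entry is added.
        return obs_space
    # Frozen encoder: add an entry for the pre-computed features and keep
    # the original entries alongside it. The input dict is not modified.
    return {VISUAL_FEATURES_KEY: encoder_output_shape, **obs_space}


if __name__ == "__main__":
    space = {"rgb": (256, 256, 3)}
    print(augment_obs_space(space, (32, 4, 4), train_encoder=False))
```

Returning a new dictionary rather than mutating the input keeps the environment's own observation space untouched, so only the storage sees the extra entry.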
Usage
Use this class for standard single-agent PPO or DD-PPO training. It is automatically selected when there is a single agent in the configuration. For multi-agent scenarios, a different access manager is used. The class is typically instantiated by the PPO trainer during setup.
Code Reference
Source Location
- Repository: Facebookresearch_Habitat_lab
- File: habitat-baselines/habitat_baselines/rl/ppo/single_agent_access_mgr.py
- Lines: 1-319
Signature
@baseline_registry.register_agent_access_mgr
class SingleAgentAccessMgr(AgentAccessMgr):
    def __init__(
        self,
        config: "DictConfig",
        env_spec: EnvironmentSpec,
        is_distrib: bool,
        device,
        num_envs: int,
        percent_done_fn: Callable[[], float],
        resume_state: Optional[Dict[str, Any]] = None,
        lr_schedule_fn: Optional[Callable[[float], float]] = None,
        agent_name=None,
    ): ...

def get_rollout_obs_space(obs_space, actor_critic, config): ...

def linear_lr_schedule(percent_done: float) -> float: ...
Import
from habitat_baselines.rl.ppo.single_agent_access_mgr import (
    SingleAgentAccessMgr,
    get_rollout_obs_space,
    linear_lr_schedule,
)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | DictConfig | Yes | Full Habitat baselines configuration |
| env_spec | EnvironmentSpec | Yes | Environment specification containing observation space, action space, and original action space |
| is_distrib | bool | Yes | Whether training is distributed (DD-PPO) |
| device | torch.device | Yes | Device to place the policy and storage on |
| num_envs | int | Yes | Number of parallel environments |
| percent_done_fn | Callable[[], float] | Yes | Function returning the fraction of training completed (0.0 to 1.0) |
| resume_state | Optional[Dict[str, Any]] | No | State dict for resuming training from a checkpoint |
| lr_schedule_fn | Optional[Callable[[float], float]] | No | Custom learning rate schedule function; defaults to linear_lr_schedule |
| agent_name | Optional[str] | No | Name of the agent; inferred from config if single-agent |
Outputs
| Name | Type | Description |
|---|---|---|
| actor_critic | NetPolicy | The initialized actor-critic policy (accessible via property) |
| updater | Updater | The PPO updater wrapping the policy (accessible via property) |
| rollouts | Storage | The rollout storage for experience collection (accessible via property, after post_init) |
Usage Examples
Basic Usage
import torch

from habitat_baselines.rl.ppo.single_agent_access_mgr import SingleAgentAccessMgr

# Typically created by the PPO trainer. `config`, `env_spec`, `current_step`,
# and `total_steps` are assumed to be defined by the surrounding training code.
agent_access_mgr = SingleAgentAccessMgr(
    config=config,
    env_spec=env_spec,
    is_distrib=False,
    device=torch.device("cuda:0"),
    num_envs=4,
    percent_done_fn=lambda: current_step / total_steps,
)

# Initialize rollout storage
agent_access_mgr.post_init()

# Access components
policy = agent_access_mgr.actor_critic
updater = agent_access_mgr.updater
rollouts = agent_access_mgr.rollouts

# Training loop: switch to train mode, then collect experience and update
agent_access_mgr.train()
# ... collect experience, update ...
# Post-update operations (LR scheduling, clip parameter decay)
agent_access_mgr.after_update()

# Save checkpoint
save_state = agent_access_mgr.get_save_state()
resume_state = agent_access_mgr.get_resume_state()
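The example above assumes that current_step and total_steps are tracked by the caller. One way to supply percent_done_fn is a small progress tracker whose bound method is passed as the callable. `ProgressTracker` is a hypothetical helper sketched for illustration; it is not part of Habitat-Lab.

```python
class ProgressTracker:
    """Hypothetical helper tracking training progress as a fraction in [0, 1]."""

    def __init__(self, total_steps: int) -> None:
        if total_steps <= 0:
            raise ValueError("total_steps must be positive")
        self.total_steps = total_steps
        self.current_step = 0

    def advance(self, n: int = 1) -> None:
        """Record n more completed steps."""
        self.current_step += n

    def percent_done(self) -> float:
        """Fraction of training completed, clamped to 1.0."""
        return min(self.current_step / self.total_steps, 1.0)


if __name__ == "__main__":
    tracker = ProgressTracker(total_steps=1000)
    tracker.advance(250)
    # tracker.percent_done could be passed as percent_done_fn.
    print(tracker.percent_done())  # 0.25
```

Clamping keeps the linear schedule's multiplier from going negative if training runs slightly past the planned number of steps.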