
Implementation:Facebookresearch Habitat lab SimpleCNN

From Leeroopedia
Knowledge Sources
Domains Embodied_AI, Visual_Encoding
Last Updated 2026-02-15 00:00 GMT

Overview

SimpleCNN is a lightweight 3-layer convolutional neural network that takes RGB and/or depth observations and produces a fixed-size embedding vector for use in policy networks.

Description

SimpleCNN extends nn.Module and inspects the observation space to determine whether RGB and/or depth inputs are available. The number of input channels is the sum of the RGB and depth channel counts, read from the last axis of each (H, W, C) entry. It builds a sequential CNN of three convolutional layers (kernel sizes 8x8, 4x4 and 3x3 with strides 4, 2 and 1, no padding), followed by a flatten operation and a fully connected layer that maps to the requested output size. Channel counts progress as input_channels -> 32 -> 64 -> 32. The flattened 32 x H' x W' feature map, where H' x W' is the spatial size remaining after the three convolutions, is then projected to output_size. ReLU activations follow the first two convolutions and the final linear layer; the third convolution has no activation. All convolutional and linear weights are initialized with Kaiming normal initialization tuned for ReLU.

If neither RGB nor depth is present in the observation space (see the is_blind property), an empty nn.Sequential is used instead.

During the forward pass, RGB observations are cast to float and scaled to [0, 1], and both modalities are permuted from (B, H, W, C) to NCHW format before being concatenated along the channel dimension.
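The forward-pass preprocessing described above can be sketched as a standalone function. This is a minimal, non-authoritative sketch: prepare_cnn_input is a hypothetical helper, not part of habitat_baselines, and it decides which modalities to use from the keys present in the observations dict, whereas SimpleCNN itself makes that decision from the observation space at construction time.

```python
from typing import Dict

import torch


def prepare_cnn_input(observations: Dict[str, torch.Tensor]) -> torch.Tensor:
    """Hypothetical helper mirroring SimpleCNN's input preprocessing.

    Converts channels-last (B, H, W, C) observations into a single
    channels-first (B, C, H, W) float tensor.
    """
    cnn_input = []
    if "rgb" in observations:
        # (B, H, W, 3) -> (B, 3, H, W), then cast to float and scale to [0, 1]
        rgb = observations["rgb"].permute(0, 3, 1, 2).float() / 255.0
        cnn_input.append(rgb)
    if "depth" in observations:
        # (B, H, W, 1) -> (B, 1, H, W); depth values are passed through unchanged
        depth = observations["depth"].permute(0, 3, 1, 2)
        cnn_input.append(depth)
    # Concatenate the modalities along the channel dimension (dim=1)
    return torch.cat(cnn_input, dim=1)


obs = {
    "rgb": torch.randint(0, 256, (2, 64, 64, 3), dtype=torch.uint8),
    "depth": torch.rand(2, 64, 64, 1),
}
x = prepare_cnn_input(obs)
print(x.shape)  # torch.Size([2, 4, 64, 64])
```

With both modalities present the result has 3 + 1 = 4 channels, which is the input_channels value the first convolution expects.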

Usage

Use SimpleCNN as a visual encoder in RL policies that require a compact feature representation from RGB and/or depth observations. It is designed for straightforward visual processing tasks where a lightweight architecture is sufficient.

Code Reference

Source Location

Signature

from typing import Dict
import torch
from torch import nn

class SimpleCNN(nn.Module):
    def __init__(
        self,
        observation_space,
        output_size,
    ): ...
    def forward(self, observations: Dict[str, torch.Tensor]): ...

Import

from habitat_baselines.rl.models.simple_cnn import SimpleCNN

I/O Contract

Inputs

Name | Type | Required | Description
observation_space | gym.spaces.Dict | Yes | Observation space containing optional "rgb" and/or "depth" entries, each with shape (H, W, C)
output_size | int | Yes | Dimensionality of the output embedding vector
observations | Dict[str, torch.Tensor] | Yes | Dictionary of observation tensors passed to forward(); expects "rgb" as (B, H, W, 3) uint8 and/or "depth" as (B, H, W, 1) float

Outputs

Name | Type | Description
embedding | torch.Tensor | Embedding vector of shape (batch_size, output_size); values are non-negative because the final layer is a ReLU

Key Properties

is_blind

@property
def is_blind(self) -> bool

Returns True if neither RGB nor depth channels are present in the observation space.
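The blindness check reduces to counting visual input channels. The sketch below is a simplified stand-in: it uses a plain dict mapping keys to shape tuples instead of a gym.spaces.Dict, and both count_input_channels and is_blind here are hypothetical helpers. It assumes, as the Inputs table states, that each entry has shape (H, W, C) with the channel count on the last axis.

```python
from typing import Dict, Tuple


def count_input_channels(space_shapes: Dict[str, Tuple[int, ...]]) -> Tuple[int, int]:
    """Hypothetical helper: return (n_rgb, n_depth) channel counts.

    Each visual entry is assumed to have shape (H, W, C); a missing
    modality contributes 0 channels.
    """
    n_rgb = space_shapes["rgb"][2] if "rgb" in space_shapes else 0
    n_depth = space_shapes["depth"][2] if "depth" in space_shapes else 0
    return n_rgb, n_depth


def is_blind(space_shapes: Dict[str, Tuple[int, ...]]) -> bool:
    # Blind means the encoder would receive no visual channels at all
    n_rgb, n_depth = count_input_channels(space_shapes)
    return n_rgb + n_depth == 0


print(is_blind({"rgb": (256, 256, 3), "depth": (256, 256, 1)}))  # False
print(is_blind({"gps": (2,)}))  # True
```

Non-visual entries such as the hypothetical "gps" key above are ignored, so an agent with only non-visual sensors is treated as blind.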

Architecture

Layer | Type | Kernel | Stride | Output Channels
Conv1 | Conv2d | 8x8 | 4 | 32
ReLU1 | ReLU | - | - | -
Conv2 | Conv2d | 4x4 | 2 | 64
ReLU2 | ReLU | - | - | -
Conv3 | Conv2d | 3x3 | 1 | 32
Flatten | Flatten | - | - | -
FC | Linear | - | - | output_size
ReLU3 | ReLU | - | - | -
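To see how the table translates into layers, and how the size of the FC input follows from the observation resolution, here is a plain-PyTorch re-creation of the layer stack. It is a sketch rather than the library's code: build_simple_cnn, final_spatial_dims and conv_output_dim are hypothetical names, it uses torch.nn.Flatten, and the initialization call (kaiming_normal_ with nonlinearity="relu", biases zeroed) is an assumption about how "Kaiming normal tuned for ReLU" might be expressed; the real source may pass different arguments.

```python
import math
from typing import Tuple

import torch
from torch import nn

# Kernel sizes and strides from the Architecture table (no padding, dilation 1)
KERNELS = [8, 4, 3]
STRIDES = [4, 2, 1]


def conv_output_dim(size: int, kernel: int, stride: int, padding: int = 0, dilation: int = 1) -> int:
    # Standard Conv2d output-size formula from the PyTorch documentation
    return math.floor((size + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1)


def final_spatial_dims(height: int, width: int) -> Tuple[int, int]:
    # Apply the formula once per convolution to get H' and W'
    for k, s in zip(KERNELS, STRIDES):
        height = conv_output_dim(height, k, s)
        width = conv_output_dim(width, k, s)
    return height, width


def build_simple_cnn(in_channels: int, height: int, width: int, output_size: int) -> nn.Sequential:
    """Hypothetical re-creation of the layer stack in the Architecture table."""
    h, w = final_spatial_dims(height, width)
    cnn = nn.Sequential(
        nn.Conv2d(in_channels, 32, kernel_size=KERNELS[0], stride=STRIDES[0]),
        nn.ReLU(True),
        nn.Conv2d(32, 64, kernel_size=KERNELS[1], stride=STRIDES[1]),
        nn.ReLU(True),
        nn.Conv2d(64, 32, kernel_size=KERNELS[2], stride=STRIDES[2]),
        # No activation after Conv3
        nn.Flatten(),
        nn.Linear(32 * h * w, output_size),
        nn.ReLU(True),
    )
    # Assumed init: Kaiming normal for ReLU on weights, zero biases
    for layer in cnn:
        if isinstance(layer, (nn.Conv2d, nn.Linear)):
            nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")
            nn.init.zeros_(layer.bias)
    return cnn


print(final_spatial_dims(256, 256))  # (28, 28)
cnn = build_simple_cnn(in_channels=4, height=256, width=256, output_size=512)
out = cnn(torch.rand(2, 4, 256, 256))
print(out.shape)  # torch.Size([2, 512])
```

For a 256x256 input the spatial size shrinks 256 -> 63 -> 30 -> 28, so the FC layer receives 32 * 28 * 28 = 25088 features. Because this count depends on H and W, the encoder has to be built for a fixed observation resolution.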

Usage Examples

Basic Usage

import torch
import gym.spaces as spaces
import numpy as np
from habitat_baselines.rl.models.simple_cnn import SimpleCNN

# Define observation space with RGB and depth
obs_space = spaces.Dict({
    "rgb": spaces.Box(low=0, high=255, shape=(256, 256, 3), dtype=np.uint8),
    "depth": spaces.Box(low=0.0, high=1.0, shape=(256, 256, 1), dtype=np.float32),
})

cnn = SimpleCNN(observation_space=obs_space, output_size=512)

# Forward pass
observations = {
    "rgb": torch.randint(0, 256, (8, 256, 256, 3), dtype=torch.uint8),  # upper bound is exclusive
    "depth": torch.rand(8, 256, 256, 1),
}
embedding = cnn(observations)
# embedding shape: (8, 512)
