Implementation:Farama Foundation Gymnasium MountainCarEnv

From Leeroopedia
Knowledge Sources
Domains Reinforcement_Learning, Classic_Control
Last Updated 2026-02-15 03:00 GMT

Overview

Concrete tool for the MountainCar classic control environment provided by Gymnasium.

Description

The Mountain Car MDP has deterministic dynamics and a random start: the car is placed at a random position near the bottom of a sinusoidal valley, and the only available actions are discrete accelerations to the left or right, or no acceleration. The goal is to accelerate the car strategically so that it reaches the goal state on top of the right hill. This page covers the discrete-action variant of the Mountain Car domain.

The transition dynamics update velocity as: velocity_new = velocity + (action - 1) * force - cos(3 * position) * gravity, where force = 0.001 and gravity = 0.0025. The new velocity is clipped to [-0.07, 0.07]. The position is then updated as: position_new = position + velocity_new, and clipped to [-1.2, 0.6]. The collision with the left wall is inelastic: if the car reaches position -1.2 while still moving left, its velocity is set to 0.
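The update rule above can be sketched in plain Python. This is a minimal, dependency-free illustration under the constants stated on this page, not Gymnasium's source code; the function name `mountain_car_step` and the constant names are assumptions chosen for this example.

```python
import math

# Constants as listed on this page; the names are illustrative.
FORCE = 0.001
GRAVITY = 0.0025
MIN_POSITION, MAX_POSITION = -1.2, 0.6
MAX_SPEED = 0.07


def mountain_car_step(position, velocity, action):
    """Apply one transition for action in {0, 1, 2}; return (position, velocity)."""
    # (action - 1) maps 0/1/2 to a push of -1/0/+1 times FORCE.
    velocity += (action - 1) * FORCE - math.cos(3 * position) * GRAVITY
    # Clip velocity before it is used to move the car.
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position += velocity
    position = max(MIN_POSITION, min(MAX_POSITION, position))
    # Inelastic collision with the left wall: stop any leftward motion.
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0
    return position, velocity


# One step from rest near the valley floor while accelerating right.
print(mountain_car_step(-0.5, 0.0, 2))
```

Keeping the step as a pure function of (position, velocity, action) makes the order of operations easy to check: velocity update, velocity clip, position update, position clip, then the wall check.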

This MDP first appeared in Andrew Moore's PhD Thesis (1990) from the University of Cambridge. The car's engine is too weak to climb the hill directly, so the agent must learn to exploit gravity by building momentum through oscillation. The episode terminates when the car reaches position >= 0.5 with velocity >= goal_velocity (default 0). Every step yields a reward of -1, so shorter episodes score higher.
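The momentum-building idea can be illustrated with a hand-written heuristic rather than a learned policy: push in whichever direction the car is already moving. The sketch below re-implements the dynamics described in this section in plain Python and does not call Gymnasium; `step`, `rollout`, `momentum_policy`, and `always_right` are hypothetical names, the start position of -0.5 is an arbitrary choice inside the reset range, and the 200-step cap mirrors the default time limit.

```python
import math

FORCE, GRAVITY = 0.001, 0.0025
MIN_POSITION, MAX_POSITION, MAX_SPEED = -1.2, 0.6, 0.07
GOAL_POSITION, GOAL_VELOCITY = 0.5, 0.0


def step(position, velocity, action):
    """One transition of the dynamics described above, plus the goal check."""
    velocity += (action - 1) * FORCE - math.cos(3 * position) * GRAVITY
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position += velocity
    position = max(MIN_POSITION, min(MAX_POSITION, position))
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0
    terminated = position >= GOAL_POSITION and velocity >= GOAL_VELOCITY
    return position, velocity, terminated


def momentum_policy(position, velocity):
    """Push right while moving right (or at rest), otherwise push left."""
    return 2 if velocity >= 0 else 0


def always_right(position, velocity):
    """Naive baseline: full throttle toward the goal at every step."""
    return 2


def rollout(policy, position=-0.5, max_steps=200):
    """Run one episode from rest; return (steps_taken, reached_goal)."""
    velocity = 0.0
    for t in range(1, max_steps + 1):
        action = policy(position, velocity)
        position, velocity, terminated = step(position, velocity, action)
        if terminated:
            return t, True
    return max_steps, False


for name, policy in [("always_right", always_right), ("momentum", momentum_policy)]:
    steps, solved = rollout(policy)
    # Every step costs -1, so the undiscounted return is -steps.
    print(f"{name}: solved={solved}, return={-steps}")
```

Under these assumptions the naive baseline runs out of steps, while the momentum heuristic reaches the goal within the limit, which is the behavior a learning agent has to discover on its own.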

Usage

This environment is a common benchmark for reinforcement learning algorithms, particularly those dealing with sparse or uninformative rewards and discrete action spaces. It is a classic test for exploration strategies: the agent receives -1 at every step regardless of progress, so the reward gives no hint about where the goal lies, and the agent must discover it through undirected exploration. The environment suits Q-learning, SARSA, Monte Carlo methods, and policy gradient algorithms; because observations are continuous, tabular methods require state discretization or function approximation. It is also a useful teaching example of how a simple physics-based environment can defeat naive RL approaches.
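Because the observation (position and velocity) is continuous, tabular methods such as Q-learning or SARSA need a discretized state first. A minimal sketch of uniform binning follows; the bin counts, the `discretize` helper, and the dictionary-based Q-table are assumptions for illustration and are not part of Gymnasium.

```python
# Observation bounds from this page: position in [-1.2, 0.6], velocity in [-0.07, 0.07].
LOW = (-1.2, -0.07)
HIGH = (0.6, 0.07)
BINS = (18, 14)  # hypothetical resolution: 0.1 per position bin, 0.01 per velocity bin


def discretize(observation):
    """Map a continuous (position, velocity) pair to a tuple of bin indices."""
    indices = []
    for value, low, high, n_bins in zip(observation, LOW, HIGH, BINS):
        fraction = (value - low) / (high - low)
        # Clamp so out-of-range values and the upper bound land in a valid bin.
        index = min(n_bins - 1, max(0, int(fraction * n_bins)))
        indices.append(index)
    return tuple(indices)


# Q-table keyed by (position_bin, velocity_bin), one value per action (0, 1, 2).
q_table = {
    (p, v): [0.0, 0.0, 0.0]
    for p in range(BINS[0])
    for v in range(BINS[1])
}

state = discretize((-0.45, 0.005))
print(state, q_table[state])
```

Finer bins give a better approximation of the value function at the cost of more table entries to visit, so the resolution is a tuning choice rather than a fixed property of the environment.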

Code Reference

Source Location

Signature

class MountainCarEnv(gym.Env):
    def __init__(self, render_mode: str | None = None, goal_velocity=0):

Import

import gymnasium as gym
env = gym.make("MountainCar-v0")

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| action | int | Yes | Discrete action in {0, 1, 2}: accelerate left (0), no acceleration (1), accelerate right (2) |

Outputs

| Name | Type | Description |
|---|---|---|
| observation | np.ndarray (shape (2,), float32) | [position, velocity] |
| reward | float | -1.0 at every timestep |
| terminated | bool | True when position >= 0.5 and velocity >= goal_velocity |
| truncated | bool | Always False from the environment itself; the TimeLimit wrapper applied by gym.make sets it to True after 200 steps by default |
| info | dict | Empty dictionary |

Observation Space Details

| Index | Observation | Min | Max | Unit |
|---|---|---|---|---|
| 0 | Position of the car along the x-axis | -1.2 | 0.6 | position (m) |
| 1 | Velocity of the car | -0.07 | 0.07 | velocity (m/s) |

Action Space Details

| Value | Action |
|---|---|
| 0 | Accelerate to the left |
| 1 | Don't accelerate |
| 2 | Accelerate to the right |

Key Methods

| Method | Description |
|---|---|
| __init__(render_mode=None, goal_velocity=0) | Initializes the environment with observation space Box(2,), action space Discrete(3), physics parameters, and optional goal velocity |
| reset(seed=None, options=None) | Resets position to a random value in [-0.6, -0.4] with velocity 0 (range customizable via options "low"/"high"); returns (observation, info) |
| step(action) | Applies discrete acceleration, updates velocity and position, checks termination, and returns (observation, reward, terminated, truncated, info) |
| render() | Renders the environment using pygame in "human" or "rgb_array" mode, showing the sinusoidal valley, car, and goal flag |
| get_keys_to_action() | Returns a mapping from keyboard keys to actions for human play (left arrow=0, right arrow=2, no key=1) |
| close() | Closes the pygame display and cleans up resources |

Physics Parameters

| Parameter | Value | Description |
|---|---|---|
| min_position | -1.2 | Minimum car position (left boundary) |
| max_position | 0.6 | Maximum car position (right boundary) |
| max_speed | 0.07 | Maximum car velocity (absolute value) |
| goal_position | 0.5 | Target position for termination |
| goal_velocity | 0 (default) | Minimum velocity at goal for termination |
| force | 0.001 | Acceleration force magnitude |
| gravity | 0.0025 | Gravity constant affecting car on slope |

Usage Examples

import gymnasium as gym

env = gym.make("MountainCar-v0")
observation, info = env.reset(seed=42)

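for _ in range(1000):
    # Sample a random action: 0 (left), 1 (none), or 2 (right).
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    # Start a new episode once the goal is reached or the time limit is hit.
    if terminated or truncated:
        observation, info = env.reset()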

env.close()

Custom Goal Velocity

import gymnasium as gym

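# goal_velocity must not exceed max_speed (0.07); a larger value can never be
# reached, so the episode would only ever end by truncation.
env = gym.make("MountainCar-v0", render_mode="rgb_array", goal_velocity=0.02)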
observation, info = env.reset(seed=123)
