Implementation:Farama Foundation Gymnasium Continuous MountainCarEnv

From Leeroopedia
Knowledge Sources
Domains: Reinforcement_Learning, Classic_Control
Last Updated: 2026-02-15 03:00 GMT

Overview

Reference for Continuous_MountainCarEnv, the continuous-action Mountain Car classic control environment provided by Gymnasium and registered as "MountainCarContinuous-v0".

Description

The Continuous Mountain Car MDP has deterministic dynamics; the only randomness is the car's initial position, which is sampled near the bottom of a sinusoidal valley. The only possible actions are continuous accelerations applied to the car in either direction. The goal is to accelerate the car strategically so that it reaches the goal state on top of the right hill. This is the continuous-action variant of the Mountain Car domain.

The transition dynamics update velocity as: velocity_new = velocity + force * power - 0.0025 * cos(3 * position), where force is the action clipped to [-1, 1] and power is the constant 0.0015. The position is then updated as: position_new = position + velocity_new. Collisions at either boundary are inelastic, with velocity set to 0 upon collision. Position is clipped to [-1.2, 0.6] and velocity is clipped to [-0.07, 0.07].
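The update rule above can be sketched in plain Python. This is a minimal illustration written from the description in this section, not the library's source: the function name step_dynamics and the constant name GRAVITY are hypothetical, and the order of the clipping steps is an assumption.

```python
import math

# Constants taken from the description above.
MIN_POSITION, MAX_POSITION = -1.2, 0.6
MAX_SPEED = 0.07
POWER = 0.0015
GRAVITY = 0.0025  # hypothetical name for the cos(3 * position) coefficient


def step_dynamics(position: float, velocity: float, action: float) -> tuple[float, float]:
    """Advance the car by one step and return (new_position, new_velocity)."""
    # The action is clipped to [-1, 1] before it is used as a force.
    force = min(max(action, -1.0), 1.0)

    # Velocity update: engine push minus the slope-dependent term.
    velocity += force * POWER - GRAVITY * math.cos(3 * position)
    velocity = min(max(velocity, -MAX_SPEED), MAX_SPEED)

    # Position update uses the new velocity, then is clipped to the track.
    position += velocity
    position = min(max(position, MIN_POSITION), MAX_POSITION)

    # Inelastic collision: moving into a wall zeroes the velocity.
    # (Assumption: applied at both walls, following the text above.)
    hit_left = position == MIN_POSITION and velocity < 0
    hit_right = position == MAX_POSITION and velocity > 0
    if hit_left or hit_right:
        velocity = 0.0
    return position, velocity
```

Calling step_dynamics(-0.5, 0.0, 1.0) moves a resting car slightly to the right, and any action above 1.0 produces the same result as 1.0 because of the clipping.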

This MDP first appeared in Andrew Moore's PhD Thesis (1990) from the University of Cambridge. The environment terminates when the car reaches position >= 0.45 with velocity >= goal_velocity (default 0). The reward function penalizes large actions with -0.1 * action^2 per step, and grants +100 upon reaching the goal.
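The termination test and reward described above can be written as one small function. This is a sketch under stated assumptions: the helper name reward_and_termination is hypothetical, and the penalty is computed from the action value as passed in, since the text does not say whether the raw or the clipped action is used.

```python
def reward_and_termination(
    position: float,
    velocity: float,
    action: float,
    goal_position: float = 0.45,
    goal_velocity: float = 0.0,
) -> tuple[float, bool]:
    """Return (reward, terminated) for the state reached after a step."""
    # The episode ends only if the car is at the goal AND fast enough.
    terminated = position >= goal_position and velocity >= goal_velocity

    # Quadratic action penalty on every step.
    reward = -0.1 * action ** 2

    # One-off bonus on the step that reaches the goal.
    if terminated:
        reward += 100.0
    return reward, terminated
```

For example, a full-throttle step that lands at position 0.5 with positive velocity yields a reward of about 99.9, while the same state fails the test if goal_velocity is set above the car's current velocity.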

Usage

This environment is commonly used for benchmarking continuous-action reinforcement learning algorithms. It is well-suited for testing policy gradient methods, actor-critic algorithms, and other methods designed for continuous action spaces. The sparse reward structure (large bonus at the goal, small penalties per step) makes it a useful testbed for exploration strategies. It is also valuable for educational purposes, illustrating the concept of momentum-based problem solving where the agent must learn to swing back and forth to build enough energy to reach the goal.
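The swing-back-and-forth behaviour can be illustrated without any learning. The sketch below simulates the dynamics described in this article in plain Python and applies a simple hand-written heuristic: push at full force in the direction the car is already moving. The function name run_swing_policy is hypothetical, and the simulation is an approximation of the environment rather than the Gymnasium implementation.

```python
import math


def run_swing_policy(start_position: float = -0.5, max_steps: int = 999):
    """Roll out the 'push the way you are moving' heuristic.

    Returns (reached_goal, steps_taken, total_reward).
    """
    position, velocity = start_position, 0.0
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        # Heuristic: full throttle in the direction of travel (right when at rest).
        action = 1.0 if velocity >= 0 else -1.0

        # Dynamics as described in this article.
        velocity += action * 0.0015 - 0.0025 * math.cos(3 * position)
        velocity = min(max(velocity, -0.07), 0.07)
        position = min(max(position + velocity, -1.2), 0.6)
        if position in (-1.2, 0.6):
            velocity = 0.0  # inelastic wall collision

        # Goal test with the default goal_velocity of 0.
        reached = position >= 0.45 and velocity >= 0.0
        total_reward += -0.1 * action ** 2 + (100.0 if reached else 0.0)
        if reached:
            return True, step, total_reward
    return False, max_steps, total_reward


if __name__ == "__main__":
    reached, steps, total = run_swing_policy()
    print(f"reached={reached} steps={steps} return={total:.1f}")
```

Because the push always points along the velocity, every step adds energy to the car, which is the momentum-building idea a learning agent has to discover on its own.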

Code Reference

Source Location

Signature

class Continuous_MountainCarEnv(gym.Env):
    def __init__(self, render_mode: str | None = None, goal_velocity=0):

Import

import gymnasium as gym
env = gym.make("MountainCarContinuous-v0")

I/O Contract

Inputs

Name | Type | Required | Description
action | np.ndarray (shape (1,), float32) | Yes | Continuous force applied to the car, clipped to [-1.0, 1.0]

Outputs

Name | Type | Description
observation | np.ndarray (shape (2,), float32) | [position, velocity]
reward | float | -0.1 * action^2 per step; +100.0 added upon reaching the goal
terminated | bool | True when position >= 0.45 and velocity >= goal_velocity
truncated | bool | False (truncation handled by TimeLimit wrapper; default 999 steps)
info | dict | Empty dictionary

Observation Space Details

Index | Observation | Min | Max | Unit
0 | Position of the car along the x-axis | -1.2 | 0.6 | m
1 | Velocity of the car | -0.07 | 0.07 | m/s

Action Space Details

Dimension | Min | Max | Description
0 | -1.0 | 1.0 | Directional force applied on the car (multiplied by power=0.0015)

Key Methods

Method | Description
__init__(render_mode=None, goal_velocity=0) | Initializes the environment with observation space Box(2,), continuous action space Box(1,), physics parameters, and optional goal velocity
reset(seed=None, options=None) | Resets position to a random value in [-0.6, -0.4] with velocity=0 (range customizable via options "low"/"high"); returns (observation, info)
step(action) | Applies continuous force, updates velocity and position, checks termination, computes reward, and returns (observation, reward, terminated, truncated, info)
render() | Renders the environment using pygame in "human" or "rgb_array" mode, showing the sinusoidal valley, car, and goal flag
close() | Closes the pygame display and cleans up resources

Physics Parameters

Parameter | Value | Description
min_position | -1.2 | Minimum car position (left boundary)
max_position | 0.6 | Maximum car position (right boundary)
max_speed | 0.07 | Maximum car velocity
goal_position | 0.45 | Target position for termination
goal_velocity | 0 (default) | Minimum velocity at goal for termination
power | 0.0015 | Force multiplier for acceleration
min_action | -1.0 | Minimum action value
max_action | 1.0 | Maximum action value
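A short calculation with these parameters suggests why the car has to build momentum. Using the velocity update from the Description, full throttle adds at most power = 0.0015 per step, while the slope term subtracts 0.0025 * cos(3 * position). The sketch below (the names GRAVITY and net_acceleration are hypothetical) finds where the two cancel.

```python
import math

POWER = 0.0015    # from the parameter table
GRAVITY = 0.0025  # coefficient of cos(3 * position) in the velocity update


def net_acceleration(position: float, action: float = 1.0) -> float:
    """Per-step change in velocity at a given position and action."""
    return action * POWER - GRAVITY * math.cos(3 * position)


# Full throttle to the right is cancelled where cos(3x) = POWER / GRAVITY.
ratio = POWER / GRAVITY
balance_position = -math.acos(ratio) / 3  # the solution on the left of x = 0

print(f"cos(3x) threshold: {ratio:.2f}")
print(f"balance position:  {balance_position:.3f}")
print(f"net acceleration at x = 0: {net_acceleration(0.0):+.4f}")
```

Between roughly -0.31 and +0.31 the net acceleration under full rightward throttle is negative, so the engine alone slows a rightward-moving car there instead of speeding it up.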

Usage Examples

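import gymnasium as gym

# Create the environment and draw a reproducible initial state
env = gym.make("MountainCarContinuous-v0")
observation, info = env.reset(seed=42)

for _ in range(1000):
    # Sample a random force in [-1.0, 1.0] as a placeholder policy
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    # Start a new episode when the goal is reached or the step limit is hit
    if terminated or truncated:
        observation, info = env.reset()

env.close()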

Custom Goal Velocity

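import gymnasium as gym

# goal_velocity must not exceed max_speed (0.07), otherwise the goal can never be reached
env = gym.make("MountainCarContinuous-v0", render_mode="rgb_array", goal_velocity=0.05)
# Sample the start position from [-0.7, -0.5] instead of the default [-0.6, -0.4]
observation, info = env.reset(seed=123, options={"low": -0.7, "high": -0.5})
env.close()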
