Implementation:Farama Foundation Gymnasium Continuous MountainCarEnv

Knowledge Sources	Farama_Foundation_Gymnasium Gymnasium Docs
Domains	Reinforcement_Learning, Classic_Control
Last Updated	2026-02-15 03:00 GMT

Overview

Concrete implementation of the Continuous Mountain Car classic control environment provided by Gymnasium.

Description

The Continuous Mountain Car MDP has deterministic dynamics: a car starts at a random position near the bottom of a sinusoidal valley, and the only available actions are continuous accelerations applied in either direction. The goal is to accelerate the car strategically so that it reaches the goal state on top of the right hill. This is the continuous-action variant of the Mountain Car domain.

The transition dynamics update velocity as: velocity_new = velocity + force * power - 0.0025 * cos(3 * position), where force is the action clipped to [-1, 1] and power is the constant 0.0015. The new velocity is clipped to [-0.07, 0.07]. The position is then updated as: position_new = position + velocity_new, and clipped to [-1.2, 0.6]. Collisions at either boundary are inelastic, with velocity set to 0 upon collision.
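The update rule above can be written as a small pure-Python function. This is a minimal sketch built only from the formulas in this section, not the library's own code; the name step_dynamics is hypothetical, and the handling of both walls follows the prose here, so check it against the source file listed under Code Reference before relying on it.

```python
import math

# Constants as stated in the description above.
MIN_POSITION, MAX_POSITION = -1.2, 0.6
MAX_SPEED = 0.07
POWER = 0.0015


def step_dynamics(position: float, velocity: float, action: float) -> tuple[float, float]:
    """Advance (position, velocity) by one step; illustrative helper, not the library API."""
    force = min(max(action, -1.0), 1.0)  # clip the action to [-1, 1]
    velocity += force * POWER - 0.0025 * math.cos(3 * position)
    velocity = min(max(velocity, -MAX_SPEED), MAX_SPEED)
    position += velocity
    position = min(max(position, MIN_POSITION), MAX_POSITION)
    # Inelastic collision: moving into a wall zeroes the velocity.
    # Assumption: both walls are handled, following the prose above.
    hit_left = position == MIN_POSITION and velocity < 0
    hit_right = position == MAX_POSITION and velocity > 0
    if hit_left or hit_right:
        velocity = 0.0
    return position, velocity


# One step of full rightward thrust from rest near the valley floor.
print(step_dynamics(-0.5, 0.0, 1.0))
```

Near the valley floor the cosine term is close to zero, so the first step is dominated by the thrust term force * power.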

This MDP first appeared in Andrew Moore's PhD Thesis (1990) from the University of Cambridge. The environment terminates when the car reaches position >= 0.45 with velocity >= goal_velocity (default 0). The reward function penalizes large actions with -0.1 * action^2 per step, and grants +100 upon reaching the goal.
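The termination test and reward described above can be sketched as one function. The name reward_and_termination is hypothetical, and the sketch assumes the control cost is computed from the action value passed in; it mirrors the prose in this section rather than the library source.

```python
GOAL_POSITION = 0.45


def reward_and_termination(position, velocity, action, goal_velocity=0.0):
    """Return (reward, terminated) for the state reached after a step."""
    terminated = position >= GOAL_POSITION and velocity >= goal_velocity
    reward = 100.0 if terminated else 0.0
    reward -= 0.1 * action ** 2  # control cost applied on every step
    return reward, terminated


print(reward_and_termination(0.0, 0.01, 0.5))   # mid-valley: control cost only
print(reward_and_termination(0.46, 0.02, 1.0))  # past the goal: bonus minus control cost
```

Because the only positive reward arrives at the goal, an agent that never reaches it sees strictly non-positive returns, and applying no force at all yields a return of exactly zero.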

Usage

This environment is commonly used for benchmarking continuous-action reinforcement learning algorithms. It is well-suited for testing policy gradient methods, actor-critic algorithms, and other methods designed for continuous action spaces. The sparse reward structure (large bonus at the goal, small penalties per step) makes it a useful testbed for exploration strategies. It is also valuable for educational purposes, illustrating the concept of momentum-based problem solving where the agent must learn to swing back and forth to build enough energy to reach the goal.
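The swing-back-and-forth idea can be illustrated with a hand-written heuristic that always pushes in the current direction of travel. The sketch below is an assumption-laden stand-in: simulate_energy_pumping is a hypothetical name, the dynamics are re-implemented inline from the formulas on this page instead of calling Gymnasium, and the 999-step cap imitates the default time limit.

```python
import math


def simulate_energy_pumping(start_position=-0.5, max_steps=999):
    """Roll out a 'push in the direction of travel' heuristic.

    Returns (reached_goal, steps_taken, total_reward). The dynamics are a
    simplified stand-in for the environment, not a call into Gymnasium.
    """
    position, velocity, total_reward = start_position, 0.0, 0.0
    for step in range(1, max_steps + 1):
        # Hypothetical heuristic: full thrust along the current velocity.
        action = 1.0 if velocity >= 0 else -1.0
        velocity += action * 0.0015 - 0.0025 * math.cos(3 * position)
        velocity = min(max(velocity, -0.07), 0.07)
        position = min(max(position + velocity, -1.2), 0.6)
        if position == -1.2 and velocity < 0:
            velocity = 0.0  # inelastic collision with the left wall
        total_reward -= 0.1 * action ** 2
        if position >= 0.45 and velocity >= 0:
            return True, step, total_reward + 100.0
    return False, max_steps, total_reward


reached, steps, total = simulate_energy_pumping()
print(f"reached={reached} steps={steps} return={total:.1f}")
```

The heuristic never applies force against the direction of motion, so every step adds mechanical energy until the car clears the right hill. A learned policy has to discover the same pattern from reward alone.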

Code Reference

Source Location

Repository: Farama_Foundation_Gymnasium
File: gymnasium/envs/classic_control/continuous_mountain_car.py

Signature

class Continuous_MountainCarEnv(gym.Env):
    def __init__(self, render_mode: str | None = None, goal_velocity=0):

Import

import gymnasium as gym
env = gym.make("MountainCarContinuous-v0")

I/O Contract

Inputs

Name	Type	Required	Description
action	np.ndarray (shape (1,), float32)	Yes	Continuous force applied to the car, clipped to [-1.0, 1.0]

Outputs

Name	Type	Description
observation	np.ndarray (shape (2,), float32)	[position, velocity]
reward	float	-0.1 * action^2 per step; +100.0 added upon reaching the goal
terminated	bool	True when position >= 0.45 and velocity >= goal_velocity
truncated	bool	Always False from the environment itself; the TimeLimit wrapper applied by gym.make sets it to True after 999 steps by default
info	dict	Empty dictionary

Observation Space Details

Index	Observation	Min	Max	Unit
0	Position of the car along the x-axis	-1.2	0.6	position (m)
1	Velocity of the car	-0.07	0.07	velocity (m/s)

Action Space Details

Dimension	Min	Max	Description
0	-1.0	1.0	Directional force applied on the car (multiplied by power=0.0015)

Key Methods

Method	Description
`__init__(render_mode=None, goal_velocity=0)`	Initializes the environment with observation space Box(2,), continuous action space Box(1,), physics parameters, and optional goal velocity
`reset(seed=None, options=None)`	Resets position to random value in [-0.6, -0.4] with velocity=0 (customizable via options "low"/"high"); returns (observation, info)
`step(action)`	Applies continuous force, updates velocity and position, checks termination, computes reward, and returns (observation, reward, terminated, truncated, info)
`render()`	Renders the environment using pygame in "human" or "rgb_array" mode, showing the sinusoidal valley, car, and goal flag
`close()`	Closes the pygame display and cleans up resources
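The reset behaviour listed in the table above, a uniform draw for the start position with optional "low"/"high" overrides and zero initial velocity, can be sketched as follows. The helper sample_initial_state is hypothetical, and Python's random module stands in for the environment's own seeded generator.

```python
import random


def sample_initial_state(rng, options=None):
    """Return (position, velocity) for a new episode; illustrative helper only."""
    options = options or {}
    low = options.get("low", -0.6)
    high = options.get("high", -0.4)
    if low > high:
        raise ValueError("'low' must not exceed 'high'")
    return rng.uniform(low, high), 0.0


rng = random.Random(123)  # seeded for reproducibility
default_start = sample_initial_state(rng)
custom_start = sample_initial_state(rng, {"low": -0.7, "high": -0.5})
print(default_start, custom_start)
```

Widening the range toward the left slope gives the car more potential energy at the start, which makes the task easier.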

Physics Parameters

Parameter	Value	Description
min_position	-1.2	Minimum car position (left boundary)
max_position	0.6	Maximum car position (right boundary)
max_speed	0.07	Maximum car velocity
goal_position	0.45	Target position for termination
goal_velocity	0 (default)	Minimum velocity at goal for termination
power	0.0015	Force multiplier for acceleration
min_action	-1.0	Minimum action value
max_action	1.0	Maximum action value
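A short calculation with the parameters in the table above shows why the car cannot simply drive up the hill: the maximum thrust per step (power = 0.0015) is smaller than the peak slope term (0.0025). The function net_acceleration and the constant names are illustrative assumptions, not part of the library.

```python
import math

POWER = 0.0015    # maximum thrust per step at action = 1.0
GRAVITY = 0.0025  # coefficient of the cos(3 * position) term


def net_acceleration(position, action=1.0):
    """Velocity change per step at a given position under a constant action."""
    return action * POWER - GRAVITY * math.cos(3 * position)


# First position right of the valley floor where full thrust only balances the slope term.
stall_position = -math.acos(POWER / GRAVITY) / 3

for x in (-0.5, stall_position, 0.0, 0.45):
    print(f"position={x:+.3f} net_acceleration={net_acceleration(x):+.5f}")
```

Between roughly -0.31 and +0.31 the slope term exceeds full thrust, so a car without enough speed decelerates there and rolls back. Crossing that stretch requires momentum built up on the left slope.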

Usage Examples

import gymnasium as gym

env = gym.make("MountainCarContinuous-v0")
observation, info = env.reset(seed=42)

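for _ in range(1000):
    # Sample a random force in [-1.0, 1.0]; replace with a policy's output.
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    # Start a new episode once the goal is reached or the step limit is hit.
    if terminated or truncated:
        observation, info = env.reset()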

env.close()

Custom Goal Velocity

import gymnasium as gym

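# goal_velocity should not exceed max_speed (0.07); velocity is clipped there, so a larger value can never be met.
env = gym.make("MountainCarContinuous-v0", render_mode="rgb_array", goal_velocity=0.05)
# Optionally override the default initial-position range [-0.6, -0.4].
observation, info = env.reset(seed=123, options={"low": -0.7, "high": -0.5})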

Related Pages

Environment:Farama_Foundation_Gymnasium_Python_3_10_Runtime
