Overview
Concrete tool for the Continuous Mountain Car classic control environment provided by Gymnasium.
Description
The Continuous Mountain Car MDP has deterministic dynamics: a car is placed at a random position near the bottom of a sinusoidal valley, and the only possible actions are continuous accelerations applied to the car in either direction. The goal is to accelerate the car strategically so that it reaches the goal state on top of the right hill. This is the continuous-action variant of the Mountain Car domain.
The transition dynamics update velocity as: velocity_new = velocity + force * power - 0.0025 * cos(3 * position), where force is the action clipped to [-1, 1] and power is the constant 0.0015. The position is then updated as: position_new = position + velocity_new. Collisions at either boundary are inelastic, with velocity set to 0 upon collision. Position is clipped to [-1.2, 0.6] and velocity is clipped to [-0.07, 0.07].
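The update rule above can be sketched as a small pure-Python function. This is an illustrative reading of the description, not the library source: the function name `step_dynamics` and the module-level constants are hypothetical, and the order of operations (clip velocity before updating position, then clip position) is an assumption.

```python
import math

# Constants taken from the description above.
MIN_POSITION, MAX_POSITION = -1.2, 0.6
MAX_SPEED = 0.07
POWER = 0.0015


def step_dynamics(position, velocity, action):
    """Apply one transition of the described dynamics (hypothetical helper)."""
    # The action is clipped to [-1, 1] before it is used as a force.
    force = min(max(action, -1.0), 1.0)
    # Velocity update: engine force minus the slope-dependent gravity term.
    velocity = velocity + force * POWER - 0.0025 * math.cos(3 * position)
    # Assumption: velocity is clipped before the position update.
    velocity = min(max(velocity, -MAX_SPEED), MAX_SPEED)
    position = min(max(position + velocity, MIN_POSITION), MAX_POSITION)
    # Inelastic collision: moving into a boundary zeroes the velocity.
    hit_left = position == MIN_POSITION and velocity < 0
    hit_right = position == MAX_POSITION and velocity > 0
    if hit_left or hit_right:
        velocity = 0.0
    return position, velocity
```

Starting at rest at position -0.5 with full right force, the engine term (0.0015) outweighs the gravity term (about 0.00018), so the car begins to move right.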
This MDP first appeared in Andrew Moore's PhD Thesis (1990) from the University of Cambridge. The environment terminates when the car reaches position >= 0.45 with velocity >= goal_velocity (default 0). The reward function penalizes large actions with -0.1 * action^2 per step, and grants +100 upon reaching the goal.
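The termination test and reward described above could be written as follows. This is a minimal sketch: `reward_and_termination` is a hypothetical helper name, and it assumes the penalty is computed from the action value exactly as passed in.

```python
GOAL_POSITION = 0.45  # goal position from the description above


def reward_and_termination(position, velocity, action, goal_velocity=0.0):
    """Return (reward, terminated) for one step (hypothetical helper)."""
    # The episode terminates once the car is at or past the goal position
    # with at least the required velocity.
    terminated = position >= GOAL_POSITION and velocity >= goal_velocity
    # Every step pays a quadratic penalty on the action magnitude.
    reward = -0.1 * action ** 2
    # Reaching the goal adds the +100 bonus on top of the penalty.
    if terminated:
        reward += 100.0
    return reward, terminated
```

Because the penalty is quadratic, small corrective actions are nearly free while full-force actions cost 0.1 per step.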
Usage
This environment is commonly used for benchmarking continuous-action reinforcement learning algorithms. It is well-suited for testing policy gradient methods, actor-critic algorithms, and other methods designed for continuous action spaces. The sparse reward structure (large bonus at the goal, small penalties per step) makes it a useful testbed for exploration strategies. It is also valuable for educational purposes, illustrating the concept of momentum-based problem solving where the agent must learn to swing back and forth to build enough energy to reach the goal.
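To illustrate the swing-back-and-forth idea, the sketch below simulates a simple hand-written heuristic: always push in the direction the car is already moving. It is self-contained and does not call Gymnasium. The function name `rollout_energy_pumping` is hypothetical, and the dynamics are re-implemented from the description on this page, so step counts may differ slightly from the real environment.

```python
import math


def rollout_energy_pumping(start_position=-0.5, max_steps=999):
    """Simulate the 'push along the current velocity' heuristic.

    Returns (reached_goal, steps_taken, total_reward).
    """
    position, velocity = start_position, 0.0
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        # Heuristic policy: full force in the direction of motion
        # (push right when at rest).
        action = 1.0 if velocity >= 0 else -1.0
        # Dynamics as described on this page (assumed order of clipping).
        velocity += action * 0.0015 - 0.0025 * math.cos(3 * position)
        velocity = min(max(velocity, -0.07), 0.07)
        position = min(max(position + velocity, -1.2), 0.6)
        if position == -1.2 and velocity < 0:
            velocity = 0.0  # inelastic collision with the left wall
        total_reward -= 0.1 * action ** 2
        # Goal check with the default goal_velocity of 0.
        if position >= 0.45 and velocity >= 0.0:
            total_reward += 100.0
            return True, step, total_reward
    return False, max_steps, total_reward


reached, steps, total = rollout_energy_pumping()
print(reached, steps, round(total, 1))
```

Pushing along the velocity always does positive work on the car, which is why the heuristic accumulates enough energy to climb the right hill even though the engine alone cannot overcome the steepest part of the slope.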
Code Reference
Source Location
Signature
class Continuous_MountainCarEnv(gym.Env):
    def __init__(self, render_mode: str | None = None, goal_velocity=0):
Import
import gymnasium as gym
env = gym.make("MountainCarContinuous-v0")
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| action | np.ndarray (shape (1,), float32) | Yes | Continuous force applied to the car, clipped to [-1.0, 1.0] |
Outputs
| Name | Type | Description |
|---|---|---|
| observation | np.ndarray (shape (2,), float32) | [position, velocity] |
| reward | float | -0.1 * action^2 per step; +100.0 added upon reaching the goal |
| terminated | bool | True when position >= 0.45 and velocity >= goal_velocity |
| truncated | bool | False (truncation handled by TimeLimit wrapper; default 999 steps) |
| info | dict | Empty dictionary |
Observation Space Details
| Index | Observation | Min | Max | Unit |
|---|---|---|---|---|
| 0 | Position of the car along the x-axis | -1.2 | 0.6 | position (m) |
| 1 | Velocity of the car | -0.07 | 0.07 | velocity (m/s) |
Action Space Details
| Dimension | Min | Max | Description |
|---|---|---|---|
| 0 | -1.0 | 1.0 | Directional force applied on the car (multiplied by power=0.0015) |
Key Methods
| Method | Description |
|---|---|
| __init__(render_mode=None, goal_velocity=0) | Initializes the environment with observation space Box(2,), continuous action space Box(1,), physics parameters, and optional goal velocity |
| reset(seed=None, options=None) | Resets position to a random value in [-0.6, -0.4] with velocity=0 (range customizable via options "low"/"high"); returns (observation, info) |
| step(action) | Applies continuous force, updates velocity and position, checks termination, computes reward, and returns (observation, reward, terminated, truncated, info) |
| render() | Renders the environment using pygame in "human" or "rgb_array" mode, showing the sinusoidal valley, car, and goal flag |
| close() | Closes the pygame display and cleans up resources |
Physics Parameters
| Parameter | Value | Description |
|---|---|---|
| min_position | -1.2 | Minimum car position (left boundary) |
| max_position | 0.6 | Maximum car position (right boundary) |
| max_speed | 0.07 | Maximum car velocity |
| goal_position | 0.45 | Target position for termination |
| goal_velocity | 0 (default) | Minimum velocity at goal for termination |
| power | 0.0015 | Force multiplier for acceleration |
| min_action | -1.0 | Minimum action value |
| max_action | 1.0 | Maximum action value |
Usage Examples
import gymnasium as gym

env = gym.make("MountainCarContinuous-v0")
observation, info = env.reset(seed=42)
for _ in range(1000):
    # Sample a random action from the continuous action space
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    # Start a new episode once the goal is reached or the time limit is hit
    if terminated or truncated:
        observation, info = env.reset()
env.close()
Custom Goal Velocity
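import gymnasium as gym

# goal_velocity must not exceed the velocity bound of 0.07, otherwise the goal can never be reached
env = gym.make("MountainCarContinuous-v0", render_mode="rgb_array", goal_velocity=0.05)
# Draw the start position from [-0.7, -0.5] instead of the default [-0.6, -0.4]
observation, info = env.reset(seed=123, options={"low": -0.7, "high": -0.5})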
Related Pages