Overview
Concrete tool for the MountainCar classic control environment provided by Gymnasium.
Description
The Mountain Car MDP is a deterministic MDP that consists of a car placed stochastically at the bottom of a sinusoidal valley, with the only possible actions being discrete accelerations that can be applied to the car in either direction. The goal of the MDP is to strategically accelerate the car to reach the goal state on top of the right hill. This is the discrete-action variant of the Mountain Car domain.
The transition dynamics update velocity as: velocity_new = velocity + (action - 1) * force - cos(3 * position) * gravity, where force = 0.001 and gravity = 0.0025. The position is then updated as: position_new = position + velocity_new. Collisions at either boundary are inelastic with velocity set to 0 upon collision with the left wall. Position is clipped to [-1.2, 0.6] and velocity is clipped to [-0.07, 0.07].
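The update rule above can be sketched as a small pure-Python function. This is an illustrative reading of the description, not the library's source: the function name `mountain_car_step` is hypothetical, and the order of operations (clip velocity, move, clip position, then apply the left-wall check) is an assumption inferred from the text.

```python
import math

# Constants as stated in the description above.
FORCE = 0.001
GRAVITY = 0.0025
MIN_POSITION, MAX_POSITION = -1.2, 0.6
MAX_SPEED = 0.07


def mountain_car_step(position, velocity, action):
    """Return (new_position, new_velocity) for an action in {0, 1, 2}.

    Hypothetical helper: the ordering of the clipping steps is assumed.
    """
    # action - 1 maps {0, 1, 2} to a push of {-1, 0, +1} times FORCE.
    velocity += (action - 1) * FORCE - math.cos(3 * position) * GRAVITY
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position += velocity
    position = max(MIN_POSITION, min(MAX_POSITION, position))
    # Inelastic collision with the left wall: the car stops dead.
    if position == MIN_POSITION and velocity < 0:
        velocity = 0.0
    return position, velocity


# Coasting (action 1) from the usual start region: gravity alone acts on the car.
print(mountain_car_step(-0.5, 0.0, 1))
```

Keeping the step as a pure function of `(position, velocity, action)` makes the deterministic nature of the dynamics explicit: the only randomness in an episode comes from the starting position.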
This MDP first appeared in Andrew Moore's PhD Thesis (1990) from the University of Cambridge. The car's engine is too weak to climb the hill directly, so the agent must learn to leverage gravity by building momentum through oscillation. An episode terminates when the car reaches position >= 0.5 with velocity >= goal_velocity (default 0). Every step, including the one that reaches the goal, incurs a reward of -1.
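To illustrate why momentum matters, the following sketch simulates the stated dynamics without Gymnasium and compares two hand-written policies. The helper `steps_to_goal`, the two policy functions, the fixed start at -0.5, and the 200-step budget are all assumptions made for this example; the clipping order is likewise assumed.

```python
import math


def steps_to_goal(policy, start=-0.5, max_steps=200):
    """Simulate one episode; return the number of steps to the goal, or None."""
    position, velocity = start, 0.0
    for step in range(1, max_steps + 1):
        action = policy(position, velocity)
        velocity += (action - 1) * 0.001 - math.cos(3 * position) * 0.0025
        velocity = max(-0.07, min(0.07, velocity))
        position = max(-1.2, min(0.6, position + velocity))
        if position == -1.2 and velocity < 0:
            velocity = 0.0  # inelastic stop at the left wall
        if position >= 0.5 and velocity >= 0.0:
            return step
    return None


def always_right(position, velocity):
    """Naive policy: push toward the goal on every step."""
    return 2


def follow_momentum(position, velocity):
    """Heuristic policy: push in the direction the car is already moving."""
    return 2 if velocity >= 0 else 0


print("always_right:", steps_to_goal(always_right))
print("follow_momentum:", steps_to_goal(follow_momentum))
```

Under these assumptions the naive policy stalls partway up the right slope and never reaches the goal within the budget, while the momentum-following heuristic swings back and forth until it has enough speed to clear the hill.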
Usage
This environment is commonly used as a benchmark for reinforcement learning algorithms, particularly those dealing with sparse reward problems and discrete action spaces. It is one of the classic test environments for evaluating exploration strategies since the agent receives -1 reward at every step and must discover the goal through undirected exploration. The environment is well-suited for Q-learning, SARSA, Monte Carlo methods, and policy gradient algorithms. It also serves as an important educational tool for demonstrating how simple physics-based environments can pose challenges for naive RL approaches.
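Tabular methods such as Q-learning and SARSA need a finite state set, so the continuous observation is usually binned first. The sketch below shows one simple uniform binning; the function name `discretize` and the bin counts (18 position bins, 14 velocity bins) are arbitrary choices for illustration, not values prescribed by the environment.

```python
def discretize(observation, bins=(18, 14)):
    """Map a continuous (position, velocity) pair to a pair of bin indices.

    Hypothetical helper; the bounds come from the observation space,
    the bin counts are an arbitrary assumption.
    """
    lows = (-1.2, -0.07)
    highs = (0.6, 0.07)
    indices = []
    for value, low, high, n in zip(observation, lows, highs, bins):
        fraction = (value - low) / (high - low)
        # Clamp so that a value exactly at the upper bound lands in the last bin.
        indices.append(min(n - 1, max(0, int(fraction * n))))
    return tuple(indices)


# A Q-table for this binning would have shape (18, 14, 3): one entry per
# (position bin, velocity bin, action).
print(discretize((-0.45, 0.012)))
```

Finer bins give a more accurate value function but require more exploration to fill in, which matters here because the reward signal carries no gradient toward the goal.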
Code Reference
Source Location
Signature
class MountainCarEnv(gym.Env):
    def __init__(self, render_mode: str | None = None, goal_velocity=0):
Import
import gymnasium as gym
env = gym.make("MountainCar-v0")
I/O Contract
Inputs
| Name | Type | Required | Description |
|------|------|----------|-------------|
| action | int | Yes | Discrete action in {0, 1, 2}: accelerate left (0), no acceleration (1), accelerate right (2) |
Outputs
| Name | Type | Description |
|------|------|-------------|
| observation | np.ndarray (shape (2,), float32) | [position, velocity] |
| reward | float | -1.0 at every timestep |
| terminated | bool | True when position >= 0.5 and velocity >= goal_velocity |
| truncated | bool | False (truncation handled by TimeLimit wrapper; default 200 steps) |
| info | dict | Empty dictionary |
Observation Space Details
| Index | Observation | Min | Max | Unit |
|-------|-------------|-----|-----|------|
| 0 | Position of the car along the x-axis | -1.2 | 0.6 | position (m) |
| 1 | Velocity of the car | -0.07 | 0.07 | velocity (m/s) |
Action Space Details
| Value | Action |
|-------|--------|
| 0 | Accelerate to the left |
| 1 | Don't accelerate |
| 2 | Accelerate to the right |
Key Methods
| Method | Description |
|--------|-------------|
| __init__(render_mode=None, goal_velocity=0) | Initializes the environment with observation space Box(2,), action space Discrete(3), physics parameters, and optional goal velocity |
| reset(seed=None, options=None) | Resets position to a random value in [-0.6, -0.4] with velocity=0 (customizable via options "low"/"high"); returns (observation, info) |
| step(action) | Applies discrete acceleration, updates velocity and position, checks termination, and returns (observation, reward, terminated, truncated, info) |
| render() | Renders the environment using pygame in "human" or "rgb_array" mode, showing the sinusoidal valley, car, and goal flag |
| get_keys_to_action() | Returns a mapping from keyboard keys to actions for human play (left arrow=0, right arrow=2, no key=1) |
| close() | Closes the pygame display and cleans up resources |
Physics Parameters
| Parameter | Value | Description |
|-----------|-------|-------------|
| min_position | -1.2 | Minimum car position (left boundary) |
| max_position | 0.6 | Maximum car position (right boundary) |
| max_speed | 0.07 | Maximum car velocity |
| goal_position | 0.5 | Target position for termination |
| goal_velocity | 0 (default) | Minimum velocity at goal for termination |
| force | 0.001 | Acceleration force magnitude |
| gravity | 0.0025 | Gravity constant affecting car on slope |
Usage Examples
import gymnasium as gym
env = gym.make("MountainCar-v0")
observation, info = env.reset(seed=42)
for _ in range(1000):
    action = env.action_space.sample()  # random action in {0, 1, 2}
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        # Start a new episode once the goal is reached or the step limit is hit
        observation, info = env.reset()
env.close()
Custom Goal Velocity
import gymnasium as gym
# goal_velocity must not exceed max_speed (0.07), or the episode can never terminate
env = gym.make("MountainCar-v0", render_mode="rgb_array", goal_velocity=0.05)
observation, info = env.reset(seed=123)
Related Pages