Principle:Farama Foundation Gymnasium Classic Control Environments

Knowledge Sources	Farama_Foundation_Gymnasium Gymnasium Docs
Domains	Reinforcement_Learning, Control_Theory
Last Updated	2026-02-15 03:00 GMT

Overview

Classic control problems are canonical benchmark tasks from control theory that test an agent's ability to balance, swing, or navigate simple dynamical systems.

Description

Classic control environments represent foundational problems in control theory and reinforcement learning. These tasks involve low-dimensional state spaces and simple action spaces, yet they capture essential challenges such as balancing unstable equilibria, performing swing-up maneuvers, and climbing energy landscapes. Each environment is defined by a set of ordinary differential equations governing the system dynamics, which are integrated numerically at each time step.
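The sketch below makes the numerical-integration step concrete. It advances a frictionless pendulum with explicit Euler and with classical fourth-order Runge-Kutta (RK4), then compares how well each method conserves mechanical energy. The pendulum model, the constants `G` and `L`, and every function name are illustrative assumptions, not Gymnasium code.

```python
import math

G, L = 9.8, 1.0  # assumed gravity (m/s^2) and pendulum length (m), illustration only


def pendulum_derivs(state):
    """Frictionless pendulum; state = (theta, theta_dot), theta = 0 hanging down."""
    theta, theta_dot = state
    return (theta_dot, -(G / L) * math.sin(theta))


def euler_step(f, state, dt):
    """Explicit Euler: one derivative evaluation per step."""
    d = f(state)
    return tuple(s + dt * ds for s, ds in zip(state, d))


def rk4_step(f, state, dt):
    """Classical fourth-order Runge-Kutta: four derivative evaluations per step."""
    def shift(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))

    k1 = f(state)
    k2 = f(shift(k1, 0.5 * dt))
    k3 = f(shift(k2, 0.5 * dt))
    k4 = f(shift(k3, dt))
    return tuple(s + dt / 6.0 * (a + 2 * b + 2 * c + e)
                 for s, a, b, c, e in zip(state, k1, k2, k3, k4))


def energy(state):
    """Mechanical energy per unit mass; the exact dynamics conserve it."""
    theta, theta_dot = state
    return 0.5 * (L * theta_dot) ** 2 + G * L * (1.0 - math.cos(theta))


def simulate(step_fn, state, dt, n_steps):
    for _ in range(n_steps):
        state = step_fn(pendulum_derivs, state, dt)
    return state


start = (1.0, 0.0)          # released from rest at 1 rad
e0 = energy(start)
euler_drift = abs(energy(simulate(euler_step, start, 0.05, 200)) - e0)
rk4_drift = abs(energy(simulate(rk4_step, start, 0.05, 200)) - e0)
print(f"Euler energy drift: {euler_drift:.4f}  RK4 energy drift: {rk4_drift:.6f}")
```

Explicit Euler needs one derivative evaluation per step but steadily adds energy to an undamped oscillator. RK4 spends four evaluations per step and keeps the drift far smaller. This cost-versus-accuracy trade-off determines which integrator an environment uses.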

The suite includes pole balancing (maintaining an inverted pendulum on a cart), acrobot swing-up (raising a two-link chain above a height threshold), mountain car (building momentum to escape a valley), continuous mountain car (the same task with continuous forces), and pendulum (swinging a pendulum to the upright position and holding it). These problems have been studied extensively in the control and RL literature since the 1960s, and they remain standard benchmarks for verifying that new algorithms can learn basic sensorimotor skills.

Within the machine learning ecosystem, classic control environments serve as first-pass sanity checks for RL implementations. Their small state and action spaces allow rapid training and debugging, and their well-understood dynamics make it possible to check learned behavior against known physics. They span both discrete action spaces (CartPole, Acrobot, MountainCar) and continuous action spaces (Pendulum, MountainCarContinuous), so together they cover algorithms designed for either action type.
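In Gymnasium, each environment exposes its action type through its `action_space` attribute. The dependency-free sketch below copies the documented spaces by hand into plain dictionaries: environment IDs, the number of discrete actions, continuous bounds, and observation sizes. These values should be checked against the installed Gymnasium version. The sketch then uses the dictionaries to draw uniformly random actions, a common first sanity-check baseline. `sample_random_action` is a hypothetical helper.

```python
import random

# Hand-copied summary of the documented Gymnasium classic control spaces
# (verify against your installed version; this is not read from Gymnasium).
DISCRETE_ACTIONS = {"CartPole-v1": 2, "Acrobot-v1": 3, "MountainCar-v0": 3}
CONTINUOUS_ACTIONS = {  # (low, high) of a 1-D continuous action
    "Pendulum-v1": (-2.0, 2.0),
    "MountainCarContinuous-v0": (-1.0, 1.0),
}
OBS_DIM = {"CartPole-v1": 4, "Acrobot-v1": 6, "MountainCar-v0": 2,
           "Pendulum-v1": 3, "MountainCarContinuous-v0": 2}


def sample_random_action(env_id, rng):
    """Uniform random action: an integer for discrete envs, a 1-element list otherwise."""
    if env_id in DISCRETE_ACTIONS:
        return rng.randrange(DISCRETE_ACTIONS[env_id])   # integer in [0, n)
    low, high = CONTINUOUS_ACTIONS[env_id]
    return [rng.uniform(low, high)]                       # float vector of length 1


rng = random.Random(0)
for env_id in OBS_DIM:
    kind = "discrete" if env_id in DISCRETE_ACTIONS else "continuous"
    print(env_id, kind, "obs_dim =", OBS_DIM[env_id],
          "sample =", sample_random_action(env_id, rng))
```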

Usage

Use classic control environments for initial algorithm development, debugging, and verification. They are ideal for testing whether a new RL implementation can learn at all before scaling to more complex domains. They also serve pedagogical purposes, illustrating fundamental RL concepts such as sparse vs. dense rewards, exploration challenges, and the difference between discrete and continuous control.

Theoretical Basis

Each classic control environment is governed by specific equations of motion. For example, the CartPole system follows:

$\ddot{x} = \frac{F + m_{p} l ({\dot{θ}}^{2} \sin θ - \ddot{θ} \cos θ)}{m_{c} + m_{p}}$

$\ddot{θ} = \frac{g \sin θ - \cos θ [\frac{F + m_{p} l {\dot{θ}}^{2} \sin θ}{m_{c} + m_{p}}]}{l [\frac{4}{3} - \frac{m_{p} \cos^{2} θ}{m_{c} + m_{p}}]}$

where $x$ is the cart position, $θ$ is the pole angle, $F$ is the applied force, $m_{c}$ and $m_{p}$ are cart and pole masses, $l$ is the pole half-length, and $g$ is gravitational acceleration.
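The two equations can be evaluated with a short pure-Python sketch. It assumes the default constants from Gymnasium's CartPole source: g = 9.8, m_c = 1.0, m_p = 0.1, l = 0.5, a force magnitude of 10, and a time step of 0.02 s. It also assumes the same termination bounds, |x| > 2.4 or |θ| > 12°. The function names `accelerations` and `step` are hypothetical, not Gymnasium's API.

```python
import math

# Defaults assumed from Gymnasium's CartPole-v1 source.
GRAVITY = 9.8                          # g, m/s^2
MASS_CART = 1.0                        # m_c
MASS_POLE = 0.1                        # m_p
HALF_LENGTH = 0.5                      # l, half the pole length
FORCE_MAG = 10.0                       # |F|
TAU = 0.02                             # seconds between state updates
X_LIMIT = 2.4                          # cart position bound
THETA_LIMIT = 12 * 2 * math.pi / 360   # 12 degrees in radians


def accelerations(theta, theta_dot, force):
    """Evaluate the two equations of motion; returns (x_ddot, theta_ddot)."""
    total_mass = MASS_CART + MASS_POLE
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    # Bracketed term shared by both equations: (F + m_p l theta_dot^2 sin theta) / (m_c + m_p)
    temp = (force + MASS_POLE * HALF_LENGTH * theta_dot ** 2 * sin_t) / total_mass
    theta_ddot = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / total_mass))
    # Same as the x_ddot equation with its numerator split into two fractions.
    x_ddot = temp - MASS_POLE * HALF_LENGTH * theta_ddot * cos_t / total_mass
    return x_ddot, theta_ddot


def step(state, action):
    """Explicit Euler step; action 1 pushes right, action 0 pushes left."""
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    x_ddot, theta_ddot = accelerations(theta, theta_dot, force)
    new_state = (x + TAU * x_dot, x_dot + TAU * x_ddot,
                 theta + TAU * theta_dot, theta_dot + TAU * theta_ddot)
    terminated = abs(new_state[0]) > X_LIMIT or abs(new_state[2]) > THETA_LIMIT
    return new_state, terminated


# Usage: always push right, starting from the upright equilibrium.
state, done = (0.0, 0.0, 0.0, 0.0), False
for steps in range(1, 501):
    state, done = step(state, 1)
    if done:
        break
print("terminated:", done, "after", steps, "steps")
```

Pushing right from rest gives the cart a positive velocity and the pole a negative angular velocity. A constant push therefore tips the pole past its angle limit and ends the episode quickly.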

The simulation loop shared by all classic control environments is:

state = initial_state()
for t in range(max_episode_steps):
    action = agent.select_action(state)
    derivatives = compute_dynamics(state, action)
    state = integrate(state, derivatives, dt)   # Euler or RK4
    reward = compute_reward(state, action)
    terminated = check_termination(state)       # always False for Pendulum
    if terminated:
        break
# Leaving the loop without terminating means the episode was truncated by the time limit.
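For MountainCar, the loop fits in a few lines of dependency-free Python. This sketch assumes the constants and update rule from Gymnasium's MountainCar-v0 source: force 0.001, gravity 0.0025, speed limit 0.07, goal position 0.5, start position drawn from [-0.6, -0.4], and a 200-step limit. `run_episode`, `idle`, and `pump` are hypothetical helpers. The sketch shows the sparse reward directly: every step costs -1, and only reaching the goal ends the episode early.

```python
import math
import random

# Constants assumed from Gymnasium's MountainCar-v0 source.
MIN_POS, MAX_POS = -1.2, 0.6
MAX_SPEED = 0.07
GOAL_POS = 0.5
FORCE, GRAVITY = 0.001, 0.0025


def step(state, action):
    """One update; action 0 pushes left, 1 does nothing, 2 pushes right."""
    position, velocity = state
    velocity += (action - 1) * FORCE + math.cos(3 * position) * (-GRAVITY)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position += velocity                      # uses the already-updated velocity
    position = max(MIN_POS, min(MAX_POS, position))
    if position == MIN_POS and velocity < 0:
        velocity = 0.0                        # car stops against the left wall
    terminated = position >= GOAL_POS and velocity >= 0
    return (position, velocity), -1.0, terminated   # sparse reward: -1 per step


def run_episode(policy, max_steps=200, seed=0):
    """The generic loop, with truncation after max_steps."""
    rng = random.Random(seed)
    state = (rng.uniform(-0.6, -0.4), 0.0)    # random start near the valley floor
    total_reward = 0.0
    for t in range(1, max_steps + 1):
        state, reward, terminated = step(state, policy(state))
        total_reward += reward
        if terminated:
            return t, total_reward, True
    return max_steps, total_reward, False     # truncated: goal never reached


def idle(state):
    return 1                                  # never push


def pump(state):
    return 2 if state[1] >= 0 else 0          # push along the current velocity


print(run_episode(idle))
print(run_episode(pump, max_steps=1000))
```

A policy that never pushes stays in the valley and collects -200 over the 200-step limit. Pushing along the current velocity adds energy on every swing until the car climbs out, which is the momentum-building behavior the task is meant to test.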

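Acrobot integrates its equations of motion with a fourth-order Runge-Kutta (RK4) method for greater numerical accuracy. CartPole uses explicit Euler integration by default, and MountainCar and Pendulum use simple semi-implicit Euler updates (velocity first, then position from the new velocity). The Pendulum environment uses a dense reward based on angle, angular velocity, and torque: $r = - (θ^{2} + 0.1 {\dot{θ}}^{2} + 0.001 u^{2})$, where θ is normalized to $[-π, π)$ with θ = 0 upright and u is the applied torque.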
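The Pendulum reward and update can be written out as follows. This sketch assumes the defaults from Gymnasium's Pendulum-v1 source: g = 10.0, m = 1.0, l = 1.0, dt = 0.05, a speed limit of 8, and a torque limit of 2. It computes the reward from the state before the update, using the clipped torque. The function names are hypothetical.

```python
import math

# Defaults assumed from Gymnasium's Pendulum-v1 source.
G, M, L = 10.0, 1.0, 1.0
DT = 0.05
MAX_SPEED, MAX_TORQUE = 8.0, 2.0


def angle_normalize(theta):
    """Wrap an angle to [-pi, pi); theta = 0 is upright."""
    return ((theta + math.pi) % (2 * math.pi)) - math.pi


def reward(theta, theta_dot, u):
    """r = -(theta^2 + 0.1 theta_dot^2 + 0.001 u^2) with wrapped angle and clipped torque."""
    u = max(-MAX_TORQUE, min(MAX_TORQUE, u))
    return -(angle_normalize(theta) ** 2 + 0.1 * theta_dot ** 2 + 0.001 * u ** 2)


def step(theta, theta_dot, u):
    """Semi-implicit Euler: update angular velocity first, then angle with the new velocity."""
    u = max(-MAX_TORQUE, min(MAX_TORQUE, u))
    r = reward(theta, theta_dot, u)          # reward uses the pre-update state
    theta_dot += (3 * G / (2 * L) * math.sin(theta) + 3.0 / (M * L ** 2) * u) * DT
    theta_dot = max(-MAX_SPEED, min(MAX_SPEED, theta_dot))
    theta += theta_dot * DT
    return theta, theta_dot, r


print(reward(0.0, 0.0, 0.0))        # upright, motionless, no torque
print(reward(math.pi, 8.0, 2.0))    # hanging down at full speed with full torque
print(step(0.1, 0.0, 0.0))          # gravity pushes a slightly tilted pendulum further over
```

An upright, motionless pendulum with zero torque earns the maximum reward of 0. The worst case, hanging straight down at full speed with full torque, is about -16.27 per step.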
