
Principle:Farama Foundation Gymnasium Action Transformation

From Leeroopedia
Revision as of 18:08, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Farama_Foundation_Gymnasium_Action_Transformation.md)
Knowledge Sources
Domains: Reinforcement_Learning, Action_Space_Engineering
Last Updated: 2026-02-15 03:00 GMT

Overview

Wrappers that transform, clip, rescale, or discretize actions before forwarding them to the underlying environment enable flexible action space engineering.

Description

Action transformation wrappers modify the agent's actions before they reach the environment's step function. This layer of indirection decouples the action representation seen by the learning algorithm from the action representation expected by the environment. Common transformations include clipping continuous actions to valid bounds, rescaling actions from a normalized range to the environment's native range, discretizing continuous action spaces into finite sets, and applying arbitrary user-defined functions.

The clipping transformation ensures that actions produced by the policy (which may exceed the environment bounds due to Gaussian exploration noise or unbounded policy outputs) are clamped to the valid range before execution. The rescaling transformation maps actions from one bounded range to another, which is useful when the learning algorithm assumes a standard action range (such as [-1, 1]) but the environment expects a different range. Discretization converts a continuous Box action space into a Discrete or MultiDiscrete space, enabling discrete RL algorithms to control continuous environments by selecting from a finite grid of actions.

Additionally, the sticky action wrapper introduces stochastic action repetition, where there is a probability that the previous action is repeated instead of the new one. This was proposed as a way to increase environment stochasticity for Atari games and can also model actuator lag or communication delays. Both single-environment and vectorized versions of action wrappers are provided.

Usage

Use action clipping when the policy may produce out-of-bounds actions and you want to enforce validity without modifying the policy. Use action rescaling when adapting a policy trained with one action range to an environment that expects a different range. Use discretization when applying discrete RL algorithms (DQN, etc.) to continuous control tasks. Use the transform action wrapper for custom action preprocessing (e.g., adding offsets, applying non-linear mappings). Use the sticky action wrapper to increase environment stochasticity or simulate actuator delays.

Theoretical Basis

Action transformation wrappers implement the composition pattern where the modified MDP $\tilde{M}$ has a transformed action space:

$$\tilde{a} = f(a)$$

where $f$ is the transformation function and $a$ is the agent's action. The environment then executes $\tilde{a}$.
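
The composition can be sketched in a few lines of plain Python. This is an illustrative toy, not Gymnasium's wrapper API: `ToyEnv`, `ActionTransform`, and `halve` are hypothetical names introduced only for this example, and the environment is reduced to a single `step` method returning an observation and a reward.

```python
from typing import Callable, Tuple


class ToyEnv:
    """Hypothetical 1-D environment: the state moves by the executed action."""

    def __init__(self) -> None:
        self.state = 0.0

    def step(self, action: float) -> Tuple[float, float]:
        self.state += action
        reward = -abs(self.state)  # staying near the origin is rewarded
        return self.state, reward


class ActionTransform:
    """Hypothetical wrapper: applies f to every action before calling step()."""

    def __init__(self, env: ToyEnv, f: Callable[[float], float]) -> None:
        self.env = env
        self.f = f

    def step(self, action: float) -> Tuple[float, float]:
        transformed = self.f(action)  # a~ = f(a)
        return self.env.step(transformed)  # the environment only ever sees a~


def halve(a: float) -> float:
    """Example transformation f: scale the agent's action by 0.5."""
    return 0.5 * a


wrapped = ActionTransform(ToyEnv(), halve)
obs, reward = wrapped.step(4.0)  # the agent sends 4.0, the environment executes 2.0
```

The agent and the environment never need to agree on a representation; only `f` has to know both sides.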

Clipping:

$$f_{\text{clip}}(a) = \operatorname{clip}(a, a_{\text{low}}, a_{\text{high}}) = \min(\max(a, a_{\text{low}}), a_{\text{high}})$$
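
As a minimal sketch of the clipping formula, the function below applies min(max(a, low), high) to each dimension of an action given as a plain list. The name `clip_action` and the list-based representation are assumptions made for illustration; a real wrapper would typically operate on arrays.

```python
from typing import List, Sequence


def clip_action(
    action: Sequence[float], low: Sequence[float], high: Sequence[float]
) -> List[float]:
    """Clamp each action component to its [low, high] interval."""
    if not (len(action) == len(low) == len(high)):
        raise ValueError("action, low and high must have the same length")
    return [min(max(a, lo), hi) for a, lo, hi in zip(action, low, high)]


# A policy output that overshoots the bounds in the first and third dimension.
clipped = clip_action(
    [1.7, -0.3, -2.5],
    low=[-1.0, -1.0, -1.0],
    high=[1.0, 1.0, 1.0],
)
# clipped == [1.0, -0.3, -1.0]
```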

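Rescaling from $[a_{\text{low}}, a_{\text{high}}]$ to $[b_{\text{low}}, b_{\text{high}}]$:

$$f_{\text{rescale}}(a) = b_{\text{low}} + (b_{\text{high}} - b_{\text{low}}) \, \frac{a - a_{\text{low}}}{a_{\text{high}} - a_{\text{low}}}$$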
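
The rescaling map is an affine transformation per dimension, which the following sketch applies to a list-valued action. `rescale_action` and its parameter names are hypothetical, chosen to mirror the source range [a_low, a_high] and the target range [b_low, b_high] in the formula.

```python
from typing import List, Sequence


def rescale_action(
    action: Sequence[float],
    src_low: Sequence[float],
    src_high: Sequence[float],
    dst_low: Sequence[float],
    dst_high: Sequence[float],
) -> List[float]:
    """Map each component affinely from [src_low, src_high] to [dst_low, dst_high]."""
    out = []
    for a, sl, sh, dl, dh in zip(action, src_low, src_high, dst_low, dst_high):
        if sh <= sl:
            raise ValueError("source range must have positive width")
        out.append(dl + (dh - dl) * (a - sl) / (sh - sl))
    return out


# Policy acts in [-1, 1]; the environment expects [-2, 2] and [0, 10].
rescaled = rescale_action(
    [0.5, 0.0],
    src_low=[-1.0, -1.0],
    src_high=[1.0, 1.0],
    dst_low=[-2.0, 0.0],
    dst_high=[2.0, 10.0],
)
# rescaled == [1.0, 5.0]
```

The endpoints map to the endpoints: an input of -1.0 in the first dimension yields -2.0 and an input of 1.0 yields 2.0.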

Discretization of a d-dimensional continuous space into n bins per dimension:

$$f_{\text{discretize}}(i) = a_{\text{low}} + \frac{i}{n-1}\,(a_{\text{high}} - a_{\text{low}}), \quad i \in \{0, 1, \ldots, n-1\}$$
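
For a single dimension, the formula places n evenly spaced actions on the interval, endpoints included, so it needs n ≥ 2. A small sketch, with `bin_to_action` as a hypothetical helper name:

```python
def bin_to_action(i: int, n: int, low: float, high: float) -> float:
    """Return the continuous action for bin index i on an n-point grid over [low, high]."""
    if n < 2:
        raise ValueError("the grid formula divides by n - 1, so n must be at least 2")
    if not 0 <= i < n:
        raise IndexError("bin index out of range")
    return low + i / (n - 1) * (high - low)


# Five bins over [-2, 2] give a grid with spacing 1.
grid = [bin_to_action(i, 5, -2.0, 2.0) for i in range(5)]
# grid == [-2.0, -1.0, 0.0, 1.0, 2.0]
```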

Sticky action with repeat probability p:

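def action(self, action):
    # Sketch: assumes self.last_action starts as None and self.np_random is a NumPy Generator.
    if self.last_action is not None and self.np_random.uniform() < self.repeat_probability:
        return self.last_action    # repeat previous action
    else:
        self.last_action = action  # remember the action that is actually executed
        return action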
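
To see the effect of the repeat probability, the sticky rule can be simulated outside any environment. The sketch below is a standalone illustration using Python's `random` module rather than an environment's random generator; `StickyActionFilter` and `repeat_fraction` are hypothetical names, and the first action is never repeated because no previous action exists yet.

```python
import random


class StickyActionFilter:
    """Hypothetical standalone version of the sticky-action rule."""

    def __init__(self, repeat_probability: float, seed: int = 0) -> None:
        if not 0.0 <= repeat_probability <= 1.0:
            raise ValueError("repeat_probability must lie in [0, 1]")
        self.repeat_probability = repeat_probability
        self.rng = random.Random(seed)
        self.last_action = None

    def action(self, action):
        # Repeat the previous action with probability p, once one exists.
        if self.last_action is not None and self.rng.random() < self.repeat_probability:
            return self.last_action
        self.last_action = action
        return action


def repeat_fraction(p: float, steps: int = 10_000, seed: int = 0) -> float:
    """Fraction of steps on which the executed action differs from the requested one."""
    sticky = StickyActionFilter(p, seed)
    repeats = 0
    for t in range(steps):
        executed = sticky.action(t)  # every requested action is unique
        if executed != t:
            repeats += 1
    return repeats / steps
```

With p = 0 the filter is the identity; with p = 1 every step after the first re-executes the first action; for intermediate values the observed repeat fraction should be close to p over a long run.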

The discretized action space has $n^d$ total actions for a $d$-dimensional space with $n$ bins per dimension, or $\prod_{i=1}^{d} n_i$ for per-dimension bin counts $n_i$. The wrapper converts a single integer index back to the corresponding continuous action via multi-dimensional unraveling.
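
The unraveling step can be sketched as a mixed-radix decoding of the flat index, followed by the per-dimension grid formula. The row-major ordering used here (last dimension varies fastest) is an assumption for illustration; the text does not state which ordering the wrapper uses. `unravel_index` and `index_to_action` are hypothetical helper names.

```python
from math import prod
from typing import List, Sequence


def unravel_index(index: int, bins: Sequence[int]) -> List[int]:
    """Decode a flat index into per-dimension bin indices (row-major order assumed)."""
    if not 0 <= index < prod(bins):
        raise IndexError("flat index out of range")
    coords = []
    for n in reversed(bins):  # peel off the fastest-varying dimension first
        index, remainder = divmod(index, n)
        coords.append(remainder)
    return coords[::-1]


def index_to_action(
    index: int, bins: Sequence[int], low: Sequence[float], high: Sequence[float]
) -> List[float]:
    """Map a flat discrete action to a continuous action on the per-dimension grids."""
    coords = unravel_index(index, bins)
    return [
        lo + i / (n - 1) * (hi - lo)  # same grid formula as in one dimension, n >= 2
        for i, n, lo, hi in zip(coords, bins, low, high)
    ]


bins = [3, 5]                  # 3 * 5 = 15 discrete actions in total
low, high = [-1.0, -2.0], [1.0, 2.0]
first = index_to_action(0, bins, low, high)    # [-1.0, -2.0]
middle = index_to_action(7, bins, low, high)   # bin indices [1, 2] -> [0.0, 0.0]
last = index_to_action(14, bins, low, high)    # [1.0, 2.0]
```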
