

Implementation:Farama Foundation Gymnasium Transform Reward Wrappers

From Leeroopedia
Revision as of 12:37, 16 February 2026 by Admin (auto-imported from implementations/Farama_Foundation_Gymnasium_Transform_Reward_Wrappers.md)
Knowledge Sources
Domains: Reinforcement_Learning, Wrappers
Last Updated: 2026-02-15 03:00 GMT

Overview

A collection of reward transformation wrappers that modify the reward returned by the environment: TransformReward applies a user-defined function to it, and ClipReward clips it to fixed bounds.

Description

This module provides two reward wrappers:

  • TransformReward -- A general-purpose reward wrapper that applies a user-provided callable to the reward returned by the environment's step function. The callable takes a reward value (SupportsFloat) and returns the transformed reward. ClipReward is implemented as a subclass of it.
  • ClipReward -- A subclass of TransformReward that clips rewards between a minimum and maximum bound using np.clip. At least one of min_reward or max_reward must be provided, and min_reward must be less than or equal to max_reward.
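To make the relationship between the two wrappers concrete, here is a minimal, dependency-light sketch of the mechanism. It is not Gymnasium's source: ConstantRewardEnv, SketchTransformReward and SketchClipReward are hypothetical names, the toy environment simply returns a reward of 1.0 on every step (like CartPole's per-step reward), and the bound checks described in the list above are left out of the clipping class for brevity. In Gymnasium itself the per-reward hook is the reward method of gym.RewardWrapper, which the sketch imitates.

```python
from typing import Any, Callable, Optional, SupportsFloat

import numpy as np


class ConstantRewardEnv:
    """Hypothetical toy env: every step returns a reward of 1.0."""

    def reset(self, seed: Optional[int] = None):
        return 0, {}

    def step(self, action: int):
        # (observation, reward, terminated, truncated, info), as in Gymnasium's step API
        return 0, 1.0, False, False, {}


class SketchTransformReward:
    """Sketch of TransformReward: apply func to the reward, pass everything else through."""

    def __init__(self, env: Any, func: Callable[[SupportsFloat], SupportsFloat]):
        self.env = env
        self.func = func

    def reset(self, seed: Optional[int] = None):
        return self.env.reset(seed=seed)

    def reward(self, reward: SupportsFloat) -> SupportsFloat:
        # The single hook a reward wrapper customizes.
        return self.func(reward)

    def step(self, action: int):
        obs, reward, terminated, truncated, info = self.env.step(action)
        return obs, self.reward(reward), terminated, truncated, info


class SketchClipReward(SketchTransformReward):
    """Sketch of ClipReward: a TransformReward whose func is np.clip with fixed bounds.

    Bound validation is omitted here for brevity.
    """

    def __init__(self, env: Any, min_reward=None, max_reward=None):
        super().__init__(env, lambda r: np.clip(r, min_reward, max_reward))


env = SketchTransformReward(ConstantRewardEnv(), lambda r: 2 * r + 1)
env.reset()
_, rew, _, _, _ = env.step(0)
print(rew)  # 3.0

env = SketchClipReward(ConstantRewardEnv(), 0, 0.5)
env.reset()
_, rew, _, _, _ = env.step(1)
print(float(rew))  # 0.5
```

Because the subclass only supplies a different func, any other fixed transformation could be packaged the same way.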

Both wrappers have vector versions available in gymnasium.wrappers.vector.
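In a vectorized environment one step yields a reward per sub-environment, so a reward function has to work on an array rather than a single number. The sketch below assumes, without constructing the vector wrappers themselves, that the callable receives the whole reward array at once; the rewards and the helper names scale_and_shift and clip_batch are hypothetical. It only illustrates that functions written with NumPy operations apply elementwise to such a batch.

```python
import numpy as np

# Hypothetical rewards from one vectorized step of 4 sub-environments.
rewards = np.array([-3.0, 0.2, 1.0, 7.5])


def scale_and_shift(r: np.ndarray) -> np.ndarray:
    """Elementwise transform written with NumPy arithmetic, so it handles a whole batch."""
    return 0.1 * r + 1.0


def clip_batch(r: np.ndarray) -> np.ndarray:
    """np.clip is elementwise too, so every sub-environment gets the same bounds."""
    return np.clip(r, -1.0, 1.0)


print(scale_and_shift(rewards))
print(clip_batch(rewards))
```

A function that branches with a Python if on the reward, or calls math.log on it, would not work elementwise on an array, so batch-friendly reward functions are usually written with NumPy operations.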

Usage

Use TransformReward for custom reward shaping (e.g., scaling, shifting, or non-linear transformations). Use ClipReward to bound rewards to a fixed range, which can help stabilize training when reward magnitudes vary significantly.
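The scaling, shifting and non-linear transformations mentioned above can each be written as an ordinary function and passed as func. The four below are illustrative only: the names and constants (a 1/100 scale, a 0.01 per-step penalty) are hypothetical choices, not Gymnasium defaults.

```python
import math
from typing import SupportsFloat


def scale(r: SupportsFloat) -> float:
    """Shrink rewards by a hypothetical factor of 100."""
    return float(r) / 100.0


def shift(r: SupportsFloat) -> float:
    """Subtract a hypothetical constant per-step penalty of 0.01."""
    return float(r) - 0.01


def sign(r: SupportsFloat) -> float:
    """Keep only the sign of the reward: -1.0, 0.0 or +1.0."""
    r = float(r)
    return float((r > 0) - (r < 0))


def symlog(r: SupportsFloat) -> float:
    """Signed log compression: sign(r) * log(1 + |r|), which damps large magnitudes."""
    r = float(r)
    return math.copysign(math.log1p(abs(r)), r)
```

Any of these is passed the same way as the lambda in the usage examples, for instance TransformReward(env, symlog). Converting with float() first keeps each function usable with any SupportsFloat reward, at the cost of not handling arrays.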

Code Reference

Source Location

Signature

class TransformReward(gym.RewardWrapper[ObsType, ActType], gym.utils.RecordConstructorArgs):
    def __init__(self, env: gym.Env[ObsType, ActType], func: Callable[[SupportsFloat], SupportsFloat]): ...

class ClipReward(TransformReward[ObsType, ActType], gym.utils.RecordConstructorArgs):
    def __init__(self, env: gym.Env[ObsType, ActType], min_reward: float | np.ndarray | None = None, max_reward: float | np.ndarray | None = None): ...

Import

from gymnasium.wrappers import TransformReward, ClipReward

I/O Contract

Inputs

Name        Type                     Required               Description
env         Env                      Yes                    The environment to wrap
func        Callable                 Yes (TransformReward)  Function to apply to the reward
min_reward  float, ndarray, or None  No (ClipReward)        Lower bound for reward clipping
max_reward  float, ndarray, or None  No (ClipReward)        Upper bound for reward clipping
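Although the table lists each bound as optional on its own, ClipReward requires at least one of them, and min_reward must not exceed max_reward. The hypothetical helper check_clip_bounds below restates those rules as a sketch; it raises ValueError rather than Gymnasium's own exception type. It also shows how a None bound leaves that side unclipped and how array bounds are compared element by element.

```python
import numpy as np


def check_clip_bounds(min_reward=None, max_reward=None) -> None:
    """Hypothetical helper mirroring the ClipReward bound rules."""
    if min_reward is None and max_reward is None:
        raise ValueError("at least one of min_reward or max_reward must be given")
    if min_reward is not None and max_reward is not None:
        # np.any makes the same check work for scalar and array bounds.
        if np.any(np.asarray(max_reward) - np.asarray(min_reward) < 0):
            raise ValueError(f"min_reward ({min_reward}) must be <= max_reward ({max_reward})")


# Lower bound only: values below 0.0 are raised to 0.0, larger values pass through.
check_clip_bounds(min_reward=0.0)
print(np.clip(-2.0, 0.0, None), np.clip(5.0, 0.0, None))

# Upper bound only.
check_clip_bounds(max_reward=1.0)
print(np.clip(5.0, None, 1.0))

# Array bounds are valid as long as every element satisfies min <= max.
check_clip_bounds(np.array([0.0, -1.0]), np.array([1.0, 1.0]))
```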

Outputs

Name    Type           Description
reward  SupportsFloat  The transformed or clipped reward value
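The output is typed only as SupportsFloat because its concrete type depends on the function applied. The short sketch below reproduces, outside any environment, what happens to a +1.0 step reward: np.clip (which ClipReward uses) returns a NumPy scalar, while a plain Python lambda on a Python float returns a Python float. The variable names are illustrative.

```python
import numpy as np

# What ClipReward(env, 0, 0.5) does to a +1.0 step reward.
clipped = np.clip(1.0, 0, 0.5)
print(type(clipped).__name__)      # float64
print(isinstance(clipped, float))  # True: np.float64 subclasses Python's float
print(float(clipped))              # 0.5

# A plain Python function applied to a Python float keeps it a Python float.
transformed = (lambda r: 2 * r + 1)(1.0)
print(type(transformed).__name__)  # float
```

Since np.float64 is a subclass of float, both kinds of result compare and add like ordinary numbers; calling float() is only needed where an exact Python type matters, for example when serializing.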

Usage Examples

import gymnasium as gym
from gymnasium.wrappers import TransformReward, ClipReward

# TransformReward: double the reward and add 1
env = gym.make("CartPole-v1")
env = TransformReward(env, lambda r: 2 * r + 1)
_ = env.reset()
_, rew, _, _, _ = env.step(0)
# CartPole returns +1.0 per step, so rew == 2 * 1.0 + 1 == 3.0

# ClipReward: clip rewards between 0 and 0.5
env = gym.make("CartPole-v1")
env = ClipReward(env, 0, 0.5)
_ = env.reset()
_, rew, _, _, _ = env.step(1)
# the +1.0 step reward is clipped to the upper bound: rew == 0.5 (returned as np.float64)
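When both wrappers are stacked, the order matters: the outer wrapper's step calls the inner wrapper's step and then applies its own function to the reward it receives, so the innermost transformation runs first. This sketch composes the two functions from the examples directly instead of building environments; the helper names double_plus_one and clip_to_half are hypothetical.

```python
import numpy as np


def double_plus_one(r):
    """The func from the TransformReward example."""
    return 2 * r + 1


def clip_to_half(r):
    """What ClipReward(env, 0, 0.5) applies to each reward."""
    return np.clip(r, 0, 0.5)


raw = 1.0  # CartPole's per-step reward

# ClipReward(TransformReward(env, double_plus_one), 0, 0.5):
# the inner wrapper transforms first, then the outer one clips the result.
clip_outer = clip_to_half(double_plus_one(raw))

# TransformReward(ClipReward(env, 0, 0.5), double_plus_one):
# the reward is clipped first, then transformed.
transform_outer = double_plus_one(clip_to_half(raw))

print(float(clip_outer), float(transform_outer))  # 0.5 2.0
```

If the goal is a guaranteed final range, ClipReward should be the outermost of the two wrappers.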
