Implementation:Google deepmind Dm control LQR Solver

From Leeroopedia
Metadata | Value
Implementation | LQR Solver
Domain | Reinforcement_Learning, Control
Source | Google_deepmind_Dm_control
Last Updated | 2026-02-15 04:00 GMT

Overview

Concrete tool for computing the optimal value function and linear policy for LQR environments provided by the dm_control Control Suite.

Description

The LQR Solver module computes the analytical optimal solution for the infinite-horizon discrete-time Linear Quadratic Regulator problem. Given an LQR environment, the solve function extracts the system dynamics matrices (state transition matrix a, control transition matrix b, state cost Hessian q, and control cost Hessian r) from the MuJoCo model parameters (mass matrix, joint stiffness, damping, and timestep).
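The matrix construction described above can be sketched with plain NumPy. This is a minimal illustration, not the module's code: the helper name build_lqr_matrices, the semi-implicit Euler discretization, the position-only state cost, and the (n, m) layout of actuator_moment are all assumptions made for the example.

```python
import numpy as np


def build_lqr_matrices(mass, stiffness, damping, actuator_moment, dt,
                       control_cost_coef):
    """Hypothetical helper: discrete-time LQR matrices for a damped spring system.

    Assumes the state is x = [positions, velocities] and a semi-implicit
    Euler step: v' = v + dt * acc, then q' = q + dt * v'.
    """
    n = mass.shape[0]             # number of degrees of freedom
    m = actuator_moment.shape[1]  # number of controls

    # Acceleration as a linear map of the state: acc = -M^{-1} (K q + D v).
    j = np.linalg.solve(-mass, np.hstack((stiffness, damping)))  # (n, 2n)

    # State transition matrix a, shape (2n, 2n).
    velocity_rows = np.hstack((np.zeros((n, n)), np.eye(n)))
    a = np.eye(2 * n) + dt * np.vstack((dt * j + velocity_rows, j))

    # Control transition matrix b, shape (2n, m).
    bc = np.linalg.solve(mass, actuator_moment)
    b = dt * np.vstack((dt * bc, bc))

    # Assumed costs: penalize positions only, and controls uniformly.
    q = np.diag(np.hstack((np.ones(n), np.zeros(n))))
    r = control_cost_coef * np.eye(m)
    return a, b, q, r


# One mass on one spring, driven by one actuator (made-up numbers).
a, b, q, r = build_lqr_matrices(
    mass=np.array([[1.0]]),
    stiffness=np.array([[2.0]]),
    damping=np.array([[0.5]]),
    actuator_moment=np.array([[1.0]]),
    dt=0.1,
    control_cost_coef=0.1,
)
```

With n degrees of freedom and m controls, a has shape (2n, 2n) and b has shape (2n, m), which matches the output shapes listed in the I/O Contract.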

The function then solves the discrete algebraic Riccati equation (DARE) using scipy.linalg.solve_discrete_are to obtain the optimal cost-to-go Hessian p. From p it derives the optimal linear feedback gain matrix k, so that the optimal control is the matrix-vector product u = k * x. The function also computes beta, the largest eigenvalue magnitude (spectral radius) of the closed-loop transition matrix (a + b * k), which characterizes how fast the controlled system converges. If beta >= 1.0, the controlled system is unstable and a RuntimeError is raised.
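The DARE, gain, and stability steps can be tried on a small hand-written system. The sketch below uses a made-up double integrator for a, b, q, and r rather than matrices taken from an environment. The gain formula k = -(b' p b + r)^-1 b' p a is the standard LQR result and is assumed here to correspond to what the solver computes.

```python
import numpy as np
from scipy import linalg

# Made-up double integrator with timestep 0.1 (illustrative values only).
a = np.array([[1.0, 0.1],
              [0.0, 1.0]])
b = np.array([[0.01],
              [0.1]])
q = np.diag([1.0, 0.0])  # penalize position only
r = 0.1 * np.eye(1)      # control cost

# Solve the discrete algebraic Riccati equation for the cost-to-go Hessian.
p = linalg.solve_discrete_are(a, b, q, r)

# Standard LQR gain with the sign convention u = k @ x.
k = -np.linalg.solve(b.T @ p @ b + r, b.T @ p @ a)

# Spectral radius of the closed-loop matrix; below 1 means the state decays.
beta = np.abs(np.linalg.eigvals(a + b @ k)).max()
if beta >= 1.0:
    raise RuntimeError("Controlled system is unstable.")
```

Taking the absolute value matters because the closed-loop eigenvalues are complex in general, so the largest magnitude is the quantity that decides stability.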

This solver is designed specifically for the procedurally generated LQR environments in the lqr domain module. It provides a ground-truth baseline for evaluating learned control policies.
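One way to use such a baseline is to compare the accumulated cost of a candidate policy with the optimal cost-to-go. The sketch below does this on a made-up linear system instead of a dm_control environment. The function rollout_cost, the per-step cost 0.5 * (x' q x + u' r u), and the halved gain standing in for a learned policy are assumptions for illustration.

```python
import numpy as np
from scipy import linalg


def rollout_cost(a, b, q, r, gain, x0, num_steps):
    """Hypothetical helper: accumulated quadratic cost under u = gain @ x."""
    x = np.array(x0, dtype=float)
    total = 0.0
    for _ in range(num_steps):
        u = gain @ x
        total += 0.5 * (x @ q @ x + u @ r @ u)
        x = a @ x + b @ u
    return total


# Made-up double integrator (illustrative values only).
a = np.array([[1.0, 0.1], [0.0, 1.0]])
b = np.array([[0.01], [0.1]])
q = np.diag([1.0, 0.0])
r = 0.1 * np.eye(1)

p = linalg.solve_discrete_are(a, b, q, r)
k_opt = -np.linalg.solve(b.T @ p @ b + r, b.T @ p @ a)
k_other = 0.5 * k_opt  # stand-in for a learned, suboptimal linear policy

x0 = np.array([1.0, 0.0])
optimal_value = 0.5 * x0 @ p @ x0  # V(x0) = 0.5 * x0' * p * x0
cost_opt = rollout_cost(a, b, q, r, k_opt, x0, num_steps=2000)
cost_other = rollout_cost(a, b, q, r, k_other, x0, num_steps=2000)

print(f"V(x0) = {optimal_value:.6f}")
print(f"optimal rollout cost = {cost_opt:.6f}")
print(f"suboptimal rollout cost = {cost_other:.6f}")
```

Over a long enough horizon the optimal rollout cost should approach V(x0), and the gap between the two rollout costs measures how far the candidate policy is from optimal.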

Usage

Use this implementation to obtain the optimal policy for LQR environments as a reference baseline. Import directly as from dm_control.suite import lqr_solver and call lqr_solver.solve(env).

Code Reference

Source Location

Signature

def solve(env):
    """Returns the optimal value and policy for LQR problem.

    Args:
        env: An instance of control.EnvironmentV2 with LQR level.

    Returns:
        p: Hessian of the optimal cost-to-go, V(x) = .5 * x' * p * x.
        k: Optimal linear policy gain matrix, u = k * x.
        beta: Maximum eigenvalue of (a + b * k), convergence rate.

    Raises:
        RuntimeError: If the controlled system is unstable.
    """

Import

from dm_control.suite import lqr_solver

I/O Contract

Inputs

Name | Type | Required | Description
env | dm_control.rl.control.Environment | Yes | An LQR environment instance (created via the lqr domain).

Outputs

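Name | Type | Description
p | numpy array, shape (2n, 2n) | Hessian of the optimal total cost-to-go: V(x) = 0.5 * x' * p * x, where n is the number of degrees of freedom and x stacks the n positions and n velocities.
k | numpy array, shape (m, 2n) | Optimal linear feedback gain matrix: u = k * x, where m is the number of controls.
beta | float | Largest eigenvalue magnitude of the closed-loop transition matrix (a + b * k); under the optimal policy the state tends to 0 like beta^t after t timesteps.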
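The outputs can be interpreted numerically as follows. This is a small sketch with made-up values for p and beta. The helpers cost_to_go and steps_to_tolerance are hypothetical names, and the step count is only a rough asymptotic estimate, since the decay like a power of beta describes long-run behavior rather than a strict bound on every step.

```python
import math

import numpy as np


def cost_to_go(p, x):
    """Evaluates V(x) = 0.5 * x' * p * x."""
    return 0.5 * float(x @ p @ x)


def steps_to_tolerance(beta, tolerance):
    """Smallest integer t with beta**t <= tolerance, assuming 0 < beta < 1."""
    return math.ceil(math.log(tolerance) / math.log(beta))


# Made-up values standing in for the outputs of the solver.
p = np.array([[2.0, 0.5],
              [0.5, 1.0]])
x = np.array([1.0, -2.0])

value = cost_to_go(p, x)               # 0.5 * (2 - 2 + 4) = 2.0
steps = steps_to_tolerance(0.9, 1e-3)  # 66, since 0.9**66 < 1e-3 < 0.9**65
```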

Exceptions

Exception | Condition
RuntimeError | Raised if the controlled system is unstable (beta >= 1.0).

Usage Examples

from dm_control import suite
from dm_control.suite import lqr_solver
import numpy as np

# Load an LQR environment
env = suite.load(domain_name='lqr', task_name='lqr_2_1')

# Compute the optimal policy
p, k, beta = lqr_solver.solve(env)

print(f"Convergence rate (beta): {beta:.4f}")
print(f"Optimal gain matrix shape: {k.shape}")

# Use the optimal policy
time_step = env.reset()
while not time_step.last():
    state = np.concatenate([
        time_step.observation['position'],
        time_step.observation['velocity']
    ])
    action = k.dot(state)
    time_step = env.step(action)
