Implementation:Google deepmind Dm control LQR Solver

Metadata	Value
Implementation	LQR Solver
Domain	Reinforcement_Learning, Control
Source	Google_deepmind_Dm_control
Last Updated	2026-02-15 04:00 GMT

Overview

A concrete tool for computing the optimal value function and linear feedback policy of the LQR environments in the dm_control Control Suite.

Description

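The LQR Solver module computes the analytical optimal solution of the infinite-horizon, discrete-time Linear Quadratic Regulator (LQR) problem. Given an LQR environment, the `solve` function builds the problem matrices from the MuJoCo model: the state transition matrix `a` (from the mass matrix, joint stiffness, joint damping, and timestep), the control transition matrix `b` (from the mass matrix, actuator moments, and timestep), the state cost Hessian `q`, and the control cost Hessian `r` (scaled by the task's control cost coefficient). The state vector stacks joint positions and velocities, so for n degrees of freedom it has 2n entries.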
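As a rough illustration of how such matrices can be assembled, the sketch below builds `a`, `b`, `q`, and `r` for a linear spring-damper system. It is a minimal sketch under stated assumptions, not the module's code: the helper name `lqr_matrices`, the toy parameter values, the semi-implicit Euler discretization (velocity updated first, then position updated with the new velocity), and a state cost that penalizes positions only are all assumptions made here for illustration.

```python
import numpy as np


def lqr_matrices(mass, stiffness, damping, moment, dt, control_cost_coef):
    """Hypothetical helper: discrete-time LQR matrices (a, b, q, r).

    Assumes the state x = [qpos; qvel] (length 2n), continuous dynamics
    mass @ qacc = -stiffness @ qpos - damping @ qvel + moment @ u,
    and a semi-implicit Euler step (assumption).
    """
    n = mass.shape[0]
    m = moment.shape[1]
    # j maps [qpos; qvel] to the uncontrolled acceleration: qacc = j @ x.
    j = np.linalg.solve(-mass, np.hstack((stiffness, damping)))
    # Velocity row: qvel' = qvel + dt * qacc.
    # Position row: qpos' = qpos + dt * qvel' = qpos + dt * qvel + dt**2 * qacc.
    a = np.eye(2 * n) + dt * np.vstack(
        (dt * j + np.hstack((np.zeros((n, n)), np.eye(n))), j))
    # Control enters through the acceleration: qacc += inv(mass) @ moment @ u.
    bc = np.linalg.solve(mass, moment)
    b = dt * np.vstack((dt * bc, bc))
    # Assumed state cost: positions penalized, velocities free.
    q = np.diag(np.hstack((np.ones(n), np.zeros(n))))
    r = control_cost_coef * np.eye(m)
    return a, b, q, r


# Illustrative single-joint values (assumption), not taken from the Control Suite.
a, b, q, r = lqr_matrices(
    mass=np.array([[1.0]]),
    stiffness=np.array([[1.0]]),
    damping=np.array([[0.1]]),
    moment=np.array([[1.0]]),
    dt=0.1,
    control_cost_coef=0.1,
)
# a is approximately [[0.99, 0.099], [-0.1, 0.99]]; b is approximately [[0.01], [0.1]].
```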

The function then solves the discrete algebraic Riccati equation (DARE) with `scipy.linalg.solve_discrete_are` to obtain `p`, the Hessian of the optimal cost-to-go. From `p` it derives the optimal linear feedback gain matrix `k`, so that the optimal control is u = k * x. Finally it computes `beta`, the largest eigenvalue magnitude (spectral radius) of the closed-loop transition matrix (a + b * k), which sets the convergence rate of the controlled system: under the optimal policy the state decays roughly like beta^n after n timesteps. If beta >= 1.0, the controlled system is unstable and a `RuntimeError` is raised.
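The solve step can be sketched with SciPy as follows. The helper `solve_lqr` and the toy matrices are hypothetical; the gain formula is the standard one for the DARE solution, with a leading minus sign so that the control reads u = k * x as in the text.

```python
import numpy as np
from scipy import linalg


def solve_lqr(a, b, q, r):
    """Hypothetical helper: infinite-horizon discrete-time LQR via the DARE.

    Returns (p, k, beta) with V(x) = 0.5 * x' p x, u = k x, and beta the
    spectral radius of the closed-loop matrix a + b k.
    """
    # p solves a' p a - p - a' p b (r + b' p b)^-1 b' p a + q = 0.
    p = linalg.solve_discrete_are(a, b, q, r)
    # Optimal gain; the minus sign makes the policy u = k x rather than u = -k x.
    k = -np.linalg.solve(b.T @ p @ b + r, b.T @ p @ a)
    # Largest eigenvalue magnitude of the closed loop.
    beta = np.abs(np.linalg.eigvals(a + b @ k)).max()
    if beta >= 1.0:
        raise RuntimeError('Controlled system is unstable.')
    return p, k, beta


# Illustrative toy system (assumption): one joint, state [qpos, qvel].
a = np.array([[0.99, 0.099], [-0.1, 0.99]])
b = np.array([[0.01], [0.1]])
q = np.diag([1.0, 0.0])
r = np.array([[0.1]])

p, k, beta = solve_lqr(a, b, q, r)
print(k.shape, beta)
```

Computing `beta` from the absolute values of the eigenvalues matters: the closed-loop eigenvalues are generally complex, and it is their magnitude that decides whether the state shrinks.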

This solver is designed specifically for the procedurally generated LQR environments in the lqr domain module. It provides a ground-truth baseline for evaluating learned control policies.
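One way to use the solution as a baseline is to compute the exact cost of another stabilizing linear policy and compare it with the optimal value. The sketch below does this with a discrete Lyapunov equation. The helper `policy_cost_hessian`, the toy system, and the halved gain standing in for a learned policy are illustrative assumptions, and the per-step cost 0.5 * (x'qx + u'ru) is assumed to match the V(x) = 0.5 * x' * p * x convention.

```python
import numpy as np
from scipy import linalg


def policy_cost_hessian(a, b, q, r, k):
    """Hypothetical helper: cost-to-go Hessian of a stabilizing linear policy u = k x.

    Solves p_k = (q + k' r k) + (a + b k)' p_k (a + b k), so that
    0.5 * sum_t (x_t' q x_t + u_t' r u_t) equals 0.5 * x0' p_k x0.
    """
    a_cl = a + b @ k
    # solve_discrete_lyapunov(A, Q) solves A X A^H - X + Q = 0; pass A = a_cl'.
    return linalg.solve_discrete_lyapunov(a_cl.T, q + k.T @ r @ k)


# Illustrative toy system (assumption), not taken from the Control Suite.
a = np.array([[0.99, 0.099], [-0.1, 0.99]])
b = np.array([[0.01], [0.1]])
q = np.diag([1.0, 0.0])
r = np.array([[0.1]])

p = linalg.solve_discrete_are(a, b, q, r)
k_opt = -np.linalg.solve(b.T @ p @ b + r, b.T @ p @ a)
k_learned = 0.5 * k_opt  # hypothetical stand-in for a learned, suboptimal gain

x0 = np.array([1.0, 0.0])
optimal_cost = 0.5 * x0 @ p @ x0
learned_cost = 0.5 * x0 @ policy_cost_hessian(a, b, q, r, k_learned) @ x0
print(learned_cost / optimal_cost)  # greater than 1 for this suboptimal gain
```

Evaluating the policy in closed form avoids the sampling noise of a finite rollout, which makes the gap to the optimum easy to read off.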

Usage

Use this implementation to obtain the optimal policy of an LQR environment as a reference baseline. Import it with `from dm_control.suite import lqr_solver` and call `lqr_solver.solve(env)`.

Code Reference

Source Location

Repository: Google_deepmind_Dm_control
File: dm_control/suite/lqr_solver.py
Lines: 1-81

Signature

def solve(env):
    """Returns the optimal value and policy for LQR problem.

    Args:
        env: An instance of control.EnvironmentV2 with LQR level.

    Returns:
        p: Hessian of the optimal cost-to-go, V(x) = .5 * x' * p * x.
        k: Optimal linear policy gain matrix, u = k * x.
        beta: Maximum eigenvalue of (a + b * k), convergence rate.

    Raises:
        RuntimeError: If the controlled system is unstable.
    """

Import

from dm_control.suite import lqr_solver

I/O Contract

Inputs

Name	Type	Required	Description
`env`	`dm_control.rl.control.Environment`	Yes	An LQR environment instance (created via the `lqr` domain).

Outputs

Name	Type	Description
`p`	numpy array (2n, 2n)	Hessian of the optimal total cost-to-go: `V(x) = 0.5 * x' * p * x`.
`k`	numpy array (m, 2n)	Optimal linear feedback gain matrix: `u = k * x`.
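`beta`	float	Largest eigenvalue magnitude (spectral radius) of the closed-loop transition matrix `a + b * k`; under the optimal policy the state converges to 0 like `beta^n` after n timesteps.

Here n is the number of degrees of freedom and m the number of actuators; the state x stacks joint positions and velocities.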
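A quick numerical check of what these outputs mean: rolling out u = k * x on a toy discrete system should accumulate a total cost equal to V(x0) = 0.5 * x0' * p * x0, while the state shrinks at a rate governed by `beta`. The matrices are illustrative assumptions, not taken from the suite, and the per-step cost 0.5 * (x'qx + u'ru) is assumed to match the 0.5 factor in V(x).

```python
import numpy as np
from scipy import linalg

# Illustrative toy system (assumption): one joint, state [qpos, qvel].
a = np.array([[0.99, 0.099], [-0.1, 0.99]])
b = np.array([[0.01], [0.1]])
q = np.diag([1.0, 0.0])
r = np.array([[0.1]])

p = linalg.solve_discrete_are(a, b, q, r)
k = -np.linalg.solve(b.T @ p @ b + r, b.T @ p @ a)
beta = np.abs(np.linalg.eigvals(a + b @ k)).max()

x0 = np.array([1.0, 0.0])
x = x0.copy()
total_cost = 0.0
for _ in range(2000):
    u = k @ x                          # optimal action, shape (m,)
    total_cost += 0.5 * (x @ q @ x + u @ r @ u)
    x = a @ x + b @ u                  # one step of the discrete dynamics

predicted = 0.5 * x0 @ p @ x0          # V(x0) = 0.5 * x0' p x0
print(total_cost, predicted)           # the two values agree closely
print(np.linalg.norm(x))               # decays roughly like beta**n; negligible here
```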

Exceptions

Exception	Condition
`RuntimeError`	Raised if the controlled system is unstable (`beta >= 1.0`).

Usage Examples

from dm_control import suite
from dm_control.suite import lqr_solver
import numpy as np

# Load an LQR environment
env = suite.load(domain_name='lqr', task_name='lqr_2_1')

# Compute the optimal policy
p, k, beta = lqr_solver.solve(env)

print(f"Convergence rate (beta): {beta:.4f}")
print(f"Optimal gain matrix shape: {k.shape}")

# Roll out the optimal policy: the solver's state vector is
# x = [position; velocity], and the optimal action is u = k x.
time_step = env.reset()
while not time_step.last():
    state = np.concatenate([
        time_step.observation['position'],
        time_step.observation['velocity']
    ])
    action = k.dot(state)
    time_step = env.step(action)
