

Principle:Openai Openai agents python Computer Use

From Leeroopedia
Knowledge Sources
Domains: Computer_Use, Agent_Architecture
Last Updated: 2026-02-11 00:00 GMT

Overview

A mechanism that lets an agent operate a graphical user interface by observing screenshots and issuing mouse and keyboard actions.

Description

Computer Use allows an LLM agent to control a computer environment by observing screenshots and issuing structured actions such as click, type, scroll, keypress, and drag. The SDK defines an abstract `AsyncComputer` interface that implementations must satisfy. It provides a method for screenshot capture (`screenshot`) and one method per supported action (`click`, `double_click`, `scroll`, `type`, `wait`, `move`, `keypress`, `drag`).
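As an illustration of the interface described above, the following sketch defines a stand-in abstract class with the same method names and an in-memory implementation that records calls instead of driving a real GUI. `SketchComputer` and `RecordingComputer` are hypothetical names, and the parameter lists are assumptions. The SDK's actual `AsyncComputer` may differ in signatures and required properties, so check the installed version before subclassing it.

```python
import abc
import asyncio
import base64


class SketchComputer(abc.ABC):
    """Hypothetical stand-in for AsyncComputer; signatures here are assumptions."""

    @abc.abstractmethod
    async def screenshot(self) -> str: ...

    @abc.abstractmethod
    async def click(self, x: int, y: int, button: str) -> None: ...

    @abc.abstractmethod
    async def double_click(self, x: int, y: int) -> None: ...

    @abc.abstractmethod
    async def scroll(self, x: int, y: int, scroll_x: int, scroll_y: int) -> None: ...

    @abc.abstractmethod
    async def type(self, text: str) -> None: ...

    @abc.abstractmethod
    async def wait(self) -> None: ...

    @abc.abstractmethod
    async def move(self, x: int, y: int) -> None: ...

    @abc.abstractmethod
    async def keypress(self, keys: list[str]) -> None: ...

    @abc.abstractmethod
    async def drag(self, path: list[tuple[int, int]]) -> None: ...


class RecordingComputer(SketchComputer):
    """Records each call in self.log instead of driving a real GUI."""

    def __init__(self) -> None:
        self.log: list[tuple] = []

    async def screenshot(self) -> str:
        self.log.append(("screenshot",))
        # Placeholder bytes, not a real image; a real backend would capture the screen.
        return base64.b64encode(b"fake-image-bytes").decode("ascii")

    async def click(self, x, y, button):
        self.log.append(("click", x, y, button))

    async def double_click(self, x, y):
        self.log.append(("double_click", x, y))

    async def scroll(self, x, y, scroll_x, scroll_y):
        self.log.append(("scroll", x, y, scroll_x, scroll_y))

    async def type(self, text):
        self.log.append(("type", text))

    async def wait(self):
        await asyncio.sleep(0)  # yield control; a real backend might pause briefly
        self.log.append(("wait",))

    async def move(self, x, y):
        self.log.append(("move", x, y))

    async def keypress(self, keys):
        self.log.append(("keypress", tuple(keys)))

    async def drag(self, path):
        self.log.append(("drag", tuple(path)))
```

A recording implementation like this is mainly useful in tests: the agent-facing code can be exercised without a browser or desktop, and the log shows exactly which actions were requested.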

The architecture separates the computer abstraction from the agent via `ComputerTool`, which wraps an `AsyncComputer` instance (or a `ComputerProvider` factory) and exposes it as a tool the agent can call. The model (typically `computer-use-preview`) receives each screenshot as a base64-encoded image and returns structured action commands.
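The base64 hand-off of a screenshot can be sketched as follows. This only illustrates the encoding step; the helper names are hypothetical, and the `data:image/png;base64,` URL form is an assumption for the example rather than a statement of the SDK's wire format.

```python
import base64

# The fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def encode_screenshot(png_bytes: bytes) -> str:
    """Encode raw image bytes as an ASCII base64 string."""
    return base64.b64encode(png_bytes).decode("ascii")


def to_data_url(b64_png: str) -> str:
    """Wrap a base64 string in a data URL (assumed transport form)."""
    return f"data:image/png;base64,{b64_png}"


def decode_screenshot(b64_png: str) -> bytes:
    """Decode a base64 string back to bytes, rejecting non-base64 characters."""
    return base64.b64decode(b64_png, validate=True)
```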

Two lifecycle patterns are supported: singleton (a single `AsyncComputer` instance shared by all runs) and per-request (a `ComputerProvider` factory that creates a computer instance for each run context and disposes of it afterwards). The per-request pattern avoids state leakage between concurrent agent runs.

Usage

Use this principle when building agents that need to interact with graphical interfaces such as web browsers, desktop applications, or any visual environment. Implement the `AsyncComputer` interface for your specific environment (Playwright for browsers, VNC for desktops, etc.). Use `ComputerProvider` for production scenarios requiring isolation between concurrent runs.

Theoretical Basis

Computer Use implements a perception-action loop adapted for GUI interaction:

Pseudo-code Logic:

# Abstract computer-use loop (model.decide is a placeholder for the LLM call)
screenshot = await computer.screenshot()            # initial observation, base64 image
while True:
    actions = await model.decide(screenshot, goal)  # structured actions
    if not actions:                                 # no further actions: goal reached
        break
    for action in actions:
        if action.type == "click":
            await computer.click(action.x, action.y, action.button)
        elif action.type == "type":
            await computer.type(action.text)
        elif action.type == "scroll":
            await computer.scroll(action.x, action.y, action.dx, action.dy)
        # ... other action types
    screenshot = await computer.screenshot()        # observe result for the next turn
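A runnable variant of the loop above is sketched here, with the model replaced by a scripted stand-in so it can run without an API key or a real display. `FakeComputer`, `ScriptedModel`, and `run_loop` are hypothetical names. The empty-list stop signal and the `max_turns` cap are assumptions of this sketch, not behaviour documented for the SDK.

```python
import asyncio
from types import SimpleNamespace as Action


class FakeComputer:
    """Hypothetical computer that records calls instead of touching a GUI."""

    def __init__(self):
        self.calls = []

    async def screenshot(self):
        self.calls.append("screenshot")
        return "ZmFrZQ=="  # base64 of b"fake", standing in for an image

    async def click(self, x, y, button):
        self.calls.append(f"click({x},{y},{button})")

    async def type(self, text):
        self.calls.append(f"type({text})")


class ScriptedModel:
    """Hypothetical stand-in for the LLM: replays fixed batches of actions."""

    def __init__(self, script):
        self._script = list(script)

    async def decide(self, screenshot, goal):
        # An empty list signals that there is nothing left to do.
        return self._script.pop(0) if self._script else []


async def run_loop(computer, model, goal, max_turns=10):
    """Return True if the model finished, False if max_turns was exhausted."""
    screenshot = await computer.screenshot()
    for _ in range(max_turns):
        actions = await model.decide(screenshot, goal)
        if not actions:
            return True
        for action in actions:
            if action.type == "click":
                await computer.click(action.x, action.y, action.button)
            elif action.type == "type":
                await computer.type(action.text)
            else:
                raise ValueError(f"unsupported action: {action.type}")
        screenshot = await computer.screenshot()  # observe the result
    return False
```

The turn cap guards against a model that never stops emitting actions, which matters once the loop drives a real environment.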

The key design decisions are:

  1. Abstract interface: `AsyncComputer` decouples the agent from specific computer implementations.
  2. Provider pattern: `ComputerProvider` enables per-request lifecycle management.
  3. Structured actions: Actions are typed (click, type, scroll, etc.) rather than raw commands.
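To illustrate the third point, the sketch below turns raw action dictionaries into typed objects and rejects anything outside a known set. The class names and field names (`Click`, `TypeText`, `Scroll`, `scroll_x`, `scroll_y`) are assumptions for the example and are not the SDK's own action types.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Click:
    x: int
    y: int
    button: str = "left"


@dataclass(frozen=True)
class TypeText:
    text: str


@dataclass(frozen=True)
class Scroll:
    x: int
    y: int
    scroll_x: int
    scroll_y: int


Action = Union[Click, TypeText, Scroll]

# Only these action kinds are accepted; anything else is rejected.
_ACTION_TYPES = {"click": Click, "type": TypeText, "scroll": Scroll}


def parse_action(raw: dict) -> Action:
    """Convert a raw dict into a typed action, failing loudly on bad input."""
    fields = dict(raw)  # copy so the caller's dict is not mutated
    kind = fields.pop("type", None)
    cls = _ACTION_TYPES.get(kind)
    if cls is None:
        raise ValueError(f"unknown action type: {kind!r}")
    return cls(**fields)  # raises TypeError on missing or unexpected fields
```

Because every action passes through a closed set of types, an unexpected command fails at parse time instead of reaching the computer backend.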
