Computer Use is the capability that lets an LLM control a desktop environment by looking at screenshots, reasoning about what it sees, and emitting mouse and keyboard actions. Anthropic introduced the term in October 2024, when it released computer use as a public beta with Claude 3.5 Sonnet; OpenAI followed with Operator in January 2025; Google built comparable features into Gemini and Project Mariner. By 2026 Computer Use is a mainstream agent capability with active production deployments.
How it works under the hood:
- The agent runtime takes a screenshot of the controlled environment (a sandboxed Linux desktop, a virtual browser, sometimes the user's actual machine with consent).
- The screenshot is sent to a vision-capable LLM along with the user's goal and the conversation history.
- The LLM emits an action: "click at coordinates (340, 220)", "type 'invoice 2026-04'", "scroll down 500px", "open new tab".
- The runtime executes the action and takes another screenshot.
- Loop until the goal is achieved or the agent gives up.
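The loop above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's API: the `Action` type, `run_agent`, and the three callables (`take_screenshot`, `choose_action`, `execute`) are hypothetical names, and the model call is replaced by a scripted stand-in so the example runs without network access. The `max_steps` cap is an added assumption that covers the "agent gives up" case when the model never stops on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Action:
    """One step chosen by the model. All field names here are hypothetical."""
    kind: str                                   # e.g. "click", "type", "scroll", "done", "give_up"
    args: dict = field(default_factory=dict)    # e.g. {"x": 340, "y": 220}


def run_agent(
    goal: str,
    take_screenshot: Callable[[], bytes],
    choose_action: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 25,
) -> Tuple[str, List[Action]]:
    """Screenshot -> model -> action -> execute, repeated until a stop condition."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()                     # observe the environment
        action = choose_action(goal, screenshot, history)  # vision LLM call in a real system
        if action.kind in ("done", "give_up"):             # the model ends the loop itself
            return action.kind, history
        execute(action)                                    # runtime performs the click or keystroke
        history.append(action)
    return "step_limit", history                           # safety cap reached


if __name__ == "__main__":
    # Scripted stand-in for the model, so the sketch runs offline.
    scripted = iter([
        Action("click", {"x": 340, "y": 220}),
        Action("type", {"text": "invoice 2026-04"}),
        Action("done"),
    ])
    status, steps = run_agent(
        goal="find the April invoice",
        take_screenshot=lambda: b"",
        choose_action=lambda goal, shot, hist: next(scripted),
        execute=lambda action: print("executing", action.kind, action.args),
    )
    print(status, len(steps))
```

Passing the environment and the model in as plain callables keeps the loop itself testable: the same function can drive a sandboxed desktop in production and a scripted fake in unit tests.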
The 2026 use cases that have stabilised:
- Workflow automation in legacy systems — controlling old enterprise software that has no API.
- Form filling — government, vendor, healthcare, expense reports.
- Personal task automation — book travel, manage subscriptions, schedule appointments.
- End-to-end testing — natural-language test scenarios driven through a real browser.
- Customer support escalation — agents that can navigate internal tools the way a human support rep would.
- Research and data collection — sites that block traditional scraping but allow browser sessions.
The hard problems:
- Reliability — visual reasoning is fragile. UI changes, modals, loading states and pop-ups all break naive agents. Production-grade Computer Use needs robust retry, recovery and human handoff.
- Speed — every action requires a vision LLM call; minute-scale workflows are typical. Background execution is more common than real-time.
- Cost — vision tokens are expensive; long sessions of dozens of screenshots add up to dollars.
- Security — an agent that can click anywhere can also click "delete" anywhere. Sandboxing and permission boundaries are non-negotiable.
- Prompt injection — malicious content visible on screen can hijack the agent's behaviour, just as text-based prompt injection does in chat.
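To make the speed and cost points concrete, here is a rough back-of-envelope estimator. Every number in it is an illustrative assumption rather than a real price or measurement: the tokens per screenshot, the price per million input tokens, and the seconds per step are placeholders to be replaced with figures from your own provider and logs. The function name `estimate_session` and the `keep_last` parameter are hypothetical. The sketch also assumes the worst case in which every earlier screenshot stays in the conversation history and is re-sent on each call; `keep_last` models pruning the history to the most recent screenshots.

```python
from typing import Optional, Tuple


def estimate_session(
    steps: int,
    tokens_per_screenshot: int,
    usd_per_million_input_tokens: float,
    seconds_per_step: float,
    keep_last: Optional[int] = None,
) -> Tuple[int, float, float]:
    """Return (total image input tokens, cost in USD, wall-clock seconds).

    Counts screenshot tokens only; text prompts and output tokens are ignored.
    """
    total_tokens = 0
    for step in range(1, steps + 1):
        # Without pruning, the request at step k carries k screenshots.
        shots_in_context = step if keep_last is None else min(step, keep_last)
        total_tokens += shots_in_context * tokens_per_screenshot
    cost_usd = total_tokens / 1_000_000 * usd_per_million_input_tokens
    return total_tokens, cost_usd, steps * seconds_per_step


if __name__ == "__main__":
    # Assumed inputs: 40 steps, 1,500 tokens per screenshot, 3 USD per million tokens, 5 s per step.
    print(estimate_session(40, 1500, 3.0, 5.0))
    print(estimate_session(40, 1500, 3.0, 5.0, keep_last=3))
```

Under these assumed inputs, the unpruned session sends 1,230,000 image tokens (about 3.69 USD) and takes 200 seconds, while keeping only the last three screenshots cuts that to 175,500 tokens (about 0.53 USD). The growth is quadratic in the number of steps when history is never pruned, which is why long sessions reach dollar-scale costs.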
For a US team in 2026, Computer Use is appropriate when there is no better alternative and the task is consequential enough to justify the cost. The order of preference is clear: APIs beat DOM-based browser automation, and DOM-based browser automation beats Computer Use. But for the tasks where APIs and DOM-based automation both fail, Computer Use has earned its place as a real production capability rather than a research curiosity. Mature deployments treat the agent as a sandboxed worker with a checkpoint-and-approval gate before any irreversible action.
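A checkpoint-and-approval gate of the kind described above might look like the following sketch. The names `gated_execute`, `IRREVERSIBLE_KINDS`, and the action kinds in that set are hypothetical, and the design assumes that each action already carries a semantic label such as "delete". A raw click at pixel coordinates has no such label, so a real deployment would need an upstream step that classifies what the click is about to do; that step is assumed here, not shown.

```python
from typing import Callable, Dict, FrozenSet

# Hypothetical labels for actions that cannot be undone once executed.
IRREVERSIBLE_KINDS: FrozenSet[str] = frozenset({"delete", "send_email", "submit_payment"})


def gated_execute(
    action: Dict[str, object],
    execute: Callable[[Dict[str, object]], None],
    approve: Callable[[Dict[str, object]], bool],
    irreversible: FrozenSet[str] = IRREVERSIBLE_KINDS,
) -> str:
    """Run reversible actions directly; pause irreversible ones for approval."""
    if action["kind"] in irreversible and not approve(action):
        return "blocked"      # the human (or policy) declined, so nothing is executed
    execute(action)
    return "executed"


if __name__ == "__main__":
    def ask_human(action: Dict[str, object]) -> bool:
        # Stand-in for a real approval channel (ticket, chat prompt, dashboard).
        print("approval requested for", action)
        return False          # deny by default in this demo

    log = []
    print(gated_execute({"kind": "scroll", "dy": 500}, log.append, ask_human))
    print(gated_execute({"kind": "delete", "target": "invoice"}, log.append, ask_human))
```

Denying by default when no approval arrives is the conservative choice: a blocked action can be retried later, while an executed irreversible one cannot be taken back.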