Gemini 3.5 Flash Gains Native Computer Use via API and Agent Platform
TL;DR
- Computer use is now a native built-in tool in Gemini 3.5 Flash, no longer requiring a separate standalone model.
- Agents can observe screens, then click, type, and navigate across browser, mobile, and desktop environments in a repeating loop.
- Google's two optional enterprise safeguards let organizations gate sensitive actions and auto-stop tasks on detected prompt injection.
Computer use has been in Google's AI lineup before, but only as a specialized, standalone model that developers had to call separately. According to Google's blog, that changes with Gemini 3.5 Flash: computer use is now a built-in tool in the model itself, available through the Gemini API and Gemini Enterprise Agent Platform. The practical shift is that you no longer need a dedicated "computer use model" endpoint, because the capability comes bundled with Flash.
Mechanically, the tool works as a repeating loop: the model observes the screen via screenshots, then generates actions such as clicks, typing, and navigation across browser, mobile, and desktop environments. Google describes the target use cases as "long-horizon and enterprise automation tasks like continuous software testing and knowledge work across professional applications." Browserbase, Browser Use, and UiPath are cited as early partners with integrations already available.
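The observe-then-act loop can be sketched in a few lines of Python. This is a minimal illustration, not Google's API: the names `Action`, `FakeEnvironment`, `scripted_model`, and `run_agent` are all hypothetical, and the "model" here is a stub that replays a fixed script rather than a call to Gemini.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Action:
    """One UI action proposed by the model (hypothetical shape)."""
    kind: str          # e.g. "click", "type", "navigate", or "done"
    target: str = ""   # element, coordinate, or URL the action applies to
    text: str = ""     # text to enter, used by "type" actions


class FakeEnvironment:
    """Stand-in for a browser, mobile, or desktop session (assumption)."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, str, str]] = []

    def screenshot(self) -> bytes:
        # A real environment would return image bytes of the current screen.
        return f"screen-after-{len(self.log)}-actions".encode()

    def apply(self, action: Action) -> None:
        # A real environment would perform the click, keystroke, or navigation.
        self.log.append((action.kind, action.target, action.text))


def scripted_model(script: List[Action]) -> Callable[[str, bytes], Action]:
    """Return a stub 'model' that replays a fixed list of actions."""
    steps = iter(script)

    def propose(goal: str, screenshot: bytes) -> Action:
        # Once the script is exhausted, signal that the task is finished.
        return next(steps, Action("done"))

    return propose


def run_agent(goal: str, env: FakeEnvironment,
              propose: Callable[[str, bytes], Action],
              max_steps: int = 10) -> List[Tuple[str, str, str]]:
    """Observe the screen, ask for the next action, apply it, repeat."""
    for _ in range(max_steps):
        shot = env.screenshot()        # observe
        action = propose(goal, shot)   # decide
        if action.kind == "done":
            break
        env.apply(action)              # act
    return env.log
```

The `max_steps` cap is a design choice worth keeping in any real loop: a long-horizon agent that never emits a stop signal should still terminate.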
The safety architecture deserves a close read. Google applied targeted adversarial training specifically for computer use, and is releasing two optional enterprise safeguards: one that requires explicit user confirmation before sensitive or irreversible actions, and one that automatically stops a task if an indirect prompt injection is detected. The company recommends layering those on top of "secure sandboxing, human-in-the-loop verification and strict access controls" as a defense-in-depth approach.
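To make the two safeguards concrete, here is a rough Python sketch of how an application-side wrapper might enforce similar checks. Everything in it is an assumption for illustration: `SENSITIVE_KINDS`, `looks_injected`, `guarded_execute`, and `TaskStopped` are hypothetical names, and the keyword check is a toy placeholder that says nothing about how Google's detector actually works.

```python
from dataclasses import dataclass
from typing import Callable

# Assumed policy: which action kinds count as sensitive or irreversible.
SENSITIVE_KINDS = {"delete", "purchase", "send"}


@dataclass
class Action:
    """One UI action proposed by the model (hypothetical shape)."""
    kind: str
    target: str = ""


class TaskStopped(Exception):
    """Raised to abort the whole task, not just skip a single action."""


def looks_injected(screen_text: str) -> bool:
    # Toy heuristic for illustration only; a real detector would be
    # far more sophisticated than a single substring match.
    return "ignore previous instructions" in screen_text.lower()


def guarded_execute(action: Action, screen_text: str,
                    confirm: Callable[[Action], bool],
                    execute: Callable[[Action], None]) -> str:
    """Run one action behind an injection check and a confirmation gate."""
    # Safeguard 1: stop the task if on-screen content looks like an
    # indirect prompt injection.
    if looks_injected(screen_text):
        raise TaskStopped("possible indirect prompt injection on screen")
    # Safeguard 2: sensitive actions need explicit user confirmation.
    if action.kind in SENSITIVE_KINDS and not confirm(action):
        return "skipped"
    execute(action)
    return "executed"
```

In this sketch `confirm` would be wired to a human prompt (for example, a dialog asking the user to approve a deletion), while `execute` hands the action to the sandboxed environment. Note the asymmetry: a declined confirmation skips one action, whereas a suspected injection ends the task.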
The honest caveat is that both enterprise safeguards are optional. Organizations that skip them get an agent that can take irreversible actions without a confirmation step, a real exposure in any environment where the model's screen access touches sensitive systems. The announcement provides no pricing or latency benchmarks for computer-use tasks, and no specifics on the false-negative rate of the prompt-injection detection mechanism, which is precisely the failure mode that matters most. Bundling computer use into Flash rather than keeping it behind a specialized endpoint is a meaningful expansion for the developer ecosystem, but each enterprise will have to work out the operational safety questions for itself.
Originally reported by blog.google
Original headline: Google Integrates Computer Use Natively Into Gemini 3.5 Flash via API and Enterprise Agent Platform