Browser Automation

Control a Chromium browser with MyOctopus. Navigate pages, fill forms, take screenshots, and read content. Login sessions persist across app restarts.

How It Works

The Browser Action automation type executes a sequence of browser steps:

Step	Description
Navigate	Go to a URL
Click	Click an element on the page
Fill	Type text into an input field
Read Text	Get text content of a specific element
Read Page	Get all visible text on the page
Screenshot	Capture the page as an image
Wait For	Wait for an element to appear
Wait	Pause for N seconds
Evaluate	Run JavaScript on the page
Close	Close the browser

Persistent Sessions

The browser remembers your login sessions between uses. This means:

WhatsApp Web stays logged in after QR scan
Social media sessions persist
No need to re-authenticate on every use

Screenshot + AI Analysis

Combine screenshots with AI to let Claude see web pages:

"screenshot WhatsApp and list unread messages"

This navigates to WhatsApp Web, takes a screenshot, and sends it to Claude for analysis.

Examples

Command	What happens
"open google.com in the browser"	Launches Chromium, navigates to Google
"fill the search box and click search"	Types into input, clicks button
"read this webpage and summarize it"	Reads page text, sends to Claude
"take a screenshot of the page"	Captures visible viewport

First Use

On first use, MyOctopus installs Chromium automatically. The browser opens visibly (not headless) so you can see what's happening.

Browser Automation ​

How It Works ​

Persistent Sessions ​

Screenshot + AI Analysis ​