Browser Automation with Swarms

This example shows how to use browser automation within the Swarms framework. The BrowserUseAgent class wraps the browser_use library to handle web scraping, navigation, and automated browser interactions, and a small wrapper function exposes it to a Swarms Agent as a callable tool. This is particularly useful for tasks that require real-time web data extraction, form filling, or web application testing.

Install

pip3 install -U swarms browser-use python-dotenv langchain-openai

Environment Variables

# OpenAI API Key (Required for LLM functionality)
OPENAI_API_KEY="your_openai_api_key_here"
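
Both the browser agent and the Swarms agent in this example call OpenAI models, so a missing key may only surface once the agent is already running. A minimal sketch of an early check, using only the standard library; the helper name `require_env` is hypothetical and is not part of swarms or browser_use:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration; not part of swarms or browser_use.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return value
```

Calling `require_env("OPENAI_API_KEY")` right after `load_dotenv()` in the main code would stop the script with a readable message before any browser session is started.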

Main Code

import asyncio

from browser_use import Agent as BrowserAgent
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI

from swarms import Agent

load_dotenv()


class BrowserUseAgent:
    def __init__(
        self,
        agent_name: str = "BrowserAgent",
        agent_description: str = "A browser agent that can navigate the web and perform tasks.",
    ):
        """
        Initialize a BrowserUseAgent with a name and description.

        Args:
            agent_name (str): The name of the browser agent.
            agent_description (str): A short description of what the agent does.
        """
        self.agent_name = agent_name
        self.agent_description = agent_description

    async def browser_agent_test(self, task: str):
        """
        Asynchronously run the browser agent on a given task.

        Args:
            task (str): The task prompt for the agent.

        Returns:
            str: The result of the agent's run, serialized as an indented JSON string.
        """
        agent = BrowserAgent(
            task=task,
            llm=ChatOpenAI(model="gpt-4.1"),
        )
        result = await agent.run()
        return result.model_dump_json(indent=4)

    def run(self, task: str):
        """
        Run the browser agent synchronously on a given task.

        Args:
            task (str): The task prompt for the agent.

        Returns:
            str: The result of the agent's run, serialized as an indented JSON string.
        """
        return asyncio.run(self.browser_agent_test(task))



def browser_agent_tool(task: str):
    """
    Executes a browser automation agent as a callable tool.

    This function instantiates a `BrowserUseAgent` and runs it synchronously on the provided task prompt.
    The agent will use a language model to interpret the task, control a browser, and return the results
    as a JSON-formatted string.

    Args:
        task (str): 
            A detailed instruction or prompt describing the browser-based task to perform.
            For example, you can instruct the agent to navigate to a website, extract information,
            or interact with web elements.

    Returns:
        str:
            The result of the browser agent's execution, formatted as a JSON string. The output
            typically includes the agent's findings, extracted data, and any relevant observations
            from the automated browser session.

    Example:
        result = browser_agent_tool(
            "Please navigate to https://www.coingecko.com and identify the best performing cryptocurrency coin over the past 24 hours."
        )
        print(result)
    """
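    # Use the synchronous wrapper class defined above; browser_use's own
    # Agent (imported as BrowserAgent) needs a task and an LLM and runs asynchronously.
    return BrowserUseAgent().run(task)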



agent = Agent(
    name="Browser Agent",
    model_name="gpt-4.1",
    tools=[browser_agent_tool],
)

agent.run("Please navigate to https://www.coingecko.com and identify the best performing cryptocurrency coin over the past 24 hours.")
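
The `run` method above bridges synchronous callers into async browser code with `asyncio.run`, and the tool hands back a JSON string rather than a Python object. The sketch below illustrates that shape with a stand-in class so it can run without a browser or an API key. `StubBrowserTool` and the `task`/`status` fields are placeholders for illustration and do not reflect the schema that browser_use actually produces.

```python
import asyncio
import json


class StubBrowserTool:
    """Stand-in with the same sync/async shape as BrowserUseAgent (hypothetical)."""

    async def _run_async(self, task: str) -> str:
        # A real implementation would await the browser agent here.
        await asyncio.sleep(0)
        return json.dumps({"task": task, "status": "done"}, indent=4)

    def run(self, task: str) -> str:
        # asyncio.run starts a fresh event loop, so this must not be
        # called from code that is already running inside an event loop.
        return asyncio.run(self._run_async(task))


raw = StubBrowserTool().run("open example.com")
parsed = json.loads(raw)  # the tool returns a JSON string, not a dict
print(parsed["status"])
```

One consequence of this design is that the synchronous `run` wrapper suits scripts and tool calls from a synchronous agent loop. Code that already runs inside an event loop (for example a notebook cell or an async web handler) would need to await the coroutine directly instead.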
