The Firecrawl tool is a powerful web crawling utility that integrates seamlessly with Swarms agents to extract, analyze, and process content from websites. It leverages the Firecrawl API to crawl entire websites, extract structured data, and provide comprehensive content analysis for various use cases including marketing, research, content creation, and data analysis.
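For orientation, here is a minimal sketch of calling the Firecrawl API directly with the `firecrawl-py` SDK. It assumes a `FIRECRAWL_API_KEY` environment variable and the SDK's `FirecrawlApp.scrape_url` interface; exact parameter names vary between SDK versions, so treat this as an illustrative sketch rather than the tool's internal implementation.

```python
# Minimal sketch: scraping a single page directly through the Firecrawl API.
# Assumes `pip install firecrawl-py` and a FIRECRAWL_API_KEY environment variable;
# parameter names differ slightly across firecrawl-py versions.
import os

from firecrawl import FirecrawlApp

app = FirecrawlApp(api_key=os.getenv("FIRECRAWL_API_KEY"))

# Scrape one URL; the response typically contains markdown content plus metadata.
page = app.scrape_url("https://example.com")
print(page)
```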
Web scraper agents are specialized AI agents that can automatically extract and process information from websites. These agents combine the power of large language models with web scraping tools to intelligently gather, analyze, and structure data from the web.
| Capability | Description |
|------------|-------------|
| **Automatically navigate websites** | Extract relevant information from web pages |
| **Parse and structure data** | Convert HTML content into readable, structured formats |
| **Handle dynamic content** | Process JavaScript-rendered pages and dynamic website elements |
| **Provide intelligent summaries and analysis** | Generate summaries and analyze the scraped content |
| **Scale to multiple websites simultaneously** | Scrape and process data from several websites at once for comprehensive research |
## Install
```bash
pip3 install -U swarms swarms-tools
```
## Environment Setup
```bash
export OPENAI_API_KEY="your_openai_api_key_here"
```
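If you prefer to keep the key in a local `.env` file rather than exporting it in your shell, you can load it at the start of your script. This optional step uses the `python-dotenv` package, which is one common way to do it.

```python
# Optional: load environment variables (e.g., OPENAI_API_KEY) from a .env file.
# Requires `pip install python-dotenv`.
from dotenv import load_dotenv

load_dotenv()
```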
## Basic Usage
Here's a simple example of how to create a web scraper agent:
```python
from swarms import Agent
from swarms_tools import scrape_and_format_sync
agent = Agent(
    agent_name="Web Scraper Agent",
    model_name="gpt-4o-mini",
    tools=[scrape_and_format_sync],
    dynamic_context_window=True,
    dynamic_temperature_enabled=True,
    max_loops=1,
    system_prompt="You are a web scraper agent. You are given a URL and you need to scrape the website and return the data in a structured format. The format type should be full",
)

out = agent.run(
    "Scrape the swarms.ai website and provide a full report of what the company does. The format type should be full."
)

print(out)
```
## Scraping Multiple Sites
For comprehensive research, you can scrape multiple websites simultaneously using batch execution:
```python
from swarms.structs.multi_agent_exec import batched_grid_agent_execution
from swarms_tools import scrape_and_format_sync
from swarms import Agent
agent = Agent(
    agent_name="Web Scraper Agent",
    model_name="gpt-4o-mini",
    tools=[scrape_and_format_sync],
    dynamic_context_window=True,
    dynamic_temperature_enabled=True,
    max_loops=1,
    system_prompt="You are a web scraper agent. You are given a URL and you need to scrape the website and return the data in a structured format. The format type should be full",
)

out = batched_grid_agent_execution(
    agents=[agent, agent],
    tasks=[
        "Scrape swarms.ai website and provide a full report of the company's mission, products, and team. The format type should be full.",
        "Scrape langchain.com website and provide a full report of the company's mission, products, and team. The format type should be full.",
    ],
)

print(out)
```
## Conclusion
Web scraper agents combine AI with advanced automation to efficiently gather and process web data at scale. As you master the basics, explore features like batch processing and custom tools to unlock the full power of AI-driven web scraping.
## ConcurrentWorkflow

The `ConcurrentWorkflow` class facilitates the concurrent execution of multiple agents, each tasked with solving a specific query or problem. It is particularly useful when several agents need to work in parallel, allowing for efficient resource utilization and faster completion of tasks. The workflow manages execution, handles streaming callbacks, collects metadata about each agent's run, can optionally save the results in a structured format, and offers dashboard monitoring for real-time progress tracking. A minimal usage sketch follows the feature list below.
Full Path: `swarms.structs.concurrent_workflow`
### Key Features
- **Concurrent Execution**: Runs multiple agents simultaneously using Python's `ThreadPoolExecutor`
- **Interactive Mode**: Supports interactive task modification and execution
- **Caching System**: Implements LRU caching for repeated prompts
- **Progress Tracking**: Optional progress bar for task execution
- **Enhanced Error Handling**: Implements retry mechanism with exponential backoff
- **Input Validation**: Validates task inputs before execution
- **Batch Processing**: Supports running tasks in batches
- **Metadata Collection**: Gathers detailed metadata about each agent's execution
- **Customizable Output**: Allows saving metadata to file or returning as string/dictionary
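Here is a minimal sketch of how these pieces fit together. It assumes the constructor accepts `name`, `agents`, and `max_loops`, and that `run(task)` dispatches the same task to every agent in parallel; the agent names and the task are illustrative only, and the remaining options (dashboard, output formatting, batch helpers) are covered by the full class reference.

```python
# Minimal ConcurrentWorkflow sketch: two illustrative agents run the same task in parallel.
# Assumes the `name`, `agents`, and `max_loops` parameters and the `run(task)` method;
# consult the class reference for the complete parameter list.
from swarms import Agent
from swarms.structs.concurrent_workflow import ConcurrentWorkflow

agents = [
    Agent(
        agent_name="Market-Analyst",
        system_prompt="You analyze market trends and summarize the key signals.",
        model_name="gpt-4o-mini",
        max_loops=1,
    ),
    Agent(
        agent_name="Risk-Analyst",
        system_prompt="You assess risks and highlight potential downsides.",
        model_name="gpt-4o-mini",
        max_loops=1,
    ),
]

workflow = ConcurrentWorkflow(
    name="Parallel-Analysis",
    agents=agents,
    max_loops=1,
)

# Both agents receive the task concurrently via the underlying thread pool.
results = workflow.run("Evaluate the outlook for the semiconductor sector.")
print(results)
```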
system_prompt="You are an expert at synthesizing investment advice from multiple perspectives into a coherent recommendation.",
max_loops=1,
model_name="gpt-4o"
)
# Create majority voting system
majority_voting = MajorityVoting(
name="Investment-Advisory-System",
description="Multi-agent system for investment advice",
investment_system = MajorityVoting(
name="Investment-Analysis-System",
description="Multi-agent investment analysis with consensus evaluation",
agents=agents,
consensus_agent=consensus_agent,
verbose=True,
consensus_agent=consensus_agent
output_type="dict"
)
# Run the analysis with majority voting
result = majority_voting.run(
task="Create a table of super high growth opportunities for AI. I have $40k to invest in ETFs, index funds, and more. Please create a table in markdown.",
correct_answer="" # Optional evaluation metric
# Execute investment analysis
result = investment_system.run(
task="""Analyze the following investment scenario and provide recommendations:
- Budget: $50,000
- Risk tolerance: Moderate
- Time horizon: 5-7 years
- Focus areas: Technology, Healthcare, Renewable Energy
Provide specific ETF/index fund recommendations with allocation percentages."""
)
print("Investment Analysis Results:")
print(result)
```
## Batch Execution
### Example 2: Content Creation with Batch Processing
This example shows how to use batch processing for content creation tasks with multiple writing styles.
```python
from swarms import Agent, MajorityVoting

# Initialize content creation agents with different styles
content_agents = [
    Agent(
        agent_name="Creative-Writer",
        agent_description="Creative content specialist",
        system_prompt="You are a creative writer who produces engaging, story-driven content with vivid descriptions.",
        max_loops=1,
        model_name="gpt-4o"
    ),
    Agent(
        agent_name="Technical-Writer",
        agent_description="Technical content specialist",
        system_prompt="You are a technical writer who focuses on clarity, accuracy, and structured information.",
        max_loops=1,
        model_name="gpt-4o"
    ),
    Agent(
        agent_name="SEO-Optimized-Writer",
        agent_description="SEO content specialist",
        system_prompt="You are an SEO specialist who creates content optimized for search engines while maintaining quality.",
        max_loops=1,
        model_name="gpt-4o"
    ),
]

# Consensus agent that merges the writers' drafts into one well-supported piece
consensus_agent = Agent(
    agent_name="Consensus-Agent",
    agent_description="Synthesizes the writers' outputs into a final draft",
    system_prompt="You are an expert at synthesizing diverse research perspectives into comprehensive, well-supported conclusions.",
    max_loops=1,
    model_name="gpt-4o"
)

# Create majority voting system for batch content creation
content_system = MajorityVoting(
    name="Content-Creation-System",
    description="Multi-agent content creation with consensus evaluation",
    agents=content_agents,
    consensus_agent=consensus_agent,
    verbose=True,
    output_type="list"
)

# Topics (framed as research questions) to cover in a single batch
research_questions = [
    "What are the environmental impacts of electric vehicle adoption?",
    "How does remote work affect employee productivity and well-being?",
    "What are the economic implications of universal basic income?",
    "How can AI be used to improve healthcare outcomes?",
    "What are the social effects of social media on mental health?",
]

# Each topic is drafted by every writer, then merged by the consensus agent
results = content_system.batch_run(research_questions)

print(results)
```
system_prompt="You are a web scraper agent. You are given a URL and you need to scrape the website and return the data in a structured format. The format type should be full",
)
out=batched_grid_agent_execution(
agents=[agent,agent],
tasks=[
"Scrape swarms.ai website and provide a full report of the company's mission, products, and team. The format type should be full.",
"Scrape langchain.com website and provide a full report of the company's mission, products, and team. The format type should be full.",
system_prompt="You are a web scraper agent. You are given a URL and you need to scrape the website and return the data in a structured format. The format type should be full",
)
out=agent.run(
"Scrape swarms.ai website and provide a full report of the company does. The format type should be full."
system_prompt="You are a web scraper agent. You are given a URL and you need to scrape the website and return the data in a structured format. The format type should be full",
tools=[scrape_and_format_sync],
dynamic_context_window=True,
dynamic_temperature_enabled=True,
max_loops=1,
streaming_on=True,# Enable streaming mode
print_on=False,# Prevent direct printing (let callback handle it)
logger.debug(f"[SCHEMA] Director schema: {schema}")
returnAgent(
agent_name=self.director_name,
@ -923,7 +924,7 @@ class HierarchicalSwarm:
)
exceptExceptionase:
error_msg=f"❌ Failed to setup director: {str(e)}\n🔍 Traceback: {traceback.format_exc()}\n🐛 If this issue persists, please report it at: https://github.com/kyegomez/swarms/issues"
error_msg=f"[ERROR] Failed to setup director: {str(e)}\n[TRACE] Traceback: {traceback.format_exc()}\n[BUG] If this issue persists, please report it at: https://github.com/kyegomez/swarms/issues"
logger.error(error_msg)
defreliability_checks(self):
@ -963,7 +964,7 @@ class HierarchicalSwarm:
)
exceptExceptionase:
error_msg=f"❌ Failed to setup director: {str(e)}\n🔍 Traceback: {traceback.format_exc()}\n🐛 If this issue persists, please report it at: https://github.com/kyegomez/swarms/issues"
error_msg=f"[ERROR] Failed to setup director: {str(e)}\n[TRACE] Traceback: {traceback.format_exc()}\n[BUG] If this issue persists, please report it at: https://github.com/kyegomez/swarms/issues"
logger.error(error_msg)
defagents_no_print(self):
@ -995,7 +996,9 @@ class HierarchicalSwarm:
"""
try:
ifself.verbose:
logger.info(f"🎯 Running director with task: {task}")
logger.info(
f"[RUN] Running director with task: {task}"
)
ifself.planning_director_agentisnotNone:
plan=self.planning_director_agent.run(
@ -1022,15 +1025,17 @@ class HierarchicalSwarm:
)
ifself.verbose:
logger.success("✅ Director execution completed")
logger.success(
"[SUCCESS] Director execution completed"
)
logger.debug(
f"📋 Director output type: {type(function_call)}"
f"[OUTPUT] Director output type: {type(function_call)}"
)
returnfunction_call
exceptExceptionase:
error_msg=f"❌ Failed to setup director: {str(e)}\n🔍 Traceback: {traceback.format_exc()}\n🐛 If this issue persists, please report it at: https://github.com/kyegomez/swarms/issues"
error_msg=f"[ERROR] Failed to setup director: {str(e)}\n[TRACE] Traceback: {traceback.format_exc()}\n[BUG] If this issue persists, please report it at: https://github.com/kyegomez/swarms/issues"
logger.error(error_msg)
raisee
@ -1059,7 +1064,7 @@ class HierarchicalSwarm:
try:
ifself.verbose:
logger.info(
f"👣 Executing single step for task: {task}"
f"[STEP] Executing single step for task: {task}"
)
# Update dashboard for director execution
@ -1073,7 +1078,7 @@ class HierarchicalSwarm:
ifself.verbose:
logger.info(
f"📋 Parsed plan and {len(orders)} orders"
f"[PARSE] Parsed plan and {len(orders)} orders"
)
# Update dashboard with plan and orders information
error_msg=f"❌ Failed to setup director: {str(e)}\n🔍 Traceback: {traceback.format_exc()}\n🐛 If this issue persists, please report it at: https://github.com/kyegomez/swarms/issues"
error_msg=f"[ERROR] Failed to setup director: {str(e)}\n[TRACE] Traceback: {traceback.format_exc()}\n[BUG] If this issue persists, please report it at: https://github.com/kyegomez/swarms/issues"
error_msg=f"❌ Failed to setup director: {str(e)}\n🔍 Traceback: {traceback.format_exc()}\n🐛 If this issue persists, please report it at: https://github.com/kyegomez/swarms/issues"
error_msg=f"[ERROR] Failed to setup director: {str(e)}\n[TRACE] Traceback: {traceback.format_exc()}\n[BUG] If this issue persists, please report it at: https://github.com/kyegomez/swarms/issues"
logger.error(error_msg)
current_loop+=1
@ -1218,10 +1224,10 @@ class HierarchicalSwarm:
ifself.verbose:
logger.success(
f"🎉 Hierarchical swarm run completed: {self.name}"
f"[COMPLETE] Hierarchical swarm run completed: {self.name}"
)
logger.info(
f"📊 Total loops executed: {current_loop}"
f"[STATS] Total loops executed: {current_loop}"
)
returnhistory_output_formatter(
@ -1234,7 +1240,7 @@ class HierarchicalSwarm:
self.dashboard.update_director_status("ERROR")
self.dashboard.stop()
error_msg=f"❌ Failed to setup director: {str(e)}\n🔍 Traceback: {traceback.format_exc()}\n🐛 If this issue persists, please report it at: https://github.com/kyegomez/swarms/issues"
error_msg=f"[ERROR] Failed to setup director: {str(e)}\n[TRACE] Traceback: {traceback.format_exc()}\n[BUG] If this issue persists, please report it at: https://github.com/kyegomez/swarms/issues"
logger.error(error_msg)
def_get_interactive_task(self)->str:
@ -1275,7 +1281,7 @@ class HierarchicalSwarm:
"""
try:
ifself.verbose:
logger.info("📝 Generating director feedback")
logger.info("[FEEDBACK] Generating director feedback")
"[SUCCESS] Director feedback generated successfully"
)
returnoutput
exceptExceptionase:
error_msg=f"❌ Failed to setup director: {str(e)}\n🔍 Traceback: {traceback.format_exc()}\n🐛 If this issue persists, please report it at: https://github.com/kyegomez/swarms/issues"
error_msg=f"[ERROR] Failed to setup director: {str(e)}\n[TRACE] Traceback: {traceback.format_exc()}\n[BUG] If this issue persists, please report it at: https://github.com/kyegomez/swarms/issues"
error_msg=f"❌ Failed to setup director: {str(e)}\n🔍 Traceback: {traceback.format_exc()}\n🐛 If this issue persists, please report it at: https://github.com/kyegomez/swarms/issues"
error_msg=f"[ERROR] Failed to setup director: {str(e)}\n[TRACE] Traceback: {traceback.format_exc()}\n[BUG] If this issue persists, please report it at: https://github.com/kyegomez/swarms/issues"
f"✅ Successfully parsed plan and {len(orders)} orders"
f"[SUCCESS] Successfully parsed plan and {len(orders)} orders"
)
returnplan,orders
@ -1463,7 +1469,7 @@ class HierarchicalSwarm:
)asjson_err:
ifself.verbose:
logger.warning(
f"⚠️ JSON decode error: {json_err}"
f"[WARN] JSON decode error: {json_err}"
)
pass
# Check if it's a direct function call format
@ -1488,7 +1494,7 @@ class HierarchicalSwarm:
ifself.verbose:
logger.success(
f"✅ Successfully parsed plan and {len(orders)} orders"
f"[SUCCESS] Successfully parsed plan and {len(orders)} orders"
)
returnplan,orders
@ -1497,7 +1503,7 @@ class HierarchicalSwarm:
)asjson_err:
ifself.verbose:
logger.warning(
f"⚠️ JSON decode error: {json_err}"
f"[WARN] JSON decode error: {json_err}"
)
pass
# If no function call found, raise error
@ -1515,7 +1521,7 @@ class HierarchicalSwarm:
ifself.verbose:
logger.success(
f"✅ Successfully parsed plan and {len(orders)} orders"
f"[SUCCESS] Successfully parsed plan and {len(orders)} orders"
)
returnplan,orders
@ -1529,7 +1535,7 @@ class HierarchicalSwarm:
)
exceptExceptionase:
error_msg=f"❌ Failed to parse orders: {str(e)}\n🔍 Traceback: {traceback.format_exc()}\n🐛 If this issue persists, please report it at: https://github.com/kyegomez/swarms/issues"
error_msg=f"[ERROR] Failed to parse orders: {str(e)}\n[TRACE] Traceback: {traceback.format_exc()}\n[BUG] If this issue persists, please report it at: https://github.com/kyegomez/swarms/issues"
f"📋 Executing order {i+1}/{len(orders)}: {order.agent_name}"
f"[ORDER] Executing order {i+1}/{len(orders)}: {order.agent_name}"
)
# Update dashboard for agent execution
@ -1590,13 +1596,13 @@ class HierarchicalSwarm:
ifself.verbose:
logger.success(
f"✅ All {len(orders)} orders executed successfully"
f"[SUCCESS] All {len(orders)} orders executed successfully"
)
returnoutputs
exceptExceptionase:
error_msg=f"❌ Failed to setup director: {str(e)}\n🔍 Traceback: {traceback.format_exc()}\n🐛 If this issue persists, please report it at: https://github.com/kyegomez/swarms/issues"
error_msg=f"[ERROR] Failed to setup director: {str(e)}\n[TRACE] Traceback: {traceback.format_exc()}\n[BUG] If this issue persists, please report it at: https://github.com/kyegomez/swarms/issues"
f"Model {self.model_name} supports reasoning and reasoning enabled is set to {self.reasoning_enabled}. Temperature will be set to 1 for better reasoning as some models may not work with low temperature."
)
self.temperature=1
else:
logger.warning(
f"Model {self.model_name} does not support reasoning and reasoning enabled is set to {self.reasoning_enabled}. Temperature will not be set to 1."
f"Model {self.model_name} may or may not support reasoning and reasoning enabled is set to {self.reasoning_enabled}"
)
if(
self.reasoning_enabledisTrue
andself.check_if_model_name_uses_anthropic(
model_name=self.model_name
)
isTrue
):
ifself.thinking_tokensisNone:
logger.info(
f"Model {self.model_name} is an Anthropic model and reasoning enabled is set to {self.reasoning_enabled}. Thinking tokens is mandatory for Anthropic models."
)
self.thinking_tokens=self.max_tokens/4
if(
self.reasoning_enabledisTrue
andself.check_if_model_name_uses_anthropic(
model_name=self.model_name
)
isTrue
):
logger.info(
"top_p must be greater than 0.95 for Anthropic models with reasoning enabled"