vision and tools

pull/926/head
Kye Gomez 6 days ago
parent 7277a3ffbb
commit de2382dafe

@@ -317,6 +317,7 @@ nav:
- Agent with Structured Outputs: "swarms/examples/agent_structured_outputs.md"
- Agents with Vision: "swarms/examples/vision_processing.md"
- Agent with Multiple Images: "swarms/examples/multiple_images.md"
- Agents with Vision and Tool Usage: "swarms/examples/vision_tools.md"
- Gradio Chat Interface: "swarms/ui/main.md"
- Various Model Providers:
- OpenAI: "swarms/examples/openai_example.md"

@@ -0,0 +1,140 @@
# Agents with Vision and Tool Usage
This tutorial demonstrates how to create intelligent agents that can analyze images and use custom tools to perform specific actions based on their visual observations. You'll learn to build a quality control agent that can process images, identify potential security concerns, and automatically trigger appropriate responses using function calling capabilities.
## What You'll Learn
- How to configure an agent with multi-modal capabilities for image analysis
- How to integrate custom tools and functions with vision-enabled agents
- How to implement automated security analysis based on visual observations
- How to use function calling to trigger specific actions from image analysis results
- Best practices for building production-ready vision agents with tool integration
## Use Cases
This approach is well suited to:
- **Quality Control Systems**: Automated inspection of manufacturing processes
- **Security Monitoring**: Real-time threat detection and response
- **Object Detection**: Identifying and categorizing items in images
- **Compliance Checking**: Ensuring standards are met in various environments
- **Automated Reporting**: Generating detailed analysis reports from visual data
## Installation
Install the swarms package using pip:
```bash
pip install -U swarms
```
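To confirm that the install succeeded, one option is to query the installed version from Python. The snippet below is a minimal sketch that uses only the standard library; `installed_version` is a hypothetical helper name and is not part of swarms.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("swarms")
    print(f"swarms version: {version}" if version else "swarms is not installed")
```

If this reports that swarms is not installed, check that `pip` and `python` point at the same environment.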
## Basic Setup
Set your environment variables, for example in a `.env` file:
```bash
WORKSPACE_DIR="agent_workspace"
OPENAI_API_KEY=""
```
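An empty `OPENAI_API_KEY` usually only surfaces as an error once the agent makes its first model call, so it can help to check the variables up front. The sketch below assumes the two variable names shown above; `load_settings` is a hypothetical helper, and the `agent_workspace` fallback simply mirrors the example value.

```python
import os
from typing import Dict, Mapping


def load_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read the tutorial's two variables and fail early if the API key is empty."""
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {
        # Assumption: fall back to the example workspace name when unset.
        "workspace_dir": env.get("WORKSPACE_DIR", "agent_workspace"),
        "api_key": api_key,
    }
```

Accepting the environment as a parameter keeps the check easy to test: pass a plain dictionary instead of `os.environ`.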
## Code
- Define each tool as a Python function with type hints and a docstring
- Pass the tools to your agent: `Agent(tools=[list_of_callables])`
- Pass the image path to the run method: `Agent().run(task=task, img=img)`
```python
from swarms.structs import Agent
from swarms.prompts.logistics import (
    Quality_Control_Agent_Prompt,
)

# Image for analysis
factory_image = "image.jpg"


def security_analysis(danger_level: str) -> str:
    """
    Analyzes the security danger level and returns an appropriate response.

    Args:
        danger_level (str): The level of danger to analyze.
            Expected values are "low", "medium", or "high".

    Returns:
        str: A string describing the danger level assessment.
            - "No danger level provided" if danger_level is None
            - "No danger" if danger_level is "low"
            - "Medium danger" if danger_level is "medium"
            - "High danger" if danger_level is "high"
            - "Unknown danger level" for any other value
    """
    # Handle a missing value defensively, even though a string is expected
    if danger_level is None:
        return "No danger level provided"

    if danger_level == "low":
        return "No danger"

    if danger_level == "medium":
        return "Medium danger"

    if danger_level == "high":
        return "High danger"

    return "Unknown danger level"


custom_system_prompt = f"""
{Quality_Control_Agent_Prompt}
You have access to tools that can help you with your analysis. When you need to perform a security analysis, you MUST use the security_analysis function with an appropriate danger level (low, medium, or high) based on your observations.
Always use the available tools when they are relevant to the task. If you determine there is any level of danger or security concern, call the security_analysis function with the appropriate danger level.
"""

# Quality control agent
quality_control_agent = Agent(
    agent_name="Quality Control Agent",
    agent_description="A quality control agent that analyzes images and provides a detailed report on the quality of the product in the image.",
    # model_name="anthropic/claude-3-opus-20240229",
    model_name="gpt-4o-mini",
    system_prompt=custom_system_prompt,
    multi_modal=True,
    max_loops=1,
    output_type="str-all-except-first",
    # tools_list_dictionary=[schema],
    tools=[security_analysis],
)

# Run the agent on the image; the task tells it to call the tool
response = quality_control_agent.run(
    task="Analyze the image and then perform a security analysis. Based on what you see in the image, determine if there is a low, medium, or high danger level and call the security_analysis function with that danger level",
    img=factory_image,
)

print(response)
```
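Type hints and a docstring matter because function calling generally relies on a JSON schema that describes each tool to the model. The sketch below shows one way such a schema could be derived from a function signature, and how a tool call returned by a model could be dispatched. It is an illustration only and does not reproduce how swarms handles tools internally; `function_to_schema` and `dispatch` are hypothetical helpers, and `security_analysis` is a condensed stand-in for the tool above.

```python
import inspect
import json
from typing import Any, Callable, Dict

# Maps Python annotations to JSON Schema type names (basic types only).
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def security_analysis(danger_level: str) -> str:
    """Return a short assessment for a danger level of low, medium, or high."""
    responses = {"low": "No danger", "medium": "Medium danger", "high": "High danger"}
    return responses.get(danger_level, "Unknown danger level")


def function_to_schema(func: Callable[..., Any]) -> Dict[str, Any]:
    """Build an OpenAI-style function schema from a signature and docstring."""
    properties: Dict[str, Any] = {}
    required = []
    for name, param in inspect.signature(func).parameters.items():
        # Unannotated or unsupported types fall back to "string".
        properties[name] = {"type": _JSON_TYPES.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": inspect.getdoc(func) or "",
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def dispatch(tools: Dict[str, Callable[..., Any]], name: str, arguments_json: str) -> Any:
    """Run the requested tool with its JSON-encoded arguments."""
    return tools[name](**json.loads(arguments_json))


schema = function_to_schema(security_analysis)
result = dispatch(
    {"security_analysis": security_analysis},
    "security_analysis",
    '{"danger_level": "high"}',
)
print(json.dumps(schema["function"]["parameters"], indent=2))
print(result)  # High danger
```

Because the tool is a plain function, it can also be unit tested on its own before it is handed to an agent.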
## Support and Community
If you run into issues or want to learn more, the resources below cover documentation, community chat, news, and tutorials.
| Platform | Link | Description |
|----------|------|-------------|
| 📚 Documentation | [docs.swarms.world](https://docs.swarms.world) | Official documentation and guides |
| 📝 Blog | [Medium](https://medium.com/@kyeg) | Latest updates and technical articles |
| 💬 Discord | [Join Discord](https://discord.gg/jM3Z6M9uMq) | Live chat and community support |
| 🐦 Twitter | [@kyegomez](https://twitter.com/kyegomez) | Latest news and announcements |
| 👥 LinkedIn | [The Swarm Corporation](https://www.linkedin.com/company/the-swarm-corporation) | Professional network and updates |
| 📺 YouTube | [Swarms Channel](https://www.youtube.com/channel/UC9yXyitkbU_WSy7bd_41SqQ) | Tutorials and demos |
| 🎫 Events | [Sign up here](https://lu.ma/5p2jnc2v) | Join our community events |