Merge pull request #1078 from aparekh02/master

[DOCS] Jarvis Agent using Gemini Nano Banana
pull/1079/head
Kye Gomez 4 weeks ago committed by GitHub
commit f064e31cd8
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

@ -0,0 +1,147 @@
# Gemini Nano Banana with Jarvis Agent by Swarms
## Overview
Gemini Nano Banana, powered by Google's Gemini 2.5 Flash Image Preview, is now integrated with Swarms agents via the Jarvis Agent. This enables advanced image generation, editing, and analysis in Swarms workflows, with secure, real-time processing and flexible deployment.
| Industry | Use Case | Business Impact |
|----------|----------|-----------------|
| **Healthcare** | Medical image analysis, radiology report generation, patient data visualization | Faster diagnosis, reduced errors, improved patient outcomes |
| **Manufacturing** | Quality control inspection, defect detection, predictive maintenance visualization | Reduced waste, improved efficiency, proactive maintenance |
| **Retail** | Product catalog generation, visual search, inventory management | Enhanced customer experience, automated workflows, reduced manual labor |
| **Real Estate** | Property visualization, virtual tours, market analysis charts | Improved sales conversion, remote property viewing, data-driven insights |
| **Insurance** | Claims processing, damage assessment, fraud detection | Accelerated claims settlement, reduced fraud losses, improved accuracy |
| **Financial Services** | Risk visualization, market trend charts, document processing | Enhanced decision-making, automated reporting, regulatory compliance |
## Get Started
### Prerequisites
Before using Gemini Nano Banana models, ensure you have:
1. **Swarms Framework**: Install the latest version of Swarms
```bash
pip install swarms
```
2. **API Access**: Configure your Gemini API credentials
```bash
export GEMINI_API_KEY="your-api-key-here"
```
3. **Image Dependencies**: For image processing tasks, ensure you have the necessary image libraries
```bash
pip install pillow requests
```
### Model Configuration
The Gemini models are accessed through the `gemini/gemini-2.5-flash-image-preview` model identifier, which provides optimized performance for image-related tasks.
## Image Generation
Create a new Python file (e.g., `img_gen_nano_banana.py`) and use the following code to generate images from text prompts:
```python
from swarms import Agent
IMAGE_GEN_SYSTEM_PROMPT = (
"You are an advanced image generation agent. Given a textual description, generate a high-quality, photorealistic image that matches the prompt. "
"Return only the generated image."
)
image_gen_agent = Agent(
agent_name="Image-Generation-Agent",
agent_description="Agent specialized in generating high-quality, photorealistic images from textual prompts.",
model_name="gemini/gemini-2.5-flash-image-preview", # Replace with your preferred image generation model if available
dynamic_temperature_enabled=True,
max_loops=1,
dynamic_context_window=True,
retry_interval=1,
)
image_gen_out = image_gen_agent.run(
task=f"{IMAGE_GEN_SYSTEM_PROMPT} \n\n Generate a photorealistic image of a futuristic city skyline at sunset.",
)
print("Image Generation Output:")
print(image_gen_out)
```
## Image Editing and Annotation
For image editing and annotation tasks, create a new file (e.g., `jarvis_agent.py`) with the following code:
```python
from swarms import Agent
SYSTEM_PROMPT = (
"You are a location-based AR experience generator. Highlight points of interest in this image and annotate relevant information about it. "
"Return the image only."
)
# Agent for AR annotation
agent = Agent(
agent_name="Tactical-Strategist-Agent",
agent_description="Agent specialized in tactical strategy, scenario analysis, and actionable recommendations for complex situations.",
model_name="gemini/gemini-2.5-flash-image-preview",
dynamic_temperature_enabled=True,
max_loops=1,
dynamic_context_window=True,
retry_interval=1,
)
out = agent.run(
task=f"{SYSTEM_PROMPT} \n\n Annotate all the tallest buildings in the image",
img="hk.jpg",
)
```
## Additional Features
### Custom System Prompts
| Prompt Type | Example | Enterprise Use Case |
|-------------|---------|-------------------|
| Architectural Analysis | "Analyze building structures and materials" | Real estate due diligence |
| Medical Imaging | "Highlight anomalies in medical scans" | Healthcare diagnostics |
| Artistic Enhancement | "Enhance colors and composition" | Marketing content creation |
| Object Detection | "Identify and label all objects" | Quality control automation |
### Optimization Strategies
| Parameter | Current Setting | Alternative | Impact | Use Case |
|-----------|----------------|-------------|---------|----------|
| `max_loops` | 1 | 2-3 | Higher quality but slower | Complex image editing |
| `dynamic_temperature_enabled` | True | False | More consistent results | Production environments |
| `retry_interval` | 1 | 2-5 | Better error handling | Unstable connections |
| `dynamic_context_window` | True | False | Memory efficient | Large images |
## Security and Compliance
Gemini Nano Banana offers robust enterprise security, including end-to-end data encryption, role-based access controls, and comprehensive audit logging to ensure compliance and privacy. The platform meets major industry certifications, making it suitable for healthcare, finance, and other regulated sectors.
| Security Feature | Certification |
|----------------------|----------------------|
| Data Encryption | SOC 2 Type II |
| Access Controls (RBAC)| HIPAA, PCI DSS |
| Audit Logging | ISO 27001 |
## Connect With Us
If you'd like technical support, join our Discord below and stay updated on our Twitter for new updates!
| Platform | Link | Description |
|----------|------|-------------|
| 📚 Documentation | [docs.swarms.world](https://docs.swarms.world) | Official documentation and guides |
| 📝 Blog | [Medium](https://medium.com/@kyeg) | Latest updates and technical articles |
| 💬 Discord | [Join Discord](https://discord.gg/EamjgSaEQf) | Live chat and community support |
| 🐦 Twitter | [@kyegomez](https://twitter.com/kyegomez) | Latest news and announcements |
| 👥 LinkedIn | [The Swarm Corporation](https://www.linkedin.com/company/the-swarm-corporation) | Professional network and updates |
| 📺 YouTube | [Swarms Channel](https://www.youtube.com/channel/UC9yXyitkbU_WSy7bd_41SqQ) | Tutorials and demos |
| 🎫 Events | [Sign up here](https://lu.ma/5p2jnc2v) | Join our community events |

@ -372,6 +372,7 @@ nav:
- Agent with Streaming: "examples/agent_stream.md" - Agent with Streaming: "examples/agent_stream.md"
- Agent Output Types: "swarms/examples/agent_output_types.md" - Agent Output Types: "swarms/examples/agent_output_types.md"
- Gradio Chat Interface: "swarms/ui/main.md" - Gradio Chat Interface: "swarms/ui/main.md"
- Agent with Gemini Nano Banana: "swarms/examples/jarvis_agent.md"
- LLM Providers: - LLM Providers:
- Language Models: - Language Models:
- How to Create A Custom Language Model: "swarms/models/custom_model.md" - How to Create A Custom Language Model: "swarms/models/custom_model.md"

Loading…
Cancel
Save