Merge branch 'kyegomez:master' into feat/testing_suite

pull/948/head
harshalmore31 committed 3 days ago via GitHub
commit bf22003941

@@ -14,7 +14,7 @@ jobs:
           python-version: '3.10'
       - name: Cache pip dependencies
-        uses: actions/cache@v3
+        uses: actions/cache@v4
         with:
           path: ~/.cache/pip
           key: ${{ runner.os }}-pip-${{ hashFiles('**/pyproject.toml') }}

@@ -34,7 +34,7 @@ jobs:
           docker build -t docker.io/my-organization/my-app:${{ github.sha }} .
       - name: Run Trivy vulnerability scanner
-        uses: aquasecurity/trivy-action@76071ef0d7ec797419534a183b498b4d6366cf37
+        uses: aquasecurity/trivy-action@dc5a429b52fcf669ce959baa2c2dd26090d2a6c4
         with:
          image-ref: 'docker.io/my-organization/my-app:${{ github.sha }}'
          format: 'template'

@@ -0,0 +1,170 @@
# Bug Report: Swarms Codebase Issues
## Bug 1: Error Handling in Daemon Thread (Critical)
**Location:** `swarms/structs/agent.py` lines 1446-1453
**Description:** The `_handle_run_error` method spawns a daemon thread to handle errors, so the exception re-raised inside `__handle_run_error` never propagates back to the main thread. Errors are logged, but the caller observes a silent failure.
**Type:** Concurrency/Error Handling Bug
**Severity:** Critical - Can lead to silent failures
**Current Code:**
```python
def _handle_run_error(self, error: any):
    process_thread = threading.Thread(
        target=self.__handle_run_error,
        args=(error,),
        daemon=True,
    )
    process_thread.start()
```
**Problem:**
- The daemon thread will exit when the main thread exits
- The `raise error` at the end of `__handle_run_error` occurs in the daemon thread, not the main thread
- This means exceptions are lost and not properly handled by the calling code
**Fix:** Remove the threading wrapper and call the error handler directly, or use proper exception propagation.
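A standalone sketch of the failure mode (not the Swarms code; `run_and_propagate` and `boom` are illustrative names). An exception raised inside a worker thread dies with that thread; the caller only sees it if the worker captures it and the caller re-raises:

```python
import threading

def run_and_propagate(fn, *args):
    """Run fn in a worker thread, then re-raise any exception in the caller."""
    captured = []

    def wrapper():
        try:
            fn(*args)
        except Exception as exc:
            # A bare `raise` here would die with the worker thread and
            # never reach the caller -- exactly the failure mode of Bug 1.
            captured.append(exc)

    worker = threading.Thread(target=wrapper)
    worker.start()
    worker.join()  # a daemon thread that is never joined may not even finish
    if captured:
        raise captured[0]

def boom():
    raise ValueError("boom")
```

Calling `run_and_propagate(boom)` raises `ValueError` in the caller's thread, which `thread.start()` on a daemon thread alone never does.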
---
## Bug 2: Method Name Typo (Logic Error)
**Location:** `swarms/structs/agent.py` lines 2128 and 2122
**Description:** There are two related typos in the response filtering functionality:
1. The method `apply_reponse_filters` has a typo in the name - it should be `apply_response_filters`
2. The `add_response_filter` method accesses `self.reponse_filters` instead of `self.response_filters`
**Type:** Naming/Logic Error
**Severity:** Medium - Can cause AttributeError when called
**Current Code:**
```python
def add_response_filter(self, filter_word: str) -> None:
    logger.info(f"Adding response filter: {filter_word}")
    self.reponse_filters.append(filter_word)  # TYPO: reponse_filters

def apply_reponse_filters(self, response: str) -> str:  # TYPO: apply_reponse_filters
    """
    Apply the response filters to the response
    """
    logger.info(
        f"Applying response filters to response: {response}"
    )
    for word in self.response_filters:
        response = response.replace(word, "[FILTERED]")
    return response
```
**Problem:**
- Method name is misspelled: `apply_reponse_filters` instead of `apply_response_filters`
- Attribute access is misspelled: `self.reponse_filters` instead of `self.response_filters`
- The method is invoked with the correct spelling in `filtered_run`, confirming these are typos rather than intentional names
**Fix:** Fix both typos to use correct spelling.
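A hypothetical `TypoAgent` (not the Swarms `Agent`) reproduces why the typo survives until runtime: Python resolves attribute names at access time, so the misspelling only surfaces as an `AttributeError` when the method is first called, not at import:

```python
class TypoAgent:
    """Reproduces the Bug 2 failure mode in isolation."""

    def __init__(self):
        self.response_filters = []  # correct spelling at init...

    def add_response_filter(self, word: str) -> None:
        self.reponse_filters.append(word)  # ...misspelled at use

agent = TypoAgent()  # constructing works fine
try:
    agent.add_response_filter("secret")
except AttributeError as e:
    print(e)  # no attribute 'reponse_filters' -- the typo surfaces only here
```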
---
## Bug 3: Document Ingestion Logic Error (Data Loss)
**Location:** `swarms/structs/agent.py` lines 2193-2212
**Description:** The `ingest_docs` method has a logic error where it processes all documents in a loop but only retains the data from the last document. All previous documents are processed but their data is overwritten and lost.
**Type:** Logic Error
**Severity:** High - Causes data loss
**Current Code:**
```python
def ingest_docs(self, docs: List[str], *args, **kwargs):
    """Ingest the docs into the memory

    Args:
        docs (List[str]): Documents of pdfs, text, csvs

    Returns:
        None
    """
    try:
        for doc in docs:
            data = data_to_text(doc)

        return self.short_memory.add(
            role=self.user_name, content=data
        )
    except Exception as error:
        logger.info(f"Error ingesting docs: {error}", "red")
```
**Problem:**
- The `data` variable is overwritten on each iteration
- Only the last document's data is actually added to memory
- All previous documents are processed but their data is lost
- The method should either process documents individually or combine all data
**Fix:** Accumulate all document data or process each document individually.
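The difference can be shown with plain strings standing in for `data_to_text` and memory (hypothetical helpers, not the Swarms API):

```python
def ingest_last_only(docs):
    """Mirrors the bug: `data` is overwritten, only the last doc survives."""
    for doc in docs:
        data = doc.upper()  # stand-in for data_to_text(doc)
    return data

def ingest_all(docs):
    """Mirrors the fix: accumulate every document, then combine."""
    all_data = [doc.upper() for doc in docs]
    return "\n\n".join(all_data)

docs = ["alpha", "beta", "gamma"]
print(ingest_last_only(docs))  # GAMMA -- alpha and beta are lost
print(ingest_all(docs))        # "ALPHA\n\nBETA\n\nGAMMA" -- all retained
```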
---
## Impact Assessment
1. **Bug 1 (Critical):** Can cause silent failures in production, making debugging difficult
2. **Bug 2 (Medium):** Raises `AttributeError` as soon as the correctly spelled method name is called
3. **Bug 3 (High):** Causes data loss when ingesting multiple documents
## Fixes Applied
### Bug 1 Fix - Error Handling
**Status:** ✅ FIXED
Changed the `_handle_run_error` method to call `__handle_run_error` directly instead of using a daemon thread:
```python
def _handle_run_error(self, error: any):
    # Handle error directly instead of using daemon thread
    # to ensure proper exception propagation
    self.__handle_run_error(error)
```
### Bug 2 Fix - Method Name Typos
**Status:** ✅ FIXED
Fixed both typos in the response filtering functionality:
1. Renamed `apply_reponse_filters` to `apply_response_filters`
2. Fixed `self.reponse_filters` to `self.response_filters`
### Bug 3 Fix - Document Ingestion Logic
**Status:** ✅ FIXED
Modified the `ingest_docs` method to process all documents and combine their content:
```python
def ingest_docs(self, docs: List[str], *args, **kwargs):
    try:
        # Process all documents and combine their content
        all_data = []
        for doc in docs:
            data = data_to_text(doc)
            all_data.append(f"Document: {doc}\n{data}")

        # Combine all document content
        combined_data = "\n\n".join(all_data)

        return self.short_memory.add(
            role=self.user_name, content=combined_data
        )
    except Exception as error:
        logger.info(f"Error ingesting docs: {error}", "red")
```
## Recommendations
1. ✅ Fixed the error handling to properly propagate exceptions
2. ✅ Corrected the method name typos
3. ✅ Fixed the document ingestion logic to process all documents
4. Add unit tests to prevent similar issues in the future
5. Consider adding linting rules to catch method name typos
6. Consider code review processes to catch similar issues
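Toward recommendation 4, a minimal regression-test sketch (pytest-style functions; `FilterBox` is a stand-in so the example runs without the swarms package -- real tests would target `swarms.structs.agent.Agent`):

```python
class FilterBox:
    """Stand-in for the agent's corrected response-filtering feature."""

    def __init__(self):
        self.response_filters = []

    def add_response_filter(self, word: str) -> None:
        self.response_filters.append(word)

    def apply_response_filters(self, text: str) -> str:
        for word in self.response_filters:
            text = text.replace(word, "[FILTERED]")
        return text

def test_filters_are_applied():
    box = FilterBox()
    box.add_response_filter("secret")
    assert box.apply_response_filters("a secret plan") == "a [FILTERED] plan"

def test_typo_is_gone():
    # Guards against the Bug 2 misspellings reappearing.
    assert hasattr(FilterBox, "apply_response_filters")
    assert not hasattr(FilterBox, "apply_reponse_filters")

test_filters_are_applied()
test_typo_is_gone()
```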

@@ -86,7 +86,7 @@ swarms = "swarms.cli.main:main"
 [tool.poetry.group.lint.dependencies]
 black = ">=23.1,<26.0"
-ruff = ">=0.5.1,<0.11.14"
+ruff = ">=0.5.1,<0.12.3"
 types-toml = "^0.10.8.1"
 types-pytz = ">=2023.3,<2026.0"
 types-chardet = "^5.0.4.6"

@@ -1444,12 +1444,9 @@ class Agent:
         raise error

     def _handle_run_error(self, error: any):
-        process_thread = threading.Thread(
-            target=self.__handle_run_error,
-            args=(error,),
-            daemon=True,
-        )
-        process_thread.start()
+        # Handle error directly instead of using daemon thread
+        # to ensure proper exception propagation
+        self.__handle_run_error(error)

     async def arun(
         self,
@@ -2123,9 +2120,9 @@ class Agent:
         """
         logger.info(f"Adding response filter: {filter_word}")
-        self.reponse_filters.append(filter_word)
+        self.response_filters.append(filter_word)

-    def apply_reponse_filters(self, response: str) -> str:
+    def apply_response_filters(self, response: str) -> str:
         """
         Apply the response filters to the response
@@ -2200,11 +2197,17 @@ class Agent:
             None
         """
         try:
+            # Process all documents and combine their content
+            all_data = []
             for doc in docs:
                 data = data_to_text(doc)
+                all_data.append(f"Document: {doc}\n{data}")
+
+            # Combine all document content
+            combined_data = "\n\n".join(all_data)
             return self.short_memory.add(
-                role=self.user_name, content=data
+                role=self.user_name, content=combined_data
             )
         except Exception as error:
             logger.info(f"Error ingesting docs: {error}", "red")

@@ -212,44 +212,62 @@ class LiteLLM:
         Process vision input specifically for Anthropic models.
         Handles Anthropic's specific image format requirements.
         """
-        # Get base64 encoded image
-        image_url = get_image_base64(image)
-
-        # Extract mime type from the data URI or use default
-        mime_type = "image/jpeg"  # default
-        if "data:" in image_url and ";base64," in image_url:
-            mime_type = image_url.split(";base64,")[0].split("data:")[
-                1
-            ]
-
-        # Ensure mime type is one of the supported formats
-        supported_formats = [
-            "image/jpeg",
-            "image/png",
-            "image/gif",
-            "image/webp",
-        ]
-        if mime_type not in supported_formats:
-            mime_type = (
-                "image/jpeg"  # fallback to jpeg if unsupported
-            )
-
-        # Construct Anthropic vision message
-        messages.append(
-            {
-                "role": "user",
-                "content": [
-                    {"type": "text", "text": task},
-                    {
-                        "type": "image_url",
-                        "image_url": {
-                            "url": image_url,
-                            "format": mime_type,
-                        },
-                    },
-                ],
-            }
-        )
+        # Check if we can use direct URL
+        if self._should_use_direct_url(image):
+            # Use direct URL without base64 conversion
+            messages.append(
+                {
+                    "role": "user",
+                    "content": [
+                        {"type": "text", "text": task},
+                        {
+                            "type": "image_url",
+                            "image_url": {
+                                "url": image,
+                            },
+                        },
+                    ],
+                }
+            )
+        else:
+            # Fall back to base64 conversion for local files
+            image_url = get_image_base64(image)
+
+            # Extract mime type from the data URI or use default
+            mime_type = "image/jpeg"  # default
+            if "data:" in image_url and ";base64," in image_url:
+                mime_type = image_url.split(";base64,")[0].split("data:")[
+                    1
+                ]
+
+            # Ensure mime type is one of the supported formats
+            supported_formats = [
+                "image/jpeg",
+                "image/png",
+                "image/gif",
+                "image/webp",
+            ]
+            if mime_type not in supported_formats:
+                mime_type = (
+                    "image/jpeg"  # fallback to jpeg if unsupported
+                )
+
+            # Construct Anthropic vision message with base64
+            messages.append(
+                {
+                    "role": "user",
+                    "content": [
+                        {"type": "text", "text": task},
+                        {
+                            "type": "image_url",
+                            "image_url": {
+                                "url": image_url,
+                                "format": mime_type,
+                            },
+                        },
+                    ],
+                }
+            )

         return messages
@@ -260,21 +278,29 @@ class LiteLLM:
         Process vision input specifically for OpenAI models.
         Handles OpenAI's specific image format requirements.
         """
-        # Get base64 encoded image with proper format
-        image_url = get_image_base64(image)
-
-        # Prepare vision message
-        vision_message = {
-            "type": "image_url",
-            "image_url": {"url": image_url},
-        }
-
-        # Add format for specific models
-        extension = Path(image).suffix.lower()
-        mime_type = (
-            f"image/{extension[1:]}" if extension else "image/jpeg"
-        )
-        vision_message["image_url"]["format"] = mime_type
+        # Check if we can use direct URL
+        if self._should_use_direct_url(image):
+            # Use direct URL without base64 conversion
+            vision_message = {
+                "type": "image_url",
+                "image_url": {"url": image},
+            }
+        else:
+            # Fall back to base64 conversion for local files
+            image_url = get_image_base64(image)
+
+            # Prepare vision message with base64
+            vision_message = {
+                "type": "image_url",
+                "image_url": {"url": image_url},
+            }
+
+            # Add format for specific models
+            extension = Path(image).suffix.lower()
+            mime_type = (
+                f"image/{extension[1:]}" if extension else "image/jpeg"
+            )
+            vision_message["image_url"]["format"] = mime_type

         # Append vision message
         messages.append(
@@ -289,44 +315,61 @@ class LiteLLM:
         return messages

+    def _should_use_direct_url(self, image: str) -> bool:
+        """
+        Determine if we should use direct URL passing instead of base64 conversion.
+
+        Args:
+            image (str): The image source (URL or file path)
+
+        Returns:
+            bool: True if we should use direct URL, False if we need base64 conversion
+        """
+        # Only use direct URL for HTTP/HTTPS URLs
+        if not image.startswith(("http://", "https://")):
+            return False
+
+        # Check for local/custom models that might not support direct URLs
+        model_lower = self.model_name.lower()
+        local_indicators = ["localhost", "127.0.0.1", "local", "custom", "ollama", "llama-cpp"]
+        is_local = any(indicator in model_lower for indicator in local_indicators) or \
+                   (self.base_url is not None and any(indicator in self.base_url.lower() for indicator in local_indicators))
+        if is_local:
+            return False
+
+        # Use LiteLLM's supports_vision to check if model supports vision and direct URLs
+        try:
+            return supports_vision(model=self.model_name)
+        except Exception:
+            return False
+
     def vision_processing(
         self, task: str, image: str, messages: Optional[list] = None
     ):
         """
         Process the image for the given task.
         Handles different image formats and model requirements.
+
+        This method now intelligently chooses between:
+        1. Direct URL passing (when model supports it and image is a URL)
+        2. Base64 conversion (for local files or unsupported models)
+
+        This approach reduces server load and improves performance by avoiding
+        unnecessary image downloads and base64 conversions when possible.
         """
-        # # # Handle Anthropic models separately
-        # # if "anthropic" in self.model_name.lower() or "claude" in self.model_name.lower():
-        # #     messages = self.anthropic_vision_processing(task, image, messages)
-        # #     return messages
-
-        # # Get base64 encoded image with proper format
-        # image_url = get_image_base64(image)
-
-        # # Prepare vision message
-        # vision_message = {
-        #     "type": "image_url",
-        #     "image_url": {"url": image_url},
-        # }
-
-        # # Add format for specific models
-        # extension = Path(image).suffix.lower()
-        # mime_type = f"image/{extension[1:]}" if extension else "image/jpeg"
-        # vision_message["image_url"]["format"] = mime_type
-
-        # # Append vision message
-        # messages.append(
-        #     {
-        #         "role": "user",
-        #         "content": [
-        #             {"type": "text", "text": task},
-        #             vision_message,
-        #         ],
-        #     }
-        # )
-
-        # return messages
-
         logger.info(f"Processing image for model: {self.model_name}")

+        # Log whether we're using direct URL or base64 conversion
+        if self._should_use_direct_url(image):
+            logger.info(f"Using direct URL passing for image: {image[:100]}...")
+        else:
+            if image.startswith(("http://", "https://")):
+                logger.info("Converting URL image to base64 (model doesn't support direct URLs)")
+            else:
+                logger.info("Converting local file to base64")
+
         if (
             "anthropic" in self.model_name.lower()
             or "claude" in self.model_name.lower()
@@ -370,7 +413,16 @@ class LiteLLM:
     def check_if_model_supports_vision(self, img: str = None):
         """
-        Check if the model supports vision.
+        Check if the model supports vision capabilities.
+
+        This method uses LiteLLM's built-in supports_vision function to verify
+        that the model can handle image inputs before processing.
+
+        Args:
+            img (str, optional): Image path/URL to validate against model capabilities
+
+        Raises:
+            ValueError: If the model doesn't support vision and an image is provided
         """
         if img is not None:
             out = supports_vision(model=self.model_name)

@@ -201,6 +201,119 @@ def run_test_suite():
     except Exception as e:
         log_test_result("Batched Run", False, str(e))

+    # Test 8: Vision Support Check
+    try:
+        logger.info("Testing vision support check")
+        llm = LiteLLM(model_name="gpt-4o")
+        # This should not raise an error for vision-capable models
+        llm.check_if_model_supports_vision(img="test.jpg")
+        log_test_result("Vision Support Check", True)
+    except Exception as e:
+        log_test_result("Vision Support Check", False, str(e))
+
+    # Test 9: Direct URL Processing
+    try:
+        logger.info("Testing direct URL processing")
+        llm = LiteLLM(model_name="gpt-4o")
+        test_url = "https://github.com/kyegomez/swarms/blob/master/swarms_logo_new.png?raw=true"
+        should_use_direct = llm._should_use_direct_url(test_url)
+        assert isinstance(should_use_direct, bool)
+        log_test_result("Direct URL Processing", True)
+    except Exception as e:
+        log_test_result("Direct URL Processing", False, str(e))
+
+    # Test 10: Message Preparation with Image
+    try:
+        logger.info("Testing message preparation with image")
+        llm = LiteLLM(model_name="gpt-4o")
+        # Mock image URL to test message structure
+        test_img = "https://github.com/kyegomez/swarms/blob/master/swarms_logo_new.png?raw=true"
+        messages = llm._prepare_messages("Describe this image", img=test_img)
+        assert isinstance(messages, list)
+        assert len(messages) >= 1
+        # Check if image content is properly structured
+        user_message = next((msg for msg in messages if msg["role"] == "user"), None)
+        assert user_message is not None
+        log_test_result("Message Preparation with Image", True)
+    except Exception as e:
+        log_test_result("Message Preparation with Image", False, str(e))
+
+    # Test 11: Vision Processing Methods
+    try:
+        logger.info("Testing vision processing methods")
+        llm = LiteLLM(model_name="gpt-4o")
+        messages = []
+        # Test OpenAI vision processing
+        processed_messages = llm.openai_vision_processing(
+            "Describe this image",
+            "https://github.com/kyegomez/swarms/blob/master/swarms_logo_new.png?raw=true",
+            messages.copy()
+        )
+        assert isinstance(processed_messages, list)
+        assert len(processed_messages) > 0
+
+        # Test Anthropic vision processing
+        llm_anthropic = LiteLLM(model_name="claude-3-5-sonnet-20241022")
+        processed_messages_anthropic = llm_anthropic.anthropic_vision_processing(
+            "Describe this image",
+            "https://github.com/kyegomez/swarms/blob/master/swarms_logo_new.png?raw=true",
+            messages.copy()
+        )
+        assert isinstance(processed_messages_anthropic, list)
+        assert len(processed_messages_anthropic) > 0
+        log_test_result("Vision Processing Methods", True)
+    except Exception as e:
+        log_test_result("Vision Processing Methods", False, str(e))
+
+    # Test 12: Local vs URL Detection
+    try:
+        logger.info("Testing local vs URL detection")
+        llm = LiteLLM(model_name="gpt-4o")
+        # Test URL detection
+        url_test = "https://github.com/kyegomez/swarms/blob/master/swarms_logo_new.png?raw=true"
+        is_url_direct = llm._should_use_direct_url(url_test)
+        # Test local file detection
+        local_test = "/path/to/local/image.jpg"
+        is_local_direct = llm._should_use_direct_url(local_test)
+        # URLs should potentially use direct, local files should not
+        assert isinstance(is_url_direct, bool)
+        assert isinstance(is_local_direct, bool)
+        assert is_local_direct == False  # Local files should never use direct URL
+        log_test_result("Local vs URL Detection", True)
+    except Exception as e:
+        log_test_result("Local vs URL Detection", False, str(e))
+
+    # Test 13: Vision Message Structure
+    try:
+        logger.info("Testing vision message structure")
+        llm = LiteLLM(model_name="gpt-4o")
+        messages = []
+        # Test message structure for image input
+        result = llm.vision_processing(
+            task="What do you see?",
+            image="https://github.com/kyegomez/swarms/blob/master/swarms_logo_new.png?raw=true",
+            messages=messages
+        )
+        assert isinstance(result, list)
+        assert len(result) > 0
+        # Verify the message contains both text and image components
+        user_msg = result[-1]  # Last message should be user message
+        assert user_msg["role"] == "user"
+        assert "content" in user_msg
+        log_test_result("Vision Message Structure", True)
+    except Exception as e:
+        log_test_result("Vision Message Structure", False, str(e))
+
     # Generate test report
     success_rate = (passed_tests / total_tests) * 100
     logger.info("\n=== Test Suite Report ===")
