Changes from background composer bc-28e5b419-dd49-422f-a10f-4e570cd56ab5

4 days ago · 2c0eed5f9e
parent 440173817d
commit 2c0eed5f9e
2 changed files with 760 additions and 0 deletions
--- a/hierarchical_swarm_improvements.md
+++ b/hierarchical_swarm_improvements.md
@ -0,0 +1,185 @@
+# HierarchicalSwarm Improvement Plan
+
+## Current State Analysis
+
+The current HierarchicalSwarm implementation has several key components:
+- A director agent that creates plans and distributes orders
+- Worker agents that execute assigned tasks
+- Basic feedback loop system
+- Conversation history preservation
+- Simple ordering system with HierarchicalOrder
+
+## Identified Improvement Areas
+
+### 1. Enhanced Hierarchical Communication
+
+**Current Issues:**
+- Limited communication patterns (director → agents only)
+- No peer-to-peer agent communication
+- Static communication channels
+- Basic feedback mechanisms
+
+**Improvements:**
+- Multi-directional communication (director ↔ agents, agents ↔ agents)
+- Communication channels with priorities and routing
+- Structured message passing with protocols
+- Advanced feedback and escalation mechanisms
+
+### 2. Dynamic Role Assignment and Specialization
+
+**Current Issues:**
+- Static agent roles and responsibilities
+- No dynamic task reassignment
+- Limited specialization adaptation
+- Fixed agent capabilities
+
+**Improvements:**
+- Dynamic role assignment based on task complexity and agent performance
+- Skill-based agent selection and specialization
+- Adaptive capability enhancement
+- Role evolution and learning mechanisms
+
+### 3. Multi-level Hierarchy Support
+
+**Current Issues:**
+- Single director-agent hierarchy
+- No sub-swarm management
+- Limited scalability for large teams
+- No hierarchical clustering
+
+**Improvements:**
+- Multi-level hierarchy with middle managers
+- Sub-swarm creation and management
+- Hierarchical clustering algorithms
+- Scalable team structure management
+
+### 4. Advanced Coordination Mechanisms
+
+**Current Issues:**
+- Basic task distribution
+- No resource coordination
+- Limited load balancing
+- No conflict resolution
+
+**Improvements:**
+- Advanced task scheduling and distribution
+- Resource allocation and management
+- Intelligent load balancing
+- Conflict detection and resolution
+
+### 5. Performance Optimizations
+
+**Current Issues:**
+- Sequential task execution
+- No parallel processing optimization
+- Limited caching mechanisms
+- No performance monitoring
+
+**Improvements:**
+- Parallel task execution where possible
+- Intelligent caching and memoization
+- Performance monitoring and optimization
+- Resource usage optimization
+
+### 6. Error Handling and Recovery
+
+**Current Issues:**
+- Basic error logging
+- No recovery mechanisms
+- Limited fault tolerance
+- No graceful degradation
+
+**Improvements:**
+- Comprehensive error handling and recovery
+- Fault tolerance mechanisms
+- Graceful degradation strategies
+- Self-healing capabilities
+
+### 7. Adaptive Planning and Learning
+
+**Current Issues:**
+- Static planning approaches
+- No learning from past executions
+- Limited adaptation to changing conditions
+- No plan optimization
+
+**Improvements:**
+- Adaptive planning algorithms
+- Learning from execution history
+- Dynamic plan optimization
+- Context-aware planning
+
+### 8. Real-time Monitoring and Analytics
+
+**Current Issues:**
+- Limited monitoring capabilities
+- No performance analytics
+- Basic logging only
+- No real-time insights
+
+**Improvements:**
+- Real-time monitoring dashboard
+- Performance analytics and insights
+- Predictive monitoring
+- Advanced logging and metrics
+
+## Implementation Strategy
+
+### Phase 1: Core Communication Enhancement
+1. Enhanced communication protocols
+2. Multi-directional message passing
+3. Priority-based routing
+4. Advanced feedback mechanisms
+
+### Phase 2: Dynamic Role Management
+1. Dynamic role assignment system
+2. Skill-based agent selection
+3. Performance-based specialization
+4. Adaptive capability enhancement
+
+### Phase 3: Multi-level Hierarchy
+1. Sub-swarm management
+2. Hierarchical clustering
+3. Middle manager agents
+4. Scalable team structures
+
+### Phase 4: Advanced Coordination
+1. Intelligent task scheduling
+2. Resource allocation optimization
+3. Load balancing algorithms
+4. Conflict resolution mechanisms
+
+### Phase 5: Performance and Reliability
+1. Parallel processing optimization
+2. Caching and memoization
+3. Error handling and recovery
+4. Monitoring and analytics
+
+## Expected Benefits
+
+1. **Improved Efficiency**: Better task distribution and parallel processing
+2. **Enhanced Scalability**: Support for larger and more complex swarms
+3. **Better Coordination**: Advanced communication and coordination mechanisms
+4. **Higher Reliability**: Robust error handling and recovery
+5. **Adaptive Performance**: Learning and optimization capabilities
+6. **Better Monitoring**: Real-time insights and analytics
+7. **Flexible Architecture**: Support for diverse use cases and requirements
+
+## Implementation Timeline
+
+- **Phase 1**: 2-3 weeks
+- **Phase 2**: 2-3 weeks  
+- **Phase 3**: 3-4 weeks
+- **Phase 4**: 2-3 weeks
+- **Phase 5**: 3-4 weeks
+
+**Total Estimated Timeline**: 12-17 weeks
+
+## Pull Request Strategy
+
+Each phase will result in separate pull requests:
+1. `feat: Enhanced communication protocols for HierarchicalSwarm`
+2. `feat: Dynamic role assignment and specialization system`
+3. `feat: Multi-level hierarchy support with sub-swarms`
+4. `feat: Advanced coordination and scheduling mechanisms`
+5. `feat: Performance optimization and monitoring system`
--- a/swarms/structs/communication.py
+++ b/swarms/structs/communication.py
@ -0,0 +1,575 @@
+"""
+Enhanced Communication Protocol System for HierarchicalSwarm
+
+This module provides advanced communication capabilities including:
+- Multi-directional message passing
+- Priority-based routing
+- Message queuing and buffering
+- Communication channels with different protocols
+- Advanced feedback mechanisms
+"""
+
+import asyncio
+import json
+import time
+import uuid
+from dataclasses import dataclass, field
+from enum import Enum
+from typing import Any, Dict, List, Optional, Protocol, Union, Callable
+from collections import defaultdict, deque
+import threading
+from concurrent.futures import ThreadPoolExecutor
+import logging
+
+logger = logging.getLogger(__name__)
+
+
+class MessageType(Enum):
+    """Types of messages in the communication system"""
+    TASK_ASSIGNMENT = "task_assignment"
+    TASK_COMPLETION = "task_completion"
+    FEEDBACK = "feedback"
+    QUERY = "query"
+    RESPONSE = "response"
+    BROADCAST = "broadcast"
+    ESCALATION = "escalation"
+    COORDINATION = "coordination"
+    RESOURCE_REQUEST = "resource_request"
+    RESOURCE_RESPONSE = "resource_response"
+    STATUS_UPDATE = "status_update"
+    ERROR_REPORT = "error_report"
+    HANDOFF = "handoff"
+    COLLABORATION = "collaboration"
+
+
+class MessagePriority(Enum):
+    """Priority levels for messages"""
+    CRITICAL = 1
+    HIGH = 2
+    MEDIUM = 3
+    LOW = 4
+
+
+class MessageStatus(Enum):
+    """Status of messages"""
+    PENDING = "pending"
+    PROCESSING = "processing"
+    COMPLETED = "completed"
+    FAILED = "failed"
+    TIMEOUT = "timeout"
+
+
+@dataclass
+class Message:
+    """Enhanced message structure for hierarchical communication"""
+    id: str = field(default_factory=lambda: str(uuid.uuid4()))
+    sender_id: str = ""
+    receiver_id: str = ""
+    message_type: MessageType = MessageType.TASK_ASSIGNMENT
+    priority: MessagePriority = MessagePriority.MEDIUM
+    content: Dict[str, Any] = field(default_factory=dict)
+    metadata: Dict[str, Any] = field(default_factory=dict)
+    timestamp: float = field(default_factory=time.time)
+    expiry_time: Optional[float] = None
+    requires_response: bool = False
+    parent_message_id: Optional[str] = None
+    conversation_id: Optional[str] = None
+    status: MessageStatus = MessageStatus.PENDING
+    retry_count: int = 0
+    max_retries: int = 3
+    
+    def is_expired(self) -> bool:
+        """Check if message has expired"""
+        return self.expiry_time is not None and time.time() > self.expiry_time
+    
+    def to_dict(self) -> Dict[str, Any]:
+        """Convert message to dictionary"""
+        return {
+            'id': self.id,
+            'sender_id': self.sender_id,
+            'receiver_id': self.receiver_id,
+            'message_type': self.message_type.value,
+            'priority': self.priority.value,
+            'content': self.content,
+            'metadata': self.metadata,
+            'timestamp': self.timestamp,
+            'expiry_time': self.expiry_time,
+            'requires_response': self.requires_response,
+            'parent_message_id': self.parent_message_id,
+            'conversation_id': self.conversation_id,
+            'status': self.status.value,
+            'retry_count': self.retry_count,
+            'max_retries': self.max_retries
+        }
+
+
+class CommunicationProtocol(Protocol):
+    """Protocol for communication handlers"""
+    
+    def handle_message(self, message: Message) -> Optional[Message]:
+        """Handle an incoming message"""
+        pass
+    
+    def can_handle(self, message_type: MessageType) -> bool:
+        """Check if this protocol can handle the message type"""
+        pass
+
+
+class MessageQueue:
+    """Thread-safe message queue with priority support"""
+    
+    def __init__(self, max_size: int = 1000):
+        self.queues = {
+            MessagePriority.CRITICAL: deque(),
+            MessagePriority.HIGH: deque(),
+            MessagePriority.MEDIUM: deque(),
+            MessagePriority.LOW: deque()
+        }
+        self.max_size = max_size
+        self.lock = threading.Lock()
+        self.condition = threading.Condition(self.lock)
+        self.size = 0
+        
+    def put(self, message: Message, timeout: Optional[float] = None) -> bool:
+        """Add message to queue with timeout"""
+        with self.condition:
+            while self.size >= self.max_size:
+                if timeout is None:
+                    return False
+                if not self.condition.wait(timeout):
+                    return False
+            
+            if message.is_expired():
+                return False
+                
+            self.queues[message.priority].append(message)
+            self.size += 1
+            self.condition.notify()
+            return True
+    
+    def get(self, timeout: Optional[float] = None) -> Optional[Message]:
+        """Get message from queue with timeout"""
+        with self.condition:
+            while self.size == 0:
+                if timeout is None:
+                    self.condition.wait()
+                elif not self.condition.wait(timeout):
+                    return None
+            
+            # Get highest priority message
+            for priority in MessagePriority:
+                if self.queues[priority]:
+                    message = self.queues[priority].popleft()
+                    self.size -= 1
+                    self.condition.notify()
+                    return message
+            
+            return None
+    
+    def peek(self) -> Optional[Message]:
+        """Peek at next message without removing it"""
+        with self.lock:
+            for priority in MessagePriority:
+                if self.queues[priority]:
+                    return self.queues[priority][0]
+            return None
+    
+    def clear_expired(self):
+        """Remove expired messages from queue"""
+        with self.lock:
+            for priority in MessagePriority:
+                queue = self.queues[priority]
+                expired_count = 0
+                while queue and queue[0].is_expired():
+                    queue.popleft()
+                    expired_count += 1
+                self.size -= expired_count
+
+
+class CommunicationChannel:
+    """Communication channel between agents"""
+    
+    def __init__(self, 
+                 channel_id: str,
+                 participants: List[str],
+                 channel_type: str = "direct",
+                 max_queue_size: int = 100):
+        self.channel_id = channel_id
+        self.participants = set(participants)
+        self.channel_type = channel_type
+        self.message_queue = MessageQueue(max_queue_size)
+        self.message_handlers: Dict[MessageType, List[Callable]] = defaultdict(list)
+        self.active = True
+        self.created_at = time.time()
+        
+    def add_participant(self, participant_id: str):
+        """Add participant to channel"""
+        self.participants.add(participant_id)
+        
+    def remove_participant(self, participant_id: str):
+        """Remove participant from channel"""
+        self.participants.discard(participant_id)
+        
+    def send_message(self, message: Message) -> bool:
+        """Send message through channel"""
+        if not self.active:
+            return False
+            
+        if message.sender_id not in self.participants:
+            return False
+            
+        if message.receiver_id and message.receiver_id not in self.participants:
+            return False
+            
+        return self.message_queue.put(message)
+    
+    def receive_message(self, timeout: Optional[float] = None) -> Optional[Message]:
+        """Receive message from channel"""
+        if not self.active:
+            return None
+            
+        return self.message_queue.get(timeout)
+    
+    def register_handler(self, message_type: MessageType, handler: Callable):
+        """Register message handler"""
+        self.message_handlers[message_type].append(handler)
+    
+    def handle_message(self, message: Message) -> List[Message]:
+        """Handle message using registered handlers"""
+        responses = []
+        for handler in self.message_handlers.get(message.message_type, []):
+            try:
+                response = handler(message)
+                if response:
+                    responses.append(response)
+            except Exception as e:
+                logger.error(f"Error handling message {message.id}: {e}")
+                
+        return responses
+
+
+class CommunicationRouter:
+    """Routes messages between agents and channels"""
+    
+    def __init__(self):
+        self.channels: Dict[str, CommunicationChannel] = {}
+        self.agent_channels: Dict[str, List[str]] = defaultdict(list)
+        self.message_history: Dict[str, List[Message]] = defaultdict(list)
+        self.routing_table: Dict[str, str] = {}  # agent_id -> preferred_channel
+        self.lock = threading.Lock()
+        
+    def create_channel(self, 
+                      channel_id: str,
+                      participants: List[str],
+                      channel_type: str = "direct") -> CommunicationChannel:
+        """Create new communication channel"""
+        with self.lock:
+            channel = CommunicationChannel(channel_id, participants, channel_type)
+            self.channels[channel_id] = channel
+            
+            for participant in participants:
+                self.agent_channels[participant].append(channel_id)
+                
+            return channel
+    
+    def get_channel(self, channel_id: str) -> Optional[CommunicationChannel]:
+        """Get communication channel"""
+        return self.channels.get(channel_id)
+    
+    def route_message(self, message: Message) -> bool:
+        """Route message to appropriate channel"""
+        with self.lock:
+            # Find appropriate channel
+            sender_channels = self.agent_channels.get(message.sender_id, [])
+            receiver_channels = self.agent_channels.get(message.receiver_id, [])
+            
+            # Find common channel
+            common_channels = set(sender_channels) & set(receiver_channels)
+            
+            if not common_channels:
+                # Create direct channel if none exists
+                channel_id = f"{message.sender_id}_{message.receiver_id}"
+                channel = self.create_channel(
+                    channel_id, 
+                    [message.sender_id, message.receiver_id],
+                    "direct"
+                )
+                common_channels = {channel_id}
+            
+            # Use first available channel
+            channel_id = next(iter(common_channels))
+            channel = self.channels[channel_id]
+            
+            # Store message in history
+            self.message_history[message.conversation_id or "default"].append(message)
+            
+            return channel.send_message(message)
+    
+    def broadcast_message(self, message: Message, channel_ids: List[str]) -> Dict[str, bool]:
+        """Broadcast message to multiple channels"""
+        results = {}
+        for channel_id in channel_ids:
+            channel = self.channels.get(channel_id)
+            if channel:
+                results[channel_id] = channel.send_message(message)
+            else:
+                results[channel_id] = False
+        return results
+    
+    def get_agent_channels(self, agent_id: str) -> List[str]:
+        """Get channels for an agent"""
+        return self.agent_channels.get(agent_id, [])
+    
+    def get_conversation_history(self, conversation_id: str) -> List[Message]:
+        """Get conversation history"""
+        return self.message_history.get(conversation_id, [])
+
+
+class FeedbackSystem:
+    """Advanced feedback system for hierarchical communication"""
+    
+    def __init__(self):
+        self.feedback_history: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
+        self.performance_metrics: Dict[str, Dict[str, float]] = defaultdict(dict)
+        self.feedback_processors: Dict[str, Callable] = {}
+        
+    def register_feedback_processor(self, feedback_type: str, processor: Callable):
+        """Register feedback processor"""
+        self.feedback_processors[feedback_type] = processor
+        
+    def process_feedback(self, 
+                        agent_id: str,
+                        feedback_type: str,
+                        feedback_data: Dict[str, Any]) -> Dict[str, Any]:
+        """Process feedback for an agent"""
+        processor = self.feedback_processors.get(feedback_type)
+        if processor:
+            processed_feedback = processor(feedback_data)
+        else:
+            processed_feedback = feedback_data
+            
+        # Store feedback history
+        feedback_entry = {
+            'timestamp': time.time(),
+            'type': feedback_type,
+            'data': processed_feedback,
+            'agent_id': agent_id
+        }
+        self.feedback_history[agent_id].append(feedback_entry)
+        
+        # Update performance metrics
+        self._update_performance_metrics(agent_id, feedback_type, processed_feedback)
+        
+        return processed_feedback
+    
+    def _update_performance_metrics(self, 
+                                   agent_id: str, 
+                                   feedback_type: str,
+                                   feedback_data: Dict[str, Any]):
+        """Update performance metrics based on feedback"""
+        metrics = self.performance_metrics[agent_id]
+        
+        # Extract numeric metrics from feedback
+        for key, value in feedback_data.items():
+            if isinstance(value, (int, float)):
+                metric_key = f"{feedback_type}_{key}"
+                if metric_key in metrics:
+                    # Simple moving average
+                    metrics[metric_key] = (metrics[metric_key] + value) / 2
+                else:
+                    metrics[metric_key] = value
+    
+    def get_agent_performance(self, agent_id: str) -> Dict[str, float]:
+        """Get performance metrics for an agent"""
+        return self.performance_metrics.get(agent_id, {})
+    
+    def get_agent_feedback_history(self, agent_id: str) -> List[Dict[str, Any]]:
+        """Get feedback history for an agent"""
+        return self.feedback_history.get(agent_id, [])
+
+
+class EscalationManager:
+    """Manages escalation of issues in hierarchical communication"""
+    
+    def __init__(self):
+        self.escalation_rules: Dict[str, Dict[str, Any]] = {}
+        self.escalation_history: List[Dict[str, Any]] = []
+        self.escalation_handlers: Dict[str, Callable] = {}
+        
+    def register_escalation_rule(self, 
+                                rule_id: str,
+                                condition: Callable,
+                                escalation_target: str,
+                                escalation_level: int = 1):
+        """Register escalation rule"""
+        self.escalation_rules[rule_id] = {
+            'condition': condition,
+            'target': escalation_target,
+            'level': escalation_level
+        }
+        
+    def register_escalation_handler(self, level: int, handler: Callable):
+        """Register escalation handler for specific level"""
+        self.escalation_handlers[f"level_{level}"] = handler
+        
+    def check_escalation(self, message: Message) -> Optional[str]:
+        """Check if message should be escalated"""
+        for rule_id, rule in self.escalation_rules.items():
+            if rule['condition'](message):
+                return rule['target']
+        return None
+    
+    def escalate_message(self, message: Message, escalation_target: str) -> Message:
+        """Escalate message to higher level"""
+        escalation_message = Message(
+            sender_id=message.receiver_id,
+            receiver_id=escalation_target,
+            message_type=MessageType.ESCALATION,
+            priority=MessagePriority.HIGH,
+            content={
+                'original_message': message.to_dict(),
+                'escalation_reason': "Automatic escalation triggered",
+                'escalation_timestamp': time.time()
+            },
+            parent_message_id=message.id,
+            conversation_id=message.conversation_id
+        )
+        
+        # Record escalation
+        self.escalation_history.append({
+            'timestamp': time.time(),
+            'original_message_id': message.id,
+            'escalation_message_id': escalation_message.id,
+            'escalation_target': escalation_target
+        })
+        
+        return escalation_message
+
+
+class CommunicationManager:
+    """Main communication manager for hierarchical swarm"""
+    
+    def __init__(self):
+        self.router = CommunicationRouter()
+        self.feedback_system = FeedbackSystem()
+        self.escalation_manager = EscalationManager()
+        self.active_conversations: Dict[str, Dict[str, Any]] = {}
+        self.message_processors: List[Callable] = []
+        self.executor = ThreadPoolExecutor(max_workers=10)
+        self.running = False
+        
+    def start(self):
+        """Start the communication manager"""
+        self.running = True
+        
+    def stop(self):
+        """Stop the communication manager"""
+        self.running = False
+        self.executor.shutdown(wait=True)
+        
+    def register_message_processor(self, processor: Callable):
+        """Register message processor"""
+        self.message_processors.append(processor)
+        
+    def send_message(self, message: Message) -> bool:
+        """Send message through the system"""
+        if not self.running:
+            return False
+            
+        # Process message through registered processors
+        for processor in self.message_processors:
+            try:
+                message = processor(message) or message
+            except Exception as e:
+                logger.error(f"Error processing message: {e}")
+                return False
+        
+        # Check for escalation
+        escalation_target = self.escalation_manager.check_escalation(message)
+        if escalation_target:
+            escalation_message = self.escalation_manager.escalate_message(message, escalation_target)
+            self.router.route_message(escalation_message)
+        
+        # Route message
+        return self.router.route_message(message)
+    
+    def create_conversation(self, 
+                          conversation_id: str,
+                          participants: List[str],
+                          conversation_type: str = "group") -> str:
+        """Create new conversation"""
+        channel_id = f"conv_{conversation_id}"
+        channel = self.router.create_channel(channel_id, participants, conversation_type)
+        
+        self.active_conversations[conversation_id] = {
+            'channel_id': channel_id,
+            'participants': participants,
+            'type': conversation_type,
+            'created_at': time.time()
+        }
+        
+        return channel_id
+    
+    def get_conversation_messages(self, conversation_id: str) -> List[Message]:
+        """Get messages from conversation"""
+        return self.router.get_conversation_history(conversation_id)
+    
+    def process_feedback(self, 
+                        agent_id: str,
+                        feedback_type: str,
+                        feedback_data: Dict[str, Any]) -> Dict[str, Any]:
+        """Process feedback for an agent"""
+        return self.feedback_system.process_feedback(agent_id, feedback_type, feedback_data)
+    
+    def get_agent_performance(self, agent_id: str) -> Dict[str, float]:
+        """Get performance metrics for an agent"""
+        return self.feedback_system.get_agent_performance(agent_id)
+    
+    async def async_send_message(self, message: Message) -> bool:
+        """Send message asynchronously"""
+        loop = asyncio.get_event_loop()
+        return await loop.run_in_executor(self.executor, self.send_message, message)
+    
+    def broadcast_to_all_agents(self, 
+                               message: Message,
+                               exclude_agents: Optional[List[str]] = None) -> Dict[str, bool]:
+        """Broadcast message to all agents"""
+        exclude_agents = exclude_agents or []
+        all_agents = set()
+        
+        for agent_id in self.router.agent_channels.keys():
+            if agent_id not in exclude_agents:
+                all_agents.add(agent_id)
+        
+        # Create broadcast message for each agent
+        results = {}
+        for agent_id in all_agents:
+            agent_message = Message(
+                sender_id=message.sender_id,
+                receiver_id=agent_id,
+                message_type=MessageType.BROADCAST,
+                priority=message.priority,
+                content=message.content,
+                metadata=message.metadata,
+                conversation_id=message.conversation_id
+            )
+            results[agent_id] = self.send_message(agent_message)
+        
+        return results
+    
+    def create_agent_channel(self, agent_id: str) -> str:
+        """Create dedicated channel for agent"""
+        channel_id = f"agent_{agent_id}"
+        self.router.create_channel(channel_id, [agent_id], "agent")
+        return channel_id
+    
+    def get_channel_statistics(self) -> Dict[str, Any]:
+        """Get communication statistics"""
+        stats = {
+            'total_channels': len(self.router.channels),
+            'active_conversations': len(self.active_conversations),
+            'total_agents': len(self.router.agent_channels),
+            'message_history_size': sum(len(history) for history in self.router.message_history.values()),
+            'escalation_count': len(self.escalation_manager.escalation_history)
+        }
+        return stats