diff --git a/README.md b/README.md
index 9b7eeccf..8b1fc68a 100644
--- a/README.md
+++ b/README.md
@@ -23,7 +23,7 @@ At Swarms, we're transforming the landscape of AI from siloed AI agents to a uni
-----
# 🤝 Schedule a 1-on-1 Session
-Book a [1-on-1 Session with Kye](https://calendly.com/apacai/agora), the Creator, to discuss any issues, provide feedback, or explore how we can improve Swarms for you.
+Book a [1-on-1 Session with Kye](https://calendly.com/swarm-corp/30min), the Creator, to discuss any issues, provide feedback, or explore how we can improve Swarms for you.
----------
diff --git a/docs/features/20swarms.md b/docs/features/20swarms.md
new file mode 100644
index 00000000..5385b2f5
--- /dev/null
+++ b/docs/features/20swarms.md
@@ -0,0 +1,187 @@
+
+# Swarm Alpha: Data Cruncher
+**Overview**: Processes large datasets.
+**Strengths**: Efficient data handling.
+**Weaknesses**: Requires structured data.
+
+**Pseudo Code**:
+```text
+FOR each data_entry IN dataset:
+    result = PROCESS(data_entry)
+    STORE(result)
+END FOR
+RETURN AGGREGATE(stored_results)
+```
+
+# Swarm Beta: Artistic Ally
+**Overview**: Generates art pieces.
+**Strengths**: Creativity.
+**Weaknesses**: Somewhat unpredictable.
+
+**Pseudo Code**:
+```text
+INITIATE canvas_parameters
+SELECT art_style
+DRAW(canvas_parameters, art_style)
+RETURN finished_artwork
+```
+
+# Swarm Gamma: Sound Sculptor
+**Overview**: Crafts audio sequences.
+**Strengths**: Diverse audio outputs.
+**Weaknesses**: Complexity in refining outputs.
+
+**Pseudo Code**:
+```text
+DEFINE sound_parameters
+SELECT audio_style
+GENERATE_AUDIO(sound_parameters, audio_style)
+RETURN audio_sequence
+```
+
+# Swarm Delta: Web Weaver
+**Overview**: Constructs web designs.
+**Strengths**: Modern design sensibility.
+**Weaknesses**: Limited to web interfaces.
+
+**Pseudo Code**:
+```text
+SELECT template
+APPLY user_preferences TO template
+DESIGN_WEB(template, user_preferences)
+RETURN web_design
+```
+
+# Swarm Epsilon: Code Compiler
+**Overview**: Writes and compiles code snippets.
+**Strengths**: Quick code generation.
+**Weaknesses**: Limited to certain programming languages.
+
+**Pseudo Code**:
+```text
+DEFINE coding_task
+code = WRITE_CODE(coding_task)
+executable = COMPILE(code)
+RETURN executable
+```
+
+# Swarm Zeta: Security Shield
+**Overview**: Detects system vulnerabilities.
+**Strengths**: High threat detection rate.
+**Weaknesses**: Potential false positives.
+
+**Pseudo Code**:
+```text
+MONITOR system_activity
+IF suspicious_activity_detected:
+    ANALYZE threat_level
+    INITIATE mitigation_protocol
+END IF
+RETURN system_status
+```
+
+# Swarm Eta: Researcher Relay
+**Overview**: Gathers and synthesizes research data.
+**Strengths**: Access to vast databases.
+**Weaknesses**: Depth of research can vary.
+
+**Pseudo Code**:
+```text
+DEFINE research_topic
+SEARCH research_sources(research_topic)
+SYNTHESIZE findings
+RETURN research_summary
+```
+
+---
+
+# Swarm Theta: Sentiment Scanner
+**Overview**: Analyzes text for sentiment and emotional tone.
+**Strengths**: Accurate sentiment detection.
+**Weaknesses**: Contextual nuances might be missed.
+
+**Pseudo Code**:
+```text
+INPUT text_data
+ANALYZE text_data FOR emotional_tone
+DETERMINE sentiment_value
+RETURN sentiment_value
+```
+
+# Swarm Iota: Image Interpreter
+**Overview**: Processes and categorizes images.
+**Strengths**: High image recognition accuracy.
+**Weaknesses**: Can struggle with abstract visuals.
+
+**Pseudo Code**:
+```text
+LOAD image_data
+PROCESS image_data FOR features
+CATEGORIZE image_based_on_features
+RETURN image_category
+```
+
+# Swarm Kappa: Language Learner
+**Overview**: Translates and interprets multiple languages.
+**Strengths**: Supports multiple languages.
+**Weaknesses**: Nuances in dialects might pose challenges.
+
+**Pseudo Code**:
+```text
+RECEIVE input_text, target_language
+TRANSLATE input_text TO target_language
+RETURN translated_text
+```
+
+# Swarm Lambda: Trend Tracker
+**Overview**: Monitors and predicts trends based on data.
+**Strengths**: Proactive trend identification.
+**Weaknesses**: Requires continuous data stream.
+
+**Pseudo Code**:
+```text
+COLLECT data_over_time
+ANALYZE data_trends
+PREDICT upcoming_trends
+RETURN trend_forecast
+```
+
+# Swarm Mu: Financial Forecaster
+**Overview**: Analyzes financial data to predict market movements.
+**Strengths**: In-depth financial analytics.
+**Weaknesses**: Market volatility can affect predictions.
+
+**Pseudo Code**:
+```text
+GATHER financial_data
+COMPUTE statistical_analysis
+FORECAST market_movements
+RETURN financial_projections
+```
+
+# Swarm Nu: Network Navigator
+**Overview**: Optimizes and manages network traffic.
+**Strengths**: Efficient traffic management.
+**Weaknesses**: Depends on network infrastructure.
+
+**Pseudo Code**:
+```text
+MONITOR network_traffic
+IDENTIFY congestion_points
+OPTIMIZE traffic_flow
+RETURN network_status
+```
+
+# Swarm Xi: Content Curator
+**Overview**: Gathers and presents content based on user preferences.
+**Strengths**: Personalized content delivery.
+**Weaknesses**: Limited by available content sources.
+
+**Pseudo Code**:
+```text
+DEFINE user_preferences
+SEARCH content_sources
+FILTER content_matching_preferences
+DISPLAY curated_content
+```
+
diff --git a/docs/features/SMAPS.md b/docs/features/SMAPS.md
new file mode 100644
index 00000000..c1e60de3
--- /dev/null
+++ b/docs/features/SMAPS.md
@@ -0,0 +1,50 @@
+# Swarms Multi-Agent Permissions System (SMAPS)
+
+## Description
+SMAPS is a permissions management system for the Swarms multi-agent AI framework. Modeled on Amazon's IAM, it provides granular control over agent actions and supports collaborative human-in-the-loop intervention.
+
+## Technical Specification
+
+### 1. Components
+
+- **User Management**: Handle user registrations, roles, and profiles.
+- **Agent Management**: Register, monitor, and manage AI agents.
+- **Permissions Engine**: Define and enforce permissions based on roles.
+- **Multiplayer Interface**: Allows multiple human users to intervene, guide, or collaborate on tasks being executed by AI agents.
+
+### 2. Features
+
+- **Role-Based Access Control (RBAC)**:
+  - Users can be assigned predefined roles (e.g., Admin, Agent Supervisor, Collaborator).
+  - Each role has specific permissions associated with it, defining what actions can be performed on AI agents or tasks.
+
+- **Dynamic Permissions**:
+  - Create custom roles with specific permissions.
+  - Permissions granularity: From broad (e.g., view all tasks) to specific (e.g., modify parameters of a particular agent).
+
+- **Multiplayer Collaboration**:
+  - Multiple users can join a task in real time.
+  - Collaborators can provide real-time feedback or guidance to AI agents.
+  - A voting system for decision-making when human intervention is required.
+
+- **Agent Supervision**:
+  - Monitor agent actions in real time.
+  - Intervene, if necessary, to guide agent actions based on permissions.
+
+- **Audit Trail**:
+  - All actions, whether performed by humans or AI agents, are logged.
+  - Review historical actions, decisions, and interventions for accountability and improvement.
+
+### 3. Security
+
+- **Authentication**: Secure login mechanisms with multi-factor authentication options.
+- **Authorization**: Ensure users and agents can only perform actions they are permitted to.
+- **Data Encryption**: All data, whether at rest or in transit, is encrypted using industry-standard protocols.
+
+### 4. Integration
+
+- **APIs**: Expose APIs for integrating SMAPS with other systems or for extending its capabilities.
+- **SDK**: Provide software development kits for popular programming languages to facilitate integration and extension.
+
+## Documentation Description
+The Swarms Multi-Agent Permissions System (SMAPS) provides permissions management for multi-agent AI frameworks. It combines IAM-style role and permission controls with a "multiplayer" feature that lets multiple humans guide AI agents collaboratively in real time. Together with the audit trail, this gives teams control over what their agents are allowed to do and visibility into what they actually did.
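The role-based checks and audit trail described in `docs/features/SMAPS.md` above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the SMAPS implementation: the role names come from the document, while the permission strings, the `ROLE_PERMISSIONS` table, and the `is_allowed` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Hypothetical permission strings; SMAPS does not define a concrete vocabulary.
ROLE_PERMISSIONS = {
    "Admin": {"task:view", "task:intervene", "agent:modify", "role:create"},
    "Agent Supervisor": {"task:view", "task:intervene"},
    "Collaborator": {"task:view"},
}


@dataclass
class User:
    name: str
    roles: Set[str] = field(default_factory=set)


def is_allowed(user: User, permission: str, audit_log: List[Tuple[str, str, bool]]) -> bool:
    """Return True if any of the user's roles grants the permission.

    Every decision is appended to audit_log, mirroring the audit trail feature.
    """
    allowed = any(
        permission in ROLE_PERMISSIONS.get(role, set()) for role in user.roles
    )
    audit_log.append((user.name, permission, allowed))
    return allowed


audit_log: List[Tuple[str, str, bool]] = []
supervisor = User("dana", {"Agent Supervisor"})
print(is_allowed(supervisor, "task:intervene", audit_log))
print(is_allowed(supervisor, "agent:modify", audit_log))
print(audit_log)
```

Unknown roles grant nothing, so the check fails closed; a custom role would be one more entry in the table.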
diff --git a/docs/features/agent_archive.md b/docs/features/agent_archive.md
new file mode 100644
index 00000000..d69e18ce
--- /dev/null
+++ b/docs/features/agent_archive.md
@@ -0,0 +1,73 @@
+# AgentArchive Documentation
+## Swarms Multi-Agent Framework
+
+**AgentArchive archives, bookmarks, and indexes the transcripts of agent runs. Storing successful interactions lets users derive reusable "recipes" for future agents, and the public archive lets them share runs with the community and learn from the runs others share.**
+
+---
+
+## Overview:
+
+AgentArchive empowers users to:
+1. Preserve complete transcripts of agent instances.
+2. Bookmark and annotate significant runs.
+3. Categorize runs using various tags.
+4. Transform successful runs into actionable "recipes".
+5. Publish and access a shared knowledge base via a public archive.
+
+---
+
+## Features:
+
+### 1. Archiving:
+
+- **Save Transcripts**: Retain the full narrative of an agent's interaction and choices.
+- **Searchable Database**: Dive into archives using specific keywords, timestamps, or tags.
+
+### 2. Bookmarking:
+
+- **Highlight Essential Runs**: Designate specific agent runs for future reference.
+- **Annotations**: Attach notes or remarks to bookmarked runs for clearer understanding.
+
+### 3. Tagging:
+
+Organize and classify agent runs via:
+- **Prompt**: The originating instruction that triggered the agent run.
+- **Tasks**: Distinct tasks or operations executed by the agent.
+- **Model**: The specific AI model or iteration used during the interaction.
+- **Temperature (Temp)**: The sampling temperature used for the run, which controls how random the agent's output is.
+
+### 4. Recipe Generation:
+
+- **Standardization**: Convert successful run transcripts into replicable "recipes".
+- **Guidance**: Offer subsequent agents a structured approach, rooted in prior successes.
+- **Evolution**: Periodically refine recipes based on newer, enhanced runs.
+
+### 5. Public Archive & Sharing:
+
+- **Publish Successful Runs**: Users can choose to share their successful agent runs.
+- **Collaborative Knowledge Base**: Access a shared repository of successful agent interactions from the community.
+- **Ratings & Reviews**: Users can rate and review shared runs, highlighting particularly effective "recipes."
+- **Privacy & Redaction**: Ensure that any sensitive information is automatically redacted before publishing.
+
+---
+
+## Benefits:
+
+1. **Efficiency**: Revisit past agent activities to inform and guide future decisions.
+2. **Consistency**: Guarantee a uniform approach to recurring challenges, leading to predictable and trustworthy outcomes.
+3. **Collaborative Learning**: Tap into a reservoir of shared experiences, fostering community-driven learning and growth.
+4. **Transparency**: By sharing successful runs, users can build trust and contribute to the broader community's success.
+
+---
+
+## Usage:
+
+1. **Access AgentArchive**: Navigate to the dedicated section within the Swarms Multi-Agent Framework dashboard.
+2. **Search, Filter & Organize**: Utilize the search bar and tagging system for precise retrieval.
+3. **Bookmark, Annotate & Share**: Pin important runs, add notes, and consider sharing with the broader community.
+4. **Engage with Public Archive**: Explore, rate, and apply shared knowledge to enhance agent performance.
+
+---
+
+With AgentArchive, users not only benefit from their past interactions but can also leverage the collective expertise of the Swarms community, ensuring continuous improvement and shared success.
+
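The tagging and search features described in `docs/features/agent_archive.md` above might be modeled along these lines. The `ArchivedRun` record and `search_runs` function are hypothetical names chosen for illustration; the document does not specify a storage format or API.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class ArchivedRun:
    prompt: str           # the originating instruction
    model: str            # model used for the run
    temperature: float    # sampling temperature
    transcript: str       # full record of the run
    tags: Set[str] = field(default_factory=set)
    bookmarked: bool = False
    note: str = ""        # annotation attached to a bookmarked run


def search_runs(
    runs: Iterable[ArchivedRun], keyword: str = "", tags: Iterable[str] = ()
) -> List[ArchivedRun]:
    """Return runs whose transcript contains keyword and that carry all given tags."""
    wanted = set(tags)
    needle = keyword.lower()
    return [
        run
        for run in runs
        if needle in run.transcript.lower() and wanted <= run.tags
    ]


archive = [
    ArchivedRun("Summarize the report", "model-a", 0.2,
                "Summary produced in 3 steps", {"summarization"}),
    ArchivedRun("Fix the failing test", "model-b", 0.0,
                "Patched the off-by-one error", {"coding", "bugfix"},
                bookmarked=True, note="Good recipe candidate"),
]
for run in search_runs(archive, keyword="patched", tags=["coding"]):
    print(run.prompt, "|", run.note)
```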
diff --git a/docs/features/fail_protocol.md b/docs/features/fail_protocol.md
new file mode 100644
index 00000000..cc0a6b99
--- /dev/null
+++ b/docs/features/fail_protocol.md
@@ -0,0 +1,67 @@
+# Swarms Multi-Agent Framework Documentation
+
+## Table of Contents
+- Agent Failure Protocol
+- Swarm Failure Protocol
+
+---
+
+## Agent Failure Protocol
+
+### 1. Overview
+Agent failures may arise from bugs, unexpected inputs, or external system changes. This protocol aims to diagnose, address, and prevent such failures.
+
+### 2. Root Cause Analysis
+- **Data Collection**: Record the task, inputs, and environmental variables present during the failure.
+- **Diagnostic Tests**: Run the agent in a controlled environment replicating the failure scenario.
+- **Error Logging**: Analyze error logs to identify patterns or anomalies.
+
+### 3. Solution Brainstorming
+- **Code Review**: Examine the code sections linked to the failure for bugs or inefficiencies.
+- **External Dependencies**: Check if external systems or data sources have changed.
+- **Algorithmic Analysis**: Evaluate if the agent's algorithms were overwhelmed or faced an unhandled scenario.
+
+### 4. Risk Analysis & Solution Ranking
+- Assess the potential risks associated with each solution.
+- Rank solutions based on:
+  - Implementation complexity
+  - Potential negative side effects
+  - Resource requirements
+- Assign a success probability score (0.0 to 1.0) based on the above factors.
+
+### 5. Solution Implementation
+- Implement the top 3 solutions sequentially, starting with the highest success probability.
+- If all three solutions fail, trigger the "Human-in-the-Loop" protocol.
+
+---
+
+## Swarm Failure Protocol
+
+### 1. Overview
+Swarm failures are more complex, often resulting from inter-agent conflicts, systemic bugs, or large-scale environmental changes. This protocol examines such failures at the level of the whole swarm so that normal operation can be restored.
+
+### 2. Root Cause Analysis
+- **Inter-Agent Analysis**: Examine if agents were in conflict or if there was a breakdown in collaboration.
+- **System Health Checks**: Ensure all system components supporting the swarm are operational.
+- **Environment Analysis**: Investigate if external factors or systems impacted the swarm's operation.
+
+### 3. Solution Brainstorming
+- **Collaboration Protocols**: Review and refine how agents collaborate.
+- **Resource Allocation**: Check if the swarm had adequate computational and memory resources.
+- **Feedback Loops**: Ensure agents are effectively learning from each other.
+
+### 4. Risk Analysis & Solution Ranking
+- Assess the potential systemic risks posed by each solution.
+- Rank solutions considering:
+  - Scalability implications
+  - Impact on individual agents
+  - Overall swarm performance potential
+- Assign a success probability score (0.0 to 1.0) based on the above considerations.
+
+### 5. Solution Implementation
+- Implement the top 3 solutions sequentially, prioritizing the one with the highest success probability.
+- If all three solutions are unsuccessful, invoke the "Human-in-the-Loop" protocol for expert intervention.
+
+---
+
+By following these protocols, the Swarms Multi-Agent Framework can systematically address and prevent failures, ensuring a high degree of reliability and efficiency.
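The "rank, try the top three, then escalate" steps shared by both protocols in `docs/features/fail_protocol.md` above can be sketched as a short loop. The `Solution` record and `run_failure_protocol` function are hypothetical; the document prescribes the procedure but no code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Solution:
    name: str
    success_probability: float  # score in [0.0, 1.0] from the risk analysis step
    apply: Callable[[], bool]   # returns True if the fix resolved the failure


def run_failure_protocol(
    solutions: List[Solution], max_attempts: int = 3
) -> Optional[str]:
    """Try the highest-ranked solutions in order.

    Returns the name of the first solution that succeeds, or None to signal
    that the Human-in-the-Loop protocol should be triggered.
    """
    ranked = sorted(solutions, key=lambda s: s.success_probability, reverse=True)
    for solution in ranked[:max_attempts]:
        if solution.apply():
            return solution.name
    return None


candidates = [
    Solution("restart agent", 0.4, lambda: False),
    Solution("patch input validation", 0.7, lambda: False),
    Solution("roll back dependency", 0.6, lambda: True),
    Solution("rewrite module", 0.1, lambda: True),
]
resolved_by = run_failure_protocol(candidates)
print(resolved_by or "escalate to Human-in-the-Loop")
```

Returning `None` rather than raising keeps escalation an ordinary outcome the caller must handle.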
diff --git a/docs/features/human_in_loop.md b/docs/features/human_in_loop.md
new file mode 100644
index 00000000..0630c312
--- /dev/null
+++ b/docs/features/human_in_loop.md
@@ -0,0 +1,49 @@
+# Human-in-the-Loop Task Handling Protocol
+
+## Overview
+
+The Swarms Multi-Agent Framework recognizes the invaluable contributions humans can make, especially in complex scenarios where nuanced judgment is required. The "Human-in-the-Loop Task Handling Protocol" ensures that when agents encounter challenges they cannot handle autonomously, the most capable human collaborator is engaged to provide guidance, based on their skills and expertise.
+
+## Protocol Steps
+
+### 1. Task Initiation & Analysis
+
+- When a task is initiated, agents first analyze the task's requirements.
+- The system maintains an understanding of each task's complexity, requirements, and potential challenges.
+
+### 2. Automated Resolution Attempt
+
+- Agents first attempt to resolve the task autonomously using their algorithms and data.
+- If the task can be completed without issues, it progresses normally.
+
+### 3. Challenge Detection
+
+- If agents encounter challenges or uncertainties they cannot resolve, the "Human-in-the-Loop" protocol is triggered.
+
+### 4. Human Collaborator Identification
+
+- The system maintains a dynamic profile of each human collaborator, cataloging their skills, expertise, and past performance on related tasks.
+- Using this profile data, the system identifies the most capable human collaborator to assist with the current challenge.
+
+### 5. Real-time Collaboration
+
+- The identified human collaborator is notified and provided with all the relevant information about the task and the challenge.
+- Collaborators can provide guidance, make decisions, or even take over specific portions of the task.
+
+### 6. Task Completion & Feedback Loop
+
+- Once the challenge is resolved, agents continue with the task until completion.
+- Feedback from human collaborators is used to update agent algorithms, ensuring continuous learning and improvement.
+
+## Best Practices
+
+1. **Maintain Up-to-date Human Profiles**: Ensure that the skillsets, expertise, and performance metrics of human collaborators are updated regularly.
+2. **Limit Interruptions**: Implement mechanisms to limit the frequency of human interventions, ensuring collaborators are not overwhelmed with requests.
+3. **Provide Context**: When seeking human intervention, provide collaborators with comprehensive context to ensure they can make informed decisions.
+4. **Continuous Training**: Regularly update and train agents based on feedback from human collaborators.
+5. **Measure & Optimize**: Monitor the efficiency of the "Human-in-the-Loop" protocol, aiming to reduce the frequency of interventions while maximizing the value of each intervention.
+6. **Skill Enhancement**: Encourage human collaborators to continuously enhance their skills, ensuring that the collective expertise of the group grows over time.
+
+## Conclusion
+
+The integration of human expertise with AI capabilities is a cornerstone of the Swarms Multi-Agent Framework. This "Human-in-the-Loop Task Handling Protocol" ensures that tasks are executed efficiently, leveraging the best of both human judgment and AI automation. Through collaborative synergy, we can tackle challenges more effectively and drive innovation.
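Step 4 of the protocol in `docs/features/human_in_loop.md` above (collaborator identification) could look something like the following. The scoring rule, the `Collaborator` fields, and the `max_open_requests` cap (a nod to the "Limit Interruptions" best practice) are all assumptions made for this sketch; the document does not define how "most capable" is computed.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Collaborator:
    name: str
    skills: Set[str]
    past_performance: float  # hypothetical 0.0-1.0 score from earlier tasks
    open_requests: int = 0   # pending interventions already assigned


def pick_collaborator(
    required_skills: Set[str],
    pool: List[Collaborator],
    max_open_requests: int = 3,
) -> Optional[Collaborator]:
    """Pick the available collaborator with the best skill match, if any."""
    if not required_skills:
        return None

    def score(c: Collaborator) -> float:
        # Fraction of required skills covered, weighted by track record.
        coverage = len(required_skills & c.skills) / len(required_skills)
        return coverage * c.past_performance

    available = [
        c
        for c in pool
        if c.open_requests < max_open_requests and required_skills & c.skills
    ]
    return max(available, key=score, default=None)


pool = [
    Collaborator("ana", {"sql", "finance"}, 0.90),
    Collaborator("raj", {"sql"}, 0.95),
    Collaborator("li", {"sql", "finance"}, 0.99, open_requests=3),
]
chosen = pick_collaborator({"sql", "finance"}, pool)
print(chosen.name if chosen else "no collaborator available")
```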
diff --git a/docs/features/info_sec.md b/docs/features/info_sec.md
new file mode 100644
index 00000000..855995f5
--- /dev/null
+++ b/docs/features/info_sec.md
@@ -0,0 +1,48 @@
+# Secure Communication Protocols
+
+## Overview
+
+The Swarms Multi-Agent Framework prioritizes the security and integrity of data, especially personal and sensitive information. Our Secure Communication Protocols ensure that all communications between agents are encrypted, authenticated, and resistant to tampering or unauthorized access.
+
+## Features
+
+### 1. End-to-End Encryption
+
+- All inter-agent communications are encrypted using state-of-the-art cryptographic algorithms.
+- This ensures that data remains confidential and can only be read by the intended recipient agent.
+
+### 2. Authentication
+
+- Before initiating communication, agents authenticate each other using digital certificates.
+- This prevents impersonation attacks and ensures that agents are communicating with legitimate counterparts.
+
+### 3. Forward Secrecy
+
+- Key exchange mechanisms employ forward secrecy: each session uses ephemeral keys, so an attacker who later compromises a long-term key still cannot decrypt previously recorded communications.
+
+### 4. Data Integrity
+
+- Message authentication codes (keyed cryptographic hashes) or digital signatures let the recipient detect data that was altered in transit.
+- Any discrepancies in data integrity result in the communication being rejected.
+
+### 5. Zero-Knowledge Protocols
+
+- When handling especially sensitive data, agents use zero-knowledge proofs to validate information without revealing the actual data.
+
+### 6. Periodic Key Rotation
+
+- To mitigate the risk of long-term key exposure, encryption keys are periodically rotated.
+- Retired keys are securely destroyed, which limits how much traffic any single compromised key can expose.
+
+## Best Practices for Handling Personal and Sensitive Information
+
+1. **Data Minimization**: Agents should only request and process the minimum amount of personal data necessary for the task.
+2. **Anonymization**: Whenever possible, agents should anonymize personal data, stripping away identifying details.
+3. **Data Retention Policies**: Personal data should be retained only for the period necessary to complete the task, after which it should be securely deleted.
+4. **Access Controls**: Ensure that only authorized agents have access to personal and sensitive information. Implement strict access control mechanisms.
+5. **Regular Audits**: Conduct regular security audits to ensure compliance with privacy regulations and to detect any potential vulnerabilities.
+6. **Training**: All agents should be regularly updated and trained on the latest security protocols and best practices for handling sensitive data.
+
+## Conclusion
+
+Secure communication is paramount in the Swarms Multi-Agent Framework, especially when dealing with personal and sensitive information. Adhering to these protocols and best practices ensures the safety, privacy, and trust of all stakeholders involved.
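The data integrity check described in `docs/features/info_sec.md` above can be illustrated with a keyed hash (HMAC) from the Python standard library. This sketch covers tamper detection only, not encryption or key exchange, and it is not the framework's actual protocol; the `sign` and `verify` helper names are hypothetical.

```python
import hashlib
import hmac
import secrets


def sign(key: bytes, message: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check the tag in constant time; False means reject the message."""
    return hmac.compare_digest(sign(key, message), tag)


shared_key = secrets.token_bytes(32)  # assumed to be agreed between two agents
message = b"task 42: status update"
tag = sign(shared_key, message)

print(verify(shared_key, message, tag))                    # untouched message
print(verify(shared_key, b"task 42: status upgrade", tag)) # altered in transit
```

A plain hash would not be enough here, since anyone who alters the message can recompute it; the shared key is what makes the tag unforgeable.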
diff --git a/docs/features/promptimizer.md b/docs/features/promptimizer.md
new file mode 100644
index 00000000..2fdc81bb
--- /dev/null
+++ b/docs/features/promptimizer.md
@@ -0,0 +1,68 @@
+# Promptimizer Documentation
+## Swarms Multi-Agent Framework
+
+**The Promptimizer is a tool within the Swarms Multi-Agent Framework for refining prompts across categories. It draws on libraries of established prompting techniques to make prompts clearer, better tailored to the task, and more likely to produce the intended output.**
+
+---
+
+## Overview:
+
+The Promptimizer Tool is crafted to:
+1. Rigorously analyze and elevate the quality of provided prompts.
+2. Furnish best-in-class recommendations rooted in proven prompting strategies.
+3. Serve a spectrum of categories, from technical operations to expansive creative ventures.
+
+---
+
+## Core Features:
+
+### 1. Deep Prompt Analysis:
+
+- **Clarity Matrix**: A proprietary algorithm assessing prompt clarity, removing ambiguities and sharpening focus.
+- **Efficiency Gauge**: Evaluates whether the prompt's structure leads to the desired result quickly and precisely.
+
+### 2. Adaptive Recommendations:
+
+- **Technique Engine**: Suggests techniques aligned with the gold standard for the chosen category.
+- **Exemplar Database**: Offers an extensive array of high-quality prompt examples for comparison and inspiration.
+
+### 3. Versatile Category Framework:
+
+- **Tech Suite**: Optimizes prompts for technical tasks, ensuring actionable clarity.
+- **Narrative Craft**: Hones prompts to elicit vivid and coherent stories.
+- **Visual Visionary**: Shapes prompts for precise and dynamic visual generation.
+- **Sonic Sculptor**: Orchestrates prompts for audio creation, tuning into desired tones and moods.
+
+### 4. Machine Learning Integration:
+
+- **Feedback Dynamo**: Harnesses user feedback, continually refining the tool's recommendation capabilities.
+- **Live Library Updates**: Periodic syncing with the latest in prompting techniques, ensuring the tool remains at the cutting edge.
+
+### 5. Collaboration & Sharing:
+
+- **TeamSync**: Allows teams to collaborate on prompt optimization in real-time.
+- **ShareSpace**: Share and access a community-driven repository of optimized prompts, fostering collective growth.
+
+---
+
+## Benefits:
+
+1. **Precision Engineering**: Harness the power of refined prompts, ensuring desired outcomes are achieved with surgical precision.
+2. **Learning Hub**: Immerse in a tool that not only refines but educates, enhancing the user's prompting acumen.
+3. **Versatile Mastery**: Navigate seamlessly across categories, ensuring top-tier prompt quality regardless of the domain.
+4. **Community-driven Excellence**: Dive into a world of shared knowledge, elevating the collective expertise of the Swarms community.
+
+---
+
+## Usage Workflow:
+
+1. **Launch the Promptimizer**: Access the tool directly from the Swarms Multi-Agent Framework dashboard.
+2. **Prompt Entry**: Input the initial prompt for refinement.
+3. **Category Selection**: Pinpoint the desired category for specialized optimization.
+4. **Receive & Review**: Engage with the tool's recommendations, comparing original and optimized prompts.
+5. **Collaborate, Implement & Share**: Work in tandem with team members, deploy the refined prompt, and consider contributing to the community repository.
+
+---
+
+By integrating the Promptimizer into their workflow, Swarms users can produce clearer, more effective prompts with less trial and error.
+
diff --git a/docs/features/shorthand.md b/docs/features/shorthand.md
new file mode 100644
index 00000000..e2732b19
--- /dev/null
+++ b/docs/features/shorthand.md
@@ -0,0 +1,68 @@
+# Shorthand Communication System
+## Swarms Multi-Agent Framework
+
+**The Enhanced Shorthand Communication System is designed to streamline agent-agent communication within the Swarms Multi-Agent Framework. This system employs concise alphanumeric notations to relay task-specific details to agents efficiently.**
+
+---
+
+## Format:
+
+The shorthand format is structured as `[AgentType]-[TaskLayer].[TaskNumber]-[Priority]-[Status]`.
+
+---
+
+## Components:
+
+### 1. Agent Type:
+- Denotes the specific agent role, such as:
+  * `C`: Code agent
+  * `D`: Data processing agent
+  * `M`: Monitoring agent
+  * `N`: Network agent
+  * `R`: Resource management agent
+  * `I`: Interface agent
+  * `S`: Security agent
+
+### 2. Task Layer & Number:
+- Represents the task's category.
+  * Example: `1.8` signifies task layer 1, task number 8.
+
+### 3. Priority:
+- Indicates task urgency.
+  * `H`: High
+  * `M`: Medium
+  * `L`: Low
+
+### 4. Status:
+- Gives a snapshot of the task's progress.
+  * `I`: Initialized
+  * `P`: In-progress
+  * `C`: Completed
+  * `F`: Failed
+  * `W`: Waiting
+
+---
+
+## Extended Features:
+
+### 1. Error Codes (for failures):
+- `E01`: Resource issues
+- `E02`: Data inconsistency
+- `E03`: Dependency malfunction
+... and more as needed.
+
+### 2. Collaboration Flag:
+- `+`: Denotes required collaboration.
+
+---
+
+## Example Codes:
+
+- `C-1.8-H-I`: A high-priority coding task that has been initialized.
+- `D-2.3-M-P`: A medium-priority data task currently in-progress.
+- `M-3.5-L-P+`: A low-priority monitoring task in progress needing collaboration.
+
+---
+
+By leveraging the Enhanced Shorthand Communication System, the Swarms Multi-Agent Framework can ensure swift interactions, concise communications, and effective task management.
+
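The shorthand format documented in `docs/features/shorthand.md` above is regular enough to parse with a single pattern. The sketch below is one possible reading of the format, with hypothetical names (`ShorthandCode`, `parse_code`); error codes are left out because the document does not say where they attach to a code.

```python
import re
from typing import NamedTuple

AGENT_TYPES = {
    "C": "Code", "D": "Data processing", "M": "Monitoring", "N": "Network",
    "R": "Resource management", "I": "Interface", "S": "Security",
}
PRIORITIES = {"H": "High", "M": "Medium", "L": "Low"}
STATUSES = {
    "I": "Initialized", "P": "In-progress", "C": "Completed",
    "F": "Failed", "W": "Waiting",
}

# [AgentType]-[TaskLayer].[TaskNumber]-[Priority]-[Status], optional "+" flag.
CODE_PATTERN = re.compile(r"([CDMNRIS])-(\d+)\.(\d+)-([HML])-([IPCFW])(\+?)")


class ShorthandCode(NamedTuple):
    agent_type: str
    task_layer: int
    task_number: int
    priority: str
    status: str
    needs_collaboration: bool


def parse_code(code: str) -> ShorthandCode:
    """Parse a shorthand code, raising ValueError if it is malformed."""
    match = CODE_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"not a valid shorthand code: {code!r}")
    agent, layer, number, priority, status, flag = match.groups()
    return ShorthandCode(
        AGENT_TYPES[agent], int(layer), int(number),
        PRIORITIES[priority], STATUSES[status], flag == "+",
    )


print(parse_code("M-3.5-L-P+"))
```

The letters `M`, `C`, and `I` each appear in two tables (for example, `M` is both the Monitoring agent and Medium priority), so a letter's meaning depends on its position in the code.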
diff --git a/docs/swarms/chunkers/basechunker.md b/docs/swarms/chunkers/basechunker.md
new file mode 100644
index 00000000..fed03277
--- /dev/null
+++ b/docs/swarms/chunkers/basechunker.md
@@ -0,0 +1,146 @@
+# BaseChunker Documentation
+
+## Table of Contents
+1. [Introduction](#1-introduction)
+2. [Overview](#2-overview)
+3. [Installation](#3-installation)
+4. [Usage](#4-usage)
+    1. [BaseChunker Class](#41-basechunker-class)
+    2. [Examples](#42-examples)
+5. [Additional Information](#5-additional-information)
+6. [Conclusion](#6-conclusion)
+
+---
+
+## 1. Introduction
+
+The `BaseChunker` module is a tool for splitting text into smaller chunks that can be processed by a language model. It is a fundamental component in natural language processing tasks that require handling long or complex text inputs.
+
+This documentation provides an extensive guide on using the `BaseChunker` module, explaining its purpose, parameters, and usage.
+
+---
+
+## 2. Overview
+
+The `BaseChunker` module is designed to address the challenge of processing lengthy text inputs that exceed the maximum token limit of language models. By breaking such text into smaller, manageable chunks, it enables efficient and accurate processing.
+
+Key features and parameters of the `BaseChunker` module include:
+- `separators`: Specifies a list of `ChunkSeparator` objects used to split the text into chunks.
+- `tokenizer`: Defines the tokenizer to be used for counting tokens in the text.
+- `max_tokens`: Sets the maximum token limit for each chunk.
+
+The `BaseChunker` module facilitates the chunking process and ensures that the generated chunks are within the token limit.
+
+---
+
+## 3. Installation
+
+Before using the `BaseChunker` module, ensure you have the required dependencies installed. The module relies on the `griptape` and `swarms` libraries, which you can install with pip:
+
+```bash
+pip install griptape swarms
+```
+
+---
+
+## 4. Usage
+
+This section covers the `BaseChunker` class and walks through examples of its usage.
+
+### 4.1. `BaseChunker` Class
+
+The `BaseChunker` class is the core of the module. An instance splits text into chunks that each fit within a token limit.
+
+#### Parameters:
+- `separators` (list[ChunkSeparator]): Specifies a list of `ChunkSeparator` objects used to split the text into chunks.
+- `tokenizer` (OpenAiTokenizer): Defines the tokenizer to be used for counting tokens in the text.
+- `max_tokens` (int): Sets the maximum token limit for each chunk.
+
+### 4.2. Examples
+
+Let's explore how to use the `BaseChunker` class with different scenarios and applications.
+
+#### Example 1: Basic Chunking
+
+```python
+from basechunker import BaseChunker
+
+# Initialize the BaseChunker
+chunker = BaseChunker()
+
+# Text to be chunked
+input_text = "This is a long text that needs to be split into smaller chunks for processing."
+
+# Chunk the text
+chunks = chunker.chunk(input_text)
+
+# Print the generated chunks
+for idx, chunk in enumerate(chunks, start=1):
+    print(f"Chunk {idx}: {chunk.value}")
+```
+
+#### Example 2: Custom Separators
+
+```python
+from basechunker import BaseChunker, ChunkSeparator
+
+# Define custom separators
+custom_separators = [ChunkSeparator(","), ChunkSeparator(";")]
+
+# Initialize the BaseChunker with custom separators
+chunker = BaseChunker(separators=custom_separators)
+
+# Text with custom separators
+input_text = "This text, separated by commas; should be split accordingly."
+
+# Chunk the text
+chunks = chunker.chunk(input_text)
+
+# Print the generated chunks
+for idx, chunk in enumerate(chunks, start=1):
+    print(f"Chunk {idx}: {chunk.value}")
+```
+
+#### Example 3: Adjusting Maximum Tokens
+
+```python
+from basechunker import BaseChunker
+
+# Initialize the BaseChunker with a custom maximum token limit
+chunker = BaseChunker(max_tokens=50)
+
+# Long text input
+input_text = "This is an exceptionally long text that should be broken into smaller chunks based on token count."
+
+# Chunk the text
+chunks = chunker.chunk(input_text)
+
+# Print the generated chunks
+for idx, chunk in enumerate(chunks, start=1):
+    print(f"Chunk {idx}: {chunk.value}")
+```
+
+### 4.3. Additional Features
+
+The `BaseChunker` class also provides additional features:
+
+#### Recursive Chunking
+The `_chunk_recursively` method handles the recursive chunking of text, ensuring that each chunk stays within the token limit.
+
+---
+
+## 5. Additional Information
+
+- **Text Chunking**: The `BaseChunker` module is a fundamental tool for text chunking, a crucial step in preprocessing text data for various natural language processing tasks.
+- **Custom Separators**: You can customize the separators used to split the text, allowing flexibility in how text is chunked.
+- **Token Count**: The module accurately counts tokens using the specified tokenizer, ensuring that chunks do not exceed token limits.
+
+---
+
+## 6. Conclusion
+
+The `BaseChunker` module is an essential tool for text preprocessing and handling long or complex text inputs in natural language processing tasks. This documentation has provided a comprehensive guide on its usage, parameters, and examples, enabling you to efficiently manage and process text data by splitting it into manageable chunks.
+
+By using the `BaseChunker`, you can ensure that your text data remains within token limits and is ready for further analysis and processing.
+
+*Please check the official `BaseChunker` repository and documentation for any updates.*
\ No newline at end of file
diff --git a/docs/swarms/chunkers/pdf_chunker.md b/docs/swarms/chunkers/pdf_chunker.md
new file mode 100644
index 00000000..5b97a551
--- /dev/null
+++ b/docs/swarms/chunkers/pdf_chunker.md
@@ -0,0 +1,147 @@
+# PdfChunker Documentation
+
+## Table of Contents
+1. [Introduction](#introduction)
+2. [Overview](#overview)
+3. [Installation](#installation)
+4. [Usage](#usage)
+ 1. [PdfChunker Class](#pdfchunker-class)
+ 2. [Examples](#examples)
+5. [Additional Information](#additional-information)
+6. [Conclusion](#conclusion)
+
+---
+
+## 1. Introduction
+
+The `PdfChunker` module is a specialized tool designed to split PDF text content into smaller, more manageable chunks. It is a valuable asset for processing PDF documents in natural language processing and text analysis tasks.
+
+This documentation provides a comprehensive guide on how to use the `PdfChunker` module. It covers its purpose, parameters, and usage, ensuring that you can effectively process PDF text content.
+
+---
+
+## 2. Overview
+
+The `PdfChunker` module serves a critical role in handling PDF text content, which is often lengthy and complex. Key features and parameters of the `PdfChunker` module include:
+
+- `separators`: Specifies a list of `ChunkSeparator` objects used to split the PDF text content into chunks.
+- `tokenizer`: Defines the tokenizer used for counting tokens in the text.
+- `max_tokens`: Sets the maximum token limit for each chunk.
+
+By using the `PdfChunker`, you can efficiently prepare PDF text content for further analysis and processing.
+
+---
+
+## 3. Installation
+
+Before using the `PdfChunker` module, ensure you have the required dependencies installed. The module relies on the `swarms` library. You can install this dependency using pip:
+
+```bash
+pip install swarms
+```
+
+---
+
+## 4. Usage
+
+This section covers the `PdfChunker` class and walks through examples that demonstrate its usage.
+
+### 4.1. `PdfChunker` Class
+
+The `PdfChunker` class is the core component of the `PdfChunker` module. It is used to create a `PdfChunker` instance, which can split PDF text content into manageable chunks.
+
+#### Parameters:
+- `separators` (list[ChunkSeparator]): Specifies a list of `ChunkSeparator` objects used to split the PDF text content into chunks.
+- `tokenizer` (OpenAiTokenizer): Defines the tokenizer used for counting tokens in the text.
+- `max_tokens` (int): Sets the maximum token limit for each chunk.
+
+### 4.2. Examples
+
+Let's explore how to use the `PdfChunker` class with different scenarios and applications.
+
+#### Example 1: Basic Chunking
+
+```python
+from swarms.chunkers.pdf_chunker import PdfChunker
+from swarms.chunkers.chunk_seperator import ChunkSeparator
+
+# Initialize the PdfChunker
+pdf_chunker = PdfChunker()
+
+# PDF text content to be chunked
+pdf_text = "This is a PDF document with multiple paragraphs and sentences. It should be split into smaller chunks for analysis."
+
+# Chunk the PDF text content
+chunks = pdf_chunker.chunk(pdf_text)
+
+# Print the generated chunks
+for idx, chunk in enumerate(chunks, start=1):
+ print(f"Chunk {idx}:\n{chunk.value}")
+```
+
+#### Example 2: Custom Separators
+
+```python
+from swarms.chunkers.pdf_chunker import PdfChunker
+from swarms.chunkers.chunk_seperator import ChunkSeparator
+
+# Define custom separators for PDF chunking
+custom_separators = [ChunkSeparator("\n\n"), ChunkSeparator(". ")]
+
+# Initialize the PdfChunker with custom separators
+pdf_chunker = PdfChunker(separators=custom_separators)
+
+# PDF text content with custom separators
+pdf_text = "This PDF document has custom paragraph separators.\n\nIt also uses period-based sentence separators. Split accordingly."
+
+# Chunk the PDF text content
+chunks = pdf_chunker.chunk(pdf_text)
+
+# Print the generated chunks
+for idx, chunk in enumerate(chunks, start=1):
+ print(f"Chunk {idx}:\n{chunk.value}")
+```
+
+#### Example 3: Adjusting Maximum Tokens
+
+```python
+from swarms.chunkers.pdf_chunker import PdfChunker
+
+# Initialize the PdfChunker with a custom maximum token limit
+pdf_chunker = PdfChunker(max_tokens=50)
+
+# Lengthy PDF text content
+pdf_text = "This is an exceptionally long PDF document that should be broken into smaller chunks based on token count."
+
+# Chunk the PDF text content
+chunks = pdf_chunker.chunk(pdf_text)
+
+# Print the generated chunks
+for idx, chunk in enumerate(chunks, start=1):
+ print(f"Chunk {idx}:\n{chunk.value}")
+```
+
+### 4.3. Additional Features
+
+The `PdfChunker` class also provides additional features:
+
+#### Recursive Chunking
+The `_chunk_recursively` method handles the recursive chunking of PDF text content, ensuring that each chunk stays within the token limit.
+
+---
+
+## 5. Additional Information
+
+- **PDF Text Chunking**: The `PdfChunker` module is a specialized tool for splitting PDF text content into manageable chunks, making it suitable for natural language processing tasks involving PDF documents.
+- **Custom Separators**: You can customize separators to adapt the PDF text content chunking process to specific document structures.
+- **Token Count**: The module accurately counts tokens using the specified tokenizer, ensuring that chunks do not exceed token limits.
+
+---
+
+## 6. Conclusion
+
+The `PdfChunker` module is a valuable asset for processing PDF text content in various natural language processing and text analysis tasks. This documentation has provided a comprehensive guide on its usage, parameters, and examples, ensuring that you can effectively prepare PDF documents for further analysis and processing.
+
+By using the `PdfChunker`, you can efficiently break down lengthy and complex PDF text content into manageable chunks, making it ready for in-depth analysis.
+
+*Please check the official `PdfChunker` repository and documentation for any updates.*
\ No newline at end of file
diff --git a/docs/swarms/models/biogpt.md b/docs/swarms/models/biogpt.md
new file mode 100644
index 00000000..291b917c
--- /dev/null
+++ b/docs/swarms/models/biogpt.md
@@ -0,0 +1,157 @@
+# `BioGPT` Documentation
+
+## Table of Contents
+1. [Introduction](#introduction)
+2. [Overview](#overview)
+3. [Installation](#installation)
+4. [Usage](#usage)
+ 1. [BioGPT Class](#biogpt-class)
+ 2. [Examples](#examples)
+5. [Additional Information](#additional-information)
+6. [Conclusion](#conclusion)
+
+---
+
+## 1. Introduction
+
+The `BioGPT` module is a domain-specific generative language model designed for the biomedical domain. It is built upon the powerful Transformer architecture and pretrained on a large corpus of biomedical literature. This documentation provides an extensive guide on using the `BioGPT` module, explaining its purpose, parameters, and usage.
+
+---
+
+## 2. Overview
+
+The `BioGPT` module addresses the need for a language model specialized in the biomedical domain. Unlike general-purpose language models, `BioGPT` is pretrained on biomedical literature, so it generates coherent and contextually relevant text about biomedical terms and concepts. Its authors evaluated it on several biomedical natural language processing tasks and report that it outperforms previous models on most of them.
+
+Key features and parameters of the `BioGPT` module include:
+- `model_name`: Name of the pretrained model.
+- `max_length`: Maximum length of generated text.
+- `num_return_sequences`: Number of sequences to return.
+- `do_sample`: Whether to use sampling in generation.
+- `min_length`: Minimum length of generated text.
+
+The `BioGPT` module is equipped with features for generating text, extracting features, and more.
+
+---
+
+## 3. Installation
+
+Before using the `BioGPT` module, ensure you have the required dependencies installed, including the Transformers library and Torch. You can install these dependencies using pip:
+
+```bash
+pip install transformers
+pip install torch
+```
+
+---
+
+## 4. Usage
+
+This section covers the `BioGPT` class and walks through examples that demonstrate its usage.
+
+### 4.1. `BioGPT` Class
+
+The `BioGPT` class is the core component of the `BioGPT` module. It is used to create a `BioGPT` instance, which can generate text, extract features, and more.
+
+#### Parameters:
+- `model_name` (str): Name of the pretrained model.
+- `max_length` (int): Maximum length of generated text.
+- `num_return_sequences` (int): Number of sequences to return.
+- `do_sample` (bool): Whether or not to use sampling in generation.
+- `min_length` (int): Minimum length of generated text.
+
+### 4.2. Examples
+
+Let's explore how to use the `BioGPT` class with different scenarios and applications.
+
+#### Example 1: Generating Biomedical Text
+
+```python
+from swarms.models import BioGPT
+
+# Initialize the BioGPT model
+biogpt = BioGPT()
+
+# Generate biomedical text
+input_text = "The patient has a fever"
+generated_text = biogpt(input_text)
+
+print(generated_text)
+```
+
+#### Example 2: Extracting Features
+
+```python
+from swarms.models import BioGPT
+
+# Initialize the BioGPT model
+biogpt = BioGPT()
+
+# Extract features from a biomedical text
+input_text = "The patient has a fever"
+features = biogpt.get_features(input_text)
+
+print(features)
+```
+
+#### Example 3: Using Beam Search Decoding
+
+```python
+from swarms.models import BioGPT
+
+# Initialize the BioGPT model
+biogpt = BioGPT()
+
+# Generate biomedical text using beam search decoding
+input_text = "The patient has a fever"
+generated_text = biogpt.beam_search_decoding(input_text)
+
+print(generated_text)
+```
+
+### 4.3. Additional Features
+
+The `BioGPT` class also provides additional features:
+
+#### Set a New Pretrained Model
+```python
+biogpt.set_pretrained_model("new_pretrained_model")
+```
+
+#### Get the Model's Configuration
+```python
+config = biogpt.get_config()
+print(config)
+```
+
+#### Save and Load the Model
+```python
+# Save the model and tokenizer to a directory
+biogpt.save_model("saved_model")
+
+# Load a model and tokenizer from a directory
+biogpt.load_from_path("saved_model")
+```
+
+#### Print the Model's Architecture
+```python
+biogpt.print_model()
+```
+
+---
+
+## 5. Additional Information
+
+- **Biomedical Text Generation**: The `BioGPT` module is designed specifically for generating biomedical text, making it a valuable tool for various biomedical natural language processing tasks.
+- **Feature Extraction**: It also provides the capability to extract features from biomedical text.
+- **Beam Search Decoding**: Beam search decoding is available for generating text with improved quality.
+- **Customization**: You can set a new pretrained model and save/load models for customization.
+
+---
+
+## 6. Conclusion
+
+The `BioGPT` module is a powerful and specialized tool for generating and working with biomedical text. This documentation has provided a comprehensive guide on its usage, parameters, and examples, enabling you to effectively leverage it for various biomedical natural language processing tasks.
+
+By using `BioGPT`, you can enhance your biomedical text generation and analysis tasks with contextually relevant and coherent text.
+
+*Please check the official `BioGPT` repository and documentation for any updates.*
\ No newline at end of file
diff --git a/docs/swarms/models/kosmos.md b/docs/swarms/models/kosmos.md
index 81e3ffd2..1735e153 100644
--- a/docs/swarms/models/kosmos.md
+++ b/docs/swarms/models/kosmos.md
@@ -1,4 +1,4 @@
-# Kosmos Documentation
+# `Kosmos` Documentation
## Introduction
diff --git a/docs/swarms/models/layoutlm_document_qa.md b/docs/swarms/models/layoutlm_document_qa.md
new file mode 100644
index 00000000..4c6169d0
--- /dev/null
+++ b/docs/swarms/models/layoutlm_document_qa.md
@@ -0,0 +1,88 @@
+# `LayoutLMDocumentQA` Documentation
+
+## Introduction
+
+Welcome to the documentation for LayoutLMDocumentQA, a multimodal model designed for visual question answering (QA) on real-world documents, such as invoices, PDFs, and more. This comprehensive documentation will provide you with a deep understanding of the LayoutLMDocumentQA class, its architecture, usage, and examples.
+
+## Overview
+
+LayoutLMDocumentQA is a versatile model that combines layout-based understanding of documents with natural language processing to answer questions about the content of documents. It is particularly useful for automating tasks like invoice processing, extracting information from PDFs, and handling various document-based QA scenarios.
+
+## Class Definition
+
+```python
+class LayoutLMDocumentQA(AbstractModel):
+ def __init__(
+ self,
+ model_name: str = "impira/layoutlm-document-qa",
+ task: str = "document-question-answering",
+ ):
+```
+
+## Purpose
+
+The LayoutLMDocumentQA class serves the following primary purposes:
+
+1. **Document QA**: LayoutLMDocumentQA is specifically designed for document-based question answering. It can process both the textual content and the layout of a document to answer questions.
+
+2. **Multimodal Understanding**: It combines natural language understanding with document layout analysis, making it suitable for documents with complex structures.
+
+## Parameters
+
+- `model_name` (str): The name or path of the pretrained LayoutLMDocumentQA model. Default: "impira/layoutlm-document-qa".
+- `task` (str): The specific task for which the model will be used. Default: "document-question-answering".
+
+## Usage
+
+To use LayoutLMDocumentQA, follow these steps:
+
+1. Initialize the LayoutLMDocumentQA instance:
+
+```python
+from swarms.models import LayoutLMDocumentQA
+
+layout_lm_doc_qa = LayoutLMDocumentQA()
+```
+
+### Example 1 - Initialization
+
+```python
+layout_lm_doc_qa = LayoutLMDocumentQA()
+```
+
+2. Ask a question about a document and provide the document's image path:
+
+```python
+question = "What is the total amount?"
+image_path = "path/to/document_image.png"
+answer = layout_lm_doc_qa(question, image_path)
+```
+
+### Example 2 - Document QA
+
+```python
+layout_lm_doc_qa = LayoutLMDocumentQA()
+question = "What is the total amount?"
+image_path = "path/to/document_image.png"
+answer = layout_lm_doc_qa(question, image_path)
+```
+
+## How LayoutLMDocumentQA Works
+
+LayoutLMDocumentQA employs a multimodal approach to document QA. Here's how it works:
+
+1. **Initialization**: When you create a LayoutLMDocumentQA instance, you can specify the model to use and the task, which is "document-question-answering" by default.
+
+2. **Question and Document**: You provide a question about the document and the image path of the document to the LayoutLMDocumentQA instance.
+
+3. **Multimodal Processing**: LayoutLMDocumentQA processes both the question and the document image. It combines layout-based analysis with natural language understanding.
+
+4. **Answer Generation**: The model generates an answer to the question based on its analysis of the document layout and content.
+
+## Additional Information
+
+- LayoutLMDocumentQA uses the "impira/layoutlm-document-qa" pretrained model, which is specifically designed for document-based question answering.
+- You can adapt this model to various document QA scenarios by changing the task and providing relevant questions and documents.
+- This model is particularly useful for automating document-based tasks and extracting valuable information from structured documents.
+
+That concludes the documentation for LayoutLMDocumentQA. We hope you find this tool valuable for your document-based question answering needs. If you have any questions or encounter any issues, please refer to the model card for `impira/layoutlm-document-qa` for further assistance. Enjoy using LayoutLMDocumentQA!
\ No newline at end of file
diff --git a/docs/swarms/models/nougat.md b/docs/swarms/models/nougat.md
new file mode 100644
index 00000000..217990a1
--- /dev/null
+++ b/docs/swarms/models/nougat.md
@@ -0,0 +1,118 @@
+# `Nougat` Documentation
+
+## Introduction
+
+Welcome to the documentation for Nougat, a versatile model designed by Meta for transcribing scientific PDFs into user-friendly Markdown format, extracting information from PDFs, and extracting metadata from PDF documents. This documentation will provide you with a deep understanding of the Nougat class, its architecture, usage, and examples.
+
+## Overview
+
+Nougat is a powerful tool that combines language modeling and image processing capabilities to convert scientific PDF documents into Markdown format. It is particularly useful for researchers, students, and professionals who need to extract valuable information from PDFs quickly. With Nougat, you can simplify complex PDFs, making their content more accessible and easy to work with.
+
+## Class Definition
+
+```python
+class Nougat:
+ def __init__(
+ self,
+ model_name_or_path="facebook/nougat-base",
+ min_length: int = 1,
+ max_new_tokens: int = 30,
+ ):
+```
+
+## Purpose
+
+The Nougat class serves the following primary purposes:
+
+1. **PDF Transcription**: Nougat is designed to transcribe scientific PDFs into Markdown format. It helps convert complex PDF documents into a more readable and structured format, making it easier to extract information.
+
+2. **Information Extraction**: It allows users to extract valuable information and content from PDFs efficiently. This can be particularly useful for researchers and professionals who need to extract data, figures, or text from scientific papers.
+
+3. **Metadata Extraction**: Nougat can also extract metadata from PDF documents, providing essential details about the document, such as title, author, and publication date.
+
+## Parameters
+
+- `model_name_or_path` (str): The name or path of the pretrained Nougat model. Default: "facebook/nougat-base".
+- `min_length` (int): The minimum length of the generated transcription. Default: 1.
+- `max_new_tokens` (int): The maximum number of new tokens to generate in the Markdown transcription. Default: 30.
+
+## Usage
+
+To use Nougat, follow these steps:
+
+1. Initialize the Nougat instance:
+
+```python
+from swarms.models import Nougat
+
+nougat = Nougat()
+```
+
+### Example 1 - Initialization
+
+```python
+nougat = Nougat()
+```
+
+2. Transcribe a PDF image using Nougat:
+
+```python
+markdown_transcription = nougat("path/to/pdf_file.png")
+```
+
+### Example 2 - PDF Transcription
+
+```python
+nougat = Nougat()
+markdown_transcription = nougat("path/to/pdf_file.png")
+```
+
+3. Extract information from a PDF:
+
+```python
+information = nougat.extract_information("path/to/pdf_file.png")
+```
+
+### Example 3 - Information Extraction
+
+```python
+nougat = Nougat()
+information = nougat.extract_information("path/to/pdf_file.png")
+```
+
+4. Extract metadata from a PDF:
+
+```python
+metadata = nougat.extract_metadata("path/to/pdf_file.png")
+```
+
+### Example 4 - Metadata Extraction
+
+```python
+nougat = Nougat()
+metadata = nougat.extract_metadata("path/to/pdf_file.png")
+```
+
+## How Nougat Works
+
+Nougat employs a vision encoder-decoder model, along with a dedicated processor, to transcribe PDFs into Markdown format and perform information and metadata extraction. Here's how it works:
+
+1. **Initialization**: When you create a Nougat instance, you can specify the model to use, the minimum transcription length, and the maximum number of new tokens to generate.
+
+2. **Processing PDFs**: Nougat works on PDF pages supplied as images. As in the examples above, you provide the path to an image of the page.
+
+3. **Image Processing**: The processor prepares the page image, which is then encoded by the model.
+
+4. **Transcription**: Nougat generates Markdown transcriptions of PDF content, ensuring a minimum length and respecting the token limit.
+
+5. **Information Extraction**: Information extraction involves parsing the Markdown transcription to identify key details or content of interest.
+
+6. **Metadata Extraction**: Metadata extraction involves identifying and extracting document metadata, such as title, author, and publication date.
+
+## Additional Information
+
+- Nougat leverages the "facebook/nougat-base" pretrained model, which is specifically designed for document transcription and extraction tasks.
+- You can adjust the minimum transcription length and the maximum number of new tokens to control the output's length and quality.
+- Nougat can be run on both CPU and GPU devices.
+
+That concludes the documentation for Nougat. We hope you find this tool valuable for your PDF transcription, information extraction, and metadata extraction needs. If you have any questions or encounter any issues, please refer to the model card for `facebook/nougat-base` for further assistance. Enjoy using Nougat!
\ No newline at end of file
diff --git a/docs/swarms/models/openai_chat.md b/docs/swarms/models/openai_chat.md
index 4bb3ba78..a2ef9811 100644
--- a/docs/swarms/models/openai_chat.md
+++ b/docs/swarms/models/openai_chat.md
@@ -1,4 +1,4 @@
-# `OpenAIChat`` Documentation
+# `OpenAIChat` Documentation
## Table of Contents
diff --git a/mkdocs.yml b/mkdocs.yml
index 3446f2c0..0b8083c9 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -90,22 +90,30 @@ nav:
- OmniModalAgent: "swarms/agents/omni_agent.md"
- Idea2Image: "swarms/agents/idea_to_image.md"
- swarms.models:
- - Overview: "swarms/models/index.md"
- - HuggingFaceLLM: "swarms/models/hf.md"
- - Anthropic: "swarms/models/anthropic.md"
- - OpenAI: "swarms/models/openai.md"
- - Fuyu: "swarms/models/fuyu.md"
- - Zephyr: "swarms/models/zephyr.md"
- - Vilt: "swarms/models/vilt.md"
- - Idefics: "swarms/models/idefics.md"
- - BingChat: "swarms/models/bingchat.md"
- - Kosmos: "swarms/models/kosmos.md"
+ - Language:
+ - Overview: "swarms/models/index.md"
+ - HuggingFaceLLM: "swarms/models/hf.md"
+ - Anthropic: "swarms/models/anthropic.md"
+ - OpenAI: "swarms/models/openai.md"
+ - Zephyr: "swarms/models/zephyr.md"
+ - BioGPT: "swarms/models/biogpt.md"
+ - MultiModal:
+ - Fuyu: "swarms/models/fuyu.md"
+ - Vilt: "swarms/models/vilt.md"
+ - Idefics: "swarms/models/idefics.md"
+ - BingChat: "swarms/models/bingchat.md"
+ - Kosmos: "swarms/models/kosmos.md"
+ - Nougat: "swarms/models/nougat.md"
+ - LayoutLMDocumentQA: "swarms/models/layoutlm_document_qa.md"
- swarms.structs:
- Overview: "swarms/structs/overview.md"
- Workflow: "swarms/structs/workflow.md"
- swarms.memory:
- PineconeVectorStoreStore: "swarms/memory/pinecone.md"
- PGVectorStore: "swarms/memory/pg.md"
+ - swarms.chunkers:
+ - BaseChunker: "swarms/chunkers/basechunker.md"
+ - PdfChunker: "swarms/chunkers/pdf_chunker.md"
- Examples:
- Overview: "examples/index.md"
- Agents:
diff --git a/pyproject.toml b/pyproject.toml
index f7301660..def8662a 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -37,12 +37,16 @@ langchain-experimental = "*"
playwright = "*"
duckduckgo-search = "*"
faiss-cpu = "*"
+datasets = "*"
diffusers = "*"
+sentencepiece = "*"
wget = "*"
griptape = "*"
httpx = "*"
+attrs = "*"
ggl = "*"
beautifulsoup4 = "*"
+huggingface-hub = "*"
pydantic = "*"
tenacity = "*"
redis = "*"
@@ -53,7 +57,9 @@ open-interpreter = "*"
tabulate = "*"
termcolor = "*"
black = "*"
+open_clip_torch = "*"
dalle3 = "*"
+soundfile = "*"
torchvision = "*"
rich = "*"
EdgeGPT = "*"
diff --git a/requirements.txt b/requirements.txt
index 8581892a..3fc889b4 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -13,13 +13,19 @@ wget==3.2
simpleaichat
httpx
torch
+open_clip_torch
ggl
beautifulsoup4
google-search-results==2.4.2
Pillow
faiss-cpu
openai
+attrs
+datasets
+soundfile
+huggingface-hub
google-generativeai
+sentencepiece
duckduckgo-search
agent-protocol
chromadb
diff --git a/swarms/__init__.py b/swarms/__init__.py
index e1dba262..2ecf2033 100644
--- a/swarms/__init__.py
+++ b/swarms/__init__.py
@@ -8,14 +8,14 @@ warnings.filterwarnings("ignore", category=UserWarning)
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
-
-
from swarms import workers
from swarms.workers.worker import Worker
+
# from swarms import chunkers
-from swarms import models
+from swarms.models import *  # exposes the names listed in __all__ in swarms/models/__init__.py
from swarms import structs
from swarms import swarms
from swarms import agents
from swarms.logo import logo
-print(logo)
\ No newline at end of file
+
+print(logo)
diff --git a/swarms/memory/requirements.txt b/swarms/memory/requirements.txt
deleted file mode 100644
index 00934f5c..00000000
--- a/swarms/memory/requirements.txt
+++ /dev/null
@@ -1,8 +0,0 @@
-attrs==21.2.0
-griptape==0.18.2
-oceandb==0.1.0
-pgvector==0.2.3
-pydantic==1.10.8
-SQLAlchemy==1.4.49
-SQLAlchemy==2.0.20
-swarms==1.8.2
diff --git a/swarms/models/__init__.py b/swarms/models/__init__.py
index fe66dee8..1dc42971 100644
--- a/swarms/models/__init__.py
+++ b/swarms/models/__init__.py
@@ -4,10 +4,30 @@ from swarms.models.petals import Petals
from swarms.models.mistral import Mistral
from swarms.models.openai_models import OpenAI, AzureOpenAI, OpenAIChat
from swarms.models.zephyr import Zephyr
-
+from swarms.models.biogpt import BioGPT
# MultiModal Models
from swarms.models.idefics import Idefics
from swarms.models.kosmos_two import Kosmos
from swarms.models.vilt import Vilt
-# from swarms.models.fuyu import Fuyu
\ No newline at end of file
+from swarms.models.nougat import Nougat
+from swarms.models.layoutlm_document_qa import LayoutLMDocumentQA
+
+# from swarms.models.fuyu import Fuyu # Not working, wait until they update
+
+
+__all__ = [
+ "Anthropic",
+ "Petals",
+ "Mistral",
+ "OpenAI",
+ "AzureOpenAI",
+ "OpenAIChat",
+ "Zephyr",
+ "Idefics",
+ "Kosmos",
+ "Vilt",
+ "Nougat",
+ "LayoutLMDocumentQA",
+ "BioGPT",
+]
diff --git a/swarms/models/anthropic.py b/swarms/models/anthropic.py
index 453890b9..232ff647 100644
--- a/swarms/models/anthropic.py
+++ b/swarms/models/anthropic.py
@@ -4,13 +4,13 @@ import os
class Anthropic:
"""
-
+
Anthropic large language models.
-
-
+
+
Args:
-
-
+
+
"""
def __init__(
diff --git a/swarms/models/base.py b/swarms/models/base.py
index 57045165..32a45c43 100644
--- a/swarms/models/base.py
+++ b/swarms/models/base.py
@@ -1,14 +1,17 @@
import time
from abc import ABC, abstractmethod
+
def count_tokens(text: str) -> int:
return len(text.split())
+
class AbstractModel(ABC):
"""
AbstractModel
"""
+
# abstract base class for language models
def __init__(self):
self.start_time = None
@@ -41,7 +44,7 @@ class AbstractModel(ABC):
if elapsed_time == 0:
return float("inf")
return self._num_tokens() / elapsed_time
-
+
def _num_tokens(self, text: str) -> int:
"""Number of tokens"""
return count_tokens(text)
@@ -75,3 +78,16 @@ class AbstractModel(ABC):
if self.start_time and self.end_time:
return self.end_time - self.start_time
return 0
+
+ def metrics(self) -> str:
+ _sec_to_first_token = self._sec_to_first_token()
+ _tokens_per_second = self._tokens_per_second()
+ _num_tokens = self._num_tokens(self.history)
+ _time_for_generation = self._time_for_generation(self.history)
+
+ return f"""
+ SEC TO FIRST TOKEN: {_sec_to_first_token}
+ TOKENS/SEC: {_tokens_per_second}
+ TOKENS: {_num_tokens}
+ TIME FOR GENERATION: {_time_for_generation}
+ """
diff --git a/swarms/models/bing_chat.py b/swarms/models/bing_chat.py
index c91690e5..1d2eb503 100644
--- a/swarms/models/bing_chat.py
+++ b/swarms/models/bing_chat.py
@@ -29,14 +29,22 @@ class BingChat:
self.cookies = json.loads(open(cookies_path, encoding="utf-8").read())
self.bot = asyncio.run(Chatbot.create(cookies=self.cookies))
- def __call__(self, prompt: str, style: ConversationStyle = ConversationStyle.creative) -> str:
+ def __call__(
+ self, prompt: str, style: ConversationStyle = ConversationStyle.creative
+ ) -> str:
"""
Get a text response using the EdgeGPT model based on the provided prompt.
"""
- response = asyncio.run(self.bot.ask(prompt=prompt, conversation_style=style, simplify_response=True))
- return response['text']
+ response = asyncio.run(
+ self.bot.ask(
+ prompt=prompt, conversation_style=style, simplify_response=True
+ )
+ )
+ return response["text"]
- def create_img(self, prompt: str, output_dir: str = "./output", auth_cookie: str = None) -> str:
+ def create_img(
+ self, prompt: str, output_dir: str = "./output", auth_cookie: str = None
+ ) -> str:
"""
Generate an image based on the provided prompt and save it in the given output directory.
Returns the path of the generated image.
@@ -48,7 +56,7 @@ class BingChat:
images = image_generator.get_images(prompt)
image_generator.save_images(images, output_dir=output_dir)
- return Path(output_dir) / images[0]['path']
+ return Path(output_dir) / images[0]["path"]
@staticmethod
def set_cookie_dir_path(path: str):
diff --git a/swarms/models/bioclip.py b/swarms/models/bioclip.py
new file mode 100644
index 00000000..937634e3
--- /dev/null
+++ b/swarms/models/bioclip.py
@@ -0,0 +1,170 @@
+"""
+
+
+BiomedCLIP-PubMedBERT_256-vit_base_patch16_224
+https://huggingface.co/microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224
+BiomedCLIP is a biomedical vision-language foundation model that is pretrained on PMC-15M,
+a dataset of 15 million figure-caption pairs extracted from biomedical research articles in PubMed Central, using contrastive learning. It uses PubMedBERT as the text encoder and Vision Transformer as the image encoder, with domain-specific adaptations. It can perform various vision-language processing (VLP) tasks such as cross-modal retrieval, image classification, and visual question answering. BiomedCLIP establishes new state of the art in a wide range of standard datasets, and substantially outperforms prior VLP approaches:
+
+
+
+Citation
+@misc{https://doi.org/10.48550/arXiv.2303.00915,
+ doi = {10.48550/ARXIV.2303.00915},
+ url = {https://arxiv.org/abs/2303.00915},
+ author = {Zhang, Sheng and Xu, Yanbo and Usuyama, Naoto and Bagga, Jaspreet and Tinn, Robert and Preston, Sam and Rao, Rajesh and Wei, Mu and Valluri, Naveen and Wong, Cliff and Lungren, Matthew and Naumann, Tristan and Poon, Hoifung},
+ title = {Large-Scale Domain-Specific Pretraining for Biomedical Vision-Language Processing},
+ publisher = {arXiv},
+ year = {2023},
+}
+
+Model Use
+How to use
+Please refer to this example notebook.
+
+Intended Use
+This model is intended to be used solely for (I) future research on visual-language processing and (II) reproducibility of the experimental results reported in the reference paper.
+
+Primary Intended Use
+The primary intended use is to support AI researchers building on top of this work. BiomedCLIP and its associated models should be helpful for exploring various biomedical VLP research questions, especially in the radiology domain.
+
+Out-of-Scope Use
+Any deployed use case of the model --- commercial or otherwise --- is currently out of scope. Although we evaluated the models using a broad set of publicly-available research benchmarks, the models and evaluations are not intended for deployed use cases. Please refer to the associated paper for more details.
+
+Data
+This model builds upon PMC-15M dataset, which is a large-scale parallel image-text dataset for biomedical vision-language processing. It contains 15 million figure-caption pairs extracted from biomedical research articles in PubMed Central. It covers a diverse range of biomedical image types, such as microscopy, radiography, histology, and more.
+
+Limitations
+This model was developed using English corpora, and thus can be considered English-only.
+
+Further information
+Please refer to the corresponding paper, "Large-Scale Domain-Specific Pretraining for Biomedical Vision-Language Processing" for additional details on the model training and evaluation.
+"""
+
+import open_clip
+import glob
+import torch
+from PIL import Image
+import matplotlib.pyplot as plt
+
+
+class BioClip:
+ """
+ BioClip
+
+ Args:
+ model_path (str): path to the model
+
+ Attributes:
+ model_path (str): path to the model
+ model (torch.nn.Module): the model
+ preprocess_train (torchvision.transforms.Compose): the preprocessing pipeline for training
+ preprocess_val (torchvision.transforms.Compose): the preprocessing pipeline for validation
+ tokenizer (open_clip.Tokenizer): the tokenizer
+ device (torch.device): the device to run the model on
+
+ Methods:
+ __call__(self, img_path: str, labels: list, template: str = 'this is a photo of ', context_length: int = 256):
+ returns a dictionary of labels and their probabilities
+ plot_image_with_metadata(img_path: str, metadata: dict): plots the image with the metadata
+
+ Usage:
+ clip = BioClip('hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224')
+
+ labels = [
+ 'adenocarcinoma histopathology',
+ 'brain MRI',
+ 'covid line chart',
+ 'squamous cell carcinoma histopathology',
+ 'immunohistochemistry histopathology',
+ 'bone X-ray',
+ 'chest X-ray',
+ 'pie chart',
+ 'hematoxylin and eosin histopathology'
+ ]
+
+ result = clip("your_image_path.jpg", labels)
+ metadata = {'filename': "your_image_path.jpg".split('/')[-1], 'top_probs': result}
+ clip.plot_image_with_metadata("your_image_path.jpg", metadata)
+
+
+ """
+
+ def __init__(self, model_path: str):
+ self.model_path = model_path
+ (
+ self.model,
+ self.preprocess_train,
+ self.preprocess_val,
+ ) = open_clip.create_model_and_transforms(model_path)
+ self.tokenizer = open_clip.get_tokenizer(model_path)
+ self.device = (
+ torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
+ )
+ self.model.to(self.device)
+ self.model.eval()
+
+ def __call__(
+ self,
+ img_path: str,
+ labels: list,
+ template: str = "this is a photo of ",
+ context_length: int = 256,
+ ):
+ image = torch.stack([self.preprocess_val(Image.open(img_path))]).to(self.device)
+ texts = self.tokenizer(
+ [template + label for label in labels], context_length=context_length
+ ).to(self.device)
+
+ with torch.no_grad():
+ image_features, text_features, logit_scale = self.model(image, texts)
+ logits = (
+ (logit_scale * image_features @ text_features.t())
+ .detach()
+ .softmax(dim=-1)
+ )
+ sorted_indices = torch.argsort(logits, dim=-1, descending=True)
+ logits = logits.cpu().numpy()
+ sorted_indices = sorted_indices.cpu().numpy()
+
+ results = {}
+ for idx in sorted_indices[0]:
+ label = labels[idx]
+ prob = logits[0][idx]
+ results[label] = prob
+ return results
+
+ @staticmethod
+ def plot_image_with_metadata(img_path: str, metadata: dict):
+ img = Image.open(img_path)
+ fig, ax = plt.subplots(figsize=(5, 5))
+ ax.imshow(img)
+ ax.axis("off")
+ title = (
+ metadata["filename"]
+ + "\n"
+ + "\n".join([f"{k}: {v*100:.1f}" for k, v in metadata["top_probs"].items()])
+ )
+ ax.set_title(title, fontsize=14)
+ plt.tight_layout()
+ plt.show()
+
+
+# Usage
+# clip = BioClip('hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224')
+
+# labels = [
+# 'adenocarcinoma histopathology',
+# 'brain MRI',
+# 'covid line chart',
+# 'squamous cell carcinoma histopathology',
+# 'immunohistochemistry histopathology',
+# 'bone X-ray',
+# 'chest X-ray',
+# 'pie chart',
+# 'hematoxylin and eosin histopathology'
+# ]
+
+# result = clip("your_image_path.jpg", labels)
+# metadata = {'filename': "your_image_path.jpg".split('/')[-1], 'top_probs': result}
+# clip.plot_image_with_metadata("your_image_path.jpg", metadata)
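The scoring step in `BioClip.__call__` (scale the image-text similarities, apply softmax, sort labels by probability) can be sketched without any model weights. This is a minimal illustration in plain Python, assuming toy two-dimensional embeddings and a hypothetical `logit_scale`; the helper names `softmax`, `cosine`, and `rank_labels` are not part of the patch, and real embeddings would come from `open_clip`.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank_labels(image_vec, text_vecs, labels, logit_scale=100.0):
    """Return {label: probability}, ordered from most to least likely.

    logit_scale is a hypothetical value chosen for illustration; the real
    model supplies its own learned scale.
    """
    logits = [logit_scale * cosine(image_vec, t) for t in text_vecs]
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return dict(ranked)


# Toy embeddings invented for this example
labels = ["brain MRI", "chest X-ray", "pie chart"]
text_vecs = [[0.0, 1.0], [1.0, 0.0], [0.6, 0.8]]
result = rank_labels([1.0, 0.0], text_vecs, labels, logit_scale=10.0)
for label, prob in result.items():
    print(f"{label}: {prob * 100:.1f}")
```

Because dictionaries preserve insertion order, returning `dict(ranked)` keeps the best match first, which is what `plot_image_with_metadata` relies on when it builds the title.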
diff --git a/swarms/models/biogpt.py b/swarms/models/biogpt.py
new file mode 100644
index 00000000..f5abdf95
--- /dev/null
+++ b/swarms/models/biogpt.py
@@ -0,0 +1,208 @@
+"""
+BioGPT
+Pre-trained language models have attracted increasing attention in the biomedical domain,
+inspired by their great success in the general natural language domain.
+Among the two main branches of pre-trained language models in the general language domain, i.e. BERT (and its variants) and GPT (and its variants),
+the first one has been extensively studied in the biomedical domain, such as BioBERT and PubMedBERT.
+While they have achieved great success on a variety of discriminative downstream biomedical tasks,
+the lack of generation ability constrains their application scope.
+In this paper, we propose BioGPT, a domain-specific generative Transformer language model
+pre-trained on large-scale biomedical literature.
+We evaluate BioGPT on six biomedical natural language processing tasks
+and demonstrate that our model outperforms previous models on most tasks.
+Especially, we get 44.98%, 38.42% and 40.76% F1 score on BC5CDR, KD-DTI and DDI
+end-to-end relation extraction tasks, respectively, and 78.2% accuracy on PubMedQA,
+creating a new record. Our case study on text generation further demonstrates the
+advantage of BioGPT on biomedical literature to generate fluent descriptions for biomedical terms.
+
+
+@article{10.1093/bib/bbac409,
+ author = {Luo, Renqian and Sun, Liai and Xia, Yingce and Qin, Tao and Zhang, Sheng and Poon, Hoifung and Liu, Tie-Yan},
+ title = "{BioGPT: generative pre-trained transformer for biomedical text generation and mining}",
+ journal = {Briefings in Bioinformatics},
+ volume = {23},
+ number = {6},
+ year = {2022},
+ month = {09},
+ abstract = "{Pre-trained language models have attracted increasing attention in the biomedical domain, inspired by their great success in the general natural language domain. Among the two main branches of pre-trained language models in the general language domain, i.e. BERT (and its variants) and GPT (and its variants), the first one has been extensively studied in the biomedical domain, such as BioBERT and PubMedBERT. While they have achieved great success on a variety of discriminative downstream biomedical tasks, the lack of generation ability constrains their application scope. In this paper, we propose BioGPT, a domain-specific generative Transformer language model pre-trained on large-scale biomedical literature. We evaluate BioGPT on six biomedical natural language processing tasks and demonstrate that our model outperforms previous models on most tasks. Especially, we get 44.98\%, 38.42\% and 40.76\% F1 score on BC5CDR, KD-DTI and DDI end-to-end relation extraction tasks, respectively, and 78.2\% accuracy on PubMedQA, creating a new record. Our case study on text generation further demonstrates the advantage of BioGPT on biomedical literature to generate fluent descriptions for biomedical terms.}",
+ issn = {1477-4054},
+ doi = {10.1093/bib/bbac409},
+ url = {https://doi.org/10.1093/bib/bbac409},
+ note = {bbac409},
+ eprint = {https://academic.oup.com/bib/article-pdf/23/6/bbac409/47144271/bbac409.pdf},
+}
+"""
+
+import torch
+from transformers import pipeline, set_seed, BioGptTokenizer, BioGptForCausalLM
+
+
+class BioGPT:
+ """
+ A wrapper class for the BioGptForCausalLM model from the transformers library.
+
+ Attributes:
+ model_name (str): Name of the pretrained model.
+ model (BioGptForCausalLM): The pretrained BioGptForCausalLM model.
+ tokenizer (BioGptTokenizer): The tokenizer for the BioGptForCausalLM model.
+
+ Methods:
+ __call__: Generate text based on the given input.
+ get_features: Get the features of a given text.
+ beam_search_decoding: Generate text using beam search decoding.
+ set_pretrained_model: Set a new tokenizer and model.
+ get_config: Get the model's configuration.
+ save_model: Save the model and tokenizer to a directory.
+ load_from_path: Load a model and tokenizer from a directory.
+ print_model: Print the model's architecture.
+
+ Usage:
+ >>> from swarms.models.biogpt import BioGPT
+ >>> model = BioGPT()
+ >>> out = model("The patient has a fever")
+ >>> print(out)
+
+
+ """
+
+ def __init__(
+ self,
+ model_name: str = "microsoft/biogpt",
+ max_length: int = 500,
+ num_return_sequences: int = 5,
+ do_sample: bool = True,
+ min_length: int = 100,
+ ):
+ """
+ Initialize the wrapper class with a model name.
+
+ Args:
+ model_name (str): Name of the pretrained model. Default is "microsoft/biogpt".
+ """
+ self.model_name = model_name
+ self.max_length = max_length
+ self.num_return_sequences = num_return_sequences
+ self.do_sample = do_sample
+ self.min_length = min_length
+
+ self.model = BioGptForCausalLM.from_pretrained(self.model_name)
+ self.tokenizer = BioGptTokenizer.from_pretrained(self.model_name)
+
+ def __call__(self, text: str):
+ """
+ Generate text based on the given input.
+
+ Args:
+ text (str): The input text to generate from.
+
+ Generation uses the max_length, num_return_sequences and do_sample
+ values that were set in __init__.
+
+ Returns:
+ list[dict]: One dict per returned sequence, each with a "generated_text" key.
+ """
+ set_seed(42)
+ generator = pipeline(
+ "text-generation", model=self.model, tokenizer=self.tokenizer
+ )
+ return generator(
+ text,
+ max_length=self.max_length,
+ num_return_sequences=self.num_return_sequences,
+ do_sample=self.do_sample,
+ )
+
+ def get_features(self, text):
+ """
+ Get the features of a given text.
+
+ Args:
+ text (str): Input text.
+
+ Returns:
+ CausalLMOutputWithCrossAttentions: Model output, including the logits.
+ """
+ encoded_input = self.tokenizer(text, return_tensors="pt")
+ return self.model(**encoded_input)
+
+ def beam_search_decoding(
+ self,
+ sentence,
+ num_beams=5,
+ early_stopping=True,
+ ):
+ """
+ Generate text using beam search decoding.
+
+ Args:
+ sentence (str): The input sentence to generate from.
+ num_beams (int): Number of beams for beam search.
+ early_stopping (bool): Whether to stop early during beam search.
+
+ min_length and max_length are taken from the values set in __init__.
+
+ Returns:
+ str: The generated text.
+ """
+ inputs = self.tokenizer(sentence, return_tensors="pt")
+ set_seed(42)
+ with torch.no_grad():
+ beam_output = self.model.generate(
+ **inputs,
+ min_length=self.min_length,
+ max_length=self.max_length,
+ num_beams=num_beams,
+ early_stopping=early_stopping
+ )
+ return self.tokenizer.decode(beam_output[0], skip_special_tokens=True)
+
+ # Feature 1: Set a new tokenizer and model
+ def set_pretrained_model(self, model_name):
+ """
+ Set a new tokenizer and model.
+
+ Args:
+ model_name (str): Name of the pretrained model.
+ """
+ self.model_name = model_name
+ self.model = BioGptForCausalLM.from_pretrained(self.model_name)
+ self.tokenizer = BioGptTokenizer.from_pretrained(self.model_name)
+
+ # Feature 2: Get the model's config details
+ def get_config(self):
+ """
+ Get the model's configuration.
+
+ Returns:
+ PretrainedConfig: The configuration of the model.
+ """
+ return self.model.config
+
+ # Feature 3: Save the model and tokenizer to disk
+ def save_model(self, path):
+ """
+ Save the model and tokenizer to a directory.
+
+ Args:
+ path (str): Path to the directory.
+ """
+ self.model.save_pretrained(path)
+ self.tokenizer.save_pretrained(path)
+
+ # Feature 4: Load a model from a custom path
+ def load_from_path(self, path):
+ """
+ Load a model and tokenizer from a directory.
+
+ Args:
+ path (str): Path to the directory.
+ """
+ self.model = BioGptForCausalLM.from_pretrained(path)
+ self.tokenizer = BioGptTokenizer.from_pretrained(path)
+
+ # Feature 5: Print the model's architecture
+ def print_model(self):
+ """
+ Print the model's architecture.
+ """
+ print(self.model)
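Since `BioGPT.__call__` returns the raw `text-generation` pipeline output, callers receive a list of dicts rather than a string. A minimal sketch of unpacking that shape follows; `extract_texts` and `fake_outputs` are hypothetical names, and the sample strings are hand-written stand-ins so that no model needs to be downloaded.

```python
def extract_texts(outputs):
    """Collect the generated strings from text-generation pipeline output."""
    return [item["generated_text"] for item in outputs]


# Hand-written stand-in for the list[dict] that BioGPT.__call__ returns;
# the strings are invented for illustration, not real model output.
fake_outputs = [
    {"generated_text": "The patient has a fever and a mild cough."},
    {"generated_text": "The patient has a fever of unknown origin."},
]

texts = extract_texts(fake_outputs)
print(len(texts))
```

Tests that check the wrapper's return value would need to assert on this list-of-dicts shape, or on the extracted strings, rather than on `str`.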
diff --git a/swarms/models/idefics.py b/swarms/models/idefics.py
index 747def16..73cb4991 100644
--- a/swarms/models/idefics.py
+++ b/swarms/models/idefics.py
@@ -38,21 +38,22 @@ class Idefics:
# Usage
```
- from exa import idefics
- mmi = idefics()
+ from swarms.models import idefics
+
+ model = idefics()
user_input = "User: What is in this image? https://upload.wikimedia.org/wikipedia/commons/8/86/Id%C3%A9fix.JPG"
- response = mmi.chat(user_input)
+ response = model.chat(user_input)
print(response)
user_input = "User: And who is that? https://static.wikia.nocookie.net/asterix/images/2/25/R22b.gif/revision/latest?cb=20110815073052"
- response = mmi.chat(user_input)
+ response = model.chat(user_input)
print(response)
- mmi.set_checkpoint("new_checkpoint")
- mmi.set_device("cpu")
- mmi.set_max_length(200)
- mmi.clear_chat_history()
+ model.set_checkpoint("new_checkpoint")
+ model.set_device("cpu")
+ model.set_max_length(200)
+ model.clear_chat_history()
```
"""
@@ -87,7 +88,7 @@ class Idefics:
prompts : list
A list of prompts. Each prompt is a list of text strings and images.
batched_mode : bool, optional
- Whether to process the prompts in batched mode. If True, all prompts are
+ Whether to process the prompts in batched mode. If True, all prompts are
processed together. If False, only the first prompt is processed (default is True).
Returns
@@ -131,8 +132,8 @@ class Idefics:
prompts : list
A list of prompts. Each prompt is a list of text strings and images.
batched_mode : bool, optional
- Whether to process the prompts in batched mode.
- If True, all prompts are processed together.
+ Whether to process the prompts in batched mode.
+ If True, all prompts are processed together.
If False, only the first prompt is processed (default is True).
Returns
diff --git a/swarms/models/kosmos_two.py b/swarms/models/kosmos_two.py
index 91118c77..b36affcb 100644
--- a/swarms/models/kosmos_two.py
+++ b/swarms/models/kosmos_two.py
@@ -20,7 +20,7 @@ class Kosmos:
"""
Args:
-
+
# Initialize Kosmos
diff --git a/swarms/models/layoutlm_document_qa.py b/swarms/models/layoutlm_document_qa.py
new file mode 100644
index 00000000..6fe83210
--- /dev/null
+++ b/swarms/models/layoutlm_document_qa.py
@@ -0,0 +1,36 @@
+"""
+LayoutLMDocumentQA is a multimodal model well suited to
+visual question answering on real-world documents such as invoices and PDFs.
+"""
+from transformers import pipeline
+from swarms.models.base import AbstractModel
+
+
+class LayoutLMDocumentQA(AbstractModel):
+ """
+ LayoutLMDocumentQA for document question answering:
+
+ Args:
+ model_name (str, optional): Hugging Face model name or path. Defaults to "impira/layoutlm-document-qa".
+ task (str, optional): Pipeline task name. Defaults to "document-question-answering".
+
+ Usage:
+ >>> from swarms.models import LayoutLMDocumentQA
+ >>> model = LayoutLMDocumentQA()
+ >>> out = model("What is the total amount?", "path/to/img.png")
+ >>> print(out)
+
+ """
+
+ def __init__(
+ self,
+ model_name: str = "impira/layoutlm-document-qa",
+ task: str = "document-question-answering",
+ ):
+ self.pipeline = pipeline(task, model=model_name)
+
+ def __call__(self, task: str, img_path: str):
+ """Call for model"""
+ out = self.pipeline(img_path, task)
+ out = str(out)
+ return out
diff --git a/swarms/models/nougat.py b/swarms/models/nougat.py
new file mode 100644
index 00000000..cc154283
--- /dev/null
+++ b/swarms/models/nougat.py
@@ -0,0 +1,69 @@
+"""
+Nougat by Meta
+
+Good for:
+- Transcribing scientific PDFs into an easy-to-use Markdown
+format
+- Extracting information from PDFs
+- Extracting metadata from pdfs
+
+"""
+
+import torch
+from PIL import Image
+from transformers import NougatProcessor, VisionEncoderDecoderModel
+
+
+class Nougat:
+ """
+ Nougat
+
+ Args:
+ model_name_or_path: str, default="facebook/nougat-base"
+ min_length: int, default=1
+ max_new_tokens: int, default=30
+
+ Usage:
+ >>> from swarms.models.nougat import Nougat
+ >>> nougat = Nougat()
+ >>> nougat("path/to/image.png")
+
+
+ """
+
+ def __init__(
+ self,
+ model_name_or_path="facebook/nougat-base",
+ min_length: int = 1,
+ max_new_tokens: int = 30,
+ ):
+ self.model_name_or_path = model_name_or_path
+ self.min_length = min_length
+ self.max_new_tokens = max_new_tokens
+
+ self.processor = NougatProcessor.from_pretrained(self.model_name_or_path)
+ self.model = VisionEncoderDecoderModel.from_pretrained(self.model_name_or_path)
+ self.device = "cuda" if torch.cuda.is_available() else "cpu"
+ self.model.to(self.device)
+
+ def get_image(self, img_path: str):
+ """Get an image from a path"""
+ image = Image.open(img_path)
+ return image
+
+ def __call__(self, img_path: str):
+ """Call the model with an image_path str as an input"""
+ image = Image.open(img_path)
+ pixel_values = self.processor(image, return_tensors="pt").pixel_values
+
+ # Generate the transcription; length is capped by max_new_tokens (30 by default)
+ outputs = self.model.generate(
+ pixel_values.to(self.device),
+ min_length=self.min_length,
+ max_new_tokens=self.max_new_tokens,
+ bad_words_ids=[[self.processor.tokenizer.unk_token_id]],
+ )
+
+ sequence = self.processor.batch_decode(outputs, skip_special_tokens=True)[0]
+ sequence = self.processor.post_process_generation(sequence, fix_markdown=False)
+ return sequence
diff --git a/swarms/models/phi.py b/swarms/models/phi.py
new file mode 100644
index 00000000..90fca08e
--- /dev/null
+++ b/swarms/models/phi.py
@@ -0,0 +1 @@
+"""Phi by Microsoft written by Kye"""
diff --git a/swarms/models/speecht5.py b/swarms/models/speecht5.py
new file mode 100644
index 00000000..e98036ac
--- /dev/null
+++ b/swarms/models/speecht5.py
@@ -0,0 +1,159 @@
+"""
+SpeechT5 (TTS task)
+SpeechT5 model fine-tuned for speech synthesis (text-to-speech) on LibriTTS.
+
+This model was introduced in SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing by Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei.
+
+SpeechT5 was first released in this repository, original weights. The license used is MIT.
+
+Model Description
+Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural language processing models, we propose a unified-modal SpeechT5 framework that explores the encoder-decoder pre-training for self-supervised speech/text representation learning. The SpeechT5 framework consists of a shared encoder-decoder network and six modal-specific (speech/text) pre/post-nets. After preprocessing the input speech/text through the pre-nets, the shared encoder-decoder network models the sequence-to-sequence transformation, and then the post-nets generate the output in the speech/text modality based on the output of the decoder.
+
+Leveraging large-scale unlabeled speech and text data, we pre-train SpeechT5 to learn a unified-modal representation, hoping to improve the modeling capability for both speech and text. To align the textual and speech information into this unified semantic space, we propose a cross-modal vector quantization approach that randomly mixes up speech/text states with latent units as the interface between encoder and decoder.
+
+Extensive evaluations show the superiority of the proposed SpeechT5 framework on a wide variety of spoken language processing tasks, including automatic speech recognition, speech synthesis, speech translation, voice conversion, speech enhancement, and speaker identification.
+
+Developed by: Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei.
+Shared by [optional]: Matthijs Hollemans
+Model type: text-to-speech
+Language(s) (NLP): [More Information Needed]
+License: MIT
+Finetuned from model [optional]: [More Information Needed]
+Model Sources [optional]
+Repository: [https://github.com/microsoft/SpeechT5/]
+Paper: [https://arxiv.org/pdf/2110.07205.pdf]
+Blog Post: [https://huggingface.co/blog/speecht5]
+Demo: [https://huggingface.co/spaces/Matthijs/speecht5-tts-demo]
+
+"""
+import torch
+import soundfile as sf
+from transformers import (
+ pipeline,
+ SpeechT5Processor,
+ SpeechT5ForTextToSpeech,
+ SpeechT5HifiGan,
+)
+from datasets import load_dataset
+
+
+class SpeechT5:
+ """
+ SpeechT5
+
+
+ Args:
+ model_name (str, optional): Model name or path. Defaults to "microsoft/speecht5_tts".
+ vocoder_name (str, optional): Vocoder name or path. Defaults to "microsoft/speecht5_hifigan".
+ dataset_name (str, optional): Dataset name or path. Defaults to "Matthijs/cmu-arctic-xvectors".
+
+ Attributes:
+ model_name (str): Model name or path.
+ vocoder_name (str): Vocoder name or path.
+ dataset_name (str): Dataset name or path.
+ processor (SpeechT5Processor): Processor for the SpeechT5 model.
+ model (SpeechT5ForTextToSpeech): SpeechT5 model.
+ vocoder (SpeechT5HifiGan): SpeechT5 vocoder.
+ embeddings_dataset (datasets.Dataset): Dataset containing speaker embeddings.
+
+ Methods:
+ __call__: Synthesize speech from text.
+ save_speech: Save speech to a file.
+ set_model: Change the model.
+ set_vocoder: Change the vocoder.
+ set_embeddings_dataset: Change the embeddings dataset.
+ get_sampling_rate: Get the sampling rate of the model.
+ print_model_details: Print details of the model.
+ quick_synthesize: Customize pipeline method for quick synthesis.
+ change_dataset_split: Change dataset split (train, validation, test).
+ load_custom_embedding: Load a custom speaker embedding (xvector) for the text.
+
+ Usage:
+ >>> speechT5 = SpeechT5()
+ >>> result = speechT5("Hello, how are you?")
+ >>> speechT5.save_speech(result)
+ >>> print("Speech saved successfully!")
+
+
+
+ """
+
+ def __init__(
+ self,
+ model_name="microsoft/speecht5_tts",
+ vocoder_name="microsoft/speecht5_hifigan",
+ dataset_name="Matthijs/cmu-arctic-xvectors",
+ ):
+ self.model_name = model_name
+ self.vocoder_name = vocoder_name
+ self.dataset_name = dataset_name
+ self.processor = SpeechT5Processor.from_pretrained(self.model_name)
+ self.model = SpeechT5ForTextToSpeech.from_pretrained(self.model_name)
+ self.vocoder = SpeechT5HifiGan.from_pretrained(self.vocoder_name)
+ self.embeddings_dataset = load_dataset(self.dataset_name, split="validation")
+
+ def __call__(self, text: str, speaker_id: int = 7306):
+ """Call the model on some text and return the speech."""
+ speaker_embedding = torch.tensor(
+ self.embeddings_dataset[speaker_id]["xvector"]
+ ).unsqueeze(0)
+ inputs = self.processor(text=text, return_tensors="pt")
+ speech = self.model.generate_speech(
+ inputs["input_ids"], speaker_embedding, vocoder=self.vocoder
+ )
+ return speech
+
+ def save_speech(self, speech, filename="speech.wav"):
+ """Save Speech to a file."""
+ sf.write(filename, speech.numpy(), samplerate=16000)
+
+ def set_model(self, model_name: str):
+ """Set the model to a new model."""
+ self.model_name = model_name
+ self.processor = SpeechT5Processor.from_pretrained(self.model_name)
+ self.model = SpeechT5ForTextToSpeech.from_pretrained(self.model_name)
+
+ def set_vocoder(self, vocoder_name):
+ """Set the vocoder to a new vocoder."""
+ self.vocoder_name = vocoder_name
+ self.vocoder = SpeechT5HifiGan.from_pretrained(self.vocoder_name)
+
+ def set_embeddings_dataset(self, dataset_name):
+ """Set the embeddings dataset to a new dataset."""
+ self.dataset_name = dataset_name
+ self.embeddings_dataset = load_dataset(self.dataset_name, split="validation")
+
+ # Feature 1: Get sampling rate
+ def get_sampling_rate(self):
+ """Get sampling rate of the model."""
+ return 16000
+
+ # Feature 2: Print details of the model
+ def print_model_details(self):
+ """Print details of the model."""
+ print(f"Model Name: {self.model_name}")
+ print(f"Vocoder Name: {self.vocoder_name}")
+
+ # Feature 3: Customize pipeline method for quick synthesis
+ def quick_synthesize(self, text):
+ """Customize pipeline method for quick synthesis."""
+ synthesiser = pipeline("text-to-speech", self.model_name)
+ speech = synthesiser(text)
+ return speech
+
+ # Feature 4: Change dataset split (train, validation, test)
+ def change_dataset_split(self, split="train"):
+ """Change dataset split (train, validation, test)."""
+ self.embeddings_dataset = load_dataset(self.dataset_name, split=split)
+
+ # Feature 5: Load a custom speaker embedding (xvector) for the text
+ def load_custom_embedding(self, xvector):
+ """Load a custom speaker embedding (xvector) for the text."""
+ return torch.tensor(xvector).unsqueeze(0)
+
+
+# if __name__ == "__main__":
+# speechT5 = SpeechT5()
+# result = speechT5("Hello, how are you?")
+# speechT5.save_speech(result)
+# print("Speech saved successfully!")
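`save_speech` writes the waveform at a fixed 16 kHz through `soundfile`. For readers without that dependency, the same idea can be sketched with the standard-library `wave` module. This is a swapped-in technique, not what the patch does: it assumes mono float samples in the range [-1.0, 1.0], and `wav_bytes` and `sine_wave` are hypothetical helper names.

```python
import io
import math
import struct
import wave


def sine_wave(freq_hz, seconds, sample_rate=16000):
    """Generate a mono test tone as floats in [-1.0, 1.0]."""
    count = int(seconds * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate) for n in range(count)]


def wav_bytes(samples, sample_rate=16000):
    """Encode float samples as a 16-bit PCM mono WAV file held in memory."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))  # guard against overshoot
        frames += struct.pack("<h", int(clipped * 32767))
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return buffer.getvalue()


data = wav_bytes(sine_wave(440, 0.5))
print(len(data))
```

Writing `data` to a file opened in binary mode yields a playable `.wav`. Keeping the sample rate as a parameter, rather than a literal, makes it easier to stay in step with whatever `get_sampling_rate` reports.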
diff --git a/swarms/models/trocr.py b/swarms/models/trocr.py
new file mode 100644
index 00000000..f4a4156d
--- /dev/null
+++ b/swarms/models/trocr.py
@@ -0,0 +1,19 @@
+"""
+
+TROCR for Multi-Modal OCR tasks
+
+
+"""
+from transformers import TrOCRProcessor, VisionEncoderDecoderModel
+from PIL import Image
+import requests
+
+
+class TrOCR:
+ def __init__(
+ self,
+ ):
+ pass
+
+ def __call__(self):
+ pass
diff --git a/swarms/models/zephyr.py b/swarms/models/zephyr.py
index 8ee12ed9..582bc740 100644
--- a/swarms/models/zephyr.py
+++ b/swarms/models/zephyr.py
@@ -1,16 +1,15 @@
"""Zephyr by HF"""
-import torch
+import torch
from transformers import pipeline
-
class Zephyr:
"""
Zehpyr model from HF
Args:
- max_new_tokens(int) = Number of max new tokens
+ max_new_tokens(int) = Number of max new tokens
temperature(float) = temperature of the LLM
top_k(float) = top k of the model set to 50
top_p(float) = top_p of the model set to 0.95
@@ -23,6 +22,7 @@ class Zephyr:
"""
+
def __init__(
self,
max_new_tokens: int = 300,
@@ -40,18 +40,23 @@ class Zephyr:
"text-generation",
model="HuggingFaceH4/zephyr-7b-alpha",
torch_dtype=torch.bfloa16,
- device_map="auto"
+ device_map="auto",
)
self.messages = [
{
"role": "system",
"content": "You are a friendly chatbot who always responds in the style of a pirate",
},
- {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
+ {
+ "role": "user",
+ "content": "How many helicopters can a human eat in one sitting?",
+ },
]
def __call__(self, text: str):
"""Call the model"""
- prompt = self.pipe.tokenizer.apply_chat_template(self.messages, tokenize=False, add_generation_prompt=True)
+ prompt = self.pipe.tokenizer.apply_chat_template(
+ self.messages, tokenize=False, add_generation_prompt=True
+ )
outputs = self.pipe(prompt, max_new_token=self.max_new_tokens)
- print(outputs[0])["generated_text"]
\ No newline at end of file
+ print(outputs[0]["generated_text"])
diff --git a/tests/chunkers/basechunker.py b/tests/chunkers/basechunker.py
new file mode 100644
index 00000000..f70705bc
--- /dev/null
+++ b/tests/chunkers/basechunker.py
@@ -0,0 +1,83 @@
+import pytest
+from swarms.chunkers.base import (
+ BaseChunker,
+ TextArtifact,
+ ChunkSeparator,
+ OpenAiTokenizer,
+) # adjust the import paths accordingly
+
+
+# 1. Test Initialization
+def test_chunker_initialization():
+ chunker = BaseChunker()
+ assert isinstance(chunker, BaseChunker)
+ assert chunker.max_tokens == chunker.tokenizer.max_tokens
+
+
+def test_default_separators():
+ chunker = BaseChunker()
+ assert chunker.separators == BaseChunker.DEFAULT_SEPARATORS
+
+
+def test_default_tokenizer():
+ chunker = BaseChunker()
+ assert isinstance(chunker.tokenizer, OpenAiTokenizer)
+
+
+# 2. Test Basic Chunking
+@pytest.mark.parametrize(
+ "input_text, expected_output",
+ [
+ ("This is a test.", [TextArtifact("This is a test.")]),
+ ("Hello World!", [TextArtifact("Hello World!")]),
+ # Add more simple cases
+ ],
+)
+def test_basic_chunk(input_text, expected_output):
+ chunker = BaseChunker()
+ result = chunker.chunk(input_text)
+ assert result == expected_output
+
+
+# 3. Test Chunking with Different Separators
+def test_custom_separators():
+ custom_separator = ChunkSeparator(";")
+ chunker = BaseChunker(separators=[custom_separator])
+ input_text = "Hello;World!"
+ expected_output = [TextArtifact("Hello;"), TextArtifact("World!")]
+ result = chunker.chunk(input_text)
+ assert result == expected_output
+
+
+# 4. Test Recursive Chunking
+def test_recursive_chunking():
+ chunker = BaseChunker(max_tokens=5)
+ input_text = "This is a more complex text."
+ expected_output = [
+ TextArtifact("This"),
+ TextArtifact("is a"),
+ TextArtifact("more"),
+ TextArtifact("complex"),
+ TextArtifact("text."),
+ ]
+ result = chunker.chunk(input_text)
+ assert result == expected_output
+
+
+# 5. Test Edge Cases and Special Scenarios
+def test_empty_text():
+ chunker = BaseChunker()
+ result = chunker.chunk("")
+ assert result == []
+
+
+def test_whitespace_text():
+ chunker = BaseChunker()
+ result = chunker.chunk(" ")
+ assert result == [TextArtifact(" ")]
+
+
+def test_single_word():
+ chunker = BaseChunker()
+ result = chunker.chunk("Hello")
+ assert result == [TextArtifact("Hello")]
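The recursive chunking that `test_recursive_chunking` exercises can be illustrated in isolation. The sketch below is not the `BaseChunker` implementation: it assumes a stand-in tokenizer that counts whitespace-separated words, a hypothetical separator order (paragraph, sentence, word), and plain strings instead of `TextArtifact` objects.

```python
def count_tokens(text):
    """Stand-in tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def chunk(text, max_tokens, separators=("\n\n", ". ", " ")):
    """Recursively split text until each chunk fits within max_tokens.

    Separators are tried from coarsest to finest and are kept at the end of
    the piece they terminate, so joining the chunks restores the input.
    A piece that cannot be split any further is returned as-is, even if it
    is still over the limit.
    """
    if not text:
        return []
    if count_tokens(text) <= max_tokens or not separators:
        return [text]
    separator, finer = separators[0], separators[1:]
    parts = text.split(separator)
    if len(parts) == 1:
        # Separator not present: fall through to the next, finer one
        return chunk(text, max_tokens, finer)
    chunks, current = [], ""
    for index, part in enumerate(parts):
        piece = part + separator if index < len(parts) - 1 else part
        if current and count_tokens(current + piece) > max_tokens:
            # Flush what has been gathered, splitting it further if needed
            chunks.extend(chunk(current, max_tokens, finer))
            current = piece
        else:
            current += piece
    if current:
        chunks.extend(chunk(current, max_tokens, finer))
    return chunks


sample = "one two three. four five six seven"
pieces = chunk(sample, max_tokens=3)
print(pieces)
```

Greedy accumulation keeps chunks as large as the limit allows, which is why a real tokenizer could produce different boundaries from the word-count stand-in used here.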
diff --git a/tests/models/biogpt.py b/tests/models/biogpt.py
new file mode 100644
index 00000000..29cbe86c
--- /dev/null
+++ b/tests/models/biogpt.py
@@ -0,0 +1,207 @@
+from unittest.mock import patch
+
+# Import necessary modules
+import pytest
+import torch
+from transformers import BioGptForCausalLM, BioGptTokenizer
+
+
+
+# Fixture for BioGPT instance
+@pytest.fixture
+def biogpt_instance():
+ from swarms.models import (
+ BioGPT,
+ )
+
+ return BioGPT()
+
+
+# 36. Test if BioGPT provides a response for a simple biomedical question
+def test_biomedical_response_1(biogpt_instance):
+ question = "What are the functions of the mitochondria?"
+ response = biogpt_instance(question)
+ assert response and isinstance(response, list)
+
+
+# 37. Test for a genetics-based question
+def test_genetics_response(biogpt_instance):
+ question = "Can you explain the Mendelian inheritance?"
+ response = biogpt_instance(question)
+ assert response and isinstance(response, list)
+
+
+# 38. Test for a question about viruses
+def test_virus_response(biogpt_instance):
+ question = "How do RNA viruses replicate?"
+ response = biogpt_instance(question)
+ assert response and isinstance(response, list)
+
+
+# 39. Test for a cell biology related question
+def test_cell_biology_response(biogpt_instance):
+ question = "Describe the cell cycle and its phases."
+ response = biogpt_instance(question)
+ assert response and isinstance(response, list)
+
+
+# 40. Test for a question about protein structure
+def test_protein_structure_response(biogpt_instance):
+ question = "What's the difference between alpha helix and beta sheet structures in proteins?"
+ response = biogpt_instance(question)
+ assert response and isinstance(response, list)
+
+
+# 41. Test for a pharmacology question
+def test_pharmacology_response(biogpt_instance):
+ question = "How do beta blockers work?"
+ response = biogpt_instance(question)
+ assert response and isinstance(response, list)
+
+
+# 42. Test for an anatomy-based question
+def test_anatomy_response(biogpt_instance):
+ question = "Describe the structure of the human heart."
+ response = biogpt_instance(question)
+ assert response and isinstance(response, list)
+
+
+# 43. Test for a question about bioinformatics
+def test_bioinformatics_response(biogpt_instance):
+ question = "What is a BLAST search?"
+ response = biogpt_instance(question)
+ assert response and isinstance(response, list)
+
+
+# 44. Test for a neuroscience question
+def test_neuroscience_response(biogpt_instance):
+ question = "Explain the function of synapses in the nervous system."
+ response = biogpt_instance(question)
+ assert response and isinstance(response, list)
+
+
+# 45. Test for an immunology question
+def test_immunology_response(biogpt_instance):
+ question = "What is the role of T cells in the immune response?"
+ response = biogpt_instance(question)
+ assert response and isinstance(response, list)
+
+
+def test_init(biogpt_instance):
+ assert biogpt_instance.model_name == "microsoft/biogpt"
+ assert biogpt_instance.max_length == 500
+ assert biogpt_instance.num_return_sequences == 5
+ assert biogpt_instance.do_sample is True
+ assert biogpt_instance.min_length == 100
+
+
+def test_call(biogpt_instance, monkeypatch):
+ def mock_pipeline(*args, **kwargs):
+ class MockGenerator:
+ def __call__(self, text, **kwargs):
+ return ["Generated text"]
+
+ return MockGenerator()
+
+ monkeypatch.setattr("swarms.models.biogpt.pipeline", mock_pipeline)
+ result = biogpt_instance("Input text")
+ assert result == ["Generated text"]
+
+
+def test_get_features(biogpt_instance):
+ features = biogpt_instance.get_features("Input text")
+ assert "logits" in features
+
+
+def test_beam_search_decoding(biogpt_instance):
+ generated_text = biogpt_instance.beam_search_decoding("Input text")
+ assert isinstance(generated_text, str)
+
+
+def test_set_pretrained_model(biogpt_instance):
+ biogpt_instance.set_pretrained_model("new_model")
+ assert biogpt_instance.model_name == "new_model"
+
+
+def test_get_config(biogpt_instance):
+ config = biogpt_instance.get_config()
+ assert hasattr(config, "vocab_size")
+
+
+def test_save_load_model(tmp_path, biogpt_instance):
+ biogpt_instance.save_model(tmp_path)
+ biogpt_instance.load_from_path(tmp_path)
+ assert biogpt_instance.model_name == "microsoft/biogpt"
+
+
+def test_print_model(capsys, biogpt_instance):
+ biogpt_instance.print_model()
+ captured = capsys.readouterr()
+ assert "BioGptForCausalLM" in captured.out
+
+
+# 26. Test if set_pretrained_model changes the model_name
+def test_set_pretrained_model_name_change(biogpt_instance):
+ biogpt_instance.set_pretrained_model("new_model_name")
+ assert biogpt_instance.model_name == "new_model_name"
+
+
+# 27. Test get_config return type
+def test_get_config_return_type(biogpt_instance):
+ config = biogpt_instance.get_config()
+ assert isinstance(config, type(biogpt_instance.model.config))
+
+
+# 28. Test that save_model calls save_pretrained on the model and tokenizer (mocks are injected bottom-up)
+@patch.object(BioGptForCausalLM, "save_pretrained")
+@patch.object(BioGptTokenizer, "save_pretrained")
+def test_save_model(mock_save_tokenizer, mock_save_model, biogpt_instance):
+    path = "test_path"
+    biogpt_instance.save_model(path)
+    mock_save_model.assert_called_once_with(path)
+    mock_save_tokenizer.assert_called_once_with(path)
+
+
+# 29. Test loading model from path
+@patch.object(BioGptForCausalLM, "from_pretrained")
+@patch.object(BioGptTokenizer, "from_pretrained")
+def test_load_from_path(mock_load_tokenizer, mock_load_model, biogpt_instance):
+    path = "test_path"
+    biogpt_instance.load_from_path(path)
+    mock_load_model.assert_called_once_with(path)
+    mock_load_tokenizer.assert_called_once_with(path)
+
+
+# 30. Test print_model doesn't raise any error
+def test_print_model_metadata(biogpt_instance):
+ try:
+ biogpt_instance.print_model()
+ except Exception as e:
+ pytest.fail(f"print_model() raised an exception: {e}")
+
+
+# 31. Test that beam_search_decoding uses the correct number of beams
+@patch.object(BioGptForCausalLM, "generate")
+def test_beam_search_decoding_num_beams(mock_generate, biogpt_instance):
+ biogpt_instance.beam_search_decoding("test_sentence", num_beams=7)
+ _, kwargs = mock_generate.call_args
+ assert kwargs["num_beams"] == 7
+
+
+# 32. Test if beam_search_decoding handles early_stopping
+@patch.object(BioGptForCausalLM, "generate")
+def test_beam_search_decoding_early_stopping(mock_generate, biogpt_instance):
+ biogpt_instance.beam_search_decoding("test_sentence", early_stopping=False)
+ _, kwargs = mock_generate.call_args
+ assert kwargs["early_stopping"] is False
+
+
+# 33. Test get_features return type
+def test_get_features_return_type(biogpt_instance):
+ result = biogpt_instance.get_features("This is a sample text.")
+ assert isinstance(result, torch.nn.modules.module.Module)
+
+
+# 34. Test if default model is set correctly during initialization
+def test_default_model_name(biogpt_instance):
+ assert biogpt_instance.model_name == "microsoft/biogpt"
diff --git a/tests/models/fuyu.py b/tests/models/fuyu.py
new file mode 100644
index 00000000..9a26dbfb
--- /dev/null
+++ b/tests/models/fuyu.py
@@ -0,0 +1,117 @@
+# tests/models/fuyu.py
+
+import pytest
+from swarms.models import Fuyu
+from transformers import FuyuProcessor, FuyuImageProcessor
+from PIL import Image
+
+
+# Basic test to ensure instantiation of class.
+def test_fuyu_initialization():
+ fuyu_instance = Fuyu()
+ assert isinstance(fuyu_instance, Fuyu)
+
+
+# Using parameterized testing for different init parameters.
+@pytest.mark.parametrize(
+ "pretrained_path, device_map, max_new_tokens",
+ [
+ ("adept/fuyu-8b", "cuda:0", 7),
+ ("adept/fuyu-8b", "cpu", 10),
+ ],
+)
+def test_fuyu_parameters(pretrained_path, device_map, max_new_tokens):
+ fuyu_instance = Fuyu(pretrained_path, device_map, max_new_tokens)
+ assert fuyu_instance.pretrained_path == pretrained_path
+ assert fuyu_instance.device_map == device_map
+ assert fuyu_instance.max_new_tokens == max_new_tokens
+
+
+# Fixture for creating a Fuyu instance.
+@pytest.fixture
+def fuyu_instance():
+ return Fuyu()
+
+
+# Test using the fixture.
+def test_fuyu_processor_initialization(fuyu_instance):
+ assert isinstance(fuyu_instance.processor, FuyuProcessor)
+ assert isinstance(fuyu_instance.image_processor, FuyuImageProcessor)
+
+
+# Test exception when providing an invalid image path.
+def test_invalid_image_path(fuyu_instance):
+ with pytest.raises(FileNotFoundError):
+ fuyu_instance("Hello", "invalid/path/to/image.png")
+
+
+# Using monkeypatch to replace the Image.open method to simulate a failure.
+def test_image_open_failure(fuyu_instance, monkeypatch):
+ def mock_open(*args, **kwargs):
+ raise Exception("Mocked failure")
+
+ monkeypatch.setattr(Image, "open", mock_open)
+
+ with pytest.raises(Exception, match="Mocked failure"):
+ fuyu_instance(
+ "Hello",
+ "https://plus.unsplash.com/premium_photo-1687149699194-0207c04bc6e8?auto=format&fit=crop&q=80&w=1378&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D",
+ )
+
+
+# Marking a slow test.
+@pytest.mark.slow
+def test_fuyu_model_output(fuyu_instance):
+ # This is a dummy test and may not be functional without real data.
+ output = fuyu_instance(
+ "Hello, my name is",
+ "https://plus.unsplash.com/premium_photo-1687149699194-0207c04bc6e8?auto=format&fit=crop&q=80&w=1378&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D",
+ )
+ assert isinstance(output, str)
+
+
+def test_tokenizer_type(fuyu_instance):
+    assert hasattr(fuyu_instance, "tokenizer")
+
+
+def test_processor_has_image_processor_and_tokenizer(fuyu_instance):
+ assert fuyu_instance.processor.image_processor == fuyu_instance.image_processor
+ assert fuyu_instance.processor.tokenizer == fuyu_instance.tokenizer
+
+
+def test_model_device_map(fuyu_instance):
+ assert fuyu_instance.model.device_map == fuyu_instance.device_map
+
+
+# Testing maximum tokens setting
+def test_max_new_tokens_setting(fuyu_instance):
+ assert fuyu_instance.max_new_tokens == 7
+
+
+# Test if an exception is raised when invalid text is provided.
+def test_invalid_text_input(fuyu_instance):
+ with pytest.raises(Exception):
+ fuyu_instance(None, "path/to/image.png")
+
+
+# Test if an exception is raised when empty text is provided.
+def test_empty_text_input(fuyu_instance):
+ with pytest.raises(Exception):
+ fuyu_instance("", "path/to/image.png")
+
+
+# Test if an exception is raised when a very long text is provided.
+def test_very_long_text_input(fuyu_instance):
+ with pytest.raises(Exception):
+ fuyu_instance("A" * 10000, "path/to/image.png")
+
+
+# Check model's default device map
+def test_default_device_map():
+ fuyu_instance = Fuyu()
+ assert fuyu_instance.device_map == "cuda:0"
+
+
+# Testing if processor is correctly initialized
+def test_processor_initialization(fuyu_instance):
+ assert isinstance(fuyu_instance.processor, FuyuProcessor)
diff --git a/tests/models/idefics.py b/tests/models/idefics.py
new file mode 100644
index 00000000..610657bd
--- /dev/null
+++ b/tests/models/idefics.py
@@ -0,0 +1,119 @@
+import pytest
+from unittest.mock import patch
+import torch
+from swarms.models.idefics import Idefics, IdeficsForVisionText2Text, AutoProcessor
+
+
+@pytest.fixture
+def idefics_instance():
+ with patch(
+ "torch.cuda.is_available", return_value=False
+ ): # Assuming tests are run on CPU for simplicity
+ instance = Idefics()
+ return instance
+
+
+# Basic Tests
+def test_init_default(idefics_instance):
+ assert idefics_instance.device == "cpu"
+ assert idefics_instance.max_length == 100
+ assert not idefics_instance.chat_history
+
+
+@pytest.mark.parametrize(
+ "device,expected",
+ [
+ (None, "cpu"),
+ ("cuda", "cuda"),
+ ("cpu", "cpu"),
+ ],
+)
+def test_init_device(device, expected):
+    with patch(
+        "torch.cuda.is_available", return_value=(expected == "cuda")
+    ):
+ instance = Idefics(device=device)
+ assert instance.device == expected
+
+
+# Test `run` method
+def test_run(idefics_instance):
+ prompts = [["User: Test"]]
+ with patch.object(idefics_instance, "processor") as mock_processor, patch.object(
+ idefics_instance, "model"
+ ) as mock_model:
+ mock_processor.return_value = {"input_ids": torch.tensor([1, 2, 3])}
+ mock_model.generate.return_value = torch.tensor([1, 2, 3])
+ mock_processor.batch_decode.return_value = ["Test"]
+
+ result = idefics_instance.run(prompts)
+
+ assert result == ["Test"]
+
+
+# Test `__call__` method (using the same logic as run for simplicity)
+def test_call(idefics_instance):
+ prompts = [["User: Test"]]
+ with patch.object(idefics_instance, "processor") as mock_processor, patch.object(
+ idefics_instance, "model"
+ ) as mock_model:
+ mock_processor.return_value = {"input_ids": torch.tensor([1, 2, 3])}
+ mock_model.generate.return_value = torch.tensor([1, 2, 3])
+ mock_processor.batch_decode.return_value = ["Test"]
+
+ result = idefics_instance(prompts)
+
+ assert result == ["Test"]
+
+
+# Test `chat` method
+def test_chat(idefics_instance):
+ user_input = "User: Hello"
+ response = "Model: Hi there!"
+ with patch.object(idefics_instance, "run", return_value=[response]):
+ result = idefics_instance.chat(user_input)
+
+ assert result == response
+ assert idefics_instance.chat_history == [user_input, response]
+
+
+# Test `set_checkpoint` method
+def test_set_checkpoint(idefics_instance):
+ new_checkpoint = "new_checkpoint"
+ with patch.object(
+ IdeficsForVisionText2Text, "from_pretrained"
+ ) as mock_from_pretrained, patch.object(AutoProcessor, "from_pretrained"):
+ idefics_instance.set_checkpoint(new_checkpoint)
+
+ mock_from_pretrained.assert_called_with(new_checkpoint, torch_dtype=torch.bfloat16)
+
+
+# Test `set_device` method
+def test_set_device(idefics_instance):
+ new_device = "cuda"
+ with patch.object(idefics_instance.model, "to"):
+ idefics_instance.set_device(new_device)
+
+ assert idefics_instance.device == new_device
+
+
+# Test `set_max_length` method
+def test_set_max_length(idefics_instance):
+ new_length = 150
+ idefics_instance.set_max_length(new_length)
+ assert idefics_instance.max_length == new_length
+
+
+# Test `clear_chat_history` method
+def test_clear_chat_history(idefics_instance):
+ idefics_instance.chat_history = ["User: Test", "Model: Response"]
+ idefics_instance.clear_chat_history()
+ assert not idefics_instance.chat_history
+
+
+# Exception Tests
+def test_run_with_empty_prompts(idefics_instance):
+ with pytest.raises(
+ Exception
+ ): # Replace Exception with the actual exception that may arise for an empty prompt.
+ idefics_instance.run([])
diff --git a/tests/models/nougat.py b/tests/models/nougat.py
new file mode 100644
index 00000000..e61a45af
--- /dev/null
+++ b/tests/models/nougat.py
@@ -0,0 +1,206 @@
+import os
+from unittest.mock import MagicMock, Mock, patch
+
+import pytest
+import torch
+from PIL import Image
+from transformers import NougatProcessor, VisionEncoderDecoderModel
+
+from swarms.models.nougat import Nougat
+
+
+@pytest.fixture
+def setup_nougat():
+ return Nougat()
+
+
+def test_nougat_default_initialization(setup_nougat):
+ assert setup_nougat.model_name_or_path == "facebook/nougat-base"
+ assert setup_nougat.min_length == 1
+ assert setup_nougat.max_new_tokens == 30
+
+
+def test_nougat_custom_initialization():
+ nougat = Nougat(model_name_or_path="custom_path", min_length=10, max_new_tokens=50)
+ assert nougat.model_name_or_path == "custom_path"
+ assert nougat.min_length == 10
+ assert nougat.max_new_tokens == 50
+
+
+def test_processor_initialization(setup_nougat):
+ assert isinstance(setup_nougat.processor, NougatProcessor)
+
+
+def test_model_initialization(setup_nougat):
+ assert isinstance(setup_nougat.model, VisionEncoderDecoderModel)
+
+
+@pytest.mark.parametrize(
+ "cuda_available, expected_device", [(True, "cuda"), (False, "cpu")]
+)
+def test_device_initialization(cuda_available, expected_device, monkeypatch):
+ monkeypatch.setattr(
+ torch, "cuda", Mock(is_available=Mock(return_value=cuda_available))
+ )
+ nougat = Nougat()
+ assert nougat.device == expected_device
+
+
+def test_get_image_valid_path(setup_nougat):
+ with patch("PIL.Image.open") as mock_open:
+ mock_open.return_value = Mock(spec=Image.Image)
+ assert setup_nougat.get_image("valid_path") is not None
+
+
+def test_get_image_invalid_path(setup_nougat):
+ with pytest.raises(FileNotFoundError):
+ setup_nougat.get_image("invalid_path")
+
+
+@pytest.mark.parametrize(
+ "min_len, max_tokens",
+ [
+ (1, 30),
+ (5, 40),
+ (10, 50),
+ ],
+)
+def test_model_call_with_diff_params(setup_nougat, min_len, max_tokens):
+ setup_nougat.min_length = min_len
+ setup_nougat.max_new_tokens = max_tokens
+
+ with patch("PIL.Image.open") as mock_open:
+ mock_open.return_value = Mock(spec=Image.Image)
+ # Here, mocking other required methods or adding more complex logic would be necessary.
+ result = setup_nougat("valid_path")
+ assert isinstance(result, str)
+
+
+def test_model_call_invalid_image_path(setup_nougat):
+ with pytest.raises(FileNotFoundError):
+ setup_nougat("invalid_path")
+
+
+def test_model_call_mocked_output(setup_nougat):
+ with patch("PIL.Image.open") as mock_open:
+ mock_open.return_value = Mock(spec=Image.Image)
+ mock_model = MagicMock()
+ mock_model.generate.return_value = "mocked_output"
+ setup_nougat.model = mock_model
+
+ result = setup_nougat("valid_path")
+ assert result == "mocked_output"
+
+
+@pytest.fixture
+def mock_processor_and_model():
+ """Mock the NougatProcessor and VisionEncoderDecoderModel to simulate their behavior."""
+ with patch(
+ "transformers.NougatProcessor.from_pretrained", return_value=Mock()
+ ), patch(
+ "transformers.VisionEncoderDecoderModel.from_pretrained", return_value=Mock()
+ ):
+ yield
+
+
+@pytest.mark.usefixtures("mock_processor_and_model")
+def test_nougat_with_sample_image_1(setup_nougat):
+ result = setup_nougat(
+ os.path.join(
+ "sample_images",
+ "https://plus.unsplash.com/premium_photo-1687149699194-0207c04bc6e8?auto=format&fit=crop&q=80&w=1378&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D",
+ )
+ )
+ assert isinstance(result, str)
+
+
+@pytest.mark.usefixtures("mock_processor_and_model")
+def test_nougat_with_sample_image_2(setup_nougat):
+ result = setup_nougat(os.path.join("sample_images", "test2.png"))
+ assert isinstance(result, str)
+
+
+@pytest.mark.usefixtures("mock_processor_and_model")
+def test_nougat_min_length_param(setup_nougat):
+ setup_nougat.min_length = 10
+ result = setup_nougat(
+ os.path.join(
+ "sample_images",
+ "https://plus.unsplash.com/premium_photo-1687149699194-0207c04bc6e8?auto=format&fit=crop&q=80&w=1378&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D",
+ )
+ )
+ assert isinstance(result, str)
+
+
+@pytest.mark.usefixtures("mock_processor_and_model")
+def test_nougat_max_new_tokens_param(setup_nougat):
+ setup_nougat.max_new_tokens = 50
+ result = setup_nougat(
+ os.path.join(
+ "sample_images",
+ "https://plus.unsplash.com/premium_photo-1687149699194-0207c04bc6e8?auto=format&fit=crop&q=80&w=1378&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D",
+ )
+ )
+ assert isinstance(result, str)
+
+
+@pytest.mark.usefixtures("mock_processor_and_model")
+def test_nougat_different_model_path(setup_nougat):
+ setup_nougat.model_name_or_path = "different/path"
+ result = setup_nougat(
+ os.path.join(
+ "sample_images",
+ "https://plus.unsplash.com/premium_photo-1687149699194-0207c04bc6e8?auto=format&fit=crop&q=80&w=1378&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D",
+ )
+ )
+ assert isinstance(result, str)
+
+
+@pytest.mark.usefixtures("mock_processor_and_model")
+def test_nougat_bad_image_path(setup_nougat):
+ with pytest.raises(Exception): # Adjust the exception type accordingly.
+ setup_nougat("bad_image_path.png")
+
+
+@pytest.mark.usefixtures("mock_processor_and_model")
+def test_nougat_image_large_size(setup_nougat):
+ result = setup_nougat(
+ os.path.join(
+ "sample_images",
+ "https://images.unsplash.com/photo-1697641039266-bfa00367f7cb?auto=format&fit=crop&q=60&w=400&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHx0b3BpYy1mZWVkfDJ8SnBnNktpZGwtSGt8fGVufDB8fHx8fA%3D%3D",
+ )
+ )
+ assert isinstance(result, str)
+
+
+@pytest.mark.usefixtures("mock_processor_and_model")
+def test_nougat_image_small_size(setup_nougat):
+ result = setup_nougat(
+ os.path.join(
+ "sample_images",
+ "https://images.unsplash.com/photo-1697638626987-aa865b769276?auto=format&fit=crop&q=60&w=400&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHx0b3BpYy1mZWVkfDd8SnBnNktpZGwtSGt8fGVufDB8fHx8fA%3D%3D",
+ )
+ )
+ assert isinstance(result, str)
+
+
+@pytest.mark.usefixtures("mock_processor_and_model")
+def test_nougat_image_varied_content(setup_nougat):
+ result = setup_nougat(
+ os.path.join(
+ "sample_images",
+ "https://images.unsplash.com/photo-1697469994783-b12bbd9c4cff?auto=format&fit=crop&q=60&w=400&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHx0b3BpYy1mZWVkfDE0fEpwZzZLaWRsLUhrfHxlbnwwfHx8fHw%3D",
+ )
+ )
+ assert isinstance(result, str)
+
+
+@pytest.mark.usefixtures("mock_processor_and_model")
+def test_nougat_image_with_metadata(setup_nougat):
+ result = setup_nougat(
+ os.path.join(
+ "sample_images",
+ "https://images.unsplash.com/photo-1697273300766-5bbaa53ec2f0?auto=format&fit=crop&q=60&w=400&ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHx0b3BpYy1mZWVkfDE5fEpwZzZLaWRsLUhrfHxlbnwwfHx8fHw%3D",
+ )
+ )
+ assert isinstance(result, str)
diff --git a/tests/models/vilt.py b/tests/models/vilt.py
new file mode 100644
index 00000000..b376f41b
--- /dev/null
+++ b/tests/models/vilt.py
@@ -0,0 +1,95 @@
+import pytest
+from unittest.mock import patch, Mock
+from swarms.models.vilt import Vilt, Image, requests
+
+
+# Fixture for Vilt instance
+@pytest.fixture
+def vilt_instance():
+ return Vilt()
+
+
+# 1. Test Initialization
+def test_vilt_initialization(vilt_instance):
+ assert isinstance(vilt_instance, Vilt)
+ assert vilt_instance.processor is not None
+ assert vilt_instance.model is not None
+
+
+# 2. Test Model Predictions
+@patch.object(requests, "get")
+@patch.object(Image, "open")
+def test_vilt_prediction(mock_image_open, mock_requests_get, vilt_instance):
+ mock_image = Mock()
+ mock_image_open.return_value = mock_image
+ mock_requests_get.return_value.raw = Mock()
+
+ # It's a mock response, so no real answer expected
+ with pytest.raises(Exception): # Ensure exception is more specific
+ vilt_instance(
+ "What is this image",
+ "https://images.unsplash.com/photo-1582538885592-e70a5d7ab3d3?ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&auto=format&fit=crop&w=1770&q=80",
+ )
+
+
+# 3. Test Exception Handling for network
+@patch.object(requests, "get", side_effect=requests.RequestException("Network error"))
+def test_vilt_network_exception(mock_requests_get, vilt_instance):
+ with pytest.raises(requests.RequestException):
+ vilt_instance(
+ "What is this image",
+ "https://images.unsplash.com/photo-1582538885592-e70a5d7ab3d3?ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&auto=format&fit=crop&w=1770&q=80",
+ )
+
+
+# Parameterized test cases for different inputs
+@pytest.mark.parametrize(
+ "text,image_url",
+ [
+ ("What is this?", "http://example.com/image1.jpg"),
+ ("Who is in the image?", "http://example.com/image2.jpg"),
+ ("Where was this picture taken?", "http://example.com/image3.jpg"),
+ # ... Add more scenarios
+ ],
+)
+def test_vilt_various_inputs(text, image_url, vilt_instance):
+ with pytest.raises(Exception): # Again, ensure exception is more specific
+ vilt_instance(text, image_url)
+
+
+# Test with invalid or empty text
+@pytest.mark.parametrize(
+ "text,image_url",
+ [
+ (
+ "",
+ "https://images.unsplash.com/photo-1582538885592-e70a5d7ab3d3?ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&auto=format&fit=crop&w=1770&q=80",
+ ),
+ (
+ None,
+ "https://images.unsplash.com/photo-1582538885592-e70a5d7ab3d3?ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&auto=format&fit=crop&w=1770&q=80",
+ ),
+ (
+ " ",
+ "https://images.unsplash.com/photo-1582538885592-e70a5d7ab3d3?ixlib=rb-4.0.3&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D&auto=format&fit=crop&w=1770&q=80",
+ ),
+ # ... Add more scenarios
+ ],
+)
+def test_vilt_invalid_text(text, image_url, vilt_instance):
+ with pytest.raises(ValueError):
+ vilt_instance(text, image_url)
+
+
+# Test with invalid or empty image_url
+@pytest.mark.parametrize(
+ "text,image_url",
+ [
+ ("What is this?", ""),
+ ("Who is in the image?", None),
+ ("Where was this picture taken?", " "),
+ ],
+)
+def test_vilt_invalid_image_url(text, image_url, vilt_instance):
+ with pytest.raises(ValueError):
+ vilt_instance(text, image_url)