5.0 KiB

Raw Blame History

vLLM Integration Guide

!!! info "Overview" vLLM is a high-performance and easy-to-use library for LLM inference and serving. This guide explains how to integrate vLLM with Swarms for efficient, production-grade language model deployment.

Installation

!!! note "Prerequisites" Before you begin, make sure you have Python 3.8+ installed on your system.

=== "pip" bash pip install -U vllm swarms

=== "poetry" bash poetry add vllm swarms

Basic Usage

Here's a simple example of how to use vLLM with Swarms:

from swarms.utils.vllm_wrapper import VLLMWrapper

# Initialize the vLLM wrapper
vllm = VLLMWrapper(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    system_prompt="You are a helpful assistant.",
    temperature=0.7,
    max_tokens=4000
)

# Run inference
response = vllm.run("What is the capital of France?")
print(response)

VLLMWrapper Class

!!! abstract "Class Overview" The VLLMWrapper class provides a convenient interface for working with vLLM models.

Key Parameters

Parameter	Type	Description	Default
`model_name`	str	Name of the model to use	"meta-llama/Llama-2-7b-chat-hf"
`system_prompt`	str	System prompt to use	None
`stream`	bool	Whether to stream the output	False
`temperature`	float	Sampling temperature	0.5
`max_tokens`	int	Maximum number of tokens to generate	4000

Example with Custom Parameters

vllm = VLLMWrapper(
    model_name="meta-llama/Llama-2-13b-chat-hf",
    system_prompt="You are an expert in artificial intelligence.",
    temperature=0.8,
    max_tokens=2000
)

Integration with Agents

You can easily integrate vLLM with Swarms agents for more complex workflows:

from swarms import Agent
from swarms.utils.vllm_wrapper import VLLMWrapper

# Initialize vLLM
vllm = VLLMWrapper(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    system_prompt="You are a helpful assistant."
)

# Create an agent with vLLM
agent = Agent(
    agent_name="Research-Agent",
    agent_description="Expert in conducting research and analysis",
    system_prompt="""You are an expert research agent. Your tasks include:
    1. Analyzing complex topics
    2. Providing detailed summaries
    3. Making data-driven recommendations""",
    llm=vllm,
    max_loops=1
)

# Run the agent
response = agent.run("Research the impact of AI on healthcare")

Advanced Features

Batch Processing

!!! tip "Performance Optimization" Use batch processing for efficient handling of multiple tasks simultaneously.

tasks = [
    "What is machine learning?",
    "Explain neural networks",
    "Describe deep learning"
]

results = vllm.batched_run(tasks, batch_size=3)

Error Handling

!!! warning "Error Management" Always implement proper error handling in production environments.

from loguru import logger

try:
    response = vllm.run("Complex task")
except Exception as error:
    logger.error(f"Error occurred: {error}")

Best Practices

!!! success "Recommended Practices" === "Model Selection" - Choose appropriate model sizes based on your requirements - Consider the trade-off between model size and inference speed

=== "System Resources"
    - Ensure sufficient GPU memory for your chosen model
    - Monitor resource usage during batch processing

=== "Prompt Engineering"
    - Use clear and specific system prompts
    - Structure user prompts for optimal results

=== "Error Handling"
    - Implement proper error handling and logging
    - Set up monitoring for production deployments

=== "Performance"
    - Use batch processing for multiple tasks
    - Adjust max_tokens based on your use case
    - Fine-tune temperature for optimal output quality

Example: Multi-Agent System

Here's an example of creating a multi-agent system using vLLM:

from swarms import Agent, ConcurrentWorkflow
from swarms.utils.vllm_wrapper import VLLMWrapper

# Initialize vLLM
vllm = VLLMWrapper(
    model_name="meta-llama/Llama-2-7b-chat-hf",
    system_prompt="You are a helpful assistant."
)

# Create specialized agents
research_agent = Agent(
    agent_name="Research-Agent",
    agent_description="Expert in research",
    system_prompt="You are a research expert.",
    llm=vllm
)

analysis_agent = Agent(
    agent_name="Analysis-Agent",
    agent_description="Expert in analysis",
    system_prompt="You are an analysis expert.",
    llm=vllm
)

# Create a workflow
agents = [research_agent, analysis_agent]
workflow = ConcurrentWorkflow(
    name="Research-Analysis-Workflow",
    description="Comprehensive research and analysis workflow",
    agents=agents
)

# Run the workflow
result = workflow.run("Analyze the impact of renewable energy")

5.0 KiB Raw Blame History