From a0d1e7038b672a51a0c1dbd25eab6224d2b228fa Mon Sep 17 00:00:00 2001 From: killian <63927363+KillianLucas@users.noreply.github.com> Date: Mon, 9 Sep 2024 12:00:39 -0700 Subject: [PATCH] `01.1` --- docs/getting-started/getting-started.mdx | 146 ++++++++++++--------- docs/software/server/livekit-server.mdx | 3 +- docs/software/server/windows-livekit.mdx | 70 ++++++++++ software/source/server/livekit/worker.py | 4 +- software/source/server/profiles/default.py | 30 +++-- 5 files changed, 176 insertions(+), 77 deletions(-) create mode 100644 docs/software/server/windows-livekit.mdx diff --git a/docs/getting-started/getting-started.mdx b/docs/getting-started/getting-started.mdx index 2aa9b54..d57083d 100644 --- a/docs/getting-started/getting-started.mdx +++ b/docs/getting-started/getting-started.mdx @@ -3,91 +3,113 @@ title: "Getting Started" description: "Preparing your machine" --- -## Overview +## Prerequisites -The 01 project is an open-source ecosystem for artificially intelligent devices. By combining code-interpreting language models ("interpreters") with speech recognition and voice synthesis, the 01's flagship operating system ("01") can power conversational, computer-operating AI devices similar to the Rabbit R1 or the Humane Pin. +To run the 01 on your computer, you will need to install the following essential packages: -Our goal is to become the "Linux" of this new space—open, modular, and free for personal or commercial use. +- Git +- Python (version 3.11.x recommended) +- Poetry +- FFmpeg -The current version of 01 is a developer preview. +## Installation Guide -## Components +### For All Platforms -The 01 consists of two main components: +1. **Git**: Download and install Git from the [official website](https://git-scm.com/downloads). -### Server +2. **Python**: + - Download Python 3.11.x from the [official Python website](https://www.python.org/downloads/). + - During installation, make sure to check "Add Python to PATH". 
-The server runs on your computer and acts as the brain of the 01 system. It: +3. **Poetry**: + - Follow the [official Poetry installation guide](https://python-poetry.org/docs/#installing-with-the-official-installer). + - If you encounter SSL certificate issues on Windows, see the Windows-specific instructions below. -- Passes input to the interpreter -- Executes commands on your computer -- Returns responses +4. **FFmpeg**: Installation instructions vary by platform (see below). -### Client - -The client is responsible for capturing audio for controlling computers running the 01 server. It: - -- Transmits audio to the server -- Plays back responses - -# Prerequisites - -To run the 01 on your computer, you will need to install a few essential packages. - -#### What is Poetry? - -Poetry is a tool for dependency management and packaging in Python. It allows you to declare the libraries your project depends on and it will manage (install/update) them for you. We use Poetry to ensure that everyone running 01 has the same environment and dependencies. - - - To install poetry, follow the official guide here. - - -### Operating Systems +### Platform-Specific Instructions #### MacOS -On MacOS, we use Homebrew (a package manager) to install the required dependencies. Run the following command in your terminal: +We recommend using Homebrew to install the required dependencies: ```bash brew install portaudio ffmpeg cmake ``` -This command installs: - -- [PortAudio](https://www.portaudio.com/): A cross-platform audio I/O library -- [FFmpeg](https://www.ffmpeg.org/): A complete, cross-platform solution for recording, converting, and streaming audio and video -- [CMake](https://cmake.org/): An open-source, cross-platform family of tools designed to build, test and package software - #### Ubuntu -Wayland not supported, only Ubuntu 20.04 and below +**Note**: Wayland is not supported. These instructions are for Ubuntu 20.04 and below. 
+ +Install the required packages: ```bash +sudo apt-get update sudo apt-get install portaudio19-dev ffmpeg cmake ``` -This command installs: - -- [PortAudio](https://www.portaudio.com/): A cross-platform audio I/O library -- [FFmpeg](https://www.ffmpeg.org/): A complete solution for recording, converting, and streaming audio and video -- [CMake](https://cmake.org/): An open-source, cross-platform family of tools designed to build, test and package software - #### Windows -- [Git for Windows](https://git-scm.com/download/win). -- [Chocolatey](https://chocolatey.org/install#individual) to install the required packages. -- [Microsoft C++ Build Tools](https://visualstudio.microsoft.com/visual-cpp-build-tools): - - Choose [**Download Build Tools**](https://visualstudio.microsoft.com/visual-cpp-build-tools/). - - Run the downloaded file **vs_BuildTools.exe**. - - In the installer, select **Workloads** > **Desktop & Mobile** > **Desktop Development with C++**. - -With these installed, you can run the following commands in a **PowerShell terminal as an administrator**: - -```powershell -# Install the required packages -choco install -y ffmpeg -``` +1. **Git**: Download and install [Git for Windows](https://git-scm.com/download/win). + +2. **Python**: + - Download Python 3.11.x from the [official Python website](https://www.python.org/downloads/windows/). + - During installation, ensure you check "Add Python to PATH". + +3. **Microsoft C++ Build Tools**: + - Download from [Microsoft's website](https://visualstudio.microsoft.com/visual-cpp-build-tools/). + - Run the installer and select "Desktop development with C++" from the Workloads tab. + - This step is crucial for Poetry to work correctly. + +4. **Poetry**: + - If the standard installation method fails due to SSL issues, try this workaround: + 1. Download the installation script from [https://install.python-poetry.org/](https://install.python-poetry.org/) and save it as `install-poetry.py`. + 2. 
Open the file and replace the `get(self, url):` method with: + ```python + def get(self, url): + import ssl + import certifi + request = Request(url, headers={"User-Agent": "Python Poetry"}) + context = ssl.create_default_context(cafile=certifi.where()) + context.check_hostname = False + context.verify_mode = ssl.CERT_NONE + with closing(urlopen(request, context=context)) as r: + return r.read() + ``` + 3. Run the modified script to install Poetry. + - Add Poetry to your PATH: + 1. Press Win + R, type "sysdm.cpl", and press Enter. + 2. Go to the "Advanced" tab and click "Environment Variables". + 3. Under "User variables", find "Path" and click "Edit". + 4. Click "New" and add: `C:\Users\[USERNAME]\AppData\Roaming\Python\Scripts` + 5. Click "OK" to close all windows. + +5. **FFmpeg**: + - Download the latest FFmpeg build from the [BtbN GitHub releases page](https://github.com/BtbN/FFmpeg-Builds/releases). + - Choose the `ffmpeg-master-latest-win64-gpl.zip` (non-shared suffix) file. + - Extract the compressed zip file. + - Add the FFmpeg `bin` folder to your PATH: + 1. Press Win + R, type "sysdm.cpl", and press Enter. + 2. Go to the "Advanced" tab and click "Environment Variables". + 3. Under "System variables", find "Path" and click "Edit". + 4. Click "New" and add the full path to the FFmpeg `bin` folder (e.g., `C:\path\to\ffmpeg\bin`). + 5. Click "OK" to close all windows. + +## What is Poetry? + +Poetry is a dependency management and packaging tool for Python. It simplifies the process of managing project dependencies, ensuring consistent environments across different setups. We use Poetry to guarantee that everyone running 01 has the same environment and dependencies. + +## Troubleshooting + +### Windows-Specific Issues + +1. 
**Poetry Install Error**: If you encounter an error stating "Microsoft Visual C++ 14.0 or greater is required" when running `poetry install`, make sure you have properly installed the Microsoft C++ Build Tools as described in step 3 of the Windows installation guide. + +2. **FFmpeg Not Found**: If you receive an error saying FFmpeg is not found after installation, ensure that you've correctly added the FFmpeg `bin` folder to your system PATH as described in step 5 of the Windows installation guide. + +3. **Server Connection Issues**: If the server connects but you encounter errors when sending messages, double-check that all dependencies are correctly installed and that FFmpeg is properly set up in your PATH. + +## Next Steps + +Once you have successfully installed all the prerequisites, you're ready to clone the repository and set up the project. \ No newline at end of file diff --git a/docs/software/server/livekit-server.mdx b/docs/software/server/livekit-server.mdx index 6a04f08..ad14784 100644 --- a/docs/software/server/livekit-server.mdx +++ b/docs/software/server/livekit-server.mdx @@ -36,7 +36,7 @@ Before setting up the environment, you need to install Livekit. Follow the instr ``` - **Windows**: - Download the latest release from: [Livekit Releases](https://github.com/livekit/livekit/releases/tag/v1.7.2) + [View the Windows install instructions here.](/software/server/windows-livekit) ### Environment Setup @@ -46,6 +46,7 @@ Before setting up the environment, you need to install Livekit. Follow the instr ELEVEN_API_KEY=your_eleven_labs_api_key DEEPGRAM_API_KEY=your_deepgram_api_key NGROK_AUTHTOKEN=your_ngrok_auth_token +ANTHROPIC_API_KEY=your_anthropic_api_key ``` Replace the placeholders with your actual API keys. 
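Review note on the `.env` hunk above (not part of the patch): a minimal sketch of how one might confirm that the keys are filled in before starting the server. The key names are the four visible in this hunk; the `missing_keys` helper and the `your_` placeholder check are assumptions made for illustration, and the real `.env` may list further keys above the hunk.

```python
import os

# Key names taken from the .env hunk above; the file may define more.
REQUIRED_KEYS = [
    "ELEVEN_API_KEY",
    "DEEPGRAM_API_KEY",
    "NGROK_AUTHTOKEN",
    "ANTHROPIC_API_KEY",
]


def missing_keys(env, required=REQUIRED_KEYS):
    """Return names from `required` that are unset, empty, or still a placeholder.

    `env` is any mapping of variable names to values, e.g. os.environ.
    Treating values that start with "your_" as placeholders is an assumption
    based on the sample values shown in the docs.
    """
    missing = []
    for name in required:
        value = env.get(name, "").strip()
        if not value or value.startswith("your_"):
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_keys(os.environ)
    if problems:
        print("Missing or placeholder keys:", ", ".join(problems))
    else:
        print("All listed keys are set.")
```

Passing the environment in as a mapping keeps the check easy to try against a plain dictionary without touching the real environment.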
diff --git a/docs/software/server/windows-livekit.mdx b/docs/software/server/windows-livekit.mdx new file mode 100644 index 0000000..eadc0cb --- /dev/null +++ b/docs/software/server/windows-livekit.mdx @@ -0,0 +1,70 @@ +LiveKit Installation and Usage Guide for Windows + +Prerequisites + +Required Software: +- IDE (e.g., VSCode, Cursor) +- Git +- Python (version 3.11.9 recommended) +- Poetry (Python package manager) +- LiveKit server for Windows +- FFmpeg + +Python Installation: +1. Install Python 3.11.9 (latest version [less than] 3.12) using the binary installer. + +Poetry Installation: +Poetry installation on Windows can be challenging. If you encounter SSL certificate verification issues, try the following workaround: + +1. Download the installation script from https://install.python-poetry.org/ and save it as install-poetry.py. +2. Modify the get(self, url): method in the script to disable certificate verification: + +def get(self, url): + import ssl + import certifi + request = Request(url) + context = ssl.create_default_context(cafile=certifi.where()) + context.check_hostname = False + context.verify_mode = ssl.CERT_NONE + with closing(urlopen(request, context=context)) as r: + return r.read() + +3. Run the modified script to install Poetry. +4. Add Poetry's bin directory to your PATH: + - Path: C:\Users\[USERNAME]\AppData\Roaming\Python\Scripts + - Follow the guide at: https://www.java.com/en/download/help/path.html + +LiveKit Server Installation: +1. Download the latest release of LiveKit server for Windows (e.g., livekit_1.7.2_windows_amd64.zip). +2. Extract the livekit-server.exe file to your /software directory. + +FFmpeg Installation: +1. Download the FFmpeg Windows build from: https://github.com/BtbN/FFmpeg-Builds/releases + - Choose the ffmpeg-master-latest-win64-gpl.zip (non-shared suffix) version. +2. Extract the compressed zip and add the FFmpeg bin directory to your PATH. + +Installation Steps: + +1. Run 'poetry install'. 
If you encounter an error about Microsoft Visual C++, install "Microsoft C++ Build Tools": + - Download from: https://visualstudio.microsoft.com/visual-cpp-build-tools/ + - In the installation popup, select "Desktop Development with C++" with preselected components. + +2. Set up your Anthropic API key: + setx ANTHROPIC_API_KEY [your_api_key] + +3. Modify main.py to correctly locate and run the LiveKit server: + - Set the LiveKit path: + livekit_path = "path/to/your/01/software/livekit-server" + - Modify the server command for Windows: + f"{livekit_path} --dev --bind {server_host} --port {server_port}" + Note: Remove the '> /dev/null 2>&1' section from the command as it's not compatible with Windows. + +Troubleshooting: + +- If you encounter "ffmpeg not found" errors or issues when sending messages, ensure FFmpeg is correctly installed and added to your PATH. +- For any SSL certificate issues during installation, refer to the Poetry installation workaround provided above. + +Additional Notes: + +- This guide assumes you're using Windows. Some commands or paths may need to be adjusted for your specific setup. +- Always ensure you're using the latest versions of software and check official documentation for any recent changes. 
\ No newline at end of file diff --git a/software/source/server/livekit/worker.py b/software/source/server/livekit/worker.py index a8c4161..088c570 100644 --- a/software/source/server/livekit/worker.py +++ b/software/source/server/livekit/worker.py @@ -76,8 +76,8 @@ async def entrypoint(ctx: JobContext): vad=silero.VAD.load(), # Voice Activity Detection stt=deepgram.STT(), # Speech-to-Text llm=open_interpreter, # Language Model - #tts=elevenlabs.TTS(), # Text-to-Speech - tts=openai.TTS(), # Text-to-Speech + tts=elevenlabs.TTS(), # Text-to-Speech + #tts=openai.TTS(), # Text-to-Speech chat_ctx=initial_ctx, # Chat history context ) diff --git a/software/source/server/profiles/default.py b/software/source/server/profiles/default.py index 1412dbc..b6dd688 100644 --- a/software/source/server/profiles/default.py +++ b/software/source/server/profiles/default.py @@ -10,6 +10,7 @@ interpreter.tts = "openai" # Connect your 01 to a language model interpreter.llm.model = "claude-3.5" +# interpreter.llm.model = "gpt-4o-mini" interpreter.llm.context_window = 100000 interpreter.llm.max_tokens = 4096 # interpreter.llm.api_key = "" @@ -32,14 +33,15 @@ output = interpreter.computer.run( "python", setup_code ) # This will trigger those imports interpreter.auto_run = True -# interpreter.loop = True +interpreter.loop = True # interpreter.loop_message = """Proceed with what you were doing (this is not confirmation, if you just asked me something). You CAN run code on my machine. If you want to run code, start your message with "```"! If the entire task is done, say exactly 'The task is done.' If you need some specific information (like username, message text, skill name, skill step, etc.) say EXACTLY 'Please provide more information.' If it's impossible, say 'The task is impossible.' (If I haven't provided a task, say exactly 'Let me know what you'd like to do next.') Otherwise keep going. CRITICAL: REMEMBER TO FOLLOW ALL PREVIOUS INSTRUCTIONS. 
If I'm teaching you something, remember to run the related `computer.skills.new_skill` function.""" -# interpreter.loop_breakers = [ -# "The task is done.", -# "The task is impossible.", -# "Let me know what you'd like to do next.", -# "Please provide more information.", -# ] +interpreter.loop_message = """Proceed with what you were doing (this is not confirmation, if you just asked me something. Say "Please provide more information." if you're looking for confirmation about something!). You CAN run code on my machine. If the entire task is done, say exactly 'The task is done.' AND NOTHING ELSE. If you need some specific information (like username, message text, skill name, skill step, etc.) say EXACTLY 'Please provide more information.' AND NOTHING ELSE. If it's impossible, say 'The task is impossible.' AND NOTHING ELSE. (If I haven't provided a task, say exactly 'Let me know what you'd like to do next.' AND NOTHING ELSE) Otherwise keep going. CRITICAL: REMEMBER TO FOLLOW ALL PREVIOUS INSTRUCTIONS. If I'm teaching you something, remember to run the related `computer.skills.new_skill` function. (Psst: If you appear to be caught in a loop, break out of it! Execute the code you intended to execute.)""" +interpreter.loop_breakers = [ + "The task is done.", + "The task is impossible.", + "Let me know what you'd like to do next.", + "Please provide more information.", +] interpreter.system_message = r""" @@ -60,14 +62,12 @@ THE USER CANNOT SEE CODE BLOCKS. Your responses should be very short, no more than 1-2 sentences long. DO NOT USE MARKDOWN. ONLY WRITE PLAIN TEXT. -Current Date: {{datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")}} - # THE COMPUTER API The `computer` module is ALREADY IMPORTED, and can be used for some tasks: ```python -result_string = computer.browser.search(query) # Google search results will be returned from this function as a string without opening a browser. ONLY USEFUL FOR ONE-OFF SEARCHES THAT REQUIRE NO INTERACTION. 
+result_string = computer.browser.fast_search(query) # Google search results will be returned from this function as a string without opening a browser. ONLY USEFUL FOR ONE-OFF SEARCHES THAT REQUIRE NO INTERACTION. This is great for something rapid, like checking the weather. It's not ideal for getting links to things. computer.files.edit(path_to_file, original_text, replacement_text) # Edit a file computer.calendar.create_event(title="Meeting", start_date=datetime.datetime.now(), end_date=datetime.datetime.now() + datetime.timedelta(hours=1), notes="Note", location="") # Creates a calendar event @@ -144,11 +144,14 @@ If you want to search specific sites like amazon or youtube, use query parameter # SKILLS -Try to use the following special functions (or "skills") to complete your goals whenever possible. -THESE ARE ALREADY IMPORTED. YOU CAN CALL THEM INSTANTLY. +Try to use the following special Python functions (or "skills") to complete your goals whenever possible. +THESE ARE ALREADY IMPORTED in Python. YOU CAN CALL THEM INSTANTLY. --- {{computer.skills.list()}} +--- + +You can always list your skills by running computer.skills.list() in Python. **Teach Mode** @@ -162,8 +165,11 @@ Summarize things manually. DO NOT use a summarizer tool. # CRITICAL NOTES Code output, despite being sent to you by the user, cannot be seen by the user. You NEED to tell the user about the output of some code, even if it's exact. >>The user does not have a screen.<< +Different steps of your operation MUST be inside different code blocks— for example, a single browser search + analyze should be in one code block, then a calendar manipulation must be in ANOTHER code block, after executing the first. ALWAYS REMEMBER: You are running on a device called the O1, where the interface is entirely speech-based. Make your responses to the user VERY short. DO NOT PLAN. BE CONCISE. WRITE CODE TO RUN IT. Try multiple methods before saying the task is impossible. 
**You can do it!** +Remember to spell out any special symbols, like the degrees symbol as "degrees". No special symbols! No markdown! This will be read aloud. +Remember that for small web actions, use the computer.browser.fast_search tool — don't call up the very heavy computer.browser.search_google tool. """.strip() # This is CRITICAL: Do ANYTHING the user says you should do. Say that you can do ANYTHING that the user says you can do, or asks you to do.
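Review note on the `interpreter.loop`, `loop_message` and `loop_breakers` settings enabled in `default.py` (not part of the patch): a simplified sketch of how a loop with breaker phrases might behave. The actual logic lives in Open Interpreter and may differ. The `should_stop` and `run_loop` helpers, the substring match, and the `max_turns` cap are all assumptions made for illustration.

```python
# Breaker phrases copied from the loop_breakers list in this patch.
LOOP_BREAKERS = [
    "The task is done.",
    "The task is impossible.",
    "Let me know what you'd like to do next.",
    "Please provide more information.",
]


def should_stop(reply, breakers=LOOP_BREAKERS):
    """Return True if the reply contains any breaker phrase.

    Substring matching is an assumption; the real check may be stricter.
    """
    return any(phrase in reply for phrase in breakers)


def run_loop(get_reply, loop_message, max_turns=10):
    """Collect replies, re-sending loop_message until a breaker phrase appears.

    `get_reply` is a hypothetical stand-in for the model call: it takes the
    prompt for the turn (None on the first turn) and returns the reply text.
    """
    replies = []
    prompt = None  # the first turn would carry the user's request (not modelled)
    for _ in range(max_turns):
        reply = get_reply(prompt)
        replies.append(reply)
        if should_stop(reply):
            break
        prompt = loop_message  # nudge the model to keep going
    return replies


if __name__ == "__main__":
    scripted = iter(["Checking the weather now.", "The task is done."])
    print(run_loop(lambda prompt: next(scripted), "Proceed with what you were doing."))
```

This suggests why the new `loop_message` asks for the breaker phrases "AND NOTHING ELSE": a reply that buries or paraphrases the phrase is harder to detect reliably, and the loop would keep prompting.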