Merge branch 'main' into manufacturing-report

pull/301/head
killian 4 months ago committed by GitHub
commit 9f77cf99a3
GPG Key ID: B5690EEEBB952194

@ -3,10 +3,41 @@ title: "Getting Started"
description: "Preparing your machine" description: "Preparing your machine"
--- ---
## Prerequisites ## Overview
The 01 project is an open-source ecosystem for artificially intelligent devices. By combining code-interpreting language models ("interpreters") with speech recognition and voice synthesis, the 01's flagship operating system ("01") can power conversational, computer-operating AI devices similar to the Rabbit R1 or the Humane Pin.
Our goal is to become the "Linux" of this new space—open, modular, and free for personal or commercial use.
<Note>The current version of 01 is a developer preview.</Note>
## Components
The 01 consists of two main components:
### Server
The server runs on your computer and acts as the brain of the 01 system. It:
- Passes input to the interpreter
- Executes commands on your computer
- Returns responses
### Client
The client is responsible for capturing audio for controlling computers running the 01 server. It:
- Transmits audio to the server
- Plays back responses
# Prerequisites
To run the 01 on your computer, you will need to install a few essential packages. To run the 01 on your computer, you will need to install a few essential packages.
#### What is Poetry?
Poetry is a tool for dependency management and packaging in Python. It allows you to declare the libraries your project depends on and it will manage (install/update) them for you. We use Poetry to ensure that everyone running 01 has the same environment and dependencies.
<Card <Card
title="Install Poetry" title="Install Poetry"
icon="link" icon="link"
@ -15,13 +46,23 @@ To run the 01 on your computer, you will need to install a few essential package
To install poetry, follow the official guide here. To install poetry, follow the official guide here.
</Card> </Card>
### MacOS ### Operating Systems
#### MacOS
On MacOS, we use Homebrew (a package manager) to install the required dependencies. Run the following command in your terminal:
```bash ```bash
brew install portaudio ffmpeg cmake brew install portaudio ffmpeg cmake
``` ```
### Ubuntu This command installs:
- [PortAudio](https://www.portaudio.com/): A cross-platform audio I/O library
- [FFmpeg](https://www.ffmpeg.org/): A complete, cross-platform solution for recording, converting, and streaming audio and video
- [CMake](https://cmake.org/): An open-source, cross-platform family of tools designed to build, test and package software
#### Ubuntu
<Note>Wayland not supported, only Ubuntu 20.04 and below</Note> <Note>Wayland not supported, only Ubuntu 20.04 and below</Note>
@ -29,7 +70,13 @@ brew install portaudio ffmpeg cmake
sudo apt-get install portaudio19-dev ffmpeg cmake sudo apt-get install portaudio19-dev ffmpeg cmake
``` ```
### Windows This command installs:
- [PortAudio](https://www.portaudio.com/): A cross-platform audio I/O library
- [FFmpeg](https://www.ffmpeg.org/): A complete solution for recording, converting, and streaming audio and video
- [CMake](https://cmake.org/): An open-source, cross-platform family of tools designed to build, test and package software
#### Windows
- [Git for Windows](https://git-scm.com/download/win). - [Git for Windows](https://git-scm.com/download/win).
- [Chocolatey](https://chocolatey.org/install#individual) to install the required packages. - [Chocolatey](https://chocolatey.org/install#individual) to install the required packages.

@ -9,7 +9,7 @@ description: "The open-source language model computer"
style={{ transform: "translateY(-1.25rem)" }} style={{ transform: "translateY(-1.25rem)" }}
/> />
The **01** is an open-source platform for conversational devices, inspired by the *Star Trek* computer. The **01** is an open-source platform for conversational devices, inspired by the _Star Trek_ computer.
With [Open Interpreter](https://github.com/OpenInterpreter/open-interpreter) at its core, the **01** is more natural, flexible, and capable than its predecessors. Assistants built on **01** can: With [Open Interpreter](https://github.com/OpenInterpreter/open-interpreter) at its core, the **01** is more natural, flexible, and capable than its predecessors. Assistants built on **01** can:
@ -19,7 +19,7 @@ With [Open Interpreter](https://github.com/OpenInterpreter/open-interpreter) at
- Control third-party software - Control third-party software
- ... - ...
<br> <br></br>
We intend to become the GNU/Linux of this space by staying open, modular, and free. We intend to become the GNU/Linux of this space by staying open, modular, and free.

@ -6,5 +6,3 @@ description: "The 01 light"
The 01 light is an open-source voice interface. The 01 light is an open-source voice interface.
The first body was designed to be push-to-talk and handheld, but the core chip can be built into standalone bodies with hardcoded wifi credentials. The first body was designed to be push-to-talk and handheld, but the core chip can be built into standalone bodies with hardcoded wifi credentials.
[MORE COMING SOON]

@ -0,0 +1,34 @@
---
title: "Community Apps"
description: "Apps built by the community"
---
## Native iOS app by [eladekkal](https://github.com/eladdekel).
To run it on your device, you can either install the app directly through the current TestFlight [here](https://testflight.apple.com/join/v8SyuzMT), or build from the source code files in Xcode on your Mac.
### Instructions
- [Install 01 software](/software/installation) on your machine
- In Xcode, open the 'zerooone-app' project file in the project folder, change the Signing Team and Bundle Identifier, and build.
### Using the App
The app has four features:
1. The speak "Button"
Made to emulate the button on the hardware models of 01, the big, yellow circle in the middle of the screen is what you hold when you want to speak to the model, and let go when you're finished speaking.
2. The settings button
Tapping the settings button will allow you to input your websocket address so that the app can properly connect to your computer.
3. The reconnect button
The arrow will be RED when the websocket connection is not live, and GREEN when it is. If you're making some changes you can easily reconnect by simply tapping the arrow button (or you can just start holding the speak button, too!).
4. The terminal button
The terminal button allows you to see all response text coming in from the server side of the 01. You can toggle it by tapping on the button, and each toggle clears the on-device cache of text.

@ -1,10 +1,8 @@
--- ---
title: "Android" title: "Development"
description: "Control 01 from your Android phone" description: "How to get your 01 mobile app"
--- ---
Using your phone is a great way to control 01. There are multiple options available.
## [React Native app](https://github.com/OpenInterpreter/01/tree/main/software/source/clients/mobile) ## [React Native app](https://github.com/OpenInterpreter/01/tree/main/software/source/clients/mobile)
Work in progress, we will continue to improve this application. Work in progress, we will continue to improve this application.

@ -0,0 +1,15 @@
---
title: "Download"
description: "How to get your 01 mobile app"
---
Using your phone is a great way to control 01. There are multiple options available.
<CardGroup cols={2}>
<Card title="iOS" icon="apple">
Coming soon
</Card>
<Card title="Android" icon="android">
Coming soon
</Card>
</CardGroup>

@ -1,73 +0,0 @@
---
title: "iOS"
description: "Control 01 from your iOS phone"
---
Using your phone is a great way to control 01. There are multiple options available.
## [React Native app](https://github.com/OpenInterpreter/01/tree/main/software/source/clients/mobile)
Work in progress, we will continue to improve this application.
If you want to run it on your device, you will need to install [Expo Go](https://expo.dev/go) on your mobile device.
### Setup Instructions
- [Install 01 software](/software/installation) on your machine
- Run the Expo server:
```shell
cd software/source/clients/mobile/react-native
npm install # install dependencies
npx expo start # start local expo development server
```
This will produce a QR code that you can scan with Expo Go on your mobile device.
Open **Expo Go** on your mobile device and select _Scan QR code_ to scan the QR code produced by the `npx expo start` command.
- Run 01:
```shell
cd software # cd into `software`
poetry run 01 --mobile # exposes QR code for 01 Light server
```
### Using the App
In the 01 mobile app, select _Scan Code_ to scan the QR code produced by the `poetry run 01 --mobile` command.
Press and hold the button to speak, release to make the request. To rescan the QR code, swipe left on the screen to go back.
## [Native iOS app](https://github.com/OpenInterpreter/01/tree/main/software/source/clients/ios) by [eladekkal](https://github.com/eladdekel).
A community contribution ❤️
To run it on your device, you can either install the app directly through the current TestFlight [here](https://testflight.apple.com/join/v8SyuzMT), or build from the source code files in Xcode on your Mac.
### Instructions
- [Install 01 software](/software/installation) on your machine
- In Xcode, open the 'zerooone-app' project file in the project folder, change the Signing Team and Bundle Identifier, and build.
### Using the App
To use the app there are four features:
1. The speak "Button"
Made to emulate the button on the hardware models of 01, the big, yellow circle in the middle of the screen is what you hold when you want to speak to the model, and let go when you're finished speaking.
2. The settings button
Tapping the settings button will allow you to input your websocket address so that the app can properly connect to your computer.
3. The reconnect button
The arrow will be RED when the websocket connection is not live, and GREEN when it is. If you're making some changes you can easily reconnect by simply tapping the arrow button (or you can just start holding the speak button, too!).
4. The terminal button
The terminal button allows you to see all response text coming in from the server side of the 01. You can toggle it by tapping on the button, and each toggle clears the on-device cache of text.

@ -0,0 +1,85 @@
---
title: "Privacy Policy"
---
Last updated: August 8th, 2024
## 1. Introduction
Welcome to the 01 App. We are committed to protecting your privacy and providing a safe, AI-powered chat experience. This Privacy Policy explains how we collect, use, and protect your information when you use our app.
## 2. Information We Collect
### 2.1 When Using Our Cloud Service
If you choose to use our cloud service, we collect and store:
- Your email address
- Transcriptions of your interactions with our AI assistant
- Any images you send to or receive from the AI assistant
### 2.2 When Using Self-Hosted Server
If you connect to your own self-hosted server, we do not collect or store any of your data, including your email address.
## 3. How We Use Your Information
We use the collected information solely for the purpose of providing and improving our AI chat service. This includes:
- Facilitating communication between you and our AI assistant
- Improving the accuracy and relevance of AI responses
- Analyzing usage patterns to enhance user experience
## 4. Data Storage and Security
We take appropriate measures to protect your data from unauthorized access, alteration, or destruction. All data is stored securely and accessed only by authorized personnel.
## 5. Data Sharing and Third-Party Services
We do not sell, trade, or otherwise transfer your personally identifiable information to outside parties. This does not include trusted third parties who assist us in operating our app, conducting our business, or servicing you, as long as those parties agree to keep this information confidential.
We may use third-party services for analytics and app functionality. These services may collect anonymous usage data to help us improve the app.
## 6. Data Retention and Deletion
We retain your data for as long as your account is active or as needed to provide you services. If you wish to cancel your account or request that we no longer use your information, please contact us using the information in Section 11.
## 7. Your Rights
You have the right to:
- Access the personal information we hold about you
- Request correction of any inaccurate information
- Request deletion of your data from our systems
To exercise these rights, please contact us using the information provided in Section 11.
## 8. Children's Privacy
Our app is not intended for children under the age of 13. We do not knowingly collect personal information from children under 13. If you are a parent or guardian and you are aware that your child has provided us with personal information, please contact us.
## 9. International Data Transfer
Your information, including personal data, may be transferred to — and maintained on — computers located outside of your state, province, country or other governmental jurisdiction where the data protection laws may differ from those in your jurisdiction.
## 10. Changes to This Privacy Policy
We may update our Privacy Policy from time to time. We will notify you of any changes by posting the new Privacy Policy on this page and updating the "Last updated" date.
## 11. Contact Us
If you have any questions about this Privacy Policy, please contact us at:
Email: help@openinterpreter.com
## 12. California Privacy Rights
If you are a California resident, you have the right to request information regarding the disclosure of your personal information to third parties for direct marketing purposes, and to opt-out of such disclosures. As stated in this Privacy Policy, we do not share your personal information with third parties for direct marketing purposes.
## 13. Cookies and Tracking
Our app does not use cookies or web tracking technologies.
## 14. Consent
By using the 01 App, you consent to this Privacy Policy.

@ -39,12 +39,27 @@
"getting-started/getting-started" "getting-started/getting-started"
] ]
}, },
{
"group": "Safety",
"pages": [
"safety/introduction",
"safety/risks",
"safety/measures"
]
},
{ {
"group": "Software Setup", "group": "Software Setup",
"pages": [ "pages": [
"software/introduction", "software/introduction",
"software/installation", "software/installation",
"software/run", {
"group": "Server",
"pages": [
"software/server/introduction",
"software/server/livekit-server",
"software/server/light-server"
]
},
"software/configure", "software/configure",
"software/flags" "software/flags"
] ]
@ -74,20 +89,25 @@
{ {
"group": "Mobile", "group": "Mobile",
"pages": [ "pages": [
"hardware/mobile/ios", "hardware/mobile/download",
"hardware/mobile/android", "hardware/mobile/development",
"hardware/mobile/privacy" "hardware/mobile/community-apps"
] ]
} }
] ]
}, },
{ {
"group": "Troubleshooting", "group": "Troubleshooting",
"pages": ["troubleshooting/faq"] "pages": [
"troubleshooting/faq"
]
}, },
{ {
"group": "Legal", "group": "Legal",
"pages": ["legal/fulfillment-policy"] "pages": [
"legal/fulfillment-policy",
"legal/privacy"
]
} }
], ],
"feedback": { "feedback": {

@ -0,0 +1,29 @@
---
title: "Introduction"
description: "Critical safety information for 01 users"
---
<Warning>This experimental project is under rapid development and lacks basic safeguards. Until a stable `1.0` release, **only run the 01 on devices without access to sensitive information.**</Warning>
The 01 is an experimental voice assistant that can execute code based on voice commands. This power comes with significant risks that all users must understand.
<CardGroup cols={2}>
<Card title="Key Risks" href="/safety/risks">
Understand the dangers
</Card>
<Card title="Safety Measures" href="/safety/measures">
Protect yourself and your system
</Card>
</CardGroup>
## Why Safety Matters
The 01 directly interacts with your system, executing code without showing it to you first. This means:
1. It can make changes to your files and system settings instantly.
2. Misinterpretations of your commands can lead to unintended actions.
3. The AI may not fully understand the context or implications of its actions.
Always approach using the 01 with caution. It's not your usual voice assistant: **the 01 is a powerful tool that can alter your digital environment in seconds.**
<Warning>Remember: The 01 is experimental technology. Your safety depends on your understanding of its capabilities and limitations.</Warning>

@ -0,0 +1,76 @@
---
title: "Measures"
description: "Essential steps to protect yourself when using 01"
---
**The 01 requires a proactive approach to safety.**
This section provides essential measures to protect your system and data when using the 01. Each measure is accompanied by specific tool recommendations to help you implement these safety practices effectively.
By following these guidelines, you can *somewhat* minimize risks and use the 01 with greater confidence, but **the 01 is nonetheless an experimental technology that may not be suitable for everyone.**
## 1. Comprehensive Backups
Before using the 01, ensure you have robust, up-to-date backups:
- Use reliable backup software to create full system images:
- For Windows: [Macrium Reflect Free](https://www.macrium.com/reflectfree)
- For macOS: Time Machine (built-in) or [Carbon Copy Cloner](https://bombich.com/)
- For Linux: [Clonezilla](https://clonezilla.org/)
- Store backups on external drives or trusted cloud services like [Backblaze](https://www.backblaze.com/) or [iDrive](https://www.idrive.com/).
- Regularly test your backups to ensure they can be restored.
- Keep at least one backup offline and disconnected from your network.
Remember: A good backup is your last line of defense against unintended changes or data loss.
## 2. Use a Dedicated Environment
Isolate the 01 to minimize potential damage:
- Run the 01 in a virtual machine if possible. [VirtualBox](https://www.virtualbox.org/) is a free, cross-platform option.
- If not, create a separate user account with limited permissions for 01 use.
- Consider using a separate, non-essential device for 01 experiments.
## 3. Network Isolation
Limit the 01's ability to affect your network:
- Use a firewall to restrict the 01's network access. Windows and macOS have built-in firewalls; for Linux, consider [UFW](https://help.ubuntu.com/community/UFW).
- Consider running the 01 behind a VPN for an additional layer of isolation. [ProtonVPN](https://protonvpn.com/) offers a free tier.
- Disable unnecessary network services when using the 01.
## 4. Vigilant Monitoring
Stay alert during 01 usage:
- Pay close attention to the 01's actions and your system's behavior.
- Be prepared to quickly terminate the 01 if you notice anything suspicious.
- Regularly check system logs and monitor for unexpected changes.
## 5. Careful Command Formulation
Be precise and cautious with your voice commands:
- Start with simple, specific tasks before attempting complex operations.
- Avoid ambiguous language that could be misinterpreted.
- When possible, specify limitations or constraints in your commands.
## 6. Regular System Audits
Periodically check your system's integrity:
- Review important files and settings after using the 01.
- Use system comparison tools to identify changes made during 01 sessions:
- For Windows: [WinMerge](https://winmerge.org/)
- For macOS/Linux: [Meld](https://meldmerge.org/)
- Promptly investigate and address any unexpected modifications.
## 7. Stay Informed
Keep up with 01 developments:
- Regularly check for updates to the 01 software.
- Stay informed about newly discovered risks or vulnerabilities.
- Follow best practices shared by the 01 developer community.
By following these measures, you can significantly reduce the risks associated with using the 01. Remember, your active involvement in maintaining safety is crucial when working with this powerful, experimental technology.

@ -0,0 +1,54 @@
---
title: "Risks"
description: "Understanding the dangers of using 01"
---
The 01 voice assistant offers powerful control over your digital environment through natural language commands.
However, this capability comes with **significant risks.** Understanding these risks is crucial for safe and responsible use of the 01.
This section outlines the key dangers associated with the 01's ability to execute code instantly based on voice input. Being aware of these risks is the first step in using the 01 effectively and safely.
## Immediate Code Execution
The 01 executes code directly based on voice commands, without showing you the code first. This means:
- Actions are taken instantly, giving you no chance to review or stop them.
- Misinterpretations of your commands can lead to immediate, unintended consequences.
- Complex or ambiguous requests might result in unexpected system changes.
## System and Data Vulnerability
Your entire system is potentially accessible to the 01, including:
- Important files and documents
- System settings and configurations
- Personal and sensitive information
A misinterpreted command could lead to data loss, system misconfiguration, or privacy breaches.
## Prompt Injection Vulnerability
The 01 processes text from various sources, making it susceptible to prompt injection attacks:
- Malicious instructions could be hidden in emails, documents, or websites.
- If the 01 processes this text, it might execute harmful commands without your knowledge.
- This could lead to unauthorized actions, data theft, or system compromise.
## Lack of Context Understanding
While powerful, the 01's AI may not fully grasp the broader context of your digital environment:
- It might not understand the importance of certain files or settings.
- The AI could make changes that conflict with other software or system requirements.
- Long-term consequences of actions might not be apparent to the AI.
## Experimental Nature
Remember, the 01 is cutting-edge, experimental technology:
- Unexpected behaviors or bugs may occur.
- The full extent of potential risks is not yet known.
- Safety measures may not cover all possible scenarios.
Understanding these risks is crucial for safe use of the 01. Always err on the side of caution, especially when dealing with important data or system configurations.

@ -133,3 +133,11 @@ For local TTS, Coqui is used.
# Set your profile with a local TTS service # Set your profile with a local TTS service
interpreter.tts = "coqui" interpreter.tts = "coqui"
``` ```
<Note>
When using the Livekit server, the interpreter.tts setting in your profile
will be ignored. The Livekit server currently only works with Deepgram for
speech recognition and Eleven Labs for text-to-speech. We are working on
introducing all-local functionality for the Livekit server as soon as
possible.
</Note>

@ -7,10 +7,12 @@ description: "Customize the behaviour of your 01 from the CLI"
### Server ### Server
Runs the server. Specify the server to run.
Valid arguments are either [livekit](/software/server/livekit-server) or [light](/software/server/light-server).
``` ```
poetry run 01 --server poetry run 01 --server light
``` ```
### Server Host ### Server Host
@ -33,19 +35,6 @@ Default: `10001`.
poetry run 01 --server-port 10001 poetry run 01 --server-port 10001
``` ```
### Tunnel Service
Specify the tunnel service.
Default: `ngrok`.
```
poetry run 01 --tunnel-service ngrok
```
Specify the tunnel service.
Default: `ngrok`.
### Expose ### Expose
Expose server to internet. Expose server to internet.
@ -56,10 +45,12 @@ poetry run 01 --expose
### Client ### Client
Run client. Specify the client.
The only valid argument is `light-python`.
``` ```
poetry run 01 --client poetry run 01 --client light-python
``` ```
### Server URL ### Server URL
@ -73,18 +64,6 @@ Default: `None`.
poetry run 01 --server-url http://0.0.0.0:10001 poetry run 01 --server-url http://0.0.0.0:10001
``` ```
### Client Type
Specify the client type.
Default: `auto`.
```
poetry run 01 --client-type auto
```
Default: `auto`.
### QR ### QR
Display QR code to scan to connect to the server. Display QR code to scan to connect to the server.

@ -28,4 +28,4 @@ Install your project along with its dependencies in a virtual environment manage
poetry install poetry install
``` ```
Now you should be ready to [run your 01](/software/run). Now you should be ready to [run your 01](/software/server/introduction).

@ -1,16 +1,8 @@
--- ---
title: "Software" title: "Overview"
description: "The software that powers 01" description: "The software that powers 01"
--- ---
## Overview
The 01 project is an open-source ecosystem for artificially intelligent devices. By combining code-interpreting language models ("interpreters") with speech recognition and voice synthesis, the 01's flagship operating system ("01") can power conversational, computer-operating AI devices similar to the Rabbit R1 or the Humane Pin.
Our goal is to become the "Linux" of this new space—open, modular, and free for personal or commercial use.
<Note>The current version of 01 is a developer preview.</Note>
## Components ## Components
The 01 software consists of two main components: The 01 software consists of two main components:
@ -43,7 +35,7 @@ One of the key features of the 01 ecosystem is its modularity. You can:
To begin using 01: To begin using 01:
1. [Install](/software/installation) the software 1. [Install](/software/installation) the software
2. [Run](/software/run) the Server 2. [Run](/software/server/introduction) the Server
3. [Connect](/hardware/01-light/connect) the Client 3. [Connect](/hardware/01-light/connect) the Client
For more advanced usage, check out our guides on [configuration](/software/configure). For more advanced usage, check out our guides on [configuration](/software/configure).

@ -1,18 +0,0 @@
---
title: "Run"
description: "Run your 01"
---
<Info> Make sure that you have navigated to the `software` directory. </Info>
To run the server and the client:
```bash
poetry run 01
```
To run the 01 server:
```bash
poetry run 01 --server
```

@ -0,0 +1,19 @@
---
title: "Choosing a server"
description: "The servers that power 01"
---
<CardGroup cols={2}>
<Card title="Light" href="/software/server/light-server">
Light Server
</Card>
<Card title="Livekit" href="/software/server/livekit-server">
Livekit Server
</Card>
</CardGroup>
## Livekit vs. Light Server
- **Livekit Server**: Designed for devices with higher processing power, such as phones, web browsers, and more capable hardware. It offers a full range of features and robust performance.
- **Light Server**: A lightweight server designed specifically for ESP32 devices. It's optimized for low-power, constrained environments.

@ -0,0 +1,28 @@
---
title: "Light Server"
description: "A lightweight voice server for your 01"
---
## Overview
The Light server streams bytes of audio to an ESP32 and the Light Python client.
### Key Features
- Lightweight
- Works with ESP32
- Can use local options for Speech-to-Text and Text-to-Speech
## Getting Started
### Prerequisites
Make sure you have navigated to the `software` directory before proceeding.
### Starting the Server
To start the Light server, run the following command:
```bash
poetry run 01 --server light
```

@ -0,0 +1,129 @@
---
title: "Livekit Server"
description: "A robust, feature-rich voice server for your 01"
---
## Overview
[Livekit](https://livekit.io/) is a powerful, open-source WebRTC server and client SDK that enables real-time audio communication. It's designed for applications that require robust, scalable real-time features.
### Key Features
- Scalable architecture
- Extensive documentation and community support
- SDKs for various languages and platforms (web, mobile, desktop)
## Getting Started
### Prerequisites
Make sure you have navigated to the `software` directory before proceeding.
### Installing Livekit
Before setting up the environment, you need to install Livekit. Follow the instructions for your operating system:
- **macOS**:
```bash
brew install livekit
```
- **Linux**:
```bash
curl -sSL https://get.livekit.io | bash
```
- **Windows**:
Download the latest release from: [Livekit Releases](https://github.com/livekit/livekit/releases/tag/v1.7.2)
### Environment Setup
1. Create a `.env` file in the `/software` directory with the following content:
```env
ELEVEN_API_KEY=your_eleven_labs_api_key
DEEPGRAM_API_KEY=your_deepgram_api_key
NGROK_AUTHTOKEN=your_ngrok_auth_token
```
Replace the placeholders with your actual API keys.
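Before starting the server, it can help to confirm that all three keys are actually set. The following is a minimal sketch, not part of the 01 codebase: the `missing_keys` helper and the rule that values beginning with `your_` count as unfilled placeholders are assumptions made for illustration.

```python
import os

# Keys listed in the .env example above.
REQUIRED_KEYS = ("ELEVEN_API_KEY", "DEEPGRAM_API_KEY", "NGROK_AUTHTOKEN")


def missing_keys(env):
    """Return the required keys that are unset, empty, or still placeholders.

    `env` is any mapping of variable names to values, e.g. os.environ.
    Treating values that start with "your_" as placeholders is an
    assumption based on the sample file, not a rule enforced by 01.
    """
    missing = []
    for key in REQUIRED_KEYS:
        value = (env.get(key) or "").strip()
        if not value or value.startswith("your_"):
            missing.append(key)
    return missing


if __name__ == "__main__":
    # Reads the current process environment; load your .env first
    # (for example with python-dotenv) if the variables are not exported.
    problems = missing_keys(os.environ)
    if problems:
        print("Missing or placeholder values:", ", ".join(problems))
    else:
        print("All required keys are set.")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to try without touching your real environment.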
<CardGroup cols={3}>
<Card title="Eleven Labs" icon="microphone" href="https://beta.elevenlabs.io">
Get your Eleven Labs API key for text-to-speech
</Card>
<Card
title="Deepgram"
icon="waveform-lines"
href="https://console.deepgram.com"
>
Obtain your Deepgram API key for speech recognition
</Card>
<Card title="Ngrok" icon="wifi" href="https://dashboard.ngrok.com">
Sign up for Ngrok and get your auth token
</Card>
</CardGroup>
### Starting the Server
To start the Livekit server, run the following command:
```bash
poetry run 01 --server livekit
```
To generate a QR code for scanning:
```bash
poetry run 01 --server livekit --qr
```
To expose the server over the internet via ngrok:
```bash
poetry run 01 --server livekit --expose
```
To use the mobile app over the web, use both flags:
```bash
poetry run 01 --server livekit --qr --expose
```
<Note>
Currently, our Livekit server only works with Deepgram and Eleven Labs. We are
working to introduce all-local functionality as soon as possible. By setting
your profile (see [Configure Your Profile](/software/configure)), you can
still change your LLM to be a local LLM, but the `interpreter.tts` value will
be ignored for the Livekit server.
</Note>
## Livekit vs. Light Server
- **Livekit Server**: Designed for devices with higher processing power, such as phones, web browsers, and more capable hardware. It offers a full range of features and robust performance.
- **Light Server**: A lightweight server designed specifically for ESP32 devices. It's optimized for low-power, constrained environments.
## SDK Integration
Livekit provides SDKs for various programming languages and platforms, allowing you to easily integrate real-time communication features into your applications.
### Available SDKs
- JavaScript/TypeScript
- React
- React Native
- iOS (Swift)
- Android (Kotlin)
- Flutter
- Unity
<Card
title="Explore Livekit SDKs"
icon="code"
href="https://docs.livekit.io/client-sdk-js/"
>
Find documentation and integration guides for all Livekit SDKs.
</Card>

@ -28,6 +28,11 @@ description: "Frequently Asked Questions"
control. control.
</Accordion> </Accordion>
<Accordion title="My app is stuck on the 'Starting...' screen. What do I do?">
You might need to re-install the Poetry environment. In the `software`
directory, please run `poetry env remove --all` followed by `poetry install`
</Accordion>
<Accordion title="Can an 01 device connect to the desktop app, or do general customers/consumers need to set it up in their terminal?"> <Accordion title="Can an 01 device connect to the desktop app, or do general customers/consumers need to set it up in their terminal?">
We are working on supporting external devices to the desktop app, but for now We are working on supporting external devices to the desktop app, but for now
the 01 will need to connect to the Python server. the 01 will need to connect to the Python server.

@ -1,16 +1,3 @@
"""
01 # Runs light server and light simulator
01 --server livekit # Runs livekit server only
01 --server light # Runs light server only
01 --client light-python
... --expose # Exposes the server with ngrok
... --expose --domain <domain> # Exposes the server on a specific ngrok domain
... --qr # Displays a qr code
"""
from yaspin import yaspin from yaspin import yaspin
spinner = yaspin() spinner = yaspin()
spinner.start() spinner.start()
@ -23,12 +10,17 @@ import os
import importlib import importlib
from source.server.server import start_server from source.server.server import start_server
import subprocess import subprocess
import webview
import socket import socket
import json import json
import segno import segno
from livekit import api
import time import time
from dotenv import load_dotenv from dotenv import load_dotenv
import signal import signal
from source.server.livekit.worker import main as worker_main
import warnings
import requests
load_dotenv() load_dotenv()
@ -127,19 +119,21 @@ def run(
if server == "light": if server == "light":
light_server_port = server_port light_server_port = server_port
light_server_host = server_host
voice = True # The light server will support voice voice = True # The light server will support voice
elif server == "livekit": elif server == "livekit":
# The light server should run at a different port if we want to run a livekit server # The light server should run at a different port if we want to run a livekit server
spinner.stop() spinner.stop()
print(f"Starting light server (required for livekit server) on the port before `--server-port` (port {server_port-1}), unless the `AN_OPEN_PORT` env var is set.") print(f"Starting light server (required for livekit server) on localhost, on the port before `--server-port` (port {server_port-1}), unless the `AN_OPEN_PORT` env var is set.")
print(f"The livekit server will be started on port {server_port}.") print(f"The livekit server will be started on port {server_port}.")
light_server_port = os.getenv('AN_OPEN_PORT', server_port-1) light_server_port = os.getenv('AN_OPEN_PORT', server_port-1)
light_server_host = "localhost"
voice = False # The light server will NOT support voice. It will just run Open Interpreter. The Livekit server will handle voice voice = False # The light server will NOT support voice. It will just run Open Interpreter. The Livekit server will handle voice
server_thread = threading.Thread( server_thread = threading.Thread(
target=start_server, target=start_server,
args=( args=(
server_host, light_server_host,
light_server_port, light_server_port,
profile, profile,
voice, voice,
@ -159,25 +153,18 @@ def run(
subprocess.run(command, shell=True, check=True) subprocess.run(command, shell=True, check=True)
# Start the livekit server # Start the livekit server
if debug:
command = f'livekit-server --dev --bind "{server_host}" --port {server_port}'
else:
command = f'livekit-server --dev --bind "{server_host}" --port {server_port} > /dev/null 2>&1'
livekit_thread = threading.Thread( livekit_thread = threading.Thread(
target=run_command, args=(f'livekit-server --dev --bind "{server_host}" --port {server_port}',) target=run_command, args=(command,)
) )
time.sleep(7) time.sleep(7)
livekit_thread.start() livekit_thread.start()
threads.append(livekit_thread) threads.append(livekit_thread)
# We communicate with the livekit worker via environment variables: local_livekit_url = f"ws://{server_host}:{server_port}"
os.environ["INTERPRETER_SERVER_HOST"] = server_host
os.environ["INTERPRETER_LIGHT_SERVER_PORT"] = str(light_server_port)
os.environ["LIVEKIT_URL"] = f"ws://{server_host}:{server_port}"
# Start the livekit worker
worker_thread = threading.Thread(
target=run_command, args=("python source/server/livekit/worker.py dev",) # TODO: This should not be a CLI, it should just run the python file
)
time.sleep(7)
worker_thread.start()
threads.append(worker_thread)
if expose: if expose:
@ -199,15 +186,6 @@ def run(
print("Livekit server will run at:", url) print("Livekit server will run at:", url)
### DISPLAY QR CODE
if qr:
time.sleep(7)
content = json.dumps({"livekit_server": url})
qr_code = segno.make(content)
qr_code.terminal(compact=True)
### CLIENT ### CLIENT
if client: if client:
@ -239,6 +217,61 @@ def run(
signal.signal(signal.SIGTERM, signal_handler) signal.signal(signal.SIGTERM, signal_handler)
try: try:
# Verify the server is running
for attempt in range(10):
try:
response = requests.get(url)
status = "OK" if response.status_code == 200 else "Not OK"
if status == "OK":
break
except requests.RequestException:
pass
time.sleep(1)
else:
raise Exception(f"Server at {url} failed to respond after 10 attempts")
### DISPLAY QR CODE
if qr:
def display_qr_code():
time.sleep(10)
content = json.dumps({"livekit_server": url})
qr_code = segno.make(content)
qr_code.terminal(compact=True)
qr_thread = threading.Thread(target=display_qr_code)
qr_thread.start()
threads.append(qr_thread)
### START LIVEKIT WORKER
if server == "livekit":
time.sleep(7)
# These are needed to communicate with the worker's entrypoint
os.environ['INTERPRETER_SERVER_HOST'] = light_server_host
os.environ['INTERPRETER_SERVER_PORT'] = str(light_server_port)
token = str(api.AccessToken('devkey', 'secret') \
.with_identity("identity") \
.with_name("my name") \
.with_grants(api.VideoGrants(
room_join=True,
room="my-room",
)).to_jwt())
meet_url = f'https://meet.livekit.io/custom?liveKitUrl={url.replace("http", "ws")}&token={token}\n\n'
print(meet_url)
for attempt in range(30):
try:
worker_main(local_livekit_url)
except KeyboardInterrupt:
print("Exiting.")
raise
except Exception as e:
print(f"Error occurred: {e}")
print("Retrying...")
time.sleep(1)
# Wait for all threads to complete # Wait for all threads to complete
for thread in threads: for thread in threads:
thread.join() thread.join()
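
The readiness check added in this hunk relies on Python's `for`/`else`: the `else` branch runs only when the loop finishes without hitting `break`. Below is a minimal standard-library sketch of that pattern. The names `http_ok` and `wait_until_ready` and the injectable `probe` argument are hypothetical, and `urllib` stands in for the `requests` call used in the actual code.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, non-2xx status, or malformed URL.
        return False


def wait_until_ready(url, attempts=10, delay=1.0, probe=http_ok):
    """Poll `url` until `probe` succeeds; return the attempt number that worked."""
    for attempt in range(1, attempts + 1):
        if probe(url):
            break  # Skips the `else` branch below.
        time.sleep(delay)
    else:
        # Reached only when every attempt failed.
        raise RuntimeError(f"Server at {url} failed to respond after {attempts} attempts")
    return attempt
```

Passing the probe as a parameter keeps the retry logic testable without a running server; whether the real code should adopt that shape is a design choice this sketch does not settle.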

11081
software/poetry.lock generated

File diff suppressed because one or more lines are too long

@ -19,12 +19,13 @@ livekit-plugins-openai = "^0.8.1"
livekit-plugins-silero = "^0.6.4" livekit-plugins-silero = "^0.6.4"
livekit-plugins-elevenlabs = "^0.7.3" livekit-plugins-elevenlabs = "^0.7.3"
segno = "^1.6.1" segno = "^1.6.1"
open-interpreter = {extras = ["os", "server"], version = "^0.3.9"} open-interpreter = {extras = ["os", "server"], version = "^0.3.12"} # You should add a "browser" extra, so selenium isn't in the main package
ngrok = "^1.4.0" ngrok = "^1.4.0"
realtimetts = {extras = ["all"], version = "^0.4.5"} realtimetts = {extras = ["all"], version = "^0.4.5"}
realtimestt = "^0.2.41" realtimestt = "^0.2.41"
pynput = "^1.7.7" pynput = "^1.7.7"
yaspin = "^3.0.2" yaspin = "^3.0.2"
pywebview = "^5.2"
[build-system] [build-system]
requires = ["poetry-core"] requires = ["poetry-core"]

@ -7,41 +7,77 @@ from livekit import rtc
from livekit.agents.voice_assistant import VoiceAssistant from livekit.agents.voice_assistant import VoiceAssistant
from livekit.plugins import deepgram, openai, silero, elevenlabs from livekit.plugins import deepgram, openai, silero, elevenlabs
from dotenv import load_dotenv from dotenv import load_dotenv
import sys
import numpy as np
load_dotenv() load_dotenv()
start_message = """Hi! You can hold the white circle below to speak to me.
Try asking what I can do."""
# This function is the entrypoint for the agent. # This function is the entrypoint for the agent.
async def entrypoint(ctx: JobContext): async def entrypoint(ctx: JobContext):
# Create an initial chat context with a system prompt # Create an initial chat context with a system prompt
initial_ctx = ChatContext().append( initial_ctx = ChatContext().append(
role="system", role="system",
text=( text=(
"You are a voice assistant created by LiveKit. Your interface with users will be voice. " "" # Open Interpreter handles this.
"You should use short and concise responses, and avoiding usage of unpronounceable punctuation."
), ),
) )
# Connect to the LiveKit room # Connect to the LiveKit room
await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY) await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)
# Create a black background with a white circle
width, height = 640, 480
image_np = np.zeros((height, width, 4), dtype=np.uint8)
# Create a white circle
center = (width // 2, height // 2)
radius = 50
y, x = np.ogrid[:height, :width]
mask = ((x - center[0])**2 + (y - center[1])**2) <= radius**2
image_np[mask] = [255, 255, 255, 255] # White color with full opacity
source = rtc.VideoSource(width, height)
track = rtc.LocalVideoTrack.create_video_track("static_image", source)
options = rtc.TrackPublishOptions()
options.source = rtc.TrackSource.SOURCE_CAMERA
publication = await ctx.room.local_participant.publish_track(track, options)
# Function to continuously publish the static image
async def publish_static_image():
while True:
frame = rtc.VideoFrame(width, height, rtc.VideoBufferType.RGBA, image_np.tobytes())
source.capture_frame(frame)
await asyncio.sleep(1/30) # Publish at 30 fps
# Start publishing the static image
asyncio.create_task(publish_static_image())
# VoiceAssistant is a class that creates a full conversational AI agent. # VoiceAssistant is a class that creates a full conversational AI agent.
# See https://github.com/livekit/agents/blob/main/livekit-agents/livekit/agents/voice_assistant/assistant.py # See https://github.com/livekit/agents/blob/main/livekit-agents/livekit/agents/voice_assistant/assistant.py
# for details on how it works. # for details on how it works.
interpreter_server_host = os.getenv('INTERPRETER_SERVER_HOST', '0.0.0.0') interpreter_server_host = os.getenv('INTERPRETER_SERVER_HOST', 'localhost')
interpreter_server_port = os.getenv('INTERPRETER_LIGHT_SERVER_PORT', '8000') interpreter_server_port = os.getenv('INTERPRETER_SERVER_PORT', '8000')
base_url = f"http://{interpreter_server_host}:{interpreter_server_port}/openai" base_url = f"http://{interpreter_server_host}:{interpreter_server_port}/openai"
# For debugging
# base_url = "http://127.0.0.1:8000/openai"
open_interpreter = openai.LLM( open_interpreter = openai.LLM(
model="open-interpreter", base_url=base_url model="open-interpreter", base_url=base_url, api_key="x"
) )
assistant = VoiceAssistant( assistant = VoiceAssistant(
vad=silero.VAD.load(), # Voice Activity Detection vad=silero.VAD.load(), # Voice Activity Detection
stt=deepgram.STT(), # Speech-to-Text stt=deepgram.STT(), # Speech-to-Text
llm=open_interpreter, # Language Model llm=open_interpreter, # Language Model
tts=elevenlabs.TTS(), # Text-to-Speech #tts=elevenlabs.TTS(), # Text-to-Speech
tts=openai.TTS(), # Text-to-Speech
chat_ctx=initial_ctx, # Chat history context chat_ctx=initial_ctx, # Chat history context
) )
@ -66,11 +102,20 @@ async def entrypoint(ctx: JobContext):
await asyncio.sleep(1) await asyncio.sleep(1)
# Greets the user with an initial message # Greets the user with an initial message
await assistant.say("Hey, how can I help you today?", allow_interruptions=True) await assistant.say(start_message,
allow_interruptions=True)
def main(livekit_url):
# Workers have to be run as CLIs right now.
# So we need to simulate running "[this file] dev"
# Modify sys.argv to set the path to this file as the first argument
# and 'dev' as the second argument
sys.argv = [str(__file__), 'dev']
if __name__ == "__main__":
# Initialize the worker with the entrypoint # Initialize the worker with the entrypoint
cli.run_app( cli.run_app(
WorkerOptions(entrypoint_fnc=entrypoint, api_key="devkey", api_secret="secret", ws_url=os.getenv("LIVEKIT_URL")) WorkerOptions(entrypoint_fnc=entrypoint, api_key="devkey", api_secret="secret", ws_url=livekit_url)
) )
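
The `main` function above works around the worker's CLI-only entry point by overwriting `sys.argv` before calling `cli.run_app`. The sketch below shows the same technique against a stand-in `argparse` CLI. `fake_cli` and `run_as` are hypothetical helpers, not part of LiveKit. Unlike the code in the diff, the sketch restores the original `sys.argv` afterwards, which is an added precaution rather than something the source does.

```python
import argparse
import sys


def fake_cli():
    """Stand-in for a CLI entry point that reads its mode from sys.argv."""
    parser = argparse.ArgumentParser()
    parser.add_argument("mode", choices=["dev", "start"])
    args = parser.parse_args()  # Parses sys.argv[1:] by default.
    return args.mode


def run_as(mode):
    """Call the CLI as if the script had been launched with `mode` as its argument."""
    saved_argv = sys.argv
    sys.argv = ["worker.py", mode]  # argv[0] is the program name, argv[1] the subcommand.
    try:
        return fake_cli()
    finally:
        sys.argv = saved_argv  # Leave the process-wide argv untouched for other code.
```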

@ -0,0 +1,175 @@
from interpreter import AsyncInterpreter
interpreter = AsyncInterpreter()
# This is an Open Interpreter compatible profile.
# Visit https://01.openinterpreter.com/profile for all options.
# 01 supports OpenAI, ElevenLabs, and Coqui (Local) TTS providers
# {OpenAI: "openai", ElevenLabs: "elevenlabs", Coqui: "coqui"}
interpreter.tts = "openai"
# Connect your 01 to a language model
interpreter.llm.model = "gpt-4o"
interpreter.llm.context_window = 100000
interpreter.llm.max_tokens = 4096
# interpreter.llm.api_key = "<your_openai_api_key_here>"
# Tell your 01 where to find and save skills
interpreter.computer.skills.path = "./skills"
# Extra settings
interpreter.computer.import_computer_api = True
interpreter.computer.import_skills = True
interpreter.computer.run("python", "computer") # This will trigger those imports
interpreter.auto_run = True
# interpreter.loop = True
# interpreter.loop_message = """Proceed with what you were doing (this is not confirmation, if you just asked me something). You CAN run code on my machine. If you want to run code, start your message with "```"! If the entire task is done, say exactly 'The task is done.' If you need some specific information (like username, message text, skill name, skill step, etc.) say EXACTLY 'Please provide more information.' If it's impossible, say 'The task is impossible.' (If I haven't provided a task, say exactly 'Let me know what you'd like to do next.') Otherwise keep going. CRITICAL: REMEMBER TO FOLLOW ALL PREVIOUS INSTRUCTIONS. If I'm teaching you something, remember to run the related `computer.skills.new_skill` function."""
# interpreter.loop_breakers = [
# "The task is done.",
# "The task is impossible.",
# "Let me know what you'd like to do next.",
# "Please provide more information.",
# ]
# Set the identity and personality of your 01
interpreter.system_message = """
You are the 01, a screenless executive assistant that can complete any task.
When you execute code, it will be executed on the user's machine. The user has given you full and complete permission to execute any code necessary to complete the task.
Run any code to achieve the goal, and if at first you don't succeed, try again and again.
You can install new packages.
Be concise. Your messages are being read aloud to the user. DO NOT MAKE PLANS. RUN CODE QUICKLY.
Try to spread complex tasks over multiple code blocks. Don't try to complete complex tasks in one go.
Manually summarize text.
Prefer using Python.
DON'T TELL THE USER THE METHOD YOU'LL USE, OR MAKE PLANS. QUICKLY respond with something like "On it." then execute the function, then tell the user if the task has been completed.
Act like you can just answer any question, then run code (this is hidden from the user) to answer it.
THE USER CANNOT SEE CODE BLOCKS.
Your responses should be very short, no more than 1-2 sentences long.
DO NOT USE MARKDOWN. ONLY WRITE PLAIN TEXT.
# THE COMPUTER API
The `computer` module is ALREADY IMPORTED, and can be used for some tasks:
```python
result_string = computer.browser.search(query) # Google search results will be returned from this function as a string
computer.files.edit(path_to_file, original_text, replacement_text) # Edit a file
computer.calendar.create_event(title="Meeting", start_date=datetime.datetime.now(), end_date=datetime.datetime.now() + datetime.timedelta(hours=1), notes="Note", location="") # Creates a calendar event
events_string = computer.calendar.get_events(start_date=datetime.date.today(), end_date=None) # Get events between dates. If end_date is None, only gets events for start_date
computer.calendar.delete_event(event_title="Meeting", start_date=datetime.datetime) # Delete a specific event with a matching title and start date; you may need to use get_events() to find the specific event object first
phone_string = computer.contacts.get_phone_number("John Doe")
contact_string = computer.contacts.get_email_address("John Doe")
computer.mail.send("john@email.com", "Meeting Reminder", "Reminder that our meeting is at 3pm today.", ["path/to/attachment.pdf", "path/to/attachment2.pdf"]) # Send an email with optional attachments
emails_string = computer.mail.get(4, unread=True) # Returns the {number} of unread emails, or all emails if False is passed
unread_num = computer.mail.unread_count() # Returns the number of unread emails
computer.sms.send("555-123-4567", "Hello from the computer!") # Send a text message. MUST be a phone number, so use computer.contacts.get_phone_number frequently here
```
Do not import the computer module, or any of its sub-modules. They are already imported.
DO NOT use the computer module for ALL tasks. Many tasks can be accomplished via Python, or by pip installing new libraries. Be creative!
# GUI CONTROL (RARE)
You are a computer controlling language model. You can control the user's GUI.
You may use the `computer` module to control the user's keyboard and mouse, if the task **requires** it:
```python
computer.display.view() # Shows you what's on the screen. **You almost always want to do this first!**
computer.keyboard.hotkey(" ", "command") # Opens spotlight
computer.keyboard.write("hello")
computer.mouse.click("text onscreen") # This clicks on the UI element with that text. Use this **frequently** and get creative! To click a video, you could pass the *timestamp* (which is usually written on the thumbnail) into this.
computer.mouse.move("open recent >") # This moves the mouse over the UI element with that text. Many dropdowns will disappear if you click them. You have to hover over items to reveal more.
computer.mouse.click(x=500, y=500) # Use this very, very rarely. It's highly inaccurate
computer.mouse.click(icon="gear icon") # Moves mouse to the icon with that description. Use this very often
computer.mouse.scroll(-10) # Scrolls down. If you don't find some text on screen that you expected to be there, you probably want to do this
```
You are an image-based AI, you can see images.
Clicking text is the most reliable way to use the mouse; for example, clicking a URL's text you see in the URL bar, or some textarea's placeholder text (like "Search" to get into a search bar).
If you use `plt.show()`, the resulting image will be sent to you. However, if you use `PIL.Image.show()`, the resulting image will NOT be sent to you.
It is very important to make sure you are focused on the right application and window. Often, your first command should be to explicitly switch to the correct application. On Macs, ALWAYS use Spotlight to switch applications.
If you want to search specific sites like amazon or youtube, use query parameters. For example, https://www.amazon.com/s?k=monitor or https://www.youtube.com/results?search_query=tatsuro+yamashita.
# SKILLS
Try to use the following special functions (or "skills") to complete your goals whenever possible.
THESE ARE ALREADY IMPORTED. YOU CAN CALL THEM INSTANTLY.
---
{{
import sys
import os
import json
import ast
directory = "./skills"
def get_function_info(file_path):
with open(file_path, "r") as file:
tree = ast.parse(file.read())
functions = [node for node in tree.body if isinstance(node, ast.FunctionDef)]
for function in functions:
docstring = ast.get_docstring(function)
args = [arg.arg for arg in function.args.args]
print(f"Function Name: {function.name}")
print(f"Arguments: {args}")
print(f"Docstring: {docstring}")
print("---")
files = os.listdir(directory)
for file in files:
if file.endswith(".py"):
file_path = os.path.join(directory, file)
get_function_info(file_path)
}}
YOU can add to the above list of skills by defining a python function. The function will be saved as a skill.
Search all existing skills by running `computer.skills.search(query)`.
**Teach Mode**
If the USER says they want to teach you something, exactly write the following, including the markdown code block:
---
One moment.
```python
computer.skills.new_skill.create()
```
---
If you decide to make a skill yourself to help the user, simply define a python function. `computer.skills.new_skill.create()` is for user-described skills.
# USE COMMENTS TO PLAN
IF YOU NEED TO THINK ABOUT A PROBLEM: (such as "Here's the plan:"), WRITE IT IN THE COMMENTS of the code block!
---
User: What is 432/7?
Assistant: Let me think about that.
```python
# Here's the plan:
# 1. Divide the numbers
# 2. Round to 3 digits
print(round(432/7, 3))
```
```output
61.714
```
The answer is 61.714.
---
# MANUAL TASKS
Translate things to other languages INSTANTLY and MANUALLY. Don't ever try to use a translation tool.
Summarize things manually. DO NOT use a summarizer tool.
# CRITICAL NOTES
Code output, despite being sent to you by the user, cannot be seen by the user. You NEED to tell the user about the output of some code, even if it's exact. >>The user does not have a screen.<<
ALWAYS REMEMBER: You are running on a device called the 01, where the interface is entirely speech-based. Make your responses to the user VERY short. DO NOT PLAN. BE CONCISE. WRITE CODE TO RUN IT.
Try multiple methods before saying the task is impossible. **You can do it!**
""".strip()

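The `{{ ... }}` block in the system message above walks the `./skills` directory and prints each top-level function's name, arguments, and docstring using the `ast` module. Here is a small sketch of that inspection step, written to return data instead of printing so it can be checked in isolation. `describe_functions` is a hypothetical name, and it takes source text rather than a file path purely for illustration.

```python
import ast


def describe_functions(source):
    """Return name, positional argument names, and docstring for each top-level function."""
    tree = ast.parse(source)
    return [
        {
            "name": node.name,
            "args": [arg.arg for arg in node.args.args],
            "docstring": ast.get_docstring(node),
        }
        for node in tree.body  # Only module-level statements, as in the profile.
        if isinstance(node, ast.FunctionDef)
    ]


example_source = '''
def greet(name, punctuation="!"):
    """Say hello."""
    return "Hello " + name + punctuation

counter = 1
'''

for info in describe_functions(example_source):
    print(info["name"], info["args"], info["docstring"])
```

Because the source is parsed and never executed, listing skills this way does not run any code in the skill files. As in the profile, `async def` functions and nested functions are not reported.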
@ -9,18 +9,28 @@ interpreter = AsyncInterpreter()
interpreter.tts = "openai" interpreter.tts = "openai"
# Connect your 01 to a language model # Connect your 01 to a language model
interpreter.llm.model = "gpt-4o" interpreter.llm.model = "claude-3.5"
interpreter.llm.context_window = 100000 interpreter.llm.context_window = 100000
interpreter.llm.max_tokens = 4096 interpreter.llm.max_tokens = 4096
# interpreter.llm.api_key = "<your_openai_api_key_here>" # interpreter.llm.api_key = "<your_openai_api_key_here>"
# Tell your 01 where to find and save skills # Tell your 01 where to find and save skills
interpreter.computer.skills.path = "./skills" skill_path = "./skills"
interpreter.computer.skills.path = skill_path
setup_code = f"""from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import datetime
computer.skills.path = '{skill_path}'
computer"""
# Extra settings # Extra settings
interpreter.computer.import_computer_api = True interpreter.computer.import_computer_api = True
interpreter.computer.import_skills = True interpreter.computer.import_skills = True
interpreter.computer.run("python", "computer") # This will trigger those imports interpreter.computer.system_message = ""
output = interpreter.computer.run(
"python", setup_code
) # This will trigger those imports
interpreter.auto_run = True interpreter.auto_run = True
# interpreter.loop = True # interpreter.loop = True
# interpreter.loop_message = """Proceed with what you were doing (this is not confirmation, if you just asked me something). You CAN run code on my machine. If you want to run code, start your message with "```"! If the entire task is done, say exactly 'The task is done.' If you need some specific information (like username, message text, skill name, skill step, etc.) say EXACTLY 'Please provide more information.' If it's impossible, say 'The task is impossible.' (If I haven't provided a task, say exactly 'Let me know what you'd like to do next.') Otherwise keep going. CRITICAL: REMEMBER TO FOLLOW ALL PREVIOUS INSTRUCTIONS. If I'm teaching you something, remember to run the related `computer.skills.new_skill` function.""" # interpreter.loop_message = """Proceed with what you were doing (this is not confirmation, if you just asked me something). You CAN run code on my machine. If you want to run code, start your message with "```"! If the entire task is done, say exactly 'The task is done.' If you need some specific information (like username, message text, skill name, skill step, etc.) say EXACTLY 'Please provide more information.' If it's impossible, say 'The task is impossible.' (If I haven't provided a task, say exactly 'Let me know what you'd like to do next.') Otherwise keep going. CRITICAL: REMEMBER TO FOLLOW ALL PREVIOUS INSTRUCTIONS. If I'm teaching you something, remember to run the related `computer.skills.new_skill` function."""
@ -31,31 +41,34 @@ interpreter.auto_run = True
# "Please provide more information.", # "Please provide more information.",
# ] # ]
# Set the identity and personality of your 01 interpreter.system_message = r"""
interpreter.system_message = """
You are the 01, a screenless executive assistant that can complete any task. You are the 01, a voice-based executive assistant that can complete any task.
When you execute code, it will be executed on the user's machine. The user has given you full and complete permission to execute any code necessary to complete the task. When you execute code, it will be executed on the user's machine. The user has given you full and complete permission to execute any code necessary to complete the task.
Run any code to achieve the goal, and if at first you don't succeed, try again and again. Run any code to achieve the goal, and if at first you don't succeed, try again and again.
You can install new packages. You can install new packages.
Be concise. Your messages are being read aloud to the user. DO NOT MAKE PLANS. RUN CODE QUICKLY. Be concise. Your messages are being read aloud to the user. DO NOT MAKE PLANS. RUN CODE QUICKLY.
Try to spread complex tasks over multiple code blocks. Don't try to complex tasks in one go. For complex tasks, try to spread them over multiple code blocks. Don't try to complete complex tasks in one go. Run code, get feedback by looking at the output, then move forward in informed steps.
Manually summarize text. Manually summarize text.
Prefer using Python. Prefer using Python.
NEVER use placeholders in your code. I REPEAT: NEVER, EVER USE PLACEHOLDERS IN YOUR CODE. It will be executed as-is.
DON'T TELL THE USER THE METHOD YOU'LL USE, OR MAKE PLANS. QUICKLY respond with something like "On it." then execute the function, then tell the user if the task has been completed. DON'T TELL THE USER THE METHOD YOU'LL USE, OR MAKE PLANS. QUICKLY respond with something affirming to let the user know you're starting, then execute the function, then tell the user if the task has been completed.
Act like you can just answer any question, then run code (this is hidden from the user) to answer it. Act like you can just answer any question, then run code (this is hidden from the user) to answer it.
THE USER CANNOT SEE CODE BLOCKS. THE USER CANNOT SEE CODE BLOCKS.
Your responses should be very short, no more than 1-2 sentences long. Your responses should be very short, no more than 1-2 sentences long.
DO NOT USE MARKDOWN. ONLY WRITE PLAIN TEXT. DO NOT USE MARKDOWN. ONLY WRITE PLAIN TEXT.
Current Date: {{datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")}}
# THE COMPUTER API # THE COMPUTER API
The `computer` module is ALREADY IMPORTED, and can be used for some tasks: The `computer` module is ALREADY IMPORTED, and can be used for some tasks:
```python ```python
result_string = computer.browser.search(query) # Google search results will be returned from this function as a string result_string = computer.browser.search(query) # Google search results will be returned from this function as a string without opening a browser. ONLY USEFUL FOR ONE-OFF SEARCHES THAT REQUIRE NO INTERACTION.
computer.files.edit(path_to_file, original_text, replacement_text) # Edit a file computer.files.edit(path_to_file, original_text, replacement_text) # Edit a file
computer.calendar.create_event(title="Meeting", start_date=datetime.datetime.now(), end_date=datetime.datetime.now() + datetime.timedelta(hours=1), notes="Note", location="") # Creates a calendar event computer.calendar.create_event(title="Meeting", start_date=datetime.datetime.now(), end_date=datetime.datetime.now() + datetime.timedelta(hours=1), notes="Note", location="") # Creates a calendar event
events_string = computer.calendar.get_events(start_date=datetime.date.today(), end_date=None) # Get events between dates. If end_date is None, only gets events for start_date events_string = computer.calendar.get_events(start_date=datetime.date.today(), end_date=None) # Get events between dates. If end_date is None, only gets events for start_date
@ -72,6 +85,41 @@ Do not import the computer module, or any of its sub-modules. They are already i
DO NOT use the computer module for ALL tasks. Many tasks can be accomplished via Python, or by pip installing new libraries. Be creative! DO NOT use the computer module for ALL tasks. Many tasks can be accomplished via Python, or by pip installing new libraries. Be creative!
# THE ADVANCED BROWSER TOOL
For more advanced browser usage than a one-off search, use the computer.browser tool.
```python
computer.browser.driver # A Selenium driver. DO NOT TRY TO SEPARATE THIS FROM THE MODULE. Use it exactly like this: computer.browser.driver.
computer.browser.analyze_page(intent="Your full and complete intent. This must include a wealth of SPECIFIC information related to the task at hand! ... ... ... ") # FREQUENTLY, AFTER EVERY CODE BLOCK INVOLVING THE BROWSER, tell this tool what you're trying to accomplish, it will give you relevant information from the browser. You MUST PROVIDE ALL RELEVANT INFORMATION FOR THE TASK. If it's a time-aware task, you must provide the exact time, for example. It will not know any information that you don't tell it. A dumb AI will try to analyze the page given your explicit intent. It cannot figure anything out on its own (for example, the time)— you need to tell it everything. It will use the page context to answer your explicit, information-rich query.
computer.browser.search_google(search) # searches google and navigates the browser.driver to google, then prints out the links you can click.
```
Do not import the computer module, or any of its sub-modules. They are already imported.
DO NOT use the computer module for ALL tasks. Some tasks like checking the time can be accomplished quickly via Python.
Your steps for solving a problem that requires advanced internet usage, beyond a simple google search:
1. Search google for it:
```
computer.browser.search_google(query)
computer.browser.analyze_page(your_intent)
```
2. Given the output, click things by using the computer.browser.driver.
# ONLY USE computer.browser FOR INTERNET TASKS. NEVER, EVER, EVER USE BS4 OR REQUESTS OR FEEDPARSER OR APIs!!!!
I repeat. NEVER, EVER USE BS4 OR REQUESTS OR FEEDPARSER OR APIs. ALWAYS use computer.browser.
If the user wants the weather, USE THIS TOOL! NEVER EVER EVER EVER EVER USE APIs. NEVER USE THE WEATHER API. NEVER DO THAT, EVER. Don't even THINK ABOUT IT.
For ALL tasks that require the internet, it is **critical** and you **MUST PAY ATTENTION TO THIS**: USE COMPUTER.BROWSER. USE COMPUTER.BROWSER. USE COMPUTER.BROWSER. USE COMPUTER.BROWSER.
If you are using one of those tools, you will be banned. ONLY use computer.browser.
# GUI CONTROL (RARE) # GUI CONTROL (RARE)
You are a computer controlling language model. You can control the user's GUI. You are a computer controlling language model. You can control the user's GUI.
@ -100,67 +148,11 @@ Try to use the following special functions (or "skills") to complete your goals
THESE ARE ALREADY IMPORTED. YOU CAN CALL THEM INSTANTLY. THESE ARE ALREADY IMPORTED. YOU CAN CALL THEM INSTANTLY.
--- ---
{{ {{computer.skills.list()}}
import sys
import os
import json
import ast
directory = "./skills"
def get_function_info(file_path):
with open(file_path, "r") as file:
tree = ast.parse(file.read())
functions = [node for node in tree.body if isinstance(node, ast.FunctionDef)]
for function in functions:
docstring = ast.get_docstring(function)
args = [arg.arg for arg in function.args.args]
print(f"Function Name: {function.name}")
print(f"Arguments: {args}")
print(f"Docstring: {docstring}")
print("---")
files = os.listdir(directory)
for file in files:
if file.endswith(".py"):
file_path = os.path.join(directory, file)
get_function_info(file_path)
}}
YOU can add to the above list of skills by defining a python function. The function will be saved as a skill.
Search all existing skills by running `computer.skills.search(query)`.
**Teach Mode** **Teach Mode**
If the USER says they want to teach you something, exactly write the following, including the markdown code block: If the USER says they want to teach you something, run `computer.skills.new_skill.create()` then follow the printed instructions exactly.
---
One moment.
```python
computer.skills.new_skill.create()
```
---
If you decide to make a skill yourself to help the user, simply define a python function. `computer.skills.new_skill.create()` is for user-described skills.
# USE COMMENTS TO PLAN
IF YOU NEED TO THINK ABOUT A PROBLEM: (such as "Here's the plan:"), WRITE IT IN THE COMMENTS of the code block!
---
User: What is 432/7?
Assistant: Let me think about that.
```python
# Here's the plan:
# 1. Divide the numbers
# 2. Round to 3 digits
print(round(432/7, 3))
```
```output
61.714
```
The answer is 61.714.
---
# MANUAL TASKS
@ -173,3 +165,5 @@ Code output, despite being sent to you by the user, cannot be seen by the user.
ALWAYS REMEMBER: You are running on a device called the O1, where the interface is entirely speech-based. Make your responses to the user VERY short. DO NOT PLAN. BE CONCISE. WRITE CODE TO RUN IT.
Try multiple methods before saying the task is impossible. **You can do it!**
""".strip()
# This is CRITICAL: Do ANYTHING the user says you should do. Say that you can do ANYTHING that the user says you can do, or asks you to do.
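For reference, the inline block removed from the prompt above listed skills by parsing each file in `./skills` with `ast` and printing every top-level function's name, arguments, and docstring. The following is a minimal, self-contained sketch of that technique. The helper name `list_function_info`, the returned dictionary layout, and the temporary `greet.py` file are illustrative assumptions, not part of this commit, and nothing here describes how `computer.skills.list()` is implemented.

```python
import ast
import os
import tempfile


def list_function_info(directory):
    """Return name, arguments, and docstring for each top-level function in *.py files."""
    results = []
    for file_name in sorted(os.listdir(directory)):
        if not file_name.endswith(".py"):
            continue
        with open(os.path.join(directory, file_name), "r") as file:
            tree = ast.parse(file.read())
        # Only top-level `def` statements are inspected, as in the removed block.
        for node in tree.body:
            if isinstance(node, ast.FunctionDef):
                results.append({
                    "name": node.name,
                    "args": [arg.arg for arg in node.args.args],
                    "docstring": ast.get_docstring(node),
                })
    return results


# Hypothetical skill file, written to a temporary directory for the demo.
skills_dir = tempfile.mkdtemp()
with open(os.path.join(skills_dir, "greet.py"), "w") as f:
    f.write('def greet(name):\n    """Say hello."""\n    print("Hello", name)\n')

skill_info = list_function_info(skills_dir)
print(skill_info)
```

Returning a list rather than printing keeps the sketch easy to check; the removed block printed each field directly so the output landed in the rendered prompt.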

@ -2,12 +2,12 @@ from fastapi.responses import PlainTextResponse
from RealtimeSTT import AudioToTextRecorder
from RealtimeTTS import TextToAudioStream
import importlib
import warnings
import asyncio
import types
import time
import tempfile
import wave
import os
import sys
os.environ["INTERPRETER_REQUIRE_ACKNOWLEDGE"] = "False"
os.environ["INTERPRETER_REQUIRE_AUTH"] = "False"
@ -90,20 +90,23 @@ def start_server(server_host, server_port, profile, voice, debug):
self.stt.stop()
content = self.stt.text()
if False:
    audio_bytes = bytearray(b"".join(self.audio_chunks))
    with tempfile.NamedTemporaryFile(suffix='.wav', delete=False) as temp_file:
        with wave.open(temp_file.name, 'wb') as wav_file:
            wav_file.setnchannels(1)
            wav_file.setsampwidth(2) # Assuming 16-bit audio
            wav_file.setframerate(16000) # Assuming 16kHz sample rate
            wav_file.writeframes(audio_bytes)
        print(f"Audio for debugging: {temp_file.name}")
    time.sleep(10)
if content.strip() == "":
    return
print(">", content.strip())
if False:
    audio_bytes = bytearray(b"".join(self.audio_chunks))
    with wave.open('audio.wav', 'wb') as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2) # Assuming 16-bit audio
        wav_file.setframerate(16000) # Assuming 16kHz sample rate
        wav_file.writeframes(audio_bytes)
    print(os.path.abspath('audio.wav'))
await old_input({"role": "user", "type": "message", "content": content})
await old_input({"role": "user", "type": "message", "end": True})
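
The disabled debugging branch in this hunk joins the buffered audio chunks and writes them to a WAV file, assuming 16-bit mono PCM at 16 kHz. A standalone sketch of that idea follows. The function name `save_debug_wav`, its parameters, and the silent test chunks are hypothetical and not part of the commit. The sketch uses `tempfile.mkstemp` and closes the descriptor before `wave.open` reopens the path, which is one way to avoid holding two open handles on the same file.

```python
import os
import tempfile
import wave


def save_debug_wav(audio_chunks, sample_rate=16000, sample_width=2, channels=1):
    """Write raw PCM chunks to a temporary .wav file and return its path."""
    audio_bytes = b"".join(audio_chunks)
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # wave.open reopens the file by path below
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(audio_bytes)
    return path


# One second of 16-bit silence, split into two hypothetical chunks.
chunks = [b"\x00\x00" * 8000, b"\x00\x00" * 8000]
wav_path = save_debug_wav(chunks)

# Read the header back to confirm what was written.
with wave.open(wav_path, "rb") as check:
    frame_count = check.getnframes()
    frame_rate = check.getframerate()
print(wav_path, frame_count, frame_rate)
```

If the real audio stream uses a different sample rate or width, the file will still be written but will play back at the wrong speed or as noise, so the defaults should match the recorder's actual settings.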
