DeepSearch - A Hard-Working Search Engine 🔍


DeepSearch trains a small language model to develop effective search behaviors instead of memorizing static data. It interacts with multiple synthetic search engines, each with unique retrieval mechanisms, to refine queries and persist in searching until it finds exact answers. The project focuses on reinforcement learning, preventing overfitting, and optimizing for efficiency in real-world search applications.
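
The search behavior described above can be pictured as a simple loop. The sketch below is purely conceptual and uses hypothetical names (nothing here is taken from this repository): the model issues a query, reads the results, and either commits to an exact answer or refines the query and tries again within a step budget.

    # Conceptual sketch only; `search` and `generate_step` are hypothetical callables.
    def agentic_search(question, generate_step, search, max_steps=5):
        """generate_step(question, results) -> (answer_or_None, refined_query)."""
        query = question
        for _ in range(max_steps):
            results = search(query)                  # hit a (synthetic) search engine
            answer, query = generate_step(question, results)
            if answer is not None:                   # model committed to an exact answer
                return answer
        return None                                  # step budget exhausted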

Quick Demo 🚀

Run the interactive web interface to see DeepSearch in action:

python app.py

This will launch a Gradio interface where you can interact with the model and test different search behaviors.
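
If you just want to see how such an interface is typically wired up, here is a minimal sketch; it is not the actual app.py (which also includes a Tavily web-search tab), and answer_question is a hypothetical placeholder for the model-plus-retrieval loop.

    import gradio as gr

    def answer_question(message, history):
        # Hypothetical placeholder: call the model + search loop here.
        return f"(answer for: {message})"

    gr.ChatInterface(answer_question, title="DeepSearch Demo").launch()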

You can also evaluate model performance:

# Using the evaluation scripts
python scripts/eval_lora.py --lora_path "/path/to/lora"
python scripts/eval_base.py
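
The internals of eval_lora.py are not shown here, but evaluating a LoRA checkpoint generally amounts to loading the base model, attaching the adapter, and generating answers on the eval set. A minimal sketch with transformers + peft follows; the backbone name is a placeholder and not necessarily the one this project uses.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base_name = "Qwen/Qwen2.5-3B-Instruct"           # placeholder backbone
    tokenizer = AutoTokenizer.from_pretrained(base_name)
    base = AutoModelForCausalLM.from_pretrained(base_name, torch_dtype=torch.bfloat16)
    model = PeftModel.from_pretrained(base, "/path/to/lora")  # same path as --lora_path

    inputs = tokenizer("Who wrote the novel that inspired Blade Runner?", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(out[0], skip_special_tokens=True))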

Setup 🛠️

  1. Clone the repository with submodules:

    git clone --recurse-submodules [repository-url]
    cd DeepSearch

  2. Set up your environment variables (see the sketch at the end of this section for how they are read):

    cp .env.example .env
    # Edit .env to add your HuggingFace token and OpenRouter API key

  3. Install dependencies using the development setup:

    make install

This installs the project in editable mode along with all dependencies specified in pyproject.toml, including:

  • transformers
  • unsloth
  • gradio
  • langchain
  • and other required packages
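
The variables you set in step 2 are read at startup. As an illustration only (assuming python-dotenv is available; the variable names below are assumptions, not guaranteed to match .env.example):

    import os
    from dotenv import load_dotenv

    load_dotenv()                                    # reads key=value pairs from .env
    hf_token = os.getenv("HF_TOKEN")                 # assumed variable name
    openrouter_key = os.getenv("OPENROUTER_API_KEY") # assumed variable name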

Data Preparation 📊

DeepSearch uses the Musique dataset for training and evaluation.

Download and prepare all data in one step

make prepare-all-musique

Step-by-step data preparation

  1. Download the Musique dataset:

    make download-musique
    
  2. Prepare the JSONL files for training:

    make prepare-musique-jsonl
    
  3. Extract paragraphs for indexing:

    make extract-musique-paragraphs
    
  4. Build the FAISS index (see the sketch after this list):

    make build-musique-index
    
  5. Prepare development data:

    make prepare-dev-data
    
  6. Validate data preparation:

    make check-data
    

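For reference, the index-building step (step 4) boils down to embedding the extracted paragraphs and writing a FAISS index. The sketch below is illustrative only; the embedding model and file paths are placeholders and may differ from what the Makefile target actually uses.

    import faiss
    from sentence_transformers import SentenceTransformer

    paragraphs = ["Paragraph one ...", "Paragraph two ..."]  # extracted Musique paragraphs
    encoder = SentenceTransformer("all-MiniLM-L6-v2")        # placeholder embedding model
    emb = encoder.encode(paragraphs, convert_to_numpy=True, normalize_embeddings=True)

    index = faiss.IndexFlatIP(emb.shape[1])  # inner product == cosine on normalized vectors
    index.add(emb)
    faiss.write_index(index, "data/musique_index.faiss")     # placeholder output path

    # Query time: embed the question and fetch the top-k paragraphs.
    q = encoder.encode(["example question"], convert_to_numpy=True, normalize_embeddings=True)
    scores, ids = index.search(q, 3)
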
Training 🧠

Train the model using the GRPO (Group Relative Policy Optimization) approach:

python train_grpo.py
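
train_grpo.py wires in the project's own model configuration, dataset loading, and reward functions (e.g. exact-match and chunk rewards). As rough orientation only, GRPO training with the trl library looks something like the sketch below; the backbone, dataset path, and reward function are placeholders, not the project's actual settings.

    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    def exact_match_reward(completions, **kwargs):
        # Hypothetical reward: 1.0 if the gold answer appears in the completion.
        answers = kwargs.get("answer", [""] * len(completions))
        return [float(a in c) for c, a in zip(completions, answers)]

    train_ds = load_dataset("json", data_files="data/train.jsonl", split="train")  # placeholder path

    trainer = GRPOTrainer(
        model="Qwen/Qwen2.5-3B-Instruct",            # placeholder backbone
        reward_funcs=exact_match_reward,
        args=GRPOConfig(output_dir="trainer_output"),
        train_dataset=train_ds,
    )
    trainer.train()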

You can monitor training progress with TensorBoard:

make tensorboard

List available training runs:

make list-runs

Development 💻

Run tests

make test

Code quality and style

# Format code
make style

# Check code quality
make quality

# Auto-fix issues
make fix

Clean up

make clean

Models 🤖

You can find our models on Hugging Face 🤗! We're committed to open-source and easy access for the research community.

| Model | Backbone | Size | Link |
|-------|----------|------|------|

Datasets 📚

We've released our datasets on Hugging Face 🤗 to support reproducibility and further research.

| Dataset | Description | Size | Link |
|---------|-------------|------|------|
|         |             |      |      |
|         |             |      |      |

References 📖