DeepSearch - A Hard-Working Search Engine 🔍

DeepSearch trains a small language model to develop effective search behaviors instead of memorizing static data. The model interacts with multiple synthetic search engines, each with a unique retrieval mechanism, learning to refine its queries and keep searching until it finds exact answers. The project focuses on reinforcement learning, preventing overfitting, and optimizing for efficiency in real-world search applications.
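
Concretely, the trained behavior looks roughly like the loop sketched below. This is a hypothetical illustration only; search_loop, search_engine, propose_query, and extract_answer are invented names, not modules from this repo.

# Hypothetical sketch of the search-and-refine loop the model learns.
# All names here are placeholders invented for illustration.

def search_loop(question, search_engine, propose_query, extract_answer, max_turns=5):
    """Query, read results, refine, and repeat until an exact answer is found."""
    query = question
    for _ in range(max_turns):
        results = search_engine(query)              # retrieval from a synthetic engine
        answer = extract_answer(question, results)  # model tries to read off the answer
        if answer is not None:                      # success: exact answer located
            return answer
        query = propose_query(question, results)    # model rewrites its query and retries
    return None                                     # persistence budget exhausted

During training, reinforcement learning would reward runs of this loop that end in an exact answer, which is what pushes the model to refine queries and persist rather than memorize.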

Setup

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Evaluation

Compare the performance of the base model against the LoRA-enhanced model:

# Quick run with defaults
./eval.sh

# Custom run
./eval.sh --lora_path "/path/to/lora" --temperature 0.7

Direct Python usage:

python eval.py --lora_path "/path/to/lora" --temperature 0.7

The tool generates a results file with accuracy metrics and improvement statistics.
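
For intuition, the comparison boils down to something like the sketch below, assuming an exact-match metric. The callables and toy data are placeholders, not the repo's actual interfaces.

# Hypothetical sketch of the base-vs-LoRA comparison, assuming exact match.
# All names and data below are placeholders for illustration.

def exact_match_accuracy(generate_answer, examples):
    """Fraction of examples where the generated answer matches the gold answer."""
    correct = sum(
        generate_answer(ex["question"]).strip().lower() == ex["answer"].strip().lower()
        for ex in examples
    )
    return correct / len(examples)

examples = [{"question": "What is the capital of France?", "answer": "Paris"}]
base_acc = exact_match_accuracy(lambda q: "Lyon", examples)   # stand-in base model
lora_acc = exact_match_accuracy(lambda q: "Paris", examples)  # stand-in LoRA model
print(f"base={base_acc:.3f}  lora={lora_acc:.3f}  improvement={lora_acc - base_acc:+.3f}")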

Models

You can find our models on Hugging Face 🤗! We're committed to open source and easy access for the research community.

| Model | Backbone | Size | Link |
| ----- | -------- | ---- | ---- |

Datasets

We've released our datasets on Hugging Face 🤗 to support reproducibility and further research.

| Dataset | Description | Size | Link |
| ------- | ----------- | ---- | ---- |
| -       | -           | -    | -    |
| -       | -           | -    | -    |

References

Personal Notes

  • This is research code, so I'm prioritizing speed over code quality for now. Expect things to be messy, both the code and the commit history. Roasting is welcome, but don't judge me too hard; I'll clean it up later. I don't know what I don't know, but I'm eager (and desperate) to learn and improve, so any constructive feedback is highly appreciated! 💖