https://huggingface.co/Menlo/ReZero-v0.1-llama-3.2-3b-it-grpo-250404

thinhlpg 04d56325bb feat: add new reward functions, add less dumb data generation logic, implement better logging		8 months ago
data	feat: add initial project structure and core functionality	8 months ago
docs	chore: update worklog 250324	8 months ago
notebooks	feat: add new reward functions, add less dumb data generation logic, implement better logging	8 months ago
.env.example	feat: add initial project structure and core functionality	8 months ago
.gitignore	chore: update worklog 250324	8 months ago
README.md	feat: add initial project structure and core functionality	8 months ago
UnslothGRPOTrainerTemp.py	feat: add initial project structure and core functionality	8 months ago
embeddings.py	feat: add initial project structure and core functionality	8 months ago
generate_data.py	chore: update worklog 250324	8 months ago
generate_data_but_less_dumb.py	feat: add new reward functions, add less dumb data generation logic, implement better logging	8 months ago
requirements.txt	chore: update worklog 250324	8 months ago
rl_helpers.py	feat: add new reward functions, add less dumb data generation logic, implement better logging	8 months ago
search_module.py	feat: add initial project structure and core functionality	8 months ago
simple_qa.py	feat: add initial project structure and core functionality	8 months ago
train_autodidact.py	chore: update worklog 250324	8 months ago
train_autodidact_1B.py	chore: update worklog 250324	8 months ago

README.md

DeepSearch - A Hard Working Search Engine 🔍

DeepSearch trains a small language model to develop effective search behaviors instead of memorizing static data. The model interacts with multiple synthetic search engines, each with its own retrieval mechanism, and learns to refine its queries and keep searching until it finds the exact answer. The project focuses on reinforcement learning, preventing overfitting, and efficiency in real-world search applications.
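To make the "refine and persist" idea concrete, here is a toy sketch of a search loop over two synthetic engines. Everything in it is hypothetical and for illustration only: the corpus, the `keyword_engine` and `substring_engine` functions, and `search_until_found` are not taken from this repo, and a fixed list of queries stands in for the refinements a trained model would generate.

```python
import re
from typing import Callable, Dict, List, Optional

# Toy corpus (hypothetical data, not from this project).
CORPUS = [
    "The Apollo 11 mission landed on the Moon in 1969.",
    "The Apollo 13 mission was aborted in 1970.",
    "Saturn V was the launch vehicle for the Apollo program.",
]

def tokens(text: str) -> set:
    """Lowercase word tokens of a string."""
    return set(re.findall(r"\w+", text.lower()))

def keyword_engine(query: str, k: int = 1) -> List[str]:
    """Synthetic engine 1: rank documents by word overlap with the query."""
    q = tokens(query)
    ranked = sorted(CORPUS, key=lambda doc: -len(q & tokens(doc)))
    return ranked[:k]

def substring_engine(query: str, k: int = 1) -> List[str]:
    """Synthetic engine 2: return documents containing the query verbatim."""
    return [doc for doc in CORPUS if query.lower() in doc.lower()][:k]

def search_until_found(
    queries: List[str],
    engines: List[Callable[[str], List[str]]],
    answer: str,
    max_steps: int = 6,
) -> Optional[Dict[str, object]]:
    """Try each query on each engine until a result contains the answer.

    Returns None if the step budget runs out first.
    """
    steps = 0
    for query in queries:
        for engine in engines:
            if steps >= max_steps:
                return None
            steps += 1
            for doc in engine(query):
                if answer in doc:
                    return {
                        "query": query,
                        "engine": engine.__name__,
                        "steps": steps,
                        "doc": doc,
                    }
    return None

# The first query misses on both engines; the refined query succeeds.
result = search_until_found(
    ["aborted flight", "Apollo 11"],
    [keyword_engine, substring_engine],
    answer="1969",
)
print(result)
```

The step budget is one simple way to express the efficiency concern: a policy that finds the answer in fewer engine calls would score better under a reward that penalizes steps. Whether the actual reward functions in rl_helpers.py work this way is not something this sketch claims.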

Setup

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Models

You can find our models on Hugging Face 🤗! We're committed to open-source and easy access for the research community.

Model	Backbone	Size	Link
-	-	-	-

Datasets

We've released our datasets on Hugging Face 🤗 to support reproducibility and further research.

Dataset	Description	Size	Link
-	-	-	-
-	-	-	-
-	-	-	-

References

This project was kickstarted from AutoDidact.

Personal Notes

This is research code, so I'm prioritizing speed over code quality for now. Expect things to be messy—both the code and commit history. Roasting is welcome, but don't judge me too hard; I'll clean it up later. I don’t know what I don’t know, but I’m eager (and desperate) to learn and improve, so any constructive feedback is highly appreciated! 💖
