thinhlpg a58722e16f
feat: add initial project structure and core functionality
2 months ago
  • data
  • docs
  • notebooks
  • .env.example
  • .gitignore
  • README.md
  • UnslothGRPOTrainerTemp.py
  • embeddings.py
  • generate_data.py
  • requirements.txt
  • rl_helpers.py
  • search_module.py
  • simple_qa.py
  • train_autodidact.py


DeepSearch - A Hard-Working Search Engine 🔍

DeepSearch trains a small language model to develop effective search behaviors rather than memorize static data. The model interacts with multiple synthetic search engines, each with its own retrieval mechanism. It refines its queries and keeps searching until it finds the exact answer. The project focuses on reinforcement learning, avoiding overfitting, and optimizing for efficiency in real-world search applications.
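The search-and-refine behavior described above can be shown with a small, self-contained toy. This is a sketch under stated assumptions and is not DeepSearch's implementation. The corpus, the two engines (`strict_engine`, `loose_engine`), the `refine` step, the `persistent_search` loop, and the `search_reward` function are all hypothetical. The `is_answer` check stands in for the model deciding it has found the answer. The reward gives 1 for a correct answer and subtracts a small cost per search call. That is one assumed way to reward both exact answers and efficiency during reinforcement learning.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Toy corpus standing in for the project's data (hypothetical).
CORPUS: List[str] = [
    "The Eiffel Tower is located in Paris.",
    "Mount Everest is the highest mountain above sea level.",
    "The Great Barrier Reef lies off the coast of Queensland.",
]

SearchEngine = Callable[[str], List[str]]


def strict_engine(query: str) -> List[str]:
    """Toy engine 1: return documents that contain every query term."""
    terms = query.lower().split()
    return [doc for doc in CORPUS if all(t in doc.lower() for t in terms)]


def loose_engine(query: str) -> List[str]:
    """Toy engine 2: return documents that contain at least one query term."""
    terms = query.lower().split()
    return [doc for doc in CORPUS if any(t in doc.lower() for t in terms)]


@dataclass
class SearchResult:
    answer: Optional[str]  # document the agent settled on, or None
    calls: int             # number of search-engine calls spent


def refine(query: str) -> str:
    """Hypothetical refinement step: drop the last term to broaden the query."""
    return " ".join(query.split()[:-1])


def persistent_search(
    query: str,
    engines: List[SearchEngine],
    is_answer: Callable[[str], bool],
    max_calls: int = 10,
) -> SearchResult:
    """Try every engine, refine the query, and repeat until an answer is found."""
    calls = 0
    while query:
        for engine in engines:
            if calls >= max_calls:  # call budget exhausted
                return SearchResult(None, calls)
            calls += 1
            for doc in engine(query):
                if is_answer(doc):
                    return SearchResult(doc, calls)
        query = refine(query)  # nothing found: broaden and keep searching
    return SearchResult(None, calls)


def search_reward(result: SearchResult, gold: str, call_cost: float = 0.05) -> float:
    """Assumed reward: 1 if the gold answer appears in the returned document,
    minus a small cost per search call, floored at 0."""
    if result.answer is None or gold.lower() not in result.answer.lower():
        return 0.0
    return max(0.0, 1.0 - call_cost * result.calls)


if __name__ == "__main__":
    found = persistent_search(
        "eiffel tower height",
        engines=[strict_engine],
        is_answer=lambda doc: "paris" in doc.lower(),
    )
    # The strict match fails once; the refined query "eiffel tower" then hits.
    print(found.answer, found.calls)
    print(search_reward(found, gold="Paris"))
```

In a trained system, the model itself would choose how to rewrite the query. Dropping the last term is only a placeholder. The per-call cost is what makes a shorter search score higher than a longer one that reaches the same answer.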

Project Whiteboard

Setup

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Models

You can find our models on Hugging Face 🤗! We're committed to open source and easy access for the research community.

| Model | Backbone | Size | Link |
|-------|----------|------|------|
| -     | -        | -    | -    |

Datasets

We've released our datasets on Hugging Face 🤗 to support reproducibility and further research.

| Dataset | Description | Size | Link |
|---------|-------------|------|------|
| -       | -           | -    | -    |
| -       | -           | -    | -    |
| -       | -           | -    | -    |

References

Personal Notes

  • This is research code, so I'm prioritizing speed over code quality for now. Expect things to be messy: both the code and the commit history. Roasting is welcome, but don't judge me too hard; I'll clean it up later. I don't know what I don't know, but I'm eager (and desperate) to learn and improve, so any constructive feedback is highly appreciated! 💖