Worklog

Backlog

  • @thinhlpg transfers the project to @bachvudinh

  • Modify generate_dataset.py (ONLY AFTER the simple training and benchmark work):

    • Optimize speed (different LLM models, APIs, tools, etc.)
    • Optimize quality. As a dataset maker, I want to switch from Llama 3.1 8B to API calls (Claude, Gemini, or OpenAI). The original work used 3.1 8B for the self-bootstrapping demonstration, but the resulting dataset quality is clearly low. (See the sketch after this list.)
    • Experiment with different chunking strategies
  • Design search-backends.md (to add more dataset noise; ONLY AFTER the simple training dataset works)

  • Train an SFT first stage, then GRPO (new idea from @tikikun, 250326)

    • I think this idea is already implemented in the Search-R1 repo; I'll double-check it later.
  • Implement quality-of-life scripts from brain-rotting-multiple-gpu-workflow-for-dummies.md

  • Better verification logic, please (it should be fixed across all experiments, not tied to the base model itself)
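
A minimal sketch of what the API-based generation in generate_dataset.py could look like, assuming the `openai` Python SDK; the prompt, model name, and `generate_qa_pair` helper are placeholders for illustration, not the actual script.

```python
# Hypothetical sketch: swap local Llama 3.1 8B generation for an API call.
# Assumes the `openai` SDK (>=1.0) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def generate_qa_pair(chunk: str, model: str = "gpt-4o-mini") -> str:
    """Ask the API to write one question-answer pair grounded in `chunk`."""
    response = client.chat.completions.create(
        model=model,  # placeholder; could be a Claude or Gemini client instead
        messages=[
            {"role": "system",
             "content": "Write one question and its answer based only on the given text."},
            {"role": "user", "content": chunk},
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content

# Usage with a placeholder chunk:
# print(generate_qa_pair("Paris is the capital of France."))
```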

yymmdd

  • task description

250329

  • brain.exe and back.exe refused to work

250328

  • Watch Solo Leveling with bro @tikikun 🔥
  • Figuring out how to keep multiple experiments organized. The repos on the server are a mess 💀💀 (but at least they work for now)

250328 - D-Day

  • Show the results, or demo

250327

  • CLEAN THE REPO PLEASE IT'S A MESS 😭😭😭
    • Double-checked all scripts, they ran well :3
  • Write a script to train x-deepseek-r1-distil models (the original script only supports Llama-Instruct models)
  • Script to continue training from the last checkpoint (see the sketch after this list)
  • Make a simple demo app (or just a CLI inference script should be good)
  • Upload datasets to HF Hub
  • Research Agentic Reward Modeling a bit (maybe for designing better reward functions?): agentic-reward-modeling.md
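
A minimal sketch of the checkpoint-resume logic, assuming a Hugging Face Trainer-compatible trainer (e.g. a TRL/Unsloth GRPO trainer) that writes checkpoint-* folders into its output directory; the trainer and directory are placeholders supplied by the caller.

```python
# Hypothetical sketch: continue training from the newest checkpoint, if any.
from transformers import Trainer
from transformers.trainer_utils import get_last_checkpoint

def train_with_resume(trainer: Trainer, output_dir: str) -> None:
    """Resume from the latest checkpoint-* folder in output_dir, else start fresh."""
    last_checkpoint = get_last_checkpoint(output_dir)
    if last_checkpoint is not None:
        print(f"Resuming from {last_checkpoint}")
        trainer.train(resume_from_checkpoint=last_checkpoint)
    else:
        print("No checkpoint found, training from scratch")
        trainer.train()
```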

250326

  • Fix the exact-match reward function bug (a minimal sketch of an exact-match reward follows this list)
  • Enhance the training script with better logging and monitoring
  • Train new models
  • Write new eval script
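
For reference, a minimal sketch of an exact-match reward in the TRL/Unsloth GRPO style (completions in, a list of floats out); the `<answer>` tag convention and the function names are assumptions, not the actual function from reward-functions.md or the specific bug that was fixed.

```python
# Hypothetical sketch of an exact-match reward, not the repo's actual function.
# Assumes chat-style completions and plain-string reference answers.
import re

def extract_answer(text: str) -> str:
    """Pull the text inside <answer>...</answer> tags, falling back to the full text."""
    match = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return (match.group(1) if match else text).strip().lower()

def exact_match_reward(completions, answer, **kwargs) -> list[float]:
    """1.0 when the extracted answer matches the reference exactly, else 0.0."""
    responses = [completion[0]["content"] for completion in completions]
    return [
        1.0 if extract_answer(response) == reference.strip().lower() else 0.0
        for response, reference in zip(responses, answer)
    ]
```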

250325

  • Read Search-R1 to get more ideas on how to improve the reward functions (pretty similar idea, I suppose)
  • Update the new reward functions in reward-functions.md
  • Train model v0 (with the new data and reward functions; might take another 2 hours)
    • Spoiler: it's not good

250324

  • Make the dataset v0
  • Train with new data and default reward functions (it took 2 hours on 1xA6000 😭)
    • Got poor results (accuracy dropped from 50% to 35%) 📉

250323

  • brain.exe and back.exe refused to work 😭

250322

  • Moving all the scattered and disorganized stuff I've been working on for the past week into this repo.
  • Write proposal for DeepSearch
  • As a new member of the research team, I'm curious how we did GRPO with AlphaMaze, so that I can inherit the good stuff and improve the workflow!!!
    • Alphamaze?
    • https://www.menlo.ai/blog/alpha-maze
    • https://arxiv.org/pdf/2502.14669
    • Our training process involved two key stages: creating a specialized dataset and then using a combination of supervised fine-tuning (SFT) and reinforcement learning (RL) to train the model.

    • LLaMA-Factory for SFT (1.5B model, 6xA6000, 1.5 hours) and Unsloth for GRPO
    • 💡 Hmm, so for SFT we have 50% successful data and 50% retry data, and all-successful data for GRPO. Can I apply this to DeepSearch as well? #HACK (rough sketch below)
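
A rough sketch of how that AlphaMaze-style split could be wired up for DeepSearch, assuming the traces are already labeled with a success flag; the field name, ratio, and `build_splits` function are placeholders for the idea, not an existing pipeline.

```python
# Hypothetical sketch of the 50/50 SFT mix and successful-only GRPO data.
# Assumes each trace dict carries a boolean "success" flag.
import random

def build_splits(traces: list[dict], seed: int = 42) -> tuple[list[dict], list[dict]]:
    """SFT mix: 50% successful + 50% retry traces. GRPO: successful traces only."""
    rng = random.Random(seed)
    successful = [t for t in traces if t["success"]]
    retries = [t for t in traces if not t["success"]]

    n = min(len(successful), len(retries))  # balance the two halves
    sft_data = rng.sample(successful, n) + rng.sample(retries, n)
    rng.shuffle(sft_data)

    grpo_data = successful  # the GRPO stage sees only successful traces
    return sft_data, grpo_data
```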

250321

250320

250319

250318

Graveyard 💀