22 Commits (1e7514f98e707629c460ae28249ee09b7b3cb039)

Author SHA1 Message Date
thinhlpg 504f0c6c8e feat: update reward_em_chunk to match only the LAST required paragraph of the reasoning chain and adjust related tests
4 weeks ago
thinhlpg 358875a035 feat: enhance reward_em_chunk function to match multiple paragraphs, add test
4 weeks ago
thinhlpg 2df9f39fda feat: update model configuration (longer context) and dataset loading logic for improved performance and flexibility
4 weeks ago
thinhlpg 4a1d45271d feat: add scripts for musique data processing
4 weeks ago
thinhlpg eebf914a81 refactor: moved modules from src/deepsearch to src/
1 month ago
thinhlpg 0f662d4330 refactor: moved FlashRAG submodule from src/ to third_party/
1 month ago
thinhlpg 55f34b8503 feat: add FlashRAG as submodule
1 month ago
thinhlpg 2fec4f2f42 refactor: change repo structure (move code from src/ to src/deepsearch)
1 month ago
thinhlpg 010957cd99 feat: disable the randomization option of the get_qa_dataset function by default
1 month ago
thinhlpg 1a18cd7bfd feat: update training and evaluation configurations (editable agent generation scripts)
1 month ago
thinhlpg c8714e0f6b feat: enhance reward_retry function to handle missing answer tags
1 month ago
thinhlpg 4de31e0f30 feat: expand reward functions with new strategies and diversity checks
1 month ago
thinhlpg d0e6068055 fix: strengthen reward correctness logic to handle cases where the final message is not an answer from the assistant. Also update reward function logs for easier debugging
1 month ago
thinhlpg 338655e563 feat: refine user prompt logic for improved clarity and structure
1 month ago
thinhlpg 6d994feeb2 feat: enhance evaluation scripts for base and LoRA models
1 month ago
thinhlpg af7f38c792 feat: add code for qwen architecture
1 month ago
thinhlpg 9009440663 chore: disable logging, enable torch compile
1 month ago
thinhlpg d2f03b96ab feat: enhance evaluation script and remove deprecated shell script
1 month ago
thinhlpg 31dcbf5d8a feat: refactor the whole code base, add logic for training R1 distill base models, change some templates and reward logic
1 month ago
thinhlpg c90c03267e feat: change user prompt template to a search-r1-inspired format
1 month ago
thinhlpg 04593fa8fd style: change line length to 119, organize imports
1 month ago
thinhlpg 3c2deaced9 refactor: restructure code base, better centralize logging logic
1 month ago