24 Commits (9738b80353c9be77e4b0ac2163274d29354dfd2a)

Author SHA1 Message Date
thinhlpg bec864038b feat: increase max tokens and new tokens in evaluation scripts
6 months ago
thinhlpg 424459d840 feat: update evaluation scripts to enhance model configuration and dataset loading, including increased max tokens and added logging
6 months ago
thinhlpg 504f0c6c8e feat: update reward_em_chunk to match only the LAST required paragraph of the reasoning chain and adjust related tests
6 months ago
thinhlpg 358875a035 feat: enhance reward_em_chunk function to match multiple paragraphs, add test
6 months ago
thinhlpg 2df9f39fda feat: update model configuration (longer context) and dataset loading logic for improved performance and flexibility
6 months ago
thinhlpg 4a1d45271d feat: add scripts for musique data processing
6 months ago
thinhlpg eebf914a81 refactor: moved modules from src/deepsearch to src/
6 months ago
thinhlpg 0f662d4330 refactor: moved FlashRAG submodule from src/ to third_party/
6 months ago
thinhlpg 55f34b8503 feat: add FlashRAG as submodule
6 months ago
thinhlpg 2fec4f2f42 refactor: change repo structure (move code from src/ to src/deepsearch)
6 months ago
thinhlpg 010957cd99 feat: disable randomization option to get_qa_dataset function by default
6 months ago
thinhlpg 1a18cd7bfd feat: update training and evaluation configurations (editable agent generation scripts)
6 months ago
thinhlpg c8714e0f6b feat: enhance reward_retry function to handle missing answer tags
6 months ago
thinhlpg 4de31e0f30 feat: expand reward functions with new strategies and diversity checks
6 months ago
thinhlpg d0e6068055 fix: strengthen reward correctness logic to handle the case where the final message is not an answer from the assistant. Also update logs for reward functions for better debugging
6 months ago
thinhlpg 338655e563 feat: refine user prompt logic for improved clarity and structure
6 months ago
thinhlpg 6d994feeb2 feat: enhance evaluation scripts for base and LoRA models
6 months ago
thinhlpg af7f38c792 feat: add code for qwen architecture
6 months ago
thinhlpg 9009440663 chore: disable logging, enable torch compile
6 months ago
thinhlpg d2f03b96ab feat: enhance evaluation script and remove deprecated shell script
6 months ago
thinhlpg 31dcbf5d8a feat: refactor whole code base, add logic for training R1 distill base models, change some template and reward logic
6 months ago
thinhlpg c90c03267e feat: change user prompt template to search-r1 inspired format
6 months ago
thinhlpg 04593fa8fd style: change line length to 119, organize imports
6 months ago
thinhlpg 3c2deaced9 refactor: restructure code base, better centralize logging logic
7 months ago