thinhlpg
2df9f39fda
feat: update model configuration (longer context) and dataset loading logic for improved performance and flexibility
4 weeks ago
thinhlpg
eebf914a81
refactor: moved modules from src/deepsearch to src/
1 month ago
thinhlpg
2fec4f2f42
refactor: change repo structure (move code from src/ to src/deepsearch)
1 month ago
thinhlpg
1a18cd7bfd
feat: update training and evaluation configurations (editable agent generation scripts)
Increased max_generations parameter in agentic_generate and run_eval functions for improved output flexibility.
1 month ago
thinhlpg
bf480574a2
fix: minor bug
1 month ago
thinhlpg
4de31e0f30
feat: expand reward functions with new strategies and diversity checks
- Added reward functions for search strategy and search diversity
- Updated reward_format to include validation for proper message endings.
1 month ago
thinhlpg
af7f38c792
feat: add code for qwen architecture
1 month ago
thinhlpg
31dcbf5d8a
feat: refactor whole code base, add logic for training R1 distil base models, change some template and reward logic
- Break down rl_helpers into smaller modules
- Removed deprecated rl_helpers module to streamline the codebase.
- Enhance the initial user prompt template, inspired by Search-R1
1 month ago
thinhlpg
da79e986b6
feat: add new script and functionality in train script to save model in 16-bit format
1 month ago
thinhlpg
04593fa8fd
style: change line length to 119, organize imports
1 month ago
thinhlpg
3c2deaced9
refactor: restructure code base, better centralize logging logic
1 month ago