thinhlpg
4de31e0f30
feat: expand reward functions with new strategies and diversity checks
- Added reward functions for search strategy and search diversity
- Updated reward_format to include validation for proper message endings
1 month ago
thinhlpg
af7f38c792
feat: add code for Qwen architecture
1 month ago
thinhlpg
31dcbf5d8a
feat: refactor the whole code base, add logic for training R1-distilled base models, change some template and reward logic
- Broke down rl_helpers into smaller modules
- Removed the deprecated rl_helpers module to streamline the codebase
- Enhanced the initial user prompt template, inspired by Search-R1
1 month ago
thinhlpg
da79e986b6
feat: add new script and train-script functionality to save the model in 16-bit format
1 month ago
thinhlpg
04593fa8fd
style: change line length to 119, organize imports
1 month ago
thinhlpg
3c2deaced9
refactor: restructure code base and centralize logging logic
1 month ago