10 Commits (e3163081a04e0c895cfeb9e241fed5b1202dac1d)

Author SHA1 Message Date
thinhlpg d0e6068055 fix: strengthen reward correctness logic to handle the case where the final message is not an answer from the assistant. Also update reward function logs for easier debugging
3 months ago
thinhlpg 1047e2fa1c chore: update .gitignore and requirements for unsloth versions
3 months ago
thinhlpg 83f86869f6 chore: update .gitignore and add new toys data files
3 months ago
thinhlpg d2f03b96ab feat: enhance evaluation script and remove deprecated shell script
3 months ago
thinhlpg 60233f2113 chore: update .gitignore
3 months ago
thinhlpg fd32bcacfd chore: update worklog and research progress
3 months ago
thinhlpg 3c2deaced9 refactor: restructure code base, better centralize logging logic
3 months ago
thinhlpg 7d4de89186 chore: update worklog 250324
3 months ago
thinhlpg a58722e16f feat: add initial project structure and core functionality
3 months ago
Thinh Le bf32fdd897 Initial commit
3 months ago