10 Commits (bf480574a2d93758185075015fe8d028a01bcf0c)

Author SHA1 Message Date
thinhlpg d0e6068055 fix: strengthen reward correctness logic to handle the case where the final message is not an answer from the assistant. Also update logs for reward functions for better debugging
4 months ago
thinhlpg 1047e2fa1c chore: update .gitignore and requirements for unsloth versions
4 months ago
thinhlpg 83f86869f6 chore: update .gitignore and add new toys data files
4 months ago
thinhlpg d2f03b96ab feat: enhance evaluation script and remove deprecated shell script
4 months ago
thinhlpg 60233f2113 chore: update .gitignore
4 months ago
thinhlpg fd32bcacfd chore: update worklog and research progress
4 months ago
thinhlpg 3c2deaced9 refactor: restructure code base, better centralize logging logic
4 months ago
thinhlpg 7d4de89186 chore: update worklog 250324
4 months ago
thinhlpg a58722e16f feat: add initial project structure and core functionality
4 months ago
Thinh Le bf32fdd897 Initial commit
4 months ago