4 Commits (3081d6e36b22baea937801fd66031d5e8e85c10f)

Author SHA1 Message Date
thinhlpg 3081d6e36b test: added tests for new reward functions: search strategy and search diversity
1 month ago
thinhlpg d0e6068055 fix: strengthen reward correctness logic to handle the case where the final message is not an answer from the assistant. Also update reward function logs for easier debugging
1 month ago
thinhlpg 1bd609dfae test: enhance reward correctness tests with validation logic
1 month ago
thinhlpg 3910ef343a test: add unit tests for agent, reward functions, and tokenizer adapters
1 month ago