thinhlpg d0e6068055
fix: strengthen reward correctness logic to handle the case where the final message is not an answer from the assistant. Also update reward function logs for better debugging
1 month ago
__init__.py test: add unit tests for agent, reward functions, and tokenizer adapters 1 month ago
test_agent.py test: add unit tests for agent, reward functions, and tokenizer adapters 1 month ago
test_rewards.py fix: strengthen reward correctness logic to handle the case where the final message is not an answer from the assistant. Also update reward function logs for better debugging 1 month ago
test_tokenizer_adapters.py test: add Qwen tokenizer adapter tests 1 month ago