thinhlpg d0e6068055
fix: strengthen reward correctness logic to handle the case where the final message is not an answer from the assistant. Also update logs for reward functions for better debugging
3 months ago
__init__.py test: add unit tests for agent, reward functions, and tokenizer adapters 3 months ago
test_agent.py test: add unit tests for agent, reward functions, and tokenizer adapters 3 months ago
test_rewards.py fix: strengthen reward correctness logic to handle the case where the final message is not an answer from the assistant. Also update logs for reward functions for better debugging 3 months ago
test_tokenizer_adapters.py test: add Qwen tokenizer adapter tests 3 months ago