# Agentic Reward Modeling
- <https://medium.com/@techsachin/agentic-reward-modeling-combine-human-preferences-with-verifiable-correctness-signals-for-reliable-76c408b3491c>
- <https://arxiv.org/pdf/2502.19328>
- <https://github.com/THU-KEG/Agentic-Reward-Modeling>
- <https://www.themoonlight.io/en/review/agentic-reward-modeling-integrating-human-preferences-with-verifiable-correctness-signals-for-reliable-reward-systems>
- [x] Research this a bit more, since my knowledge of the training side is a bit out of date
- [x] What does the dataset look like?
- [x] How is performance evaluated?