# Agentic Reward Modeling

- <https://medium.com/@techsachin/agentic-reward-modeling-combine-human-preferences-with-verifiable-correctness-signals-for-reliable-76c408b3491c>
- <https://arxiv.org/pdf/2502.19328>
- <https://github.com/THU-KEG/Agentic-Reward-Modeling>
- <https://www.themoonlight.io/en/review/agentic-reward-modeling-integrating-human-preferences-with-verifiable-correctness-signals-for-reliable-reward-systems>
- [x] Research a bit more on this because I'm a bit out of date on the training side
- [x] What does the dataset look like?
- [x] How to evaluate the performance?