# Agentic Reward Modeling

- <https://medium.com/@techsachin/agentic-reward-modeling-combine-human-preferences-with-verifiable-correctness-signals-for-reliable-76c408b3491c>
- <https://arxiv.org/pdf/2502.19328>
- <https://github.com/THU-KEG/Agentic-Reward-Modeling>
- <https://www.themoonlight.io/en/review/agentic-reward-modeling-integrating-human-preferences-with-verifiable-correctness-signals-for-reliable-reward-systems>

- [x] Research this a bit more, since I'm somewhat out of date on the training side
    - [x] What does the dataset look like?
    - [x] How is performance evaluated?