# Agentic Reward Modeling
- <https://medium.com/@techsachin/agentic-reward-modeling-combine-human-preferences-with-verifiable-correctness-signals-for-reliable-76c408b3491c>
- <https://arxiv.org/pdf/2502.19328>
- <https://github.com/THU-KEG/Agentic-Reward-Modeling>
- <https://www.themoonlight.io/en/review/agentic-reward-modeling-integrating-human-preferences-with-verifiable-correctness-signals-for-reliable-reward-systems>
- [x] Research this a bit more, because I'm a bit out of date on the training side
- [x] What does the dataset look like?
- [x] How to evaluate the performance?