Reinforcement learning in general didn't come out of AI alignment work, but RLHF in particular did. The initial idea and paper [1] came from AI alignment folks, as did most of the later development [2][3][4][5]. Overview: https://www.alignmentforum.org/posts/vwu4kegAEZTBtpT6p/thoug...

[1] Deep Reinforcement Learning from Human Preferences https://arxiv.org/abs/1706.03741

[2] Fine-Tuning Language Models from Human Preferences https://arxiv.org/abs/1909.08593

[3] Learning to summarize from human feedback https://arxiv.org/abs/2009.01325

[4] Recursively Summarizing Books with Human Feedback https://arxiv.org/abs/2109.10862

[5] Training language models to follow instructions with human feedback https://arxiv.org/abs/2203.02155



Maybe I am missing something, but I don't entirely understand what exactly the novelty of RLHF is here. RL is a long-standing field, and agents have been trained from human input for decades. Even the first paper you cite as the "initial idea" simply claims: "Our algorithm follows the same basic approach as Akrour et al. (2012) and Akrour et al. (2014)", that "A long line of work studies reinforcement learning from human ratings or rankings", and finally that "our key contribution is to scale human feedback up to deep reinforcement learning and to learn much more complex behaviors".
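For what it's worth, the mechanism those papers scale up is fitting a reward model to pairwise human preferences (a Bradley-Terry style logistic model over reward differences), then running ordinary RL against the learned reward. A toy sketch of just the reward-fitting step, with simulated preferences standing in for human labels (everything here is illustrative, not the papers' actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Trajectory segments as feature vectors; a hidden "true" linear
# reward generates the simulated human comparisons below.
true_w = np.array([2.0, -1.0, 0.5])
segments = rng.normal(size=(200, 3))

def reward(w, x):
    return x @ w

# Simulated labeler: prefers whichever segment has higher true reward.
pairs = rng.integers(0, len(segments), size=(500, 2))
prefs = (reward(true_w, segments[pairs[:, 0]]) >
         reward(true_w, segments[pairs[:, 1]])).astype(float)

# Bradley-Terry model: P(a preferred over b) = sigmoid(r(a) - r(b)).
# Fit w by gradient ascent on the preference log-likelihood.
w = np.zeros(3)
lr = 0.1
for _ in range(300):
    ra = reward(w, segments[pairs[:, 0]])
    rb = reward(w, segments[pairs[:, 1]])
    p = 1.0 / (1.0 + np.exp(-(ra - rb)))
    grad = ((prefs - p)[:, None] *
            (segments[pairs[:, 0]] - segments[pairs[:, 1]])).mean(axis=0)
    w += lr * grad

# The learned reward should rank segments the way the labeler did.
agreement = np.mean(
    ((reward(w, segments[pairs[:, 0]]) >
      reward(w, segments[pairs[:, 1]])).astype(float) == prefs))
print(agreement)
```

The claimed novelty is less this model (which, as quoted, goes back at least to Akrour et al.) than making it work with deep networks and, later, large language models.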


If you want more details, the last paper [5], on InstructGPT, is probably the most interesting.



