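It usually refers to fine-tuning language models using data labelled by humans. Typically, human raters compare or rank model outputs, a reward model is trained on those preferences, and the language model is then optimised against that reward model with reinforcement learning.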
Hugging Face have a good overview in this article: https://huggingface.co/blog/rlhf
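To make the "data labelled by humans" part concrete, here is a minimal sketch of the pairwise preference loss that is commonly used to train a reward model from human comparisons. It is an illustration under assumptions, not the code from the article: the function name `preference_loss` and the example scores are hypothetical, and a real setup would compute the scores with a neural network rather than hard-coding them.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise loss: -log(sigmoid(reward_chosen - reward_rejected)).

    Small when the reward model scores the human-preferred response higher,
    large when it scores the rejected response higher.
    """
    margin = reward_chosen - reward_rejected
    # softplus(-margin) equals -log(sigmoid(margin)); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical reward-model scores for two responses to the same prompt,
# where a human labeller preferred the first response of each pair.
loss_agree = preference_loss(2.0, -1.0)     # model agrees with the human label
loss_disagree = preference_loss(-1.0, 2.0)  # model disagrees with the human label

print(loss_agree < loss_disagree)
```

Minimising this loss over many labelled pairs pushes the reward model to agree with the human rankings. The reinforcement learning step then uses that reward model's score as the signal for fine-tuning the language model.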