We run online A/B tests to objectively measure quality against our ranking algorithms and other baselines. As you mentioned, it's crucial that the quality metric chosen for these tests is fair and correlates with the topline business objective. For example, if you evaluate only clicks, the system will favor click-bait content and perform worse overall.
To handle this, we make it easy to define different objectives and experiment with how each changes the results. So although we don't claim to solve the issue directly, we believe that if users can quickly experiment with different proxy objectives, they'll be able to find the one that correlates with their topline objective sooner.