Checking if a recommendation system is actually good in practice is kind of tough to do without owning a whole internet media platform as well. At best, you'll get the table scraps from these corporations (in the form of toy datasets/models made available), and you still will struggle to make your dev loop productive enough without throwing similar amounts of compute that the ~FAANGs do so as to validate whether that 0.2% improvement you got really meant anything or not. Oh, and also, the nature of recommendations is that they get very stale very quickly, so be prepared to check that your method still works when you do yet another huge training run on a weekly/daily cadence.
> you still will struggle to make your dev loop productive enough without throwing similar amounts of compute that the ~FAANGs do so as to validate whether that 0.2% improvement you got really meant anything or not
And do not forget the incredible of number of actual humans FAANG pays every day to evaluate any changes in result sets for top x,000 queries.
As someone whose customers do this stuff, I'm 100% for most academics chasing harder and more important problems
Most of these papers are specialized increments on high baselines for a primarily commercial problem. Likewise, they focus on optimizing phenomena that occur in their product, which may not occur in others. Eg, Netflix sliding window is neato to see the result of, but I rather students user their freedom to explore bigger ideas like mamba, and leave sliding windows to a masters student who is experimenting with intentionally narrowly scoped tweaks.