While progress has been made in computer vision, that progress has been relative...

While progress has been made in computer vision, that progress has been relatively narrow up until now, and I think the activation energy required to produce this level of quality would be more than it's worth. As others have mentioned, new footage comes out all the time.

However, I agree with the sentiment. Someday, we will have a massive foundation model capable of producing any video with a little conditioning on text. But we don't currently have such a model. In some sense, we're still in the era of easily verifiable video, and this era might end someday soon.