When I worked in a research lab with some finicky bespoke equipment, I remember the boss making a related point. If our experimental device is made up of N components that are each 95% reliable, that sounds pretty good... until you start thinking about how big N is, and how the overall reliability is only 0.95^N. It doesn’t take long before you’re only getting data one day a week or whatever.
Imagine you have 20 experiments, each with a 5% chance that the null hypothesis shows a positive result, then there is a good chance that one of them will be positive even if null is true for all of them. That experiment is bunk but you say p=0.05 so all is good!