A given implementation of an algorithm may be the optimal solution for one type of hardware, but not for others. Or, put differently: it doesn't make much sense to reason about software performance without taking into account the specific hardware it has to run on.
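A toy illustration of the point (my own sketch, not anything from the original discussion): the same "sum a big matrix" job written with two traversal orders. Which one wins, and by how much, depends on the cache hierarchy and prefetcher of the particular CPU it runs on, and you only really find out by timing it there.

```c
/* Illustrative sketch: two ways to sum the same matrix. Relative
 * performance depends on cache sizes, line size and prefetching of
 * the CPU you happen to run this on. */
#include <stdio.h>
#include <time.h>

#define N 4096

static double grid[N][N];   /* ~128 MB, lives in BSS, not on the stack */

static double sum_row_major(void) {
    double s = 0.0;
    for (int i = 0; i < N; i++)        /* walks memory sequentially */
        for (int j = 0; j < N; j++)
            s += grid[i][j];
    return s;
}

static double sum_col_major(void) {
    double s = 0.0;
    for (int j = 0; j < N; j++)        /* strides N*8 bytes per access */
        for (int i = 0; i < N; i++)
            s += grid[i][j];
    return s;
}

static double time_it(double (*fn)(void), double *out) {
    clock_t t0 = clock();
    *out = fn();
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            grid[i][j] = (double)(i + j);

    double r, c;
    printf("row-major: %.3fs\n", time_it(sum_row_major, &r));
    printf("col-major: %.3fs\n", time_it(sum_col_major, &c));
    printf("checksums: %g %g\n", r, c);  /* keep the compiler from eliding the loops */
    return 0;
}
```

On one machine the gap between the two can be small, on another it can be an order of magnitude; that's exactly the kind of thing you can't predict from the source code alone.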
Not sure if this is at all surprising to the more academic 'computer science types', but to low-level coders it has been obvious pretty much forever.
And then you quickly reach the conclusion that the only way to assess which implementation choice will be faster, use less memory, etc. is to implement both, deploy them, and observe the performance on the actual production workloads. And the findings don't generalize, so you can't really learn from them and build up a predictive theory.
And this conclusion doesn't sit well with quite a number of developers, myself included, because I personally would like to be able to make the "right" choice without the brute force of "try everything and see what works better".