Systems may need to anticipate changes in LLM architectures (even small changes can make a big difference kernel-wise), so it's good not to "bake in" too much ahead of time.

That said, at some point it just depends on where the costs lie, and it might make sense to hire some GPU engineers to do what they did here for whatever architecture you're optimising for.

Not as low-hanging as you might imagine.