What I'm saying is, don't get to thinking that the memory problem is some kind of insurmountable, permanent barrier that's going to keep us safe. It's already being addressed, maybe crudely at first, but the situation is already much better than it was - I no longer have to bring the model up to speed completely every time I start a new session. Part of that is much larger context windows (1M tokens now), and new architectures are also being proposed to deal with the issue.
Context windows are a natural improvement, but new architectures are completely speculative, and it's unclear whether we can make any sort of predictable progress toward better ones. Most progress has been made on essentially the same architectural paradigm, although we did move from dense models to MoE at some point (roughly the difference sketched below).
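To make the dense-vs-MoE distinction concrete, here is a minimal sketch, not any particular lab's implementation: the class names, layer sizes, and top-k routing scheme are illustrative assumptions. The point is that an MoE block keeps the same overall transformer paradigm but only activates a few "expert" feed-forward networks per token instead of one big dense block.

```python
# Minimal sketch: dense feed-forward block vs. a top-k mixture-of-experts block.
# All names and hyperparameters here are illustrative, not from any specific model.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DenseFFN(nn.Module):
    """Standard dense feed-forward block: every parameter is used for every token."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
        )

    def forward(self, x):
        return self.net(x)


class TopKMoE(nn.Module):
    """Sparse MoE block: a router picks k experts per token, so only a fraction
    of the total parameters is active for any given token."""
    def __init__(self, d_model: int, d_hidden: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(DenseFFN(d_model, d_hidden) for _ in range(n_experts))
        self.router = nn.Linear(d_model, n_experts)
        self.k = k

    def forward(self, x):  # x: (batch, seq, d_model)
        gate_logits = self.router(x)                       # (batch, seq, n_experts)
        weights, idx = gate_logits.topk(self.k, dim=-1)    # top-k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., slot] == e)               # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    x = torch.randn(2, 16, 64)
    print(DenseFFN(64, 256)(x).shape)   # torch.Size([2, 16, 64])
    print(TopKMoE(64, 256)(x).shape)    # torch.Size([2, 16, 64])
```

Both blocks drop into the same place in a transformer layer, which is why MoE reads more like an efficiency trade within the existing paradigm than a genuinely new architecture.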