> For 1. Apple is certainly trying to fit models onto an iPhone with OpenELM and I think the M4 debuting on an iPad is also posturing for local compute as well.

You misunderstand point 1. Three-digit-billion-parameter models like GPT-4 (or even 3.5) aren't going to fit on modern iPhones, not even close. Even Llama 70B needs roughly 35 GB of RAM at 4-bit (q4) quantization, and that's just for the weights.

Compare that to the iPhone 15's 6 GB and you'll see the problem. Apple isn't about to announce an iPhone with 6x as much RAM as any previous model. I'd be shocked if they even doubled it. Their local inference models are going to be tiny and limited, which is fine, but that means they have to go to the cloud to provide all the features people are expecting.
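
Rough back-of-the-envelope math, if it helps (the overhead factor and the 3B "phone-sized" example are my own assumptions, not anything Apple has announced):

    # Approximate RAM to hold the weights: params * bits_per_weight / 8,
    # plus a fudge factor for KV cache / activations / runtime overhead.
    def model_ram_gb(params_billion, bits_per_weight=4, overhead=1.1):
        weight_bytes = params_billion * 1e9 * bits_per_weight / 8
        return weight_bytes * overhead / 1e9

    print(model_ram_gb(70))  # ~38.5 GB -- a 70B model at q4, nowhere near 6 GB
    print(model_ram_gb(3))   # ~1.65 GB -- the kind of size that can sit next to iOS

So even before the OS and apps take their share, a 70B model is off by more than 6x, while something in the single-digit-billions range is about what fits locally.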


