It really doesn't make a lot of sense to do AI at the edge (meaning at the various edge providers).
But then, a lot of edge use cases don't make much sense either. The best ones are fan-in (aggregation and data reduction), fan-out (replication and amplification - broadcasting, conferencing, video streaming, etc.), and caching (which is just a variant of fan-out).
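To make the fan-in case concrete, here's a minimal sketch of what an edge node doing aggregation and data reduction might look like - the device messages, field names, and numbers are all made up for illustration, not taken from any particular provider:

```python
# Hypothetical fan-in at an edge node: many small device readings come in,
# one small summary record goes out to the origin.
from statistics import mean

def aggregate_readings(readings):
    """Reduce a batch of per-device readings to a single summary."""
    values = [r["value"] for r in readings]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }

# e.g. 10,000 device messages in, one ~100-byte summary out
readings = [{"device": i, "value": 20 + (i % 7)} for i in range(10_000)]
summary = aggregate_readings(readings)
print(summary)
```

The point of the sketch is just the shape of the traffic: lots of small messages terminate at the edge, and only the reduced result crosses the expensive hop to the core.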
The rest of the cases are, IMHO, largely fictional - magical latency improvements talked about in the same breath as applications that are grossly un-optimized in every other way imaginable (shaving a few milliseconds of network latency is noise next to what the application itself wastes), AR/VR, etc. Especially the AR/VR thing.
Beyond that, the only thing left is cost arbitrage - selling bandwidth (mostly) cheaper than AWS.
What's the use case for moving inference to the edge? Most of the inference will in fact be at the edge - in the device, which has plenty of capacity - but that's not the case you're describing.
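For contrast, "inference in the device" usually looks something like the sketch below - a model bundled with the app and run locally through a runtime such as ONNX Runtime. The model file, input name, and input shape here are placeholders, not a reference to any specific product:

```python
# Rough sketch of on-device inference: no network round trip at all.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")            # model shipped with the app
input_name = session.get_inputs()[0].name               # assume a single input tensor
x = np.random.rand(1, 3, 224, 224).astype(np.float32)   # e.g. one 224x224 RGB frame
outputs = session.run(None, {input_name: x})             # runs entirely on the device
print(outputs[0].shape)
```

If the device can already do this, the latency argument for shipping the same inference to a nearby PoP gets pretty thin.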