Apple is still behind the rest of FAANG in terms of publishing ML research, but they've taken the lead in terms of real world impact of ML. If you go back a couple of years the general consensus was that Apple is very far behind and lacks any credible machine learning groups. As with most of their products, they waited for the technology to mature (a bit) and then made magical experiences from it.
Siri is still far behind vs. Alexa and Google Assistant, but for every other ML/AI application I'd argue that Apple has the smoothest experience. This should be a lesson for others building ML-powered products. You don't need to use the best, state of the art models to compete. You can compete on the overall experience.
Sharing details and publications doesn't seem to be a core part of Apple's culture, so I'm glad they're going out of their way to publish more details like this article. Their push to optimize ML inference for on-device rather than cloud is going to have the biggest impact on consumer experience. I'm fairly sure this is part of their AR/VR strategy too. Low-latency local machine learning that powers magical experiences.
I haven't used Alexa in a long time (and I've never used Google Assistant), so maybe both of them have improved this situation.
But one of the things that most impressed me about Siri on the HomePod is that it seemed to take context (not sure if that is the right word in this case...) into account more often.
For example:
On Alexa, regardless of whether I asked "how cold is it", "what is the temperature", "is it raining", "what is the weather"... etc etc etc, it would always give the same long weather response.
On my HomePod if I ask "How cold is it" I get "I don't find that particularly cold, it is cloudy and 72 degrees".
If I ask "what is the temperature" I get "it's 72 degrees right now".
If I ask "is it raining" I get "no, I don't think it's raining right now".
Then finally if I ask what the weather is I get the full thing.
Does Siri lack a lot of smarts for asking random questions and getting an answer? Yes. But I found that when I switched to the HomePod, Siri did better at the things I normally do (like weather, timers, and similar).
Like I said, maybe Alexa has gotten better about this. But that is my experience, so for me Siri was better for my use cases. Home commands could use a lot of work, but that seems like an issue no one has figured out: how to make it actually better than flicking a light switch (outside of remote stuff).
I'm slowly learning to hate my Echo because of how Amazon so often tries to suggest more things I can do ("By the way..."). All I want is for it to tell me the time, weather, do timers, add stuff to my grocery app, and tell me if the Leafs won or lost yesterday.
I bought a HomePod to see if it could do everything I need and oddly enough, the Echo does a better job integrating with my iOS-based grocery list (AnyList). With the Echo, I can say "Alexa, add eggs" and it will either add eggs to AnyList or tell me if eggs are already on my list. With Siri I have to say something like "Hey Siri, using AnyList, add eggs to my grocery list".
Curse her out. Say "Alexa, STFU" every time she hits you with a "By the way..." It seems to help; I get BTW messages quite rarely compared to other users.
Seems that the solution to your problem could come if Apple gives you the option of choosing a default list app.
I use the built-in Lists app, so I just say, "Hey, Siri, add ginger ale to my groceries list." Hopefully alternate defaults will come, the way iOS is slowly embracing alternate defaults for other functions.
That said, about 50% of the time when I say, "Hey, Siri, add sandwich bread to my groceries list," she responds, "OK, I've added those two things to your list."
Siri weather responses are somewhat egocentric. They often don’t take into account that what might be comfortable in one locale isn’t what is comfortable in another.
A -5° Centigrade winter daytime temp might be completely normal and expected in my area.
However, Siri will love to tell you, like a long-lost distant relative from tropical climes, about how terrible and how uncomfortable it is.
> Does Siri lack a lot of smarts for asking random questions and getting an answer? Yes. But I found that when I switched to the HomePod, Siri did better at the things I normally do (like weather, timers, and similar)
Disclaimer: I have never worked on Siri and don’t know. But this smacks of “hard coding” the common cases, something that Apple is much more likely to do than Google or some of the other big tech companies, precisely because of how poorly it scales. For me this is not evidence at all that Apple has caught up in the ML space, just that they have good-enough speech-to-text and better UX people willing to invest in identifying and manually outlining a lot of cases.
>Siri on HomePods is a disaster. You have to shout at it even for it to pick up what you are saying. Mostly when the volume is a little high.
I have the opposite experience on the HomePod mini; it's actually one of the things it's better at. My Google Home device is entirely useless to talk to if something is already playing on it, whereas Siri will still understand me.
The issue is more with Siri itself being dumb and not that great.
I’ve only used Alexa in stand-alone units, and Siri on my phone and watch. For me, Alexa regularly fails, and the overwhelming majority of my interaction attempts play out like this:
Me: Alexa, Badezimmer 10% ("bathroom 10%")
Alexa: Ich kann nicht Prozent auf Spotify finden ("I can't find Prozent on Spotify")
*Five attempts later, it actually works*
(It was even less reliable in English)
Siri mostly hears the right words when I ask questions, and the voice keyboard (is that the same speech-to-text engine?) is correct somewhat more often than the swipe keyboard (although that doesn’t say very much as the latter seems to have become very confused as a result of me learning German).
I‘ve had Siri set to German for forever (my native language). It regularly has problems recognizing me saying things in English mid-sentence, like when I request it to play a song. I do this a lot (car commuter).
For fun, I set Siri to English for a day and I was blown away at how much better it is. Both in recognizing requests, despite my accent, and in the quality of the voice synthesizer. Song requests became trivial and everything felt much more fluent. English Siri is even able to express emotions (happiness), which is something I’ve never heard in the German version.
I’ve simply never found Siri useful; I don’t know how much of that is fighting comprehension vs. fighting Siri’s generally not-great search behaviour. I find talking to machines awkward in general: even in the best cases I still find myself forming deliberately structured sentences, to the point that I prefer typing. (Obviously typing doesn’t work for the “assistants”, but the above issues essentially rule them out for me anyway.)
I have to make a list of these sometime. My primary issues are with how Apple deals with multiple devices in somewhat close proximity. The HomePod always picks up the request irrespective of how close to or far from it I am. My phone might be right next to me, but unless I pick it up and unlock it, the HomePod takes priority. Which might be fine if there were a way to either control other devices or hand off the request to a closer device.
“Not you, I want the Siri on my phone” or “switch to my phone/iPad”
“Skip the song on my iphone”.
Much worse is the responsiveness.
“I don’t understand”
“Still working on it”
“I can’t seem to be able to connect to the internet at the moment”
“I’m sorry. Can you please repeat?”
With all the on-device ML stuff I assumed they’d get Siri to operate without the internet for basic recognition and switch to network mode for more complex use cases.
I wish Siri had a hierarchy of preference. In the end I switched it off on the Mac and iPad and mainly use it with my phone. I have to press the button to use it on my watch.
> Siri on HomePods is a disaster. You have to shout at it even for it to pick up what you are saying. Mostly when the volume is a little high.
I always speak in a normal (or slightly quieter) volume and never have an issue with Siri activating.
My biggest issue is that my HomePods need to get restarted or power cycled periodically (e.g. after a power outage or network update). I wish they were a little faster and had better self-healing but generally work pretty well as room speakers which is primarily what I got them for.
I think it's at least partly due to Siri attempting to be more privacy respecting and doing more work on-device rather than server-side like Google and Amazon's offerings.
Counterpoint: I have a HomePod mini in one of my rooms and it reacts easily to any mention of Siri in other rooms. So I am actually impressed with how good it is at reacting.
I had a similar issue where a HomePod mini in our living room always responded when I was in the kitchen, so I just disabled the microphone on the living room HomePod minis.
I have a google home and it is insanely good at hearing you speak. Even with the exhaust fan above the stove and the sink running, I can still talk to it like I would talk to another person in the room.
My Google Home mini (first gen, I believe) has actually become atrocious at recognizing anything, even though it worked very well in the first years. Sometimes it still works well after rebooting, but not always.
Google ships all of these too, either in Android or Google Photos. The UI is sometimes less smooth than Apple's, but the ML is competitive. (Which I think is your point: Apple does have competitive ML.)
Google Photos is the wrong product for this question, IMHO. One of its key highlights is sharing albums with others and preserving them for the future, so why waste a consumer device's energy on unnecessary tasks if the image is going to end up in a Google datacenter anyway?
There are published, open-source (Apache license, according to the GitHub LICENSE file) models for TensorFlow Lite which run on Pixel Edge TPUs (and most embedded devices):
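For anyone curious what using one of those models looks like, here is a minimal sketch with the standard TFLite interpreter API. The model path, input shape, and dummy input are placeholders rather than any specific published model:

```python
# Minimal sketch of running a TFLite model with the standard interpreter API.
# "model.tflite" is a placeholder path, not a specific published model.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Dummy input matching whatever shape/dtype the model declares.
dummy = np.zeros(inp["shape"], dtype=inp["dtype"])

interpreter.set_tensor(inp["index"], dummy)
interpreter.invoke()
scores = interpreter.get_tensor(out["index"])
print("top class:", int(np.argmax(scores)))

# Running on an actual Edge TPU additionally needs an Edge-TPU-compiled model
# and the libedgetpu delegate passed via experimental_delegates.
```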
Picture tagging and Face ID are really great, but Siri is such a letdown.
Absolutely useless besides setting a timer.
Over the years it got better, but it is still orders of magnitude behind Alexa.
Amazon and Google have been “web companies” from the beginning; Apple comes from the hardware side and has problems keeping up with “information services”, but an Apple-owned search engine is probably only a few years away.
Discovered a few days ago while hiking that if you take a photo, you can tap on the info icon and it will tell you what kind of plant you just photographed. It works and is so low friction! This is how it's done.
I'm much more interested in the companies like Apple that are building products with ML as opposed to the ones that publish papers for the sake of publishing papers and getting more publicity like DeepMind. (Which is a personal preference, nothing wrong with what DeepMind is doing, I just don't really care about it that much)
>they've taken the lead in terms of real world impact of ML
Maybe through a US lens this is true, but FB and Google (Android) have a much larger worldwide impact on most of these fronts: they did the legwork, published their research (which certainly aided Apple, who rarely publishes), and had the features out earlier.
- Sleep detection sort of works for going to bed, but doesn't recognize a lie in after sleepily hitting the button and dozing off for another hour. It also doesn't recognize being awake in the night, unless I actively get up.
- It detects maybe half of the walks I go on, and usually well after I've already triggered cyclemeter. (So, there's a simple heuristic that I'm on a walk -- I've told it so)
- It interprets the cat headbutting my wrist as a cue to change the watch face.
They did release a paper where they said there are more important measures of AI than minimizing error, namely ease of use and real-world impact.
Honestly, all of these examples have actually been solved by Google/Android before Apple/iOS, except Face ID. Google even pioneered Federated Learning with gBoard ages ago. The only difference is that a lot of it was via cloud, not on device.
The point is about ML chops, not processing locality.
So is this just it? Do we get another round of this great-sounding but soon-to-be-disproven, softened, and then downplayed hype every year for the rest of time?
I don't think Siri is behind from Apple's perspective, because it has different goals. From a user perspective, Siri doesn't do as much, that's clear enough. However, Siri is predictable; the others are creating an uncanny valley where the assistant does more but you never quite know what or how.
Apple has been consistent in signaling that they want to push as much AI to the device as they can, for example on-device speech recognition was a heavily promoted milestone. That approach is broadly incompatible with using one enormous cloud model to provide rich interactions.
I think the goal for Siri is to become a better and better decision tree, rather than a conversation-driven knowledge engine like Google and Amazon have. I support that: I have actual humans to talk to, and I like that my interactions with Siri are based on simple phrases which do something and don't bottom out in a Google search or (yuck) a pitch to buy some service.
Addendum: Siri does bottom out in a search of the web, yes; what I mean is that Siri encourages use patterns (at least for me) where this basically never happens.
Somewhat disagree on this: in my experience, Siri is really bad even at what it supports. For example:
* Alexa (in my kitchen) almost always plays the music I ask for on Spotify. Siri (in my car) gets it wrong about half the time — often in comical and bizarre ways.
* Siri remains unable to call one of my family contacts: every time, she gets into a loop asking for clarification between two options (when I clarify, she just asks again).
* I set up an automation with Siri to start the radio playing on BBC Sounds (to stop me falling back to sleep when my alarm goes in the morning). This was stupidly fiddly to set up, and it just doesn't work: Siri tells me to unlock the phone first, which destroys the whole point.
I guess the caveat is that my experience with Siri is a bit limited, because in the light of her uselessness I often don't even try to get her to do things.
I absolutely disagree; it's very poor quality. The best example I've gathered incidentally is "what's this song" regressing between versions to replying with an admonishment to turn off the music or podcast, no matter how quiet it is. Then the one time I tried to set a reminder, Siri confirmed it was set, and it never actually worked.
Considering how many years it's been around, these sorts of inexplicable failures are really inexcusable. Sure, I understand nobody tests their software anymore, but "predictable" is not the term I would use to describe Siri.
It didn't build on the latest Xcode on the Mac Studio. I'm experienced with Cocoa and Apple's APIs, but couldn't fix the problem in 30 minutes of poking around.
This one looked promising, but failed with a runtime assertion and I was unable to figure it out.
At this point I wish Apple spent more effort on making their existing frameworks usable. If the only available sample code doesn't even work on a brand new Mac, the API isn't going to be used by third parties.
I don't want to use CoreML or Python, I wanted to learn about the lower-level implementation using Metal Performance Shaders. And the only available sample code doesn't even run.
This is sadly not a unique situation for Apple's more specialized public frameworks — they're more like semi-private because nobody outside Apple can make them work.
Ah, yeah, and that's an improvement over the historical state of their ML support. For inference the M1 is a nice platform (fastish and power efficient) but for research and training not fast or well supported enough to justify not using NVIDIA.
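For what it's worth, inference on Apple Silicon from Python is reasonably painless these days via PyTorch's Metal ("mps") backend. A minimal sketch, with a toy placeholder model rather than anything real:

```python
# Minimal sketch: PyTorch inference on Apple Silicon via the Metal ("mps")
# backend. The toy model here is a placeholder, not a real workload.
import torch
import torch.nn as nn

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).to(device)
model.eval()

x = torch.randn(1, 128, device=device)
with torch.no_grad():
    logits = model(x)
print(logits.cpu())
```

Training runs through the same backend too, but op coverage and speed still lag CUDA, which matches the comparison above.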
I wonder how approachable it would be to optimise a custom model for ANE. According to the code examples at the bottom, the current implementation seems to be a custom model, so no generic solution.
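For the Core ML route, a rough sketch of how a custom model typically gets there: trace it, convert it with coremltools, and let the runtime schedule it on the Neural Engine where it can. The tiny model below is a placeholder, and whether layers actually land on the ANE depends on which ops you use (which seems to be what the article's custom implementation is tuning for):

```python
# Sketch: convert a traced PyTorch model to Core ML and allow the runtime to
# schedule it on the Neural Engine. The tiny conv model is a placeholder;
# whether layers actually run on the ANE depends on the ops involved.
import torch
import torch.nn as nn
import coremltools as ct

model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()).eval()
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="image", shape=example.shape)],
    convert_to="mlprogram",            # newer ML Program format
    compute_units=ct.ComputeUnit.ALL,  # CPU, GPU and Neural Engine all allowed
)
mlmodel.save("TinyConv.mlpackage")
```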
Anyway, it seems that we are at the dawn of deploying more cool models, which formerly required cloud computation, into the hands of users. Really cool!
Are we going to see more federated learning being pushed to user devices or is it a dead branch only useful for a few use cases?
That was the idea behind ONNX, which has been somewhat successful.
But if you’ve ever had to deploy these models, you’ll know it’s never as simple as promised. Different libraries use different ops, have weird export requirements or even noticeable performance issues.
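To make that concrete, here is a minimal sketch of the export-and-verify round trip where those mismatches usually surface. The toy model, file name, and opset version are arbitrary placeholders:

```python
# Sketch of the PyTorch -> ONNX -> onnxruntime round trip where op and opset
# mismatches usually surface. Model, names and opset version are placeholders.
import numpy as np
import torch
import torch.nn as nn
import onnxruntime as ort

model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4)).eval()
dummy = torch.randn(1, 16)

torch.onnx.export(
    model, dummy, "toy.onnx",
    input_names=["input"], output_names=["output"],
    opset_version=13,   # unsupported ops typically fail right here, at export
)

# Run the exported graph and compare against the original framework's output.
sess = ort.InferenceSession("toy.onnx", providers=["CPUExecutionProvider"])
onnx_out = sess.run(None, {"input": dummy.numpy()})[0]
torch_out = model(dummy).detach().numpy()
print("max abs diff:", float(np.abs(onnx_out - torch_out).max()))
```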
I would have been more surprised to find out there was. Simplicity rarely goes along with ML/AI libraries.
Any abstraction layer will add overhead and probably present a lowest common denominator interface. The audience willing to accept those costs is small.
> Any abstraction layer will add overhead and probably present a lowest common denominator interface. The audience willing to accept those costs is small.
Some examples:
- On-device semantic image search (see the sketch after this list)
- Text highlighting (images and videos)
- FaceID
- Computational photography, depth mapping (sensor fusion)
- Background removal (iOS 16)
- Activity detection (Apple Watch)
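To illustrate the first item: a generic sketch of what "semantic image search" boils down to (not Apple's actual implementation). Some on-device encoder maps photos and the text query into a shared embedding space, and the search is just a nearest-neighbour ranking. The embeddings below are random placeholders standing in for real encoder outputs:

```python
# Generic sketch of what "semantic image search" boils down to (not Apple's
# implementation): photos and the text query live in a shared embedding space
# and results are ranked by cosine similarity. Embeddings are random
# placeholders standing in for real on-device encoder outputs.
import numpy as np

rng = np.random.default_rng(0)
image_embeddings = rng.normal(size=(1000, 512))  # one 512-d vector per photo
query_embedding = rng.normal(size=512)           # e.g. the text "dog at the beach"

def cosine_sim(matrix, vector):
    return matrix @ vector / (np.linalg.norm(matrix, axis=-1) * np.linalg.norm(vector))

scores = cosine_sim(image_embeddings, query_embedding)
top10 = np.argsort(scores)[::-1][:10]  # indices of the best-matching photos
print(top10)
```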