The typical audio environment for HomePod has many challenges — echo, reverberation, and noise. Unlike Siri on iPhone, which operates close to the user’s mouth, Siri on HomePod must work well in a far-field setting. Users want to invoke Siri from many locations, like the couch or the kitchen, without regard to where HomePod sits.
This is a detailed and fascinating look at how Apple uses machine learning to get your HomePod to recognize that Siri trigger phrase (which I would love to be able to change someday).
Yes, there’s a lot of detail, but even if you skip through the dense stuff, there are some interesting nuggets, like:
The corruption of target speech by other interfering talkers is challenging for both speech enhancement and recognition.
When “Hey Siri” is detected, each stream is assigned a goodness score. The stream with the highest score is selected and sent to Siri for speech recognition and task completion.
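The stream-selection step Apple describes can be sketched in a few lines. This is purely illustrative, not Apple's actual implementation: the stream names, score values, and the `goodness` field are assumptions standing in for whatever internal representation HomePod uses.

```python
# Hypothetical sketch of the stream-selection step: when the trigger
# phrase is detected, each processed audio stream carries a "goodness"
# score, and the highest-scoring stream is forwarded to Siri.
# All names and numbers below are illustrative, not Apple's API.

def select_best_stream(streams):
    """Return the stream with the highest goodness score."""
    return max(streams, key=lambda s: s["goodness"])

streams = [
    {"id": "beam-0", "goodness": 0.41},
    {"id": "beam-1", "goodness": 0.87},  # clearest trigger capture
    {"id": "beam-2", "goodness": 0.63},
]

best = select_best_stream(streams)
print(best["id"])  # → beam-1
```

The point of the score is that only one stream's audio needs to be sent on for full speech recognition, rather than every beamformed signal.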
And on testing HomePod:
We evaluated the performance of the proposed speech processing system on a large speech test set recorded on HomePod in several acoustic conditions:
- Music and podcast playback at different levels
- Continuous background noise, including babble and rain noise
- Directional noises generated by household appliances such as a vacuum cleaner, hairdryer, and microwave
- Interference from external competing sources of speech
Bottom line: HomePod is ever-vigilant, constantly solving an incredibly difficult problem in real time, and doing it really well.