Apple talks HomePod, machine learning, and recognizing that Siri trigger phrase

Apple:

The typical audio environment for HomePod has many challenges — echo, reverberation, and noise. Unlike Siri on iPhone, which operates close to the user’s mouth, Siri on HomePod must work well in a far-field setting. Users want to invoke Siri from many locations, like the couch or the kitchen, without regard to where HomePod sits.

This is a detailed and fascinating look at how Apple uses machine learning to get your HomePod to recognize that Siri trigger phrase (which I would love to be able to change someday).

Yes, there’s a lot of detail, but if you skim past the dense stuff, you’ll find some interesting nuggets, like:

The corruption of target speech by other interfering talkers is challenging for both speech enhancement and recognition.

And:

When “Hey Siri” is detected, each stream is assigned a goodness score. The stream with the highest score is selected and sent to Siri for speech recognition and task completion.
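To make that selection step concrete, here is a minimal sketch in Python. It is not Apple's implementation: the excerpt doesn't say how the goodness score is computed, so the scores below are made-up numbers, and the names Stream, goodness, and select_stream are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    """One candidate audio stream (hypothetical structure, not Apple's)."""
    name: str
    goodness: float  # assumed to be computed elsewhere; higher is better


def select_stream(streams):
    """Return the stream with the highest goodness score."""
    if not streams:
        raise ValueError("no candidate streams")
    # max() returns the first stream with the top score if there is a tie.
    return max(streams, key=lambda s: s.goodness)


# Made-up scores, purely for illustration.
candidates = [
    Stream("stream_0", 0.42),
    Stream("stream_1", 0.87),
    Stream("stream_2", 0.63),
]

best = select_stream(candidates)
print(best.name)  # the stream that would be passed on for recognition
```

The interesting part, of course, is how the score itself gets computed, and for that you'll need the dense stuff in Apple's article.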

And on testing HomePod:

We evaluated the performance of the proposed speech processing system on a large speech test set recorded on HomePod in several acoustic conditions:

Music and podcast playback at different levels

Continuous background noise, including babble and rain noise

Directional noises generated by household appliances such as a vacuum cleaner, hairdryer, and microwave

Interference from external competing sources of speech

Bottom line: HomePod is ever-vigilant, constantly solving an incredibly difficult problem in real time, and doing it really well.