Google’s solution to figuring out who is talking

Google Research Blog:

People are remarkably good at focusing their attention on a particular person in a noisy environment, mentally “muting” all other voices and sounds. Known as the cocktail party effect, this capability comes natural to us humans. However, automatic speech separation — separating an audio signal into its individual speech sources — while a well-studied problem, remains a significant challenge for computers.

This is a major hurdle for smart speakers like HomePod and Google Home. While this post focuses on the cocktail party problem (separating individual voices when multiple people are speaking), that problem is part of a larger one: identifying an individual speaker’s voice.

Consider HomePod. If Siri on HomePod knew who was speaking, she could tailor her response to that person. If I asked Siri to send a text, she could look up the recipient in my contacts database, but if my wife asked, Siri could use her contacts database instead.

Google Home already solves the speaker identification problem, and Google is well on its way to solving the cocktail party problem as well.

Imagine a day when hearing aids can distinguish speakers, boost the signal on a voice-by-voice basis, and let you know who said what, perhaps with the aid of your iOS device.

If this interests you, there’s a series of videos embedded in the Google blog post that shows the current cocktail party tech in action.