Tuesday, February 27, 2018 by David Williams
During the last Super Bowl, Amazon had an ad spot for Alexa, its intelligent voice-based personal assistant. During the spot, Alexa’s name – usually the device’s trigger to go from Sleep mode to Wake mode and await a user’s commands – was uttered some 10 times. Surprisingly, these didn’t trigger the Alexa devices that were sitting in viewers’ homes. That might not seem like much, but it’s a huge win in terms of the device’s usability in that particular situation.
Amazon later revealed that Alexa’s ability to ignore the sound cues from the commercial during the broadcast was made possible by something called acoustic fingerprint technology. This is what allowed the Alexa devices in viewers’ homes, listening in during the live broadcast, to distinguish the wake word as spoken in the ad from the same word spoken by actual users.
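Amazon has not published the details of its method here, so the following is only a toy sketch of the general idea behind acoustic fingerprinting, not Amazon’s implementation. It assumes a fingerprint is simply the strongest frequency in each short frame of audio, and that a device ignores anything whose fingerprint closely matches a known ad. The function names, the frame size of 64 samples, and the 0.8 threshold are all illustrative assumptions.

```python
import cmath
import math


def dominant_bins(samples, frame_size=64):
    """Toy fingerprint: the index of the strongest frequency bin in each frame."""
    peaks = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        magnitudes = []
        # Naive DFT over the positive-frequency bins (bin 0, the DC offset, is skipped).
        for k in range(1, frame_size // 2):
            total = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(frame)
            )
            magnitudes.append(abs(total))
        peaks.append(1 + magnitudes.index(max(magnitudes)))
    return peaks


def similarity(fp_a, fp_b):
    """Fraction of frames whose dominant bin agrees between two fingerprints."""
    length = min(len(fp_a), len(fp_b))
    if length == 0:
        return 0.0
    return sum(a == b for a, b in zip(fp_a, fp_b)) / length


def should_ignore(heard, known_fingerprint, threshold=0.8):
    """True if the audio just heard matches a known ad closely enough (assumed threshold)."""
    return similarity(dominant_bins(heard), known_fingerprint) >= threshold


def tone_sequence(bins, frame_size=64, volume=1.0):
    """Synthetic stand-in for real audio: one pure tone per frame, at the given bin."""
    return [
        volume * math.sin(2 * math.pi * k * n / frame_size)
        for k in bins
        for n in range(frame_size)
    ]


# Fingerprint a pretend "ad" once, ahead of the broadcast.
ad_fingerprint = dominant_bins(tone_sequence([3, 7, 5, 9]))

# The same ad played quietly through a TV still matches; a different sound does not.
heard_ad = tone_sequence([3, 7, 5, 9], volume=0.3)
heard_other = tone_sequence([4, 4, 10, 2])
print(should_ignore(heard_ad, ad_fingerprint))
print(should_ignore(heard_other, ad_fingerprint))
```

Because the fingerprint depends on which frequency is strongest rather than how loud it is, the quieter playback still matches. A real system would have to cope with room echo, background noise, and timing offsets, none of which this sketch attempts.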
Of course, as a new report points out, Amazon had plenty of data and resources at its disposal. It was able to plan ahead to make sure the new Super Bowl ad would not wake the Alexa devices its customers already owned. But what if that technology, acoustic fingerprinting, could be taken to the next level and made to work on many other sound signatures?
That’s exactly what a startup called Audio Analytic is evidently doing: it is modeling sounds beyond voice and music. So far, the company is said to have encoded the sounds of a baby’s cry, a glass window being broken, a dog’s bark, and a smoke alarm going off. It has also developed a method to spot small anomalies in ambient noise, and it is licensing its software products to various consumer electronics companies.
According to Chris Mitchell, the company’s chief executive officer (CEO), who holds a dual degree in music technology and electrical engineering from Anglia Ruskin University, the task is far more complicated than what Amazon had to do for its Super Bowl commercial. “With speech, and in particular wake words, the sound you make is constrained by the words and the broader rules of the language,” he explained. “There are a limited number of phonemes that humans can produce, and our collective knowledge here is considerable.”
In other words, audio fingerprint technology might be good enough for specific wake words, but other sounds and noises require a much more robust system. Mitchell’s company relies on a deep learning system to analyze sounds and encode them as ideophones, the representation of sound in speech. As a result, it could potentially give voice-based personal assistants the ability to listen for a variety of background noises and act accordingly, depending on the situation.
In an ideal scenario, that would be a wonderful feature to have on a device like Alexa. However, it does have the potential to be used for nefarious purposes. As one example, Alexa could listen in for certain types of background noise unhindered, even while it sleeps. That would, of course, violate the privacy of its users, who might be under the impression that the device only works once they utter the designated and supposedly necessary wake words.
The silver lining in all of this is that it may be a while before the technology that would allow such a situation exists in the real world. But knowing how quickly new technology can be developed, it’s always good to stay one step ahead to protect your own personal safety and privacy.
Stay alert and follow the latest news on privacy with PrivacyWatch.news.