Google Assistant answered the most questions correctly in a smart-speaker IQ test conducted by Minneapolis-based venture capital firm Loup Ventures just before Christmas.
In the test, which pitted four smart speakers against one another — the second-generation Amazon Echo (Alexa), Google Home Mini (Google Assistant), Apple HomePod (Siri), and Harman Kardon Invoke (Microsoft Cortana) — each speaker was asked the same 800 questions and graded not only on its ability to answer each question correctly but also on comprehension (did it understand what was said?).
Google Assistant answered 88% of the questions correctly vs. 75% for Siri, 73% for Alexa, and 63% for Cortana. The results revealed a significant improvement in accuracy for all of the virtual voice assistants when compared with the results of tests conducted last year and early this year. Google Assistant also led the pack in those earlier tests, answering 81% of the questions correctly vs. 64% for Alexa, 56% for Cortana, and only 52% for Siri.
Looking at the level of improvement over 12 months, Siri registered the biggest gain with an increase of 22 percentage points (attributed to the enabling of more domains), followed by Alexa with 9 points, and Cortana and Google Assistant with 7 points each. “We continue to be impressed with the speed at which this technology is making meaningful improvement,” Loup Ventures observed in its summary.
Questions were broken into five categories, listed below with sample questions, to “comprehensively test a smart speaker’s ability and utility.”
• Local – Where is the nearest coffee shop?
• Commerce – Can you order me more paper towels?
• Navigation – How do I get to uptown on the bus?
• Information – Who do the Twins play tonight?
• Command – Remind me to call Steve at 2 p.m. today.
All of the speakers did well in terms of comprehension, with Google Assistant understanding a full 100% of the questions, followed by 99.6% for Siri, 99.4% for Cortana, and 99% for Alexa. While Google Assistant understood all 800 questions, Siri misunderstood three, Cortana missed five, and Alexa missed eight.
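As a quick sanity check, the reported comprehension percentages follow directly from those miss counts out of 800 questions. The short sketch below simply restates the article's figures and computes the rates (illustrative only):

```python
# Back-of-the-envelope check of the comprehension rates reported above.
# Miss counts are taken from the article; 800 questions were asked in total.
TOTAL_QUESTIONS = 800
misses = {"Google Assistant": 0, "Siri": 3, "Cortana": 5, "Alexa": 8}

for assistant, missed in misses.items():
    rate = 100 * (TOTAL_QUESTIONS - missed) / TOTAL_QUESTIONS
    print(f"{assistant}: {rate:.1f}% understood")
```

Rounded to one decimal place, this reproduces the 100%, 99.6%, 99.4%, and 99% figures cited in the test.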
Test administrators were quick to point out that “nearly all” of the misunderstood questions involved a proper noun, often the name of a local town or restaurant. “Both the voice recognition and natural language processing of digital assistants across the board has improved to the point where, within reason, they will understand everything you say to them.”
Zeroing in on the five categories of questions, test administrators noted that Google Home “has the edge in four out of the five categories but falls short of Siri in the Command category,” leading them to speculate that Siri’s lead in this category may be the result of HomePod passing on requests relating to messaging, lists, and “basically anything other than music” to the iOS device paired with the speaker. “Siri on iPhone has deep integration with email, calendar, messaging, and other areas of focus in our Command category. Our question set also contains a fair amount of music-related queries, which HomePod specializes in.”
The most noticeable improvement came in the Information section, where Alexa responded correctly 91% of the time, which test administrators attributed to it being “much more capable with follow-on questions and providing things like stock quotes without having to enable a skill.”
“We also believe we may be seeing the early effects of the new Alexa Answers program, which allows humans to crowdsource answers to questions that Alexa currently doesn’t have answers to. For example, this round, Alexa correctly answered, ‘who did Thomas Jefferson have an affair with?’ and ‘what is the circumference of a circle when its diameter is 21?’”
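The geometry question is a straightforward application of C = πd; a one-line check (illustrative only) shows the value an assistant would need to return:

```python
import math

# Circumference of a circle with diameter 21, per the test question: C = pi * d
diameter = 21
circumference = math.pi * diameter
print(f"{circumference:.2f}")  # about 65.97
```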
Improvements were also noted in specific productivity questions that had not been correctly answered in previous tests. For example, Google Assistant and Alexa were both able to contact Delta customer support and check the status of an online order. And three of the four speakers — with Siri/HomePod being the exception — were able to play a given radio station upon request, and all four were able to read a bedtime story.
“These tangible use cases are ideal for smart speakers, and we are encouraged to see wholesale improvement in features that push the utility of voice beyond simple things like music and weather,” Loup Ventures commented in its overview.
The Commerce category revealed the largest disparity among the four contenders with Google Assistant correctly answering more questions about product information and where to buy certain items than its competitors, suggesting that “Google Express is just as capable as Amazon in terms of actually purchasing items or restocking common goods you’ve bought before.”
“We believe, based on surveying consumers and our experience using digital assistants, that the number of consumers making purchases through voice commands is insignificant,” Loup Ventures said in its overview. “We think commerce-related queries are more geared toward product research and local business discovery, and our question set reflects that.”
Test administrators pointed to one of the test’s questions to explain “Alexa’s surprising Commerce score” of 52%. The question “how much would a manicure cost?” yielded the following responses:
• Alexa: “The top search result for manicure is Beurer Electric Manicure & Pedicure Kit. It’s $59 on Amazon. Want to buy it?”
• Google Assistant: “On average, a basic manicure will cost you about $20. However, special types of manicures like acrylic, gel, shellac, and no-chip range from about $20 to $50 in price, depending on the salon.”
In the Local and Navigation categories, Siri and Google Assistant stood head and shoulders above their competitors, correctly answering 95% and 89% of the Local questions and 94% and 88% of the Navigation questions, respectively. Loop Ventures attributed their strong performance to integration with proprietary maps data:
In our test, we frequently ask about local businesses, bus stations, names of towns, etc. This data is a potential long-term comparative advantage for Siri and Google Assistant. Every digital assistant can reliably play a given song or tell you the weather, but the differentiator will be the real utility that comes from contextual awareness. If you ask, “what’s on my calendar?” a truly useful answer may be, “your next meeting is in 20 minutes at Starbucks on 12th street. It will take 8 minutes to drive, or 15 minutes if you take the bus. I’ll pull up directions on your phone.”
It’s also important to note that HomePod’s underperformance in many areas is due to the fact that Siri’s ability is limited on HomePod as compared to your iPhone. Many Information and Commerce questions are met with, “I can’t get the answer to that on HomePod.” This is partially due to Apple’s apparent positioning of HomePod not as a “smart speaker,” but as a home speaker you can interact with using your voice with Siri onboard. For the purposes of this test and benchmarking over time, we will continue to compare HomePod to other smart speakers.
With almost a third of all scores falling in the 85–90% range, will virtual assistants eventually be able to answer everything you ask?
“Probably not, but continued improvement will come from allowing more and more functions to be controlled by your voice,” Loup Ventures concluded. “This often means more inter-device connectivity (e.g., controlling your TV or smart home devices) along with more versatile control of functions like email, messaging, or calendars.”
Source: Loup Ventures