Computers that we talk to, and which talk back to us, have always tickled the imaginations of science fiction, TV and film writers. From HAL 9000 to Johnny 5, Holly to Deep Thought, squarely at the forefront sits the idea that once computers became central to the human world they would develop into a form of Artificial Intelligence (A.I.) with which we could converse.
Unfortunately, science fiction writers tend to transport us forward to a time when A.I. is so advanced that discovering the answer to life, the universe and everything is possible, or to a scenario in which computers kill humans to prevent their programs from being shut down. In doing so, they skip over the stage where frustrated housewives scream into handheld telephones that they want to call Shirley, not Shelley; where household speakers form a spooky Greek chorus, laughing maniacally for no reason; and where news reports cover the story of a little girl using an Amazon Echo Dot to order a doll's house, thereby causing TV viewers' devices to do the same thing.
"There's a good chance I get murdered tonight," one Twitter user wrote after his Alexa let out an unprovoked, eerie laugh. Amazon says it is working on fixing the inexplicable cackles, but people are still disturbed. (CBS News, @CBSNews, March 8, 2018)
Although most people don't think of Amazon Alexa, Google Home, Apple's Siri and the like as Artificial Intelligence, the reality is that the domestication of A.I. has arrived in the form of voice-operated e-commerce. Amazon Alexa and Google Home have crossed the threshold between consumer electronics and the brave new world of A.I. that was once the realm of the likes of Arthur C. Clarke and Douglas Adams. And, as most readers of Mondato Insight will doubtless know, payments are the grease that makes that sort of transition possible: unless cash-on-delivery is a common payment method, being able to order an item without being able to pay for it isn't terribly useful.
And while in-home systems like Alexa and Google Home offer a great deal of convenience for consumers who make regular or repeat purchases from Amazon or Google, their weaknesses and limitations are significant and obvious. Unless a PIN code is activated on the device, anyone can make purchases simply by saying the device's trigger word (as the much-too-cute example of the little girl in Texas buying 4lb of butter cookies and a $170 doll's house demonstrates). And even if the security measure of a spoken PIN is activated, it is not the most practical option, as saying a PIN aloud is just about the least secure way imaginable of authenticating a transaction (as an 'Alexa Challenge' video submission from Discover inadvertently illustrates).
Despite these weaknesses, and the limited number of use cases currently available, some analyst projections, notably Business Insider's, forecast that within five years almost one in three American adults will have adopted voice payments. This seems optimistic, to say the least, although Business Insider appears to take a very expansive view of "adoption", using it to mean "has ever used". Nevertheless, numbers like these would have to come as part of a larger wave of adoption of other A.I. and Internet-of-Things technology that can, perhaps, only just be seen on the horizon.
Image courtesy of Business Insider Intelligence
Loose Lips Save Time
The value proposition offered by voice commands is obvious: they allow for multi-tasking and save time. But the constraints of current technologies are also fairly clear in the relative lack of security and the clunky UX currently on offer. Even so, machine learning and A.I. are advancing rapidly, with every "Hey Siri" request potentially helping to hone the computer's ability to recognize its owner's voice, and theirs alone.
Indeed, a small number of companies, such as UK 'neo-bank' Atom, have gone live with voice recognition as one of their authentication methods, alongside other biometrics. Atom pairs voice with facial recognition, though it does not use fingerprint authentication. Anecdotal feedback suggests that Atom's voice recognition has a higher success rate than its facial biometrics, though neither can be relied upon without a PIN to fall back on. The technology nevertheless remains flawed, as the BBC demonstrated last year when a reporter's twin brother was able to access his account information using voice authentication.
Whatever its current problems, Uber was a trailblazer in the mobile payments space: it demonstrated what could be achieved once the friction of paying was removed and payment became something that happened rather than something that was done. In the same way, "voice payments" are likely to fade as a distinct concept as they become more commonplace.
Once a user's voice and identity can be securely and reliably authenticated, and their payment method securely stored, what we will have is not "voice payments" but, more accurately, "voice-initiated shopping" or "voice banking": a process during which payment happens invisibly and frictionlessly. "Voice payments" will likely be reserved for the small subset of banking that involves paying bills or making P2P transfers (or perhaps mobile top-ups in some markets).
Within the mobile app ecosystem, the hollowness of "voice payments" is even more brutally visible. Yes, customers can ask Siri or Bixby to send $10 to John, but in reality the process is likely to involve at least one, and perhaps two, physical authentication steps on the phone, by fingerprint or PIN. This demonstrates that even if the device has learned to recognize its owner's voice uniquely, the process is still stuck in a Twilight Zone between "voice" and merely "hands-free". Either way, it is not obvious under what circumstances such a process would be useful, or notably faster than simply asking Siri or Bixby to open Venmo. It also removes the "social" aspect of apps like Venmo, which for some users was their main attraction in the first place.
Multimedia, But Voice Activated
So while Business Insider's highly optimistic forecasts are predicated on major advances in two A.I. sub-fields, machine learning (computers learning from data) and natural language processing (understanding natural human language and speech), it seems unlikely that a wide range of "voice-only" use cases will spring up unless the security problem is solved. This is one of the main reasons why voice-activated A.I. seems likely, in many cases, to develop as one component of multimedia A.I.
As yet, devices such as the Amazon Echo Show have not set the world on fire (perhaps because "Amazon Echo with a screen" doesn't really ignite the imagination), but they are likely to prove a more accurate forerunner of voice-activated commerce than small speakers that play music, take orders for dog food delivery and spontaneously burst into laughter. Even in private cars, the ultimate example of "dead time" and once perhaps seen as the perfect candidate for voice-activated A.I., the very real prospect of a driverless world means that the morning commute of the near future is more likely to involve screens than voice commands.
In truth, we are a long way from "voice payments", and as an industry we need to be clearer about the terminology we use. "Voice payments" does indeed sound like something that could have featured in 2001: A Space Odyssey, but even if all we mean is the invisible, frictionless processing of payment for a voice-initiated order, we still have a long way to go. In the meantime, it might be wise to stay on the safe side and make sure your phone can't see your lips if you ever decide to delete your "voice payment" app.