Amazon Advances Conversational Applications


Humans are innately able to speak and communicate with each other in natural language. After all, we weren’t born with screens to swipe or keyboards to type. Our ears and eyes are primed to take external information in spoken and written form and process it through a portion of our brain optimized for language. Simply put, we were born to communicate, and it therefore makes complete sense that we want our machines to communicate with us in the most optimal way that we communicate with other humans.

The conversational interaction pattern is one of the seven core patterns for Artificial Intelligence projects. Common implementations of this pattern include chatbots, voice assistants, natural language understanding tools, natural language generation systems, sentiment and mood analysis based on conversational interaction, content summarization, content intelligence, and even gesture and handwriting analysis, which are themselves forms of human communication.

The biggest challenge in conversational interaction is not just converting the audio waveforms of spoken speech into text, or assembling individual words into an understanding of a sentence, but taking those spoken or written words and deriving some machine understanding of the communicator's intent. Even more so, the challenge is to take multiple conversational interactions and connect them together in a cohesive manner. While most chatbots can respond to individual phrases or sentences, the bigger challenge is properly understanding and processing longer back-and-forth, "multi-turn" conversations.
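The intent-understanding step described above can be sketched in miniature. The function name, intent labels, and keyword rules below are all invented for illustration; real assistants use trained natural language understanding models rather than keyword matching, but the input/output shape (an utterance in, an intent plus slot values out) is similar.

```python
# Illustrative-only intent parser. The intent names and rules are
# hypothetical; a real NLU system learns these mappings from data.

def parse_utterance(text):
    """Map a transcribed utterance to an (intent, slots) pair."""
    words = text.lower().split()
    if "weather" in words:
        return ("GetWeather", {})
    if "book" in words or "reserve" in words:
        # A trained model would also extract slot values, e.g. the
        # restaurant name or party size, from the utterance.
        return ("BookRestaurant", {"query": text})
    return ("Unknown", {})

print(parse_utterance("What's the weather today?"))
```

Even this toy version shows why single-turn understanding is the "easy" part: each utterance is handled in isolation, with no memory of what came before.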

The main challenge with multi-turn conversations is that while a human can easily maintain a mental "thread" of the conversation across many back-and-forth interactions, it's much harder for a machine. How can the machine know that something spoken earlier in the conversation relates to something farther along? How can a machine know what the goal or objective of the conversation is, especially with digressions, irrelevant side chatter, or vague or incomplete sentences that leave the other party to fill in the gaps? Humans are quite good at handling incomplete information and using prior knowledge to fill in the gaps, but machines are not. This is why more complicated forms of conversational applications are still to come, and why interactions with voice assistants in particular often hit roadblocks when dealing with multi-turn, multi-step interactions.
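One common way to approximate that conversational "thread" is dialog state tracking: the system accumulates slot values across turns so that later, terser utterances inherit context from earlier ones. The class and slot names below are a minimal illustrative sketch, not any real Alexa API.

```python
# Toy sketch of multi-turn dialog state tracking. Names and slots are
# hypothetical; real dialog managers are far more sophisticated.

class DialogState:
    """Accumulates slot values across turns so later turns can stay terse."""

    def __init__(self):
        self.slots = {}

    def update(self, parsed_turn):
        # Newer values override older ones; unmentioned slots carry over,
        # which is how a follow-up turn can omit details already stated.
        self.slots.update(parsed_turn)
        return dict(self.slots)


state = DialogState()
# Turn 1: "Find me an Italian restaurant downtown."
state.update({"cuisine": "italian", "area": "downtown"})
# Turn 2: "Actually, make it Thai." -- one slot changes; the rest persist.
final = state.update({"cuisine": "thai"})
print(final)  # {'cuisine': 'thai', 'area': 'downtown'}
```

The hard research problems begin where this sketch ends: deciding which earlier context is still relevant, handling digressions, and recovering when the user's second turn contradicts or only partially revises the first.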

Amazon’s Conversations announcement at Amazon Re:MARS

Amazon’s Alexa is one of the most widely known and recognized voice assistants on the market, though curiously not the most widely deployed if you count phone- and laptop-based voice assistants. However, as a standalone “smart speaker” (a poor term for what these devices truly are), Alexa is the most widely deployed of all standalone devices, with over 100 million units shipped. It is no surprise that to enable more complicated ecommerce, search, or other applications, multi-turn conversation becomes a must. Amazon’s Alexa needs this capability not only within a single voice application, called a “skill” on the Amazon platform, but also across multiple skills that together would complete a transaction or interaction.

Introduced at the Amazon Re:MARS 2019 event in Las Vegas this past week, Amazon is using a combination of recurrent neural networks (RNNs) and other aspects of machine learning and conversational technology to enable Alexa skills developers to build multi-turn, multi-skill interactions. Rather than requiring users to query multiple skills and engage in many, redundant interactions, Alexa Conversations enables a conversational thread across multiple intent and skill interactions to tie together those threads into a single coherent conversation.
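The cross-skill threading described above can be pictured as a shared conversation state passed through each skill in turn, so the user never has to repeat details. The skill functions and hand-off shape below are invented purely for illustration and do not reflect Amazon's actual implementation.

```python
# Hypothetical sketch of chaining skills under one conversation thread.
# Skill names and the state format are invented for illustration.

def movie_skill(state):
    state["movie"] = "8:00pm showing"
    return state

def dinner_skill(state):
    # The dinner skill can see the movie time and book around it,
    # instead of asking the user to restate it.
    state["dinner"] = f"reservation before the {state['movie']}"
    return state

def ride_skill(state):
    state["ride"] = "car requested for both stops"
    return state

# One shared state object threads context through every skill.
state = {}
for skill in (movie_skill, dinner_skill, ride_skill):
    state = skill(state)
print(state["dinner"])  # reservation before the 8:00pm showing
```

In a design like this, each skill reads from and writes to the shared thread, which is what lets the overall interaction collapse from dozens of redundant turns into a handful.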

At the conference, Amazon VP and Alexa Head Scientist Rohit Prasad demonstrated what planning a night out could look like with the Alexa Conversations capability. Rather than requiring over 40 conversational turns for a customer to plan a movie, dinner, and rideshare, the conversation was reduced to just 13 interactions as the system maintained the conversational thread across those interactions.

While it has previously been possible for developers to create multi-turn conversations with Alexa, Alexa Conversations greatly reduces the complexity and also enables interaction across the almost 100,000 Alexa skills currently available. Alexa Conversations is open for registration, but the customer experience will not be available until later this year, according to Amazon sources.

Amazon is not alone in pushing for richer multi-turn, multi-application voice interaction. Microsoft is pursuing a similar approach for its Cortana service as a result of its acquisition of Semantic Machines, promising a new conversational engine to be released by the end of 2019. Likewise, Google continues to develop its Continued Conversation capability, available on the Google Assistant and Google Home devices. Clearly the race is on to make conversation with devices more natural, less complicated, and more like the sort of conversations humans have with each other every day.

Published at Wed, 12 Jun 2019 21:45:00 +0000