When an unknown illustrator was asked in 1924 how he imagined the technology of the future, the image below was his answer:
While some of what is depicted in this image, such as flying cars, is not yet a reality, video calling on mobile devices certainly is. The image also seems to predict the social isolation that has become commonplace when interacting with technology.
Years later, in 2013, Spike Jonze's movie "Her" envisioned the future of virtual voice assistants. The movie revolves around a love story between a human and his voice assistant (Samantha) and explores the factors that shape any relationship: trust, knowledge, distrust, and lack of communication. All of these factors should be taken into account when designing voice applications.
As the technology behind these features continues to improve, it is certain that we will see consumers using voice for more applications than ever before. It is likely to become the norm for many daily interactions and customers will come to expect that video services can be navigated in this way. Accedo’s focus is on creating engaging video user experiences, so it is not surprising that voice-led solutions are high on our priority list.
You may have seen our announcement last year about the work we did with Channel 4 for a Google Home deployment. The implementation meant that viewers could launch and control their viewing experience by giving voice commands to Google Assistant.
One thing we learned during that project is how important it is to ensure that the experience is as natural and simple as possible. There is nothing worse than having to repeat yourself multiple times in order to access the video you want.
We have also been busy behind the scenes researching what makes a good voice-based video experience. This started with desk research but was quickly followed by a qualitative field study to better understand how people interact, and want to interact, with voice assistants in the context of video experiences.
We conducted interviews and user testing, and ended with a diary study to analyse the language patterns used during interactions. Following the interview stage, we ran user testing on a voice proof of concept; via a think-aloud protocol, the participants interacted with the app and its voice capabilities to give us qualitative feedback. Finally, over the course of a week, the same participants made a log entry every time they interacted with a video app: they thought about an action they wanted to perform, then recorded the voice command and sent it to us as an audio message.
Our findings revealed a few different formats for requests, and we noted that interactions were often given with additional context to support the participant’s command:
Orienting – “I am done searching. Just take me back to …” – Users tried to be specific with their commands, explaining what they were currently doing and where they wanted to navigate next.
Either/Or – “How are … doing or what are the … scores” – Users provided alternative versions of the same question to accommodate assumptions on the machine’s limits of understanding.
Extra Info – “When is … on tonight and where can I watch it?” – Users expected the assistant to understand multiple commands at once.
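The three request formats above can be detected with simple heuristics before routing a command to an assistant's intent handling. Below is a minimal, hypothetical sketch in Python; the pattern names, regexes, and example utterances are illustrative assumptions, not part of any Accedo implementation or production NLU model.

```python
import re

def classify_request(utterance: str) -> str:
    """Roughly bucket a voice request into one of the observed formats.

    This is a heuristic sketch: real voice platforms would use trained
    intent models rather than regular expressions.
    """
    text = utterance.lower().strip()

    # "Extra info": two commands joined in one request,
    # e.g. "... tonight and where can I watch it?"
    if re.search(r"\band\b\s+(where|when|what|how|who)\b", text):
        return "extra_info"

    # "Either/Or": the user offers an alternative phrasing of the same
    # question, anticipating the machine's limits of understanding.
    if re.search(r"\bor\b\s+(what|how|who|when|where)\b", text):
        return "either_or"

    # "Orienting": the user states their current context before the
    # command, e.g. "I am done searching. Just take me back to ..."
    if re.search(r"^(i am|i'm|i have|i've)\b", text) or "take me back" in text:
        return "orienting"

    return "direct"

# Illustrative utterances (invented for this sketch):
print(classify_request("I am done searching. Just take me back to the home screen"))
print(classify_request("How are they doing or what are the latest scores"))
print(classify_request("When is the match on tonight and where can I watch it?"))
```

Even a coarse classification like this could let a voice app respond differently: an "orienting" request suggests a navigation action, while an "extra info" request signals that two answers are expected in one turn.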
As competition increases in the video industry, providers are looking for new ways to increase engagement and improve viewer retention. Making it easier and quicker for consumers to find the content they want is one effective way of doing that. I certainly think voice interaction is a game-changer that will likely be included to some degree in all video services over the coming few years.
At Accedo, we are defining, testing, and improving what we think a good design process will look like for Voice. We are also identifying the most meaningful use cases in the video experience and we look forward to continuing to innovate in this space.