
Voice control in OTT

Alex Wilkinson
 September 2017

“Siri, what will the weather be like tomorrow?”
“Tomorrow in London, it will be 21 degrees and sunny.”

That first conversation with Siri was quite astounding. The year was 2011, and the promise of natural conversation with a digital assistant was real. No more pinch and zoom; that was so 2010!

However, initial usage was disappointing. Perhaps the peculiarity of talking to a machine was not lost on some people. Or, as with early smartphones, the technology needed to catch up with the big idea. In 2011, Automatic Speech Recognition (ASR) was just above 70% accurate, so almost one in three voice searches couldn’t be properly recognised. Rapid advances in the technology since then mean that around 95% of searches are recognised today.

Jennifer, Alison, Phillipa, Sue

Advances in both recognition and processing technology are paving the way for all the big tech companies to release consumer ‘voice-first’ devices. There’s a growing school of thought that screens, long the primary interface between human and computer, will become a secondary method of interaction. We are now entering an era of always-on Bluetooth speakers that listen for and respond to questions and commands: “Alexa, what is the most populous city in China?” or “Google, remind me to call John tonight.”

Amazon Alexa is now available in the UK, Germany and the US, while Google Home currently supports the UK and the US, with Canada, Australia, Japan, France and Germany coming soon. Samsung’s Bixby and Microsoft’s Cortana are mobile-only for now, but we can probably expect voice-first devices from both soon. Apple’s HomePod will be released before Christmas this year, and although it’s late to the party in terms of stand-alone hardware, it does have a couple of tricks up its sleeve. Firstly, Siri has learnt over 20 languages since 2011, many with regional variations, which could allow for a global launch of HomePod. Secondly, HomeKit, Apple’s smart home software for third-party devices, comes with plenty of integrations with well-known devices and services. Whilst we are perhaps not quite at the ‘tipping point’ yet, it’s clear we’re in the early days of voice control’s commercialisation.

Where IoT meets OTT

Whilst connected devices and appliances are a natural first step for voice control in the home (switching lights on, turning the heating up, annoying your family by remotely changing the colours of the lights in the house and so on), media and entertainment has been neglected until now. In recent months, Amazon Alexa and Google Home have launched the ability to control Fire TV, Chromecast and third-party devices using voice commands. Amazon and Google both offer development kits that let third-party device manufacturers and applications control the video experience with voice. Google Home supports just Google Play, Netflix and YouTube for Chromecast, but is of course well positioned to capitalise on its Android TV market. Amazon currently supports only its own native collections on Fire TV, but Netgem/EE has integrated Alexa into its hardware in the UK, with YouView and, in the US, Dish set to launch it on their set-top boxes (STBs) in the coming months.

So whether you integrate a voice API into your existing Chromecast or Fire TV apps or offer voice as a stand-alone service on your own hardware, with so many devices on the market the benefits of early integration and adoption are clear.

The Future

Voice control is quicker and more user-friendly than other input methods and, thankfully, is becoming more accurate, to the point that we can leverage it as an entry point to AI-driven services. Together with IBM Watson, we have created “Aria”, a cognitive discovery chatbot prototype that helps users discover not only more of the video content they love, but also new content they may not have known was available to them. All of it runs in the cloud, leveraging Accedo One and the IBM Watson ecosystem (Video Enrichment, Speech to Text, Natural Language Understanding and Tone Analyzer). Come and experience it at our booth at IBC (Hall 14, Stand E14)!

Other advances in recognising our unique “voice-prints” promise to identify individual users, making voice control more secure and better able to understand the most likely context of a request. For example, “Play the game from last night” would obviously be a request referring to Swansea City’s latest win, and “When the new series of Narcos is available, download the first episode to my phone” means that it will be my phone, rather than my daughter’s, that has the latest cartel-inspired bloodbath downloaded to it.
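In practice, this kind of voice-print recognition is usually framed as speaker identification: turn the incoming utterance into a fixed-length vector, compare it with each enrolled household member’s voice-print, and accept the closest match only if it is similar enough. The sketch below is a minimal illustration under stated assumptions, not how any particular assistant does it. It assumes a speaker-embedding model (not shown) has already produced the vectors; the 4-dimensional toy voice-prints, the user names, the device names and the 0.75 threshold are all hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

VoicePrint = List[float]

# Hypothetical enrolled voice-prints. A real system would get these from a
# speaker-embedding model; these toy 4-dimensional vectors are made up.
PROFILES: Dict[str, VoicePrint] = {
    "parent": [0.9, 0.1, 0.3, 0.2],
    "daughter": [0.1, 0.8, 0.2, 0.6],
}

# Hypothetical mapping from each identified user to their own phone.
DEVICES: Dict[str, str] = {"parent": "parent-phone", "daughter": "daughter-phone"}


def cosine_similarity(a: VoicePrint, b: VoicePrint) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_speaker(utterance: VoicePrint,
                     profiles: Dict[str, VoicePrint],
                     threshold: float = 0.75) -> Optional[str]:
    """Return the best-matching enrolled user, or None if nobody clears the threshold."""
    best_user: Optional[str] = None
    best_score = threshold
    for user, voice_print in profiles.items():
        score = cosine_similarity(utterance, voice_print)
        if score >= best_score:
            best_user, best_score = user, score
    return best_user


def route_download(utterance: VoicePrint,
                   profiles: Dict[str, VoicePrint],
                   title: str) -> Optional[Tuple[str, str]]:
    """Pick the speaker's own device for a 'download to my phone' request."""
    user = identify_speaker(utterance, profiles)
    if user is None:
        return None  # unrecognised voice: a real service would ask who is speaking
    return (DEVICES[user], title)


if __name__ == "__main__":
    request = [0.85, 0.15, 0.3, 0.25]  # toy utterance embedding close to "parent"
    print(route_download(request, PROFILES, "Narcos S3E1"))
```

Returning `None` for an unrecognised voice, rather than falling back to the nearest profile, is the design choice that makes this useful for security: a stranger’s voice should not be routed to anyone’s device.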

The proliferation of low-cost streaming devices over the past few years, in parallel with the availability of world-class OTT services, has brought a resurgence of the 10ft, big-screen experience. The software has improved dramatically, content production is better than ever and the audience has matured… and become so much more demanding. Perhaps the only thing that hasn’t changed in the new dawn of TV is the remote control… click, click, click.

Voice control, with its Zero UI paradigm, coupled with stripped-back visual design and data-driven, intelligent, contextual recommendations, is the next seismic shift in TV.

Accedo is working with companies and services to create next-generation TV experiences, and we are pretty excited about this area. Come and visit us at IBC (Hall 14, Stand E14) to experience the future of TV. You can book a meeting here.
