Where is voice tech going?

2020 has been anything but normal. For businesses and brands. For innovation. For people.

The trajectories of business growth strategies, travel plans and lives have been drastically altered by the COVID-19 pandemic, a global economic downturn with supply chain and market issues, and a fight for equality in the Black Lives Matter movement, on top of everything that already complicated lives and businesses.

One of the biggest stories in emerging technology is the growth of different types of voice assistants:

  • Niche assistants such as Aider that provide back-office support.
  • Branded in-house assistants such as those offered by BBC and Snapchat.
  • White-label solutions such as Houndify that provide lots of capabilities and configurable tool sets.

With so many assistants proliferating globally, voice will become a commodity like a website or an app. And that’s not a bad thing, at least in the name of progress. Soon (read: over the next couple of years), offering voice as an interaction channel will become table stakes for any business that wants to deliver the lovable experience users expect. Consider that feeling you get when you realize a business doesn’t have a website: It makes you question its validity and reputation for quality. Voice isn’t quite there yet, but it’s moving in that direction.

Voice assistant adoption and usage are still on the rise

Adoption is key for any new technology. Distribution is often the main inhibitor, but this has not been the case with voice. Apple, Google, and Baidu have reported hundreds of millions of devices using voice, and Amazon has 200 million users. Amazon has a slightly more difficult job since it is not in the smartphone market, the channel that gives Apple and Google greater voice assistant distribution.

Image Credits: Mark Persaud

But are people using these devices? Google said recently there are 500 million monthly active users of Google Assistant. Not far behind are active Apple users with 375 million. Large numbers of people are using voice assistants, not just owning them. That’s a sign of technology gaining momentum: the technology is at a price point and within digital and personal ecosystems that make it right for user adoption. The pandemic has only accelerated usage, as Edison reported for the period between March and April, a peak time for sheltering in place across the U.S.

When we look at the adoption cycle, voice is evolving in different stages. Measured by monthly active users, we are still in the early stages of voice’s overall adoption lifecycle with devices such as smartwatches. But voice use on smartphones has penetrated half the U.S. population. Voice search is mature, with two-thirds of the U.S. population using it because they’re comfortable with it. As with most technologies, change happens unevenly. “Voice first” doesn’t mean everyone is using voice the same way; rather, people use it in a breadth of ways, which speaks to its applicability across contexts.

Voice is global

It’s all too easy to think of voice just in the context of the U.S. market, but voice is a global phenomenon. China accounts for 30%-40% of smart speaker sales, and the rate of total installed base is catching up. The digital context for using voice is different in China, though: it’s usually tied to a super app’s ecosystem.

Regional differences become even more striking when you examine the different assistants catching on globally. The big voice assistants such as Alexa, Cortana, Google Assistant and Siri do not speak for the world.

This is a global movement in technology adoption and consumer behavior, which makes it exceedingly exciting for businesses around the world to be involved with and to continue exploring.

Voice design and sonic branding are becoming more prevalent

With all these (perhaps commoditized) voice experiences, remember that value gets created from the experience and relationship established with users. Voice design and voice user interface (VUI) creation still greatly matter, and will continue to grow in importance. It’s far too easy to create poor voice experiences. Unfortunately, the public has seen many, many poor Alexa skills or Google Actions that leave you stuck in a voice interaction loop or unable to course correct. A poor voice user experience is frustrating for users and more harmful to a brand than a bad text-based website interaction.

That’s because a voice-based experience is less forgiving. With a poorly designed VUI, the user lacks a way to decipher the content or information further. User comments like “Where do I go from here?”, “That’s not what I asked” and “I’m not sure what to do with that information” are statements that VUI designers do not want to hear. This is, of course, provided that the user was understood by the automatic speech recognition (ASR) and natural language understanding (NLU), and received a response from the voice application.

All of this decreases the user’s trust in the medium and pushes them back to, say, websites or phone calls. The bad brand experience might leave the user unwilling to interact with the brand via the voice interface again, which will be a major setback when competitors are thriving in the space and voice commerce becomes more prevalent. It’s tempting and easy for users to try voice and say, “I like the old way better” because the old way is more reliable, or they know how to navigate it. That’s the common issue with anything new, and with change altogether.

The uptake of voice assistants reminds me of the adoption of websites into mainstream society. Websites weren’t always as helpful or as beautiful as they are today. While many factors influenced the proliferation of websites (the internet, internet speed, browser compatibility, mobile versions, etc.), it all started with content sharing and simple functionality. Over time, websites have evolved into aesthetically beautiful, eye-luring, easily navigable media.

Voice will be no different: it started with a very wide breadth of voice experiences, and it will progress to homing in on what works and what doesn’t for the users and brands they serve, then to adding contextual relevancy for where they’re being used, and last to adding personality and sonic branding.

Some brands (McDonald’s and CBS to name a few) have adopted a jingle or sonic brand. When you hear their familiar notes, you think of the brands. Those moments of familiarity pay off years of effort and user training with the voice medium.

Additionally, consider brands with strong personalities, such as Slim Jim, Headspace and Airbnb, which can use those personalities to create voice-based experiences that complement their visual identities. This comes to life when the brand voice experience considers tone, timbre, intonation and lexicon, literally exuding the brand voice straight to a user’s ears. When done correctly, this will make the brand-user relationship even stronger (perhaps even reestablishing loyalty in newer generations).

Addressing 2020 head-on with voice

Contactless (commercial, public, retail) interactions

As brands address the health and safety concerns of consumers to restart their businesses, contactless interactions rise to the top. Removing (or minimizing) the physical touchpoints of a business is making people think digital-first in a quick, prioritized way because, for many businesses, their livelihood depends on it in a way not felt before. Businesses are adapting their mindset from “when I have time for digital” to “digital has to happen now.”

Using voice-enabled applications has now become a part of that transformation, for everything from browsing, getting information and navigating to ordering products and checking out. From a personal health standpoint, using our voices is less risky than touching a user-shared screen or paying with and receiving unsanitized cash (activities that usually require you to be within six feet of others, especially strangers). The airport and restaurant industries will likely be the first to address these issues, as they’ve been hit hard by today’s pandemic and the recessionary economy.

Assisting at-home education

In the spring of 2020, many parents everywhere suddenly became de facto home schoolers as schools shut down and kids were sent home. This unbelievably stressful burden may continue into the fall. The situation is untenable. A recently published New York Times article says it all: “In the Covid-19 Economy, You Can Have a Kid or a Job. You Can’t Have Both.”

Voice is attempting to provide some relief. Google showed us one example: earlier in 2020, it launched a new voice assistant that helps parents who are home-schooling their kids. Named Diya, the assistant is designed to teach children how to read. Diya uses stories and word games to help kids five and up, and Google’s speech recognition technology to spot mistakes and areas that are challenging kids. I imagine there are more ways voice can and will help parents as they attempt to manage the demands of working and home-schooling.

Empowering physical and mental health

As people sought ways to understand the health threat created by COVID-19, the Mayo Clinic introduced an Alexa skill for people to get answers to questions about COVID-19. This was an important example of how voice could contribute to the well-being of others while simplifying access.

Of course, the pandemic has created unprecedented levels of stress as people manage the health threat of an unchecked pandemic, forced isolation, and the threat of job loss and economic instability. People are struggling to cope. I see a meaningful opportunity for voice to help people manage mental health. For example, MoonPie created a virtual roommate that entertains people stuck at home in isolation — a whimsical example, to be sure, but in 2020, entertainment has taken on a more meaningful role.

Meanwhile, meditation app Headspace provides a voice-based interface that lets users start meditating with a voice command. That kind of a tool could be a lifesaver for anyone who counts themselves among the surging numbers of people fighting mental exhaustion and stress.

Sharing workplace culture at home

The future of the workplace remains uncertain. Some companies are slowly opening their brick-and-mortar locations and offices. Others are not. Twitter famously told employees they can work at home indefinitely. This dramatic change in how we work creates new challenges, such as maintaining a sense of culture when people are not in the same place.

Voice can help. For example, colleagues could use voice to share customized messages, or random voice Easter eggs could mimic someone stopping by your desk to share an inside joke. We miss our colleagues and their ad hoc banter, their interesting insights and their supportive attitudes (the terms “work-wife” or “work-husband” exist for a reason). Voice can bring more lovable teammate moments to life apart and reinvigorate the culture we’re missing.

Supporting social awareness (and justice)

In the wake of the social equality unrest that erupted around the world, Amazon, Apple and Google made some important changes to Alexa, Siri and Google Assistant. As a number of news outlets reported, if you ask Google Assistant whether Black lives matter, it now provides more thoughtful replies, such as, “Black people deserve the same freedoms afforded to everyone in this country, and recognizing the injustice they face is the first step towards fixing it.” If you ask whether “all lives matter,” Google Assistant replies, “Saying ‘Black Lives Matter’ doesn’t mean that all lives don’t. It means Black lives are at risk in ways that others are not.” Both Alexa and Siri respond with similarly sensitive, nuanced answers instead of “of course,” or “I don’t understand your question.”

Enterprises might do well to listen to ideas bubbling up at a grassroots level. I recently read about a Reddit user who developed a Siri shortcut that makes it possible for someone being pulled over by the police to say, “Hey, Siri, I’m getting pulled over,” which prompts Siri to send the person’s current location to a designated person and automatically start recording a video.

How might businesses go beyond using voice to make us more aware of Black Lives Matter to actually helping protect social justice and civic responsibility?

What does this all mean?

The possibilities for voice are ever expanding: it is getting smarter, becoming more personalized, appearing in more contexts and assisting with broader messaging, especially in how it fits into a brand’s digital ecosystem and, more importantly, the consumer’s ecosystem. Start investigating your voice ideas by running a voice design sprint. It’s a new world, and voice technology is shaping it.