Posted by SimonPenson
This post serves a dual purpose: it’s a practical guide to the realities of preparing for voice right now, but equally it’s a rallying call to ensure our industry has a full understanding of just how big, disruptive, and transformational it will be — and that, as a result, we need to stand ready.
My view is that voice is not just an add-on, but an entirely new way of interacting with the machines that add value to our lives. It is the next big era of computing.
Brands and agencies alike need to be at the forefront of that revolution. For my part, that begins with investing in the creation of a voice team.
Let me explain just how we plan to do that, and why it’s being actioned earlier than many will think necessary…
Jump to a section:
Why is voice so important?
When is it coming in a big way?
Who are the big players?
Where do voice assistants get their data from?
How do I shape my strategy and tactics to get involved?
What skill sets do I need in a “voice team?”
“The times they are a-changin’.”
– Bob Dylan
Back in 1964, that revered folk-and-blues singer could never have imagined just what that would mean in the 21st century.
As we head into 2018, we’re nearing a voice interface-inspired inflection point the likes of which we haven’t seen before. And if the world’s most respected futurist is to be believed, it’s only just beginning.
Talk to Ray Kurzweil, a director of engineering at Google and the man Bill Gates says is the “best person to predict the future,” and he’ll tell you that we are entering a period of huge technological change.
For those working across search and many other areas of digital marketing, change is not uncommon. Seismic events, such as the initial rollout of Panda and Penguin, reminded those inside the industry just how painful it is to be unprepared for the future.
At best, it tips everything upside down. At worst, it kills those agencies or businesses stuck behind the curve.
It’s for exactly this reason that I felt compelled to write a post about why I’m building a voice team at Zazzle Media, the agency I founded here in the UK. Stats from BrightEdge reveal that 62% of marketers still have no plans whatsoever to prepare for the coming age of voice.
I’m also here to argue that while the growth traditional search agencies saw through the early 2000s is over, similar levels of expansion are up for grabs again for those able to seamlessly integrate voice strategies into an offering focused on the client or customer.
Winter is coming!
Based on our current understanding of technological progress, it’s easy to rest on our laurels. Voice interface adoption is still in its very early stages. Moore’s Law draws a (relatively) straight line through technological advancement, giving us time to take our positions, but that era is now behind us.
According to Kurzweil’s thesis on the growth of technology (the Law of Accelerating Returns),
“we won’t experience 100 years of progress in the 21st century – it will be more like 20,000 years.”
Put another way, he explains that technology does not progress in a linear way. Instead, it progresses exponentially.
“30 steps linearly get you to 30. One, two, three, four, step 30 you’re at 30. With exponential growth, it’s one, two, four, eight. Step 30, you’re at a billion,” he explained in a recent Financial Times interview.
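To put numbers on Kurzweil’s comparison, here is a minimal Python sketch. The function names are illustrative, and it assumes “exponential” means doubling at every step, as in his example.

```python
def linear_steps(steps):
    """Add one unit per step: 1, 2, 3, ..."""
    return steps


def doubling_steps(steps):
    """Start at 1 and double once per step."""
    value = 1
    for _ in range(steps):
        value *= 2
    return value


for steps in (10, 20, 30):
    print(f"{steps} steps: linear = {linear_steps(steps)}, "
          f"doubling = {doubling_steps(steps):,}")
```

Thirty doublings from 1 give 1,073,741,824, which is the “billion” in the quote. Counting strictly as “one, two, four, eight,” the thirtieth number in the sequence is half that (536,870,912), so the exact figure depends on where you start counting. The gap with the linear 30 is vast either way.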
In other words, we’re going to see new tech landing and gaining traction faster than we ever thought possible, as this chart suggests:
Above, Kurzweil illustrates how we’ll be able to produce computing power equal to that of a human brain by 2023. By 2037 we’ll be able to do it for less than one cent. Just 15 years later, computers will be more powerful than the entire human race combined. Powerful stuff, and proof of the need for action as voice and the wider AI paradigm take hold.
So, what does that mean right now? While many believe voice is still a long way off, one point of view says it’s already here, and those fast enough to grab the opportunity will grow exponentially with it. Indeed, Google itself says more than 20% of all searches are already voice-led, and that the figure will reach 50% by 2020.
Let’s first deal with understanding the processes required, before moving on to the expertise needed to make it happen.
What do we need to know?
We’ll start with some assumptions. If you are reading this post, you already have a good understanding of the basics of voice technology. Competitors are joining the race every day, but right now the key players are:
- Microsoft Cortana – Available on Windows, iOS, and Android.
- Amazon Alexa – Voice-activated assistant that lives on Amazon audio gear (Echo, Echo Dot, Tap) and Fire TV.
- Google Assistant – Google’s voice assistant powers Google Home as well as sitting across its mobile and voice search capabilities.
- Apple Siri – Native voice assistant for all Apple products.
And (major assistants) coming soon:
- Samsung Bixby – Native voice assistant for Samsung products.
- (Yet to be named) Facebook assistant – They already have M for Messenger, and Mark Zuckerberg is personally testing “Jarvis AI” in his home.
All of these exist to let consumers retrieve information without having to touch a screen or type anything.
That has major ramifications for those who rely on traditional typed search and a plethora of other arenas, such as the fast-growing Internet of Things (IoT).
In short, voice allows us to access everything from our personal diaries and shopping lists to answers to our latest questions and even to switch our lights off.
Apart from the tidal wave of tech now supporting voice, there is another key reason for investing in voice now — and it’s all to do with the pace at which voice is actually improving.
In a recent Internet usage study by KPCB, Andrew Ng, chief scientist at Chinese search engine Baidu, was asked what it was going to take to push voice out of the shadows and into its place as the primary interface for computing.
His point was that at present, voice is “only 90% accurate” and therefore the results are sometimes a little disappointing. This slows uptake.
But he sees that changing soon, explaining that “As speech recognition accuracy goes from, say, 95% to 99%, all of us in the room will go from barely using it today to using it all the time. Most people underestimate the difference between 95% and 99% accuracy; 99% is a game changer…”
When will that happen? In the chart below we see Google’s view on this question, predicting we will be there in 2018!
Is this the end for search?
It is also important to point out that voice is an additional interface and will not replace any of those that have gone before it. We only need to look back at history to see how print, radio, and TV continue to play a part in our lives alongside the latest information interfaces.
Moz founder Rand Fishkin made this point in a recent Whiteboard Friday (WBF), explaining that while voice search volumes may well overtake typed terms, the demand for traditional SERP results and typed results will also continue to grow, simply because of the growing use of search.
The key will be creating a channel strategy as well as a method for researching both voice and typed opportunity as part of your overall process.
The key difference when considering voice opportunity is to think about the conversational nature that the interface allows. For years we’ve been used to having to type more succinctly in order to get answers quickly, but voice does away with that requirement.
Instead, we are presented with an opportunity to ask, find, and discover the things we want and need using natural language.
This means that we will naturally lengthen the phrases we use to find the stuff we want — and early studies support this assumption.
In a study by Microsoft and covered by the brilliant Purna Virji in this Moz post from last year, we can see a clear distinction between typed and voice search phrase length, even at this early stage of conversational search. Expect this to grow as we get used to interacting with voice.
The evidence suggests that will happen fast too. Google’s own data shows us that 55% of teens and 40% of adults use voice search daily. Below is what they use it for:
While it is easy to believe that voice only extends to search, it’s important to remember that the opportunity is actually much wider. Below we can see results from a major 2016 Internet usage study into how voice is being used:
Clearly, the lion’s share is related to search and information retrieval, with more than 50% of actions relating to finding something local to go/see/do (usually on mobile) or using voice as an interface to search.
But an area sure to grow is the leisure/entertainment sector. More on that later.
The key question remains: How exactly do you tap into this growing demand? How do you become the choice answer above all those you compete with?
With such a vast array of devices, the answer is a multi-faceted one.
Where is the data coming from?
To answer the questions above, we must first understand where the information is being accessed from. The answer, predictably, is not a simple one. Understanding it, however, is critical if you are to build a world-class voice marketing strategy.
To make life a little easier, I’ve created an at-a-glance cheat sheet to guide you through the process. You can download it by clicking on the banner below.
In it, you’ll find an easy-to-follow table explaining where each of the major voice assistants (Siri, Cortana, Google Assistant, and Alexa) retrieves its data from, so you can devise a plan to cover them all.
The key takeaway from that research? Interestingly, Bing has every opportunity to steal a big chunk of market share from Google and, at least at present, is the key search engine to optimize for if voice “visibility” is the objective.
Bing is more important now.
Of all the Big Four in voice, three (Cortana, Siri, and Alexa) default to Bing search for general information retrieval. Given that Facebook (also a former Bing search partner) is also joining the fray, Google could soon find itself in a place it’s not entirely used to being: alone.
Now, the search giant usually finds a way to pull back market share, but for now a marketer’s focus should be on Microsoft’s search engine, with Google as a secondary player.
Irrespective of which engine you prioritize, there are two key areas to focus on: featured snippets and local listings.
The search world has been awash with posts and talks on this area of optimization over recent months as Google continues to push ahead with the roll out of the feature-rich SERP real estate.
For those that don’t know what a “snippet” is, there’s an example below, shown for a search for “how do I get to sleep”:
Not only is this incredibly valuable traditional search real estate (as I’ve discussed in an earlier blog post), but it’s a huge asset in the fight for voice visibility.
Initial research by experts such as Dr. Pete Meyers tells us, clearly, that Google Assistant is pulling its answers from snippet content for anything with any level of complexity.
Simple answers — such as those for searches about sports results, the weather, and so forth — are answered directly. But for those that require expertise it defaults to site content, explaining where that information came from.
At present, it’s unclear how Google plans to help us understand and attribute these kinds of visits. But according to insider Gary Illyes, reporting on them is imminent within Search Console.
Measurement will clearly be an important step in selling any voice strategy proposal upwards, and in providing individual site or brand evidence that the medium is growing and deserving of investment.
User intent and purchase
Such data will also help us understand how voice alters such things as the traditional conversion funnel and the propensity to purchase.
We know how important content is in the traditional user journey, but how will it differ in the voice world? There’s sure to be a rewrite of many rules we’ve come to know well from the “typed Internet.”
Applying some level of logic to the challenge, it’s clear that there’s a greater degree of value in searches showing some level of immediacy, i.e., people searching through home assistants or mobiles for the location of something, or for its time and/or date.
Whereas with typed search we see greater value in simple phrases that we call “head terms,” the world is much more complex in voice. Below we see a breakdown of words that will trigger searches in voice:
To better understand this, let’s examine a potential search “conversation.”
If we take a product search example for, let’s say, buying a new lawn mower, the conversation could go a little like this:
[me] What’s the best lawn mower for under £500?
[voice assistant] According to Lawn Mower Hut there are six choices [reads out choices]
Initially, voice will struggle to understand how to move to the next logical question, such as:
[voice assistant] Would you like a rotary or cylinder lawn mower?
Or, better still…
[voice assistant] Is your lawn perfectly flat?
[voice assistant] OK, may I suggest a rotary mower? If so then you have two choices, the McCulloch M46-125WR or the BMC Lawn Racer.
In this scenario, our voice assistant has connected the dots and asks the next relevant question to help narrow the search in a natural way.
Natural language processing
Doing this, however, requires a step up in computer processing, a challenge being worked on as we speak in a bid to provide the next level of voice search.
Solving it requires so-called deep neural networks (DNNs): interconnected layers of processing units designed to mimic the neural networks in the brain.
DNNs can work across everything from speech and images to sequences of words and even location, classifying each input into categories.
They rely on truckloads of input data to learn how best to bucket those things. That data pile will grow exponentially as the adoption of voice accelerates.
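As a rough illustration of the “layers of processing units” idea, the toy Python sketch below passes a few input numbers through two small layers and picks a category. Everything in it is hypothetical: the weights are hand-picked rather than learned, the three input features and the category names are invented for the example, and real voice systems are vastly larger.

```python
import math


def dense(inputs, weights, biases):
    """One layer: each unit takes a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Pass positive signals through and silence negative ones."""
    return [max(0.0, v) for v in values]


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical, hand-picked weights; a real network learns these from data.
HIDDEN_W = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
HIDDEN_B = [0.0, 0.1]
OUTPUT_W = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]
OUTPUT_B = [0.0, 0.0, 0.0]
CATEGORIES = ["question", "command", "other"]  # invented labels


def classify(features):
    """Run the inputs through both layers and return the likeliest category."""
    hidden = relu(dense(features, HIDDEN_W, HIDDEN_B))
    scores = softmax(dense(hidden, OUTPUT_W, OUTPUT_B))
    best = max(range(len(scores)), key=scores.__getitem__)
    return CATEGORIES[best], scores


label, scores = classify([1.0, 0.0, 2.0])
print(label, [round(s, 3) for s in scores])
```

The “learning” described above amounts to adjusting those weight tables automatically from large volumes of examples, which is why the data pile matters so much.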
What that will mean is that voice assistants can converse with us in the same way as a clued-up shop assistant, reducing the need for in-store visits in the future and making for a much more streamlined research process.
In this world, we start to paint a very different view of the “keywords” we should be targeting, with deeper and more exacting phrases winning the battle for eyeballs.
As a result, the long tail’s rise in prominence continues at pace, and data-driven content strategies really do move to the center of the marketing plan as the reward for creating really specific content increases.
We also see a greater emphasis placed on keywords that may not be on top of the priority list currently. If we continue to work through our examples, we can start to paint a picture of how this plays out…
In our lawnmower purchase example, we’re at a stage where two options have been presented to us (the McCulloch and the BMC Racer). In a voice 1.0 scenario, where we have yet to see DNNs develop enough to know the next relevant question and answer, we might ask:
[me] Which has the best reviews?
And the answer may be tied to a third-party review conclusion, such as…
[voice assistant] According to Trustpilot, the McCulloch has a 4.5-star rating versus a 3.5-star rating for the BMC lawn mower.
Suddenly, third-party reviews become more valuable than ever as a conversion optimization opportunity, as does a strategy that includes creating content to own the SERP for a keyword phrase that includes “review” or “top rated.”
And where would we naturally go from here? The options are either directly to conversion, via some kind of value-led search (think “cheapest McCulloch M46-125WR”), or to a location-based one (“nearest shop with a McCulloch M46-125WR”) to allow me to give it a “test drive.”
This single journey gives us some insight into how the interface could shape our thinking on keyword prioritization and content creation.
Pieces that help a user either make a decision or perform an action around the following trigger words and phrases will attract greater interest and traffic from voice. Examples could include:
- top rated
- best deal
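As a starting point for keyword research, the trigger phrases mentioned in this section can be checked against a list of queries with a few lines of Python. This is only a sketch: the intent labels and sample queries are assumptions made for illustration, not findings from any study.

```python
# Trigger phrases taken from the examples in this post; the intent
# labels attached to them are illustrative assumptions.
TRIGGER_INTENTS = {
    "top rated": "compare",
    "review": "compare",
    "best deal": "buy",
    "cheapest": "buy",
    "nearest": "visit",
}


def find_triggers(query):
    """Return (phrase, intent) pairs for every trigger found in a query."""
    # Lowercase and treat hyphens as spaces so "top-rated" matches "top rated".
    normalized = " ".join(query.lower().replace("-", " ").split())
    return [(phrase, intent)
            for phrase, intent in TRIGGER_INTENTS.items()
            if phrase in normalized]


sample_queries = [
    "What's the top-rated rotary lawn mower?",
    "Where is the nearest shop with a McCulloch M46-125WR?",
    "How do I choose between a cylinder and a rotary mower?",
]

for query in sample_queries:
    print(query, "->", find_triggers(query))
```

Simple substring matching like this is crude, but it is enough to flag which existing queries already carry a compare, buy, or visit signal.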
Many are not dissimilar to typed search, but clearly intent priorities change. The aforementioned Microsoft study also looked at how this may work, suggesting the following order of question types and their association with purchase/action:
This also pushes the requirement for serious location-based marketing investment much higher up the pecking order.
We can clearly see how important such searches become from a “propensity to buy/take action” perspective.
It pays to invest more in ensuring the basics are covered, for which the Moz Local Search Ranking Factors study can be a huge help, but also in putting some weight behind efforts across Bing Places. If you are not yet set up fully over there, this simple guide can help.
Local doesn’t start and end with set up, of course. To maximize visibility there must be an ongoing local marketing plan that covers not just the technical elements of search but also wider marketing actions that will be picked up by voice assistants.
We already know, for instance, that engagement factors are playing a larger part in the algorithmic mix for local, but our understanding of what that really means may be limited.
Engagement is not just a social metric but a real-world one. Google, for instance, knows not just what you search for but where you go (via location tracking and beacon data), what you watch (via YouTube), the things you are interested in, and where you travel (via things such as Flights search and Maps data). We need to leverage each of these data points to maximize effect.
As a good example of this in action, we mentioned the importance of reviews earlier. Here they play a significant part in the local plan. A proactive review acquisition strategy is really important, so build it into your everyday activity by incentivizing visitors to leave reviews. This involves actively monitoring all the key review sites, not just your favorite!
Use your email strategy to drive this behavior as well by ensuring that newsletters and offer emails support the overall local plan.
And a local social strategy is also important. Get to know your best customers and most local visitors and turn them into evangelists.
Doing it is easier than you might think; you can use Twitter mention monitoring not only to search for key terms, but also to find mentions within a specific radius of a latitude/longitude point.
Advanced search also allows you to discover tweets by location or mentioning location. This can be helpful as research to discover the local questions being asked.
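A location-restricted search of this kind can be sketched in Python as below. It assumes Twitter’s `geocode:latitude,longitude,radius` search operator, which has been supported in Twitter search but may change, so check the current documentation before relying on it. The helper names, coordinates, and search terms are hypothetical.

```python
from urllib.parse import urlencode


def local_mention_query(terms, latitude, longitude, radius_km):
    """Build a search string for tweets that mention any term near a point."""
    phrases = " OR ".join(f'"{term}"' for term in terms)
    # Assumed operator format: geocode:latitude,longitude,radius
    return f"({phrases}) geocode:{latitude},{longitude},{radius_km}km"


def search_url(query):
    """Encode the query for a browser search URL (hypothetical helper)."""
    return "https://twitter.com/search?" + urlencode({"q": query})


# Illustrative values: mower-related mentions within 20 km of a sample point.
query = local_mention_query(["lawn mower", "lawnmower"], 52.57, -0.24, 20)
print(query)
print(search_url(query))
```

Swapping in your own coordinates and a handful of product or landmark terms gives a quick, repeatable way to surface the local questions being asked.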
The awesome team at Zapier covered this topic in lots of detail recently, so for those who want to action this particular point I highly recommend reading this post.
Let’s go deeper
There is new thinking needed if the opportunity is to be maximized. To understand this, we need to go back to our user journey thought process.
For starters, there’s the Yelp/Alexa integration. While the initial reaction may be simply to optimize listings for the site, the point is actually a wider one.
Knowing that many of the key vertical search engines (think Skyscanner [travel], Yelp [local], etc.) will spend big to ensure they have the lion’s share of the voice market, it will pay to spend time improving your content on these sites.
Which is most important will be entirely dependent upon what niche you are working in. Many will only offer limited opportunity for optimization, but being there and spending time ensuring your profile is 110% will be key. It may even pay to take sponsored opportunities within them for the added visibility it may give you in the future.
There’s also the really interesting intellectual challenge of attempting to map out as many potential user journeys as possible to and from your business.
Let’s take our lawnmower analogy again, but this time from the perspective of a retailer situated within 20 miles of the searcher. In this scenario, we need to think about how we might be able to get front and center before anyone else if we stock the McCulloch model they are looking for.
If we take it as a given that we’ve covered the essentials, then we need to think more laterally.
It’s natural to look not only for a local outlet that stocks the right model, but also for when it may be open. We might also ask more specific questions like whether they have parking, or even if they are busy at specific times or offer appointments.
The latter would be a logical step, especially for businesses that work in this way; think dentists, doctors, beauty salons, and even trades. The opportunity to book a plumber at a specific time via voice would be a game changer for those set up to offer it.
Know your locality
As a local business, it is also imperative that you know the surrounding areas well and can prove you’ve thought about them. This includes looking at how people talk about key landmarks from a voice perspective.
We often use slang or shortened versions of landmark naming conventions, for instance. In a natural, conversational setting, you may find that you miss out if you don’t use those idiosyncrasies within the content you produce and feature on your site or within your app.
Fun and entertainment
Then, of course, comes the “fun.” Think of it as the games section of the App Store — it makes little logical sense, but in it lies a whole industry of epic proportions.
Voice will give birth to the next era in entertainment. While some of you may be thinking about how to profit from such an active audience, the majority of brands would be smart to see it as an engagement and brand awareness world.
Game makers will clamor to create hit mind games and quizzes, but those that play around the edges may well be the monarchs of this opportunity. Think about how voice could change the dynamic for educators, play the part of unbiased referee in games, or teach birdsong and the birds each song belongs to. The opportunity is endless, and it will claim 25% of the overall pie, according to current usage research.
The monetization methods are yet to be uncovered, but the advertising opportunity is significant, as is the potential for clever technology like blockchain to enable frictionless payments and more.
User journey mapping
So how do you tie all of this together into a seamless plan, given the complexity and number of touch points available? The answer starts and ends with user journey mapping.
This is something I find myself doing more and more now as part of the wider marketing challenge. Fragmented audiences and a plethora of devices and technology mean it’s more difficult than ever to build an integrated strategy. Taking a user-centric approach is the only way to make sense of the chaos.
Voice is no different, and the key differentiator here is the fact that in this new world a journey is actually a conversation (or a series of them).
Conversation journey mapping
While the tech may not yet be there to support conversations in voice, given the point at the beginning of this piece around the Law of Accelerating Returns, it’s clear that it’s coming, and faster than we realize.
In some respects, the timing of that advancement is irrelevant, however, as the process of working through a series of conversations that a potential client or customer may have around your product or service is invaluable as research for your plan.
To go back to our lawnmower example, a conversation mapping exercise may look a little like this:
[me] What’s the best lawnmower for under £500?
[voice assistant] How large is your lawn?
[me] It’s not very big. I don’t need a ride-on.
[voice assistant] OK so would you prefer a cylinder or rotary version?
[me] I don’t know. How do I choose?
[voice assistant] If you want stripes and your lawn is very flat, a cylinder gives a better finish. If not, a rotary is better.
[me] OK, definitely a rotary then!
[voice assistant] Good choice. In that case, your best options are either the McCulloch M46-125WR or the BMC Lawn Racer.
[me] Which is best?
[voice assistant] According to Trustpilot, the McCulloch has 4.5 stars from 36 reviews versus 3.5 stars for the BMC. The McCulloch is also cheaper. Do you want me to find the best deal or somewhere nearby that stocks it?
[me] I’d like to see it before buying if possible.
[voice assistant] OK, ABC Lawn Products is 12 miles away and has an appointment at 11am. Do you want to book it?
Where are the content or optimization opportunities?
Look carefully above and you’ll see that there are huge swathes of the conversation that lend themselves to opportunity, either through content creation or some other kind of optimization.
To spell that out, here’s a possible list:
- Guide – Best lawnmower for £500
- Guide – Rotary versus cylinder lawnmowers
- Review strategy – Create a plan to collect more reviews
- Optimization – Evergreen guide optimization strategy to enhance featured snippet opportunities
- Local search – Optimize business listing to include reviews, opening times, and more
- Appointments – Open up an online appointment system and optimize for voice
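One lightweight way to record a conversation map is as a list of turns, each tagged with the opportunity it suggests. The Python sketch below is an assumption about how you might structure it, not a prescribed method; the class, field names, and the pairing of turns to opportunities are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str            # "me" or "assistant"
    text: str
    opportunity: str = ""   # content or optimization idea, if any


conversation = [
    Turn("me", "What's the best lawnmower for under £500?",
         "Guide: best lawnmower for £500"),
    Turn("assistant", "Would you prefer a cylinder or rotary version?",
         "Guide: rotary versus cylinder lawnmowers"),
    Turn("me", "I don't know. How do I choose?",
         "Guide: rotary versus cylinder lawnmowers"),
    Turn("me", "Which is best?",
         "Review strategy: collect more reviews"),
    Turn("me", "I'd like to see it before buying if possible.",
         "Local search: optimize business listing"),
    Turn("assistant", "ABC Lawn Products has an appointment at 11am.",
         "Appointments: online booking optimized for voice"),
]


def content_plan(turns):
    """List each opportunity once, in the order it first appears."""
    plan = []
    for turn in turns:
        if turn.opportunity and turn.opportunity not in plan:
            plan.append(turn.opportunity)
    return plan


for item in content_plan(conversation):
    print("-", item)
```

Keeping the map as data rather than a slide makes it easy to add new conversations over time and see which opportunities keep recurring.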
In developing such a roadmap, it’s also important to consider the context within which the conversation is happening.
Few of us will ever feel entirely comfortable using voice in a crowded, public setting, for instance. We’re not going to try using voice on a bus, train, or at a festival anytime soon.
Instead, voice interfaces will be used in private, most likely in homes, cars, and other places where it’s useful to be able to do multiple things at once.
Setting the scene in this way will help as you define your conversation possibilities and the optimization opportunities that come from them.
What people do we need to create all this?
The one missing piece of the jigsaw as we prepare for the shift to voice? People.
All of the above requires a great deal of work to perfect and implement, and while the dust still needs to settle on the specifics of voice marketing, there are certain skill sets that will need to pull together to deliver a cohesive strategy.
For the majority, this will simply mean creating project groups from existing team members. But for those with the biggest opportunities (think recipe sites, large vertical search plays, and so on), it may be that a standalone team is necessary.
Here’s my take on what that team will require:
- Developer – with specific skill in creating Google Home Actions, Alexa Skills, and so on.
- Researcher – to work with customer groups to understand how voice is being used and capture further opportunities for development.
- SEO – to help prioritize content creation and how it’s structured and optimized.
- Writer – to build out the long-tail content and guides necessary.
- Voice UX expert – A specialist in running conversation mapping sessions and turning them into brilliant user journeys for the different content and platforms your brand utilizes.
If you’ve read to this point, you at least have an active interest in this fast-moving area of tech. We know from the minds of the most informed experts that voice is developing quickly and that it clearly offers significant benefits to its users.
When those two key things combine, alongside a falling cost for the technology needed to access it, they create a tipping point that only ends one way: in the birth of a new era for computing.
Such a thing has massive implications for both digital and wider marketing, and it will pay to have first-mover advantage.
That means educating upwards and beginning the conversation around how voice interfaces may change your own industry in the future. Once you have that running, who knows where it might lead you?
For some, it changes little; for others, everything. The good news for search marketers is that there are a lot of existing tactics and skill sets that will have an even bigger part to play.
- The ability to claim featured snippets and answer boxes becomes even more rewarding as they answer millions of voice searches.
- Keyword research has a wider role in forming strategies to reach into voice and outside traditional search, as marketers become more interested in the natural language their audiences are using.
- Local SEO wins become wider than simply appearing in a search engine.
- Micro-moments become more numerous and even more specific than ever before. Research to uncover these becomes even more pivotal.
New opportunities to consider
- Increases in content consumption through further integration in daily life — so think about what other kinds of content you can deliver to capture them.
- Think Internet of Things integration and how your brand may be able to provide content for those devices or help people use the connected home.
- Look at what Skills/Actions you can create to play in the “leisure and entertainment” sector of voice. This may be as much about an engagement/awareness play as pure conversion or sales, but it’s going to be a huge market. Think quick games, amazing facts, jokes, and more…
- Conversation journey mapping is a powerful new skill to be learned and implemented to tie all content together.
Here’s to the next 50 years of voice interface progress!