Technical SEO in a Semantic Search World

This is the deck I presented at the “Advanced SEO” session of the Digital Summit conference in Denver. The talk was titled “Technical SEO in a Semantic Search World”.

This presentation was a really fun opportunity for me to talk about something that’s been on my mind: the changing role of SEO in the world of semantic search. I’ve been doing SEO professionally since 2003. In the past 3 years, I’ve seen more big, substantive changes to the field than I saw in the 9 years before that. We as SEOs, and as an industry, need to think about how our roles evolve so that we can continue to add value to our organizations and clients, rather than selling them a Chinese menu of outdated optimization tactics.

On slide 2, I talked a bit about the state of the web at large. And the state of the web is that it’s a mess. The web is a heaving mass of unstructured data that grows exponentially, is completely unsupervised, and constantly has new technologies stirred into the stew.

On slide 3, my point was that technical SEO represents our best toolset for elevating our sites out of the mess that is the web. But today, technical SEO is table stakes. In the game of organic search, technical SEO is simply the minimum ante required to be in the game. 10 years ago, doing technical SEO meant you were guaranteed some solid first-page rankings in return for your effort. Today, basic technical SEO will get you ranked somewhere in the top 100 for a few of your keywords. It’s that much more competitive.

And so, on slide 4, I talked about all of the things that we as SEOs have to keep our eyes on. Searchmetrics ranking-correlation studies show all the factors that might be correlated with a single URL’s ranking. Moz study data show the difference between page-level and domain-level ranking factors, and whether they’re keyword-dependent or keyword-independent. Add to that the Penguin and Panda algorithms, and keyword data disappearing into (not provided), and you get a sense of how busy the average workday of an SEO is.

I think that we as an industry are falling into the trap of paying attention to the tactical elements of SEO that we are familiar with, and not thinking about the forest from which all these trees are growing.

On slide 5, I discussed the new world of semantic search. Specifically, if you were to ask almost any SEO today “What is semantic search?” you would likely hear something along these lines:

  • Semantic search is about understanding the meaning of queries, and delivering results based on an understanding of the context in which the query was made
  • Semantic search is about “things, not strings”
  • Things in semantic search are “entities”
  • Entities are organized according to an ontology
  • Entities can have multiple properties based on their type
  • Entities can have relationships to other entities
  • Semantic search has its underpinnings in artificial intelligence (AI), deep learning, and natural language processing (NLP)

However, although we know all these things academically, that knowledge often hasn’t filtered down into specific tactics for addressing semantic search with technical SEO. We may be fine talking about semantic search in the abstract, but we often default back to obsessing over character counts in our title tags, because that is a more “knowable” field where we feel our actions make a difference.

All of that has to change.  So we know that semantic search changes things for our profession, but how?  Specifically, what does this change mean for us in our day-to-day jobs as we think about how to align our strategy and tactical delivery to the world of semantic search?

And honestly, I don’t think that semantic search requires a massive retrofit for us as SEOs. Most of what you consider fundamental technical SEO is still important; it just might be important for different reasons. For example, descriptive, compelling title tags are still a best practice, because they let you put an effective marketing message in front of a human visitor right when they’re likely to be responsive to it. In practical terms, though, a visitor may not see your title tag exactly as you wrote it: Google may rewrite your title tag based on what it thinks is best for the visitor, or change the visual formatting of the title.

So the fundamentals are still important, but they’re evolving.  You don’t have to throw out everything you know to adjust your SEO strategy to the semantic search model.  And there’s nothing more fundamental than the 3 pillars of on-page SEO:

  1. Information architecture
    • Depth of site architecture is one of the most important and overlooked aspects of site design
    • I provide steps so you can do your own analysis of your site’s IA
  2. Code
    • Organizations need to start thinking about how they’re going to integrate into developer workflows now
    • I provide examples for how you can insert correct syntax onto your pages to give your developers examples to code to
  3. Content
    • With semantic search, your content has never been more critical to your site’s ability to produce good ROI over the long term
    • Panda is just the most visible (and frightening) manifestation of this new understanding of the “quality” of your content
    • I provide steps for how to perform an audit for thin content at scale.
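The deck lays out its own steps for IA analysis. Purely as an illustration (one possible approach, not the steps from the slides), here is a minimal Python sketch that measures click depth, i.e. the minimum number of clicks from the home page to each URL, using a breadth-first search. It assumes you’ve already exported an internal-link map (page to the pages it links to) from a crawler; the `site` dictionary, its URLs, and the helper names are hypothetical.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], home: str) -> dict[str, int]:
    """Return the minimum number of clicks from `home` to every reachable URL."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            # BFS reaches each URL first via a shortest path, so record depth once.
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def depth_histogram(depths: dict[str, int]) -> dict[int, int]:
    """Count how many URLs sit at each click depth."""
    hist: dict[int, int] = {}
    for d in depths.values():
        hist[d] = hist.get(d, 0) + 1
    return dict(sorted(hist.items()))


# Hypothetical internal-link graph exported from a crawl.
site = {
    "/": ["/shoes", "/jerseys"],
    "/shoes": ["/shoes/basketball", "/"],
    "/jerseys": ["/jerseys/football"],
    "/shoes/basketball": ["/shoes/basketball/model-x"],
}
# Hypothetical full URL list (e.g. from a sitemap), including one page nothing links to.
all_known_urls = set(site) | {"/jerseys/football", "/shoes/basketball/model-x", "/orphan-page"}

depths = click_depths(site, "/")
print(depth_histogram(depths))        # {0: 1, 1: 2, 2: 2, 3: 1}
print(all_known_urls - set(depths))   # {'/orphan-page'}
```

URLs that appear in your sitemap or CMS export but never in `depths` are orphans that crawlers can’t reach through internal links, and pages sitting at deep click depths are candidates for better internal linking.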


What Google’s Changes Mean for Your Content Marketing Efforts

91% of B2B marketers use content marketing as a tactic, spending $118 billion in 2013 on content marketing, social media, and video.

If you don’t understand some key aspects of Google’s new search algorithm, you may be flushing your content marketing dollars down the drain.  True, talking about search algorithms tends to make eyes glaze over.  But, if you’re like the millions of other businesses that have identified content marketing as a key channel for educating prospective customers and getting them into the sales funnel, you need to know how the game has changed, and what that means for your business.

In the past 2 months, Google has made some significant changes.  First, they completely replaced their core web search algorithm.  Then, they hid the keywords that visitors use to find websites in organic search.

1. Google Hummingbird

In August 2013, Google completed the change-over to their new search algorithm, Hummingbird.  A complete algorithm change is a BIG event for Google.

If the algorithm is the “recipe” that Google uses for ranking and retrieving results, then individual ranking factors within the recipe (such as using keywords in your title tags) are “ingredients” in the recipe, and Google has stated that there are over 200 ingredients.

All the early evidence indicates that Hummingbird still uses the vast majority of the same ingredients, in the same way. So doing solid on-page and off-page SEO is just as important as always; more than ever, it’s table stakes for effective web marketing.

The big change with Hummingbird is in how it understands natural-language queries and processes them. Hummingbird allows Google to understand the intent of queries in a much more intelligent way. Now, Google may return results that don’t contain the exact keywords you used in your query, in that exact order, but the results will match the intent of what you asked for. For example, you may have Googled “the best French Cajun food in Baton Rouge” and gotten back a website that only talks about Acadian cuisine and doesn’t use the words “French” or “Cajun” anywhere on the page. It matches your intent, but not your exact words.

Google is now much better at understanding “entities” or “things” and not just keywords or “strings”.

2. Google Hides Your Organic Keyword Data

Next, in September, Google announced they’re encrypting all searches performed on Google.  That doesn’t mean anything to the average user of Google, but it means a foundational shift for companies that care about the organic search traffic that their site gets.  Anyone with access to web analytics for any site they work on will notice that a huge percentage of organic search traffic is now being lumped into the black box of “(not provided)”.  What this means is that most companies that get 80% or more of their organic search traffic from Google will no longer be able to see what terms visitors typed in to find their sites.
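If you export an organic-search keyword report from your analytics package, you can size the problem for your own site with a few lines of code. A minimal Python sketch, assuming the report is a list of (keyword, visits) rows; the sample rows are made up for illustration:

```python
def not_provided_share(rows: list[tuple[str, int]]) -> float:
    """Fraction of organic visits whose keyword is hidden as "(not provided)"."""
    total = sum(visits for _, visits in rows)
    hidden = sum(visits for keyword, visits in rows if keyword == "(not provided)")
    return hidden / total if total else 0.0


# Hypothetical keyword report rows: (keyword, organic visits)
report = [
    ("(not provided)", 820),
    ("basketball shoes", 90),
    ("mybrand", 90),
]
print(f"{not_provided_share(report):.0%}")  # 82%
```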

There are ways to work around this new limitation, and your resident SEO should be able to speak to them.  But those are tactical fixes, and what you should be looking for is a strategic solution.

1+2 = 4

So, let’s do a quick summary of what we know:

  • Hummingbird allows Google to understand the intent of queries much better
  • Google applies this refined query knowledge to entities (“things, not strings”) which allows them to consider a larger set of documents as potential results
  • Google no longer lets you see exactly what keywords visitors used to find your site

If Google is now very good at understanding the intent of a searcher’s query, but no longer lets me, as the site owner, see exactly what that query was, where does that leave us?

We are now in a place where the burden is on you, as an organization, to really understand your customers, identify each facet of the business problems they face, and provide solutions to those problems, or at least be able to describe how your product fits into the landscape.

Absent query data is actually a big problem for lots of organizations. Query data allowed us, at a tactical level, to see exactly how users were finding our sites, and then use standard web engagement and conversion metrics to assess how well our content matched each user’s needs. If we didn’t have the right content, we could create new content, or rewrite existing content to give us context-appropriate coverage for that keyword. It was a very tactical fix.

That approach resulted in SEOs feeding keyword “opportunities” back up the funnel to content writers to plug obvious gaps.  Often, the result was creating a shallow piece of content that contained the keywords, but didn’t address the end users’ needs in a substantive way.

What is required, now more than ever, is a top-down approach to understanding customer needs and developing comprehensive content sets that service those needs.  User personas need to be a driving force behind your content creation.  Are your editorial team and SEO team able to articulate all of the major points along the customer journey for each of your main user personas?  Do you have user personas that you update?  Do you regularly interview your sales staff to integrate feedback from the front lines into your personas?

More than ever, Hummingbird underscores the importance of writing content that addresses real customer needs, not just specific keywords. Conceptually, your content needs to be substantive and solve problems, not just fill “keyword opportunities”.

Best Practices: Integrating SEO Into Your Business

One of the most common problems that I’ve encountered in my 11 years in organic search marketing is around the operational integration of SEO into web projects.  Many organizations have an SEO resource, agency, or department, but still ship websites, redesigns, and products that are not well-optimized.  How does this happen so frequently?

Organizations frequently struggle to achieve the full benefits that search engine optimization has to offer, simply because SEO is not well-integrated into existing project teams and workflows. SEO needs to be fully integrated into an organization at the operational level, having touch-points with multiple functional groups across the website’s (or project’s) lifecycle.

This presentation outlines a framework for the operational integration of SEO with existing functional groups / business units that are stakeholders in the success of a web marketing channel.  A few notes:

  • I outline some “what is SEO?” basics (assuming that it will be presented to a broad audience).
  • I address the issue of SEO deliverables early.
    • Frequently, I’ve found that, to relieve anxiety, project teams need a clear definition of the specific work product they’ll be receiving.
    • Because of inadequate information, education, or misinformation, project teams may come into the project with a preconceived notion of what SEO is.  As an SEO, your job is to re-frame the discussion and expectations, early.
  • I go on to discuss how SEO interacts with each of the more common functional groups that are involved with web projects at 3 phases – project initiation, during development, and at launch / ongoing.
    • This is necessarily a simplification of all of the work that will need to occur, but helps to give some shape to the project, what the interactions will be, and the reasoning behind the required interactions.


The SEO Health Check

The metrics that a large number of businesses track related to SEO are, frequently, the wrong ones. If your SEO team tracks metrics that are too macro or too micro, you can end up with a distorted view of your progress in organic search:

The 10,000 Foot View

If your metrics are too high-level, the numbers can mask both problems and opportunity.  A good example is looking only at the total number of visits that your site gets on a monthly basis from organic search.  You may be satisfied if your numbers are stable year-over-year, but what if 90% of your traffic comes from 5 terms, and the only page that ranks is your home page?  That’s a problem.

The 10 Foot View

Many marketers obsess over the rankings of a handful of keywords – sometimes called “vanity rankings”.  Often, these are keywords that drive large volumes of search traffic, or keywords that an executive has decided are strategically important for the business to have visibility for.  While tracking a handful of keyword rankings is fine, you need a broader focus to ensure that your site is performing well in the long tail of search, where 70% of total search volume is.

SEO is simple. But it’s simple the way baseball is simple; the old saying “throw the ball, hit the ball, catch the ball” comes to mind. Simple, right? There’s always complexity lurking below the surface, but just like in baseball, your SEO team needs a quick set of metrics they can refer to that give you meaningful information about how well you’re performing. I call these metrics the SEO Health Check.

The SEO Health Check

The metrics tracked here are designed to give you a quick, directionally accurate view of how well your site is doing in organic search.  It tracks indexation, landing pages, keywords and traffic (the basis of IRTA, covered in my previous post).

SEO Health Check Dashboard with Dummy Data

Let’s take a look at how to track each of these areas, and then dive into how to interpret the data:

URLs and Indexed Pages

First, you need to know how big your site actually is.  Usually, this will require feedback from your Engineering or Product teams.  You need to determine the number of valid content URLs that your site is capable of generating.  Relying on a “guesstimate” is not sufficient here.

Next, you need to ensure that search engines can find your content, and consider it important enough to keep in their indexes.  Organic search is like fishing with a drag net – the bigger your net, the more fish you’re likely to catch.  The number of your site’s URLs that a search engine has indexed defines the size of that net, so it’s an important metric to keep an eye on.

The easiest way to get an accurate view of this number is via Google Webmaster Tools.  From the Dashboard, click on “Health” and then “Index Status”.  You’ll see a line graph that gives you a year’s worth of historical data on how many URLs from your site Google has indexed.  How close is this number to the total number of valid URLs on your site?  It’s very rare for sites over 10,000 pages to get more than 80% of their pages indexed, so this is a relative measure.

Google Webmaster Tools Chart of Site Indexation Over Time

In the fictitious example, above, you can see that the site had been experiencing some healthy gains in indexation, and then toward the end of June, there was a large, sudden drop.  This kind of data should send your SEO into high alert.  Rapid drops in indexation are usually due to problems with accessibility (your URLs can’t be found by the engines), or penalty situations (engines remove your URLs from the index due to violations of their guidelines).

There are other ways to check indexation levels, like using the site: operator. This method works in Google and Bing, and you can use it to get data on any site you’re interested in (as opposed to just your own). However, it is much less precise: it’s only directionally accurate, and less deduplication processing has been done on these results than on the numbers you see in Google Webmaster Tools, which has its good and bad sides. You simply go to the engine and type the operator followed by the domain (substitute your own domain for example.com):

site:example.com

The number of results is the approximate number of URLs that the engine has indexed from that domain.

If your site has 10,000 pages and only 1,000 are indexed, your site may have barriers to indexation, or may not be authoritative enough for engines to consider indexing the content worthwhile.  In that case, you probably need help either with improving information architecture, increasing link authority, or both.

However, if your site has 10,000 pages and the site operator returns 40,000 results, that points to a duplicate content problem.  You want to make sure that your site is only rendering 1 piece of content on 1 URL, and not rendering the same content on multiple URLs due to technical issues.  There are many causes for this problem, but it’s something your SEO needs to address.
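Both scenarios boil down to comparing indexed URLs against valid URLs. Here is a minimal Python sketch of that check; the 50% and 120% thresholds are my own assumptions for illustration, not industry standards, so tune them to your site’s size and history.

```python
def diagnose_indexation(valid_urls: int, indexed_urls: int,
                        low: float = 0.5, high: float = 1.2) -> str:
    """Classify the indexed-to-valid URL ratio (thresholds are illustrative)."""
    if valid_urls <= 0:
        raise ValueError("valid_urls must be positive")
    ratio = indexed_urls / valid_urls
    if ratio > high:
        # Far more indexed URLs than real pages: same content on multiple URLs.
        return "possible duplicate content"
    if ratio < low:
        # Pages can't be found, or aren't seen as worth keeping in the index.
        return "possible indexation barriers or weak link authority"
    return "within expected range"


for indexed in (1_000, 8_000, 40_000):
    print(indexed, diagnose_indexation(10_000, indexed))
```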


Landing Pages

You also want a significant number of URLs contributing to your site’s success in organic search – you want as many pages as possible “earning their keep”.  If you have 10,000 pages on your site, and 5,000 of them were landing pages for visitors from organic search, that’s great.  However, if you have 10,000 pages, and only your home page is getting organic search traffic, you have a big problem.

In Google Analytics, you want to first select the Advanced Segment of “Non-paid Search Traffic”.  Then in the left navigation, click “Content”, “Site Content”, then “Landing Pages”.

Tracking this metric year-over-year gives you an idea of how new pages are being found via organic search as their link authority increases, and they begin to rank for new keyword combinations.



Keywords

Closely related to landing pages, you want people to find your site via a large number of keywords. 70% of total search query volume is in the long tail, so if people are only finding your site through a small handful of terms, you’re missing out on a lot of opportunity.

Even if the number of pages on your site has remained the same year-over-year, if you’re doing the right things in marketing your content, then your site is becoming more authoritative, which means that a given page will be able to rank for new keyword combinations.

Additionally, you want to segment your keywords to see what kinds of keywords are driving traffic, because they’re not all created equal.

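Brand Keywords: This includes your brand name, variations on your brand, misspellings, and your domain name. You can create a Custom Segment in Google Analytics that uses a regular expression to match lots of different variations and misspellings in a single line, e.g. (mybrand|my brand|mybrnd|my brnd|mybrand\.com). Escape the dot in the domain name: an unescaped “.com” matches any character followed by “com”, so it would also catch non-brand queries like “best computers”.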
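Here is a hedged Python sketch of the same brand/non-brand split. For a simple alternation like this, Python’s `re` module should behave the same way as a Google Analytics regex filter, but the brand variants are placeholders and the function names are hypothetical. The `NAIVE` pattern reproduces the unescaped-dot pitfall so you can see it misfire.

```python
import re

# Placeholder brand variants; swap in your own name and misspellings.
BRAND = re.compile(r"mybrand|my brand|mybrnd|my brnd|mybrand\.com", re.IGNORECASE)
# Same list with an unescaped ".com": "." matches any character.
NAIVE = re.compile(r"mybrand|my brand|mybrnd|my brnd|.com", re.IGNORECASE)


def segment(keyword: str, pattern: re.Pattern = BRAND) -> str:
    """Label a search query as brand or non-brand."""
    return "brand" if pattern.search(keyword) else "non-brand"


for kw in ["mybrand.com", "My Brand shoes", "basketball shoes", "best computers"]:
    print(f"{kw!r}: {segment(kw)} (naive: {segment(kw, NAIVE)})")
```

With the naive pattern, “best computers” is labeled as a brand query because “ com” matches `.com`, which inflates brand traffic and hides non-brand performance.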

If your brand is strong enough that people search for you by name, that’s great, but you should expect to dominate the SERPs for your brand. Doing well here doesn’t tell you much about how well you’re doing at SEO.

Non-brand Keywords: As the name implies, this is every non-branded keyword that drove traffic to your site, and defines the real opportunity in organic search. A sporting goods store is doing well when they start getting traffic for “basketball shoes”, “football jerseys”, etc., rather than just their store name.


Organic Search Traffic

Last, but certainly not least, you want to look at the traffic you’re getting from organic search.  I prefer looking at data year-over-year, because it corrects for seasonal differences that skew data when you look month-over-month.

You can get as granular with your segmentation as you like, but often it’s enough to look at traffic segmented by the keyword groups we outlined, above:

  • Google Brand Keywords
  • Google Non-brand Keywords
  • Bing Brand Keywords
  • Bing Non-brand Keywords
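Because every segment is compared with the same period a year earlier, the calculation itself is simple. A minimal Python sketch; the segment names follow the list above, but the visit counts are invented placeholders, not data from any real site:

```python
def yoy_change(this_year: float, last_year: float) -> float:
    """Year-over-year change as a fraction (0.25 means +25%)."""
    if last_year == 0:
        raise ValueError("no baseline for last year")
    return (this_year - last_year) / last_year


# Hypothetical organic visits for the same month in two consecutive years: (this year, last year)
visits = {
    "Google Brand": (12_000, 11_500),
    "Google Non-brand": (48_000, 39_000),
    "Bing Brand": (1_500, 1_450),
    "Bing Non-brand": (5_200, 4_800),
}
for seg, (now, before) in visits.items():
    print(f"{seg:<18} {yoy_change(now, before):+.1%}")
```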


Interpreting the Data

Let’s take another look at the example SEO Health Check data:

SEO Health Check Dashboard with Dummy Data

Total Site URLs:  This shows that the site grew substantially over the past year, adding 61% more pages (launching new content pages, developing new products, etc).

Indexed URLs:  The number of URLs that engines have indexed also increased substantially, growing by almost a quarter.

% of URLs Indexed:  This is an interesting metric.  Despite the fact that the number of Indexed URLs grew by 24% year-over-year, the rate of increase in indexation is not keeping pace with the growth of the site itself in new pages published.  The number of URLs is increasing faster than the authority of the site is able to drive indexation.  This points to an opportunity to engage in content marketing and link building to get more of those new pages into the index, ranked, and earning traffic, as well as improving the flow of link authority through improving information architecture.
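To see why 24% growth in indexed URLs can still mean a falling indexation rate, note that the share of URLs indexed scales by (1 + indexed-URL growth) / (1 + total-URL growth). A quick Python check using the 61% and 24% figures from the dummy data (the function name is mine, for illustration):

```python
def indexed_share_change(url_growth: float, indexed_growth: float) -> float:
    """Relative YoY change in the % of URLs indexed, given each metric's growth rate."""
    return (1 + indexed_growth) / (1 + url_growth) - 1


change = indexed_share_change(url_growth=0.61, indexed_growth=0.24)
print(f"{change:+.1%}")  # -23.0%
```

In other words, whatever share of URLs was indexed last year, that share is now about 23% lower in relative terms; a hypothetical 60% would fall to roughly 46%.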

Landing Pages:  The total number of landing pages has increased modestly year-over-year.  However, you also want to be very aware of the traffic distribution amongst these landing pages.  It’s perfectly natural to have a handful of pages account for a relatively large percentage of organic search traffic (e.g., the home page and 4 category pages account for 10% of total organic search visits).  However, the more even your traffic distribution is amongst your pages, the more defensible your traffic is likely to be over the long term.  You want to be very wary of having too much of your traffic dependent on a handful of keyword rankings, which can change at a moment’s notice.
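One simple way to keep an eye on traffic distribution is the share of organic visits that land on your top N pages. A minimal Python sketch; the page names and visit counts are hypothetical, chosen to reproduce the “home page plus 4 category pages account for 10%” example:

```python
def top_n_share(visits_by_page: dict[str, int], n: int = 5) -> float:
    """Fraction of organic landing-page visits captured by the n busiest pages."""
    total = sum(visits_by_page.values())
    if total == 0:
        return 0.0
    top = sorted(visits_by_page.values(), reverse=True)[:n]
    return sum(top) / total


pages = {"/": 400}
pages.update({f"/category-{i}": 150 for i in range(4)})
pages.update({f"/article-{i}": 100 for i in range(90)})
print(f"{top_n_share(pages):.0%}")  # 10%
```

If this share creeps upward from one period to the next, your traffic is becoming more dependent on a handful of rankings.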

Landing Pages vs Indexed Pages: This is a further refinement of the Indexation metrics, above. Getting a URL in the index is an important first step, but that page has to rank highly enough that a searcher actually sees it and clicks on it for it to be of any value. As we saw previously, despite the total number of landing pages increasing, the rate of increase is not keeping pace with the rate at which new pages are being indexed. Lack of link authority is likely the culprit.

Brand Keywords: The number of brand-related queries people are using to find the site is almost steady-state year-over-year, which is perfectly fine, and would be expected (unless you had done a major rebranding).

Non-brand Keywords:   We’re seeing a healthy gain here in the number of generic keywords that people are using to find our site, which is a very positive sign.  This indicates a healthy presence in the long tail of search, which is vital to sustaining long-term growth in organic search.

Google Traffic – Brand:  Using the Custom Segment you built for Brand Keywords, you see that Google is driving a modest increase in traffic year-over-year.  How you interpret this metric is highly dependent on how much effort and money you devote to brand building.  If you’ve invested a huge amount in awareness marketing via online banner advertising, television, or the like, you would hope for a much larger increase.  If you don’t invest in brand-building, then numbers that are about even year-over-year would be expected.

Google Traffic – Non-brand:  This is what we’re really after.  Seeing a healthy increase here indicates that we are making real forward progress with our SEO – indexation is driving ranking, which in turn drives traffic.


The metrics that we discussed take some work to generate initially, but once you get them set up, pulling them on a bi-weekly or monthly basis is a trivial amount of work. Importantly, they sit at a level of granularity that anyone on the product or marketing teams will be able to understand, given a brief introduction to the core concepts (though this level of detail will most likely not be shared with the C-suite). Most importantly, the SEO Health Check surfaces data that give you meaningful insights into your site’s progress in SEO.

The SEO Framework Part 1: I.R.T.A.

One of the challenges of managing an SEO program effectively is getting colleagues across your organization to understand, at the highest level, what the outcomes of a well-executed SEO campaign are. What defines success? The answer is I.R.T.A.

IRTA stands for:

• Indexation
• Ranking
• Traffic
• Actions

IRTA is the most high-level view of your site’s progress in organic search – the “stratosphere” view, above even the “10,000 foot” view. The particular genius of I.R.T.A. is that it applies to any web site, in any industry, operating with any monetization structure, and is therefore a universally useful mental model for web businesses. If your IRTA is good, your web marketing channel is performing well, no matter if you’re selling industrial equipment, or monetizing content via CPM advertising.

Once you dive beneath this “stratosphere” view, things get complex very quickly, and metrics that can seem obtuse or arcane to an executive-level audience start popping up, potentially putting a roadblock between you and your audience.

Focusing on IRTA avoids that complexity, and helps your team come away from a meeting feeling that they learned something substantive about your site’s progress in organic search, rather than shutting down when confronted by a blinding spreadsheet full of metrics they don’t grok. If you can communicate the concepts behind IRTA to a C-suite audience, you’ve made a significant stride toward achieving a meaningful level of understanding of SEO in your organization. Let’s take a closer look:


Indexation

Indexation is the metric that tells you how many of your site’s URLs have been indexed by a search engine. The goal is to get as many of your site’s valid, valuable URLs into the index as possible. For example, if your site can render 100,000 valid URLs, and you have 80,000 of them in Google’s index, that’s very good.

High indexation numbers are important because organic search is like fishing with a drag net, where the size of your net is how many pages you have in the index: the bigger your net, the more fish you’re likely to catch.

Low indexation numbers communicate critical information, as well. Indexation levels are often closely tied to Information Architecture (IA). If you have a poor IA, or technical issues that are preventing engines from finding your pages, low indexation numbers are a canary in the coal mine telling you that your site requires technical SEO work.

Additionally, low indexation numbers can occur if your site has insufficient link authority. Engines may be able to find your content, but don’t consider it “important” enough to warrant the cost of keeping it in their indexes. Again, this is a very useful indicator that points to a need for content marketing activity to increase your site’s authority.


Ranking

Once you have your pages in the index, you have to get them in front of human visitors – they have to rank. Ranking for a given query is a function of your page’s relevance for that query, in addition to how authoritative that page is. If your site gets traffic from a very large number of keywords, that’s a very positive sign – it shows that you’re producing valuable, on-topic content (relevance), that is seen as trustworthy and important (authority).

There are lots of different ways to measure ranking, and only a few of them are likely to help you make meaningful progress toward your business goals. In a subsequent post, I’ll provide specific recommendations for metrics that you can use to track this at an aggregate level, and avoid the trap of focusing too much energy on a tiny handful of keyword rankings, when many thousands of keywords are driving traffic to your site.


Traffic

Once a URL is ranking for a visitor’s query, you need to entice the click. Solid on-page SEO provides good relevance signals to engines, and also to potential visitors. You want to make sure that your title tags and meta descriptions contain important keywords, and are also readable and compelling. They’re marketing copy, and you need to use that space wisely.

In a subsequent post, I’ll outline some quick segments that you can use to slice traffic and determine where the opportunities are for your business.


Actions

Increasing organic search traffic year-over-year is a great sign that you’re doing some things right in SEO, but traffic is not an end in and of itself. Actions are the name of the game. You want a visitor to download a whitepaper, sign up for a webinar, turn extra pages, share your blog post via social media, buy a product.

Focusing on actions makes you pay close attention not just to the amount of traffic that you’re getting, but to the quality of that traffic, as well. And if you’re getting highly relevant traffic but still not seeing the business outcomes you expect, you need to start thinking critically about your site’s usability and about conversion rate optimization testing. Once you have a steady flow of qualified organic search visits to your pages, conversion rate optimization can be the difference between revenue numbers that are flat year-over-year and double-digit growth.
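A back-of-the-envelope illustration of that last point, with made-up numbers: if organic visits stay flat, actions scale in direct proportion to conversion rate, so a small absolute lift can translate into double-digit growth.

```python
def actions(visits: int, conversion_rate: float) -> float:
    """Expected number of actions (conversions) from a volume of visits."""
    return visits * conversion_rate


visits = 50_000                      # hypothetical, flat year-over-year
last_year = actions(visits, 0.020)   # hypothetical 2.0% conversion rate
this_year = actions(visits, 0.023)   # hypothetical 2.3% after CRO testing
print(f"{this_year / last_year - 1:+.0%}")  # +15%
```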

By walking your team through these concepts, and explaining their importance, IRTA becomes a shorthand for “what does that massive spreadsheet full of metrics tell us about our progress in organic search?” Every SEO will remember the first time that their CEO asks them “how’s our IRTA?” Finally, you’re all speaking the same language.

*Hat tip to Derrick Wheeler of Microsoft, the guy I learned IRTA from way back in the day.