Google and the Tip of the AI Iceberg

I was surprised by the amount of surprise in the technology community when Google recently announced that its RankBrain artificial intelligence (AI) is being used to help understand natural language queries and serve results.  This should come as a surprise to no one.  Google has always been an AI company.

This statement from Larry Page is about as clear as it gets (emphasis mine).

Around 2002 I attended a small party for Google—before its IPO, when it only focused on search. I struck up a conversation with Larry Page, Google’s brilliant cofounder… “Larry, I still don’t get it. There are so many search companies. Web search, for free? Where does that get you?”… But Page’s reply has always stuck with me: “Oh, we’re really making an AI.”

For most people, the term “AI” tends to conjure up ideas of an all-knowing computer program that understands humans completely, and returns answers in a context that is immediately understandable to humans, like the Star Trek computer.  At SXSW in 2013, Google’s Head of Search, Amit Singhal, said, “The destiny of search is to become that ‘Star Trek’ computer and that’s what we are building.” In fact, Singhal recently demoed a Star Trek-like lapel pin that interacts via voice with Google Now.

Even if you don’t take public statements and product demos very seriously, you can simply look at where Google has been investing in creating IP.  Here’s a count of research papers published by Google.  You’ll notice that Artificial Intelligence & Machine Learning has 143% more published papers than Information Retrieval & the Web, the area at the core of traditional approaches to search.

[Image: Google research papers by topic]

However, despite this massive amount of AI and machine learning (ML) work being done at Google, it’s only in the last year that RankBrain has been used to help field queries.

Google has been working actively on AI & ML in relation to search for a long time now (in Internet years).  In 2008, Anand Rajaraman had a discussion with Peter Norvig (former Director of Search Quality at Google and author of Artificial Intelligence: A Modern Approach) about ML (emphasis mine).

“The big surprise is that Google still uses the manually-crafted formula for its search results. They haven’t cut over to the machine learned model yet. Peter suggests two reasons for this. The first is hubris: the human experts who created the algorithm believe they can do better than a machine-learned model. The second reason is more interesting. Google’s search team worries that machine-learned models may be susceptible to catastrophic errors on searches that look very different from the training data. They believe the manually crafted model is less susceptible to such catastrophic errors on unforeseen query types.”

And this has been corroborated by other sources.  A former Google Search Quality engineer had this to say on Quora in 2011 (emphasis mine):

[Image: Quora answer on Google search quality and machine learning]

I want to call out this sentence in particular (emphasis mine): “In a machine learning system, it’s hard to explain and ascertain why a particular search result ranks more highly than another result for a given query. The explainability of a certain decision can be fairly elusive.”

This is a result of how machine learning models work.  ML models optimize for accurate predictions, and don’t care much about why they are accurate.  From HBR:

“One important difference from traditional statistics is that you’re not focused on causality in machine learning.”

This is very important.  ML models are just interested in the best result, not an understandable explanation of how each ingredient in the recipe contributes to making that recipe so delicious.

ML models decide what variables to use, and sometimes build their own variables called “features” in order to make better predictions.

“Think of “feature extraction” as the process of figuring out what variables the model will use. Sometimes this can simply mean dumping all the raw data straight in, but many machine learning techniques can build new variables — called “features” — which can aggregate important signals that are spread out over many variables in the raw data. In this case the signal would be too diluted to have an effect without feature extraction.”
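Here’s a minimal sketch of that idea in Python.  Everything in it is made up for illustration: the hidden signal, the 20 noisy raw variables, and the function names are all my assumptions, and simple averaging stands in for whatever a real ML technique would learn on its own.  It is not how Google or any particular library does feature extraction.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def make_rows(n_rows=2000, n_vars=20, noise=3.0, seed=0):
    """Hypothetical data: a hidden signal, plus raw variables that each
    carry a heavily diluted (noisy) copy of it."""
    rng = random.Random(seed)
    signal = [rng.gauss(0, 1) for _ in range(n_rows)]
    rows = [[s + rng.gauss(0, noise) for _ in range(n_vars)] for s in signal]
    return signal, rows


def extract_feature(row):
    """The built 'feature': the average of all raw variables in a row.
    (Hand-picked here; a real technique would learn the combination.)"""
    return sum(row) / len(row)


signal, rows = make_rows()

# How well does each raw variable, taken alone, track the hidden signal?
raw_corrs = [
    pearson([row[j] for row in rows], signal) for j in range(len(rows[0]))
]

# How well does the single aggregated feature track it?
feature_corr = pearson([extract_feature(row) for row in rows], signal)

print(f"best single raw variable: {max(raw_corrs):.2f}")
print(f"aggregated feature:       {feature_corr:.2f}")
```

No single raw variable tracks the signal very well, but the aggregated feature tracks it much more closely.  Notice what the model ends up relying on, though: a synthetic number that doesn’t correspond to anything a human put in.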

This leads to the “black box” problem that Edmond Lau, the former Google engineer quoted above, pointed out.  ML models can build their own synthetic metrics, and do not explain causally why a particular combination of metrics leads to a better result.

From Amit Singhal’s perspective, this leads to a team that lacks “direct control” of its own algorithm, which makes it harder to intentionally shape its direction based on inputs that make sense to humans.

Fundamentally, this is because machines do not “think” in the way that humans do.  The problem with the label “Artificial Intelligence” and batting around references to the Star Trek computer is that we anthropomorphize these computer systems and want them to be like human minds.  We created them, so in a way they’re just mimicking us, right?

Wrong, of course.  Machines execute a series of operations until the program tells them to stop.  Machines don’t “think” the way we do, but that won’t stop them from doing things that astound us. What an AI needs is massive data to learn from, and massive computing power to crunch it all, and Google has both.

Andrew Ng has taught AI at Stanford, built AI at Google, and then moved to Baidu to continue developing AI. He recently said:

“When machines have so much muscle behind them that we no longer understand how they came up with a novel move or conclusion, we will see more and more what look like sparks of brilliance emanating from machines.”

Right now, Google’s AI isn’t really being given that chance.  It’s essentially being asked to work clean-up duty on all the completely new / novel and hard-to-understand queries that Google sees in a given day.  From Bloomberg:

“The system helps Mountain View, California-based Google deal with the 15 percent of queries a day it gets which its systems have never seen before.”

This isn’t glamorous work, but it doesn’t need to be.  The AI doesn’t care.  What it does is learn.  A “learning machine” creates a positive feedback loop (as it learns, it discovers ways to accelerate learning). From Demis Hassabis, CEO of deep learning company DeepMind (acquired by Google in 2014 for $400MM):

“I also think the only path to developing really powerful AI would be to use this unstructured information. It’s also called unsupervised learning— you just give it data and it learns by itself what to do with it, what the structure is, what the insights are. We are only interested in that kind of AI.”

RankBrain is just the tip of the AI iceberg. As we’ve seen, Google has thought of itself as an AI company from the beginning, but they’ve been cautious in their use of AI. And we know that their ambitions are toward a much more powerful and self-directed AI. The fact that RankBrain has advanced sufficiently to be included in Google’s ranking algorithm is a big step, but it’s just the beginning.

Google is in a uniquely powerful position in the AI field. They have massive data and massive computing power. Put the two together and very interesting things can happen. It may seem like a big leap to get from typing queries into a search box to speaking to an omniscient Star Trek computer that understands every word we say and the context around it. But we’re much closer than you think.

What the Healthcare.gov Disaster Means for Your Website

Putting politics aside, the launch of the Healthcare.gov website has been a train wreck. I’ve been building and marketing websites for a dozen years, and I have never seen a launch go so spectacularly, publicly wrong. This is partly due to the highly visible and contentious nature of the Affordable Care Act. But more interestingly, this marks the first time in my memory that there was an implicit assumption that a big, complex program and a website were essentially the same thing.

If:   Healthcare.gov = Affordable Care Act

Then: Healthcare.gov is broken = Affordable Care Act is broken

This logic is fundamentally flawed, but that doesn’t stop it from being the dominant perception in the marketplace.  In fact, if you look at the relative popularity of these terms on social media, “healthcare.gov” gets 2.6 times as many tweets per day as the name of the program itself, “affordable care act”, and Google Trends shows the terms neck-and-neck in popularity:

[Image: Topsy Analytics for healthcare.gov and affordable care act]

[Image: Google Trends data for healthcare.gov and affordable care act]

What this marks is a shift in the mindset of the American public – your website is not a part of your business, it is your business.  If your website doesn’t represent your company well, or even worse, doesn’t function smoothly, the bad smell doesn’t just hover over your website, but over your business as a whole.

But that knowledge comes with a huge upside.  As a website owner, you are in a uniquely powerful position – no one in the world knows your customers better than you do.  You will have some or all of the following ingredients at your disposal:

  • Customer knowledge
    • Direct customer experience and feedback
    • Your sales and support staff who talk with customers every day
  • Keyword data
    • Years of search keyword data specific to your audience finding your website
    • Social media keyword data
    • Keyword data from publicly available tools
  • Content performance
    • Engagement and conversion metrics of content you’ve published over the years
    • Analysis of competitors’ content marketing and social media campaigns
  • Conversion path knowledge
    • Knowledge of the specific steps involved in your customers’ journey
    • Knowledge of the common stumbling blocks your customers encounter
    • The ability to proactively present solutions to problems that customers may experience

These ingredients should be combined and recombined with user personas to continually refine the kinds of content and services that you offer to address changing audience needs.  If a piece of content does not directly satisfy one of the primary needs of your core audience, you should question whether it deserves to be prioritized.

Create content that solves user needs as its first priority, and then rigorously QA and analyze conversion bottlenecks to ensure that customers aren’t encountering barriers along the conversion path.

Granted, if your website fails at providing a good user experience and at efficiently delivering its primary value to your customers, you’re not likely to have your failure broadcast on The Daily Show.

People will just quietly click the Back button, and find your competitors.

[Image: Daily Show tweet about healthcare.gov issues]

This is a re-post of my Forbes article.

The Cobbler’s Son

The saying “the cobbler’s son has no shoes” is so overwrought that I feel like an idiot even bringing it up.  But, here we are, all the same.  This is my site.  Woof.  When I stop wallowing in all my free time, I will post more, make it better, do more push-ups, etc.

I work in online marketing.  Specifically, I’ve spent the last seven years learning and practicing SEO.  It’s been pretty great, actually.  Outside of the SEO bubble that I live in, I don’t know many other people who can say that they love what they do, and mean it.

You can find out more about me here:

http://www.linkedin.com/in/ethanhays