Making the Leap from Big Data to Big Knowledge

Arbel Deutsch Peled, Director of Core Technology

These days, it's rare to go an entire hour without hearing the word “data” thrown around. Some say data has become the most important commodity in the world. Indeed, the ripples of its effect on society can be found in almost any field from entertainment to transportation to sports to healthcare. It is also a tremendous driving force in online retail, enabling e-commerce sites to boost personalization, suggestions, ad targeting, and search quality.

And then there’s the “big,” as in: “big data.” The sheer abundance of data has become a cornerstone of cutting-edge technology. It is processed in many ways, often with remarkable results. Big data combined with machine learning has led to human-like, and in some cases better-than-human, performance, even on tasks in which humans seem to have an inherent advantage, like facial recognition.

This puts titans like Google, Facebook and others ahead of the game, each within the market sectors they control. As the go-to provider of a specific service for most of the population, they are able to collect vast amounts of data that are simply not available to anyone else. They harness this data to sharpen their already existing edge, and further entrench their leadership position.

Knowledge Is Power

But here is the twist: it turns out that while some specialized tasks, like identifying brain cancer, inherently require vast amounts of data, other tasks can be performed just as well with knowledge instead of data. What exactly is knowledge? There is no single agreed-upon definition, but here’s the key: data is used to generate knowledge. One example can be found in the e-commerce realm, when trying to determine complementary products. A common approach is to log transactions across many products and, over time, identify which products tend to be bought together, so that they can be suggested to later shoppers. These learned connections are a form of knowledge.
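The transaction-logging approach described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the function names (co_purchase_counts, complements) and the sample orders are made up for this example, and a real recommender would use far more data and more careful statistics.

```python
from collections import Counter
from itertools import combinations

def co_purchase_counts(transactions):
    """Count how often each pair of products appears in the same order."""
    pairs = Counter()
    for basket in transactions:
        # sorted(set(...)) gives each unordered pair a single canonical form
        for a, b in combinations(sorted(set(basket)), 2):
            pairs[(a, b)] += 1
    return pairs

def complements(product, pairs, top_n=3):
    """Rank other products by how often they were bought with `product`."""
    scores = Counter()
    for (a, b), count in pairs.items():
        if a == product:
            scores[b] += count
        elif b == product:
            scores[a] += count
    return [name for name, _ in scores.most_common(top_n)]

# Made-up orders, for illustration only
transactions = [
    ["elegant pants", "dress shirt"],
    ["elegant pants", "black belt", "dress shirt"],
    ["basic t-shirt", "elegant pants"],
]
pairs = co_purchase_counts(transactions)
print(complements("elegant pants", pairs))
```

With these sample orders, the dress shirt ranks first because it shares two orders with the pants. The counts say nothing about why the items go together, only that they were bought together.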

However, a human sales associate who needs to suggest a complementary product to a customer wouldn’t go about it the same way. The salesperson wouldn't need to know that 10% of people who bought a basic t-shirt in blue bought similar shirts in different colors, or that 12% of people who bought a pair of elegant pants also bought a dress shirt or a black belt to go along with it. People can generalize better. They can suggest other articles of clothing or accessories that match a particular outfit, or direct people with a certain taste to a particular collection that would suit their style. These are all valuable pieces of knowledge that are vastly more precious than data, in the same way that experience is sometimes more important than academic background.

And yet, currently, search engines are largely driven by big data, not by knowledge. As an initial step, they link keywords in search queries to the keywords in product descriptions. Then they link query keywords to behavioral data indicating what products people ended up clicking on. When enough behavioral data is accumulated, search engines can draw connections between specific keywords and specific products in the catalogs, and then boost these products to the top of the results page for similar queries.

For example, when people search for “smartphone with big screen,” most are likely to click on smartphones with screens that they perceive as big. Over time, as the engine collects enough data on the products selected, it will display the most commonly selected products first on the results page for similar queries.
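As a rough sketch of the two steps just described, the toy engine below first matches query keywords against product descriptions, then boosts products that were clicked for the same query. Everything here is an assumption made for illustration: the catalog, the product IDs, the function names, and the simplification that “similar queries” means queries with exactly the same set of keywords.

```python
from collections import Counter, defaultdict

# Hypothetical three-product catalog; IDs and descriptions are made up.
catalog = {
    "p1": "smartphone with 5.4 inch screen",
    "p2": "smartphone with 6.7 inch screen",
    "p3": "laptop with 15 inch screen",
}

# clicks[set of query keywords][product_id] -> number of recorded clicks
clicks = defaultdict(Counter)

def keywords(text):
    return frozenset(text.lower().split())

def record_click(query, product_id):
    clicks[keywords(query)][product_id] += 1

def search(query):
    query_words = keywords(query)
    ranked = []
    for product_id, description in catalog.items():
        overlap = len(query_words & keywords(description))
        if overlap == 0:
            continue  # no shared keyword, so the product is not a candidate
        popularity = clicks[query_words][product_id]
        # Most-clicked first, then best keyword overlap, then ID for stability
        ranked.append((-popularity, -overlap, product_id))
    return [product_id for _, _, product_id in sorted(ranked)]

print(search("smartphone with big screen"))  # ranked by keyword overlap only
for _ in range(3):
    record_click("smartphone with big screen", "p2")
print(search("smartphone with big screen"))  # p2 is now boosted to the top
print(search("laptop with big screen"))      # no click data for this query yet
```

Note that the word “big” never matches anything in the catalog. The engine only learns which phone people mean by it after enough clicks accumulate, and that learning does not carry over to the laptop query.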

But, unfortunately, this approach is not good enough.

Why?

The Data Is Not Dynamic

What counts as a “big” screen today will not count as a “big” screen in the future: the term is relative. Using yesterday’s consumer selections to predict tomorrow’s customer intent is not reliable. Not to mention that product catalogs change all the time. Quality search over relative terms like “big” or “affordable,” over rapidly changing collections, and over evolving product lines should rely on knowledge. Big data from the past can be lacking at best and misleading at worst.
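One way to picture a knowledge-based reading of “big” is as a rule evaluated against the current catalog rather than against past clicks. The sketch below assumes, purely for illustration, that “big” means “at or above the 75th percentile of what is on sale right now.” The function name is_big, the threshold, and the screen sizes are all made up.

```python
from statistics import quantiles

def is_big(value, catalog_values, percentile=75):
    """Return True if `value` is at or above the given percentile of the catalog."""
    # quantiles(..., n=100) returns 99 cut points; index p - 1 is the p-th percentile
    cut_points = quantiles(catalog_values, n=100, method="inclusive")
    return value >= cut_points[percentile - 1]

# Made-up screen sizes in inches, for illustration only
older_phones = [3.5, 4.0, 4.3, 4.8, 5.0]
current_phones = [5.4, 6.1, 6.4, 6.7, 6.9]
current_laptops = [13.3, 14.0, 15.6, 16.0, 17.3]

print(is_big(5.0, older_phones))      # a 5.0-inch phone in the older catalog
print(is_big(5.0, current_phones))    # the same phone in the current catalog
print(is_big(17.3, current_laptops))  # the same rule applied to laptops
```

Because the rule is recomputed from whatever the catalog contains, it keeps up as product lines change, and the same definition applies to a new product category without collecting any click data first.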

The Data Can’t Be Generalized

Imagine you wanted to optimize search for the query “smartphone with big screen.” You would need many, probably thousands, of behavioral data points on similar queries to make meaningful connections with preferred products in your catalog. Let’s say you have enough data. Now you want to optimize for “laptop with a big screen.” You would have to collect the same amount of data on the new query even though the two share commonalities. Search engines can’t make the inferences that a person would between a “big” smartphone screen and a “big” laptop screen. They can’t generalize, and so they do not use data efficiently.

The Data Is Messy

“Big” might mean one thing to some people and another thing to others. The order in which products are displayed on the results page strongly influences what people click on, whether or not those products match the consumer’s original intent. Behavioral data, therefore, can be messy and misleading. Relying on knowledge avoids that problem.

The Data Is Not Semantic

Search engines view words as discrete pieces of data. They don’t draw meaningful connections between them or read between the lines. When you search for “notebook laptop,” you are looking for a laptop, not a paper notebook, but search engines can’t tell which of the two product types you are after. When you search for “dress under 50,” you mean $50, not size 50. But your search engine will be frazzled and end up ignoring the part it didn’t understand. If the search engine can’t tell what people are looking for, users won’t stick around to click further. Not to mention that today, people tend to use brief, choppy queries, leaving out connecting words and normal sentence structures. However, this will change as voice search captures its projected share of the market. In the future, search will have to be semantic to be right.
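To make the “dress under 50” case concrete, here is a small sketch that reads the phrase as a price limit instead of dropping it. It is a hand-written rule, not real language understanding: the pattern, the function name parse_query, and the assumption that “under N” always refers to price are all illustrative choices for a clothing store.

```python
import re

# Assumption: in this store, "under/below/less than N" is a price limit
PRICE_LIMIT = re.compile(
    r"\b(?:under|below|less than)\s+\$?(\d+(?:\.\d+)?)\b",
    re.IGNORECASE,
)

def parse_query(query):
    """Split a query into plain keywords and an optional maximum price."""
    match = PRICE_LIMIT.search(query)
    if match is None:
        return {"keywords": query.split(), "max_price": None}
    # Remove the price phrase and keep the remaining words as keywords
    remaining = query[:match.start()] + " " + query[match.end():]
    return {"keywords": remaining.split(), "max_price": float(match.group(1))}

print(parse_query("dress under 50"))
print(parse_query("dress under $50"))
print(parse_query("notebook laptop"))
```

The rule encodes one piece of knowledge a sales associate already has. It does nothing for “notebook laptop,” which needs a different kind of knowledge, namely that “notebook” is also a name for a type of laptop.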

So how can e-commerce make the great leap from big data to knowledge?

By leveraging technology that is available today, search engines can bring consumers the best of both worlds, combining the power of big data with the dynamic knowledge one would expect from a human sales associate. Through Natural Language Processing technology, search engines can extract meaning from customers’ search queries. Those queries can in turn be added to the search engines’ knowledge banks, generating illuminating insights into what shoppers actually want. Bringing products and shoppers onto the same platform of knowledge will enable the next great revolution in e-commerce. Shoppers and businesses alike will reap the benefits.