Q&A with Vanja Josifovski, CTO of Pinterest

Amir Konigsberg, Co-Founder & CEO, Twiggle

As CTO and VP Engineering, Vanja Josifovski leads the Search, Discovery and Machine Learning teams powering the user experience at Pinterest. Twiggle CEO Amir Konigsberg sits down with the search veteran to discuss innovation, search and everything in between.

Amir: What are some specific challenges for search teams at Pinterest?

Vanja: Pinterest is the visual discovery engine where people search over a corpus that combines visual, textual and graph data. As such, it's important for us to develop a user experience that captures all three properties of the data, in both a product sense and a data sense. In the textual domain, our queries are different from the queries typically issued to search engines. We see far fewer navigational queries and many more broad searches that initiate a discovery experience. In fact, the majority of text searches on Pinterest are under three words. That's why technologies that power search personalization matter more to us than to traditional search engines, where query intent is stronger and narrower.

Amir: As the CTO at Pinterest, you lead various initiatives leveraging machine learning and deep networks. What excites you most about leveraging these technologies at Pinterest?

Vanja: Pinterest is advancing the state of the art in several areas of machine learning. For example, our visual search products are built on state-of-the-art supervised and unsupervised visual models. There are currently several approaches to blending graph data with neural networks, one of which was developed at Pinterest and advances the state of the art. Another type of innovation is our machine learning strategy, which enables relatively small teams of engineers to leverage this technology across the product. We've quickly bootstrapped new use cases and reused infrastructure by standardizing components such as training, serving, model debugging and experimentation. Not only does this make the user experience better for Pinners, it improves developer velocity and productivity.

Amir: Among social networks or communities, what unique value proposition does Pinterest offer to retailers?

Vanja: Pinterest is often categorized with social networks but the intention of our users is actually much different. Pinterest is the world’s first visual discovery engine, and we’re building a product that helps people try new things by showing them personal, useful and relevant ideas. Other apps might be where you go to share dinner party photos, but Pinterest is where you go to plan the dinner party. According to Nielsen, 98% of Pinners report trying new things they find on Pinterest, compared to an average of only 71% across other social media platforms. We find that people use Pinterest throughout the entire shopping process to find new ideas, refine their purchase criteria and make a decision. In fact, 90% of weekly active Pinners told us they make purchase decisions on the platform and 70% said they use it to find new products (GfK Path to Purchase study, Dec 2017). As a result, retailers can connect with people on Pinterest who are making a decision about what to do or buy next, but are often still undecided on which brand they want it from.

Amir: Pinterest is investing heavily in visual search. What do you think the future of text (or voice-to-text) search looks like?

Vanja: Visual and voice/text search will each be used in different scenarios. Voice search is great for pointed queries that can be answered quickly, such as asking about the weather or reordering paper towels, where relatively little information comes back to the user. The question is how far this technology will expand. It would be hard to consume an average Wikipedia article through voice, or to search for something that has a visual component. Visual search is suited to things that are hard to describe in text, such as interior design and fashion. We've been working on visual search at Pinterest for the last four years, but we're still just getting started. I feel there will be a flywheel of product-technology innovation here that will combine into experiences we don't have today and allow for a more natural integration into a holistic search experience for users.

Amir: At Twiggle, we use machine learning and natural language understanding to improve relevance and recall for text (or voice-to-text) search. Is optimizing for relevance and recall in image-based search different from text-based search?

Vanja: At some level, visual data is very different from textual data. It still amazes me that on the web, 2.7 words (on average) can produce such great results. That's because language and words carry so much semantic meaning. Visual data starts with pixels, which are basically two or three numbers and individually carry far less semantics. Thus, today's models use spatial transformations, such as convolutions, to produce a semantically richer representation that's then used for inference. However, I'd say the difference in modeling is mostly at the candidate generation level. At the ranking level, there are more similarities, especially in the head and torso of the query volume curve. As in text, signals such as query-result interaction counters and query-independent scores still rule. I don't know of a successful search and recommendations engine that can ignore them. Understanding a user's preferences in a simple and very powerful way is key to any final ranking layer, regardless of the underlying data.
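The spatial transformations Vanja describes can be illustrated with a minimal sketch. This is an illustrative toy, not Pinterest's actual models: a hand-crafted vertical-edge kernel stands in for the learned filters of a convolutional network, mapping raw pixel intensities to a feature map that carries more semantic information (here, the location of an edge).

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution: slide the kernel over the image and
    sum the elementwise products at each position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A 6x6 "image": dark pixels on the left, bright on the right,
# so there is a vertical edge between columns 2 and 3.
image = np.array([[0, 0, 0, 1, 1, 1]] * 6, dtype=float)

# A vertical-edge detector: responds where intensity increases
# from left to right (a stand-in for a learned filter).
kernel = np.array([[-1, 0, 1],
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

# The feature map peaks at the columns that straddle the edge,
# a richer representation than the raw pixel values.
features = conv2d(image, kernel)
print(features)
```

In a real convolutional network the kernels are learned from data and stacked in many layers, but the principle is the same: each layer aggregates spatially local pixel statistics into progressively more semantic features.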

Amir: Prior to joining Pinterest, you ran large-scale search projects at Google, Yahoo, and IBM. What makes search so hard, and what are the biggest misconceptions around it?

Vanja: In some ways, search has evolved significantly, while in other ways it has stagnated. The simple and powerful signals I mentioned previously brought search to the level that made it so useful on large-scale corpora, such as the web, and that changed the world. Scale was also solved in the early days. The next evolution in search will require much more complex reasoning and the use of a variety of data. I don't believe new machine learning paradigms will by themselves bring revolutionary changes. Instead, we need a much deeper understanding of the artifacts we search, as well as of the queries, intent and taste of a platform's users. Furthermore, both hardware and software advances will change both the context and the form of search. We are only at the beginning of this profound change in how people think about search.

Amir: My parents still have a black & white television that requires getting up from the sofa in order to change the channel. What about product discovery today will seem ridiculous to our children?

Vanja: I suspect a product change is needed for us to take full advantage of the mobile environment. Whether this is an evolution of current uses of voice and camera for search is yet to be seen. Product changes are in some ways much more difficult to get working, because they require changes in user behavior.