Why Haystack? Democratizing Search Relevance!


We’re proud to be putting on another Haystack in late April. But why? A few people have asked me why we need another search conference. What’s the purpose behind Haystack?

OSC’s mission is to empower search teams. Instead of having great ideas, practices, and tools locked up in obscure academic papers or behind closed doors and walled gardens, we want search to be open and free. If search is crucial to your bottom line, you can’t afford to outsource it. You need to understand the practices and principles yourself, take charge of the search experience, and really own what your users experience. But you can’t do it alone: mid-size companies need to work together as a community to find and explore the menu of approaches.

If search is crucial to your bottom line, no area is more crucial than search relevance. Understanding what your users are telling you with their search queries is hard enough. If you move from search to relevance more generally, it gets harder still: with the advent of chatbots, personalization, and other situational metadata (location, time, etc.), users expect search to ‘just work’. They expect “Simple Made Easy” without seeing all the complexity behind the scenes.

And no area has seemed more closed off, or harder for teams to build a practice around, than relevance. This is surprising. You would think search relevance problems would be well solved by now. Google has been around for decades. Why should it be so hard?

For two reasons:

  1. Academia has been primarily a job and research funnel for Web search giants. Only recently have broader use cases been looked at.
  2. Your application’s niche is more unique than you realize: the industry has broadened to cover a long tail of niches, use cases, and types of interactions, far more than academia could ever cover.

I like to picture this situation as a pyramid:

At the bottom of the pyramid are the many, many sites where investing in search or relevance won’t pay off. Many products exist to add a ‘good enough’ search experience to such a site. At the top is where Information Retrieval research has lived over the last several decades: a strong connection between large-scale Web search and academia.

In the middle is probably where you are if you’ve read this far. You have a search experience where investment is called for. Yet when you look at the legacy of search research, you don’t see much that helps you. The open source search engines help: they have some sane defaults for a search application, and they give you a deep toolbox with query primitives, scoring methods, analysis tools, and basic NLP. Yet there’s little guidance on what you should do for Your Application(™). Open source search is a framework, not a solution!

When you’re in this ‘middle class’ of applications, it’s tempting not to appreciate how specialized your niche is. It’s tempting instead to think of search as ‘one thing’ with a single relevance solution that ought to just plug in and work for everyone. Maybe you read an academic paper, sit through a conference talk, or hear a product pitch promising that One Cool Trick will make your search work ‘like Google’. It’s less satisfying to hear that you’ll have to do the hard work yourself: that your search shouldn’t work like Google or Amazon, and that it requires product experimentation and as much engineering as the rest of your product.

Luckily, that perception is changing. Organizations realize that search relevance (or some kind of ‘matching’ between users and [products|content|articles|jobs|etc…]) is very important. Last year’s Haystack was a testament to this: practitioners came together to share which solutions worked and which didn’t, to tell real stories about search and relevance, to discuss tools that could be open sourced for the community, and to share practices and tools that really make a difference.

Some highlights from past Haystacks

  • Measurement is hard, but foundational! Measuring search quality is perhaps the hardest problem to solve, but probably the most important. Teams often derive relevance judgments from analytics or human judges, only to find that those judgments don’t correlate with a successful A/B test. See Liz Haubert’s talk and Peter Fries’s.
  • Machine learning isn’t magic. It’s powerful, but for mid-sized companies it isn’t necessarily easier than any other method just because it’s machine learning. All of Haystack’s Learning to Rank talks have shared painful lessons learned by some of the smartest people I know.
  • Open sourcing is good for business and community! Kudos to Sease for open sourcing Rated Ranking Evaluator. By open sourcing Solr Learning to Rank, Bloomberg helped themselves: their learning to rank toolchain became everyone’s learning to rank toolchain, and thus more maintainable than if it were proprietary!
  • Decreasing the cost of experimentation is the silver bullet. The silver bullet for search isn’t one particular solution. It’s making it easier to experiment with solutions against our users’ quality data, so we can ‘fail fast’ with relevance ideas.
  • Taxonomies, taxonomies, taxonomies: for mid-sized companies, maintaining a taxonomy is a tried-and-true method that isn’t exciting but gets the job done for building semantic search capabilities (see Max Irwin’s talk).
  • Vectors, vectors, vectors: as more solutions come out for producing more accurate embeddings, search engines need to support vector similarity and filtering for use cases like image search or other embedding-based retrieval.

Getting practitioners together to share and discuss how we can improve these use cases was a powerful experience. Separately, mid-sized companies will wither and collapse under the weight of search giants like Google. Together, we can build practices and tools that help our employers make more competitive products.

In conclusion, the community badly needs your talks. We want to hear your real-life stories to help the community grow and learn about relevance! Please visit the Haystack site to submit a talk, or come prepared with a 5-minute lightning talk!

Submit your Talk!