External User-facing Search and Taxonomies

Search has become faster, cheaper and more intelligent since the days of inverted single-word engines, so why not just use a search engine? Why bother with taxonomy? Let’s briefly revisit what search is supposed to do. A search engine needs to make a pretty good guess at what a user wants to find, an unspoken intent expressed in staccato keywords. It then needs to match the user query with some content (documents, data, digital objects, people with expertise, ad serving, product information) so that the user can take some action such as read, buy, forward, share, comment, or browse. Sometimes the match is exact, sometimes the query terms are a partial match, and sometimes there is no match.

In other words, search is not a perfect art. The search engine is only one part of the equation; the quality of the content and of the query matter just as much. Good search needs good content, no matter how great the technology.

Someone responsible for search implementation has limited control over two of the key ingredients of search: the technology and the content. This is why taxonomy plays a role: it can help describe concepts that are not in the content or in the metadata about the content. (Metadata is particularly useful when digitizing non-digital objects.) Taxonomy is not always necessary. If you can write custom content with very precise vocabulary using Search Engine Optimization (SEO) techniques, you might not need a taxonomy. But some documents, such as emails or reports, cannot be altered; changing them would be a significant protocol violation, or even illegal. So when is search not enough?

1)  Developing effective measures to assess when search is not enough – the 80/20 rule

As part of some of my early work on faceted taxonomies, I spent some time at MIT on a research project that compared results from two setups: a system based on search engine technology alone, and one where the query could be enhanced by adding taxonomy terms. For this experiment we had the advantage of using a system, the brainchild of Wendi Pohs, in which two search engines using the same technology processed similar documents and served them to a user interface with a simple search box like Google’s. One engine processed news feeds. These feeds were added quickly with no intervention, loaded directly into the search engine. What our research found was that a search engine without a taxonomy, left unattended, flatlined: recall never improved beyond 75-80%. Lee Romero, a keen observer of search, has recently written an excellent blog post observing this same flatline phenomenon.

What to do when you want to do better than 80% and move the flatline

In the same experiment, we created a second engine that used the same software but had a taxonomy function, which let us insert taxonomy terms into the index. These terms were selected from query logs and analytics reports: unmatched terms, misspellings, and abbreviations. Adding taxonomy terms carried an added cost, but it had no impact on the speed or performance of search, since both setups used the same engine.
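A minimal sketch of what inserting taxonomy terms into an index can look like, assuming a toy in-memory inverted index. The function names, the sample documents and the variant table are all illustrative assumptions, not the system used in the experiment.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def add_taxonomy_terms(index, variants):
    """Point each variant (misspelling, abbreviation) at the documents
    already indexed under its preferred term."""
    for variant, preferred in variants.items():
        index[variant.lower()] |= index.get(preferred.lower(), set())
    return index

def search(index, query):
    """Return the ids of documents that match every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical sample documents.
DOCS = {
    1: "quarterly revenue report for the database division",
    2: "database administration guide",
}
# Hypothetical variants harvested from unmatched queries in the logs.
VARIANTS = {"db": "database", "databse": "database", "rev": "revenue"}

index = build_index(DOCS)
before = search(index, "db")        # no match: "db" is not in any document
add_taxonomy_terms(index, VARIANTS)
after = search(index, "db")         # now matches both database documents
```

Because the variants are resolved when the index is built, query-time lookups stay plain dictionary reads, which is one way to picture why the added terms need not slow search down.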

The taxonomy was divided into classes such as product, company, or subject. Each term was connected to other terms through user-defined cross-connections (associative terms), and the system was smart enough to use them to infer other relevant terms. At least one term in each linked set had to be tagged by an indexer. So if a product was tagged, we could infer that the product was <made_by> a company, which sped up the tagging process. Taggers could, by the way, override the automated suggestions and add new rules. This way we could ensure exhaustive indexing at low cost and effort. The taxonomy-controlled section paid back this effort: a search on this section would recall content that matched user query terms about 90% of the time. And the taxonomy-controlled part of the database could be improved over time. We also worked hard to acquire content, good content, in many formats that would improve the quality of the database and thus of what goes into the engine.
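The tag inference described above can be sketched roughly as follows. The term names, the relation table and the `infer_tags` function are hypothetical, and following relations transitively is an assumption made for illustration; the original system’s rules may have worked differently.

```python
# Hypothetical associative relations: term -> list of (relation, related term).
RELATIONS = {
    "WidgetPro": [("made_by", "Acme Corp")],
    "Acme Corp": [("industry", "Manufacturing")],
}

def infer_tags(manual_tags, relations, overrides=frozenset()):
    """Expand the tags applied by an indexer by following associative relations.

    Terms listed in `overrides` were rejected by the tagger, so they are
    neither added nor followed any further.
    """
    tags = set(manual_tags)
    queue = list(manual_tags)
    while queue:
        term = queue.pop()
        for _relation, related in relations.get(term, []):
            if related not in tags and related not in overrides:
                tags.add(related)
                queue.append(related)
    return tags

# The indexer tags only the product; the company and subject are inferred.
suggested = infer_tags({"WidgetPro"}, RELATIONS)

# The tagger overrides one automated suggestion.
corrected = infer_tags({"WidgetPro"}, RELATIONS, overrides={"Manufacturing"})
```

Here one manual tag yields three tags in total, which is the sense in which a single indexed term per linked set can produce exhaustive indexing.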

By using reports, tools and measurements, we were able to proactively add equivalents and monitor emerging terms. Dips in performance triggered action to understand what was changing in the user’s world: was it query terms, a search for emerging content, or other unmet needs?
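As a rough illustration of this kind of reporting, the sketch below computes a match rate from a query log, lists the most frequent unmatched queries, and flags week-over-week dips. The log format, the function names and the 5-point dip threshold are assumptions. Note that this measures how often a query returns anything at all, which is a simpler proxy than true recall.

```python
from collections import Counter

def match_rate(log):
    """Fraction of logged queries that returned at least one result."""
    if not log:
        return 0.0
    return sum(1 for _query, hits in log if hits > 0) / len(log)

def unmatched_terms(log, top_n=5):
    """Most frequent queries with zero results: candidates for new
    taxonomy terms or equivalents."""
    counts = Counter(query.lower() for query, hits in log if hits == 0)
    return counts.most_common(top_n)

def flag_dips(weekly_rates, threshold=0.05):
    """Indexes of weeks whose match rate fell by more than `threshold`
    compared with the previous week."""
    return [
        i for i in range(1, len(weekly_rates))
        if weekly_rates[i - 1] - weekly_rates[i] > threshold
    ]

# Hypothetical query log: (query text, number of results returned).
LOG = [
    ("swine flu", 12),
    ("h1n1", 0),
    ("H1N1", 0),
    ("flu vaccine", 4),
    ("tamiflu", 0),
]

rate = match_rate(LOG)
candidates = unmatched_terms(LOG)
dips = flag_dips([0.80, 0.79, 0.70, 0.78])
```

A flagged week is only a prompt to look at the unmatched queries for that period; deciding whether the cause is new vocabulary, missing content or something else still takes an editor.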

Errors were due to 1) missing content, 2) wrong application, 3) new terms or spelling errors that could be quickly added to the taxonomy, and 4) new and emerging trends that users were identifying but that had not yet been captured in the taxonomy. All of these issues could be identified and corrected. For example, in the recent flu season, search engines would eventually learn that H1N1 was the preferred term for Swine Flu, but in some cases it was much easier for a trained taxonomy editor to surgically make this connection (especially in a fast-moving news and business cycle). In a search-engine-only scenario, these errors are not always identifiable or actionable.

Set realistic goals and explanations for what taxonomy can do

ROI discussions often mean conversations that start or end with “Taxonomy can increase sales by improving conversion” or “lower costs.” Here are a few reasons that might be more honest and even more compelling:

Help with Ambiguity and Add Precision with Faceted Navigation: Search engines have a hard time differentiating between key concepts that share a term. I remember in the early days when a term like “ASK” would bring a search engine to its knees because it couldn’t tell the difference between the name of a system command and the name of a computer company. By sorting terms into facets, we could resolve ambiguity by navigating the user to the right facet and by tagging more precisely. A developer looking for information on Java applications shouldn’t be sent to Java the island. Taxonomy can help keep users searching down paths that are likely to lead to useful results. That’s productivity.
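A small sketch of how facets can separate the senses of an ambiguous term, assuming a hypothetical lookup table and facet names. A real faceted navigation tool would present these choices in the interface rather than through a function call.

```python
# Hypothetical faceted taxonomy: term -> list of (facet, concept) pairs.
FACETED_TERMS = {
    "java": [
        ("technology", "Java (programming language)"),
        ("geography", "Java (island)"),
    ],
    "ask": [
        ("command", "ASK (system command)"),
        ("company", "ASK (computer company)"),
    ],
}

def resolve(term, facet=None):
    """Return the concepts a term may refer to, optionally narrowed to one facet."""
    senses = FACETED_TERMS.get(term.lower(), [])
    if facet is None:
        return senses
    return [(f, concept) for f, concept in senses if f == facet]

# Without a facet the term is ambiguous; the user can be offered both senses.
all_senses = resolve("Java")

# Once the user is in the technology facet, only one sense remains.
developer_sense = resolve("Java", facet="technology")
```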

Implement Universal Search: A taxonomy can be implemented independently from the content, which means it can be used across content types (blogs, videos, email), creating a common set of concepts from which to generate user-centered search. That’s efficiency and smart use of limited resources. You need common metadata or RDF to take advantage of universal search, but there are standards such as Dublin Core that can help jumpstart that conversation.

Think Scalability and Reuse: A central, faceted taxonomy can be reused across applications. The best practice, however, is to create smaller taxonomies that are divided into homogeneous facets. Designing monolithic, spaghetti-like taxonomies will, in the end, create more work, produce bad inferences, and sour you on the whole project. Reuse and scalability avoid redundant effort. That’s cost savings.

Use Taxonomies to Manage Change: Since taxonomy is independent of the content, you can change the concepts in the taxonomy without impacting the content. Taxonomies are NOT static. For example, many organizations need to change organizational names. The old names can be subsumed in the taxonomy without impacting the existing content. It’s a safer and more secure way to handle change.
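One way to picture this, as a sketch under assumptions: documents are tagged with stable concept ids, and a rename only touches the taxonomy entry. The `Concept` class, the ids and the organization names below are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    concept_id: str
    preferred: str
    former_names: list = field(default_factory=list)

    def rename(self, new_name):
        """Make `new_name` the preferred label and keep the old one as an equivalent."""
        self.former_names.append(self.preferred)
        self.preferred = new_name

    def labels(self):
        """All lowercase labels, current and former, that should match this concept."""
        return {self.preferred.lower()} | {name.lower() for name in self.former_names}

def find_documents(query, concepts, doc_tags):
    """Ids of documents tagged with any concept whose current or former name matches."""
    q = query.lower()
    ids = {c.concept_id for c in concepts if q in c.labels()}
    return {doc_id for doc_id, tags in doc_tags.items() if ids & set(tags)}

# Documents carry concept ids, never the organization's name itself.
DOC_TAGS = {"memo-1": ["org-001"], "memo-2": ["org-002"]}
concepts = [Concept("org-001", "Research Division")]

# The organization is renamed; only the taxonomy entry changes.
concepts[0].rename("Innovation Lab")

by_new_name = find_documents("Innovation Lab", concepts, DOC_TAGS)
by_old_name = find_documents("research division", concepts, DOC_TAGS)
```

Both the new and the old name find the same memo, and `DOC_TAGS` is never modified, which matters for documents such as emails or reports that must not be altered.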

Create a technical and cost plan to integrate taxonomy while maintaining speed and performance, and not adding to overhead costs.

Implementing taxonomy within search can be done at various price points. A solution like Vivisimo is not within every budget, but I’ve found other low-cost and effective alternatives. Here are some technical considerations in adding taxonomy:

  • You don’t need a high-end faceted navigation tool to get the benefits of faceted navigation. Faceted navigation allows a user to narrow, broaden or expand a query at the time of search. This can be done in many CMS systems, including Drupal. WordPress, which is what I use for this blog, has a taxonomy module and allows multiple authors.
  • Add custom fields or metadata for tagging that can be loaded into the search engine to improve search (as Solr does).
  • If you have the budget and a requirement for high-throughput processing, as in auto-classification and text analytics with tools such as nStein, Teragram or Vivisimo, then taxonomy is still very useful for improving the precision of results and for making collections within document sets.

The bottom line is that whichever search engine you use, you should be confident that 80% of the time the user will get what they want. If you need to find ways to improve the user experience beyond that, taxonomy is one highly viable, low-cost and effective option. Taxonomy might be worth looking at as a way to insert a pacemaker into the heart of a search engine that seems to have flatlined.

Once you have a backbone with classy taxonomies and metadata, you can then proceed to the creative activity of beautiful designs of navigation paths for your end users.  But keep your eye  For more on search and taxonomies, see also my prior book review of Peter Morville’s  Search Patterns.

~ Marlene Rockmore
