A Well-Planned Taxonomy

Recently, I ran into a neighbor who is a VP at a high-tech firm working on speech recognition, so I asked if she was using taxonomies. Her answer: “To me, Tom Brady is a topic and that’s enough. It’s too much work to build hierarchies.” But for me, there is way too much information about Tom Brady. I’d like to be able to find information based on Tom Brady’s statistics, or how he is managed, or maybe something about his social life.

Taxonomies are not just about hierarchies or long lists of terms. Taxonomies exist to capture how users look for information. For example, if I am interested in “Food Policy”, I might want to know where food is produced, what is added to food (food additives), how food is distributed, and where food is needed to prevent hunger, including local food banks.

A taxonomy term has to be categorized to have any meaning. The process of sorting terms into categories, or facets, is called facet analysis, and here’s why it’s necessary:

  • Reduces thousands of terms to a smaller set of manageable categories
  • Provides semantic, contextual meaning for a term, including the power to disambiguate terms
  • Allows connections to be made between categories that can be inherited (but carefully)
  • Provides the ability to recognize gaps in information
  • Provides the ability to reuse concepts for multiple applications, or to identify local variations of a vocabulary
  • Provides the ability to focus on important topics
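Two of these benefits, reducing a flat list to a few categories and recognizing gaps, can be checked mechanically once terms are assigned to facets. The sketch below assumes a taxonomy stored as a plain mapping from facet name to a set of terms; the facet names and terms are hypothetical, borrowed loosely from the “Food Policy” example, and the "fewer than two terms" threshold is an arbitrary choice. It flags terms that belong to no facet and facets that are still thin.

```python
# Hypothetical facets and terms, loosely based on the "Food Policy" example.
FACETS: dict[str, set[str]] = {
    "Production": {"farming", "fisheries", "aquaculture"},
    "Food Additives": {"preservatives", "sweeteners"},
    "Distribution": {"supply chain", "food banks"},
    "Hunger": set(),  # a facet with no terms yet is a visible gap
}


def uncategorized(terms: list[str], facets: dict[str, set[str]]) -> list[str]:
    """Return the terms that do not belong to any facet."""
    categorized = set().union(*facets.values())
    return sorted(t for t in terms if t not in categorized)


def thin_facets(facets: dict[str, set[str]], minimum: int = 2) -> list[str]:
    """Return facets with fewer than `minimum` terms, a rough signal of a coverage gap."""
    return sorted(name for name, terms in facets.items() if len(terms) < minimum)


if __name__ == "__main__":
    flat_list = ["farming", "food banks", "school lunches", "sweeteners", "food deserts"]
    print(uncategorized(flat_list, FACETS))  # ['food deserts', 'school lunches']
    print(thin_facets(FACETS))               # ['Hunger']
```

A count of terms per facet is only a hint: a small facet may be complete, so the output is a list of places to look, not a verdict.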

For example, in one project, I was handed a taxonomy of 4,000 terms that we reduced to 9 top nodes. In addition to improving search, this had another effect. Our computer products facet included attributes such as supercomputers, minicomputers, and personal computers. Because our application was tied to a search interface, we began to notice an uptick in searches on laptops and personal computers, which signaled changing demand in a changing market.

Similarly, on another project, we noticed emerging concepts around “Green Business,” “Social Responsibility,” and “Business Ethics.” One of the goals of that implementation was to make it easy for the taxonomy editor to add these concepts and realign content to meet these new demands.

That’s why it’s important to integrate social networking with taxonomy tools. Terminology, whether suggested through social networks or formally produced, increases in value when it is linked through categorization. Be sure to evaluate your taxonomy to make sure its terms are categorized. Recently, I’ve heard horror stories of organizations with thousands of terms that were neither defined nor categorized.

A well-managed taxonomy can be a strategic tool that acts like the “canary in the coal mine,” helping to identify emerging concepts.
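The canary role can be approximated with ordinary search analytics, in the spirit of the uptick in laptop and personal computer searches described earlier. The sketch below compares how often each term in one facet was searched in two periods. It assumes queries can be exported per period and are matched to lowercase facet terms by exact, case-insensitive comparison, which is a simplification; real search analytics usually normalize queries much more. The facet and the query logs are hypothetical.

```python
from collections import Counter

# Terms in a hypothetical "Computer Products" facet (stored lowercase).
COMPUTER_PRODUCTS = {"supercomputers", "minicomputers", "personal computers", "laptops"}


def facet_counts(queries: list[str], facet_terms: set[str]) -> Counter:
    """Count queries that exactly match a facet term, ignoring case and outer spaces."""
    normalized = (q.strip().lower() for q in queries)
    return Counter(q for q in normalized if q in facet_terms)


def changes(before: list[str], after: list[str], facet_terms: set[str]) -> list[tuple[str, int]]:
    """Return (term, change in query count) pairs, largest increase first."""
    old, new = facet_counts(before, facet_terms), facet_counts(after, facet_terms)
    # Counter returns 0 for missing keys, so unsearched terms get a change of 0.
    deltas = {term: new[term] - old[term] for term in facet_terms}
    return sorted(deltas.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    last_year = ["minicomputers", "Personal computers", "minicomputers", "laptops"]
    this_year = ["laptops", "Laptops", "personal computers", "personal computers", "laptops"]
    for term, delta in changes(last_year, this_year, COMPUTER_PRODUCTS):
        print(f"{term}: {delta:+d}")
```

Ranking by change rather than by raw volume is deliberate: a small term that is growing quickly is often the earlier signal.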

[Image: canary on a branch]

So take the planning or revision of the taxonomy seriously. It is an opportunity to find out what the organization knows, how different groups inside and outside the organization express what they know, what the organization wants to know, and where the gaps are in its content and knowledge.

Here’s a six-point plan.

1. Understand the expectations and information needs of stakeholders, end users, technical staff, and production teams, including their information flows and bottlenecks. Gather information. Listen to what people at different levels perceive as the existing problems, and compare that with what actually exists. Learn how indexing is currently done and what the issues are with search and terminology management. Acknowledge what works well, and discover what problems exist. Pay attention to how terminology is used in different contexts.

2. Develop a clear set of requirements based on the needs of the organization. Determine project goals. For some organizations, the ability to tie vocabulary to search will be imperative, while others need to find ways to reach common agreement on standard terminology across diverse entities. Is the taxonomy being used to manage metadata, or is it being used to search and index full text? Is the application managing non-digital assets like people, services, and projects? How immediate are the information needs? Does a vast amount of content need to be indexed quickly, which might point to an auto-categorization solution? What statistics will demonstrate the value of the taxonomy? Are similar terms used in different contexts? Take, for example, a company name: a company can be simultaneously a product supplier, competitor, customer, and strategic partner. Is there a need to represent multiple views of the same term?

3. Create a deeper understanding of user needs by building a model of the domain. Without categorization, a taxonomy can become a long, unwieldy list of terms that lack meaning and context. Placing a term in a category adds meaning. Use the techniques of ontological type analysis to abstract categories and create information models that link concepts (in semantic modeling, this would mean creating an RDF Schema). Visio or Topic Maps can help capture these connections visually.

4. Obtain a strong set of detailed test terms that represent both user needs and content by collecting terms from a variety of activities, including card sorts, search analytics, content analysis, deeper text analytics, and entity extraction. Users can be involved in this process. If your content is accessible, automated tools such as entity extraction and automated concept generation can help, but someone will still need to sift and winnow the output. That’s why it’s so important to have a prior understanding of what users want and need to know.

5. Define the core areas of knowledge that need more depth in the taxonomy. As part of the evaluation process, you will need to decide how deep and broad the taxonomy needs to be. If you have done a facet analysis, some of those questions will already be answered. As a rule of thumb, core areas of knowledge need depth and structure.

6. Prepare for change. Having a taxonomy that quickly recognizes new concepts might be a competitive advantage. Test your taxonomy, and keep it open to new ideas from the people on the front lines of the market: customers, sales and marketing, customer service staff, and librarians. That way, new terminology can bubble up from the bottom! A taxonomy tool needs to allow dynamic, flexible editing of terms so the taxonomy can grow with changing enterprises and information needs in a global economy.
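For the company-name question in step 2, one option is to keep a single term per company and attach roles to it, rather than duplicating the company in several branches. This is a minimal sketch assuming a simple term record with a set of roles; the company names and role labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred term that can carry several roles (views) at once."""
    label: str
    roles: set[str] = field(default_factory=set)


# Hypothetical companies: one term per company, many roles per term.
COMPANIES = [
    Term("Acme Corp", {"supplier", "customer"}),
    Term("Globex", {"competitor", "strategic partner"}),
    Term("Initech", {"supplier", "competitor", "customer"}),
]


def with_role(terms: list[Term], role: str) -> list[str]:
    """Return the labels of all terms that play the given role."""
    return sorted(t.label for t in terms if role in t.roles)


if __name__ == "__main__":
    print(with_role(COMPANIES, "supplier"))    # ['Acme Corp', 'Initech']
    print(with_role(COMPANIES, "competitor"))  # ['Globex', 'Initech']
```

Because each company is stored once, renaming it touches a single record, while each view (suppliers, competitors, partners) is just a filter over the same terms.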
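Step 3 mentions RDF Schema. To show the idea without depending on an RDF library, the sketch below stores a tiny domain model as subject-predicate-object triples, using the real RDF/RDFS vocabulary names `rdf:type` and `rdfs:subClassOf` as plain strings. It applies the RDFS rules that subclass links are transitive and that an instance of a class is also an instance of its superclasses. The class names and the instance are hypothetical.

```python
# Hypothetical domain model as (subject, predicate, object) triples.
# "rdf:type" and "rdfs:subClassOf" are real RDF/RDFS vocabulary names, but here
# they are plain strings: this is ordinary Python, not an RDF library.
TRIPLES: set[tuple[str, str, str]] = {
    ("Laptop", "rdfs:subClassOf", "PersonalComputer"),
    ("PersonalComputer", "rdfs:subClassOf", "ComputerProduct"),
    ("Minicomputer", "rdfs:subClassOf", "ComputerProduct"),
    ("asset-0042", "rdf:type", "Laptop"),  # hypothetical instance
}


def superclasses(cls: str, triples: set[tuple[str, str, str]]) -> set[str]:
    """Every class reachable from `cls` by following rdfs:subClassOf (transitive)."""
    found: set[str] = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            # `o not in found` also stops the walk if the hierarchy has a cycle.
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                stack.append(o)
    return found


def types_of(instance: str, triples: set[tuple[str, str, str]]) -> set[str]:
    """Direct rdf:type classes of `instance` plus every class they inherit from."""
    direct = {o for s, p, o in triples if s == instance and p == "rdf:type"}
    result = set(direct)
    for cls in direct:
        result |= superclasses(cls, triples)
    return result


if __name__ == "__main__":
    print(sorted(types_of("asset-0042", TRIPLES)))
    # ['ComputerProduct', 'Laptop', 'PersonalComputer']
```

Inheritance is what makes such a model useful (a search for computer products can find the laptop), and it is also why it needs care: one wrong subclass link propagates to every instance below it.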
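For step 4, it can help to merge candidate terms from the different activities before the sifting and winnowing starts, recording which activities produced each term. The sketch below assumes each activity yields a plain list of candidate strings and treats agreement between at least two sources as a reasonable place to start reviewing; the source names, candidates, and threshold are all hypothetical. The output is a review queue for a person, not a finished vocabulary.

```python
from collections import defaultdict


def merge_candidates(sources: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each normalized candidate term to the set of sources that produced it."""
    seen: dict[str, set[str]] = defaultdict(set)
    for source, terms in sources.items():
        for term in terms:
            seen[term.strip().lower()].add(source)
    return dict(seen)


def review_queue(merged: dict[str, set[str]], min_sources: int = 2) -> list[str]:
    """Terms found by at least `min_sources` sources, most widely confirmed first."""
    keep = [t for t, srcs in merged.items() if len(srcs) >= min_sources]
    return sorted(keep, key=lambda t: (-len(merged[t]), t))


if __name__ == "__main__":
    # Hypothetical outputs from three term-gathering activities.
    candidates = {
        "card_sort": ["Food Banks", "school lunches", "organic"],
        "search_logs": ["food banks", "Organic", "gmo labeling"],
        "entity_extraction": ["food banks", "USDA", "gmo labeling"],
    }
    print(review_queue(merge_candidates(candidates)))
    # ['food banks', 'gmo labeling', 'organic']
```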
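For step 5, the current depth and breadth of each branch are easy to measure once the taxonomy is machine-readable, which makes shallow core areas stand out. The sketch assumes the hierarchy is held as nested dictionaries (each key a term, each value its narrower terms), a simplification of what taxonomy tools actually store; the branches are hypothetical.

```python
# Hypothetical hierarchy: each key is a term, each value holds its narrower terms.
TAXONOMY: dict = {
    "Food Policy": {
        "Production": {"Farming": {"Organic Farming": {}}, "Fisheries": {}},
        "Distribution": {"Food Banks": {}},
    },
    "Nutrition": {"Vitamins": {}},
}


def depth(tree: dict) -> int:
    """Number of levels in `tree` (0 for an empty tree)."""
    if not tree:
        return 0
    return 1 + max(depth(child) for child in tree.values())


def count_terms(tree: dict) -> int:
    """Total number of terms in `tree`, at every level."""
    return sum(1 + count_terms(child) for child in tree.values())


def profile(taxonomy: dict) -> dict[str, tuple[int, int, int]]:
    """For each top-level branch: (levels, direct narrower terms, total terms)."""
    return {
        name: (1 + depth(children), len(children), 1 + count_terms(children))
        for name, children in taxonomy.items()
    }


if __name__ == "__main__":
    for branch, (levels, breadth, total) in profile(TAXONOMY).items():
        print(f"{branch}: {levels} levels, {breadth} direct children, {total} terms")
```

The numbers do not say how deep a branch should be; they only make it easy to compare the branches you have already identified as core against the rest.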
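One way to let new terminology bubble up from the bottom, as step 6 suggests, is to watch for frequent searches that match no existing term. The sketch assumes exact, case-insensitive matching against a flat set of preferred terms and synonyms, plus an arbitrary frequency threshold; the terms and queries are hypothetical. The result is a list of suggestions for the taxonomy editor, not automatic additions.

```python
from collections import Counter

# Hypothetical preferred terms and synonyms already in the taxonomy (lowercase).
KNOWN_TERMS = {"business ethics", "social responsibility", "csr", "sustainability"}


def suggest_new_terms(queries: list[str], known: set[str], min_count: int = 3) -> list[tuple[str, int]]:
    """Return unmatched queries seen at least `min_count` times, most frequent first."""
    counts = Counter(q.strip().lower() for q in queries)
    unmatched = [(q, n) for q, n in counts.items() if q not in known and n >= min_count]
    return sorted(unmatched, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    log = ["green business", "CSR", "Green Business", "carbon offsets",
           "green business", "carbon offsets", "business ethics", "carbon offsets"]
    print(suggest_new_terms(log, KNOWN_TERMS))
    # [('carbon offsets', 3), ('green business', 3)]
```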