A Well-Planned Taxonomy

Recently, I ran into a neighbor who is a VP at a high-tech firm working on speech recognition, so I asked if she was using taxonomies. “To me, Tom Brady is a topic and that’s enough. It’s too much work to build hierarchies.” But for me, there is way too much information about Tom Brady. I’d like to be able to find information based on Tom Brady’s statistics, or how he is managed, or maybe something about his social life.

Taxonomies are not just about hierarchies or long lists of terms. Taxonomies exist to capture how users look for information. For example, if I am interested in “Food Policy”, I might want to know where food is produced, what is added to food (food additives), how food is distributed, and where food is needed to prevent hunger, including local food banks.

A taxonomy term has to be categorized to have any meaning.  The process of categorization is called facet analysis, and here’s why it’s necessary:

  • Reduces the complexity of thousands of terms into smaller, manageable categories
  • Provides semantic, contextual meaning for a term including the power to disambiguate terms
  • Allows connections to be made between categories that can be inherited (but carefully)
  • Provides ability to recognize gaps in information
  • Provides ability to reuse concepts for multiple applications, or to identify local variations of a vocabulary
  • Provides ability to focus on important topics
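
As a rough sketch of the first and fourth points, here is how grouping terms into facets might look in Python. The term-to-facet assignments and the function name `group_by_facet` are invented for illustration; a real project would draw them from the facet analysis itself.

```python
from collections import defaultdict

# Hypothetical term-to-facet assignments, invented for illustration.
# None marks a term that nobody has categorized yet.
TERM_FACETS = {
    "supercomputers": "Computer Products",
    "minicomputers": "Computer Products",
    "personal computers": "Computer Products",
    "green business": "Business Practices",
    "business ethics": "Business Practices",
    "laptops": None,
}


def group_by_facet(term_facets):
    """Return (facet -> sorted terms, sorted list of uncategorized terms)."""
    facets = defaultdict(list)
    uncategorized = []
    for term, facet in term_facets.items():
        if facet is None:
            uncategorized.append(term)
        else:
            facets[facet].append(term)
    grouped = {facet: sorted(terms) for facet, terms in facets.items()}
    return grouped, sorted(uncategorized)


facets, gaps = group_by_facet(TERM_FACETS)
for facet, terms in facets.items():
    print(facet, "->", ", ".join(terms))
print("Uncategorized:", ", ".join(gaps))
```

Six terms collapse into two facets, and the uncategorized list is one simple way to spot a gap that needs an editor’s attention.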

For example, in one project, I was handed a taxonomy that had 4,000 terms that we reduced to 9 top nodes. In addition to improving search, we noticed another effect. Our computer products facet included attributes such as supercomputers, minicomputers and personal computers. As our application was tied to a search interface, we began to notice an uptick in searches on laptops and personal computers, which was indicative of changing demand in a changing market. Similarly, on another project, we noticed emerging concepts around “Green Business,” “Social Responsibility” and “Business Ethics.” One of the goals of that implementation project was to make it easy for the taxonomy editor to add these concepts and realign content to meet these new demands.

That’s why it’s important to integrate social networking with taxonomy tools. Terminology, whether suggested through social networks or formally produced, increases in value when it is linked through categorization. Be sure to evaluate your taxonomy to make sure it is categorized. I’ve heard horror stories recently of organizations with thousands of terms that were not defined or categorized.

A well-managed taxonomy can be a strategic tool, like the “canary in the mine,” that helps identify emerging concepts.

So take the planning or revision of the taxonomy seriously. It is an opportunity to find out what the organization knows, how different groups inside and outside the organization express what they know, what the organization wants to know, and what gaps exist in its content and knowledge.

Here’s a six-point plan.

1. Understand the expectations and information needs of stakeholders, end users, technical staff and production staff, including information flows and bottlenecks. Gather information. Listen to what people at different levels perceive as existing problems and compare that to what exists. Learn how indexing is currently done and what the issues are with search and terminology management. Acknowledge what works well, and discover what problems exist. Pay attention to how terminology is used in different contexts.

2. Develop a clear set of requirements based on the needs of the organization. Determine project goals. For some organizations, the ability to tie vocabulary to search will be imperative, while other organizations need to find ways to come to common agreements about standard terminology across diverse entities. Is the taxonomy being used to manage metadata, or is it being used to search and index full text? Is the application managing non-digital assets like people, services, and projects? How immediate are the information needs? Does a vast amount of content need to be indexed quickly, which might lead to an auto-categorization solution? What statistics will demonstrate the value of the taxonomy? Are similar terms used in different contexts? Take, for example, a company name: a company can be simultaneously a product supplier, competitor, customer, and strategic partner. Is there a need to represent multiple views of the same term?

3. Create a deeper understanding of user needs by building a model of the domain. Without categorization, a taxonomy can become a long, unwieldy list of terms that lack meaning and context. Placing a term in a category adds meaning. Use the techniques of ontological type analysis to abstract categories and create information models that link concepts (in semantic modeling, this would be creating an RDF schema). Visio or Topic Mapping can help capture these connections visually.

4. Obtain a strong set of detailed test terms by collecting terms from a variety of activities including card sorts, search analytics, content analysis, deeper text analytics, and entity extraction that represent both user need and content. Users can be involved in this process. Automated tools can help here if your content is accessible. Entity Extraction and Automated Concept Generation can help, but someone will still need to sift and winnow the output – that’s why it’s so important to have a prior understanding of what users want and need to know.

5. Define the core areas of knowledge that need more depth in the taxonomy. As part of the evaluation process, you would need to define how deep and broad the taxonomy needs to be. If you have done a facet analysis, some of those questions will be answered. As a rule of thumb, core areas of knowledge need to have depth and structure.

6. Prepare for change. In fact, having a taxonomy that quickly recognizes new concepts might be a competitive advantage. Test your taxonomy, and be ready to revise it. That means the taxonomy is open to new ideas from the people who are on the front lines of the market: customers, sales and marketing, customer service staff, and librarians. It means new terminology can bubble from the bottom up! A taxonomy tool needs to allow for dynamic and flexible editing of terms to grow with changing enterprises and information needs in a global economy.

Is Taxonomy Dead?

Recently, Theresa Regli announced in a CMS Watch article about predictions for 2009 that taxonomy is dead and that metadata is the future. The argument for the death sentence is that taxonomies are viewed as too authoritarian, that it might be possible to auto-generate topics and concepts through computer processes, and finally, that the work of taxonomists is to police vocabulary, not to invite multiple views of information. So let’s examine this assumption by confronting a challenging information problem: health care insurance information systems.

To begin, let’s take a look at some of the heavily used consumer websites for health care information, such as the Medicare website (www.medicare.gov) and the widely touted Massachusetts Health Care Program. In each system, take the challenge: see what you can find out about benefits for specific conditions like a type of cancer, asthma or allergies. Try to figure out what the coverage is for routine office visits.

What you will notice is that both Medicare and the Massachusetts State public-facing information sources are hard to search.

Medicare Home Page with Search Tools

Buried in Medicare under “Search Tools – Find Out What Medicare Covers” and then under “Find Out What Medicare Covers” is a picklist of about 150 alphabetically arranged terms. A picklist is not a taxonomy. Let’s see what the picklist offers:

  • Multiple terms for Wheelchairs and Power Operated Vehicles (POVs) and Motorized Wheelchairs, which are POVs. There are also multiple synonymous terms for Office Visit.
  • No overarching concept for “Equipment.”
  • One term for all Surgical Services, but no specificity of terms by Surgical Specialty. That might lead to an assumption that all surgical services are covered.
  • Important concepts are missing. There is no entry for “Asthma” or “Psoriasis” or “Dermatology” or many other common complaints or hundreds of procedures.
  • Multiple terms for Lab tests and Diagnostic Procedures with no overarching category and none of these terms are linked to standard medical coding systems.
  • Over time, it’s difficult to scroll through hundreds of unorganized terms
  • Picklists are not compatible with web accessibility needs, which are particularly important for the audience of a health care (or any) website.

One of the problems is that taxonomists have NOT been involved in solving these serious information problems. What would a taxonomist do? Taxonomists help design other ways that users, such as consumers, patients, caretakers, advocates, doctors, insurance companies, and policy analysts look for information. They group terms in meaningful categories based on proven methodologies that are used to analyze predictable categories of knowledge. Taxonomists perform gap analysis to identify missing concepts. Some taxonomists work with auto-classification and ontological tools to develop rules and semantic models.

Wouldn’t it be useful to have a health care information system that looks at care at various levels of modeling, starting with “point of need” categories such as Routine Care, Non-Routine Care, Emergency Care, Rehabilitation and Restorative Care, Chronic Care (including preexisting conditions), and Life-Threatening and Palliative Care? At the lower, concrete levels, this taxonomy would connect to the detailed services, which could then be connected to cost control data.

Look at www.cancer.gov, which, while not providing health care insurance benefits, at least promotes finding information by type of cancer. http://www.Cancer.gov has a taxonomy that is faceted in that it is organized by types of cancer. It is a good example of taxonomy at work and of what taxonomy can do to help make these interfaces simpler and friendlier to their audience.

I am a fan of faceted taxonomies, but I have come to believe that simply categorizing a term to a canonical form might be sufficient, because it captures the context of the term at one moment in time. As many as 80% of categories of knowledge are predictable based on our shared knowledge and can be suggested as part of the web interface design process. But taxonomies also need to be friendly to user terminology. Who cares if an office visit to the doctor is called “Wellness Visit,” “Routine Visit” or “A day at the beach,” as long as the terms link back to the same basic concept?
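
To make the idea of linking user terms back to one concept concrete, here is a minimal Python sketch. The canonical label “Office Visit,” the lookup table, and the normalization rule (lowercasing and collapsing whitespace) are all assumptions made for illustration; they do not describe any particular taxonomy tool.

```python
# Variant labels from the paragraph above, all pointing at one concept.
# The canonical label "Office Visit" is an assumption for this sketch.
VARIANTS = {
    "office visit": "Office Visit",
    "wellness visit": "Office Visit",
    "routine visit": "Office Visit",
    "a day at the beach": "Office Visit",
}


def canonical_concept(user_term, variants=VARIANTS):
    """Return the canonical concept for a user's term, or None if unknown."""
    # Lowercase and collapse runs of whitespace so casing and spacing do not matter.
    key = " ".join(user_term.lower().split())
    return variants.get(key)


print(canonical_concept("Wellness Visit"))
print(canonical_concept("  ROUTINE   visit "))
print(canonical_concept("Asthma"))
```

A term that returns None is not an error. It is a candidate for the taxonomy editor to review, which is how user terminology can bubble up into the vocabulary.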

Is taxonomy dead? Old-style authoritarian taxonomies are gone, but taxonomies that capture models of how we think are very much alive and very necessary to improve public access to important information. Words matter. Long live taxonomy!

