How to Teach Taxonomies

Heather Hedden, one of our authors, has on several occasions given full-day workshops on how to create taxonomies and other controlled vocabularies, and it’s interesting how different issues or problems arise in different sessions. It may be just one individual who has difficulty grasping a concept and asks lots of questions, but in answering them, it turns out that others have similar questions. It then becomes apparent that some principles, while simple at the general, abstract level, can get muddled when we start looking at specific examples.

Here are the strategies Heather has learned for training new taxonomists to build taxonomies:

1. Use different methods of explanation to serve different backgrounds and mindsets

In one session I gave, an exercise on creating correct polyhierarchies had some people perplexed. The exercise was to propose two or more broader terms for “Paint Brushes.” Had I asked the participants to suggest just a single broader term, I probably would have gotten mostly correct answers, such as “Painting tools” or “Brushes.” But when they had to propose two broader terms at once, many taxonomy students got off track and proposed a pair such as “Artists’ tools” and “Contractors’ tools.” This, of course, is not correct, since not all paint brushes are used by artists, and not all paint brushes are used by contractors. In one session, several minutes of attempted explanation to one individual were insufficient.

For trainees who are more mathematically minded, an analogy with Boolean logic might make more sense. In such a case, we can say that the polyhierarchy represents the Boolean AND, rather than OR. The narrower concept must always belong to both Broader Term A AND Broader Term B, and thus the narrower term in a polyhierarchy lies within the intersection, not merely the union, of its two broader terms. For others with a reference searching or indexing background, however, it is necessary to explain the search and retrieval implications of the hierarchy. For example, “Paint brushes” is a term indexed to documents on all kinds of paint brushes, including those for commercial exterior painting. Thus, it would be incorrect to retrieve the latter documents with the broader term “Artists’ tools.”
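For readers who think in code, the Boolean AND test can be sketched with Python sets. This is only an illustration: the item names and the `is_valid_polyhierarchy` helper are invented for this example and do not come from any standard or tool.

```python
# Hypothetical data: each broader term maps to the set of items it covers.
BROADER_TERM_MEMBERS = {
    "Painting tools": {"artist paint brush", "house paint brush", "paint roller"},
    "Brushes": {"artist paint brush", "house paint brush", "hair brush"},
    "Artists' tools": {"artist paint brush", "easel"},
    "Contractors' tools": {"house paint brush", "paint roller", "ladder"},
}

# Everything the narrower term "Paint brushes" is meant to cover.
PAINT_BRUSHES = {"artist paint brush", "house paint brush"}


def is_valid_polyhierarchy(narrower_members, broader_terms,
                           members=BROADER_TERM_MEMBERS):
    """True only if every narrower item belongs to ALL broader terms (AND)."""
    return all(narrower_members <= members[term] for term in broader_terms)


print(is_valid_polyhierarchy(PAINT_BRUSHES, ["Painting tools", "Brushes"]))
print(is_valid_polyhierarchy(PAINT_BRUSHES, ["Artists' tools", "Contractors' tools"]))
```

The first pair passes because every paint brush is both a painting tool and a brush. The second pair fails: paint brushes are covered only by the union of the two terms (the Boolean OR), which is not enough for a polyhierarchy.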

2. Teach the standards in the most practical sequence

At a recent corporate training I gave, the topic that was most challenging for the participants was associative relationships. The specific issue was what kinds of term pairs could legitimately use this kind of relationship. Listing all the possible types of term pairs for associative relationships as described in the ANSI-NISO Z39.19 standard (13 of them!) can add more confusion than needed. Training participants wanted to refer to this list when completing an exercise on proposing related terms. As a result, they proposed some rather unconventional related terms that, while legitimate, were not entirely practical.

Teaching principles from the ANSI-NISO standard, I realized, should not necessarily rely on the same sequence or emphasis as written in the standard. It is probably better to present the list of 13 associative relationship types at the conclusion of an explanation of the associative relationship, rather than at the beginning. Similarly, I found that the ANSI-NISO standard’s order of presentation, with relationships between terms belonging to the same hierarchy followed by relationships between terms belonging to different hierarchies, may also not be best, since relationships between terms belonging to different hierarchies are more common.

3. Simplify demonstration exercises

Setting up a practice card-sorting exercise for classifying things can be part of a training session, but the card-sorting exercise should be modified. I once created the number of cards (at least 25) considered desirable for a card-sorting exercise, only to find that the training participants wanted to spend only a few minutes on it, rather than the more intense 10-20 minutes that actual card-sorting participants would be expected to devote. Thus, a demonstration card sort should use a much smaller set, along with a clear statement that an actual card-sorting exercise would be larger.

4. Revise sample exercises

Finally, the exercises included in training workshops may need to be tweaked after they are tried out. For example, asking for suggested broader terms for the term “Financial Management” produced such varied and questionable results that I had to remove that example and replace it with another. What I had in mind as broader terms was quite simple: “Finance” and “Management.” But my audience was looking for other, more specific meanings of the term. Examples should not be ambiguous.

Conclusions

Even if you are not a professional taxonomy trainer, a lot of training is needed in the taxonomy field, and a lot of people learn how to create taxonomies on the job. Thus, if you are a taxonomist, you may find yourself required to teach taxonomy principles to others. Although some of it may come naturally to you and those you teach, other taxonomy principles will be more difficult to teach, and creative strategies are needed.

– Heather Hedden

Heather can be contacted for more information about her workshops and upcoming presentations at Hedden Information Management.

5 Types of Taxonomies: From Lists to Ontologies

A taxonomy, strictly defined, is a hierarchical arrangement of terms, but the form of a taxonomy depends on the information problem at hand. After all, taxonomy is a method for organizing knowledge or concepts, which requires flexibility in how to capture and represent concepts. The complexity depends on factors such as the core area of information for the application, the users’ vocabulary, the size of the content collection and how much specificity is needed, how the content will be tagged or indexed, and how result sets will be displayed and refined. The taxonomy is not an end, but a means to help users navigate information, find out what is in the collection, and get to meaningful results. And most of all, the taxonomy needs to provide clear, unambiguous access to information.

Here’s a primer on the basic ways to organize concepts:

Form 1: Lists (picklists, authority lists or controlled vocabularies)

Good ol’ picklists ensure that a specific term is used when creating or searching content. A picklist is really a list of lead or preferred terms, such as geographical names or other proper names for people, organizations, or projects. Certainly not many of us can properly spell the name of the current Iranian President (Ahmadinejad), so it makes sense to pick it off a predefined list. The problem with picklists is that they are often buried in applications instead of sitting right there on the home page as a search aid, or their design is so tied to the relational database design that you have to drill down multiple levels to get to a reasonable query. That means misery for the searcher as well as the database programmer.

Many excellent databases have transformed their picklists and controlled vocabularies into picklists that can be searched from the home page. For a great example of a taxonomy as picklist, look at ProQuest or Cars.com. These content sources manage their picklists as taxonomies, but each taxonomy is a clearly defined list of terms. Combine two or more of these lists and voilà! You now have a faceted taxonomy that lets the user browse your content from the home page. (Of course, you need powerful content management software as well, but that’s another story.)
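As a rough sketch of how combined picklists become facets, the Python below treats each field of a record as a value chosen from its own picklist and filters on several at once. The listings, field names, and both helper functions are hypothetical, invented for this illustration; they are not how any particular site implements faceted browsing.

```python
# Hypothetical catalog records; each field is filled from its own picklist.
LISTINGS = [
    {"id": 1, "make": "Honda", "body_style": "Sedan", "color": "Blue"},
    {"id": 2, "make": "Honda", "body_style": "SUV", "color": "Red"},
    {"id": 3, "make": "Ford", "body_style": "SUV", "color": "Blue"},
    {"id": 4, "make": "Ford", "body_style": "Pickup", "color": "White"},
]


def facet_values(records, facet):
    """Return the picklist for one facet: its distinct values, sorted."""
    return sorted({record[facet] for record in records})


def filter_by_facets(records, **selected):
    """Keep only the records that match every selected facet value."""
    return [
        record
        for record in records
        if all(record.get(facet) == value for facet, value in selected.items())
    ]


# Browse by two facets at once: body style and color.
blue_suvs = filter_by_facets(LISTINGS, body_style="SUV", color="Blue")
print(facet_values(LISTINGS, "make"))
print([record["id"] for record in blue_suvs])
```

Each facet stays a simple, flat list, which is what keeps the picklists easy to maintain; the power comes from letting the user combine them.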

Form 2: Synonym Lists

Synonyms are a wonderful use of taxonomies. They are easy to track in taxonomy tools and spreadsheets, but surprisingly difficult to implement in the user interface. If a search box is used, you will need a taxonomy rich in synonyms so that users don’t have to worry about the preferred form of a term or even a misspelling. Why do synonyms matter? First, they can be used to track words that mean the same thing, such as “car,” “auto,” and “automobile.” They also let you prefer one term over another: for example, the environmental movement prefers that the term “Climate Change” be used instead of “Global Warming.” The use of synonyms allows one concept to be treated as the same as the other, while still allowing one term to be preferred.
But synonyms can assist search in other ways. Synonyms can be used for
• alternate spellings, such as American versus British spellings of terms like “organization” vs. “organisation”
• search misspellings, whether of proper names or even common words (Honestly, how do you spell “broccoli?”)
• alternate versions of proper names, such as Hillary Clinton for Hillary Rodham Clinton
• variations on concepts or phrases, such as allowing “Current Iranian President” to be used as a variant for Ahmadinejad, which is a name that few of us can spell correctly.
In other words, have a liberal and generous policy about what counts as a synonym, but be sure to test your application, because there will always be ambiguities and you may get some unexpected false results. Adding synonyms to search is surprisingly challenging to implement. That’s why synonym-based search systems are often paired with rules-based autocategorization systems such as Teragram, but that’s a topic for another blog. On the other hand, as terms evolve, adding synonyms to a taxonomy is a quick way to improve access without changing the database.
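To make the idea concrete, here is a minimal sketch of a synonym ring in Python: every variant, misspelling, or alternate name points to one preferred term. The term list and the function names are assumptions made up for this example, and a real search engine would do far more (stemming, fuzzy matching, ranking).

```python
# Hypothetical synonym rings: preferred term -> variants and misspellings.
PREFERRED_TERMS = {
    "Automobiles": ["car", "auto", "automobile"],
    "Climate Change": ["global warming"],
    "Organization": ["organisation"],
    "Broccoli": ["brocoli", "brocolli"],
    "Hillary Rodham Clinton": ["Hillary Clinton"],
}


def normalize(text):
    """Lowercase and collapse whitespace so lookups ignore case and spacing."""
    return " ".join(text.lower().split())


def build_synonym_index(preferred_terms):
    """Map every label (preferred or variant) to its preferred term."""
    index = {}
    for preferred, variants in preferred_terms.items():
        for label in [preferred, *variants]:
            key = normalize(label)
            # An ambiguous label would silently produce false results,
            # so flag it when the index is built rather than at search time.
            if key in index and index[key] != preferred:
                raise ValueError(f"ambiguous synonym: {label!r}")
            index[key] = preferred
    return index


def resolve(query, index):
    """Return the preferred term for a query, or None if it is unknown."""
    return index.get(normalize(query))


INDEX = build_synonym_index(PREFERRED_TERMS)
print(resolve("Global  Warming", INDEX))
print(resolve("organisation", INDEX))
```

Because the variants live in the taxonomy rather than in the content, a new synonym can be added to `PREFERRED_TERMS` without touching the indexed records, which is the "quick way to improve access" described above.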

Form 3: Hierarchies

Hierarchical taxonomies are used to create the familiar drill-down type of interface. Traditionally viewed as tree structures, hierarchies capture the following types of interrelationships:
• Parent/child
• Broad Term/Narrow term
• Is a part of
• Is a type of
The biggest mistake made in creating hierarchies is associating items that do not have an inherent hierarchical relationship. If you are creating a hierarchy with two terms whose relationship is not one of the types above, you may be better off building two separate lists (or facets); that’s the start of a faceted navigation. You have to conduct a logical “sniff test” when constructing hierarchies. Let’s go back to our government example. Let’s say you are building a simplistic hierarchy with United States as a term and Hillary Clinton as a narrower term. But Hillary Clinton is not actually a child of, part of, or type of United States. That is not a true hierarchical relationship. In that case, wouldn’t it be better to think about modeling how users look for information about government and then structure separate taxonomies: Countries, Leadership, and perhaps a third facet for Forms of Government? So the best rule in creating hierarchies is to make sure each hierarchy is about a single category of knowledge. Taxonomy geeks call this “orthogonal.” It is better for the programmer and better for the end user.

Form 4: Faceted Navigation and Thesauri

If you stepped through the process above, then look how quickly and easily you moved from Lists (authority or otherwise) to Hierarchies to Faceted Navigation. A faceted navigation is basically a set of hierarchical taxonomies that have been normalized and categorized so that terms do not cross categories. A thesaurus is a fully fleshed-out taxonomy where all the synonyms and hierarchies within a category are labeled. Thesauri also allow related terms. Related terms, or associative relations, are links between categories of terms. Faceted design has several advantages:
• By having a faceted structure, you can begin the process of disambiguating terms. For example, if my application is about House and Garden Design, where the term “green” is common, I might have Green Building Products under the Products category, while “green” as a color would be in the Color and Decorating category. By categorizing terms under the appropriate facet, the term is now unique based on its meaning in context. Thus, there are now two distinct, disambiguated topics.
• Take the government example we are building above, where we have taxonomies for countries, leaders and government structure. By creating a top level of facets or categories, you are now building a model of the domain. If a country changes leaders, or forms of government, I can change that concept without reindexing or relinking my entire application. And now I have the added benefit of having a framework to build user interfaces that might be easier to navigate because I have designed a better conceptual framework. I am also now well on my way to designing an ontology. How? Read on:

Form 5: Ontologies

An ontology is basically a faceted taxonomy where all the ambiguities have been resolved and where all the concepts have been described as completely as possible. The other feature of an ontology is the potential use of a language such as RDF or RDFa to describe the links between the categories and terms. Now there is one more step in my progression from lists to ontologies: creating links between categories. For example, we know that Countries have Governments, and that Governments have Leaders. These phrases are called triples: a subject and an object linked by a predicate. While many of the issues about how to implement ontologies are still cooking, so to speak, it is worth thinking through how to implement an ontology. After all, good information access is about clarifying questions and resolving ambiguity. The downside of ontologies is the inferencing. For example, if you look at a Friend-of-a-Friend (FOAF) application, we all know people with diverse interests and beliefs, but those are not necessarily our beliefs. This type of syllogistic inferencing might have unintended negative consequences, so be judicious.
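The triple idea, and the inferencing risk, can both be shown in a few lines of plain Python. This sketch stores triples as tuples rather than using an RDF library, and every name in it (the placeholder country, people, predicates, and helper functions) is invented for illustration.

```python
# Hypothetical triples: (subject, predicate, object).
TRIPLES = {
    ("Country X", "has_government", "Government X"),
    ("Government X", "has_leader", "Leader X"),
    ("Alice", "knows", "Bob"),
    ("Bob", "holds_belief", "Belief B"),
}


def objects(triples, subject, predicate):
    """Return every object linked to the subject by the given predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def chain(triples, subject, first_predicate, second_predicate):
    """Follow two predicates in sequence: subject -> x -> result."""
    return {
        end
        for middle in objects(triples, subject, first_predicate)
        for end in objects(triples, middle, second_predicate)
    }


# A reasonable inference: the country's government has this leader.
print(chain(TRIPLES, "Country X", "has_government", "has_leader"))

# The same mechanics link Alice to Bob's belief. Concluding that Alice
# shares the belief would be exactly the overreach to be careful about.
print(chain(TRIPLES, "Alice", "knows", "holds_belief"))
```

The code cannot tell the two chains apart; both are just paths through the links. Deciding which chains support a real conclusion is a modeling decision, which is why inferencing calls for judgment.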

Take a renewed look at those picklists, and start to see the connections between those terms. You’ll be on the first step towards styling your taxonomies and building unambiguous, powerful ontologies.

~ Marlene Rockmore

A Well-Planned Taxonomy

Recently, I ran into a neighbor who is a VP at a high-tech firm working on speech recognition, so I asked if she was using taxonomies. “To me, Tom Brady is a topic and that’s enough,” she said. “It’s too much work to build hierarchies.” But for me, there is way too much information about Tom Brady. I’d like to be able to find information based on Tom Brady’s statistics, or how he is managed, or maybe something about his social life.

Taxonomies are not just about hierarchies or long lists of terms. Taxonomies exist to capture how users look for information. For example, if I am interested in “Food Policy”, I might want to know where food is produced, what is added to food (food additives), how food is distributed, and where food is needed to prevent hunger, including local food banks.

A taxonomy term has to be categorized to have any meaning. The process of categorization is called facet analysis, and here’s why it’s necessary:

  • Reduces the complexity of thousands of terms into smaller, manageable categories
  • Provides semantic, contextual meaning for a term including the power to disambiguate terms
  • Allows connections to be made between categories that can be inherited (but carefully)
  • Provides ability to recognize gaps in information
  • Provides ability to reuse concepts for multiple applications, or to identify local variations of a vocabulary
  • Provides ability to focus on important topics

For example, in one project, I was handed a taxonomy of 4,000 terms that we reduced to 9 top nodes. In addition to improving search, we noticed another effect. Our computer products facet included attributes such as supercomputers, minicomputers and personal computers. As our application was tied to a search interface, we began to notice an uptick in searches on laptops and personal computers, which was indicative of changing demand in a changing market. Similarly, on another project, we noticed emerging concepts around “Green Business,” “Social Responsibility,” and “Business Ethics.” One of the goals of that implementation project was to make it easy for the taxonomy editor to add these concepts and realign content to meet these new demands.

That’s why it’s important to integrate social networking with taxonomy tools. Terminology, whether suggested through social networks or formally produced, increases in value when it is linked through categorization. Be sure to evaluate your taxonomy to make sure it is categorized. I’ve heard horror stories recently of organizations with thousands of terms that were not defined or categorized.

A well-managed taxonomy can be a strategic tool, like the “canary in the mine,” that helps identify emerging concepts.

So take the planning or revision of the taxonomy seriously. It is an opportunity to find out what the organization knows, how different groups inside and outside the organization express what they know, what the organization wants to know, and what gaps exist in its content and knowledge.

Here’s a six-point plan.

1. Understand the expectations and information needs of stakeholders, end users, technical staff, and production staff, including information flows and bottlenecks. Gather information. Listen to what people at different levels perceive as existing problems and compare that to what exists. Learn how indexing is currently done and what the issues are with search and terminology management. Acknowledge what works well, and discover what problems exist. Pay attention to how terminology is used in different contexts.

2. Develop a clear set of requirements based on the needs of the organization. Determine project goals. For some organizations, the ability to tie vocabulary to search will be imperative, while other organizations need to find ways to come to common agreements about standard terminology across diverse entities. Is the taxonomy being used to manage metadata, or is it being used to search and index full text? Is the application managing non-digital assets like people, services, and projects? How immediate are the information needs? Does a vast amount of content need to be indexed quickly, which might lead to an auto-categorization solution? What statistics will demonstrate the value of the taxonomy? Are similar terms used in different contexts? Take, for example, a company name: a company can be simultaneously a product supplier, competitor, customer, and strategic partner. Is there a need to represent multiple views of the same term?

3. Create a deeper understanding of user needs by building a model of the domain. Without categorization, a taxonomy can become a long, unwieldy list of terms that lack meaning and context. Placing a term in a category adds meaning. Use the techniques of ontological type analysis to abstract categories and create information models that link concepts (in semantic modeling, this would be creating an RDF schema). Visio or Topic Mapping can help capture these connections visually.

4. Obtain a strong set of detailed test terms by collecting terms from a variety of activities including card sorts, search analytics, content analysis, deeper text analytics, and entity extraction that represent both user need and content. Users can be involved in this process. Automated tools can help here if your content is accessible. Entity Extraction and Automated Concept Generation can help, but someone will still need to sift and winnow the output – that’s why it’s so important to have a prior understanding of what users want and need to know.

5. Define the core areas of knowledge that need more depth in the taxonomy. As part of the evaluation process, you would need to define how deep and broad the taxonomy needs to be. If you have done a facet analysis, some of those questions will be answered. As a rule of thumb, core areas of knowledge need to have depth and structure.

6. Prepare for change. In fact, having a taxonomy that quickly recognizes new concepts might be a competitive advantage. Test your taxonomy, and be prepared for change. It means that the taxonomy is open to new ideas from the people who are on the front lines of the market: customers, sales and marketing, customer service staff, and librarians. It means new terminology can bubble up from the bottom! A taxonomy tool needs to allow for dynamic and flexible editing of terms to grow with changing enterprises and information needs in a global economy.