Taxonomy, strictly defined, is a hierarchical arrangement of terms, but the form of a taxonomy depends on the information problem at hand. After all, taxonomy is a method for organizing knowledge or concepts, which requires flexibility in how to capture and represent concepts. The complexity depends on factors such as what’s the core area of information for the application, user’s vocabulary, the size of the content collection and how much specificity is needed, how the content will be tagged or indexed, and how result sets will displayed and refined. The taxonomy is not an end, but a means to help users navigate information, find out what is in the collection, and get to meaningful results. And most of all, the taxonomy needs to provide clear, unambiguous access to information.
Here’s a primer on the basic ways to organize concepts:
Form 1: Lists (picklists, authority lists or controlled vocabularies)
Good Ol’ Picklists ensure that a specific term is when creating or searching content. A picklist is really a list of lead or preferred terms such as Geographical Names, and/or other proper names including proper names for people, organizations or projects. Certainly not many of us can properly spell the name of the current Iranian President (Ahmadinejad) so it makes sense to pick that off a pre-defined list. The problem with picklists is that they are often buried in applications instead of right there on the home page as a search assistance or the design is so tied to the relational database design that you have to drill multiple levels to get to a reasonable query. That means misery for the searcher as well as the database programmer.
Many excellent databases have transformed their picklists and controlled vocabularies into picklists that can be searched from the home page. For a great example of a taxonomy as picklist, look at Proquest or Cars. Com. These content sources manage their picklists as taxonomies, but each taxonomy is a clearly defined list of terms. Combine 2 or more of these lists and voila! You now have a faceted taxonomy where the user can now browse your content from the homepage. (of course, you need a powerful content management software as well but that’s another story).
Form 2: Synonym Lists
Synonyms are a wonderful use of taxonomies which are easy to track in taxonomy tools and spreadsheets, but that surprisingly difficult to implement on the User Interface. If the Search Box is used, you will need a taxonomy rich in synonyms so that users don’t have to worry about the preferred form of a term or even a misspelling. Why do synonyms matter? First, they can be used to track words that mean the same thing such as “car” and “auto” or “automobile.” For example, the environmental movement prefers the term “Climate Change” be used instead of” Global Warming”. The use of synonyms allows one concept to be instantiated as the same as the other, but still allows a term to be preferred over another.
But synonyms can be used to assist search in other ways. Synonyms can be used for
• alternate spellings, such as British versus American spellings of terms like “organization” vs. “’organisation”
• allow search misspellings such as alternate spellings of proper names or even common words (Honestly, how do you spell “broccoli?”)
• allow alternate versions of proper names such as Hillary Clinton for Hillary Rodham Clinton
• creates variations on concepts or phrases such as allow “Current Iranian President” to be used as an variant for Ahmadinejad, which is a name that few of us can spell correctly.
In other words, have a liberal and generous policy about what’s a synonym, but be sure to test your application as you may get some unexpected false results as well because there will always be ambiguities. Adding synonyms to search is surprisingly challenging to implement. That’s why synonym-based searched systems are often paired with autocategorization rules-based systems such as Teragram but that’s a topic for another blog. On the other hand, if as terms evolve, adding synonyms to a taxonomy is a quick way to improve access without changing the database.
Form 3: Hierarchies
Taxonomies are used to create the familiar drill-down type of interfaces. Traditionally viewed as hierarchies or tree structure, hierarchies capture the following types of interrelationships:
• Broad Term/Narrow term
• Is a part of
• Is a type of
The biggest mistake made in creating hierarchies is by associating items that do not have an inherent hierarchical relationship. If you are creating a hierarchy with two terms that do not fall in the links above, you may be better of considering building two separate lists (or facets) — that’s the start of a faceted navigation. You have to conduct a logical “sniff test” when constructing hierarchies. Let’s go back to our government example. Let’s say you are building a simplistic hierarchy with United States as a term and Hillary Clinton as a narrow term. But Hillary Clinton is not actually a parent/child, part of, or type of United States. That is not a true hierarchical relationship. In that case, wouldn’t it be better to think about modeling how users look for information about government and then structure separate taxonomies — Countries, Leadership and perhaps a third facet for Forms of Government. So the best rule in creating hierarchies to make sure hierarchies are about the same category of knowledge. Taxonomy geeks called this “orthogonal.” Better for the programmer and better for the end user.
Form 4: Faceted Navigation and Thesauri
If you stepped through the process above, the look how fast and easily you moved from Lists (authority or otherwise) to Hierarchies to Faceted Navigation. A faceted navigation is basically hierarchical taxonomies that have been normalized and categorized so that terms do not cross categories. . Thesauri is a fully-fleshed out taxonomy where all the synonyms and hierarchies within a category are labeled. Thesauri also allow related terms. Related terms or associative relations are links between categories of terms. Faceted design has several advantages
• By having a faceted structure, you can begin the process of disambiguating terms. For example, if my application is about House and Garden Design where the term “”green” is common, I might have Green Building Products under the Products category, while “green” as a color would be in the color and decorating category. By categorizing terms under the appropriate facet, the term is now unique based on the meaning in context. Thus, there are now two distinct, disambiguated topics.
• Take the government example we are building above where we have taxonomies for countries, leaders and government structure. By creating a top-level of facets or categories, you now building a model of the domain. If a country changes leaders, or forms of government, I can change that concept without reindexing or relinking my entire application. And now I have the added benefit of having a framework to build User Interfaces that might be easier to navigate because I have designed a better conceptual framework. I am also now well on my way to designing an ontology. How? Read on:
Form 5: Ontologies
An ontology is basically a faceted taxonomy where all the ambiguities have been resolved and where all the concepts have been described as completely as possible. The other feature of ontology is the potential use of links or RDFa as a language to describe the links between the categories and terms. Now there is one more step in my progression from lists to ontologies — to create links between categories. For example, we know Countries Have Governments, and that Governments have Leaders. The phrases are called triples, which is a subject and object inked with a predicate. While many of the issues about how to implement ontologies are still cooking, so to say, it is worth thinking through how to implement ontology. After all, good information access is about clarifying questions and resolving ambiguity. The downside of ontologies is the inferencing. For example, if you look at Friend-of-a-Friend (FOAF) application, we all know that we know people with diverse interests and beliefs, but those are not our beliefs. This type of syllogistic inferencing might have unintended negative consequences so be judicious.
Take a renewed look at those picklists, and start to see the connections between those terms. You’ll be on the first step towards styling your taxonomies and building unambiguous, powerful ontologies.
~ Marlene Rockmore