Skills of a Classy Taxonomist

At SemTech in June 2010, several speakers, including Professor Deb McGuinness, drew a very clear line between what a taxonomist does and what an ontologist does. Taxonomists build hierarchies, and ontologists determine classes or categories. In other words, ontologies are neat and unambiguous, and taxonomies are a bit messy.

Defining classes, or ontology work, typically precedes building the taxonomy. Defining the classes is like writing a specification for the taxonomy; in fact, defining classes is the same as defining facets. The goal of taxonomists and ontologists should be to define a specific, unambiguous description of a term that helps manage how we find and organize content, so that the pathways are clear and specific. Adding an ontology ensures that the term is placed in the most specific categories, which promotes clarity and reduces ambiguity. I would argue that no taxonomy is useful unless it is faceted, that is, divided into classes. Taxonomies work best when their terms share homogeneous properties and when they are smaller and focused.
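As a rough illustration of what placing a term in a specific category buys us, here is a minimal Python sketch, assuming a toy vocabulary in which every label is stored together with the facet (class) it belongs to. The facet names and labels are hypothetical examples, not drawn from any real taxonomy.

```python
# Toy faceted vocabulary (hypothetical): each entry is keyed by (label, facet),
# so one label can belong to several classes without its senses colliding.
VOCABULARY = {
    ("Java", "Programming Language"): "Java (software)",
    ("Java", "Beverage"): "Java (slang for coffee)",
    ("Java", "Place"): "Java (island)",
    ("Drill bit", "Home Tools"): "Drill bit (for an electric screwdriver)",
    ("Drill bit", "Oil and Gas Equipment"): "Drill bit (for an oil rig)",
}


def facets_for(label):
    """List every facet a bare label appears in; more than one means it is ambiguous."""
    return sorted(facet for (term, facet) in VOCABULARY if term == label)


def resolve(label, facet):
    """Return the specific concept once the facet (class) is known, or None."""
    return VOCABULARY.get((label, facet))


if __name__ == "__main__":
    print(facets_for("Java"))  # three facets: the bare label is ambiguous
    print(resolve("Java", "Programming Language"))
```

A bare label such as "Java" maps to three facets; once the facet is known, the lookup returns exactly one concept. Keying on the pair, rather than on the label alone, is what keeps the senses apart.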

Class analysis, or facet analysis, solves several problems:

1) Clarify specific terms by situation or function: If I am interested in Java as a programming language, I want to see material related to Java as software, not as slang for coffee or an island in Indonesia. If I am looking for “drill bits,” it might be important to know whether the drill bits are for my home electric screwdriver or for an oil rig. Classes capture these distinctions and help create precise tagging and information retrieval.

2) Ease long-term maintenance: Christine Connors points to a simple but common example in which taxonomies list people’s names as narrower terms under a role, such as “Hillary Clinton” under “Secretary of State” or “Charles Windsor” under “Prince of Wales.” The problem is that when the people filling these roles change, there is a maintenance headache. A classy taxonomy recognizes that <people> form a separate class of entity, distinct from <role>. <People> and <Role> can then be connected by a predicate such as <has>. These distinctions are necessary for fast-changing information, such as who is dating whom in an entertainment application or who owns whom in a business application.

Abstraction: <person> <has> <role>

Instance: Hillary Clinton <is> Secretary of State

3) Facilitate sharing and importing taxonomies: Having taxonomies that are specified by a class description means the taxonomy will be more homogeneous, have shared properties, and be more focused. This makes it easier to import with less cleanup and review, and it facilitates the use of standards such as SKOS. Messy taxonomies are harder to merge.
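The separation of people from roles described in point 2 can be sketched in Python as follows, assuming a simple in-memory store. The `current_holder` mapping and the helper functions are hypothetical names chosen for illustration; in an RDF setting the same link would be a single triple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str


@dataclass(frozen=True)
class Role:
    title: str


# <person> <has> <role>, stored as role -> current holder. People and roles are
# separate classes; only this one link changes when a new person takes the role.
current_holder = {
    Role("Secretary of State"): Person("Hillary Clinton"),
    Role("Prince of Wales"): Person("Charles Windsor"),
}


def who_holds(title):
    """Return the name of whoever currently holds the role, or None."""
    person = current_holder.get(Role(title))
    return person.name if person else None


def reassign(title, new_name):
    """Record a change of holder by updating a single link, not the hierarchy."""
    current_holder[Role(title)] = Person(new_name)
```

Calling `reassign("Secretary of State", "Next Appointee")` (a hypothetical successor) updates one link; the Person and Role classes, and every other assertion, are untouched.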

Anyone working with semantic technologies will tell you that most problems in inference happen when hierarchies in source taxonomies create odd associations through an inserted narrower or broader term. A taxonomist needs to pay attention to inferences in order to prevent false statements. Professor Deb McGuinness calls this issue “truth maintenance.”
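To see how an inserted broader term can produce a false statement, here is a small Python sketch, assuming a merged hierarchy stored as a child-to-broader map and a hand-picked set of top-level facets that should never share a descendant. It follows the broader links transitively, roughly as a reasoner would, and flags any term that ends up under two of those facets. The data is hypothetical, and this is a simplified check, not how any particular reasoner implements truth maintenance.

```python
# Hypothetical merged hierarchy: term -> set of broader terms. The local taxonomy
# had "Java" under "Programming languages"; an imported source added it under "Islands".
BROADER = {
    "Java": {"Programming languages", "Islands"},
    "Programming languages": {"Software"},
    "Islands": {"Places"},
}

# Top-level facets that should never share a descendant (an assumption for this sketch).
DISJOINT_FACETS = {"Software", "Places"}


def ancestors(term, broader=BROADER):
    """Follow broader links transitively and return every inferred ancestor."""
    seen, stack = set(), list(broader.get(term, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(broader.get(parent, ()))
    return seen


def false_inferences(broader=BROADER, facets=DISJOINT_FACETS):
    """Flag terms whose inferred ancestors fall under more than one disjoint facet."""
    flagged = {}
    for term in broader:
        hits = ancestors(term, broader) & facets
        if len(hits) > 1:
            flagged[term] = sorted(hits)
    return flagged


if __name__ == "__main__":
    print(false_inferences())
```

Here the merge makes "Java" a descendant of both Software and Places, so the check reports `{'Java': ['Places', 'Software']}`, which is the cue to split Java into separate concepts per facet.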

To keep these categories clear and distinct, ontologists rely on building a conceptual model, or picture, of the domain. Modeling draws on skills that most taxonomists already have: most have been taught how to capture vocabulary and how to identify facets. Check out the earlier blog post Taxonomies and Modelling for more information.

Elisa Kendall of Sandpiper Software, which makes concept-modelling tools, suggested that “one could build an ontology in 2 hours.” With a new generation of tools that can create RDF/OWL from data and content, this statement might be true.

    With good modelling tools that automatically generate RDF/OWL, such as those from TopQuadrant, taxonomists might be able to slide into the needed role of ontologist. To extend their skills, taxonomists need to understand some basic RDF/OWL concepts: what a class is, what a property is, what a slot facet is, what class inheritance is, what is meant by reciprocal and inverse properties, and how to write a SPARQL query. More importantly, a classy taxonomist can become a facilitator who builds bridges between user and development communities and helps diagnose and prevent technical problems.
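Two of the RDF/OWL concepts listed above, class inheritance and inverse properties, can be illustrated without any RDF tooling. This is a minimal Python sketch that assumes a single-parent, acyclic class hierarchy; the class and property names (`SecretaryOfState`, `holdsRole`, `roleHeldBy`) are hypothetical, and a real OWL reasoner does considerably more.

```python
# Hypothetical class hierarchy (subclass -> superclass), as rdfs:subClassOf would state it.
SUBCLASS_OF = {
    "SecretaryOfState": "GovernmentRole",
    "GovernmentRole": "Role",
}

# Hypothetical inverse property pairs, in the spirit of owl:inverseOf.
INVERSE_OF = {"holdsRole": "roleHeldBy", "roleHeldBy": "holdsRole"}


def superclasses(cls):
    """Class inheritance: walk up the (acyclic, single-parent) hierarchy."""
    result = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def with_inverses(triples):
    """Add the inverse of every (subject, property, object) triple whose property has one."""
    inferred = set(triples)
    for subject, prop, obj in triples:
        if prop in INVERSE_OF:
            inferred.add((obj, INVERSE_OF[prop], subject))
    return inferred
```

Stating one fact, that Hillary Clinton `holdsRole` Secretary of State, is then enough to answer the question from the other direction as well.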

    A taxonomist who is trained in ontologies should be able to:

    • Create processes to identify the requirements for each class.
    • Develop metrics to assess the quality of results.
    • Identify which vocabularies are needed, evaluate existing vocabularies, and import and adapt them to current needs.
    • Ensure the integrity and focus of vocabularies, particularly those sourced from an outside vendor.
    • Develop processes to keep vocabularies current, and use metrics to “measure and improve” them.
    • Work as part of the development team to help identify whether a source vocabulary might be contributing to false inferences.
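To make the metrics bullets concrete, here is a minimal Python sketch, assuming a vocabulary held as simple (id, label, broader) records. The three checks (duplicate labels, broader links that point to nothing, and lone top-level terms) are examples of what one might track from release to release, not a standard set; the records themselves are hypothetical.

```python
from collections import Counter

# Hypothetical vocabulary records: (concept id, preferred label, broader id or None).
VOCAB = [
    ("c1", "Equipment", None),
    ("c2", "Wheelchairs", "c1"),
    ("c3", "Power-operated vehicles", "c1"),
    ("c4", "Office visit", None),
    ("c5", "Office Visit", None),  # same label as c4 apart from case
    ("c6", "Scooters", "c99"),     # broader link points at a concept that does not exist
]


def vocabulary_metrics(vocab):
    """Compute a few simple quality counts that can be tracked from release to release."""
    ids = {cid for cid, _, _ in vocab}
    label_counts = Counter(label.casefold() for _, label, _ in vocab)
    used_as_broader = {broader for _, _, broader in vocab if broader}
    return {
        "concepts": len(vocab),
        # extra copies of a label, ignoring case
        "duplicate_labels": sum(n - 1 for n in label_counts.values() if n > 1),
        # broader references that resolve to no known concept
        "dangling_broader": sum(1 for _, _, b in vocab if b and b not in ids),
        # top-level terms with no narrower terms, i.e. sitting alone at the top
        "lone_top_terms": sum(
            1 for cid, _, b in vocab if b is None and cid not in used_as_broader
        ),
    }
```

Running the same function against each new release gives a simple "measure and improve" baseline: the counts should fall over time.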

    The taxonomist works with different user communities as well as developers and helps bridge the gap between what users and experts know and what is needed to build a useful application. A classy taxonomist has a well-rounded set of skills for working with development teams and user organizations to build intelligent systems.

    Is Taxonomy Dead?

    Recently, Theresa Regli announced in a CMS Watch article of predictions for 2009 that taxonomy is dead and that metadata is the future. The argument for the death sentence is that taxonomies are viewed as too authoritarian, that it might be possible to auto-generate topics and concepts through computer processes, and, finally, that the work of taxonomists is to police vocabulary rather than to invite multiple views of information. So let’s examine this assumption by confronting a challenging information problem: health care insurance information systems.

    To begin, let’s look at some heavily used consumer websites for health care information, such as the Medicare website and the widely touted Massachusetts Health Care Program. In each system, challenge yourself to find out what benefits cover specific conditions, such as a type of cancer, asthma, or allergies. Try to figure out what the coverage is for routine office visits.

    You will notice that both the Medicare and the Massachusetts State public-facing information sources are hard to search.

    Medicare Home Page with Search Tools

    Buried in the Medicare site under “Search Tools – Find Out What Medicare Covers” is a picklist of about 150 alphabetically arranged terms. A picklist is not a taxonomy. Let’s see what the picklist offers:

    • Multiple terms for Wheelchairs, Power-Operated Vehicles (POVs), and Motorized Wheelchairs, which are POVs. There are also multiple synonymous terms for Office Visit.
    • No overarching concept for “Equipment.”
    • One term for all Surgical Services, but no breakdown by surgical specialty. That might lead users to assume that all surgical services are covered.
    • Important concepts are missing. There is no entry for “Asthma,” “Psoriasis,” or “Dermatology,” or for many other common complaints or hundreds of procedures.
    • Multiple terms for lab tests and diagnostic procedures, with no overarching category, and none of these terms are linked to standard medical coding systems.
    • As the list grows over time, it becomes difficult to scroll through hundreds of unorganized terms.
    • Picklists are poorly suited to web accessibility needs, which are particularly important for the audience of a health care (or any) website.

    One of the problems is that taxonomists have NOT been involved in solving these serious information problems. What would a taxonomist do? Taxonomists help design alternative ways for users, such as consumers, patients, caretakers, advocates, doctors, insurance companies, and policy analysts, to look for information. They group terms into meaningful categories using proven methodologies for analyzing predictable categories of knowledge. Taxonomists perform gap analysis to identify missing concepts. Some taxonomists work with auto-classification and ontological tools to develop rules and semantic models.
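Gap analysis, at its simplest, is a set comparison. The sketch below assumes two hypothetical inputs: the terms a picklist offers and the concepts users are known to need (for example, from search logs or interviews). It matches labels exactly; a real analysis would normalize synonyms and spelling first.

```python
# Hypothetical inputs: what an existing picklist offers, and the concepts users
# are known to need (for example, gathered from search logs or interviews).
PICKLIST = {"Office Visit", "Surgical Services", "Wheelchairs", "Lab Tests"}
NEEDED = {"Office Visit", "Lab Tests", "Asthma", "Psoriasis", "Dermatology"}


def gap_analysis(offered, needed):
    """Compare two term sets by exact label match."""
    return {
        # concepts users need that the vocabulary lacks
        "missing": sorted(needed - offered),
        # offered terms nobody asked for: candidates for review, not automatic removal
        "unrequested": sorted(offered - needed),
    }
```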

    Wouldn’t it be useful to have a health care information system that organizes care at various levels of modeling, such as “point of need”: Routine Care, Non-Routine Care, Emergency Care, Rehabilitation and Restorative Care, Chronic Care (including preexisting conditions), and Life-Threatening and Palliative Care? At the lower, concrete levels, this taxonomy would connect to the detailed services, which could in turn be connected to cost control data.
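A minimal Python sketch of the two-level structure proposed above: the top level uses the point-of-need categories named in the text, while the detailed services and the cost figures are hypothetical placeholders (the costs are arbitrary units, not real prices). The roll-up function shows how lower-level service data could be summarized by point of need.

```python
# Top level: the point-of-need categories proposed above.
# Lower level: hypothetical detailed services, for illustration only.
POINT_OF_NEED = {
    "Routine Care": ["Wellness visit", "Vaccination"],
    "Non-Routine Care": ["Specialist consultation"],
    "Emergency Care": ["Emergency room visit", "Ambulance"],
    "Rehabilitation and Restorative Care": ["Physical therapy"],
    "Chronic Care": ["Asthma management", "Diabetes supplies"],
    "Life-Threatening and Palliative Care": ["Chemotherapy", "Hospice"],
}

# Placeholder cost figures in arbitrary units (not real prices), keyed by service.
PLACEHOLDER_COST = {"Wellness visit": 1, "Vaccination": 2, "Emergency room visit": 5}


def category_of(service):
    """Return the point-of-need category a detailed service sits under, or None."""
    for category, services in POINT_OF_NEED.items():
        if service in services:
            return category
    return None


def cost_by_category(costs):
    """Roll service-level cost data up to the point-of-need level."""
    totals = {}
    for service, amount in costs.items():
        category = category_of(service)
        if category is not None:
            totals[category] = totals.get(category, 0) + amount
    return totals
```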

    Look at, while not providing health care insurance benefits, at least promotes finding information by type of cancer. has a taxonomy that is faceted in that it is organized by types of cancer. This is a good example of taxonomy at work, and of what taxonomy can do to make these interfaces simpler and friendlier to their audience.

    I am a fan of faceted taxonomies, but I now believe that simply mapping a term to a canonical form might be sufficient, because it captures the context of the term at one moment in time. Moreover, as many as 80% of categories of knowledge are predictable from our shared knowledge and can be suggested as part of the web interface design process. But taxonomies also need to be friendly to user terminology. Who cares if an office visit to the doctor is called “Wellness Visit,” “Routine Visit,” or “A day at the beach,” as long as the terms link back to the same basic concept?
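The idea that many user terms can link back to one basic concept is essentially a synonym ring. Here is a minimal Python sketch, assuming a hand-built mapping from normalized user phrases to a canonical concept; the entries are hypothetical examples.

```python
# Hypothetical synonym ring: normalized user phrases -> one canonical concept.
CANONICAL = {
    "office visit": "Office Visit",
    "wellness visit": "Office Visit",
    "routine visit": "Office Visit",
    "a day at the beach": "Office Visit",
    "motorized wheelchair": "Power-Operated Vehicle",
}


def normalize(user_term):
    """Map a user's phrase to its canonical concept, ignoring case and extra spaces."""
    key = " ".join(user_term.casefold().split())
    return CANONICAL.get(key)
```

Whatever the user types, "Routine Visit" or "A day at the beach", the lookup lands on the same concept; unknown phrases return None and can be logged as candidates for the next round of gap analysis.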

    Is taxonomy dead? Old-style authoritarian taxonomies are gone, but taxonomies that capture models of how we think are very much alive and very necessary to improve public access to important information. Words matter. Long live taxonomy!

    A PDF version of this article will be available on website