Skills of a Classy Taxonomist

At SemTech in June 2010, several speakers, including Professor Deb McGuiness, drew a very clear line between what a taxonomist does and what an ontologist does. Taxonomists build hierarchies, and ontologists determine classes or categories. In other words, ontologies are neat and unambiguous, and taxonomies are a bit messy.

Defining classes, or ontology work, typically precedes building the taxonomy. Defining the classes is like writing a specification for the taxonomy; in fact, defining classes is the same as defining facets. The goal of both taxonomists and ontologists should be a specific, unambiguous description of each term, one that helps manage how we find and organize content so that the pathways are clear. Adding an ontology ensures that the term is placed in the most specific categories, which reduces ambiguity. I would argue that no taxonomy is useful unless it is faceted, that is, divided into classes. Taxonomies work best when their terms share homogenous properties, and when they are smaller and focused.

Class analysis, or facet analysis, solves several problems:

1)       Clarify specific terms by situation or function: If I am interested in Java as a programming language, I want to see material related to Java as software, not as slang for coffee or as an island in Indonesia. If I am looking for “drill bits,” it might be important to understand whether the drill bits are for my home electric screwdriver or for an oil rig. Classes capture these distinctions and help to create precise, specific tagging and information retrieval.

2)       Ease long-term maintenance issues: Christine Connors points to a simple but common example, in which taxonomies include people’s names as narrow terms under a role: “Hillary Clinton” under “Secretary of State,” or “Charles Windsor” under “Prince of Wales.” The problem is that when the people filling these roles change, there is a maintenance headache. A classy taxonomy recognizes that there is a separate class for <people> as an entity, as distinguished from <role>. <People> and <Role> can be connected by a predicate such as <isA>. These distinctions are necessary for fast-changing information, such as who is dating whom in an entertainment application or who owns whom in a business application.

Abstraction: <person> <has> <role>

Instance: Hillary Clinton <is>  Secretary of State

3)    Facilitate sharing and importing taxonomies: A taxonomy that is specified by a class description will be more homogenous, have shared properties, and be more focused. This makes it easier to import, with less cleanup and review. It also makes it easier to use a standard such as SKOS. Messy taxonomies are harder to merge.
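
The maintenance point in item 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Triple` type, the `hasRole` predicate name, and the placeholder successor "Jane Doe" are all hypothetical, and a real system would store this in RDF rather than in Python sets. The idea is that roles and people live in separate classes, and the link between them is a statement that can be replaced without touching either hierarchy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """A single subject-predicate-object statement (hypothetical helper type)."""
    subject: str
    predicate: str
    obj: str


# Roles form their own class (facet), independent of whoever holds them.
roles = {"Secretary of State", "Prince of Wales"}

# People form a separate class.
people = {"Hillary Clinton", "Charles Windsor"}

# The link between the two classes is a statement, not a hierarchy level.
holds_role = {
    Triple("Hillary Clinton", "hasRole", "Secretary of State"),
    Triple("Charles Windsor", "hasRole", "Prince of Wales"),
}


def holders_of(triples, role):
    """Return everyone currently linked to `role`."""
    return {t.subject for t in triples if t.predicate == "hasRole" and t.obj == role}


def reassign_role(triples, role, new_person):
    """Return a new set of statements in which only `new_person` holds `role`."""
    kept = {t for t in triples if not (t.predicate == "hasRole" and t.obj == role)}
    return kept | {Triple(new_person, "hasRole", role)}


# "Jane Doe" is a placeholder name, not a real successor.
updated = reassign_role(holds_role, "Secretary of State", "Jane Doe")
print(holders_of(updated, "Secretary of State"))
```

When the office changes hands, one statement is swapped; the role hierarchy and the list of people stay exactly as they were.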

Anyone working with semantic technologies will tell you that most problems in inference happen when hierarchies in source taxonomies create odd associations by inserting a narrow or broad term. A taxonomist needs to be attentive to inferences in order to prevent false statements.   Professor Deb McGuiness calls this issue “truth maintenance.”
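
Here is a rough sketch of how a mixed hierarchy produces a false statement, and how a simple class check might catch it. The `broader` and `term_class` tables and the "Cabinet offices" term are invented for illustration; this is not how any particular reasoner works, only a way to see the problem.

```python
# A source taxonomy that files a person under a role (the pattern to avoid).
broader = {
    "Hillary Clinton": "Secretary of State",
    "Secretary of State": "Cabinet offices",
}

# The class (facet) each term really belongs to.
term_class = {
    "Hillary Clinton": "person",
    "Secretary of State": "role",
    "Cabinet offices": "role",
}


def ancestors(term, broader):
    """Follow broader-term links upward and return every ancestor in order."""
    seen = []
    while term in broader:
        term = broader[term]
        if term in seen:  # guard against cycles in a messy source
            break
        seen.append(term)
    return seen


def suspicious_inferences(broader, term_class):
    """List (narrow, broad) pairs that a transitive reading would derive
    across two different classes."""
    flagged = []
    for term in broader:
        for ancestor in ancestors(term, broader):
            if term_class[term] != term_class[ancestor]:
                flagged.append((term, ancestor))
    return flagged


for narrow, broad in suspicious_inferences(broader, term_class):
    print(f"Check: '{narrow}' would be inferred to be a kind of '{broad}'")
```

Read transitively, the hierarchy claims that a person is a kind of cabinet office. Checking each inferred pair against the class of its terms is one inexpensive way to surface that before it reaches an inference engine.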

To keep these categories clear and distinct, ontologists rely on building a conceptual model, or a picture of the domain. Modeling strategies draw on skills that most taxonomists already have: most taxonomists have been taught how to capture vocabulary and how to identify facets. Check out the earlier blog post Taxonomies and Modelling for more information.

Elaine Kendall of Sandpiper Software, a maker of concept-modelling tools, suggested that “one could build an ontology in 2 hours.” With the new generation of tools that can create RDF/OWL from data and content, this statement might be true.

    With good modelling tools that automatically generate RDF/OWL, such as those from TopQuadrant, taxonomists might be able to slide into the needed role of ontologist. To extend their skills, taxonomists need to understand some basic concepts in RDF/OWL: what a class is, what a property is, what a slot facet is, what class inheritance is, what is meant by reciprocation and inverse properties, and how to write a SPARQL query. But more importantly, a classy taxonomist can become a facilitator who helps build bridges between user and development communities and helps diagnose and prevent technical problems.
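
Two of those concepts, class inheritance and inverse properties, can be illustrated without any RDF tooling. The sketch below is plain Python, not OWL; the class names, the `hasRole` / `roleHeldBy` property pair, and the helper functions are assumptions made up for this example. In OWL the corresponding constructs are `rdfs:subClassOf` and `owl:inverseOf`.

```python
# Class hierarchy: each class points to its direct superclass.
subclass_of = {"Politician": "Person", "Person": "Agent"}

# Each individual is asserted to belong to one class.
instance_of = {"Hillary Clinton": "Politician"}

# Property pairs that are inverses of each other (hypothetical names).
inverse_of = {"hasRole": "roleHeldBy"}

# Statements someone actually entered.
asserted = {("Hillary Clinton", "hasRole", "Secretary of State")}


def classes_of(individual):
    """Return the asserted class plus every class inherited from it."""
    result = []
    cls = instance_of.get(individual)
    while cls is not None and cls not in result:
        result.append(cls)
        cls = subclass_of.get(cls)
    return result


def with_inverses(triples):
    """Add the reciprocal statement for every property that has an inverse."""
    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate in inverse_of:
            inferred.add((obj, inverse_of[predicate], subject))
    return inferred


print(classes_of("Hillary Clinton"))
print(sorted(with_inverses(asserted)))
```

Only one class and one statement were asserted by hand; the rest follows from the class hierarchy and the inverse declaration, which is the kind of work an ontology tool does automatically.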

    A taxonomist who is trained in ontologies  should bring the following skills:

    • Create processes to identify the requirements for each class.
    • Develop metrics to assess good results.
    • Identify what vocabularies are needed, evaluate existing vocabularies, and import and adapt them to current needs.
    • Ensure the integrity and focus of vocabularies, particularly when they are sourced from an outside vendor.
    • Develop processes to keep vocabularies current, and understand how to use metrics to “measure and improve” any vocabulary.
    • Work as part of the development team to help identify whether a source vocabulary might be contributing to a false inference.

    The taxonomist works with different user communities as well as developers, and helps bridge the gap between what users and experts know and what is needed to build a useful application. A classy taxonomist has a well-rounded set of skills and can work with development teams and user organizations to build intelligent systems.



    This week, I am at the 2010 Semantic Technology conference, where there are technologists who have built ontologies. So this seems like the place to find out exactly what the difference is between an ontology and a taxonomy, and what skills will matter.

    In the ontology world, a taxonomy, strictly speaking, is a hierarchical arrangement of terms. Taxonomists populate term nodes, decide on the form of each term and its variants, equivalents, and semi-equivalents, and create hierarchies. Ontologists do the heavy lifting: they decide what the classes will be, define the links, and generate RDF and OWL.

    But there is a bright spot in this rather dull picture of taxonomy work. The most progressive and insightful taxonomists insist on sorting terms into facets or classes. These facets are derived from an analysis of user needs, content, and domain knowledge. The core of an ontologist’s work is also to define classes or facets and the links between classes. These links can then be inherited or asserted between classes. A taxonomist who hasn’t thought about classes and design will create a taxonomy that looks like spaghetti, and an ontologist who lacks that skill can create an ontology that makes bad inferences and assertions.
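
As a small illustration of what sorting terms into facets buys you, the sketch below groups a handful of terms by facet and reports which terms only make sense once the facet is known. The term list and facet names are made up for this example; a real analysis would start from user needs, content, and domain knowledge as described above.

```python
from collections import defaultdict

# Each entry pairs a term with the facet (class) it was sorted into.
entries = [
    ("Java", "Programming languages"),
    ("Java", "Beverages"),
    ("Java", "Places"),
    ("Python", "Programming languages"),
    ("Espresso", "Beverages"),
]


def build_facets(entries):
    """Group terms by facet so each facet holds one kind of thing."""
    facets = defaultdict(set)
    for term, facet in entries:
        facets[facet].add(term)
    return dict(facets)


def ambiguous_terms(entries):
    """Return terms that appear in more than one facet, with their facets."""
    seen = defaultdict(set)
    for term, facet in entries:
        seen[term].add(facet)
    return {term: sorted(found) for term, found in seen.items() if len(found) > 1}


print(build_facets(entries)["Programming languages"])
print(ambiguous_terms(entries))
```

A tag of "Java" alone is ambiguous; a tag of "Java" within the programming languages facet is not, and that is the distinction a faceted design records.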

    The bottom line is that there is overlap between taxonomy and ontology, so I would like to suggest a term to describe this synergy: Taxo-ology. By thinking in terms of Taxo-ology, we can begin to find the overlap and synergy between taxonomists and ontologists:

    • Facets and classes:  Both taxonomists and ontologists need to create classes in which to classify terms.
    • Discipline in Creating Homogenous Hierarchies: Hierarchies, ideally, should have homogenous properties. For example, Secretary of State is an office of the United States government; Hillary Rodham Clinton is filling that role, but it is one of many roles she has had. Christine Connors, a semantic web guru, uses “Prince of Wales” as her example. That role is there whether or not Charles is Prince. It is part of the institution of the British monarchy. Even for the practical reason of long-term maintenance, these entities need to be in their own class (facet) and linked.
    • Greater Use of Linkages using Associative Relationships: Once terms are sorted into homogenous buckets, associative relationships (sometimes with semantic labels for the relationship) can be used to link between classes or between term nodes within a class.
    • Better Skill Sets: Someone who is a Taxo-ologist knows how to use rich ontology tools like TopBraid and understands OWL and XML output, but can also adapt to other tools, such as content management and auto-categorization software. A taxo-ologist can apply the best practices of building classes/facets, creating homogenous hierarchies, and developing associative relationships.
    • Better models for paying Taxo-ologists: Taxonomists sometimes get paid by the number of terms built out, but in the world of taxo-ology, compensation needs to be based on results. Sometimes these are strategic (is our organization collecting, sharing, and exchanging information about changing market, technical, and economic conditions?) and sometimes tactical (the need to find the right SOP at the right time). Search is a great example of how less is more: good taxo-ologists can make smaller, sleeker taxonomies that can be used to auto-tag concepts across facets. Or they create smaller taxonomies that have higher match rates against user queries because of their use of variants.
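
The last point, that a small taxonomy with good variants can match more user queries, can be sketched as a simple lookup. The two terms, their facets, and their variants below are invented for illustration, and the matching is deliberately naive (exact match after lowercasing); real auto-categorization tools do far more.

```python
# A deliberately small taxonomy: preferred term -> facet and variants.
taxonomy = {
    "Standard operating procedure": {
        "facet": "Document types",
        "variants": ["SOP", "standard operating procedures", "operating procedure"],
    },
    "Java": {
        "facet": "Programming languages",
        "variants": ["Java SE", "Java language"],
    },
}


def build_index(taxonomy):
    """Map every label, preferred or variant, to its preferred term."""
    index = {}
    for preferred, info in taxonomy.items():
        index[preferred.lower()] = preferred
        for variant in info["variants"]:
            index[variant.lower()] = preferred
    return index


def match(query, index):
    """Return the preferred term for a query, or None if nothing matches."""
    return index.get(query.strip().lower())


index = build_index(taxonomy)
for query in ["sop", "Java language", "drill bits"]:
    term = match(query, index)
    facet = taxonomy[term]["facet"] if term else None
    print(query, "->", term, "/", facet)
```

Two preferred terms answer seven different query strings here, which is the sense in which a smaller taxonomy with well-chosen variants can outperform a larger one without them.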

    Taxo-ologist seems like a good word to help bridge the gap between these disciplines, but there needs to be discussion and synergy between the taxonomy community and the ontology world. Taxonomists need to apply more discipline to how they do their work and embrace the auto-categorization and semantic tools that make it easier to process content. The semantic world can save some time in its development process by learning from the practical experience taxonomists have built up by working in enterprises and libraries, doing card sorts, understanding user experience, analyzing content, and merging all of that with domain knowledge.

    My goal this week is to find out more about what will help semantic technologies gain more traction, what the practical killer applications are, and what skills will be needed in the future. Be sure to stop by Christine’s booth to find out more about how ontologists can help with strategic information management and technical integration with semantic web technologies.

    ~ Marlene Rockmore (blogging from SemTech San Francisco 2010)