Taxonomies and Modeling

A thread on the Taxonomy Community of Practice (TaxoCop) discussion board piqued my interest. The question was whether there was an overlap between taxonomies and data modeling. On the surface, there might not seem to be much overlap, because data modeling tries to make sense of all the data elements in a data dictionary or schema. But taxonomies are also high-level representations of the content or data, which makes taxonomies a kind of model.

To me, taxonomies work best when the terms in the taxonomy are grouped into facets, which are groups of terms that share properties. These facets could also be called classes or business entities. In fact, I would be so bold as to suggest that taxonomies can be defined as collections of unambiguous terms grouped into facets.

When I build taxonomies, the first thing I prefer to do (after defining the information problem, selling the strategy, creating the metadata model, and defining the proof of concept) is to figure out the overall conceptual model. For example, here is a taxonomy for a project for a sales and marketing system where there was a requirement to track products, companies, and applications.

Below is a small example of a model that was built for this application. I recently put the model into a cool tool called CMAP.

Sales and Marketing Conceptual Map

This taxonomy was created using a multipronged approach in which we:

  • Captured user query terms from search logs and from existing taxonomies
  • Sorted these terms into facets to create a starter taxonomy (sometimes called a strawman taxonomy; remember, the taxonomy is a collection of facets)
  • Got the funding and support to conduct 5 cross-functional mind mapping sessions with mixed groups of stakeholders and users to validate the taxonomy
  • Developed an enterprise model of all the facets, which became the basis for the long-term implementation plan

Each box in the diagram represents a facet, which can then be defined as an authority list, as a set of equivalents or variants, or as a hierarchy. The model also clarifies which terms are associated concepts. For example, in the model above, the Applications facet can be associated with the platforms that the applications run on. Run-On is a form of related term, but it is a user-defined relationship.
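
To make the facet-and-relationship idea concrete, here is a minimal Python sketch. It assumes that a facet is simply a named set of terms labeled with how it is controlled (authority list, variants, or hierarchy), and that a user-defined relationship such as Run-On is a set of term pairs between two facets. The class names and sample terms ("CRM Suite", "Linux", and so on) are hypothetical illustrations, not taken from the actual model or from CMAP.

```python
from dataclasses import dataclass, field
from enum import Enum


class FacetKind(Enum):
    """How a facet's terms are controlled. This sketch only records the kind;
    it does not store variant lists or broader/narrower links."""
    AUTHORITY_LIST = "authority list"
    VARIANTS = "equivalence/variants"
    HIERARCHY = "hierarchy"


@dataclass
class Facet:
    name: str
    kind: FacetKind
    terms: set[str] = field(default_factory=set)


@dataclass
class Relationship:
    """A user-defined associative relationship between terms in two facets."""
    name: str
    source: Facet
    target: Facet
    pairs: set[tuple[str, str]] = field(default_factory=set)

    def link(self, source_term: str, target_term: str) -> None:
        # Only link terms that already exist in their respective facets.
        if source_term not in self.source.terms:
            raise ValueError(f"{source_term!r} is not in facet {self.source.name}")
        if target_term not in self.target.terms:
            raise ValueError(f"{target_term!r} is not in facet {self.target.name}")
        self.pairs.add((source_term, target_term))

    def targets_of(self, source_term: str) -> set[str]:
        # All target-facet terms linked to the given source-facet term.
        return {t for s, t in self.pairs if s == source_term}


# Hypothetical example: applications associated with the platforms they run on.
applications = Facet("Applications", FacetKind.AUTHORITY_LIST, {"CRM Suite", "Inventory Tracker"})
platforms = Facet("Platforms", FacetKind.HIERARCHY, {"Linux", "Windows"})
run_on = Relationship("Run-On", applications, platforms)
run_on.link("CRM Suite", "Linux")
run_on.link("CRM Suite", "Windows")
print(sorted(run_on.targets_of("CRM Suite")))  # ['Linux', 'Windows']
```

Checking links against the facets' terms is one design choice that keeps the relationship from pointing at terms that are not in the taxonomy.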

Software Product Model with User-defined relationship

This model becomes the driver for defining the facets required by the particular application. Facets then become fields in the metadata. The last thing I figure out is what kinds of relationships are needed: whether the facet should be a list of preferred terms, a set of variants, or a hierarchy. Building taxonomies using conceptual modeling makes it easier to find the associated facets.
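
If each facet becomes a metadata field, a record can be checked against the controlled terms of its facets. The sketch below assumes a plain dictionary of hypothetical facets and terms; it is one possible way to express the idea, not a specification of any particular metadata system.

```python
# Hypothetical facets from a sales and marketing model; each becomes a metadata field.
FACETS: dict[str, set[str]] = {
    "product": {"Widget Pro", "Widget Lite"},
    "company": {"Acme Corp", "Globex"},
    "application": {"CRM Suite", "Inventory Tracker"},
}


def validate_record(record: dict[str, list[str]], facets: dict[str, set[str]]) -> list[str]:
    """Return a list of problems; an empty list means every field and value is allowed."""
    problems = []
    for field_name, values in record.items():
        if field_name not in facets:
            problems.append(f"unknown field: {field_name}")
            continue
        for value in values:
            if value not in facets[field_name]:
                problems.append(f"{field_name}: {value!r} is not a controlled term")
    return problems


record = {"product": ["Widget Pro"], "company": ["Acme"], "region": ["EMEA"]}
print(validate_record(record, FACETS))
# ["company: 'Acme' is not a controlled term", 'unknown field: region']
```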

At this point, the model seems ready for semantic tools, but it can also be used for more common applications:

  • It is easier to discover which facets should become associated facets.
  • It’s easier to write specifications for how facets should be built and populated.
  • It is also easier to figure out who has authority for updating and controlling each facet, and where there might be overlapping jurisdiction.
  • UI designers get pumped because they have a knowledge organization diagram to energize their creative juices for innovative interface design.
  • It becomes easier to develop policies about how content should be tagged. For example, if a piece of content is about a product, it should also be tagged with the manufacturer as well as other attributes.
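
A tagging policy like the manufacturer rule can be written down as a small function over the model's associations. This sketch assumes a hypothetical Made-By mapping from products to companies, analogous to the Run-On relationship; the names and terms are illustrative only.

```python
# Hypothetical "Made-By" association from the Products facet to the Companies facet.
MANUFACTURER_OF: dict[str, str] = {
    "Widget Pro": "Acme Corp",
    "Widget Lite": "Acme Corp",
    "Gadget X": "Globex",
}


def apply_manufacturer_rule(tags: dict[str, set[str]]) -> dict[str, set[str]]:
    """Policy: content tagged with a product is also tagged with that product's manufacturer."""
    # Copy the tag sets so the caller's tags are not modified.
    result = {facet: set(values) for facet, values in tags.items()}
    for product in tags.get("product", set()):
        manufacturer = MANUFACTURER_OF.get(product)
        if manufacturer is not None:
            result.setdefault("company", set()).add(manufacturer)
    return result


tags = {"product": {"Widget Pro", "Gadget X"}}
tagged = apply_manufacturer_rule(tags)
print(sorted(tagged["company"]))  # ['Acme Corp', 'Globex']
```

Returning a new dictionary rather than editing in place makes it easy to compare what an indexer chose with what the policy adds.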

Models don’t have to be used solely with ontology software. These models can help determine the fields in a database, support automated categorization, or assist user interface design.

Perhaps taxonomists should take more control over creating models. Taxonomists are perceived as “hierarchy builders,” while ontologists are seen as modelers, particularly if they are using an ontology tool like Protégé or TopQuadrant’s TopBraid Composer. If taxonomists could become modelers, we might be able to better explain what we do and why it matters, and help create some innovative systems. Taxonomists understand how to create models and knowledge representations that can capture community norms through validation techniques.

The point of models is that they help reduce complexity and show the big picture. By modeling, some organizations might start to see the advantage of building the taxonomy before designing interfaces. These models are platform-agnostic; that is, they can be integrated into an architecture that works with many different technologies and kinds of content, from sophisticated semantic systems to the everyday database with fielded data, without worrying about the underlying platform. Keith DeWeese commented in the interesting discussion thread that the “ontology should be done before the taxonomy.” But perhaps “taxonomy” can become a new code word for both modeling and terminology management.

Taxonomy and “Political Regeneration”

2009 began with the declaration that taxonomy was dead. In 2010, I want to suggest that taxonomies have a role to play in regeneration. I recently reread George Orwell’s influential essay “Politics and the English Language.” The essay is about writing, but it is also a plea to choose carefully how we label our experience. Orwell writes that the language is “full of bad habits … If one gets rid of these habits one can think more clearly, and to think clearly is a necessary first step toward political regeneration.”

Orwell’s essay, written at the end of WWII, was a quest to end the bureaucratic language that led to the Holocaust and Stalinization and that gave us desensitizing phrases like “collateral damage” or “pacification.” Despite Orwell’s large polemics, on rereading his essay I realized he had important insights for taxonomies, including why how we label and categorize matters.

Orwell exhorts us to exercise mental energy to construct meaningful, vivid, and lively labels. He has a few good rules that probably should be added to the list of taxonomy editing guidelines:

  • Avoid overused or dying metaphors and phrases
  • Use more action words, and avoid the passive voice
  • Avoid pretentious words

Think of all the phrases that emerged in 2009 that could use a more complete taxonomic description to understand what they really meant: “health care reform,” “death panels,” “single payer.” What do these really mean? I am also quite certain that how I define a term is not how my neighbor, or even two experts, might define it. As a concrete illustration, in 2009 I was part of a very knowledgeable group that looked at health care reform. I proposed that each of us write a definition of “single payer” on an index card, and we ended up with multiple definitions even within a like-minded group.

I am interested in Orwell’s idea of looking at phrases as action words, which goes against the passivity of taxonomies built from noun phrases. For example, although this might be a bit turgid, what if we started thinking about investment as an activity? By separating investment (the product) from investing (the action), we might start to understand who the players are, their roles, and their methods and practices. Then we are on our way to understanding the action-oriented, defensive role played by regulation and regulatory agencies, and to designing more comprehensive systems for understanding financial gobbledygook.

Orwell even has a formula for creating user-generated labels that is as good as any instruction I have seen. His process is one of visualization: you capture your ideas about a concept in a “mental model” before you attempt to write a label for the object. He writes:

When you think of a concrete object, you think wordlessly, and then, if you want to describe the thing you have been visualizing you probably hunt about until you find the exact words that seem to fit it. When you think of something abstract you are more inclined to use words from the start, and unless you make a conscious effort to prevent it, the existing dialect will come rushing in and do the job for you, at the expense of blurring or even changing your meaning. Probably it is better to put off using words as long as possible and get one’s meaning as clear as one can through pictures and sensations.

In 2010, my goal is to think about overused terms and phrases and to take the time to map what is actually meant. On this blog, you will see more concept mapping using CMAP, and I’ll be posting my maps for different projects regularly. I’ll also be looking for other projects that are doing innovative work.

But the main point of Orwell’s work is that words have to be discussed in an open dialog. Taxonomy work should not be done in isolation, because we are questioning and defining core concepts. We need to ask about these fundamental definitions in public spaces. If we nod in acknowledgement when we don’t understand, then we are imitating, not exercising mental energy or regenerating our own thinking.

This is a year to use taxonomies for regeneration: to make labeling a more conscious activity, checking whether each label conveys what is meant and everything that is meant. As Orwell says, we need to avoid imitation because it corrupts our own thought processes. If we don’t understand a phrase, we should ask the speaker to define the concept precisely and unambiguously. In tagging my objects, I need to ask: Are my tags specific enough? Did I use enough tags? Have I covered all the facets of the object? Can another person find my object or my post using my tags?
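
Some of these self-check questions can even be partly automated. Here is a small Python sketch, assuming a hypothetical set of required facets and a hypothetical broader-to-narrower mapping for one facet. It flags facets that have no tags and tags that still have narrower terms (and so might not be specific enough). Whether another person can actually find the object remains a human judgment.

```python
# Hypothetical facets that every object is expected to be tagged against.
REQUIRED_FACETS = {"topic", "product", "audience"}

# Hypothetical broader -> narrower terms for the "topic" facet.
NARROWER: dict[str, set[str]] = {
    "Finance": {"Investing", "Regulation"},
    "Investing": {"Bonds", "Equities"},
}


def review_tags(tags: dict[str, set[str]]) -> dict[str, list[str]]:
    """Report required facets with no tags, and tags that have narrower terms available."""
    missing = sorted(f for f in REQUIRED_FACETS if not tags.get(f))
    # A tag that still has narrower terms may be too broad to be useful for finding.
    too_broad = sorted(t for values in tags.values() for t in values if t in NARROWER)
    return {"missing_facets": missing, "possibly_too_broad": too_broad}


print(review_tags({"topic": {"Investing"}, "product": {"Widget Pro"}}))
# {'missing_facets': ['audience'], 'possibly_too_broad': ['Investing']}
```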

Perhaps I will walk around with a bunch of index cards in my bag to create spontaneous moments for regeneration and dialog. Asking for clarity is something we can do graciously.