A thread on the Taxonomy Community of Practice (TaxoCop) discussion board piqued my interest. The question was whether there was an overlap between taxonomies and data modeling. On the surface, there might not seem to be overlap, because data modeling tries to make sense of all the data elements in a data dictionary or schema. But taxonomies are also high-level representations of the content or data, which makes a taxonomy a kind of model.
To me, taxonomies work best when the terms in the taxonomy are grouped into facets, which are groups of terms that share properties. These facets could also be called classes or business entities. In fact, I would be so bold as to suggest that taxonomies can be defined as collections of unambiguous terms grouped into facets.
When I build taxonomies, the first thing I prefer to do (after defining the information problem, selling the strategy, creating the metadata model, and defining the proof of concept) is to figure out the overall conceptual model. For example, here is a taxonomy from a project for a sales and marketing system where there was a requirement to track products, companies and applications.
Below is a small example of the model that was built for this application. I recently put the model into a cool tool called CMAP (http://cmap.ihmc.us/conceptmap.html).
This taxonomy was created using a multipronged approach in which we:
- Captured user query terms from search logs and from existing taxonomies
- Sorted these terms into facets to create a starter taxonomy (this is sometimes called a strawman taxonomy — remember the taxonomy is a collection of facets)
- Got the funding and support to conduct 5 cross-functional mind mapping sessions with mixed groups of stakeholders and users to validate the taxonomy
- Developed an enterprise model of all the facets, which became the basis for the long-term implementation plan
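The first two steps above (capturing terms and sorting them into facets) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `KNOWN_TERMS` lookup, the sample search log, and the "Unsorted" bucket are all hypothetical, and in practice the sorting is largely a human judgment call rather than a dictionary lookup.

```python
from collections import defaultdict

# Hypothetical lookup from known terms to facets, e.g. harvested from
# existing taxonomies. Every term and facet name here is illustrative.
KNOWN_TERMS = {
    "acme corp": "Companies",
    "widget pro": "Products",
    "inventory tracking": "Applications",
    "linux": "Platforms",
}


def build_strawman(query_terms):
    """Group raw query terms into facets; unknown terms go to 'Unsorted'."""
    facets = defaultdict(set)
    for raw in query_terms:
        term = raw.strip().lower()  # normalize so duplicates collapse
        if not term:
            continue
        facets[KNOWN_TERMS.get(term, "Unsorted")].add(term)
    return {facet: sorted(terms) for facet, terms in facets.items()}


# Hypothetical terms pulled from a search log.
search_log = ["Widget Pro", "linux", "Acme Corp", "widget pro ", "crm"]
strawman = build_strawman(search_log)
for facet, terms in sorted(strawman.items()):
    print(facet, terms)
```

The "Unsorted" bucket is the useful part: those are the terms to bring to the mind mapping sessions for stakeholders and users to place.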
Each of the boxes in the diagram represents a facet, which can then be defined as an authority list, a set of equivalences or variants, or a hierarchy. The model also clarifies which terms are associated concepts. For example, in the model above, the facet Applications can be associated with the platforms that the applications run on. "Run-On" is a form of related-term relationship, but it is a user-defined one.
This model becomes the driver for defining the facets required by the particular application. Facets then become fields in the metadata. The last thing I figure out is what kind of structure each facet needs: whether it should be a list of preferred terms, a set of variants, or a hierarchy. Building taxonomies using conceptual modeling makes it easier to find the associated facets.
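As a rough sketch of how such a model could be written down outside of a diagramming tool, the Python below represents facets, the structure chosen for each one, and user-defined relationships such as "runs on". The class names, the "manufactures" relationship, and the structure assigned to each facet are assumptions made for illustration; they are not taken from the original project.

```python
from dataclasses import dataclass, field
from enum import Enum


class FacetKind(Enum):
    AUTHORITY_LIST = "list of preferred terms"
    VARIANTS = "preferred terms with variants"
    HIERARCHY = "hierarchy"


@dataclass
class Facet:
    name: str
    kind: FacetKind


@dataclass
class ConceptualModel:
    facets: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)  # (source, label, target)

    def add_facet(self, name, kind):
        self.facets[name] = Facet(name, kind)

    def relate(self, source, label, target):
        """Record a user-defined relationship between two existing facets."""
        for name in (source, target):
            if name not in self.facets:
                raise KeyError(f"unknown facet: {name}")
        self.relationships.append((source, label, target))

    def associated_facets(self, name):
        """Facets linked to `name` in either direction."""
        linked = set()
        for source, _, target in self.relationships:
            if source == name:
                linked.add(target)
            elif target == name:
                linked.add(source)
        return sorted(linked)

    def metadata_fields(self):
        """Each facet becomes one metadata field."""
        return {f.name.lower(): f.kind.value for f in self.facets.values()}


# Illustrative model; the kinds chosen here are assumptions.
model = ConceptualModel()
model.add_facet("Products", FacetKind.HIERARCHY)
model.add_facet("Companies", FacetKind.VARIANTS)
model.add_facet("Applications", FacetKind.AUTHORITY_LIST)
model.add_facet("Platforms", FacetKind.AUTHORITY_LIST)
model.relate("Applications", "runs on", "Platforms")
model.relate("Companies", "manufactures", "Products")  # hypothetical relationship

print(model.associated_facets("Applications"))
print(model.metadata_fields())
```

Keeping the relationships as plain labeled triples means the same model can later be handed to a semantic tool or simply read off as a list of database fields.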
At this point, the model seems ready for semantic tools, but it can also be used for more common applications:
- It is easier to discover which facets should become associated facets.
- It’s easier to write specifications for how facets should be built and populated.
- It is also easier to figure out who has authority for update and control of each facet, and where there might be overlapping jurisdiction.
- UI designers get pumped because they have a knowledge organization diagram to energize their creative juices and help them create innovative interface designs.
- It becomes easier to develop policies about how content should be tagged. For example, if a piece of content is about a product, it should also be tagged with the manufacturer as well as other attributes.
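The tagging policy in the last bullet is the kind of rule that can be automated once the model exists. Here is a minimal Python sketch, assuming a hypothetical `MANUFACTURER_OF` lookup derived from a product-to-company relationship in the model; the product and company names are made up.

```python
# Hypothetical lookup: which company manufactures which product.
MANUFACTURER_OF = {"Widget Pro": "Acme Corp"}


def apply_tagging_policy(tags):
    """Return a copy of `tags` with the manufacturer added for each product."""
    result = {facet: set(terms) for facet, terms in tags.items()}
    for product in tags.get("products", ()):
        maker = MANUFACTURER_OF.get(product)
        if maker:
            result.setdefault("companies", set()).add(maker)
    return result


# A piece of content tagged only with a product picks up its manufacturer.
tagged = apply_tagging_policy({"products": {"Widget Pro"}})
print(tagged)
```

Content with no product tag passes through unchanged, so the rule can be run over everything without special cases.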
Models don’t have to be used solely with ontology software. They can help figure out the fields in a database, support automated categorization, or assist user interface design.
Perhaps taxonomists should work to take more control over creating models. Taxonomists are perceived as “hierarchy builders,” while ontologists are seen as modelers, particularly if they are using an ontology tool like Protégé or TopQuadrant's TopBraid Composer. If taxonomists could become modelers, we might be able to better explain what we do and why it matters, and help create some innovative systems. Taxonomists understand how to create models and knowledge representations that can capture community norms through validation techniques.
The point of models is that they help reduce complexity and show the big picture. By modeling, some organizations might start to see the advantage of building the taxonomy before designing interfaces. These models are technology-agnostic: they can be integrated into an architecture to work with many different technologies and kinds of content, from sophisticated semantic systems to the everyday database with fielded data, without worrying about the underlying platform. Keith DeWeese commented in the interesting discussion thread that the “Ontology should be done before the taxonomy.” But perhaps “taxonomy” can become a new code word for both modeling and terminology management.