The Taxonomy Blog


Organizing concepts leads to clear thinking

Google’s WonderWheel

Google is trying out a new feature. Click on the other options for a search, and then try the option called “WonderWheel.” Here is a WonderWheel result for the concept “Dog Parks.”

Google's WonderWheel for "Dog Parks" May 5, 2010

I’m curious whether anyone has advice on how to optimize a site so that it is classified in the WonderWheel. Can I use metadata, words in the text, or SEO tricks?

If anyone has insights into how Wonderwheel works, please post or contact me offline.

~  Marlene Rockmore

Filed under: Uncategorized

Search Patterns and Faceted Taxonomies

Peter Morville and Jeffery Callender have produced a beautiful manifesto for improving search called Search Patterns: Design for Discovery (O’Reilly, 2010). It is an ode to making complex data beautiful and navigable in user interfaces. It’s nice to see O’Reilly produce a book with visual flair.

But once you journey through the many beautiful interfaces and design principles for presenting data, you realize that data presentation still depends on data organization. Morville hints at how data is organized to support these interfaces. In Chapter 2, on the anatomy of search, the authors write that sites should “embrace faceted navigation… Global facets might include topic, format, date and author.” Morville downplays the role of formal hierarchies, focusing instead on the user experience of multiple interactions, from “pearl growing” to browsing to managing your data, to work toward a more immediate user experience. Faceted navigation is described as “arguably the most significant search innovation of the past decade” (p. 95), but only one short chapter, Engines for Discovery, discusses how to create faceted navigation.

The data organization that combines the product taxonomy with other facets is called “unified discovery.” The engines of this discovery are the subject of Chapter 6, and this is where the expanded role of the taxonomist comes in: adding facets for

  • Category: broad classifications that vary by application
  • Topics: the smaller areas of common interest, such as specific cars, books, or recipes
  • Format: how the data is formatted, whether as content, video, or idea
  • Audience: the fundamental activity of understanding who needs the data, from scholar and expert to novice browser

This global “one size fits all” recommendation leaves out Time and Chance: time, meaning when an object was produced, and chance, meaning whether an object turns out to be highly respected and relevant to users’ needs. Date and date range are important global facets. Whether there is an “out of the box” global taxonomy is probably up for debate. The facets, how many there are, and how they are labeled need to be validated against user needs, the application, and the content. A global model is a good starting point but will probably need tuning. Search across health care policies, for example, probably requires facets for diseases, symptoms, treatments, and additional resources. Determining the top categories takes time, because they must reflect common shared knowledge and vocabulary. The number of top facets does not have to be 5, or 7 plus or minus 2; it should be whatever the application, the users, and the content require. Get over fixed universality rules and instead collect more data about user needs and content.

These navigation systems rely on separate and distinct data structures that let users navigate and refine queries before they are passed to the underlying database or data structures. These data structures need to be maintained, governed, and analyzed. Over time, the richer this conceptual metadata, the better the search experience, and better techniques for creating and using metadata are just around the corner.
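As a minimal sketch of such a separate facet structure (assuming a tiny in-memory catalog; the record fields, values, and the function names `refine` and `facet_counts` are hypothetical), each selected facet value narrows the candidate set, and the counts for the remaining values are what a faceted interface would display before any query reaches the underlying database:

```python
from collections import Counter

# Hypothetical catalog: each record carries a value for a few global facets.
CATALOG = [
    {"id": 1, "topic": "dog parks", "format": "video", "year": 2009},
    {"id": 2, "topic": "dog parks", "format": "article", "year": 2010},
    {"id": 3, "topic": "dog training", "format": "article", "year": 2010},
]


def refine(items, **selected):
    """Keep only the records that match every selected facet value."""
    return [r for r in items if all(r.get(f) == v for f, v in selected.items())]


def facet_counts(items, facet):
    """Count the remaining records under each value of one facet."""
    return dict(Counter(r[facet] for r in items))


hits = refine(CATALOG, topic="dog parks")
print([r["id"] for r in hits])        # [1, 2]
print(facet_counts(hits, "format"))   # {'video': 1, 'article': 1}
```

A production system would precompute these counts in an index rather than scanning records, but the refine-then-count loop is the same.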

On taxonomies and ontologies, the authors specifically argue that there may be other approaches to disambiguating terms (like Java the programming language from Java the island) based on clues like user and context rather than vocabularies:

“It’s not that there’s no value in parsing sentences for meaning or developing thesauri (or ontologies) that map equivalent, hierarchical, and associative relationships.  These approaches can add value, especially within verticals with limited formal vocabularies, like medicine, law and engineering.  It’s just that less obvious approaches like employing query-query reformulation and post-query click data to drive autosuggest – may deliver better results at lower costs. And we should be wary of claims that computers “understand meaning,” at least until they get a whole lot better at filtering spam.” (p. 162)

While these ideas are valid, they miss the essential wisdom of why librarians adopted taxonomies and spent so long building a body of standards for taxonomy creation. One thing librarians have long known is that taxonomies have a shelf life beyond a specific application: they can be used to share data across applications, communities, and the globe.

If we are to move the beauty of Morville and Callender’s interfaces to uses beyond e-commerce, toward accessible, lower-cost applications, we are going to have to understand the data structures behind these beautiful designs and reach some shared understanding about how they should be built. Search-side approaches are wise, but they depend on a good faceted navigation design whose categories have been validated against users’ needs. The skills of the taxonomist can be applied to search-side information design.

One discussion I enjoyed was on the under-appreciated role of color as a “quick way to reference the major categories and key players” (p. 15). I have often thought that it might be useful to have a color attribute when defining a facet or category, so that all the terms and concepts within a facet share the same color. That would help in the visual sorting of ideas, which Morville and Callender explore further on the following pages. Sites without a visual library of photos, with only ideas and concepts, could become more visual through color-coding. It would be useful if blogs and databases looked at ways of adding color so that similar concepts in a facet or category can also be grouped by shared color.
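The color-attribute idea can be sketched as a facet record that carries one color, which every term in the facet inherits. This is a hypothetical model, not something the book specifies: the `Facet` class, the example facets, and the hex color values are all assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Facet:
    """A facet whose terms all share one display color (hypothetical model)."""
    name: str
    color: str  # any CSS-style color string chosen by the designer
    terms: list[str] = field(default_factory=list)


# Hypothetical facets for a health care site with few photos.
FACETS = [
    Facet("Diseases", "#1f77b4", ["asthma", "diabetes"]),
    Facet("Treatments", "#2ca02c", ["insulin", "inhalers"]),
]


def color_of(term: str, facets: list[Facet]) -> Optional[str]:
    """Return the color of the facet containing the term, or None if untagged."""
    for facet in facets:
        if term in facet.terms:
            return facet.color
    return None


print(color_of("insulin", FACETS))  # #2ca02c
```

Because the color lives on the facet rather than on each term, recoloring a whole category is a single change.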

To move to the next level, where we take search patterns from e-commerce to other uses, such as health care or better access to government information, and more widely adopt better and more visual search designs, we have to broaden our understanding of how to create and validate faceted navigation and categories, and of what the supporting data structures need to be. Perhaps O’Reilly’s next book should be on the common data structures behind design for discovery, such as the art of taxonomy and ontology.

Search Patterns is a valuable little book to stimulate creative juices. The link to buy Search Patterns is at

Thank you to Andy Oram, a mensch of an editor at O’Reilly.

~ Marlene Rockmore


Filed under: Book Review, Faceted Search, Peter Morville

The Mars Test

A recent NPR segment talked with New Yorker writer Peter Hessler, who has lived in China for the past 15 years, about what it was like to re-enter life in the United States and how the United States looks to Chinese citizens. Hessler discussed how hard it is for the rest of the world to understand our complex system of checks and balances; of federal, state, and local power; and of influential groups with non-governmental status. That raised the question of what government websites do to orient visitors to the basic organization and framework of government.

What if we were visiting from Mars? What would we learn from our government websites about how the United States is organized? The Mars test, in taxonomy and information design, is also called the ‘mental model.’ A mental model uses common knowledge or frameworks to create website navigation. So a good place to start designing a US Government website might be 4th grade civics, which distinguishes the Executive (and administrative), Legislative, and Judicial Branches and explains the responsibilities of the federal government and the functions reserved for state government.

Here is the US Government portal. Does it pass the Mars test?

US Government portal, April 16, 2010

It is a directory-like interface organized, it seems to me, around arbitrary topics with no association to government agencies. Where would I even begin to find out about the President of the United States, the new health care bill, or the Supreme Court? How do you find a local government office, such as my legislator’s office or the Social Security office? In a week when a United States Supreme Court justice retired and volcanic ash disrupted air travel, there is no acknowledgment of these events and no links to related websites. The site in fact gives the impression that the lights are on but nobody is home. The site is actually experimenting with some sophisticated clustering software, such as Vivisimo. This clustering application illustrates how clustering results can be customized, in this case by topic, by agency, and by source. While the topic clusters are generated automatically on the fly, the agency and source filters are generated from HTML meta tags.
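To make the meta-tag half of that concrete, here is a minimal sketch of building an agency filter from page metadata. It uses Python's standard `html.parser` module; the meta tag name `agency`, the sample pages, and the `(untagged)` bucket are hypothetical assumptions, not the tags any particular portal actually uses.

```python
from collections import defaultdict
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collect <meta name=... content=...> pairs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            if a.get("name") and a.get("content") is not None:
                self.meta[a["name"].lower()] = a["content"]


def group_by_meta(pages, key):
    """Bucket page URLs by the value of one meta tag (e.g. a hypothetical 'agency' tag)."""
    buckets = defaultdict(list)
    for url, html in pages.items():
        reader = MetaTagReader()
        reader.feed(html)
        buckets[reader.meta.get(key, "(untagged)")].append(url)
    return dict(buckets)


# Hypothetical pages; only the first two carry the agency tag.
PAGES = {
    "a.html": '<html><head><meta name="agency" content="Social Security Administration"></head></html>',
    "b.html": '<html><head><meta name="agency" content="Supreme Court"></head></html>',
    "c.html": "<html><head><title>No tags</title></head></html>",
}
print(group_by_meta(PAGES, "agency"))
```

The `(untagged)` bucket shows the limit of this approach: a filter built from metadata is only as good as the tagging behind it.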

The United Kingdom is experimenting with its own clustered interface, but its site also uses RDFa and shared metadata. This system has the advantage of a reusable metadata model that lets state and local agencies map their content to the government model. This promotes “harmonization” and cooperation in supplying data between federal and state government. This harmonization through shared metadata enables features such as search by postcode for local offices that deliver state and local services. Even better, the interface looks like someone is minding the store and cares what content appears on the website.

Direct.gov.uk, April 16, 2010

I am not opposed to clustering. Clustering promises to be a great technology for quickly retrieving masses of documents and content, but a little upfront work is needed to steer automated technologies toward useful categories that reflect our shared knowledge and common sense. This work would help in creating automated systems that sort results into useful buckets that clarify content and help users find government assistance and solutions. The portal's search is actually an exciting engine that has clustered over 50 million government documents. However, it needs a friendlier, warmer interface. For example, search for Supreme Court, and the results mix state courts with the United States Supreme Court. Wouldn’t the search experience be improved if the portal helped users understand and filter searches to distinguish between federal and state courts?

Using common models through taxonomies and shared metadata might not only help the visitors from Mars.  It might also help citizens of the United States find a clearly navigable path based on stuff they learned in 4th grade.


Filed under: Government Databases, User-Centered Design

Understanding Associative Relationships

Taxonomies are collections of facets made up of terms that are described, and made unique, through their relationships to each other. The common relationships are equivalency, hierarchical, and associative (related) relationships. Of these, the least understood and least used is the associative relationship. Associative relationships, also called related terms, are sometimes called See Also relationships or syndetic (cross-reference) structures.

One reason associative relationships (related terms) are the least used is confusion about how they are implemented. Hierarchies move us up and down a category of information whose members share common properties. Associative relationships point us to other aspects of a topic.

Associative relationships can help sort topics into clear categories. This creates a simplicity that helps both programmers and users. For example, a site might be about paper products. If I organize one giant enterprise taxonomy with a single hierarchy, types of paper products, from envelopes to toilet paper, will be mashed in with other topics such as paper weight, composition, manufacturers, or the activities the product is used for.

Instead, by sorting terms into facets and using associative relationships, you create a clearer graphical mapping of these concepts. In the example below, a taxonomy has been sorted into multiple facets. Each facet has parent/child relationships, which can also be called abstraction and instance, as in the illustration below:


Middle level (abstraction) | Products  | Manufacturers | Composite                     | Measured By
Lower level (instance)     | Envelopes | Canson, Tyvek | Vellum, 100% recycled content | Standard Sizes, Custom Sizes

The associative relationship can be used to connect the dots between the columns in the table above. An associative relationship can be displayed as a See Also; as a Related Term (RT), as in ANSI-standard thesaurus relationships; or as a semantic relationship (predicate), as in a semantic triple.

Paper Products

See Also Manufacturers and Distributors, 100% Recycled Content, Custom Products


See Also Canson, Tyvek, Vellum, Standard Sizes

Or by a thesaurus relationship as in

Paper Products

Narrower Term (NT): Envelopes

Related Term (RT): Manufacturers, Composite, Custom Sizes

Or by custom semantic relationships, which could replace the See Also or RT:

Paper Products


<ComposedOf> Composite

<MeasuredBy> Standard Sizes, Custom Sizes
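One way to see that these three displays are views of the same links is to store each link once, as a subject/predicate/object triple, and render it three ways. This is a sketch under assumptions: the predicate names `NarrowerTerm` and `MadeBy` are hypothetical labels added for illustration (only `ComposedOf` and `MeasuredBy` appear in the example above), and treating `NarrowerTerm` as the only hierarchical predicate is a simplification.

```python
# Hypothetical triples drawn from the paper-products example.
TRIPLES = [
    ("Paper Products", "NarrowerTerm", "Envelopes"),
    ("Paper Products", "MadeBy", "Manufacturers"),
    ("Paper Products", "ComposedOf", "Composite"),
    ("Paper Products", "MeasuredBy", "Standard Sizes"),
    ("Paper Products", "MeasuredBy", "Custom Sizes"),
]

HIERARCHICAL = {"NarrowerTerm"}  # everything else is treated as associative


def as_see_also(subject, triples):
    """Flatten every associative link into one See Also list."""
    return [o for s, p, o in triples if s == subject and p not in HIERARCHICAL]


def as_thesaurus(subject, triples):
    """Render hierarchy as NT and all associative links as RT."""
    nt = [o for s, p, o in triples if s == subject and p in HIERARCHICAL]
    return {"NT": nt, "RT": as_see_also(subject, triples)}


def as_semantic(subject, triples):
    """Keep the custom predicate labels, grouping objects by predicate."""
    out = {}
    for s, p, o in triples:
        if s == subject and p not in HIERARCHICAL:
            out.setdefault(p, []).append(o)
    return out


print(as_thesaurus("Paper Products", TRIPLES))
print(as_semantic("Paper Products", TRIPLES))
```

The See Also and RT views discard the predicate label; the semantic view keeps it. That retained label is what makes the triple form more useful for machine-to-machine processing.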

This design can also be represented as simpler controlled vocabularies stored in fairly flat files. If a file is hierarchical, the hierarchy would be at most 2-3 levels. In the following example, the top-level abstraction becomes the name of the facet, with specific values listed for each facet:
Facet 1: Paper Products: Office Paper, Envelopes
Facet 2: Manufacturers & Distributors: Weyerhaeuser, Tyvek, Canson, ACME, Dunder-Mifflin
Facet 3: Composite: Vellum, 100% Recycled Content
Facet 4: Standard Size:
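Because each line of such a flat file names one facet and its values, loading it is straightforward. This is a minimal sketch that assumes a hypothetical line format of `Facet <n>: <facet name>: <comma-separated values>`, with the sample text embedded in the code. Facets with no values yet are kept as empty lists so they can be populated later.

```python
import re

# Hypothetical flat-file content, one facet per line.
FLAT_FILE = """\
Facet 1: Paper Products: Office Paper, Envelopes
Facet 2: Manufacturers & Distributors: Weyerhaeuser, Tyvek, Canson
Facet 3: Composite: Vellum, 100% Recycled Content
Facet 4: Standard Size:
"""

LINE = re.compile(r"^Facet\s*\d+:\s*(?P<name>[^:]+):\s*(?P<values>.*)$")


def load_facets(text):
    """Parse each line into {facet name: [values]}; empty value lists are allowed."""
    facets = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # skip blank or malformed lines
        values = [v.strip() for v in m.group("values").split(",") if v.strip()]
        facets[m.group("name").strip()] = values
    return facets


print(load_facets(FLAT_FILE))
```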

Using associative relationships to create the connections, or “predicates,” between facets has many benefits. Some of the obvious ones are:

Processing Improvement: Each facet is mutually exclusive (orthogonal) and can be recombined as needed with other facets.

Extensibility: Since the design is based on a model, you can add new facets such as content type (blogs, videos), audience or customer, events (such as weddings or business events), or additional attributes such as color.

Ease of Maintenance: It simplifies long-term maintenance because you no longer have to dive into and sort through complex hierarchies to find related concepts.

Information Discovery: By sorting terms into facets and using associative relationships, it becomes easier to browse through an information space.

The very best part of this approach is that you can change the taxonomy model without changing the content, and when you update the taxonomy with new terms, such as processes, the change appears everywhere. There is one important caveat. My data shows that this predefined modeling works 80% of the time, but for about 9% of use cases you will run into glitches because of ambiguous or co-occurring single terms that might appear in multiple facets. These are words such as “treatment,” “process,” or “evaluation,” which, as single terms, are somewhat vague. When they are combined in compound phrases, such as “Water-processing” or “Chemically-treated,” the meaning is clearer.

In my practice, I typically identify these vague guide or hub terms, such as “process,” “treatment,” “management,” and “evaluation,” and sort them into a separate facet. I can then use an associative relationship to point to the more specific compound terms (which are in their appropriate facets). When these terms are sorted into the right facet, the meaning becomes even clearer. Sorting terms into facets and linking facets with relationships, including semantic, RDF-like relationships as shown above, can be used to create more specific information spaces. The string that results from this work is more precise than a term that stands alone with no associated relationships.
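The hub-term technique can be sketched as two lookups: a separate facet of vague hub terms, each with associative links to compound terms, and the regular facets where those compound terms actually live. The facet names `Processes` and `Treatments` and the mappings below are hypothetical examples built from the terms in the text.

```python
# Hypothetical regular facets; compound terms live in their own facets.
FACETS = {
    "Processes": ["Water-processing"],
    "Treatments": ["Chemically-treated"],
}

# Separate facet of vague hub terms, each pointing (associatively) to compound terms.
HUB_TERMS = {
    "process": ["Water-processing"],
    "treatment": ["Chemically-treated"],
}


def facet_of(term):
    """Return the facet that holds a compound term, or None."""
    for name, terms in FACETS.items():
        if term in terms:
            return name
    return None


def resolve_hub(term):
    """Expand a vague hub term into (compound term, facet) pairs."""
    return [(t, facet_of(t)) for t in HUB_TERMS.get(term.lower(), [])]


print(resolve_hub("Treatment"))  # [('Chemically-treated', 'Treatments')]
```

A hub term with no links yet, such as "evaluation" here, resolves to an empty list. That makes it easy to spot which vague terms still need compound terms.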

By using associative relationships, you can build a taxonomy in which it is easier to discover related facets of information and to combine attributes to refine searches. For example, if I am looking for information about envelopes for a wedding invitation, associative relationships might also help me find information about wedding planners or wedding destinations. I might also find videos on how to properly create an invitation.
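A minimal sketch of the wedding example, assuming hypothetical tagged items and a hypothetical association table. Combining facet values narrows the results, and the predefined links from the `Weddings` event value suggest related values in another facet. Those links are written down ahead of time rather than discovered on the fly.

```python
# Hypothetical tagged items; each carries values from several facets.
ITEMS = [
    {"title": "Linen invitation envelopes", "product": "Envelopes", "event": "Weddings", "type": "page"},
    {"title": "How to address an invitation", "product": "Envelopes", "event": "Weddings", "type": "video"},
    {"title": "Bulk business envelopes", "product": "Envelopes", "event": "Business", "type": "page"},
]

# Predefined associative links in the background model:
# (facet, value) -> related (facet, value) pairs.
ASSOCIATIONS = {
    ("event", "Weddings"): [("service", "Wedding Planners"), ("service", "Wedding Destinations")],
}


def filter_items(items, **selected):
    """Combine facet values: keep items matching every selected value."""
    return [i for i in items if all(i.get(k) == v for k, v in selected.items())]


def see_also(**selected):
    """Collect related facet values linked to any of the selected values."""
    related = []
    for pair in selected.items():
        related.extend(ASSOCIATIONS.get(pair, []))
    return related


print([i["title"] for i in filter_items(ITEMS, product="Envelopes", event="Weddings", type="video")])
print(see_also(product="Envelopes", event="Weddings"))
```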

Of course, that assumes you define taxonomies as “collections of facets, classes or graphs that create unambiguous terms.” It is also easier to add new facets when needed and link in new data using associative relationships. Note: I cannot do any of the above UNLESS I have built predefined links between facets in the background model through associative relationships. It is very hard to recognize these known relationships “on the fly.”

Associative relationships are the least understood, but when you take the time to learn how to create them, they become one of the best arguments for building a faceted taxonomy and for adopting data modeling techniques as part of taxonomy development. If you are designing associative relationships for online systems, whether for managing content, e-commerce, or search, pay attention to how you create them, because there are implications for implementation, both in improving user navigation and in machine-to-machine processing.

~ Marlene Rockmore

NOTE: In a future post, I’ll put up a reference table showing the relationship between kinds of associative (related term) relationships and semantic labels for predicates in a “triple.”


Filed under: Associative Relationships, Conceptual Modeling

Taxonomies and Modeling

A thread on the Taxonomy Community of Practice (TaxoCop) discussion board piqued my interest. The question was whether there is an overlap between taxonomies and data modeling. On the surface, there might not seem to be much overlap, because data modeling tries to make sense of all the data elements in a data dictionary or schema. But taxonomies are also high-level representations of content or data, which makes a taxonomy a kind of model.

To me, taxonomies work best when the terms in the taxonomy are grouped into facets, which are groups of terms that share properties. These facets could also be called classes or business entities. In fact, I would be so bold as to suggest that taxonomies can be defined as collections of unambiguous terms grouped into facets.

When I build taxonomies, the first thing I prefer to do (after defining the information problem, selling the strategy, creating the metadata model, and defining the proof of concept) is to figure out the overall conceptual model. For example, here is a taxonomy for a sales and marketing system where there was a requirement to track products, companies, and applications.

Below is a small example of the model that was built for this application. I recently put the model into a cool tool called CMAP.

Sales and Marketing Conceptual Map


This taxonomy was created using a multipronged approach in which we:

  • Captured user query terms from search logs and from existing taxonomies
  • Sorted these terms into facets to create a starter taxonomy (this is sometimes called a strawman taxonomy — remember the taxonomy is a collection of facets)
  • Got the funding and support to conduct 5 cross-functional mind mapping sessions with mixed groups of stakeholders and users to validate the taxonomy
  • Developed an enterprise model of all the facets, which became the basis for the long-term implementation plan

Each of the boxes in the diagram represents a facet, which can then be defined as an authority list, as a set of equivalents or variants, or as a hierarchy. The model also clarifies which terms are associated concepts. For example, in the model above, the Applications facet can be associated with the platforms the applications run on. Run-On is a form of related term, but it is a user-defined relationship.

Software Product Model with User-defined relationship

This model becomes the driver for defining the facets required by the particular application. Facets then become fields in the metadata. The last thing I figure out is what kinds of relationships are needed: whether the facet should be a list of preferred terms, a set of variants, or a hierarchy. Building taxonomies using conceptual modeling makes it easier to find the associated facets.
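As a sketch of how such a model can drive the metadata, each facet below becomes a field and each user-defined relationship becomes a link field. This is an illustration, not the author's implementation. The `FacetSpec` class, the field-naming scheme, and the predicate label `RunsOn` (standing in for the Run-On relationship) are hypothetical. The `dangling_relations` check flags any relationship that points at a facet the model does not define.

```python
from dataclasses import dataclass, field
from enum import Enum


class Structure(Enum):
    AUTHORITY_LIST = "authority list"
    VARIANTS = "equivalents or variants"
    HIERARCHY = "hierarchy"


@dataclass
class FacetSpec:
    name: str
    structure: Structure
    # User-defined associative relationships: predicate label -> target facet name.
    relations: dict = field(default_factory=dict)


# Hypothetical model for the sales and marketing example.
MODEL = [
    FacetSpec("Products", Structure.HIERARCHY),
    FacetSpec("Companies", Structure.AUTHORITY_LIST),
    FacetSpec("Applications", Structure.AUTHORITY_LIST, {"RunsOn": "Platforms"}),
    FacetSpec("Platforms", Structure.VARIANTS),
]


def metadata_fields(model):
    """Each facet becomes a metadata field; each relationship becomes a link field."""
    names = [spec.name.lower() for spec in model]
    for spec in model:
        for predicate, target in spec.relations.items():
            names.append(f"{spec.name.lower()}_{predicate.lower()}_{target.lower()}")
    return names


def dangling_relations(model):
    """Relationships that point at a facet the model does not define."""
    defined = {spec.name for spec in model}
    return [(spec.name, p, t) for spec in model
            for p, t in spec.relations.items() if t not in defined]


print(metadata_fields(MODEL))
```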

At this point, the model seems ready for semantic tools, but it can also be used for more common applications:

  • It is easier to discover which facets should become associated facets.
  • It’s easier to write specifications for how facets should be built and populated.
  • It is also easier to figure out who has authority to update and control each facet, and where there might be overlapping jurisdiction.
  • UI designers get pumped because they have a knowledge organization diagram to energize their creative juices for innovative interface design.
  • It becomes easier to develop policies about how content should be tagged; for example, if content is about a product, it should also be tagged with the manufacturer as well as other attributes.
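A tagging policy of the form "if content is tagged with X, it must also be tagged with Y" is easy to check automatically once facets are metadata fields. This is a minimal sketch that assumes records are plain dictionaries of field values. The policy list and the `policy_violations` helper are hypothetical.

```python
# Hypothetical tagging policies: (trigger field, field that must also be filled).
POLICIES = [
    ("product", "manufacturer"),
]


def policy_violations(record, policies=POLICIES):
    """Return each (trigger, required) pair the record breaks."""
    return [(trigger, required) for trigger, required in policies
            if record.get(trigger) and not record.get(required)]


print(policy_violations({"title": "Envelope guide", "product": "Envelopes"}))
# [('product', 'manufacturer')]
```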

Models don’t have to be used solely with ontology software. They can help figure out the fields in a database, be used with automated categorization, or assist user interface design.

Perhaps taxonomists should take more control over creating models. Taxonomists are perceived as “hierarchy builders,” while ontologists are seen as modelers, particularly if they use an ontology tool like Protégé or TopQuadrant’s TopBraid Composer. If taxonomists became modelers, we might be able to better explain what we do and why it matters, and help create some innovative systems. Taxonomists understand how to create models and knowledge representations that capture community norms through validation techniques.

Models help reduce complexity and show the big picture. By modeling, some organizations might start to see the advantage of building the taxonomy before designing interfaces. These models are agnostic: they can be integrated into an architecture that works with many different technologies and kinds of content, from sophisticated semantic systems to the everyday database with fielded data, without worrying about the underlying platform. Keith DeWeese commented in the discussion thread that “Ontology should be done before the taxonomy.” But perhaps “taxonomy” can become a new code word for both modeling and terminology management.

Filed under: Uncategorized

Taxonomy and “Political Regeneration”

2009 began with the declaration that taxonomy was dead. In 2010, I want to suggest that taxonomies have a role to play in regeneration. I recently reread an influential essay by George Orwell called “Politics and the English Language.” Orwell’s essay is about writing, but it is also a request to choose carefully how we label our experience. Orwell writes that modern English is “full of bad habits… If one gets rid of these habits one can think more clearly, and to think clearly is a necessary first step toward political regeneration.”

Orwell’s essay, written at the end of WWII, was a quest to end the bureaucratic language that led to the Holocaust and Stalinization, the kind of language that gives us desensitizing phrases like “collateral damage” or “pacification.” But despite Orwell’s large polemics, on rereading his essay I realized he had important insights for taxonomies, and for why how we label and categorize matters.

Orwell exhorts us to exercise mental energy to construct meaningful, vivid, and lively labels. He has a few good rules that probably should be added to the list of taxonomy editing guidelines:

  • Avoid overused  or dying  metaphors and phrases
  • Use more action words, and avoid the passive voice
  • Avoid pretentious words

Think of all the words that emerged in 2009 that could use a more complete taxonomic description: “health care reform,” “death panels,” “single payer.” What do these really mean? I am also quite certain that how I define a term is not how my neighbor, or even two experts, might define it. As a concrete illustration, in 2009 I was part of a very knowledgeable group that looked at health care reform. I proposed that each of us write a definition of “single payer” on an index card, and we had multiple definitions even within a like-minded group.

I am interested in Orwell’s idea of looking at phrases as action words, which goes against the passivity of taxonomies built from noun phrases. For example, although this might be a bit turgid, what if we start thinking about investment as an activity? By separating investment (the product) from investing (the action), we might start to understand who the players are, their roles, and their methods and practices. Then we are on our way to understanding the action-oriented, defensive role played by regulation and regulatory agencies, and to designing more comprehensive systems for understanding financial gobbledygook.

Orwell even has a formula for creating user-generated labels that is as good as any instruction I have seen.  Orwell has a process of visualization, where you capture your ideas about a concept in a “mental model”  before you attempt to write a label for the object.   He writes:

When you think of a concrete object, you think wordlessly, and then, if you want to describe the thing you have been visualizing you probably hunt about until you find the exact words that seem to fit it. When you think of something abstract you are more inclined to use words from the start, and unless you make a conscious effort to prevent it, the existing dialect will come rushing in and do the job for you, at the expense of blurring or even changing your meaning. Probably it is better to put off using words as long as possible and get one’s meaning as clear as one can through pictures and sensations.

In 2010, my goal is to think about overused terms and phrases and to take the time to map what is actually meant. On this blog, you will see more concept mapping using CMAPS, and I’ll be posting my maps for different projects regularly. I’ll also be looking for other projects that are doing innovative work.

But the main point of Orwell’s work is that words have to be discussed in an open dialog. Taxonomy work should not be done in isolation, because we are questioning and defining core concepts. We need to ask about these fundamental definitions in public spaces. If we nod our heads in acknowledgment when we don’t understand, then we are imitating, not exercising mental energy or regenerating our own thinking.

This is a year to use taxonomies for regeneration, for taxonomy work to become a more conscious activity of checking whether a label conveys what is meant, and everything that is meant. As Orwell says, we need to avoid imitation because it corrupts our own thought processes. If we don’t understand a phrase, we should ask the speaker to define the concept, precisely and unambiguously. In tagging my objects, I need to ask: Are my tags specific enough? Did I use enough tags? Have I covered all the facets of the object? Can another person find my object or my post using my tags?

Perhaps I am going to walk around with bunches of index cards in my bag to create spontaneous moments for regeneration and dialog. Asking for clarity is something we can do graciously.

Filed under: Uncategorized

The Right Prescription for a Crowd-source Experiment

My last post was an experiment in using remote online card sorting as a way to build a taxonomy. Why not start small? My sample data was the picklist used on the Medicare site when you search on “What does Medicare Cover?” For my experiment, I used an online remote card sorting tool.

First, let’s start with the good news. Online tools are a very cool way to bring together remote groups that would otherwise be too expensive or politically impossible to connect. That’s the promise.

But a successful remote card sort requires preliminary planning and work. Here are my lessons learned:

  • Keep the test under 20 minutes: Online card sorting is a time-consuming task for the participant, so for the experiment to succeed, you need to make sure that participants have the time and that the number of terms to be sorted is not overwhelming. Joseph Busch of Taxonomy Strategies and Dave Cooksey suggest 20 minutes and 25 terms at most. My comprehensive test of all 132 picklist terms from the Medicare site was too big.
  • Pretest the taxonomy: Since the card-sorting activity is a one-time opportunity to engage testers, some prior testing of the taxonomy should occur. Remote card sorting is better for a closed experiment, where a taxonomy has already been designed, than for an open card sort, where the goal is to discover categories and facets. The best practice is to run some tests of the taxonomy before the online experiment. Have a trusted expert do the test, and then weed out obvious problems. If the pretest doesn’t go well, try again. Testers in an online setting have a low tolerance for obvious problems, so the test needs to be about validating a good design.
  • Choose online tools carefully: The tool I used had a major problem: it allowed a term to be classified under one and only one category. This proved frustrating to users. For example, users wanted to classify durable medical equipment under the category for Equipment but also under the category for the Disease or Chronic Condition. Dave Cooksey, who tracks tools, says remote tools are improving all the time, so evaluate tools and choose wisely.
  • Be sure to thank the participants: We all feel manipulated by many of the group activities we attend in the face-to-face world, and that can happen in the remote world as well.   Being authentic and courteous is important. Provide a thank you and be sure to share results or feedback.  If possible, consider some kind of compensation such as a gift card.

So, given that a test that seems so simple on the surface requires work to set up, what is the value of this work? The purpose of a taxonomy is to determine top-level facets that can be used to organize and search for information. If we look at a topic like Medicare, we know that we have a national problem determining standards for insurance policies. It is difficult to compare policies, and it is also time-consuming to manage the costs. In designing good remote crowdsourced card sorting tests, Dave and Joseph have the following recommendations:

  • Pay attention to the sample size
  • Recruit carefully to be sure the sample has a balance of perspectives
  • Run tests prior to online activity. Have experts try the test.
  • Remember the goal of a taxonomy test is to find the higher-level categories that overlap between technical expertise and general understanding
  • The result is a better analysis of shared group understanding: shared mental models of how we collectively categorize concepts, not individual understanding

In the scheme of a trillion-dollar problem like health care, a project to set up well-designed remote card sorts that compare how different user groups sort fundamental Medicare concepts seems like a small investment. A well-run test with good recruitment could be a very good way to jumpstart better designs of websites such as Medicare's that deliver clearer information about benefits and choices.


Filed under: Card Sorts, Taxonomy Validation, User-Centered Design

Using Taxonomies to Sort through Health Care Reform

I am very interested in the health care reform debate, so I wanted to know what a public option might look like. My sources told me that a robust public option might look a bit like Medicare. So off I went to the Medicare website to find out what was covered. In the middle of the home page, in the second column, there is a link to “Find Out What Is Covered,” which leads to an advanced search page. The search page includes a picklist of about 143 topics, just the right size for a sample set of candidate terms for a card sort.

This month, I am offering a small interactive experiment in online card sorting. Taxonomies are collections of facets, which are created by organizing concepts into categories. Card sorting is one of the best ways to identify categories: groups of users create categories in controlled tests, and those categories are validated through repeated tests until there is a consensus. In health care reform, taxonomies might be useful for creating consumer-friendly interfaces that help people search across the national insurance exchanges.

A card sort method uses the following steps:

  • Collect a sample set of candidate concepts
  • Group or cluster terms into categories
  • Refine the design iteratively until there is a set of facets, groups of categories that have similar properties
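The second step, grouping terms into categories, is often analyzed by counting how many participants put each pair of terms in the same group; pairs that most people keep together are strong candidates for a shared category. Here is a minimal Python sketch under an assumed data format (each participant's sort as a dict from group label to a list of terms). The function names and sample terms are hypothetical, not the export format of any particular card-sorting tool.

```python
from collections import Counter
from itertools import combinations

def co_occurrence(sorts):
    """Count how many participants placed each pair of terms in the same group.

    sorts: one dict per participant, mapping a group label to a list of terms
    (hypothetical format). A participant who puts the same pair together in
    several groups is still counted only once for that pair.
    """
    pairs = Counter()
    for sort in sorts:
        together = set()
        for terms in sort.values():
            # Sorting gives each pair one canonical order: (a, b), never (b, a).
            together.update(combinations(sorted(set(terms)), 2))
        pairs.update(together)
    return pairs

def similarity(pairs, n_participants):
    """Share of participants who grouped each pair together (0.0 to 1.0)."""
    return {pair: count / n_participants for pair, count in pairs.items()}

# Hypothetical sorts from two participants, using Medicare-style topics.
sorts = [
    {"Equipment": ["Wheelchairs", "Walkers"], "Prevention": ["Flu shots"]},
    {"Equipment": ["Wheelchairs", "Walkers", "Diabetes supplies"],
     "Prevention": ["Flu shots"]},
]
scores = similarity(co_occurrence(sorts), len(sorts))
```

Pairs scoring near 1.0 are groupings most participants agree on. A matrix of these scores is a common input to hierarchical clustering when refining the design in the third step.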

I’ve put 130+ topics from Medicare into an online card-sorting tool. The topics have not been formatted or massaged; they are just as they appear in the Medicare search picklist. The tool suggests that I use a closed card sort, where participants sort terms into predetermined categories. So to get started, I’ve come up with about 20 starter categories. Some of these categories will become subtopics in a faceted design.

The experiment is open to the first 10 participants who want to take the time to try this task.   To try the card sort, link to

Please feel free to assign terms to multiple categories or to suggest other categories.

Last month, Joseph Busch blogged about the judicious use of online web sorting tools: they may not be the most cost-effective way to build taxonomies. One of his arguments is that the sample set of users will not be random. That’s true. This blog has a small readership of people who are interested in taxonomies and probably have a consumer’s interest in health care reform. Let me know what you think of

This little experiment could help demonstrate some bigger observations. Government may be looking to advanced high-volume technologies such as clustering or semantic technologies to identify categories and to map claims data. Perhaps one of the applications will be to build interfaces that help consumers search across the national exchanges. But at the core of these technologies, there will be a need for well-designed taxonomies to help analyze text and build better interfaces to health care information.

A well-designed taxonomy with facets and linking relationships can

  • Group information into useful categories
  • Identify gaps in coverage
  • Help point to important related information

Let’s find out if taxonomy design can help us sort through health care reform.

Thanks to Andy Oram and the Sunlight Foundation for introducing me to this tool and to Dave Cooksey who is virtually updating my card-sorting skills.

Filed under: Card Sorts, Conceptual Modeling, Health Care, User-Centered Design

What’s wrong with crowdsourcing the design of public websites?

A blog post from Sunlight Labs on “Redesigning the FCC: Getting Organized” suggests an experiment that uses a public card-sorting program to help redesign the Federal Communications Commission (FCC) website. The FCC has a notoriously convoluted website, hard to navigate and hard to search. Sunlight Labs invites anyone interested in helping the FCC to join this open card-sorting activity, which organizes about 60 terms into categories related to the FCC. But is a public web sort the right approach to redesigning a government website?

Should we crowdsource the design of a public website?

Here are some considerations:

  • First, the success of any design process depends on who sits at the table. Site designers have not succeeded over the years by roping in anyone who happens to be around. Rather, carefully identifying the right participants for any design activity is very important. Engaging busy professionals and bureaucrats in order to derive the maximum impact with the minimum effort is a tricky business. One of the most cutting critiques of Wikipedia has been that its editorial perspective is overwhelmingly white, male, and twenty-something, not necessarily the authority of choice for everyone else.
  • Second, open processes tend to be very time-consuming, which works in your favor for some kinds of crowdsourcing but not for selecting terms and categories. Unless the sample is large and controlled, the emerging pattern from crowdsourced card sorting may not be helpful, because experts with limited time will be overrun by people with lots of time and a fast hand on the keyboard, no matter how much or how little they know. Some types of crowdsourcing (such as prediction markets) work because the errors of ignorant participants cancel each other out and allow the experts to win out, but card sorting is entirely different and just results in chaos.
  • Third, it would be much quicker for the FCC to suggest a model for organizing its content based on its expertise than to crowdsource the design. There are standard ways to organize things, including website content, which people can learn even if they are not entirely natural. We learn about brand, price, size, color, material, and fit because they help us find the stuff we want to buy, not necessarily because there is a shopping gene in our DNA.
  • Fourth, the users of these sites, such as broadcasters, regulators, website publishers, and ordinary people, are not always interested in the same things. The FCC will have to comply with legislative and executive branch imperatives that may be of little interest to many people in the crowd.

A better way to approach website design and redesign focuses on the backend nomenclature—buckets and categories, which are called facets and vocabularies. These form the basis of a useful taxonomy.

So when can crowdsourcing be used effectively? If the FCC engaged in the process of designing facets and vocabularies, the crowd could be useful as a follow-up. First, it can help validate a design. After all, the test of a taxonomy is whether it helps people find information. One appropriate role for crowdsourcing in taxonomy is to observe how users access a collection of items over time, the searches they use, and the click paths they follow. The taxonomy can then be tuned based on how the activity distributes among the categories, splitting and merging categories as warranted.
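One way to act on this tuning step is to compute each category's share of observed activity (clicks or searches) and flag outliers: very busy categories as candidates for splitting, nearly idle ones as candidates for merging. This minimal Python sketch assumes activity has already been aggregated into counts per category; the thresholds and FCC-style category names are hypothetical illustrations, not recommended values.

```python
def tuning_candidates(activity, split_above=0.25, merge_below=0.02):
    """Flag categories to consider splitting or merging.

    activity: category -> count of clicks or searches (hypothetical aggregation).
    split_above / merge_below: illustrative share thresholds, not standards.
    Returns (split_candidates, merge_candidates), each sorted alphabetically.
    """
    total = sum(activity.values())
    if total == 0:
        return [], []
    shares = {category: count / total for category, count in activity.items()}
    split = sorted(c for c, share in shares.items() if share > split_above)
    merge = sorted(c for c, share in shares.items() if share < merge_below)
    return split, merge

# Hypothetical monthly click counts for FCC-style categories.
activity = {
    "Licensing": 600,
    "Consumer complaints": 240,
    "Spectrum": 130,
    "Broadband": 25,
    "Ownership reports": 5,
}
split, merge = tuning_candidates(activity)
```

A flagged category is a prompt for a taxonomist's judgment rather than an automatic change; a busy category may simply be the one most users need.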

Another place for crowdsourcing is to allow users to add free-text “tags” to the content. Those tags can then be evaluated to either map them to existing taxonomy categories, or to suggest changes to the taxonomy. In this case the crowd and the taxonomy work together in synergy. Users typically add a tag to only a fraction of the pages, so in most cases these terms will be synonyms or equivalents to existing categories.

Finally, a card-sorting exercise can be useful after the field is carefully constrained by the experts who know the site. The true test of any card-sorting activity is whether people can actually find what they are looking for afterward. Mapping a tag as a synonym of an existing taxonomy category effectively applies that tag to all the content already in that category. This synergy is one method that can help improve access to information.
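Tag-to-category mapping can be sketched as a synonym table: a tag mapped to a category retrieves everything already classified there, and unmapped tags go to a review queue where a taxonomist decides whether they are new synonyms or evidence for a new category. This is a minimal Python sketch; the normalization rule, function names, and sample data are hypothetical.

```python
def normalize(tag):
    """Lowercase and collapse whitespace so "Wheel  Chairs" matches "wheel chairs"."""
    return " ".join(tag.lower().split())

def resolve_tag(tag, synonyms, category_items):
    """Return the content a free-text tag should retrieve, or None if unmapped.

    synonyms: normalized tag -> taxonomy category
    category_items: taxonomy category -> set of content ids already classified there
    """
    category = synonyms.get(normalize(tag))
    if category is None:
        return None
    # Mapping the tag as a synonym applies it to all content in the category.
    return category_items.get(category, set())

def review_queue(tags, synonyms):
    """Tags with no synonym mapping, sorted, for a taxonomist to review."""
    return sorted({normalize(t) for t in tags} - set(synonyms))

# Hypothetical taxonomy content and synonym table.
category_items = {
    "Durable medical equipment": {"wheelchair-coverage", "walker-coverage"},
    "Preventive services": {"flu-shot-coverage"},
}
synonyms = {
    "dme": "Durable medical equipment",
    "wheelchairs": "Durable medical equipment",
    "vaccines": "Preventive services",
}
```

With this table, a user's tag "Wheelchairs" retrieves both equipment items, while a tag such as "hearing aids" lands in the review queue.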

Here are several techniques that are intuitive and natural for people to use with little or no training, allowing them to validate a taxonomy. These techniques are much faster than open card sorts, and provide results that are easier to interpret.

  • Classifying some content
  • Conducting walk-throughs
  • Closed card sorting

Classifying Some Content

In this exercise, people are presented with a representative subset of content from the site and are asked to tag it. You can select it randomly or try to include examples of the site’s primary content types, as well as content you think may be hard to tag, find, or use. If you plot the number of items tagged into each taxonomy category, you should expect to see 80% of the content fall into 20% of the categories.
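To check the 80/20 pattern, count the items tagged into each category, including categories that received nothing, and compute the share held by the busiest fifth; empty categories can also point to gaps in coverage. A minimal Python sketch, assuming each sampled item's tags are recorded as a list of category labels; the category names and counts are hypothetical.

```python
from collections import Counter

def category_counts(categories, tagged_items):
    """Count tag assignments per category, keeping zero counts for unused ones.

    tagged_items: item id -> list of category labels the tester assigned.
    """
    counts = Counter({category: 0 for category in categories})
    for tags in tagged_items.values():
        counts.update(tags)
    return counts

def top_share(counts, top_fraction=0.2):
    """Share of all assignments held by the busiest top_fraction of categories."""
    ordered = sorted(counts.values(), reverse=True)
    total = sum(ordered)
    if total == 0:
        return 0.0
    k = max(1, round(len(ordered) * top_fraction))
    return sum(ordered[:k]) / total

def empty_categories(counts):
    """Categories no sampled item was tagged into: candidates for review."""
    return sorted(c for c, n in counts.items() if n == 0)

categories = [
    "Billing", "Enrollment", "Equipment", "Prevention", "Prescription drugs",
    "Hospital care", "Home health", "Hospice", "Mental health", "Vision",
]
# Hypothetical tagging results for ten sampled items.
labels = ["Billing"] * 5 + ["Enrollment"] * 3 + ["Equipment", "Prevention"]
tagged_items = {f"item{i}": [label] for i, label in enumerate(labels, start=1)}
counts = category_counts(categories, tagged_items)
```

In this made-up sample, the top two of ten categories hold 80% of the assignments, and six categories are empty; whether those are true gaps or just an unrepresentative sample is a question for the designers.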

Conducting Taxonomy Walk-Throughs

One-on-one and group presentations that walk stakeholders through the taxonomy are an effective way to elicit specific comments and sometimes overall approval. During walk-throughs, standard questions should be asked about the category structure, as well as about problematic categories, to gather feedback on the taxonomy. Delphi walk-throughs are done using a stack of cards. The cards are not a set of raw terms, however, as in the FCC exercise. Instead, they are already marked with categories chosen by the experts. Reviewers are asked to mark changes to the category labels on the cards. Each subsequent reviewer walks through the cards carrying the label mark-up from the previous session. The process usually stabilizes after a few sessions, indicating that the categories are appropriate. According to Dave Cooksey, Founder and Principal of saturdave, 20 sessions will usually result in a consensus taxonomy revision, and this method provides results without any further analysis.
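The stabilization that Delphi walk-throughs rely on can be tracked by counting how many card labels each reviewer changes. The minimal Python sketch below assumes each session's deck is recorded as a dict from card id to category label, starting with the experts' deck, and treats two consecutive sessions with no changes as "stable". That stopping rule, the function names, and the sample labels are hypothetical, not part of the method as described.

```python
def label_changes(before, after):
    """Count cards from the earlier deck whose label was changed or dropped."""
    return sum(1 for card, label in before.items() if after.get(card) != label)

def sessions_until_stable(decks, quiet_runs=2):
    """Return the reviewer number at which labels have stopped changing.

    decks[0] is the experts' deck; decks[i] is the deck after reviewer i.
    Stability is assumed once quiet_runs reviewers in a row change nothing.
    Returns None if that never happens.
    """
    quiet = 0
    for reviewer in range(1, len(decks)):
        if label_changes(decks[reviewer - 1], decks[reviewer]) == 0:
            quiet += 1
            if quiet == quiet_runs:
                return reviewer
        else:
            quiet = 0
    return None

# Hypothetical decks: two reviewers relabel a card each, then two change nothing.
experts = {"c1": "Benefits", "c2": "Claims", "c3": "Providers"}
after_1 = {**experts, "c1": "Coverage"}
after_2 = {**after_1, "c2": "Claims and appeals"}
decks = [experts, after_1, after_2, dict(after_2), dict(after_2)]
```

Logging the change count per session also shows whether the process is converging or whether reviewers keep overturning each other's labels.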

Closed Card Sorting

Closed card sorting, where categories are in predefined buckets, can be used to test whether stakeholders and end users consistently sort categories into the correct taxonomy facets. The categories to test should be a set of important topics, such as the most frequently searched words and phrases from the search engine logs. The test can be done using actual cards, or using a simple grid with the categories to be tested down the left column and the taxonomy facets across the top. Paper card sorts work well enough for up to 20 trials; an online card-sorting tool is a good choice when you need a larger, distributed closed card sort. If users can’t map terms to the categories, the designers will know that they have to adjust their design. But our experience shows that pre-analysis captures about 80% of the common categories and use cases.

Sunlight Labs has undertaken a commendable task in seeking to improve the FCC website’s layout. But by carrying out a card sort too quickly, they’ll just get their signals crossed. Performing some professional taxonomy work first will channel public efforts in the right direction.
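A closed card sort grid lends itself to a simple score: for each tested term, the share of testers who placed it in the facet the designers intended, plus a tally of where the others put it. A minimal Python sketch, assuming one dict per tester mapping term to chosen facet; the 0.7 threshold, function names, and FCC-style terms are hypothetical.

```python
from collections import Counter

def score_closed_sort(expected, trials, threshold=0.7):
    """Score a closed card sort against the designers' intended facets.

    expected: term -> intended facet
    trials: one dict per tester, term -> facet the tester chose
    threshold: illustrative cutoff for flagging a term, not a standard value
    Returns (agreement per term, sorted list of terms below the threshold).
    """
    agreement = {}
    for term, facet in expected.items():
        answers = [trial[term] for trial in trials if term in trial]
        hits = sum(answer == facet for answer in answers)
        agreement[term] = hits / len(answers) if answers else 0.0
    flagged = sorted(term for term, rate in agreement.items() if rate < threshold)
    return agreement, flagged

def confusions(expected, trials):
    """For each term, the other facets testers chose, most common first."""
    result = {}
    for term, facet in expected.items():
        wrong = Counter(trial[term] for trial in trials
                        if term in trial and trial[term] != facet)
        if wrong:
            result[term] = wrong.most_common()
    return result

expected = {
    "license renewal": "Licensing",
    "robocalls": "Consumer issues",
    "spectrum auction": "Spectrum",
}
# Hypothetical answers from four testers.
trials = [
    {"license renewal": "Licensing", "robocalls": "Consumer issues", "spectrum auction": "Licensing"},
    {"license renewal": "Licensing", "robocalls": "Consumer issues", "spectrum auction": "Spectrum"},
    {"license renewal": "Licensing", "robocalls": "Consumer issues", "spectrum auction": "Licensing"},
    {"license renewal": "Licensing", "robocalls": "Enforcement", "spectrum auction": "Licensing"},
]
agreement, flagged = score_closed_sort(expected, trials)
```

A term like the hypothetical "spectrum auction" here, which three of four testers filed under a different facet, is the kind of signal that tells designers to adjust the design.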

Submitted by Joseph A. Busch, Founder and Principal, Taxonomy Strategies, Sept. 8, 2009


Filed under: Conceptual Modeling, FCC, Joseph Busch, Taxonomy Validation, User-Centered Design

Are Taxonomies converging with Folksonomies?

Carole Kaesuk Yoon, in an August 11, 2009 New York Times article, discussed how human groups survive by observing, understanding, and classifying their natural world, creating local folk taxonomies that are as intrinsic to survival as water or food. Without the power to order and name life, a person simply does not know how to live in the world. Yoon states, “How to tell the carrot from the cat — which to grate and which to pet? They are utterly lost, anchorless in a strange and confusing world.” (accessed August 11, 2009). The article included an interesting discussion of a research study in which college students could decipher what a word in a Peruvian native language meant about 68% of the time, because the naming in the folk taxonomy was so descriptive.

At the start of 2009, CMPros pronounced taxonomy dead. This is a good moment to re-evaluate that audacious claim. If taxonomies undergird the survival of people in pristine environments, can they clarify meaning in a culture awash in technology, economics, social science, health, and medicine?

For the last five years, Taxonomy Boot Camp, sponsored by Information Today as an extension to the last two days of Enterprise Search Summit West, has provided a comprehensive program demonstrating the use of taxonomies to improve search, govern information, and improve communication. Taxonomy Boot Camp continues to pull together an interesting program of rising stars and established veterans.

Far from being a post-mortem of taxonomies, this year’s conference program provides an opportunity for a conversation about their future in the context of new and putatively competing disciplines. The conference includes superstars from the realm of folksonomies and ontology. Taxonomy Boot Camp provides an opportunity to find out how some practitioners and organizations have tried to use and re-use legacy taxonomies to order information, while providing innovation in interfaces and processes.

This year’s keynote speaker, Thomas Vander Wal, Principal, InfoCloud Solutions Inc., who coined the term folksonomy, opens the dialog with his keynote. Can taxonomies designed for enterprise business and social science organically grow to explain, clarify, modify, and mesh with Web 2.0 social enterprise tools? In other words, can enterprise vocabularies become the folk taxonomies that help describe our modern world? Leslie Owens of Forrester Research presents the other keynote, on the reuse and repurposing of taxonomies, which may highlight the value of reviving taxonomies in organizations and enterprises.

Some of this year’s participants are engaged in some leading edge projects:

Dean Allemang, developer of Top Quadrant and author of Semantic Web for the Working Ontologist, will lead a panel about moving beyond broad and narrow terms to semantic relationships. Co-panelists include staff from the Food and Agriculture Organization, the World Bank, and the Library of Congress. Metadata will be covered in several sessions, including one on Dublin Core with Mike Crandall of the iSchool at the University of Washington and Marjorie Hlava of Access Innovations, and another with Stephanie Lemieux of Earley and Associates on integration with SharePoint.

Annie Wang of Deloitte will share her perspective on using taxonomies for large, complex organizational integration.

Christine Connors of TriviumRLG LLC and Jordan Frank of Traction Software will speak on Linked Data, Web 3.0, and Tagsonomies, and on how taxonomies and ontologies can turn tag mush into useful concepts. Their talk will be followed by a discussion of folksonomy and taxonomy with Stephanie Lemieux and Tom Reamy. The hot topic of merging and rescuing existing taxonomies will also be discussed. Integration of existing taxonomies will be discussed by four veteran taxonomists, including Heather Hedden, Carol Hert, and Wendi Pohs, followed by a panel on rescuing and repurposing taxonomies with Lisa Dawn Colvin from Top Quadrant, Ron Daniels of Taxonomy Strategies, and Jeff Carr of Earley and Associates.

Taxonomy validation will be presented by Joseph Busch of Taxonomy Strategies, who will describe how a taxonomy was validated over several days of exercises with key stakeholders at the Substance Abuse and Mental Health Services Administration (SAMHSA). Taxonomy and semantic modeling tools will also be on the agenda.

The conference ends with a dialog about the future of taxonomies led by Wendi Pohs and Daniela Barbosa from Dow Jones. Several pre-conference workshops also provide opportunities to explore topics in more depth with expert practitioners. The full conference program is available in HTML and PDF.

Opening a dialog about how best practices in taxonomy management mesh with innovations in folksonomy and ontology might help clarify our thinking in turbulent times. Any conference that brings together the taxonomy and semantic web communities provides an opportunity to create energy to move to new architectures, interfaces, and tools. Taxonomy Boot Camp 2009 will be held November 19-20, 2009, at the San Jose McEnery Convention Center in San Jose, CA. For more information and for $200 off the conference registration fee, visit


Filed under: Conferences, Folk Taxonomy

