What next, taxonomy?

Can anyone learn what is happening in a field by following a conference on Twitter? That’s how I decided to follow Taxonomy Bootcamp 2012. I missed networking with a new generation of confident, well-trained taxonomists, but nevertheless I will attempt to identify the themes and challenges facing taxonomists. Please feel free to add to the list or dispute it!

1. Centralized models don’t scale: think federated and allow local variation.

Taxonomists have jobs because organizations value managing their resources with a common vocabulary, but how the architecture and governance of the vocabulary map to the internal structure of an organization is harder to understand. Speakers urged attendees to allow distributed models that accommodate local variations and use local project vocabularies. Using term sets (facets) helps ease taxonomy management. As an example, one tweet about the talk given by Gary Carlson and Pam Green relayed that Microsoft had 23 term sets for its intranet: the most complex was products, and the simplest was confidentiality. This structure makes it easier to set up access control so that ownership groups can manage these vocabularies at the group and/or term level.
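To make the term-set idea concrete, here is a minimal Python sketch of term sets with group-level and term-level ownership. The class name `TermSet`, the group names, and the permission rule are hypothetical assumptions for illustration. Under that rule, the owning group can edit any term in its set, and a term-level owner can edit only its own term. This is not how any particular product implements it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class TermSet:
    """A facet (term set) owned by one group, with optional per-term owners."""
    name: str
    owner_group: str
    terms: Set[str] = field(default_factory=set)
    term_owners: Dict[str, str] = field(default_factory=dict)  # term -> owning group

    def can_edit(self, group: str, term: Optional[str] = None) -> bool:
        # Assumed rule: the group that owns the whole set may edit any term in it.
        if group == self.owner_group:
            return True
        # A group that owns a single term may edit only that term.
        return term is not None and self.term_owners.get(term) == group


# Hypothetical intranet term sets: a complex one and a simple one.
products = TermSet("Products", owner_group="product-marketing",
                   terms={"Product A", "Product B"},
                   term_owners={"Product B": "product-b-team"})
confidentiality = TermSet("Confidentiality", owner_group="legal",
                          terms={"Public", "Internal", "Confidential"})

print(products.can_edit("product-b-team", "Product B"))  # True
print(products.can_edit("product-b-team", "Product A"))  # False
print(confidentiality.can_edit("product-marketing"))     # False
```

Keeping each term set small lets ownership sit where the expertise is, which is the federated model the speakers recommended.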

2. Do social media and information sharing have a role in taxonomy, and how do they square with governance, security, control, and confidentiality?

This topic generated some buzz and offers an interesting dialectic between information access and security. @syndetic tweeted that “HR sees social media as a time waster, and IT sees it as a security problem.” So why does social media matter to taxonomists? The main point is that information is ambiguous and confusing. Social media is needed to help generate, identify, and clarify relevant, lively, topical content now that can be curated later, as opposed to the traditional model of curate, then share. AprilMunden summed up the argument for social media nicely in a tweet: “using folksonomy (user generated taxonomy) should definitely be a part of your initial design AND ongoing governance plan.” Seth Earley, always with a witty insight, said “Fast changing, requires expertise, ambiguous concepts …business can own; slow to change, more security, unambiguous, let IT own that.” Maybe that’s a strategy for starting the conversation about the business value of tagging and social media.
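One simple way to fold folksonomy into ongoing governance is to count user tags and surface the frequent ones that are not yet in the taxonomy. A taxonomist can then review them as candidate terms. The sketch below assumes tags arrive as plain strings. The function name and the `min_count` threshold are illustrative, not part of any tagging product.

```python
from collections import Counter
from typing import Iterable, List


def candidate_terms(user_tags: Iterable[str], taxonomy_terms: Iterable[str],
                    min_count: int = 2) -> List[str]:
    """Return frequent user tags that are not yet taxonomy terms, most frequent first."""
    known = {term.casefold() for term in taxonomy_terms}
    # Normalize case and whitespace so "Cloud " and "cloud" count as one tag.
    counts = Counter(tag.strip().casefold() for tag in user_tags if tag.strip())
    candidates = [tag for tag, n in counts.items() if n >= min_count and tag not in known]
    # Sort by descending frequency, then alphabetically for a stable review list.
    return sorted(candidates, key=lambda tag: (-counts[tag], tag))


tags = ["cloud", "Cloud ", "mobile", "intranet", "cloud", "mobile", "wiki"]
print(candidate_terms(tags, taxonomy_terms=["Intranet"]))  # ['cloud', 'mobile']
```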

3. Why a non-expert taxonomist needs to build a taxonomy

“Experts swim in the deep end of the content.” Tom Reamy made this statement, which set off a flurry of sympathetic tweets. Tom clarified that experts know their content and processes but can get mired in details and in their own point of view. Tom’s statement is provocative because it clarifies the role of the taxonomist:

  • Apply social listening skills and analytic methods to categorize the diverse needs of a wide range of users
  • Build a well-structured faceted taxonomy that reflects the needs of different constituencies and fits with the governance structure of the organization (see point 1)
  • Validate and test the taxonomy
  • Assist with application integration
  • Establish ongoing governance processes, including the use of social media and text analytics, to keep the taxonomy relevant

A taxonomy needs to reflect multiple user needs, which means allowing non-experts to explore topics before they go in for a deep dive. Taxonomists need to work with experts and gain their support by engaging them in validating the taxonomy.

4. What about taxonomy, the user interface, big data, and mobile apps?

Taxonomists sit at the intersection of data, digital assets, and content on one side and user interfaces on the other, but it’s not clear how taxonomists should apply their skills there. They are not graphic designers or UX experts, and not quite database experts. A few sessions mentioned data visualization and other graphic imagery for exploring content and data, such as mashups of datasets, grids, wheels, graphic organizers, or maps. This is an area where taxonomists, who are often less visually oriented, need to rethink their approach and start to think of themselves as information artists. Taxonomists can advocate for improved techniques, including standards that organize the taxonomy and metadata framework. They can also advocate for tools that make sharing across applications, organizations, and platforms more efficient, which brings us to point 5.

5. Taxonomy tools must make it easier to import and export vocabularies

Taxonomists know that their vocabularies need to play well with all the applications above. They also need to serve other goals, such as providing cross-organization information access across SharePoint, XML, relational databases, and legacy products. Taxonomies work in part because they are agnostic: they can work with any number of technologies because concepts and metadata are separate from the content. To play well with others, taxonomy tools need to support importing and exporting vocabularies in different standards, including SKOS and XML. As KarlaTR tweeted, “If you put something in a taxonomy, you’ve got to get it out.” One option is to try the tools marketed at Taxonomy Bootcamp. TopQuadrant EVN and Smartlogic, which have been ported from the ontology world, now sit alongside Synaptica and Information Access as part of the tool evaluation process.
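Exporting to SKOS can be as simple as writing each term out as a `skos:Concept` with a preferred label and a `skos:broader` link. Below is a minimal Python sketch that assumes a single-parent hierarchy held in a dictionary and a hypothetical base URI under example.org. A production export would also need alternative labels, related terms, and a concept scheme.

```python
from typing import Dict, Optional
from urllib.parse import quote

SKOS = "http://www.w3.org/2004/02/skos/core#"


def to_skos_turtle(broader: Dict[str, Optional[str]],
                   base: str = "http://example.org/taxonomy/") -> str:
    """Serialize a {term: parent or None} hierarchy as SKOS concepts in Turtle."""
    def uri(label: str) -> str:
        # Mint a URI from the label; percent-encode anything unsafe.
        return f"<{base}{quote(label.replace(' ', '_'))}>"

    def literal(label: str) -> str:
        # Escape backslashes and quotes for a Turtle string literal.
        return '"' + label.replace("\\", "\\\\").replace('"', '\\"') + '"@en'

    lines = [f"@prefix skos: <{SKOS}> .", ""]
    for term, parent in sorted(broader.items()):
        lines.append(f"{uri(term)} a skos:Concept ;")
        if parent is None:
            lines.append(f"    skos:prefLabel {literal(term)} .")
        else:
            lines.append(f"    skos:prefLabel {literal(term)} ;")
            lines.append(f"    skos:broader {uri(parent)} .")
    return "\n".join(lines)


print(to_skos_turtle({"Equipment": None, "Wheelchairs": "Equipment"}))
```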

So what next, taxonomy? What is nice to hear is that more taxonomists are surviving because their organizations understand their core roles. Several topics and challenges are emerging:

  • how to distribute and decentralize (localize) while keeping authority and control
  • how to collect new content on emerging topics
  • how to use visualization
  • how to be more agile
  • how to fit in with new technologies like social media, mobile, and big data

Phew! That’s a challenge. Taxonomists have a chance to build relationships not only between terms but also with stakeholders along the way to a compelling, visualized, multidimensional content strategy. Good luck.

Taxonomy Accordion in Drupal

The open source community at Drupal is quickly catching up on how to use its taxonomy module. A recent module creates a Taxonomy Accordion, also known as faceted navigation. What the Taxonomy Accordion shares with faceted navigation is good. It lets a user know at a glance what a website is about, how to find information, and also what won’t be found.

A Taxonomy Accordion does more than plain faceted navigation (plus, Taxonomy Accordion is a great name):

  • Using color and shade, you can graduate the display so parent terms have one shade and child terms another
  • Hierarchies open and close much like a venetian blind or an elegant fan
  • The code is modular and can be integrated as part of the Drupal Taxonomy module

But, as with other open source software, you need to plan and invest in the work that goes on “under the covers.” Here are some of the dirty little secrets, the “work” that tunes a taxonomy accordion or any faceted navigation:

  • Pay attention to user-centered design and validation: The fundamental choice of categories has to make sense to users. Even if you make up an initial set of categories, use a validation process to ensure that the taxonomy makes sense to users. Validation is a two-step process. Part one is an open process, sometimes called an open card sort. Terms are collected from users, content, and other sources and then organized into a draft taxonomy. The second part of the process is closed: users are asked to find content using the navigation scheme to test whether the classes and hierarchy are useful or need to be refined. More importantly, by making a validation process part of the plan, you become more aware of and attentive to user needs.
  • Use this opportunity to improve tagging and metadata management: Content has to be tagged with terms from the taxonomy. So you need a back-end business process and metadata storage (e.g., a database design) to store the tags and pointers to the associated content. This back-end metadata record can also help optimize your search engine, especially an engine that supports faceted search, such as Solr.
  • Understand restrictions and attributes: Some facets are not larger superclasses but attributes (sometimes also called “slot facets” or “datatype properties”) that are used to restrict or narrow a search. In an ecommerce application, these restrictions might be facets such as “Measurement,” “Color,” and “Availability.” In a content or digital asset application, the restrictions might be “Content Type,” “Publication Date,” and “Format.” Grouping these terms helps reduce permutations and complexity in interface design and in writing queries.
  • Foster distributed environments and local control: This may seem counterintuitive, but faceted design is not authoritarian. If the faceted design is based on user needs and a validation process, then it is likely to reflect shared values. It still allows local organizations to develop and manage their information, and it makes it easier to map that information to process and workflow. For example, a music company might have all its artists map their music to shared facets such as genre. A local social service agency might be asked to map its services to a common public service metadata scheme. Allowing local agencies to update their metadata, tag content, and suggest terms for the taxonomy is a great way to identify user needs and changing requirements.
  • Change and improve: Once categories are established, a change management process needs to be in place. It should monitor user queries to make sure that the categories and terms remain current and useful. Setting baseline thresholds, or vital statistics (to be discussed in next month’s post), can help in recognizing changing markets, technologies, or user needs.
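The tagging and restriction points above can be sketched in a few lines of Python. Each content record carries its facet tags, selecting terms narrows the result set, and per-term counts drive the display. This is an in-memory illustration under assumed data: the records and facet names are hypothetical. A Drupal site would rely on its own modules, and an engine such as Solr computes facet counts on the server side.

```python
from collections import Counter

# Hypothetical tagged content: each record's facets map a facet name to its terms.
RECORDS = [
    {"id": 1, "facets": {"Content Type": {"Report"}, "Format": {"PDF"}, "Topic": {"Benefits"}}},
    {"id": 2, "facets": {"Content Type": {"Article"}, "Format": {"HTML"},
                         "Topic": {"Benefits", "Costs"}}},
    {"id": 3, "facets": {"Content Type": {"Report"}, "Format": {"HTML"}, "Topic": {"Costs"}}},
]


def filter_records(records, selections):
    """Keep records matching every selected facet (AND across facets, OR within one facet)."""
    return [
        r for r in records
        if all(r["facets"].get(facet, set()) & set(terms)
               for facet, terms in selections.items())
    ]


def facet_counts(records, facet):
    """Count records per term of one facet, e.g. to show "(2)" next to each option."""
    return Counter(term for r in records for term in r["facets"].get(facet, set()))


reports = filter_records(RECORDS, {"Content Type": ["Report"]})
print([r["id"] for r in reports])             # [1, 3]
print(dict(facet_counts(reports, "Format")))  # {'PDF': 1, 'HTML': 1}
```

Treating attribute facets such as Format as simple set intersections is what keeps the queries short, which is the ROI point about back-end simplicity.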

An open source faceted navigation should allow implementation at a lower cost. Even with an open source solution like Drupal, which offers flexible options, it pays to invest some attention in understanding taxonomy business processes. Doing so leads to a more efficient implementation and more efficient back-end processes.

The return on investment (ROI) justification includes not only user interface improvements (fewer clicks to the right content) but also programming cost efficiencies, such as simpler back-end queries. These are great ROI justifications for the work. Validation work overlaps with the work of marketing and customer relations, so consider integrating taxonomy validation and governance into existing work processes. Some organizations roll taxonomy management into a knowledge management function. That function oversees the entire process: organizing knowledge categories, managing content acquisition, and monitoring.

Drupal’s development community is working on some very sophisticated features that will become available in the coming years, including ways to visualize and cluster linked data using RDFa. Developing faceted navigation and taxonomies is a great way to get ready for an exciting future of visually interesting interfaces. Those interfaces will better help users find and share information in complex organizations.

Don’t let the simplicity of the Taxonomy Accordion fool you. Use the accordion as an opportunity to understand user needs and how users look for information. It is also a chance to make the underlying production, tagging, and databases more efficient and focused on user needs and high-quality information.

~ Marlene Rockmore


I just returned from an intensive training in semantic web technologies through TopQuadrant, and I learned much more about what goes on “under the covers.” The course explained how semantic technologies can generate machine-to-machine applications. One important lesson was that facets are similar to classes, which are in turn similar to the mathematical idea of a set. This post discusses why taxonomists and programmers need to treat classes, facets, and sets as similar ideas.

Using semantic tools requires building a conceptual model, which is a collection of classes. Building useful, semantically enabled models requires learning the basic semantic toolkit:

  • RDF (Resource Description Framework). In RDF, one creates classes and designs relations between individual members of a class and between classes. Two related standards are especially useful. RDFa embeds RDF in web pages. RDFS (RDF Schema) can be used to express the ontology (concept mapping) as a schema that represents the underlying data. RDF data forms a directed graph made up of triples (subject, predicate, object). Using RDF, one can read in a data store such as a spreadsheet and quickly generate a starter taxonomy, which still needs to be validated with use case scenarios.
  • SKOS (Simple Knowledge Organization System) expresses traditional taxonomies and thesauri in RDF. SKOS handles basic thesaurus-type relations such as broader and narrower concepts, alternative labels, and related concepts. In SKOS, each concept, including a related concept, has its own Uniform Resource Identifier (URI). SKOS can only describe a concept with broader, narrower, and related relations plus preferred and alternative labels; it cannot associate a concept with an OWL class.
  • SPARQL is a specialized query language designed to query triple stores. A semantically enabled application is one whose data can be converted into an RDF graph. That graph can then be displayed visually and queried using SPARQL.
  • OWL (Web Ontology Language) is the underlying language for describing models. OWL is required to handle more complexity, such as restrictions, cardinality, and inference.
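To see what “triples” and “querying a triple store” mean, here is a toy Python sketch. An RDF-style graph is stored as a set of (subject, predicate, object) tuples. A function matches one pattern containing variables, roughly what the SPARQL query `SELECT ?label WHERE { ex:Wheelchair skos:prefLabel ?label }` does. The `ex:` terms are hypothetical. A real application would use an RDF library and a SPARQL engine rather than this simplification.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# A hypothetical toy graph using prefixed names as plain strings.
GRAPH: Set[Triple] = {
    ("ex:Wheelchair", "rdf:type", "skos:Concept"),
    ("ex:Wheelchair", "skos:prefLabel", "Wheelchair"),
    ("ex:Wheelchair", "skos:broader", "ex:DurableMedicalEquipment"),
    ("ex:DurableMedicalEquipment", "skos:broader", "ex:Equipment"),
}


def match(graph: Set[Triple], pattern: Triple) -> List[Dict[str, str]]:
    """Bind variables (terms starting with '?') in one triple pattern against the graph."""
    results = []
    for triple in graph:
        binding: Dict[str, str] = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated in the pattern must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


print(match(GRAPH, ("ex:Wheelchair", "skos:prefLabel", "?label")))
# [{'?label': 'Wheelchair'}]
```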

Once you get under the covers, most of the concepts in RDF, SKOS, and OWL will be familiar to taxonomists. Some details can confuse you, but don’t let inconsistent naming conventions deter you. For example, in OWL a class is declared as an owl:Class, and every class is a subclass of owl:Thing. A class defined in the RDF Schema language is an rdfs:Class. Confusing, yes, but don’t let that deter you from appreciating the power of this approach. Whatever it is called, a class is still similar to a facet.

Here are some examples of how OWL and taxonomies are similar. Each example begins with the name of the OWL property or construct.

SubClassOf defines a narrower term within a set.

InverseOf creates reciprocal relations, such as broader and narrower.

Transitivity allows navigation of a hierarchy. If A is narrower than B, and B is narrower than C, then A is narrower than C. A SPARQL query that chains through a hierarchy can potentially be as short as 2 lines.

Restrictions are similar to slot facets or attributes, which are properties that limit the set.
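The transitivity and inverse examples can be sketched in plain Python. The code follows broader links until none remain, and it derives narrower links by flipping broader ones. This is an assumed in-memory illustration, not OWL reasoning, and the hierarchy is hypothetical. In SKOS itself, `skos:broader` is deliberately not declared transitive; SKOS provides `skos:broaderTransitive` for that. In SPARQL 1.1, a property path such as `?concept skos:broader+ ?ancestor` expresses the chain in a single triple pattern.

```python
from typing import Dict, List

# Hypothetical hierarchy: each concept maps to its broader concepts.
BROADER: Dict[str, List[str]] = {
    "Wheelchair": ["Durable Medical Equipment"],
    "Walker": ["Durable Medical Equipment"],
    "Durable Medical Equipment": ["Equipment"],
}


def ancestors(broader: Dict[str, List[str]], concept: str) -> List[str]:
    """Follow broader links transitively; the seen check also stops on cycles."""
    seen: List[str] = []
    stack = list(broader.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(broader.get(parent, []))
    return seen


def inverse(relation: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Derive the reciprocal relation, e.g. narrower from broader."""
    result: Dict[str, List[str]] = {}
    for child, parents in relation.items():
        for parent in parents:
            result.setdefault(parent, []).append(child)
    return result


print(ancestors(BROADER, "Wheelchair"))               # ['Durable Medical Equipment', 'Equipment']
print(inverse(BROADER)["Durable Medical Equipment"])  # ['Wheelchair', 'Walker']
```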

Here are some reasons to utilize classes in semantic technologies as a best practice.  Without implementing classes and modeling, these outcomes would be hard to achieve:

Form follows function: Instead of designing big, monolithic hierarchical taxonomies, think in terms of classes or facets, which are groupings of individual members in a set. These smaller, faster sets will be easier to export, import, edit, and share. Perhaps facets should be called fast sets, or fasets! Plus, the facets (classes) can become fields in a web form, which opens many options for reuse and design.

Scalability and Reuse: Concepts and their associated classes are independent of data and content. So the concepts and classes can be changed without changing the underlying queries and systems architecture, for example by changing an organization name, renaming key terms, or adopting new ideas. This is what makes the approach scalable.

Change Schema Without Changing Content: Conceptual mapping can be designed and changed independently in RDF Schema or OWL without changing the underlying data.

Precision: An individual concept can easily be manipulated as a member of a set, or of multiple sets, so the concept can have a more accurate definition. For example, take a term like “Chevy Chase.” By associating “Chevy Chase” with class: Person, one can distinguish Chevy Chase, the comedian, from Chevy Chase, Maryland, as part of class: Location. Furthermore, ideally each unique concept of Chevy Chase would have its own namespace or Uniform Resource Identifier (URI).

Precision: A concept can be created independently of the content, without being tightly coupled into a hierarchy. It can still associate clearly with the appropriate facet or class, which gives more precision. The same logic can be applied to more amorphous, squishy terms like “Compensation,” “Performance,” “Management,” or “Quality.” These can be deconstructed into more specific variants like “Executive Compensation” versus “Non-exempt Pay and Benefits.” RDFS can be used to link to the more appropriate term, each with its own URI.

Facilitate Linked Data: If taxonomies and data can be shared, it is faster to build serious applications that can solve real and acute problems. In our class, we built applications that mapped where free Wi-Fi hot spots were located near swimming pools and taquerias. We also built a serious social policy application. It mapped cities in the United States with increases in housing complaints involving discrimination based on sexual orientation, national origin, race, and other factors. That application took data from multiple reputable sources and applied a common conceptual model.
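The Chevy Chase example under Precision comes down to keying concepts by identifier and class rather than by label. Here is a minimal Python sketch. The example.org URIs and the `lookup` helper are hypothetical and not part of any RDF toolkit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Concept:
    uri: str    # each concept gets its own identifier (hypothetical example.org URIs)
    label: str  # the ambiguous string users see
    cls: str    # the class (facet) the concept belongs to


CONCEPTS = [
    Concept("http://example.org/person/Chevy_Chase", "Chevy Chase", "Person"),
    Concept("http://example.org/location/Chevy_Chase_Maryland", "Chevy Chase", "Location"),
]


def lookup(label: str, cls: Optional[str] = None) -> List[Concept]:
    """Find concepts by label, optionally restricted to one class."""
    return [c for c in CONCEPTS if c.label == label and (cls is None or c.cls == cls)]


print(len(lookup("Chevy Chase")))                # 2: the label alone is ambiguous
print(lookup("Chevy Chase", "Location")[0].uri)  # http://example.org/location/Chevy_Chase_Maryland
```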

Inference creates some new challenges for taxonomists. Developers who work with OWL report that many inference errors can be traced back to bad, messy taxonomies with too many broader terms. In other words, avoid complex polyhierarchies.

To create taxonomies that are ready for the semantic future, the better practice is to arrange concepts into facets, which can be equated with classes or sets. It is also better to avoid complex polyhierarchies (concepts with too many parents). This will allow taxonomies to play well with applications such as user interface design and machine-readable applications. The first step is to stop thinking of a taxonomy as a monolithic hierarchy. Instead, look at it as a collection of classes (or facets), where a class is a set with individual members. If models and taxonomies can be easily built and used to connect data across worksheets and resolve issues, applications based on linked data can be built quickly.
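A quick audit for the polyhierarchy problem is to count each concept’s broader terms and flag the ones above a threshold. The sketch below assumes the taxonomy is available as a concept-to-parents dictionary. The sample HR hierarchy is hypothetical, and the default threshold of two parents is an arbitrary assumption that each team would set for itself.

```python
from typing import Dict, List


def polyhierarchy_report(broader: Dict[str, List[str]],
                         max_parents: int = 2) -> Dict[str, List[str]]:
    """Flag concepts with more than `max_parents` distinct broader terms."""
    return {
        concept: sorted(set(parents))
        for concept, parents in broader.items()
        if len(set(parents)) > max_parents
    }


# Hypothetical HR taxonomy in which "Compensation" has drifted under many parents.
hierarchy = {
    "Compensation": ["Human Resources", "Finance", "Management", "Legal"],
    "Executive Compensation": ["Compensation"],
    "Non-exempt Pay and Benefits": ["Compensation", "Human Resources"],
}
print(polyhierarchy_report(hierarchy))
# {'Compensation': ['Finance', 'Human Resources', 'Legal', 'Management']}
```

Flagged concepts are candidates for splitting into more specific variants, or for moving into their own facet, before the taxonomy feeds an inference engine.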

To try semantic tools such as SKOS editors, download a copy of TopBraid Composer Free Edition.

~ Marlene Rockmore

Skills of a Classy Taxonomist

At SemTech in June 2010, several speakers, including Professor Deb McGuinness, drew a very clear line between what a taxonomist does and what an ontologist does. Taxonomists build hierarchies, and ontologists determine classes or categories. In other words, ontologies are neat and unambiguous, and taxonomies are a bit messy.

Defining classes, or ontology work, typically precedes building the taxonomy. Defining the classes is like writing a specification for the taxonomy; in fact, defining classes is the same as defining facets. Taxonomists and ontologists should aim to define a specific, unambiguous description of each term. That description helps manage how we find and organize content, so that the pathways are clear and specific. Adding an ontology ensures that the term is placed in the most specific categories, which helps ensure clarity and lack of ambiguity. I would argue that no taxonomy is useful unless it is faceted, that is, divided into classes. Taxonomies work best when their terms share homogeneous properties and when they are smaller and focused.

Class analysis, or facet analysis, solves several problems:

1)       Clarify specific terms by situation or function: If I am interested in Java as a programming language, I want to see material related to Java as software, not as slang for coffee or an island in Indonesia. If I am looking for “drill bits,” it might be important to understand whether the drill bits are for my home electric screwdriver or for an oil rig. Classes capture these distinctions and help create precise tagging and information retrieval.

2)       Ease long-term maintenance issues: Christine Connors points to a simple but common mistake: taxonomies that list people’s names as narrower terms under a role. Examples include “Hillary Clinton” under “Secretary of State” or “Charles Windsor” under “Prince of Wales.” The problem is that when the people filling these roles change, there is a maintenance headache. A classy taxonomy recognizes that <people> is a separate class as an entity, distinct from <role>. <People> and <Role> can be connected by a predicate such as <isA>. These distinctions are necessary for fast-changing information, such as who is dating whom in an entertainment application or who owns whom in a business application.

Abstraction: <person> <has> <role>

Instance: Hillary Clinton <is>  Secretary of State

3)    Facilitate sharing and importing taxonomies: When a taxonomy is specified by a class description, it will be more homogeneous, have shared properties, and be more focused. This makes it easier to import with less cleanup and review. It will facilitate the use of SKOS, for example. Messy taxonomies are harder to merge.
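The separation of <person> from <role> in item 2 can be sketched as two independent classes joined by dated assignments. A change of role holder then adds a record rather than restructuring the hierarchy. The people, role, and dates below are hypothetical. Recording start and end dates is an assumption beyond the original example.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class RoleAssignment:
    person: str                 # member of the <person> class
    role: str                   # member of the <role> class
    start: date
    end: Optional[date] = None  # None means the assignment is still current


def holder(assignments: List[RoleAssignment], role: str, on: date) -> Optional[str]:
    """Return who held `role` on the given date, or None if nobody did."""
    for a in assignments:
        if a.role == role and a.start <= on and (a.end is None or on < a.end):
            return a.person
    return None


# Hypothetical people and dates: when the chair changes, we add a record
# instead of moving a name around inside the taxonomy.
ASSIGNMENTS = [
    RoleAssignment("Person A", "Committee Chair", date(2008, 1, 1), date(2012, 1, 1)),
    RoleAssignment("Person B", "Committee Chair", date(2012, 1, 1)),
]
print(holder(ASSIGNMENTS, "Committee Chair", date(2010, 6, 1)))  # Person A
print(holder(ASSIGNMENTS, "Committee Chair", date(2013, 6, 1)))  # Person B
```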

Anyone working with semantic technologies will tell you that most inference problems happen when hierarchies in source taxonomies create odd associations by inserting a narrower or broader term. A taxonomist needs to be attentive to inferences in order to prevent false statements. Professor Deb McGuinness calls this issue “truth maintenance.”

To keep these categories clear and distinct, ontologists rely on building a conceptual model, or a picture of the domain. For more information, see the earlier blog post, Taxonomies and Modelling. Modeling strategies draw on skills that most taxonomists already have: most taxonomists have been taught how to capture vocabulary and how to identify facets.

Elaine Kendall of Sandpiper Software, which makes a concept-modelling tool, suggested that “one could build an ontology in 2 hours.” With a new generation of tools that can create RDF/OWL from data and content, this statement might be true.

    With good modelling tools that automatically generate RDF/OWL, such as those from TopQuadrant, taxonomists might be able to slide into the needed role of ontologists. To extend their skills, taxonomists need to understand some basic RDF/OWL concepts:

    • what a class is
    • what a property is
    • what a slot facet is
    • what class inheritance is
    • what reciprocal and inverse properties mean
    • how to write a SPARQL query

    But more importantly, a classy taxonomist can become a facilitator who builds bridges between user and development communities and helps diagnose and prevent technical problems.

    A taxonomist who is trained in ontologies  should bring the following skills:

    • Create processes to identify the requirements for each class
    • Develop metrics to assess results
    • Identify what vocabularies are needed, evaluate existing vocabularies, and import and adapt them to current needs
    • Ensure the integrity and focus of vocabularies, particularly when they are sourced from an outside vendor
    • Develop processes to keep vocabularies current, and use metrics to “measure and improve” vocabularies
    • Work with the development team to help identify whether a source vocabulary might be causing false inferences

    The taxonomist works with different user communities as well as developers. The taxonomist helps bridge the gap between what users and experts know and what is needed to build a useful application. A classy taxonomist has a well-rounded set of skills for working with development teams and user organizations to build intelligent systems.


    The Right Prescription for a Crowd-source Experiment

    My last post was an experiment in using remote online card sorting as a way to build a taxonomy. And why start small? My sample data was the picklist used on www.medicare.gov when you search on “What does Medicare Cover?” For my experiment, I used websort.net as the remote card sorting tool.

    First, let’s start with the good news. Online tools are a very cool way to bring together remote groups that would be too expensive or politically impossible to convene. That’s the promise.

    But a successful remote card sort requires preliminary planning and work. Here are my lessons learned:

    • Keep the test under 20 minutes: Online card sorting is a time-consuming task for the participant. For the experiment to be successful, make sure that participants have the time and that the number of terms to be sorted is not overwhelming. Joseph Busch of Taxonomy Strategies and Dave Cooksey of saturdave.com suggest 20 minutes and 25 terms at most. My comprehensive test of all 132 picklist terms from the Medicare site was too big.
    • Pretest the taxonomy: Since the card-sorting activity is a one-time opportunity to engage testers, some prior testing of the taxonomy should occur. Remote card sorting is better for a closed experiment, where a taxonomy has already been designed. It is less suited to an open card sort, where the goal is to discover categories and facets. The recommended practice is to run some tests of the taxonomy before the online experiment. Have a trusted expert do the test, and then remove obvious problems. If the pretest doesn’t go well, try again. Testers in an online setting have a low tolerance for obvious problems, so the test needs to be about validating a good design.
    • Choose online tools carefully: The tool I used, websort.net, had a major problem: it allowed a term to be classified under one and only one category. This proved frustrating to users. For example, users wanted to classify durable medical equipment under the category for Equipment but also under the category for Disease or Chronic Condition. Dave Cooksey, who tracks tools, says remote tools are improving all the time, so evaluate tools and choose wisely.
    • Be sure to thank the participants: We all feel manipulated by many of the group activities we attend in the face-to-face world, and that can happen in the remote world as well.   Being authentic and courteous is important. Provide a thank you and be sure to share results or feedback.  If possible, consider some kind of compensation such as a gift card.

    So, given that a test that seems so simple on the surface requires work to set up, what is the value of this work? The purpose of a taxonomy is to determine top-level facets that can be used to organize and search for information. If we look at a topic like Medicare, we know that we have a national problem determining standards for insurance policies. It is difficult to compare policies, and it is also time-consuming to manage the costs. For designing good remote crowdsourced card sorting tests, Dave and Joseph have the following recommendations:

    • Pay attention to the sample size
    • Recruit carefully to be sure the sample has balance of perspectives
    • Run tests prior to online activity. Have experts try the test.
    • Remember that the goal of a taxonomy test is to find the higher-level categories that overlap between technical expertise and general understanding.
    • The result is a better analysis of shared group understanding – shared mental models of how we collectively categorize concepts,  not individual understanding
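The last recommendation, looking for shared rather than individual mental models, is usually analyzed with a co-occurrence (similarity) matrix. For each pair of cards, it records the share of participants who put them in the same group. Below is a minimal Python sketch. It assumes each participant’s sort is exported as a mapping from group name to cards. The sample cards are hypothetical, not results from the Medicare experiment.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

Sort = Dict[str, List[str]]  # one participant: group name -> cards placed in it


def co_occurrence(sorts: List[Sort]) -> Dict[Tuple[str, str], float]:
    """Share of participants who placed each pair of cards in the same group."""
    pair_counts: Counter = Counter()
    for groups in sorts:
        together = set()  # count each pair at most once per participant
        for cards in groups.values():
            together.update(combinations(sorted(set(cards)), 2))
        pair_counts.update(together)
    return {pair: n / len(sorts) for pair, n in pair_counts.items()}


# Hypothetical results from three participants.
SORTS = [
    {"Equipment": ["Wheelchairs", "Walkers"], "Services": ["Ambulance services"]},
    {"Equipment": ["Wheelchairs", "Walkers", "Ambulance services"]},
    {"Mobility": ["Wheelchairs", "Walkers"], "Transport": ["Ambulance services"]},
]
similarity = co_occurrence(SORTS)
print(similarity[("Walkers", "Wheelchairs")])  # 1.0
```

Counting each pair once per participant keeps the shares between 0 and 1, even when a tool lets people put the same card into several groups.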

    Against a trillion-dollar problem like health care, well-designed remote card sorts seem like a small investment. Such a project could compare how different user groups sort fundamental Medicare concepts. A well-run test with good recruitment could be a very good way to jumpstart better designs of websites such as Medicare’s that deliver clearer information about benefits and choices.


    Using Taxonomies to Sort through Health Care Reform

    I am very interested in the health care reform debate, so I wanted to know what a public option might look like. I was told by my sources that a robust public option might look a bit like Medicare. So off I went to the Medicare.gov website to find out what was covered. In the middle of the home page, in the second column, there is a link to “Find Out What Is Covered,” which leads to an advanced search criteria page. The search page includes a picklist of about 143 topics, just the right size for a sample set of candidate terms for a card sort.

    This month, I am offering a small interactive experiment in online card sorting. Taxonomies are collections of facets, which are created by organizing concepts into categories. Card sorting is one of the best ways to identify categories. In controlled tests, groups of users create categories, and those categories are validated through repeated tests until there is a consensus. In health care reform, taxonomies might be useful to help create consumer-friendly interfaces for searching across the national insurance exchanges.

    A card sort method uses the following steps:

    • Collect a sample set of candidate concepts
    • Group or cluster terms into categories
    • Refine the design iteratively until there is a set of facets, groups of categories that have similar properties

    I’ve put 130+ topics from Medicare into an online card sorting tool called Websort.net. The topics have not been formatted or massaged; they are just as they appear in the Medicare search picklist. Websort.net suggests that I use a closed card sort, where participants sort terms into predetermined categories. So to get started, I’ve come up with about 20 starter categories. Some of these categories will become subtopics in a faceted design.
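For a closed card sort like this one, a common first analysis is the agreement for each term: what share of participants put it in each predefined category. Terms without a clear majority are the ones to revisit in the next round. The sketch assumes results are exported per participant as category-to-terms mappings. The sample terms and the 0.6 threshold are hypothetical. Participants may assign a term to several categories, so the shares for one term can add up to more than 1.

```python
from collections import Counter, defaultdict
from typing import Dict, List

Sort = Dict[str, List[str]]  # one participant: predefined category -> cards placed in it


def placement_agreement(sorts: List[Sort]) -> Dict[str, Dict[str, float]]:
    """For each card, the share of participants who placed it in each category."""
    placements: Dict[str, Counter] = defaultdict(Counter)
    for groups in sorts:
        for category, cards in groups.items():
            for card in set(cards):
                placements[card][category] += 1
    n = len(sorts)
    return {card: {cat: k / n for cat, k in counts.most_common()}
            for card, counts in placements.items()}


def needs_review(agreement: Dict[str, Dict[str, float]], threshold: float = 0.6) -> List[str]:
    """Cards whose most popular category falls below the (assumed) consensus threshold."""
    return sorted(card for card, shares in agreement.items()
                  if max(shares.values()) < threshold)


# Hypothetical closed-sort results from four participants.
SORTS = [
    {"Equipment": ["Wheelchairs", "Diabetes supplies"], "Preventive Services": ["Flu shots"]},
    {"Equipment": ["Wheelchairs"], "Chronic Conditions": ["Diabetes supplies"],
     "Preventive Services": ["Flu shots"]},
    {"Equipment": ["Wheelchairs"], "Chronic Conditions": ["Diabetes supplies"],
     "Preventive Services": ["Flu shots"]},
    {"Equipment": ["Wheelchairs", "Diabetes supplies"], "Preventive Services": ["Flu shots"]},
]
agreement = placement_agreement(SORTS)
print(needs_review(agreement))  # ['Diabetes supplies']
```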

    The experiment is open to the first 10 participants who want to take the time to try this task.   To try the card sort, link to


    Please feel free to assign terms to multiple categories or to suggest other categories.

    Last month, Joseph Busch blogged about the judicious use of online web sorting tools, noting that they may not be the most cost-effective way to build taxonomies. One of his arguments is that the sample set of users will not be random. That’s true. This blog has a small readership who are interested in taxonomies and probably have a consumer’s interest in health care reform. Let me know what you think of websort.net.

    This little experiment could help demonstrate some bigger observations. Government may be looking to advanced, high-volume technologies such as clustering or semantic technologies to identify categories and to map claims data. Perhaps one of the applications will be to build interfaces that help consumers search across the national exchanges. But at the core of these technologies, there will be a need for well-designed taxonomies to help analyze text and build better interfaces to access health care information.

    A well-designed taxonomy with facets and linking relationships can:

    • Group information into useful categories
    • Identify gaps in coverage
    • Help point to important related information

    Let’s find out if taxonomy design can help us sort through health care reform.

    Thanks to Andy Oram and the Sunlight Foundation for introducing me to this tool and to Dave Cooksey who is virtually updating my card-sorting skills.

    What’s wrong with crowdsourcing the design of public websites?

    A blog post from Sunlight Labs on “Redesigning the FCC: Getting Organized” suggests an experiment that uses a public card-sorting program, websort.net, to help redesign the Federal Communications Commission (FCC) website. The FCC has a notoriously convoluted website, hard to navigate and hard to search. Sunlight Labs invites anyone interested in helping the FCC to join this open card-sorting activity, which organizes about 60 terms into categories related to the FCC. But is a public web sort the right approach to redesigning a government website?

    Should we crowdsource the design of a public website?

    Here are some considerations:

    • First, the success of any design process depends on who sits at the table. Site designers have not succeeded over the years by roping in anyone who happens to be around. Rather, carefully identifying the right participants for any design activity is very important. Engaging busy professionals and bureaucrats in order to derive the maximum impact with the minimum effort is a tricky business. One of the most cutting critiques of Wikipedia has been that its editorial perspective is overwhelmingly that of white, male twenty-somethings, not necessarily the authority of choice for everyone else.
    • Second, open processes tend to be very time-consuming. That works in your favor for some kinds of crowdsourcing, but not for selecting terms and categories. Unless the sample is large and controlled, the pattern that emerges from crowdsourced card sorting may not be helpful, because experts with limited time will be overrun by people with lots of time and a fast hand on the keyboard, no matter how much or how little they know. Some types of crowdsourcing (such as prediction markets) work because the errors of ignorant participants cancel each other out and allow the experts to win out. Card sorting is entirely different: the result is just chaos.
    • Third, it would be much quicker for the FCC to suggest a model for organizing its content based on its expertise than to crowdsource the design. There are standard ways to organize things, including website content, which people can learn even if they are not entirely natural. We learn about brand, price, size, color, material, and fit because they help us find the stuff we want to buy, not necessarily because there is a shopping gene in our DNA.
    • Fourth, the users of these sites, such as broadcasters, regulators, website publishers, and ordinary people, are not always interested in the same things. The FCC will have to comply with legislative and executive branch imperatives that may be of little interest to many people in the crowd.

    A better way to approach website design and redesign focuses on the back-end nomenclature: the buckets and categories, which are called facets and vocabularies. These form the basis of a useful taxonomy.

    So when can crowdsourcing be used effectively? If the FCC first engaged in designing the facets and vocabularies itself, the crowd could be useful as a follow-up. First, the crowd can help validate a design. After all, the test of a taxonomy is whether it helps people find information. One appropriate role for crowdsourcing in taxonomy is to observe how users access a collection of items over time, the searches they run, and the click paths they follow. The taxonomy can then be tuned based on how the activity is distributed among the categories, splitting and merging categories as warranted.
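
    The tuning step can be pictured with a small sketch. It assumes a hypothetical log in which each search hit or click has already been resolved to one taxonomy category, and it uses made-up cut-offs (more than 30% of activity suggests a split, less than 2% suggests a merge); real thresholds would depend on the site. The function name `tuning_candidates` and the category names are illustrative only.

```python
from collections import Counter


def tuning_candidates(events, categories, split_share=0.30, merge_share=0.02):
    """Flag categories to consider splitting or merging, by share of activity.

    events: iterable of category labels, one per logged search hit or click
            (a hypothetical log format).
    categories: every category in the taxonomy, so unused ones are counted too.
    split_share / merge_share: hypothetical cut-offs, to be tuned per site.
    """
    counts = Counter(events)
    total = sum(counts.values())
    split, merge = [], []
    for category in categories:
        share = counts[category] / total if total else 0.0
        if share > split_share:
            split.append(category)  # one bucket absorbs too much traffic
        elif share < merge_share:
            merge.append(category)  # bucket is rarely or never used
    return split, merge


# Hypothetical FCC-style categories and click log (100 events in total).
categories = ["Licensing", "Broadband", "Consumers", "Ownership", "Spectrum Auctions"]
events = ["Licensing"] * 50 + ["Broadband"] * 29 + ["Consumers"] * 20 + ["Ownership"]
split, merge = tuning_candidates(events, categories)
print("split:", split)  # split: ['Licensing']
print("merge:", merge)  # merge: ['Ownership', 'Spectrum Auctions']
```

    The output is only a list of candidates for an editor to review; whether a busy category really needs subdividing is still a design judgment.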

    Another place for crowdsourcing is to allow users to add free-text “tags” to the content. Those tags can then be evaluated and either mapped to existing taxonomy categories or used to suggest changes to the taxonomy. In this case the crowd and the taxonomy work together in synergy. Users typically tag only a fraction of the pages, and in most cases their tags will be synonyms of, or equivalents to, existing categories.
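
    One way to picture the tag-to-category mapping is the sketch below. It assumes three hypothetical structures: the tags users added to each item, an editor's table mapping lower-cased tags to taxonomy categories as synonyms, and the items already classified in each category. A mapped tag then retrieves everything in its category, while unmapped tags are collected as candidates for new categories or synonyms. All names and data are illustrative.

```python
from collections import defaultdict


def resolve_tags(user_tags, tag_synonyms, category_members):
    """Work out which items each user tag can retrieve.

    user_tags: item ID -> set of free-text tags users added.
    tag_synonyms: lower-cased tag -> taxonomy category, as decided by an editor.
    category_members: taxonomy category -> set of item IDs already classified.
    Returns (tag -> findable item IDs, sorted list of unmapped tags).
    """
    tagged = defaultdict(set)
    for item, tags in user_tags.items():
        for tag in tags:
            tagged[tag.strip().lower()].add(item)

    findable, unmapped = {}, []
    for tag, items in tagged.items():
        category = tag_synonyms.get(tag)
        if category is None:
            # No mapping yet: the tag finds only what users tagged, and it is a
            # candidate for a new category or synonym in the next revision.
            findable[tag] = set(items)
            unmapped.append(tag)
        else:
            # Mapped as a synonym: the tag now reaches everything in the category.
            findable[tag] = items | category_members.get(category, set())
    return findable, sorted(unmapped)


# Hypothetical data.
user_tags = {"doc-1": {"WiFi"}, "doc-7": {"net neutrality"}}
tag_synonyms = {"wifi": "Wireless"}
category_members = {"Wireless": {"doc-1", "doc-2", "doc-3"}}
findable, unmapped = resolve_tags(user_tags, tag_synonyms, category_members)
print(sorted(findable["wifi"]))  # ['doc-1', 'doc-2', 'doc-3']
print(unmapped)                  # ['net neutrality']
```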

    Mapping a tag as a synonym of an existing taxonomy category effectively applies that tag to all the content already in that category. This synergy is one method that can help improve access to information.

    Finally, a card-sorting exercise can be useful after the field is carefully constrained by the experts who know the site. The true test of any card-sorting activity is whether people can actually find what they are looking for afterward.

    Here are several techniques that people can use, with little or no training, to validate a taxonomy. These techniques are much faster than open card sorts, and their results are easier to interpret.

    • Classifying some content
    • Conducting walk-throughs
    • Closed card sorting

    Classifying Some Content

    In this exercise, people are presented with a representative subset of content from the site and asked to tag it. You can select the content randomly, or try to include examples of the site’s primary content types as well as content you think may be hard to tag, find, or use. If you plot the number of items tagged into each taxonomy category, you should expect to see 80% of the content fall into 20% of the categories.
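
    Instead of eyeballing the plot, the 80/20 expectation can be checked with a few lines of code. This is a minimal sketch, assuming the exercise results are recorded as one category label per tagging decision; the function name `top_share` and the sample counts are hypothetical.

```python
import math
from collections import Counter


def top_share(assignments, categories, top_percent=20):
    """Fraction of tagged items that land in the most-used `top_percent` of categories.

    assignments: one category label per tagging decision in the exercise.
    Labels not listed in `categories` are ignored.
    """
    counts = Counter(assignments)
    ranked = sorted((counts[c] for c in categories), reverse=True)
    k = max(1, math.ceil(len(categories) * top_percent / 100))  # size of the "top" group
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Hypothetical results: 100 items tagged across 10 categories, A through J.
categories = list("ABCDEFGHIJ")
assignments = ["A"] * 45 + ["B"] * 35 + list("CDEF") * 4 + list("GHIJ")
print(f"{top_share(assignments, categories):.0%} of items in the top 20% of categories")
# 80% of items in the top 20% of categories
```

    A share far below 80% may mean the categories are too evenly fine-grained; a share near 100% may mean a few buckets are doing all the work.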

    Conducting Taxonomy Walk-Throughs

    One-on-one and group presentations that walk stakeholders through the taxonomy are an effective way to elicit specific comments and sometimes overall approval. During walk-throughs, standard questions should be asked about the category structure, as well as about problematic categories, to gather feedback on the taxonomy. Delphi walk-throughs are done using a stack of cards. Unlike the FCC exercise, however, the cards do not carry raw terms; instead, they are already marked with categories chosen by the experts. Reviewers are asked to mark changes to the category labels on the cards, and each subsequent reviewer’s walk-through uses the cards as marked up in the previous session. The process usually stabilizes after a few sessions, indicating that the categories are appropriate. According to Dave Cooksey, Founder and Principal of saturdave, 20 sessions will usually result in a consensus taxonomy revision, and this method provides results without any further analysis.
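
    To see when a Delphi deck has stabilized, one could record the deck after each session and count how many labels changed. The sketch below assumes each deck is a dictionary from card ID to label, and it uses a hypothetical stopping rule (two consecutive sessions that each change at most one card); that rule is illustrative, not part of the method as described above.

```python
def label_changes(before, after):
    """Count cards whose label is new or different in `after` (card ID -> label)."""
    return sum(1 for card, label in after.items() if before.get(card) != label)


def stabilized_at(sessions, max_changes=1, quiet_runs=2):
    """Return the index of the session at which the deck stabilized, or None.

    sessions[0] is the experts' original deck; each later entry is the deck as
    marked up by the next reviewer. "Stable" is a hypothetical rule here:
    `quiet_runs` consecutive sessions that each change at most `max_changes` cards.
    """
    quiet = 0
    for i in range(1, len(sessions)):
        if label_changes(sessions[i - 1], sessions[i]) <= max_changes:
            quiet += 1
            if quiet >= quiet_runs:
                return i
        else:
            quiet = 0
    return None


# Hypothetical deck and three review sessions.
deck = {"c1": "Licensing", "c2": "Complaints", "c3": "Spectrum"}
s1 = {"c1": "Licensing", "c2": "Consumer Complaints", "c3": "Spectrum Auctions"}
s2 = {"c1": "Licensing", "c2": "Consumer Complaints", "c3": "Spectrum"}
s3 = dict(s2)
print(stabilized_at([deck, s1, s2, s3]))  # 3
```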

    Closed Card Sorting

    Closed card sorting, in which the buckets are predefined, can be used to test whether stakeholders and end users consistently sort categories into the correct taxonomy facets. The categories to test should be a set of important topics, such as the most frequently searched words and phrases in the search engine logs. The test can be done with actual cards, or with a simple grid that lists the categories to be tested down the left column and the taxonomy facets across the top. Paper card sorts work well enough for up to 20 trials.
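
    Scoring the grid is simple arithmetic: for each topic, the share of participants who placed it in the facet the designers intended. The sketch below assumes each participant's grid is captured as a dictionary from topic to chosen facet; the 70% threshold, the topics, and the facet names are all hypothetical.

```python
def sort_consistency(results, expected, threshold=0.7):
    """Score a closed card sort laid out as a topics-by-facets grid.

    results: one dict per participant, topic -> facet they chose.
    expected: topic -> facet the designers intended.
    threshold: hypothetical minimum agreement before a topic is flagged.
    Returns (topic -> share of participants who chose the expected facet,
             sorted list of topics below the threshold).
    """
    agreement = {}
    for topic, facet in expected.items():
        answers = [r[topic] for r in results if topic in r]
        hits = sum(1 for a in answers if a == facet)
        agreement[topic] = hits / len(answers) if answers else 0.0
    flagged = sorted(t for t, share in agreement.items() if share < threshold)
    return agreement, flagged


# Hypothetical topics, facets, and four participants.
expected = {"broadband map": "Topics", "license renewal": "Services", "net neutrality": "Topics"}
results = [
    {"broadband map": "Topics", "license renewal": "Services", "net neutrality": "Topics"},
    {"broadband map": "Topics", "license renewal": "Services", "net neutrality": "Proceedings"},
    {"broadband map": "Topics", "license renewal": "Services", "net neutrality": "Proceedings"},
    {"broadband map": "Data", "license renewal": "Services", "net neutrality": "Topics"},
]
agreement, flagged = sort_consistency(results, expected)
print(agreement)  # {'broadband map': 0.75, 'license renewal': 1.0, 'net neutrality': 0.5}
print(flagged)    # ['net neutrality']
```

    A flagged topic tells the designers where to look, not what to change: the facet label, the topic wording, or the facet structure itself may be the problem.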

    Websort.net is a good tool when you need a larger, distributed closed card sort. If users can’t map terms to the categories, the designers will know that they have to adjust their design. But our experience shows that pre-analysis captures about 80% of the common categories and use cases. Sunlight Labs has undertaken a commendable task in seeking to improve the layout of the FCC website. If they carry out a card sort too quickly, though, they’ll just get their signals crossed. Performing some professional taxonomy work first will channel public efforts in the right direction.

    Submitted by Joseph A. Busch, Founder and Principal, Taxonomy Strategies, Sept. 8, 2009
