The Taxonomy Blog

Organizing concepts leads to clear thinking

Deconfusing Healthcare through Taxonomy Inquiry

This winter, I took part in an information research team that interviewed top health care executives in Massachusetts, including the CEOs of insurance companies, regulators from the Attorney General’s office, and medical directors of major medical networks and hospitals. The goal of the project was to understand one term, “Cost Containment”: what drives rising health care costs, and what can be done to slow the rate of growth?

When someone with taxonomy skills participates in these types of investigations, it is hard not to put those taxonomy skills to work. What did I learn from this process that might be applicable to best practice and to understanding health care cost containment?

1) Start with a simple but important question as a guide for developing deeper knowledge

This group started with the question “What is cost containment?” It is a fairly fundamental question: we in Massachusetts are fortunate to have near-universal coverage (about 97%), but there is still a need to control costs. By asking this fundamental question, the group could collect basic facts from each key player on the same topic and understand how proposed strategies are defined from the point of view of the key players who are shaping policy.

2) Get to know the cast of characters

Remember the adage that the key to a baseball game is to know the players; the same applies to understanding a complex issue. We need to know who the users are and what brought them to these meetings. It is critical to identify the constituencies in healthcare, all of whom have different goals in any situation. The key actors we identified were:

  • Insurers (also known as Payers)
  • Providers (Hospitals, Doctors, Specialists)
  • Regulators (government, legislature, attorney general)
  • Consumers (includes business owners, patients, local government)
  • Purchasing agents (people who buy insurance for large groups — government, business, insurance agents)

The above list is the top level of the Actors/Players facet, which breaks down further. Insurers, for example, are further categorized by company, corporate structure (profit/non-profit), and market share. Not all the groups under these broad headings share characteristics. For example, we rarely saw a “specialist” at a meeting on cost containment, but other types of medical personnel, including primary care, psychiatry, and behavioral medicine, were well represented because they, as a group, have lower reimbursement and higher volume than specialists. Grouping does not mean all values are inherited, thus the need for understanding power relationships and attributes.

3) Understand the power relationships

Some actors have more power and are core to the discussion. Insurers and providers have a closer affinity, for example, while consumers, including employees, businesses, and local government entities, tend to have little to no power in these relationships. Hospitals and specialists have more power than primary care and behavioral medicine. Understanding these internecine wars within health care is key to understanding which relationships are core and which actors are outliers. The health care debate is in part about how to give outliers more power and equity in the health care process. The most outlying voices of all are those of patients and consumers. Theoretically, in new models of health care, their voice is supposed to be represented by larger purchasing pools that can negotiate for better service at less cost.

4) Identify the key cost drivers: isolate the attributes

The hardest part of this work is to isolate the variables (the attributes, or cost drivers) and understand how each group contributes to improving these practices. These are topics that should be of mutual concern but that are not universally understood or standardized. Examples of cost drivers included:

  • Use of and dissemination of best practices (end-of-life care, chronic diseases)
  • Use of Technology
  • Number and Variety of Insurance Plans
  • Cost of drugs
  • Reimbursement rates
  • Risk Management (use of defensive medicine, malpractice, high-risk pools)

Each of these attributes needed to be further understood from the perspective of the key players to see how it contributes to cost. For example, Massachusetts has an excellent universal health care law under which consumers can choose from about 18 different plans through the Connector, but there are also additional public, private, and individual plans, resulting in over 16,000 different plans. Some cost containment could be achieved by having a “shared minimal contract” that is at a high standard of care and captures the essence of basic wellness. To do this, the players and consumers need to find a common language for describing conditions and coverage.

5) Capture the AS IS definitions

Since these conditions and coverage are not standardized, it is useful to understand what the current status is. Understanding AS IS definitions helps to capture the many disconnects between groups. For example, while consumers argue about the cost of deductibles, insurance companies might spend more money in order to reduce the high cost of hospitalization. The result is like a balloon filled with water: one end gets leaner while more pressure is put on the other end of the balloon, the consumer. Capturing the cacophony, instead of the symphony, turned out to be the most valuable part of the work. We discovered we did not have to reach a common understanding; our job was to capture the current status and its impacts.

6) Read background content

In addition to understanding the “cast and drivers,” it is also important to read studies and literature to keep a broad and balanced perspective. Being in rooms with charming and knowledgeable power players can be quite intoxicating, but to keep it honest, we needed to keep reading, and we needed to ask honest questions about what each player stood to gain from advocating a certain program. Spending a few hours each week on literature reviews, books, articles, and podcasts on general health care was very important to building our group and individual knowledge base and to developing our facility in the terminology of health care economics. We used reading to compare health care models in other countries (Taiwan, Switzerland, Japan, Canada, Germany, the UK, France, and the US) and to understand multiple models of healthcare delivery.

7) Capture concepts in simple diagrams

Even within our small, random data collection group, understanding could be quite diverse. Using simple diagrams to capture concepts turned out to be a powerful shared way to come to a common understanding. Bubble maps, graphs, hierarchical diagrams: any visual was useful for clarifying information.

8) If any term is hard to explain with a simple sentence, it probably deserves a taxonomy

“Cost containment” is not trivial, but it is important to understand. And it is almost impossible to explain without learning something about the healthcare system. It is worth the time and effort to create a taxonomy to define the information space or information void, because a void is filled by misunderstanding or misinformation.

Developing a consumer-focused taxonomy for navigating health care turns out to be valuable work, but it is hard to sustain without a dedicated team and sustained funding. A consumer-focused taxonomy would help navigate the health care debate and could be used across all actors, including insurers, providers, governmental entities, and consumers who want to share information with a confused but curious public.

~ Marlene Rockmore

Filed under: Health Care, Methodology, Planning a Taxonomy, Sample Applications, User-Centered Design

Taxonomy Accordion in Drupal

The open source community at Drupal is quickly catching up on how to use its taxonomy module. The latest code module creates a Taxonomy Accordion, aka faceted navigation. What Taxonomy Accordion shares with faceted navigation is good. A taxonomy accordion lets a user know at a glance what a website is about, how to find information, and also what won’t be found.

A Taxonomy Accordion does more than a faceted navigation (plus Taxonomy Accordion is a great name):

  • Using color and shade, you can graduate the color display so parent terms have one shade and the children have another shade
  • Hierarchies collapse and expand much like a venetian blind or an elegant fan
  • Its modular code can be integrated as part of the Drupal Taxonomy module

But, as with other open source, there is a requirement to plan and invest in the work that goes on “under the covers.” Here are some of the dirty little secrets: the “work” that tunes a taxonomy accordion or any faceted navigation.

  • Pay attention to user-centered design and validation: The fundamental choice of categories has to make sense to users. Even if you make up an initial set of categories, use a validation process to ensure that the taxonomy makes sense to users. Validation is a two-step process. Part one is an open process, sometimes called an open card sort, where terms are collected from users, content, and sources, and then organized into a draft of the taxonomy. Part two is closed: users are asked to find content using the navigation scheme to test whether the classes and hierarchy are useful or need to be refined. More importantly, by using a validation process and making it part of the plan, you become more user-aware and attentive to user needs.
  • Use this opportunity to improve tagging and metadata management: Content has to be tagged with terms from the taxonomy, so you need a back-end business process and a metadata design (e.g., a database design) to store the tags and pointers to associated content. This back-end metadata record can also help in optimizing your search engine, especially an engine that supports faceted search such as Solr.
  • Understand restrictions and attributes: Some facets are not larger super-classes but attributes (sometimes also called “slot facets” or “datatype properties”) that are used to restrict or narrow a search. In an ecommerce application, these restrictions might be facets such as “Measurement, Color, Availability.” In a content or digital asset application, the restrictions might be “Content Type, Publication Date, Format.” Grouping these terms helps to reduce permutations and complexity in interface design and in writing queries.
  • Foster distributed environments and local control: This is hard to understand, but faceted design is not authoritarian. If the faceted design is based on user needs and a validation process, then it is likely to reflect shared values. It still allows local organizations to develop and manage their information, and it makes it easier to map that information to process and workflow. For example, a music company might have all its artists map their music to shared facets such as genre. A local social service agency might be asked to map its services to a common public service metadata scheme. Allowing local agencies to update their metadata, tag content, and suggest terms for the taxonomy is a great way to identify user needs and changing requirements.
  • Change and Improve: Once categories are established, a change management process needs to be in place to monitor user queries and make sure that the categories and terms remain current and useful. Setting baseline thresholds, or vital statistics (to be discussed in next month’s post), can help in recognizing changing markets, technologies, or user needs.
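
To make the tagging and restriction points above concrete, here is a minimal sketch in Python of how tagged content might be narrowed by attribute facets. It is not Drupal or Solr code; the records, the facet names (`content_type`, `format`, `year`), and both function names are assumptions made up for illustration.

```python
from collections import Counter

# Hypothetical tagged content records: each item stores one value per facet.
ITEMS = [
    {"title": "Wedding envelope guide", "content_type": "Article", "format": "HTML", "year": 2010},
    {"title": "Choosing paper weight", "content_type": "Video", "format": "MP4", "year": 2009},
    {"title": "Recycled paper FAQ", "content_type": "Article", "format": "PDF", "year": 2010},
]


def filter_by_facets(items, **restrictions):
    """Return the items whose tags match every facet restriction supplied."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in restrictions.items())
    ]


def facet_counts(items, facet):
    """Count how many items carry each value of one facet."""
    return Counter(item[facet] for item in items if facet in item)


if __name__ == "__main__":
    # Narrow by two attribute facets at once.
    for item in filter_by_facets(ITEMS, content_type="Article", year=2010):
        print(item["title"])
    # Counts like these are what a faceted interface shows beside each value.
    print(facet_counts(ITEMS, "content_type"))
```

Because each restriction is an independent attribute, the query stays a simple conjunction no matter how many facets are combined, which is the reduction in permutations described above.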

An open source faceted navigation should allow implementation at a lower cost. Even with an open source solution like Drupal, which offers flexible options, it pays to invest some attention in understanding the taxonomy business process, because it will lead to a more efficient implementation and a more efficient back-end process.

The Return on Investment (ROI) justifications include not only user interface improvements (fewer clicks to the right content) but also programming cost efficiencies, such as simpler back-end queries. Validation work dovetails with the work of marketing and customer relations, so consider integrating taxonomy validation and governance into existing work processes. Some organizations roll taxonomy management into a knowledge management function that oversees the entire process, from organizing knowledge categories to managing content acquisition and monitoring.

Drupal’s development community has some very sophisticated features that will be available in the coming years, including ways to visualize and cluster linked data using RDFa. Developing faceted navigation and taxonomies is a great way to get ready for an exciting future of visually interesting interfaces that better help users find and share information in complex organizations.

Don’t let the simplicity of the Taxonomy Accordion fool you. Use the accordion as an opportunity to understand user needs and how users look for information, and to make the underlying production, tagging, and databases more efficient and more focused on user needs and high-quality information.

~ Marlene Rockmore

Filed under: Card Sorts, Open Source Solutions, Taxonomy Tools, Taxonomy Valdation

Skills of a Classy Taxonomist

At SemTech in June 2010, several speakers, including Professor Deb McGuinness, drew a very clear line between what a taxonomist does and what an ontologist does. Taxonomists build hierarchies, and ontologists determine classes or categories. In other words, ontologies are neat and unambiguous, and taxonomies are a bit messy.

Defining classes, or ontology work, typically precedes building the taxonomy. Defining the classes is like writing a specification for the taxonomy; in fact, defining classes is the same as defining facets. The goal of taxonomists and ontologists should be to define a specific, unambiguous description of a term that helps manage how we find and organize content so the pathways are clear and specific; adding an ontology ensures that the term is placed in the most specific categories, which helps ensure clarity and lack of ambiguity. I would argue that no taxonomy is useful unless it is faceted, that is, has been divided into classes. Taxonomies work best when their terms share homogeneous properties and when they are smaller and focused.

By using class analysis, or facet analysis,  several problems are solved:

1)       Clarify specific terms by situation or function: If I am interested in Java as a programming language, I want to see material related to Java as software, not as slang for coffee or as an island in Indonesia. If I am looking for “drill bits,” it might be important to understand whether the drill bits are for my home electric screwdriver or for an oil rig. Classes capture these distinctions and help to create precise, specific tagging and information retrieval.

2)       Ease long-term maintenance issues: Christine Connors points to a simple but common example, where taxonomies are built with people’s names included as narrow terms under a role, such as “Hillary Clinton” under “Secretary of State” or “Charles Windsor” under “Prince of Wales.” The problem is that when the people filling these roles change, there is a maintenance headache. A classy taxonomy recognizes that there is a separate class for <people> as an entity, as distinguished from <role>. <People> and <Role> can be connected by a predicate such as <isA>. These distinctions are necessary for fast-changing information (such as who is dating whom in an entertainment application, or who owns whom in a business application).

Abstraction: <person> <has> <role>

Instance: Hillary Clinton <is> Secretary of State
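
The maintenance benefit can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the triples are held in a plain in-memory set rather than an RDF store, and the `has_role` predicate, the function names, and the successor “A. Successor” are all hypothetical.

```python
# People and roles live in separate classes; only the link between them changes.
PEOPLE = {"Hillary Clinton", "A. Successor"}  # "A. Successor" is a made-up placeholder
ROLES = {"Secretary of State"}


def assign_role(triples, person, role):
    """Return a new set of triples in which `person` is the only holder of `role`."""
    kept = {t for t in triples if not (t[1] == "has_role" and t[2] == role)}
    return kept | {(person, "has_role", role)}


def holders_of(triples, role):
    """List everyone currently linked to `role`."""
    return sorted(s for s, p, o in triples if p == "has_role" and o == role)


if __name__ == "__main__":
    triples = assign_role(set(), "Hillary Clinton", "Secretary of State")
    print(holders_of(triples, "Secretary of State"))
    # When the office changes hands, one triple is swapped; neither class is edited.
    triples = assign_role(triples, "A. Successor", "Secretary of State")
    print(holders_of(triples, "Secretary of State"))
```

The point of the design is that the role hierarchy and the list of people never need restructuring when the office holder changes; only the connecting statement does.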

3)    Facilitate sharing and importing taxonomies: Having taxonomies that are specified by a class description means the taxonomy will be more homogeneous, have shared properties, and be more focused. This will make it easier to import with less cleanup and review. It will facilitate the use of SKOS, for example. Messy taxonomies are harder to merge.

Anyone working with semantic technologies will tell you that most problems in inference happen when hierarchies in source taxonomies create odd associations by inserting a narrow or broad term. A taxonomist needs to be attentive to inferences in order to prevent false statements. Professor Deb McGuinness calls this issue “truth maintenance.”

To keep these categories clear and distinct, ontologists rely on building a conceptual model, or a picture of the domain. Modeling strategies draw on skills most taxonomists already have: most taxonomists have been taught how to capture vocabulary and how to identify facets. Check out the earlier blog post Taxonomies and Modelling for more information.

Elaine Kendall of Sandpiper Software, maker of a concept-modelling tool, suggested that “one could build an ontology in 2 hours.” With a new generation of tools that can create RDF/OWL from data and content, this statement might be true.

    With good modelling tools that automatically generate RDF/OWL, such as those from TopQuadrant, taxonomists might be able to slide into the needed role of ontologist. To extend their skills, taxonomists need to understand some basic concepts in RDF/OWL: what a class is, what a property is, what a slot facet is, what class inheritance is, what is meant by reciprocal and inverse properties, and how to write a SPARQL query. More importantly, a classy taxonomist can become a facilitator who helps build bridges between user and development communities and helps diagnose and prevent technical problems.

    A taxonomist who is trained in ontologies should bring the following skills:

    • Create processes to identify the requirements for each class
    • Develop metrics to assess good results
    • Identify what vocabularies are needed, evaluate existing vocabularies, and import and adapt these vocabularies to current needs
    • Ensure the integrity and focus of vocabularies, particularly when sourced from an outside vendor
    • Develop processes to keep vocabularies current, and understand how to use metrics to “measure and improve” any vocabulary
    • Work as part of the development team to help identify whether a source vocabulary might contribute to false inferences

    The taxonomist works with different user communities as well as developers and helps bridge the gap between what users and experts know and what is needed to build a useful application. A classy taxonomist has a well-rounded set of skills and can work with development teams and user organizations to build intelligent systems.

    Filed under: Ontologies, Ontology Development, Planning a Taxonomy, SemTech2010, Topbraid Composer

    Understanding Associative Relationships

    Taxonomies are collections of facets that consist of terms that are described and made unique through their relationships to each other. The common relationships are equivalency, hierarchical, and associative (related) relationships. Of these, the least understood and the least used is the associative relationship. Associative relationships are sometimes also called related terms, See Also relationships, or syndetic (cross-reference) structures.

    One of the reasons that associative relationships (related terms) are the least used is the confusion about how they are implemented. Hierarchies move us up and down a category of information whose members share common properties. Associative relationships help point us to other aspects of a topic.

    Associative relationships can help sort topics into clear categories. This creates a simplicity that helps both programmers and users. For example, a taxonomy might be about paper products. If I organize one giant enterprise taxonomy with a single hierarchy, types of paper products from envelopes to toilet paper will be mashed in with other topics such as paper weight, composition, manufacturers, or the activities for which a product is used.

    Instead, by sorting terms into facets and using associative relationships, you create a clearer graphical mapping of these concepts. In the example below, a taxonomy has been sorted into multiple facets. Each facet has a parent/child relationship, which can also be called the abstraction and the instance, as in the illustration below:

    Middle Level or Abstraction | Products  | Manufacturers | Composite                     | Measured By
    Lower Level: Instance       | Envelopes | Canson, Tyvek | Vellum, 100% recycled content | Standard Sizes, Custom Sizes

    The associative relationship can be used to connect the dots between the columns in the table above. An associative relationship can be displayed as a See Also, as a Related Term (RT) as in ANSI-standard thesaurus relationships, or as a semantic relationship (predicate) as in a semantic triple.

    Paper Products
        See Also: Manufacturers and Distributors, 100% Recycled Content, Custom Products

    Envelopes
        See Also: Canson, Tyvek, Vellum, Standard Sizes

    Or by a thesaurus relationship, as in:

    Paper Products
        Narrow Term (NT): Envelopes
        Related Term (RT): Manufacturers, Composite, Custom Sizes

    Or by custom semantic relationships, which could replace the See Also or RT:

    Paper Products
        <MadeBy> Manufacturers
        <ComposedOf> Composite
        <MeasuredBy> Standard Sizes, Custom Sizes

    This design can also be represented as simpler controlled vocabularies that are stored in fairly flat files. If a file is hierarchical, the hierarchy would be at most 2-3 levels deep. In the following example, the top-level abstraction becomes the name of the facet, with specific values for each facet as follows:

    Facet1: Paper Products: Office Paper, Envelopes
    Facet2: Manufacturers & Distributors: Weyerhaeuser, Tyvek, Canson, ACME, Dunder-Mifflin
    Facet3: Composite: Vellum, 100% Recycled Content
    Facet4: Standard Size:
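
As a rough sketch of how flat facet files and facet-to-facet predicates might fit together, the Python below stores each facet as a flat list and the associative relationships as links between facets. The dictionary layout and the function names `facet_of` and `see_also` are assumptions for illustration, not a description of any particular taxonomy tool; the values for Standard Size are left empty, as in the listing above.

```python
# Each facet is a flat controlled vocabulary.
FACETS = {
    "Paper Products": ["Office Paper", "Envelopes"],
    "Manufacturers & Distributors": ["Weyerhaeuser", "Tyvek", "Canson", "ACME", "Dunder-Mifflin"],
    "Composite": ["Vellum", "100% Recycled Content"],
    "Standard Size": [],  # no values are listed in the post
}

# Associative relationships defined once, between facets:
# (subject facet, predicate, object facet).
LINKS = [
    ("Paper Products", "MadeBy", "Manufacturers & Distributors"),
    ("Paper Products", "ComposedOf", "Composite"),
    ("Paper Products", "MeasuredBy", "Standard Size"),
]


def facet_of(term):
    """Return the facet a term belongs to, or None if it is not in any vocabulary."""
    for facet, values in FACETS.items():
        if term in values:
            return facet
    return None


def see_also(term):
    """Map each predicate to the candidate terms in the facet it points to."""
    facet = facet_of(term)
    return {pred: FACETS[obj] for subj, pred, obj in LINKS if subj == facet}


if __name__ == "__main__":
    for predicate, terms in see_also("Envelopes").items():
        print(predicate, terms)
```

Adding a new facet means adding one vocabulary and one link; the existing vocabularies and the tagged content are untouched.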

    The benefits of using associative relationships to create the connections, or “predicates,” between facets are multiple, but some of the obvious ones are:

    Processing Improvement: Each facet is mutually exclusive (orthogonal) and can be recombined as needed with other facets.

    Extensibility: Since the design is based on a model, you can add new facets such as content type (blogs, videos), audience or customer, events (such as weddings or business), or additional attributes such as color.

    Ease of Maintenance: It simplifies long-term maintenance because you no longer have to dive into and sort through complex hierarchies to find related concepts.

    Discover Information: By sorting these terms into facets and using associative relationships, it becomes easier to browse through an information space.

    The very best part of this approach is that you can change the taxonomy model without changing the content, and you can update the taxonomy with new terms, such as processes, and it is updated everywhere. There is one important caveat in this approach. My data shows that this predefined modeling works 80% of the time, but for about 9% of use cases you will run into glitches because of ambiguous or co-occurring single terms that might appear in multiple facets. These are words such as “treatment,” “process,” or “evaluation,” which, as single terms, are somewhat vague. When combined in compound phrases, such as “Water-processing” or “Chemically-treated,” the meaning is clearer.

    In my practice, I typically identify these vague guide or hub terms, such as “process,” “treatment,” “management,” “evaluation,” and so forth, and sort them into a separate facet. I can then use an associative relationship to point to the more specific compound terms (which are in their appropriate facets). When these terms are sorted into the right facet, the meaning becomes even clearer. This sorting of terms into facets and linking of facets with relationships, including semantic, RDF-like relationships as shown above, can be used to create more specific information spaces. The string that results from this work is more precise than a term that stands alone with no associative relationships.

    By using associative relationships, you can build a taxonomy where it is easier to discover related facets of information and to combine attributes to refine searches. For example, if I am looking for information about envelopes for a wedding invitation, associative relationships might also help me find information about wedding planners or wedding destinations. I can find videos on how to properly create an invitation.

    Of course, that assumes that you define taxonomies as “collections of facets, classes or graphs that create unambiguous terms.” It is also easier to add new facets if needed and to link in new data using associative relationships. Note: I cannot do any of the above UNLESS I have built predefined links between facets in the background model through associative relationships. It is very hard to recognize these known relationships “on the fly.”

    Associative relationships are the least understood, but when you take the time to learn how to create them, they become one of the best arguments for taking the time to build a faceted taxonomy and for adopting data modeling techniques as part of taxonomy development. If you are designing associative relationships for online systems, whether for managing content, e-commerce, or search, you might want to be attentive to how you create them, because there are implications for user navigation and for machine-to-machine processing.

    ~ Marlene Rockmore

    NOTE: In a future post, I’ll put up a reference table to show the relationship between kinds of associative relationships (related terms) and semantic labels for predicates in a “triple.”

    Filed under: Associative Relationships, Conceptual Modeling

    The Right Prescription for a Crowd-source Experiment

    My last post was an experiment in using remote online card sorting as a way to build a taxonomy. And why start small? My sample data was the picklist used on www.medicare.gov when you search on “What does Medicare Cover?” For my experiment, I used websort.net as the remote card sorting tool.

    First, let’s start with the good news. Online tools are basically a very cool way to bring together remote groups that would be too expensive or politically impossible to connect otherwise. That’s the promise.

    But a successful remote card sort requires preliminary planning and work. Here are my lessons learned:

    • Keep the test under 20 minutes: Online card sorting is a time-consuming task for the participant, so for the experiment to be successful, you need to make sure that participants have the time and that the number of terms to be sorted is not overwhelming. Joseph Busch of Taxonomy Strategies and Dave Cooksey of saturdave.com suggest 20 minutes and 25 terms at most. My comprehensive test of all 132 picklist terms from the Medicare site was too big.
    • Pretest the taxonomy: Since the card-sorting activity is a one-time opportunity to engage testers, some prior testing of the taxonomy should occur. Remote card sorting is better for a closed experiment, where a taxonomy has already been designed, than for an open card sort, where the goal is to discover categories and facets. The best practice recommendation is to run some prior tests of the taxonomy before the online experiment. Have a trusted expert do the test, and then throw away obvious problems. If the pre-test doesn’t go well, try again. Testers in an online setting have a low tolerance for obvious problems, so the test needs to be about validating a good design.
    • Choose online tools carefully: The tool I used, websort.net, had a major problem. It only allowed a term to be classified under one and only one category. This proved frustrating to users. For example, users wanted to classify durable medical equipment under the category for Equipment but also under the category for the Disease or Chronic Condition. Dave Cooksey, who tracks tools, says remote tools are improving all the time, so evaluate tools and choose wisely.
    • Be sure to thank the participants: We all feel manipulated by many of the group activities we attend in the face-to-face world, and that can happen in the remote world as well. Being authentic and courteous is important. Provide a thank you and be sure to share results or feedback. If possible, consider some kind of compensation such as a gift card.

    So given that a test that seems so simple on the surface requires work to set up, what is the value of this work? The purpose of a taxonomy is to determine top-level facets that can be used to organize and search for information. If we look at a topic like Medicare, we know that we have a national problem determining standards for insurance policies. It is difficult to compare policies, and it is also time-consuming to manage the costs.

    In designing good remote crowdsourced card sorting tests, Dave and Joseph have the following recommendations:

    • Pay attention to the sample size
    • Recruit carefully to be sure the sample has balance of perspectives
    • Run tests prior to online activity. Have experts try the test.
    • Remember that the goal of a taxonomy test is to find the higher-level categories that overlap between technical expertise and general understanding.
    • The result is a better analysis of shared group understanding – shared mental models of how we collectively categorize concepts,  not individual understanding

    In the scheme of a trillion-dollar problem like health care, a project to set up well-designed remote card sorts that can compare how different user groups sort fundamental Medicare concepts seems like a small investment. A well-run test with good recruitment could be a very good way to jumpstart better designs of websites such as Medicare that deliver clearer information about benefits and choices.

    Filed under: Card Sorts, Taxonomy Valdation, User-Centered Design, Websort.net

    Using Taxonomies to Sort through Health Care Reform

    I am very interested in the health care reform debate, so I wanted to know what a public option might look like. I was told by my sources that a robust public option might look a bit like Medicare. So off I went to the Medicare.gov website to find out what was covered. In the middle of the home page, in the second column, there is a link to “Find Out What is Covered,” which leads to an advanced search criteria page. The search page includes a picklist of about 143 topics, just the right size for a sample set of candidate terms for a card sort.

    This month, I am offering a small interactive experiment in online card sorting. Taxonomies are collections of facets, which are created by organizing concepts into categories. Card sorting is one of the best ways to identify categories: groups of users create categories in controlled tests, and the categories are validated through repeated tests until there is a consensus. In health care reform, taxonomies might be useful to help create consumer-friendly interfaces for searching across the national insurance exchanges.

    A card sort method uses the following steps:

    • Collect a sample set of candidate concepts
    • Group or cluster terms into categories
    • Refine the design iteratively until there is a set of facets, groups of categories that have similar properties
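
As a rough illustration of the grouping step, here is a minimal Python sketch that tallies how often participants placed the same two cards in one category. The card names, category names, and the pair_agreement helper are all invented for this example; they are not Medicare picklist terms or Websort.net output.

```python
from collections import Counter
from itertools import combinations

# Hypothetical results: each participant maps a category name to the cards placed in it.
# Card and category names are invented for illustration.
sorts = [
    {"Equipment": ["Wheelchairs", "Oxygen"], "Services": ["Ambulance", "Hospice"]},
    {"Equipment": ["Wheelchairs", "Oxygen", "Ambulance"], "Care": ["Hospice"]},
    {"Supplies": ["Wheelchairs", "Oxygen"], "Services": ["Ambulance", "Hospice"]},
]

def pair_agreement(sorts):
    """Return the share of participants who grouped each pair of cards together."""
    counts = Counter()
    for sort in sorts:
        for cards in sort.values():
            # Sort the cards so each pair is always counted under the same key.
            for pair in combinations(sorted(cards), 2):
                counts[pair] += 1
    return {pair: n / len(sorts) for pair, n in counts.items()}

agreement = pair_agreement(sorts)
for pair, share in sorted(agreement.items(), key=lambda kv: -kv[1]):
    print(pair, round(share, 2))
```

Pairs with a high share are candidates for the same category even when participants gave that category different names; pairs with a low share point to terms that need another round of testing.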

    I’ve put 130+ topics from Medicare into an online card sorting tool called Websort.net. The topics have not been formatted or massaged; they are just as they appear in the Medicare search picklist. Websort.net suggests that I use a closed card sort, where participants sort terms into predetermined categories. So to get started, I’ve come up with about 20 starter categories. Some of these categories will become subtopics in a faceted design.

    The experiment is open to the first 10 participants who want to take the time to try this task.   To try the card sort, link to

    http://websort.net/s/80CDD6/

    Please feel free to assign terms to multiple categories or to suggest other categories.

    Last month, Joseph Busch blogged about the judicious use of online web sorting tools, arguing that they may not be the most cost-effective way to build taxonomies. One of his arguments is that the sample set of users will not be random. That’s true. This blog has a small readership with an interest in taxonomies and probably a consumer’s interest in health care reform. Let me know what you think of websort.net.

    This little experiment could help demonstrate some bigger observations. Government may be looking to advanced high-volume technologies such as clustering or semantic technologies to identify categories and to map claims data. Perhaps one of the applications will be to build interfaces that will help consumers search across the national exchanges. But at the core of these technologies, there will be a need for well-designed taxonomies to help analyze text and build better interfaces to access health care information.

    A well-designed taxonomy with facets and linking relationships can

    • Group information into useful categories
    • Identify gaps in coverage
    • Help point to important related information

    Let’s find out if taxonomy design can help us sort through health care reform.

    Thanks to Andy Oram and the Sunlight Foundation for introducing me to this tool and to Dave Cooksey who is virtually updating my card-sorting skills.

    Filed under: Card Sorts, Conceptual Modeling, Health Care, User-Centered Design, Websort.net

    What’s wrong with crowdsourcing the design of public websites?

    A blog post from Sunlight Labs on “Redesigning the FCC: Getting Organized” suggests an experiment that employs a public card-sorting program, websort.net, to help redesign the Federal Communications Commission (FCC) website.  The FCC has a notoriously convoluted web site, hard to navigate and hard to search.  Sunlight Labs invites anyone interested in helping the FCC to this open card-sorting activity, which organizes about 60 terms into categories related to the FCC. But is a public web sort the right approach to redesigning a government website?

    Should we crowdsource the design of a public website?

    Here are some considerations:

    • First, the success of any design process depends on who sits at the table. Site designers have not succeeded over the years by roping in anyone who happens to be around. Rather, carefully identifying the right participants for any design activity is very important. Engaging busy professionals and bureaucrats in order to derive the maximum impact with the minimum effort is a tricky business. One of the most cutting critiques of the Wikipedia has been that the editorial perspective is overwhelmingly white-male twenty-something—not necessarily the authority of choice for everyone else.
    • Second, open processes tend to be very time-consuming, which works in your favor for some kinds of crowdsourcing but not for selecting terms and categories. Unless the sample is large and controlled, the emerging pattern from crowdsourced card sorting may not be helpful because experts with limited time will be overrun by people with lots of time and a fast hand on the keyboard, no matter how much or how little they know. Some types of crowdsourcing (such as prediction markets) work because the errors of ignorant participants cancel each other out and allow the experts to win out—but card sorting is entirely different and results in just chaos.
    • Third, it would be much quicker for the FCC to suggest a model for organizing its content based on its expertise than to crowdsource the design. There are standard ways to organize things, including website content, which people can learn even if they are not entirely natural. We learn about brand, price, size, color, material, and fit because they help us find the stuff we want to buy, not necessarily because there is a shopping gene in our DNA.
    • Fourth, the users of these sites, such as broadcasters, regulators, website publishers, and ordinary people, are not always interested in the same things. The FCC will have to comply with legislative and executive branch imperatives that may be of little interest to many people in the crowd.

    A better way to approach website design and redesign focuses on the backend nomenclature—buckets and categories, which are called facets and vocabularies. These form the basis of a useful taxonomy.

    So when can crowd-sourcing be used effectively? If the FCC engaged in the process of designing facets and vocabularies, the crowd could be useful as a follow-up. First, it can be helpful in validating a design. After all, the test of a taxonomy is whether it helps people find information. One of the appropriate roles for crowd sourcing in taxonomy is to observe how the users access a collection of items over time, the searches they use, and the click paths they follow. The taxonomy can then be tuned based on how the activity distributes among the categories—splitting and merging categories as warranted.

    Another place for crowdsourcing is to allow users to add free-text “tags” to the content. Those tags can then be evaluated to either map them to existing taxonomy categories, or to suggest changes to the taxonomy. In this case the crowd and the taxonomy work together in synergy. Users typically add a tag to only a fraction of the pages, so in most cases these terms will be synonyms or equivalents to existing categories.

    Finally, a card-sorting exercise can be useful after the field is carefully constrained by the experts who know the site. The true test of any card-sorting activity is whether people can actually find what they are looking for afterward. Mapping a tag as a synonym of an existing taxonomy category effectively applies that tag to all the content already in that taxonomy category. This synergy is one method that can help improve access to information.

    Here are several techniques that are intuitive and natural for people to use with little or no training, allowing them to validate a taxonomy. These techniques are much faster than open card sorts, and provide results that are easier to interpret.

    • Classifying some content
    • Conducting walk-throughs
    • Closed card sorting

    Classifying some content

    In this exercise, people are presented with a representative subset of content from the site and are asked to tag it. You can select it randomly or try to include examples of the site’s primary content types, as well as content you think may be hard to tag, find, or use. Plotting the number of items tagged into each taxonomy category, you should expect to see 80% of the content fall into 20% of the categories.
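
A quick way to check the 80/20 expectation is to count items per category and see how much of the content the busiest fifth of the categories holds. The sketch below assumes one category label per tagged item; the labels and the coverage_of_top_categories helper are made up for illustration.

```python
from collections import Counter

def coverage_of_top_categories(tags, top_share=0.2):
    """Fraction of tagged items that fall into the busiest `top_share` of categories.

    Only categories that received at least one item are counted, so unused
    categories in the taxonomy are ignored here.
    """
    counts = Counter(tags)
    ranked = sorted(counts.values(), reverse=True)
    top_n = max(1, round(len(ranked) * top_share))
    return sum(ranked[:top_n]) / sum(ranked)

# Hypothetical tagging results: one category label per tagged item.
tags = ["Licensing"] * 16 + ["Complaints", "Auctions", "Maps", "Jobs"]
share = coverage_of_top_categories(tags)
print(f"Top 20% of categories hold {share:.0%} of the content")
```

A share far below 80% may mean the categories are too evenly fine-grained; a share near 100% may mean one bucket is doing all the work and should be split.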

    Conducting Taxonomy Walk-Throughs

    One-on-one and group presentations that walk stakeholders through the taxonomy, showing and explaining it, are an effective way to extract specific comments and sometimes overall approval. During walk-throughs, standard questions should be asked about the category structure, as well as about problematic categories, to gather feedback on the taxonomy. Delphi walk-throughs are done using a stack of cards. It is not a set of raw terms, however, as in the FCC exercise. Instead, the cards are already marked with categories chosen by the experts. Reviewers are asked to mark changes to the category labels on the cards. Each subsequent reviewer is given their walk-through using the cards with the label mark-up from the previous session. The process usually stabilizes after a few sessions, indicating that the categories are appropriate. According to Dave Cooksey, Founder and Principal of saturdave, 20 sessions will usually result in a consensus taxonomy revision, and this method provides results without any further analysis.

    Closed Card Sorting

    Closed card sorting, where categories are in predefined buckets, can be used to test whether stakeholders and end users consistently sort categories into the correct taxonomy facets. The categories to test should be a set of important topics, such as the most frequently searched words and phrases from the search engine logs. The test can be done using actual cards, or using a simple grid with categories to be tested down the left column and the taxonomy facets across the top. Paper card sorts work well enough for up to 20 trials.
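
The grid described above can be tallied with a few lines of Python. This is only a sketch: the participants, terms, and facets are invented, and build_grid and consistency are hypothetical helper names, not part of any card sorting tool.

```python
from collections import Counter, defaultdict

# Hypothetical closed card sort: (participant, term, facet chosen).
# Terms and facets are invented for illustration.
placements = [
    ("p1", "license renewal", "Licensing"),
    ("p2", "license renewal", "Licensing"),
    ("p3", "license renewal", "Licensing"),
    ("p1", "file a complaint", "Consumers"),
    ("p2", "file a complaint", "Enforcement"),
    ("p3", "file a complaint", "Consumers"),
]

def build_grid(placements):
    """Rows are terms, columns are facets, cells count participants."""
    grid = defaultdict(Counter)
    for _participant, term, facet in placements:
        grid[term][facet] += 1
    return grid

def consistency(grid):
    """For each term, the most popular facet and the share of participants who chose it."""
    result = {}
    for term, facets in grid.items():
        facet, votes = facets.most_common(1)[0]
        result[term] = (facet, votes / sum(facets.values()))
    return result

scores = consistency(build_grid(placements))
for term, (facet, share) in scores.items():
    print(f"{term}: {facet} ({share:.0%})")
```

Terms with a low share are the ones participants could not place consistently, which is a signal that the facet labels or boundaries need another look.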

    Websort.net is a good tool when you need a larger, distributed closed card sort test. If users can’t map terms to the categories, the designers will know that they have to adjust their design. But our experience shows that pre-analysis captures about 80% of the common categories and use cases. Sunlight Labs has undertaken a commendable task in seeking to improve the FCC web site’s layout. By carrying out a card sort too quickly, they’ll just get their signals crossed. Performing some professional taxonomy work first will channel public efforts in the right direction.

    Submitted by - Joseph A. Busch, Founder and Principal, Taxonomy Strategies,  Sept  8, 2009

    Filed under: Conceptual Modeling, FCC, Joseph Busch, Taxonomy Validation, User-Centered Design, Websort.net

    5 Types of Taxonomies: From Lists to Ontologies

    Taxonomy, strictly defined, is a hierarchical arrangement of terms, but the form of a taxonomy depends on the information problem at hand. After all, taxonomy is a method for organizing knowledge or concepts, which requires flexibility in how to capture and represent concepts. The complexity depends on factors such as the core area of information for the application, the users’ vocabulary, the size of the content collection and how much specificity is needed, how the content will be tagged or indexed, and how result sets will be displayed and refined. The taxonomy is not an end, but a means to help users navigate information, find out what is in the collection, and get to meaningful results. And most of all, the taxonomy needs to provide clear, unambiguous access to information.

    Here’s a primer on the basic ways to organize concepts:

    Form 1: Lists (picklists, authority lists or controlled vocabularies)

    Good Ol’ Picklists ensure that a specific term is used when creating or searching content. A picklist is really a list of lead or preferred terms such as geographical names and/or other proper names, including names of people, organizations or projects. Certainly not many of us can properly spell the name of the current Iranian President (Ahmadinejad), so it makes sense to pick that off a pre-defined list. The problem with picklists is that they are often buried in applications instead of sitting right there on the home page as a search aid, or the design is so tied to the relational database design that you have to drill down multiple levels to get to a reasonable query. That means misery for the searcher as well as the database programmer.

    Many excellent databases have transformed their picklists and controlled vocabularies into picklists that can be searched from the home page. For a great example of a taxonomy as picklist, look at ProQuest or Cars.com. These content sources manage their picklists as taxonomies, but each taxonomy is a clearly defined list of terms. Combine 2 or more of these lists and voila! You now have a faceted taxonomy where the user can browse your content from the homepage. (Of course, you need powerful content management software as well, but that’s another story.)

    Form 2: Synonym Lists

    Synonyms are a wonderful use of taxonomies: they are easy to track in taxonomy tools and spreadsheets, but surprisingly difficult to implement in the user interface. If the search box is used, you will need a taxonomy rich in synonyms so that users don’t have to worry about the preferred form of a term or even a misspelling. Why do synonyms matter? First, they can be used to track words that mean the same thing, such as “car” and “auto” or “automobile.” For example, the environmental movement prefers that the term “Climate Change” be used instead of “Global Warming.” The use of synonyms allows one concept to be treated as equivalent to the other, but still allows one term to be preferred over another.
    But synonyms can be used to assist search in other ways. Synonyms can be used to:
    • handle alternate spellings, such as British versus American spellings of terms like “organization” vs. “organisation”
    • catch search misspellings of proper names or even common words (Honestly, how do you spell “broccoli?”)
    • allow alternate versions of proper names, such as Hillary Clinton for Hillary Rodham Clinton
    • create variations on concepts or phrases, such as allowing “Current Iranian President” to be used as a variant for Ahmadinejad, which is a name that few of us can spell correctly
    In other words, have a liberal and generous policy about what’s a synonym, but be sure to test your application, because ambiguities will always produce some unexpected false results. Adding synonyms to search is surprisingly challenging to implement. That’s why synonym-based search systems are often paired with rules-based autocategorization systems such as Teragram, but that’s a topic for another blog. On the other hand, as terms evolve, adding synonyms to a taxonomy is a quick way to improve access without changing the database.
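
In its simplest form, a synonym list is a lookup from each variant to one preferred term. The Python sketch below uses the examples from this section; the SYNONYMS table and the preferred_term function are hypothetical names for illustration, not the way any particular search product implements synonyms.

```python
# Hypothetical synonym table: each variant points to one preferred term.
SYNONYMS = {
    "auto": "car",
    "automobile": "car",
    "global warming": "climate change",
    "organisation": "organization",
    "brocolli": "broccoli",  # a common misspelling
    "hillary clinton": "hillary rodham clinton",
}

def preferred_term(query):
    """Normalize case and spacing, then map the query to its preferred term if one is defined."""
    key = " ".join(query.lower().split())
    return SYNONYMS.get(key, key)

print(preferred_term("Automobile"))
print(preferred_term("  Global   Warming "))
```

Because the table lives outside the content database, adding a new variant is a one-line change, which is the "quick way to improve access" mentioned above. The ambiguity warning applies too: a variant can only point to one preferred term in this design.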

    Form 3: Hierarchies

    Taxonomies are used to create the familiar drill-down type of interfaces. Traditionally viewed as tree structures, hierarchies capture the following types of interrelationships:
    • Parent/child
    • Broad Term/Narrow term
    • Is a part of
    • Is a type of
    The biggest mistake made in creating hierarchies is associating items that do not have an inherent hierarchical relationship. If you are creating a hierarchy with two terms that do not fit one of the relationships above, you may be better off building two separate lists (or facets); that’s the start of a faceted navigation. You have to conduct a logical “sniff test” when constructing hierarchies. Let’s go back to our government example. Let’s say you are building a simplistic hierarchy with United States as a term and Hillary Clinton as a narrow term. But Hillary Clinton is not actually a child of, part of, or type of United States. That is not a true hierarchical relationship. In that case, wouldn’t it be better to think about modeling how users look for information about government and then structure separate taxonomies: Countries, Leadership, and perhaps a third facet for Forms of Government? So the best rule in creating hierarchies is to make sure each hierarchy is about the same category of knowledge. Taxonomy geeks call this “orthogonal.” Better for the programmer and better for the end user.
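
The “sniff test” can be sketched as a simple check on the relationship label attached to each link. In this Python example the relationship names follow the list above, while the sample links and the sniff_test function are invented for illustration.

```python
# Relationship types taken from the list above; the sample links are invented.
HIERARCHICAL = {"parent/child", "broad/narrow", "part of", "type of"}

links = [
    ("Massachusetts", "part of", "United States"),
    ("Sedan", "type of", "Car"),
    ("Hillary Clinton", "associated with", "United States"),
]

def sniff_test(links):
    """Split links into true hierarchical ones and ones that belong in a separate facet."""
    keep, move = [], []
    for child, relation, parent in links:
        target = keep if relation in HIERARCHICAL else move
        target.append((child, relation, parent))
    return keep, move

keep, move = sniff_test(links)
print("Keep in hierarchy:", keep)
print("Move to another facet:", move)
```

The check is only as good as the labels, so the real work is still a person deciding honestly which relationship applies to each pair of terms.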

    Form 4: Faceted Navigation and Thesauri

    If you stepped through the process above, then look how fast and easily you moved from Lists (authority or otherwise) to Hierarchies to Faceted Navigation. A faceted navigation is basically a set of hierarchical taxonomies that have been normalized and categorized so that terms do not cross categories. A thesaurus is a fully fleshed-out taxonomy where all the synonyms and hierarchies within a category are labeled. Thesauri also allow related terms. Related terms, or associative relations, are links between categories of terms. Faceted design has several advantages:
    • By having a faceted structure, you can begin the process of disambiguating terms. For example, if my application is about House and Garden Design, where the term “green” is common, I might have Green Building Products under the Products category, while “green” as a color would be in the Color and Decorating category. By categorizing terms under the appropriate facet, the term is now unique based on the meaning in context. Thus, there are now two distinct, disambiguated topics.
    • Take the government example we are building above, where we have taxonomies for countries, leaders and government structure. By creating a top level of facets or categories, you are now building a model of the domain. If a country changes leaders, or forms of government, I can change that concept without reindexing or relinking my entire application. And now I have the added benefit of having a framework to build user interfaces that might be easier to navigate because I have designed a better conceptual framework. I am also now well on my way to designing an ontology. How? Read on:
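
To make the “green” example concrete, here is a small Python sketch of facet-based filtering. The catalog items, the facet names Products and Color, and the filter_by_facets helper are all assumptions made up for this illustration.

```python
# Hypothetical catalog for the House and Garden example; items and values are invented.
items = [
    {"title": "Recycled glass tile", "Products": "Green Building Products", "Color": "Blue"},
    {"title": "Sage wall paint", "Products": "Paint", "Color": "Green"},
    {"title": "Bamboo flooring", "Products": "Green Building Products", "Color": "Natural"},
]

def filter_by_facets(items, **selected):
    """Keep items whose facet values match every selected facet."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]

green_products = filter_by_facets(items, Products="Green Building Products")
green_colored = filter_by_facets(items, Color="Green")
print([item["title"] for item in green_products])
print([item["title"] for item in green_colored])
```

Because each facet is its own field, “green” the product type and “green” the color never collide, and a value in one facet can change without touching the others.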

    Form 5: Ontologies

    An ontology is basically a faceted taxonomy where all the ambiguities have been resolved and where all the concepts have been described as completely as possible. The other feature of an ontology is the potential use of links, or RDFa as a language, to describe the links between the categories and terms. Now there is one more step in my progression from lists to ontologies: creating links between categories. For example, we know that Countries have Governments, and that Governments have Leaders. These phrases are called triples: a subject and an object linked by a predicate. While many of the issues about how to implement ontologies are still cooking, so to speak, it is worth thinking through how to implement an ontology. After all, good information access is about clarifying questions and resolving ambiguity. The downside of ontologies is the inferencing. For example, if you look at a Friend-of-a-Friend (FOAF) application, we all know people with diverse interests and beliefs, but those are not our beliefs. This type of syllogistic inferencing might have unintended negative consequences, so be judicious.
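
Triples are easy to sketch as plain tuples. The Python example below stores the two statements from this section and then follows the “has” predicate from one link to the next, which is a toy version of the inferencing discussed above. It is not RDF or RDFa syntax, and objects_of and reachable are hypothetical helper names.

```python
# Triples as (subject, predicate, object); the statements follow the example above.
triples = [
    ("Country", "has", "Government"),
    ("Government", "has", "Leader"),
]

def objects_of(subject, predicate, triples):
    """Direct lookup: every object linked to the subject by the predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def reachable(subject, predicate, triples):
    """Follow a predicate from link to link, collecting everything reached."""
    found, frontier = [], [subject]
    while frontier:
        current = frontier.pop()
        for obj in objects_of(current, predicate, triples):
            if obj not in found:
                found.append(obj)
                frontier.append(obj)
    return found

print(objects_of("Country", "has", triples))
print(reachable("Country", "has", triples))
```

Chaining works well enough for “has,” but the same loop applied to a predicate like “knows” would happily conclude that a friend of a friend is your friend too, which is exactly why the caution about inferencing matters.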

    Take a renewed look at those picklists, and start to see the connections between those terms. You’ll be on the first step towards styling your taxonomies and building unambiguous, powerful ontologies.

    ~ Marlene Rockmore

    Filed under: Conceptual Modeling, How To

    Extreme Picklist Makeover

    Last winter, the side airbags in my car deployed for no apparent reason. What does this have to do with taxonomy? Well, the subsequent struggle with both the insurance company and the car manufacturer sent me scrambling to the National Highway Traffic Safety Administration (NHTSA) database (www.safercar.gov) to research spontaneous deployments of side curtain airbags when there was no visible damage to wheels, tires or undercarriage.

    First, I love government information. Just today I used the U.S. Geological Survey and checked information at the Bureau of Labor Statistics, but the US Government has to learn how to make over its picklists and 1.0 databases into an information architecture with usable taxonomies. These ugly ducklings need to become swans.

    NHTSA defects and recalls

    Here’s the problem. In a traditional database, every record has to be unique to avoid redundancy, so when multiple reports are filed, all reports are tied back to the original record. Unfortunately, what happens is that the end user, who is searching for information in a desperate moment of need such as after an accident, has to find that original record. The record I needed, which described a research report about 498 similar complaints, was filed in 2006, but it was filed under the original complaint (different year and model), which was a record created in 2003. To find the record that contained a research report filed 3 years after the original complaint, I had to use a year that was prior to the manufacture of my car, and I was unable to search by the specific component failure as a keyword or phrase. I found the record by using a citation from a Google search where I found a news team investigation of a similar event in a different model. Even with the citation, I had to drill through multiple layers, four queries deep, to find the original record, and I was unable to search by any keywords or topics.

    How would taxonomy have helped? A taxonomy would have helped in 2 key ways. First, content management using a taxonomy provides multiple access points related to the same set of topics and issues. A faceted taxonomy would have provided a useful user interface that would have allowed me to alter my search strategy. Searching by model under the existing database design doomed my search to failure because the record I needed was filed under a different model and a different year. Second, the database would have been designed to consider multiple access points to content without sacrificing the benefits of relational database design. It would have simplified the query programming logic, but still allowed an efficient database design.   A good taxonomy design would make it easier to add new facets or terms as technology evolves to search across topics such as environmental issues and engine efficiency.
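
Here is a minimal Python sketch of what “multiple access points” could look like: each record is stored once, but it is indexed under every facet value it carries, so a search can start from the component instead of the model and year. The records, field names, and helper functions are invented for illustration and do not reflect the actual NHTSA schema.

```python
from collections import defaultdict

# Hypothetical complaint records; the field values are invented for illustration.
records = [
    {"id": 1, "model": "Model A", "year": 2003, "component": "side airbag"},
    {"id": 2, "model": "Model B", "year": 2006, "component": "side airbag"},
    {"id": 3, "model": "Model B", "year": 2006, "component": "brakes"},
]

def build_facet_index(records, facets):
    """Index every record id under each (facet, value) pair it carries."""
    index = defaultdict(set)
    for record in records:
        for facet in facets:
            index[(facet, record[facet])].add(record["id"])
    return index

def search(index, **criteria):
    """Intersect the id sets for the chosen facet values."""
    sets = [index.get((facet, value), set()) for facet, value in criteria.items()]
    return set.intersection(*sets) if sets else set()

index = build_facet_index(records, ["model", "year", "component"])
print(search(index, component="side airbag"))
print(search(index, model="Model B", component="side airbag"))
```

Searching by component alone surfaces the related record filed under a different model and year, which is the search that the existing design made impossible.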

    A quick 2-level redesign of the NHTSA interface might aid searching through a simpler page navigation such as:

    Vehicle Safety by type

    • Auto Safety
    • Bicycle Safety
    • Motorcycle Safety
    • Light Trucks
    • OffRoad
    • Tractor/Trailer

    Driver and Occupant Safety

    • Child safety, car seats and restraints
    • Teen drivers
    • Older Population
    • Population under 5’5”

    Traffic Safety

    • Data by state
    • Pedestrian Safety
    • School Transportation Safety

    Recalls, Defects, and Complaints

    • By manufacturer/model
    • By component

    New Technologies

    • Fuel efficiency

    Recent studies

    • Press Room
    • Fact Sheets

    Redesigning picklists into taxonomies is not a difficult task for trained taxonomists, and projects can be very cost-effective even in a tough economy. In my case, my search led to thousands of dollars of savings in insurance expenses. In other cases, getting good information quickly will help save lives. The hard part is determining in advance which categories will be captured in the taxonomy and how databases will be searched by end users, but that’s why there are taxonomists who can do usability studies and research existing metadata such as insurance reports and consumer safety databases. The taxonomy can also be used to reindex databases through tools that support entity extraction, where the taxonomy can be used to find synonymous terms.

    After a weekend searching the NHTSA database, I was almost as eager to call the US Government to help provide an “extreme picklist makeover” to transform Web 1.0 picklists into a more searchable 2-level faceted taxonomy as I was to successfully resolve the issue with my vehicle manufacturer. I can’t imagine how anyone without some training or experience would have figured out the logic of the database and constructed a search strategy. By the way, I had a happy resolution with the manufacturer, but I am still waiting for the NHTSA to respond to my complaint. One of the changes I am hoping for in the new administration is more attention to our neglected government databases, which are in need of “extreme picklist makeovers.” Information has to be easier to find. In some cases, this improved access can save a life, if not thousands of dollars (as was my case).

    - Marlene Rockmore

    Filed under: Conceptual Modeling, Government Databases, User-Centered Design

    Book Review: Organising Knowledge by Patrick Lambe

    Although interest in and applications of taxonomies have grown in recent years, there are still not many books on the subject. Most of the information on taxonomies currently resides in online discussion group archives, blogs, wikis, conference presentations, white papers and reports (the latter at quite a premium price), but not much yet in easily accessible books. A search on Amazon.com on “taxonomies” yields numerous books of specific taxonomies, but very few on the art of creating taxonomies in general. Even the “books” page on the Taxonomy Community of Practice Wikispace lists mostly books on information architecture, a classic book on classification theory, chapters of books on broader topics, and high-priced research reports. There is just one book listed with a focus on taxonomies: Organising Knowledge: Taxonomies, Knowledge and Organisational Effectiveness by Patrick Lambe (Oxford, England: Chandos Publishing, 2007).

    Indeed, as its title and subtitle suggest, taxonomies are presented within a broader view of how knowledge is organized. The book is neither a simple “how to” book, nor a scholarly treatment of the subject, but in fact combines both: practical advice on how to create taxonomies along with thoroughness in covering the field of knowledge organization and analysis of various ideas and previous literature on the subject, with many footnotes and a lengthy bibliography.

    The author, Patrick Lambe, is a Singapore-based consultant in the field of knowledge management who can base his ideas on his own business experience. Yet Lambe also has the academic credentials of an information scientist, a Master’s degree in Information Studies and Librarianship and experience teaching as an adjunct professor. Thus, he aptly bridges both sides of taxonomies, the traditional library science side and the newer corporate knowledge management side, although it is the latter that is the subject of this book. What I appreciate in this book is that Lambe writes based on both his research and his experience, and based on these he has developed a number of his own ideas.

    While common definitions of taxonomies often limit them to hierarchies, Lambe prefers a broader definition. The forms of taxonomies that Lambe presents, along with a detailed explanation for each, are: lists, trees, hierarchies, polyhierarchies, matrices, facets, and system maps. Stretching the definition and boundaries of what taxonomies are and can do is a central theme of Organising Knowledge. Lambe states: “Taken together, it becomes clear that taxonomy work holds a wider range of application and use than simply as a tool of information retrieval.” (p. 95).

    Organising Knowledge presents a number of real world examples, scenarios, and case studies of the application of taxonomies in their broadest sense. These include implementations by the U.S. Department of Homeland Security, Unilever, and Club Med. These examples illustrate the wide range of uses for taxonomies. Among business activities, Lambe says that taxonomies can support the areas of risk recognition and response, cost control, customer and market management, and innovation.

    Lambe does not simply describe taxonomies and their use. In this in-depth book he discusses their varied roles, how they are understood, and trends in their implementation. He describes how different kinds of taxonomies can either (1) structure and organize (both things and processes), (2) establish common ground, (3) span boundaries between groups, (4) help in sense-making, or (5) aid in the discovery of risk and opportunity.

    Several later chapters turn to the practical steps of preparing, designing, and implementing a taxonomy project. Lambe breaks out the process into ten steps, the first six of which are all still part of the preparation stage. Among the topics presented in the preparation phase are taking technology into consideration and communicating well with the taxonomy sponsor and stakeholders. While it is appreciated that technology/computer systems are mentioned, I would have liked to learn more about this. It becomes quite evident that different situations require different approaches and different kinds of taxonomies, the different kinds of taxonomies that Lambe describes earlier in the book. My only point of disagreement here is the continual distinction between tree taxonomies and faceted taxonomies, since taxonomies often exhibit both characteristics at the same time.

    The book is well written and relatively easy to follow, but it is not a “light” read. It has a number of helpful tables and diagrams. Particularly useful is the table (two and a half pages long) comparing the uses and issues for each of the seven forms of taxonomies: lists, trees, hierarchies, polyhierarchies, matrices, facets, and system maps.

    I highly recommend this book of great breadth and depth to anyone who works on taxonomies or is interested in working on taxonomies. The intended audience of the book is indeed limited to knowledge management and taxonomy professionals. Even those with considerable experience working in taxonomies will find this book informative and enlightening.

    - Heather Hedden

    This review is based on a longer book review written by Heather Hedden and published in Key Words, the Bulletin of the American Society for Indexing, Vol. 15, No. 4, October-December 2007, pp. 130-132.

    Filed under: Book Review, Planning a Taxonomy
