Understanding Associative Relationships

Taxonomies are collections of facets made up of terms that are described, and made unique, through their relationships to one another. The common relationships are equivalence, hierarchical, and associative (related) relationships. Of these, the associative relationship is the least understood and the least used. Associative relationships are also called related terms, See Also relationships, or syndetic (cross-reference) structures.

One reason associative relationships (related terms) are the least used is confusion about how to implement them. Hierarchies move us up and down within a category of information whose members share common properties. Associative relationships point us to other aspects of a topic.

Associative relationships can help sort topics into clear categories, and that simplicity helps both programmers and users. For example, a taxonomy might be about paper products. If I organize one giant enterprise taxonomy as a single hierarchy, types of paper products, from envelopes to toilet paper, will be mashed in with other topics such as paper weight, composition, manufacturers, or the activities for which the products are used.

Instead, by sorting terms into facets and linking them with associative relationships, you create a clearer map of these concepts. In the example below, a taxonomy has been sorted into multiple facets. Each facet has parent/child relationships, which can also be described as an abstraction and its instances, as in the illustration below:


Middle Level (Abstraction) | Products  | Manufacturers | Composite                     | Measured By
Lower Level (Instance)     | Envelopes | Canson, Tyvek | Vellum, 100% recycled content | Standard Sizes, Custom Sizes

The associative relationship can be used to connect the dots between the columns in the table above. An associative relationship can be displayed as a See Also, as a Related Term (RT) in ANSI-standard thesaurus relationships, or as a semantic relationship (predicate) in a semantic triple.

Paper Products

See Also: Manufacturers and Distributors, 100% Recycled Content, Custom Products

Envelopes

See Also: Canson, Tyvek, Vellum, Standard Sizes

Or as thesaurus relationships, as in:

Paper Products

Narrower Term (NT): Envelopes

Related Term (RT): Manufacturers, Composite, Custom Sizes

Or as custom semantic relationships, which could replace the See Also or RT:

Paper Products

<MadeBy> Manufacturers

<ComposedOf> Composite

<MeasuredBy> Standard Sizes, Custom Sizes
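All three displays can come from one underlying model. The Python sketch below is one way to show that, under stated assumptions: each link is stored once as a (subject, predicate, object) triple, the predicate names are the illustrative ones from the example (not a standard vocabulary), and the sample triples are hypothetical. See Also, NT/RT, and predicate views are just different renderings of the same data.

```python
from collections import defaultdict

# Hypothetical model: every link is stored once as a (subject, predicate, object) triple.
TRIPLES = [
    ("Paper Products", "NT", "Envelopes"),
    ("Paper Products", "MadeBy", "Manufacturers"),
    ("Paper Products", "ComposedOf", "Composite"),
    ("Paper Products", "MeasuredBy", "Standard Sizes"),
    ("Paper Products", "MeasuredBy", "Custom Sizes"),
]

# Predicates treated as associative (non-hierarchical); NT is the only hierarchical one here.
ASSOCIATIVE = ("MadeBy", "ComposedOf", "MeasuredBy")


def links(subject):
    """Group the objects linked to a subject by predicate, in insertion order."""
    grouped = defaultdict(list)
    for s, p, o in TRIPLES:
        if s == subject:
            grouped[p].append(o)
    return grouped


def related(subject):
    """All associative targets of a subject, flattened into one list."""
    grouped = links(subject)
    return [o for p in ASSOCIATIVE for o in grouped.get(p, [])]


def as_see_also(subject):
    """Collapse every associative predicate into a single See Also line."""
    return f"{subject}\nSee Also: {', '.join(related(subject))}"


def as_thesaurus(subject):
    """Render hierarchical links as NT and associative links as RT."""
    narrower = links(subject).get("NT", [])
    return f"{subject}\nNT: {', '.join(narrower)}\nRT: {', '.join(related(subject))}"


def as_predicates(subject):
    """Keep each associative predicate as its own labeled line."""
    grouped = links(subject)
    lines = [f"<{p}> {', '.join(grouped[p])}" for p in ASSOCIATIVE if p in grouped]
    return "\n".join([subject, *lines])
```

Because every view reads from the same triples, adding or renaming a predicate changes all three displays at once.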

This design can also be represented as simpler controlled vocabularies stored in fairly flat files. If a file is hierarchical, the hierarchy would be at most 2-3 levels deep. In the following example, the top-level abstraction becomes the name of the facet, with specific values for each facet as follows:
Facet 1: Paper Products: Office Paper, Envelopes
Facet 2: Manufacturers & Distributors: Weyerhaeuser, Tyvek, Canson, ACME, Dunder-Mifflin
Facet 3: Composite: Vellum, 100% Recycled Content
Facet 4: Standard Size:
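A flat file like this maps naturally onto a dictionary from facet names to values. This minimal Python sketch assumes case-insensitive matching and inverts the facets into a term index, so you can check the orthogonality the design depends on: any term owned by more than one facet is flagged. Facet 4 is left empty because its values are not given above.

```python
from collections import defaultdict

# The flat facets from the example above (Facet 4's values are not given, so it is empty).
FACETS = {
    "Paper Products": ["Office Paper", "Envelopes"],
    "Manufacturers & Distributors": ["Weyerhaeuser", "Tyvek", "Canson", "ACME", "Dunder-Mifflin"],
    "Composite": ["Vellum", "100% Recycled Content"],
    "Standard Size": [],
}


def build_index(facets):
    """Invert facet -> values into value -> set of facets, ignoring case."""
    index = defaultdict(set)
    for facet, values in facets.items():
        for value in values:
            index[value.casefold()].add(facet)
    return index


def facet_of(term, facets=FACETS):
    """The facet(s) a term belongs to; more than one means the facets are not orthogonal."""
    return sorted(build_index(facets).get(term.casefold(), set()))


def shared_terms(facets=FACETS):
    """Terms that appear in more than one facet: candidates for disambiguation."""
    return sorted(t for t, owners in build_index(facets).items() if len(owners) > 1)
```

Tyvek makes a handy test case, since it is both a brand and a material: adding it to the Composite facet makes shared_terms report it.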

The benefits of using associative relationships to create the connections, or “predicates,” between facets are many, but some of the most obvious are:

Processing Improvement: Each facet is mutually exclusive (orthogonal) and can be recombined with other facets as needed.

Extensibility: Since the design is based on a model, you can add new facets such as content type (blogs, videos), audience or customer, events (such as weddings or business events), or additional attributes such as color.

Ease of Maintenance: It simplifies long-term maintenance because you no longer have to dig through complex hierarchies to find related concepts.

Information Discovery: By sorting terms into facets and using associative relationships, it becomes easier to browse through an information space.

The very best part of this approach is that you can change the taxonomy model without changing the content, and when you update the taxonomy with new terms, such as processes, the change is reflected everywhere. There is one important caveat. My data shows that this predefined modeling works 80% of the time, but for about 9% of use cases you will run into glitches because of ambiguous or co-occurring single terms that might appear in multiple facets. These are words such as “treatment,” “process,” or “evaluation,” which are somewhat vague as single terms. When combined in compound phrases, such as “Water-processing” or “Chemically-treated,” the meaning is clearer.

In my practice, I typically identify these vague guide or hub terms, such as “process,” “treatment,” “management,” and “evaluation,” and sort them into a separate facet. I can then use an associative relationship to point to the more specific compound terms, which sit in their appropriate facets. When those compound terms are sorted into the right facet, the meaning becomes even clearer. Sorting terms into facets and linking facets with relationships, including the semantic, RDF-like relationships shown above, can be used to create more specific information spaces. The resulting string is more precise than a term that stands alone with no associated relationships.
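The hub-term technique can be sketched as one more facet whose entries point, through associative links, to compound terms that live in their own facets. In this hedged Python illustration, the hub entries and compound terms come from the text, while the facet each compound term lives in (for example, a “Processes” facet) is a hypothetical assignment.

```python
# Hub facet from the text: vague single terms, each linked to more specific compound terms.
HUB_FACET = {
    "process": ["Water-processing"],
    "treatment": ["Chemically-treated"],
    "evaluation": [],  # no compound terms linked yet
}

# Hypothetical home facet for each compound term.
HOME_FACET = {
    "Water-processing": "Processes",
    "Chemically-treated": "Composite",
}


def is_hub(term):
    """True if the term is one of the vague guide terms kept in the hub facet."""
    return term.strip().casefold() in HUB_FACET


def resolve(term):
    """Follow a hub term's associative links to (facet, compound term) pairs."""
    key = term.strip().casefold()
    return [(HOME_FACET[t], t) for t in HUB_FACET.get(key, [])]
```

A search for “treatment” can then offer the facet-qualified compound term instead of returning every vague match.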

By using associative relationships, you can build a taxonomy in which it is easier to discover related facets of information and to combine attributes to refine searches. For example, if I am looking for information about envelopes for a wedding invitation, associative relationships might also help me find information about wedding planners or wedding destinations. I can also find videos on how to properly create an invitation.
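Refining a search by combining facet values, then following predefined associative links, could look roughly like the Python sketch below. The item records, facet names (product, event, content_type), and related-topic links are all hypothetical; the point is that the suggestions exist only because the links were defined in the model ahead of time.

```python
# Hypothetical items, each tagged with one value per facet.
ITEMS = [
    {"id": 1, "product": "Envelopes", "event": "Weddings", "content_type": "Articles"},
    {"id": 2, "product": "Envelopes", "event": "Business", "content_type": "Articles"},
    {"id": 3, "product": "Invitations", "event": "Weddings", "content_type": "Videos"},
]

# Associative links predefined in the background model: (facet, value) -> related (facet, value).
RELATED = {
    ("event", "Weddings"): [("service", "Wedding Planners"), ("destination", "Wedding Destinations")],
}


def refine(items, **selected):
    """Keep items whose facet values match every selected facet value."""
    return [item for item in items if all(item.get(f) == v for f, v in selected.items())]


def suggestions(**selected):
    """Follow the predefined associative links from each selected facet value."""
    found = []
    for facet, value in selected.items():
        found.extend(RELATED.get((facet, value), []))
    return found
```

Selecting product="Envelopes" and event="Weddings" narrows the results to item 1 and suggests wedding planners and destinations.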

Of course, that assumes that you define taxonomies as “collections of facets, classes or graphs that create unambiguous terms.” It is also easier to add new facets if needed and to link in new data using associative relationships. Note: I cannot do any of the above UNLESS I have built predefined links between facets in the background model through associative relationships. It is very hard to recognize these known relationships “on the fly.”

Associative relationships are the least understood, but once you take the time to learn how to create them, they become one of the best arguments for building a faceted taxonomy and for adopting data modeling techniques as part of taxonomy development. If you are designing associative relationships for online systems, whether for content management, e-commerce, or search, pay attention to how you create them, because they have implementation implications both for user navigation and for machine-to-machine processing.

– Marlene Rockmore

NOTE: In future posts, I’ll put up a reference table showing the relationship between kinds of associative (related-term) relationships and semantic labels for predicates in a “triple.”


Using Taxonomies to Sort through Health Care Reform

I am very interested in the health care reform debate, so I wanted to know what a public option might look like. My sources told me that a robust public option might look a bit like Medicare. So off I went to the Medicare.gov website to find out what was covered. In the middle of the home page, in the second column, there is a link to “Find Out What is Covered,” which leads to an advanced search criteria page. The search page includes a picklist of about 143 topics, just the right size for a sample set of candidate terms for a card sort.

This month, I am offering a small interactive experiment in online card sorting. Taxonomies are collections of facets, which are created by organizing concepts into categories. Card sorting is one of the best ways to identify categories: groups of users create categories in controlled tests, and those categories are validated through repeated tests until there is a consensus. In health care reform, taxonomies might be useful for creating consumer-friendly interfaces to search across the national insurance exchanges.

A card sort method uses the following steps:

  • Collect a sample set of candidate concepts
  • Group or cluster terms into categories
  • Refine the design iteratively until there is a set of facets, groups of categories that have similar properties
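One common way to analyze the grouping step is a co-occurrence count: for each pair of cards, how many participants placed them in the same group. The Python sketch below assumes each participant’s sort is recorded as a mapping from category name to cards; the sample sorts are invented (not results from this experiment), and the two-thirds agreement threshold is an arbitrary choice for illustration.

```python
from collections import Counter
from itertools import combinations

# Invented sorts from three participants: category name -> cards placed in it.
SORTS = [
    {"Hospital care": ["Inpatient stays", "Skilled nursing"], "Preventive": ["Flu shots", "Mammograms"]},
    {"Facilities": ["Inpatient stays", "Skilled nursing"], "Screenings": ["Mammograms"], "Shots": ["Flu shots"]},
    {"Care": ["Inpatient stays", "Flu shots"], "Preventive": ["Mammograms", "Skilled nursing"]},
]


def co_occurrence(sorts):
    """For each pair of cards, count the participants who grouped them together."""
    counts = Counter()
    for sort in sorts:
        pairs = set()  # dedupe: a participant may place a card in several categories
        for cards in sort.values():
            pairs.update(combinations(sorted(set(cards)), 2))
        counts.update(pairs)
    return counts


def agreed_pairs(sorts, threshold=2 / 3):
    """Pairs grouped together by at least `threshold` of participants."""
    n = len(sorts)
    return sorted(pair for pair, c in co_occurrence(sorts).items() if c / n >= threshold)
```

Pairs are counted once per participant, since participants are free to assign a term to more than one category.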

I’ve put 130+ topics from Medicare into an online card sorting tool called Websort.net. The topics have not been formatted or massaged; they are just as they appear in the Medicare search picklist. Websort.net suggests that I use a closed card sort, where participants sort terms into predetermined categories. So to get started, I’ve come up with about 20 starter categories. Some of these categories will become subtopics in a faceted design.

The experiment is open to the first 10 participants who want to take the time to try this task. To try the card sort, go to:

http://websort.net/s/80CDD6/

Please feel free to assign terms to multiple categories or to suggest other categories.

Last month, Joseph Busch blogged about the judicious use of online web sorting tools, arguing that they may not be the most cost-effective way to build taxonomies. One of his arguments is that the sample set of users will not be random. That’s true. This blog has a small readership with an interest in taxonomies and, probably, a consumer’s interest in health care reform. Let me know what you think of websort.net.

This little experiment could help demonstrate some bigger observations. Government may be looking to advanced high-volume technologies such as clustering or semantic technologies to identify categories and to map claims data. Perhaps one of the applications will be to build interfaces that help consumers search across the national exchanges. But at the core of these technologies, there will be a need for well-designed taxonomies to help analyze text and to build better interfaces to health care information.

A well-designed taxonomy with facets and linking relationships can

  • Group information into useful categories
  • Identify gaps in coverage
  • Help point to important related information

Let’s find out if taxonomy design can help us sort through health care reform.

Thanks to Andy Oram and the Sunlight Foundation for introducing me to this tool and to Dave Cooksey who is virtually updating my card-sorting skills.

What’s wrong with crowdsourcing the design of public websites?

A blog post from Sunlight Labs on “Redesigning the FCC: Getting Organized” suggests an experiment that employs a public card-sorting program, websort.net, to help redesign the Federal Communications Commission (FCC) website. The FCC has a notoriously convoluted website, hard to navigate and hard to search. Sunlight Labs invites anyone interested in helping the FCC to join this open card-sorting activity, which organizes about 60 terms into categories related to the FCC. But is a public web sort the right approach to redesigning a government website?

Should we crowdsource the design of a public website?

Here are some considerations:

  • First, the success of any design process depends on who sits at the table. Site designers have not succeeded over the years by roping in anyone who happens to be around. Rather, carefully identifying the right participants for any design activity is very important. Engaging busy professionals and bureaucrats in order to derive the maximum impact with the minimum effort is a tricky business. One of the most cutting critiques of Wikipedia has been that its editorial perspective is overwhelmingly white, male, and twenty-something, not necessarily the authority of choice for everyone else.
  • Second, open processes tend to be very time-consuming, which works in your favor for some kinds of crowdsourcing but not for selecting terms and categories. Unless the sample is large and controlled, the pattern that emerges from crowdsourced card sorting may not be helpful, because experts with limited time will be overrun by people with lots of time and a fast hand on the keyboard, no matter how much or how little they know. Some types of crowdsourcing (such as prediction markets) work because the errors of ignorant participants cancel each other out and allow the experts to win out, but card sorting is entirely different and just results in chaos.
  • Third, it would be much quicker for the FCC to suggest a model for organizing its content based on its expertise than to crowdsource the design. There are standard ways to organize things, including website content, which people can learn even if they are not entirely natural. We learn about brand, price, size, color, material, and fit because they help us find the stuff we want to buy, not necessarily because there is a shopping gene in our DNA.
  • Fourth, the users of these sites, such as broadcasters, regulators, website publishers, and ordinary people, are not always interested in the same things. The FCC will have to comply with legislative and executive branch imperatives that may be of little interest to many people in the crowd.

A better way to approach website design and redesign focuses on the backend nomenclature—buckets and categories, which are called facets and vocabularies. These form the basis of a useful taxonomy.

So when can crowd-sourcing be used effectively? If the FCC engaged in the process of designing facets and vocabularies, the crowd could be useful as a follow-up. First, it can be helpful in validating a design. After all, the test of a taxonomy is whether it helps people find information. One of the appropriate roles for crowd sourcing in taxonomy is to observe how the users access a collection of items over time, the searches they use, and the click paths they follow. The taxonomy can then be tuned based on how the activity distributes among the categories—splitting and merging categories as warranted.

Another place for crowdsourcing is to allow users to add free-text “tags” to the content. Those tags can then be evaluated to either map them to existing taxonomy categories, or to suggest changes to the taxonomy. In this case the crowd and the taxonomy work together in synergy. Users typically add a tag to only a fraction of the pages, so in most cases these terms will be synonyms or equivalents to existing categories.

Finally, a card-sorting exercise can be useful after the field is carefully constrained by the experts who know the site. The true test of any card-sorting activity is whether people can actually find what they are looking for afterward. Mapping a tag as a synonym of an existing taxonomy category effectively applies that tag to all the content already in that category. This synergy is one method that can help improve access to information.
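The tag-to-category synonym mapping can be sketched as a lookup table from reviewed tags to existing categories. In this Python illustration, the categories, document IDs, and tags are all hypothetical: a mapped tag retrieves everything already filed in its category, and tags with no mapping are surfaced as candidates for changes to the taxonomy.

```python
# Hypothetical categories and the content already classified under them.
CATEGORY_CONTENT = {
    "Broadband": ["doc-1", "doc-2", "doc-3"],
    "Spectrum Auctions": ["doc-4"],
}

# User tags a taxonomist has reviewed and mapped to categories as synonyms (hypothetical).
TAG_TO_CATEGORY = {
    "high-speed internet": "Broadband",
    "internet access": "Broadband",
}


def find_by_tag(tag):
    """A mapped tag retrieves everything already filed under its category."""
    category = TAG_TO_CATEGORY.get(tag.casefold(), tag)
    return CATEGORY_CONTENT.get(category, [])


def unmapped(tags):
    """Tags that are neither categories nor mapped synonyms: review these for taxonomy changes."""
    return sorted({t for t in tags if t.casefold() not in TAG_TO_CATEGORY and t not in CATEGORY_CONTENT})
```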

Here are several techniques that are intuitive and natural for people to use with little or no training, allowing them to validate a taxonomy. These techniques are much faster than open card sorts, and provide results that are easier to interpret.

  • Classifying some content
  • Conducting walk-throughs
  • Closed card sorting

Classifying some content

In this exercise, people are presented with a representative subset of content from the site and are asked to tag it. You can select the content randomly or try to include examples of the site’s primary content types, as well as content you think may be hard to tag, find, or use. When you plot the number of items tagged into each taxonomy category, you should expect to see 80% of the content fall into 20% of the categories.
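Checking the 80/20 expectation is simple arithmetic once each tagged item is recorded with its category. The Python sketch below is one way to compute it, assuming you pass the full list of taxonomy categories so that unused categories still count; the 20% cut-off is a parameter, and the sample numbers are invented.

```python
from collections import Counter


def top_share(assignments, categories, category_fraction=0.2):
    """Share of tagged items that land in the top `category_fraction` of categories.

    `assignments` holds one category label per tagged item; `categories` is the
    full taxonomy, so categories nobody used still count toward the total.
    """
    counts = Counter({c: 0 for c in categories})
    counts.update(assignments)
    ranked = sorted(counts.values(), reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    k = max(1, round(len(ranked) * category_fraction))  # number of "top" categories
    return sum(ranked[:k]) / total


# Invented sample: 100 tagged items spread over a 10-category taxonomy.
CATEGORIES = [f"Category {i}" for i in range(1, 11)]
TAGGED = ["Category 1"] * 50 + ["Category 2"] * 30 + ["Category 3"] * 10 + ["Category 4"] * 10
```

With this sample, the top 2 of 10 categories hold 80 of the 100 items, so top_share returns 0.8.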

Conducting Taxonomy Walk-Throughs

One-on-one and group presentations that show stakeholders the taxonomy and walk them through it are an effective way to extract specific comments and sometimes overall approval. During walk-throughs, standard questions should be asked about the category structure, as well as about problematic categories, to gather feedback on the taxonomy. Delphi walk-throughs are done using a stack of cards. Unlike the FCC exercise, however, the cards are not a set of raw terms. Instead, they are already marked with categories chosen by the experts. Reviewers are asked to mark changes to the category labels on the cards. Each subsequent reviewer is given a walk-through using the cards with the label mark-up from the previous session. The process usually stabilizes after a few sessions, indicating that the categories are appropriate. According to Dave Cooksey, Founder and Principal of saturdave, 20 sessions will usually result in a consensus taxonomy revision, and this method provides results without any further analysis.

Closed Card Sorting

Closed card sorting, where categories are in predefined buckets, can be used to test whether stakeholders and end users consistently sort categories into the correct taxonomy facets. The categories to test should be a set of important topics, such as the most frequently searched words and phrases from the search engine logs. The test can be done using actual cards, or using a simple grid with categories to be tested down the left column and the taxonomy facets across the top. Paper card sorts work well enough for up to 20 trials.
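The simple grid can be scored per term as the share of participants who chose the facet the designers expected. This Python sketch assumes responses are recorded as (participant, term, facet) rows; the FCC-style terms, facet names, and the 0.8 threshold for flagging a term are hypothetical.

```python
from collections import Counter

# Invented responses: (participant, term, facet the participant chose).
RESPONSES = [
    ("p1", "broadband map", "Data"),
    ("p2", "broadband map", "Data"),
    ("p3", "broadband map", "Consumers"),
    ("p1", "license renewal", "Licensing"),
    ("p2", "license renewal", "Licensing"),
    ("p3", "license renewal", "Licensing"),
]

# The facet the designers expect each term to land in (hypothetical).
EXPECTED = {"broadband map": "Data", "license renewal": "Licensing"}


def agreement_by_term(responses, expected):
    """Fraction of responses that put each term in its expected facet."""
    hits, totals = Counter(), Counter()
    for _participant, term, facet in responses:
        totals[term] += 1
        hits[term] += int(facet == expected.get(term))
    return {term: hits[term] / totals[term] for term in totals}


def needs_redesign(responses, expected, threshold=0.8):
    """Terms whose agreement falls below the (arbitrary) threshold."""
    scores = agreement_by_term(responses, expected)
    return sorted(term for term, score in scores.items() if score < threshold)
```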

Websort.net is a good tool when you need a larger, distributed closed card sort test. If users can’t map terms to the categories, the designers will know that they have to adjust their design. But our experience shows that pre-analysis captures about 80% of the common categories and use cases. Sunlight Labs has undertaken a commendable task in seeking to improve the FCC website’s layout, but by carrying out a card sort too quickly, they’ll just get their signals crossed. Performing some professional taxonomy work first will channel public efforts in the right direction.

Submitted by Joseph A. Busch, Founder and Principal, Taxonomy Strategies, Sept. 8, 2009


5 Types of Taxonomies: From Lists to Ontologies

Taxonomy, strictly defined, is a hierarchical arrangement of terms, but the form a taxonomy takes depends on the information problem at hand. After all, taxonomy is a method for organizing knowledge or concepts, which requires flexibility in how concepts are captured and represented. The complexity depends on factors such as the core area of information for the application, the users’ vocabulary, the size of the content collection and how much specificity is needed, how the content will be tagged or indexed, and how result sets will be displayed and refined. The taxonomy is not an end but a means to help users navigate information, find out what is in the collection, and get to meaningful results. Above all, the taxonomy needs to provide clear, unambiguous access to information.

Here’s a primer on the basic ways to organize concepts:

Form 1: Lists (picklists, authority lists or controlled vocabularies)

Good Ol’ Picklists ensure that a specific term is used when creating or searching content. A picklist is really a list of lead or preferred terms, such as geographical names and/or other proper names, including the names of people, organizations, or projects. Certainly not many of us can properly spell the name of the current Iranian President (Ahmadinejad), so it makes sense to pick it off a predefined list. The problem with picklists is that they are often buried in applications instead of being right there on the home page as a search aid, or the design is so tied to the relational database design that you have to drill down multiple levels to get to a reasonable query. That means misery for the searcher as well as the database programmer.
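A picklist lookup with a fallback for near-misses can be sketched with Python’s standard difflib module. The list entries are illustrative, and difflib’s similarity ratio is just one simple choice of fuzzy matching, not what any particular database uses.

```python
import difflib

# Illustrative authority list of preferred names.
PICKLIST = ["Mahmoud Ahmadinejad", "Hillary Rodham Clinton", "Barack Obama"]


def pick(entry, picklist=PICKLIST, n=3, cutoff=0.6):
    """Return [entry] if it is a preferred term, else the closest preferred terms (maybe none)."""
    if entry in picklist:
        return [entry]
    # get_close_matches ranks candidates by difflib's SequenceMatcher similarity ratio.
    return difflib.get_close_matches(entry, picklist, n=n, cutoff=cutoff)
```

A misspelling such as “Mahmoud Ahmadinijad” comes back with the preferred form as its first suggestion, so the searcher never has to get the spelling right.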

Many excellent databases have transformed their picklists and controlled vocabularies into lists that can be searched from the home page. For a great example of a taxonomy as picklist, look at ProQuest or Cars.com. These content sources manage their picklists as taxonomies, but each taxonomy is a clearly defined list of terms. Combine two or more of these lists and voila! You now have a faceted taxonomy that lets users browse your content from the home page. (Of course, you also need powerful content management software, but that’s another story.)

Form 2: Synonym Lists

Synonyms are a wonderful use of taxonomies: they are easy to track in taxonomy tools and spreadsheets, but surprisingly difficult to implement in the user interface. If a search box is used, you will need a taxonomy rich in synonyms so that users don’t have to worry about the preferred form of a term or even a misspelling. Why do synonyms matter? First, they can be used to track words that mean the same thing, such as “car,” “auto,” and “automobile.” For example, the environmental movement prefers that the term “Climate Change” be used instead of “Global Warming.” Synonyms allow one concept to be treated as equivalent to the other, while still allowing one term to be preferred over another.
But synonyms can assist search in other ways. Synonyms can be used to:
• cover alternate spellings, such as British versus American spellings of terms like “organization” vs. “organisation”
• allow for search misspellings, such as alternate spellings of proper names or even common words (Honestly, how do you spell “broccoli?”)
• allow alternate versions of proper names, such as Hillary Clinton for Hillary Rodham Clinton
• create variations on concepts or phrases, such as allowing “Current Iranian President” as a variant for Ahmadinejad, a name that few of us can spell correctly
In other words, have a liberal and generous policy about what counts as a synonym, but be sure to test your application, because there will always be ambiguities and you may get some unexpected false results. Adding synonyms to search is surprisingly challenging to implement. That’s why synonym-based search systems are often paired with rules-based autocategorization systems such as Teragram, but that’s a topic for another blog. On the other hand, as terms evolve, adding synonyms to a taxonomy is a quick way to improve access without changing the database.
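A synonym ring can be as simple as a mapping from each variant to its preferred term. This Python sketch is built from the examples in the text and assumes whole-query matching after case-folding and whitespace cleanup; a production search engine would also need to match phrases inside longer queries.

```python
# Variant -> preferred term, using the examples from the text.
SYNONYMS = {
    "auto": "car",
    "automobile": "car",
    "global warming": "climate change",
    "organisation": "organization",
    "brocoli": "broccoli",
    "hillary clinton": "hillary rodham clinton",
    "current iranian president": "ahmadinejad",
}


def normalize(query):
    """Map a whole query to its preferred term so every variant hits the same index entry."""
    key = " ".join(query.casefold().split())
    return SYNONYMS.get(key, key)


def expand(query):
    """All known forms of the query's concept, for query expansion."""
    preferred = normalize(query)
    return sorted({preferred} | {v for v, p in SYNONYMS.items() if p == preferred})
```

Because the mapping lives outside the database, adding a new variant is a one-line change.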

Form 3: Hierarchies

Taxonomies are used to create the familiar drill-down type of interface. Traditionally viewed as hierarchies or tree structures, they capture the following types of interrelationships:
• Parent/child
• Broader Term/Narrower Term
• Is a part of
• Is a type of
The biggest mistake in creating hierarchies is associating items that do not have an inherent hierarchical relationship. If you are creating a hierarchy with two terms that do not fit one of the relationships above, you may be better off building two separate lists (or facets); that’s the start of faceted navigation. You have to conduct a logical “sniff test” when constructing hierarchies. Let’s go back to our government example. Say you are building a simplistic hierarchy with United States as a term and Hillary Clinton as a narrower term. But Hillary Clinton is not a child of, a part of, or a type of the United States. That is not a true hierarchical relationship. In that case, wouldn’t it be better to model how users look for information about government, and then structure separate taxonomies: Countries, Leadership, and perhaps a third facet for Forms of Government? So the best rule in creating hierarchies is to make sure each hierarchy is about the same category of knowledge. Taxonomy geeks call this “orthogonal.” It is better for the programmer and better for the end user.
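The “sniff test” can be partly automated if every hierarchical edge records the relationship the modeler claims and every term records its facet. In this hedged Python sketch, the facet assignments and edges are hypothetical, based on the government example: an edge is flagged if its relationship is not one of the hierarchical types listed above, or if it links terms from different facets.

```python
# Relationship types accepted as hierarchical, from the list above.
HIERARCHICAL = {"parent/child", "broader/narrower", "part-of", "type-of"}

# Hypothetical facet assignment for each term.
TERM_FACET = {
    "Countries": "Countries", "United States": "Countries",
    "Leadership": "Leadership", "Hillary Clinton": "Leadership",
    "Forms of Government": "Forms of Government", "Democracy": "Forms of Government",
}

# (broader term, narrower term, relationship claimed by the modeler)
EDGES = [
    ("Countries", "United States", "broader/narrower"),
    ("Forms of Government", "Democracy", "type-of"),
    ("United States", "Hillary Clinton", "broader/narrower"),
]


def sniff_test(edges, term_facet):
    """Flag edges that are not truly hierarchical or that cross facets."""
    problems = []
    for broader, narrower, rel in edges:
        if rel not in HIERARCHICAL:
            problems.append((broader, narrower, "not a hierarchical relationship"))
        elif term_facet.get(broader) != term_facet.get(narrower):
            problems.append((broader, narrower, "crosses facets"))
    return problems
```

Here only the United States / Hillary Clinton edge fails, which is the cue to split Leadership into its own facet.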

Form 4: Faceted Navigation and Thesauri

If you stepped through the process above, then look how quickly and easily you moved from lists (authority or otherwise) to hierarchies to faceted navigation. Faceted navigation is basically built on hierarchical taxonomies that have been normalized and categorized so that terms do not cross categories. A thesaurus is a fully fleshed-out taxonomy in which all the synonyms and hierarchies within a category are labeled. Thesauri also allow related terms. Related terms, or associative relationships, are links between categories of terms. Faceted design has several advantages:
• By having a faceted structure, you can begin the process of disambiguating terms. For example, if my application is about House and Garden Design, where the term “green” is common, I might have Green Building Products under the Products category, while “green” as a color would be in the Color and Decorating category. By categorizing terms under the appropriate facet, each term becomes unique based on its meaning in context. Thus, there are now two distinct, disambiguated topics.
• Take the government example we built above, where we have taxonomies for countries, leaders, and government structure. By creating a top level of facets or categories, you are now building a model of the domain. If a country changes leaders or forms of government, I can change that concept without reindexing or relinking my entire application. And now I have the added benefit of a framework for building user interfaces that might be easier to navigate, because I have designed a better conceptual framework. I am also now well on my way to designing an ontology. How? Read on:
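Both advantages follow from identifying each concept by its facet plus an identifier, and tagging content with those keys rather than with label strings. The Python sketch below assumes that design; the concept IDs (p1, c7) and document names are hypothetical.

```python
# Concepts keyed by (facet, hypothetical ID); labels are stored once, here.
CONCEPTS = {
    ("Products", "p1"): "Green Building Products",
    ("Color and Decorating", "c7"): "Green",
}

# Content is tagged with concept keys, not label strings (hypothetical documents).
CONTENT_TAGS = {
    "article-12": [("Products", "p1")],
    "article-40": [("Color and Decorating", "c7")],
}


def senses(word):
    """Every facet-qualified concept whose label contains the word."""
    word = word.casefold()
    return sorted(key for key, label in CONCEPTS.items() if word in label.casefold().split())


def labels_for(doc):
    """Current labels for a document, looked up at display time."""
    return [CONCEPTS[key] for key in CONTENT_TAGS[doc]]
```

The word “green” resolves to two distinct concepts, and relabeling a concept in one place changes what every tagged document shows, with no reindexing of the documents themselves.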

Form 5: Ontologies

An ontology is basically a faceted taxonomy where all the ambiguities have been resolved and all the concepts have been described as completely as possible. The other feature of an ontology is the potential use of links, or of RDFa as a language, to describe the links between the categories and terms. Now there is one more step in my progression from lists to ontologies: creating links between categories. For example, we know that Countries Have Governments, and that Governments have Leaders. These statements are called triples: a subject and an object linked by a predicate. While many of the issues about how to implement ontologies are still cooking, so to speak, it is worth thinking through how to implement an ontology. After all, good information access is about clarifying questions and resolving ambiguity. The downside of ontologies is inferencing. For example, in a Friend-of-a-Friend (FOAF) application, we all know people with diverse interests and beliefs, but those are not necessarily our beliefs. This type of syllogistic inferencing might have unintended negative consequences, so be judicious.
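Triples, and the inferencing risk, can both be shown with plain Python sets. The sketch below is illustrative only: the predicate names are made up (loosely echoing FOAF’s notion of “knows”), and a real system would use an RDF store and a rules or reasoning engine rather than these hand-written functions.

```python
# Illustrative triples; predicate names are made up, not taken from FOAF or any standard.
TRIPLES = {
    ("Country", "hasGovernment", "Government"),
    ("Government", "hasLeader", "Leader"),
    ("alice", "knows", "bob"),
    ("alice", "hasInterest", "gardening"),
    ("bob", "hasInterest", "skydiving"),
}


def objects(subject, predicate, triples=TRIPLES):
    """Objects linked to a subject by a given predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def follow(subject, *predicates, triples=TRIPLES):
    """Walk a chain of predicates, e.g. Country -hasGovernment-> ... -hasLeader-> ..."""
    frontier = {subject}
    for predicate in predicates:
        frontier = {o for node in frontier for o in objects(node, predicate, triples)}
    return frontier


def naive_interest_rule(triples=TRIPLES):
    """An over-broad rule: A knows B and B hasInterest X => A hasInterest X."""
    inferred = {
        (a, "hasInterest", x)
        for a, p, b in triples if p == "knows"
        for x in objects(b, "hasInterest", triples)
    }
    return inferred - triples  # only the newly inferred statements
```

Chaining hasGovernment and hasLeader from Country reaches Leader, as intended, while the naive rule concludes that alice is interested in skydiving: exactly the kind of unintended inference to be judicious about.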

Take a renewed look at those picklists, and start to see the connections between their terms. You’ll be taking the first step toward styling your taxonomies and building unambiguous, powerful ontologies.

~ Marlene Rockmore

Extreme Picklist Makeover

Last winter, the side airbags in my car deployed for no apparent reason. What does this have to do with taxonomy? Well, the subsequent struggle with both the insurance company and the car manufacturer sent me scrambling to the National Highway Traffic Safety Administration (NHTSA) database (www.safercar.gov) to research spontaneous deployments of side curtain airbags when there was no visible damage to wheels, tires, or undercarriage.

First, I love government information. Just today I used the U.S. Geological Survey and checked information at the Bureau of Labor Statistics. But the US Government has to learn how to make over its picklists and Web 1.0 databases into an information architecture with usable taxonomies. These ugly ducklings need to become swans.

[Figure: NHTSA defects and recalls]


Here’s the problem. In a traditional database, every record has to be unique to avoid redundancy, so when multiple reports are filed, all reports are tied back to the original record. Unfortunately, this means that the end user, who is searching for information in a desperate moment of need, such as after an accident, has to find that original record. The record I needed, which described a research report about 498 similar complaints, was filed in 2006 but under the original complaint (a different year and model), a record created in 2003. To find the record containing a research report filed three years after the original complaint, I had to use a year prior to the manufacture of my car, and I was unable to search by the specific component failure as a keyword or phrase. I found the record by using a citation from a Google search, which led me to a news team’s investigation of a similar event in a different model. Even with the citation, I had to drill through multiple layers, four queries deep, to find the original record, and I was unable to search by any keywords or topics.

How would a taxonomy have helped? A taxonomy would have helped in two key ways. First, content management using a taxonomy provides multiple access points to the same set of topics and issues. A faceted taxonomy would have provided a useful user interface that allowed me to alter my search strategy. Searching by model under the existing database design doomed my search to failure, because the record I needed was filed under a different model and a different year. Second, the database could have been designed to provide multiple access points to content without sacrificing the benefits of relational database design. That would have simplified the query programming logic while still allowing an efficient database design. A good taxonomy design would also make it easier to add new facets or terms as technology evolves, for example to search across topics such as environmental issues and engine efficiency.
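The multiple-access-point idea can be sketched by giving each record several facets and letting any combination of them serve as a query. In this Python illustration, the record IDs, model names, and years are hypothetical stand-ins for the situation described above: the research report stays attached to the original complaint’s model and year, yet a search by component still finds it.

```python
# Hypothetical records, each tagged with several facets (sets allow multiple values).
RECORDS = [
    {"id": "complaint-2003", "model": {"Model X"}, "model_year": {2002}, "component": {"side curtain airbag"}},
    # The later research report is still filed under the original complaint's model and year.
    {"id": "report-2006", "model": {"Model X"}, "model_year": {2002}, "component": {"side curtain airbag"}},
    {"id": "complaint-2008", "model": {"Model Y"}, "model_year": {2006}, "component": {"side curtain airbag"}},
]


def search(records, **criteria):
    """Return IDs of records matching every facet=value pair given, in any combination."""
    return [
        r["id"]
        for r in records
        if all(value in r.get(facet, set()) for facet, value in criteria.items())
    ]
```

Searching by the driver’s own model and year returns only the newer complaint, while the component facet also surfaces the original complaint and the research report.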

A quick 2-level redesign of the NHTSA interface might aid searching through simpler page navigation, such as:

Vehicle Safety by type

  • Auto Safety
  • Bicycle Safety
  • Motorcycle Safety
  • Light Trucks
  • Off-Road
  • Tractor/Trailer

Driver and Occupant Safety

  • Child safety, car seats and restraints
  • Teen drivers
  • Older Population
  • Population under 5’5”

Traffic Safety

  • Data by state
  • Pedestrian Safety
  • School Transportation Safety

Recalls, Defects, and Complaints

  • By manufacturer/model
  • By component

New Technologies

  • Fuel efficiency

Recent studies

  • Press Room
  • Fact Sheets

Redesigning picklists into taxonomies is not a difficult task for trained taxonomists, and projects can be very cost-effective even in a tough economy. In my case, my search led to thousands of dollars of savings in insurance expenses. In other cases, getting good information quickly will help save lives. The hard part is determining in advance which categories will be captured in the taxonomy and how end users will search the databases, but that’s why there are taxonomists who can do usability studies and research existing metadata such as insurance reports and consumer safety databases. The taxonomy can also be used to reindex databases through tools that support entity extraction, where the taxonomy is used to find synonymous terms.

After a weekend searching the NHTSA database, I was almost as eager to call on the US Government to provide an “extreme picklist makeover,” transforming Web 1.0 picklists into a more searchable 2-level faceted taxonomy, as I was to resolve the issue with my vehicle manufacturer. I can’t imagine how anyone without some training or experience would have figured out the logic of the database and constructed a search strategy. By the way, I had a happy resolution with the manufacturer, but I am still waiting for NHTSA to respond to my complaint. One of the changes I am hoping for in the new administration is more attention to our neglected government databases, which are in need of “extreme picklist makeovers.” Information has to be easier to find. In some cases, this improved access can save a life, if not thousands of dollars (as in my case).

– Marlene Rockmore