A Need for Agile Enterprise Taxonomies?

Enterprise taxonomy and common vocabularies were among the hot topics for the master data management and enterprise data professionals who attended Enterprise Data World in Atlanta, Georgia last week.

Taxonomies, vocabularies, and ontologies seem to be among the ingredients being added to the secret sauce. One enterprise data modeler from a health insurance company put it succinctly: “I use taxonomies to understand my data, not to reorganize the data or content.”

These practical data architects are searching for ways to find simplicity in the complexity of data management without disrupting business-critical data flows and best practices.

The business forces behind this resurging interest in taxonomies have more to do with the need to improve productivity, the open flow of information, and efficiency than with search or semantics. Enterprise data managers are looking to overcome a weakness in traditional data management tools, where it remains a challenge to collect and analyze information held in different stovepipes or “swim lanes”. Enterprises need solutions that allow rapid access to data without destroying the data or the data quality processes.

The driving business needs include the requirements to:

  • Encourage collaboration and data sharing across diverse databases. Instead of searching for the one correct usage, an agile environment accepts that terms often have different meanings in different functions. Acquisition means one thing to a purchasing agent and another to a business strategist. Geoff Mahafsky of Phasic Systems pointed out that learning to listen to, embrace, and respect these variations is a critical first step in an agile strategy.
  • Integrate structured and unstructured data. Unstructured data includes content types such as text, documents, reports, and digital assets. Understanding a company’s internal operations involves more than data; it also requires understanding policies and programs, the type of information found in text.
  • Facilitate access to data. Whether the reason for gathering data across functions is regulatory processes or business intelligence, enterprises need tools and processes that can pass queries to diverse data and content stores at the speed of business. This approach means that queries don’t always have to be translated to SQL; in fact, the same analytic question could be passed to multiple technologies such as NoSQL stores, graphs, search engines, and semantic technologies along with relational databases.
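
One way to picture the first requirement, a shared vocabulary that records variation instead of forcing a single usage, is a term entry that keeps one meaning per business function. This is only a minimal sketch: the data structure, the `define` and `meanings` helpers, and the sample definitions are illustrative assumptions, not something presented at the conference.

```python
from collections import defaultdict

# Hypothetical shared vocabulary: term -> {business function -> local meaning}.
vocabulary = defaultdict(dict)


def define(term, function, meaning):
    """Record what a term means for one business function."""
    vocabulary[term.lower()][function] = meaning


def meanings(term):
    """Return every recorded meaning of a term, keyed by business function."""
    return dict(vocabulary.get(term.lower(), {}))


# Illustrative definitions only.
define("Acquisition", "purchasing", "buying goods or services from a supplier")
define("Acquisition", "strategy", "taking over another company")

for function, meaning in sorted(meanings("acquisition").items()):
    print(f"{function}: {meaning}")
```

The point of the design is that neither meaning overwrites the other; a query tool could pick the meaning that fits the asker’s function.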

The buzzword for this cross-functional enterprise data management is model-driven architecture (MDA), and it can work with existing master data management and reference data programs. Model-driven architecture supports a hybrid solution, which allows multiple technologies and processes.

The use of taxonomies and ontologies is finding its way into some of the leading end-to-end metadata and analytics products. Systems and services providers who are taking this approach and who presented at this conference include Phasic Systems and Spry, Inc.

What skills might taxonomists need to meet the new requirements for agile enterprise vocabularies?

Ability to listen to and represent differences and commonalities: Discussion about taxonomy can lead to understanding distinctions. A common enterprise vocabulary does not mean a single usage; it means finding ways to represent and respect distinctions between functions.

Knowledge of the tools and processes of semantics, vocabularies, and rules that help build bridges between silos and functions. Taxonomists have a background in the best practices of information and knowledge organization, based on standards, research and experience. This background saves companies from having to reinvent processes, which can be costly.

Skills to integrate with a hybrid technology approach. A model-based approach means that the system integrator does not need to know the physical model to create a logical model. This requires generalists who can work comfortably with different organizations that use different technologies.

Ability to work well with others and respect the work of data governance and quality processes, policies and functions. A taxonomist joins the governance team, so that the work integrates with existing processes. The model-driven approach doesn’t replace existing staff, systems and processes at departmental and functional levels. Those processes exist because they are tested, effective and safe, but these groups may need support in adding vocabulary development as a new service.

Taxonomists have long known that taxonomies can be built quickly, can model the needs of a business environment, can be improved over time, and can be modified or even discarded as the needs of the business change. In an era of big data, taxonomy processes might help big data and data management teams better understand, use, share and repurpose their data, but taxonomists will need to show their agility. The investment in these skills will be small compared to the resulting ability to adapt and use a company’s information assets.

©May 2012    Marlene Rockmore

What next, taxonomy?

Can anyone learn what is happening in a field by following a conference on Twitter? That’s how I decided to follow Taxonomy Bootcamp 2012. I missed networking with a new generation of confident, well-trained taxonomists, but nevertheless I will attempt to identify the themes and challenges that are facing taxonomists. Please feel free to add to the list or dispute it!

  1. Centralized models don’t scale; think federated, allow local variation.

Taxonomists have jobs because organizations value managing their resources with a common vocabulary, but how the architecture and governance of the vocabulary map to the internal structure of an organization is harder to understand. Speakers urged attendees to allow distributed models that accommodate local variations and make use of local project vocabularies. Using term sets (facets) helps ease taxonomy management. As an example, one tweet about the talk given by Gary Carlson and Pam Green relayed that Microsoft had 23 term sets for its intranet; the most complex was products, and the simplest was confidentiality. This structure helps in setting up access control, so that ownership groups can manage these vocabularies at the group and/or term level.

2. Do social media and information sharing have a role in taxonomy, and how does that square with governance, security, control, and confidentiality?

This topic generated some buzz and sets up an interesting dialectic between information access and security. @syndetic tweeted that “HR sees social media as a time waster, and IT sees it as a security problem.” So why does social media matter to taxonomists? The main point is that information is ambiguous and confusing, and that social media helps generate, identify and clarify relevant, lively, topical content now that can be curated later (as opposed to the traditional model of curating first, then sharing). AprilMunden summed up the argument for social media nicely in a tweet: “using folksonomy (user generated taxonomy) should definitely be a part of your initial design AND ongoing governance plan.” Seth Earley, always ready with a witty insight, said: “Fast changing, requires expertise, ambiguous concepts … business can own; slow to change, more security, unambiguous, let IT own that.” Maybe that’s a strategy for starting the conversation about the business value of tagging and social media.

3. Why a non-expert taxonomist needs to build the taxonomy

“Experts swim in the deep end of the content.” Tom Reamy made this statement, which set off a flurry of sympathetic tweets. Tom clarified that experts know their content and process but can get mired in details and in their own point of view. Tom’s statement is provocative because it clarifies the role of the taxonomist, which is to:

  • Apply social listening skills and analytic methods to categorize the diverse needs of a wide range of users
  • Build a well-structured faceted taxonomy that reflects the needs of different constituencies and fits with the governance structure of the organization (see point 1)
  • Validate and test the taxonomy
  • Assist with application integration
  • Establish ongoing governance processes, including the use of social media and text analytics, to keep the taxonomy relevant

A taxonomy needs to reflect multiple user needs, which means allowing non-experts to explore topics before they go in for a deep dive. Taxonomists need to work with experts and gain their support by engaging them in validating the taxonomy.

4. What about Taxonomy, the User Interface, Big Data and Mobile Apps?

Taxonomists sit at an intersection between data, digital assets and content on one side and user interfaces on the other, but it’s not clear how taxonomists apply their skills there. They are not graphic designers or UX experts, and not quite database experts. A few sessions mentioned data visualization and other graphic techniques for exploring content and data, such as mashups of datasets, grids, wheels, graphic organizers, or maps. This is an area where taxonomists, who are not as visually oriented, need to rethink their approach and start to think of themselves as information artists. Taxonomists can advocate for improved techniques, including standards that organize the taxonomy/metadata framework, and for tools that make sharing across applications, organizations and platforms more efficient, which brings us to point 5.

5. Taxonomy tools must make it easier to import and export vocabularies

Taxonomists know that their vocabularies need to play well with all the applications above, as well as serve other needs such as cross-organization information access across SharePoint, XML, relational databases, and legacy products. Taxonomies work in part because they are agnostic: they can work with any number of technologies, because concepts and metadata are kept separate from the content. To play well with others, taxonomy tools need to support import and export of vocabularies in different standards, including SKOS or XML. As KarlaTR tweeted, “If you put something in a taxonomy, you’ve got to get it out.” One option is to try tools that were marketed at Taxonomy Bootcamp, such as TopQuadrant EVN and Smartlogic, which have been ported from the ontology world and now sit alongside Synaptica and Information Access in the tool evaluation process.
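
To make the “get it out” requirement concrete, here is a minimal round-trip sketch using Python’s standard `xml.etree.ElementTree` module. The XML layout, the `export_xml` and `import_xml` helpers, and the sample term set are all made up for illustration; a real exchange would more likely follow a standard such as SKOS than this ad hoc format.

```python
import xml.etree.ElementTree as ET

# Hypothetical term set: preferred label -> list of narrower terms.
term_set = {"Products": ["Hardware", "Software"], "Confidentiality": []}


def export_xml(terms):
    """Serialize a term set to a simple, made-up XML layout."""
    root = ET.Element("taxonomy")
    for label, narrower in terms.items():
        concept = ET.SubElement(root, "concept", {"label": label})
        for child in narrower:
            ET.SubElement(concept, "narrower", {"label": child})
    return ET.tostring(root, encoding="unicode")


def import_xml(text):
    """Rebuild the term set from the XML produced by export_xml."""
    root = ET.fromstring(text)
    return {
        concept.get("label"): [n.get("label") for n in concept.findall("narrower")]
        for concept in root.findall("concept")
    }


xml_text = export_xml(term_set)
restored = import_xml(xml_text)
print(xml_text)
```

A useful acceptance check when evaluating a tool is exactly this round trip: export a vocabulary, import it elsewhere, and confirm nothing was lost.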

So what next, taxonomy? It is nice to hear that more taxonomists are surviving because their organizations understand their core roles. The emerging topics and challenges are how to distribute and decentralize (localize) while keeping authority and control, how to collect new content on emerging, current topics, how to visualize, how to be more agile, and how to fit in with new technologies like social media, mobile, and big data. Phew! That’s a challenge. Taxonomists have a chance to build relationships not only between terms but also with stakeholders, on the way to a compelling, visualized, multidimensional content strategy. Good luck.

Deconfusing Healthcare through Taxonomy Inquiry

This winter, I had an opportunity to participate in an information research team that interviewed top executives in health care in Massachusetts. They included the CEOs of insurance companies, regulators from the Attorney General’s office, and medical directors of major medical networks and hospitals. The goal of this project was to understand one term, “Cost Containment”: what are the drivers of rising health care costs, and what can be done to slow the rate of growth?

When someone with taxonomy skills participates in these types of investigations, it is hard not to put those taxonomy skills to work. What did I learn from this process that might be applicable to best practice and to understanding health care cost containment?

1) Start with a  simple but important question  as a guide for developing deeper knowledge

This group started with the question “What is cost containment?” It is a fairly fundamental question, since we in Massachusetts are fortunate to have near-universal coverage (about 97%) but still need to control costs. By asking this fundamental question, the group could collect basic facts from each key player on the same topic and understand how proposed strategies are defined from the point of view of the key players who are shaping policy.

2) Get to know the cast of characters

Remember the adage that the key to a baseball game is to know the players; the same applies to understanding a complex issue. We need to know who the users are and what brought them to these meetings. It is critical to identify the constituencies in healthcare, all of whom have different goals in any situation. The key actors we identified were:

  • Insurers (also known as Payers)
  • Providers (Hospitals, Doctors, Specialists)
  • Regulators (government, legislature, attorney general)
  • Consumers (includes business owners, patients, local government)
  • Purchasing agents (people who buy insurance for large groups — government, business, insurance agents)

The above list is the top level of the Actors/Player facet, which breaks down further. Insurers, for example, are further categorized by company, corporate structure (profit/non-profit), and market share. Not all the groups under these broad headings share characteristics. For example, we rarely saw a “specialist” at a meeting on cost containment, but other types of medical personnel, including primary care, psychiatry, and behavioral medicine, were well represented because they, as a group, have lower reimbursement and higher volume than specialists. Grouping does not mean all values are inherited, hence the need to understand power relationships and attributes.
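
The idea that a facet groups terms without forcing them to inherit each other’s values can be sketched as a small tree in which every node carries its own attributes. The `Term` class, its method names, and the attribute values are illustrative assumptions based loosely on the observations above, not the model our team actually used.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A node in a facet; attributes belong to this node only."""
    label: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a narrower term and return it for chaining."""
        self.children.append(child)
        return child

    def walk(self, depth=0):
        """Yield (depth, term) pairs, parents before children."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


actors = Term("Actors")
providers = actors.add(Term("Providers"))
# Attribute values are illustrative only.
providers.add(Term("Specialists", {"reimbursement": "higher", "volume": "lower"}))
providers.add(Term("Primary care", {"reimbursement": "lower", "volume": "higher"}))
actors.add(Term("Insurers"))

for depth, term in actors.walk():
    print("  " * depth + term.label, term.attributes)
```

Note that `Providers` itself has no reimbursement attribute: the values live on the narrower terms, which is why two siblings in the same group can differ.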

3) Understand the power relationships

Some actors have more power and are core to the discussion. Insurers and providers have a closer affinity, for example, while consumers, including employees, businesses and local government entities, tend to have little or no power in these relationships. Hospitals and specialists have more power than primary care and behavioral medicine. Understanding these internecine wars within health care is key to understanding the core relationships and who is outlying. The health care debate is in part about how to give outliers more power and equity in the health care process. The most outlying voices of all are those of patients and consumers. Theoretically, in new models of health care, their voice is supposed to be represented by larger purchasing pools that can negotiate for better service at less cost.

4) Identify  the key cost drivers —  Isolate the attributes 

The hardest part of this work is to isolate the variables/attributes  or cost drivers, and understand how each group contributes to improving these practices.  These are topics that should be of mutual concern but that are  not universally understood and standardized.  Examples of cost drivers included:

  • Use of and dissemination of best practices (end-of-life care, chronic diseases)
  • Use of Technology
  • Number  and Variety of Insurance Plans
  • Cost of drugs
  • Reimbursement rates
  • Risk Management (use of defensive medicine, malpractice, high-risk pools)

Each of these attributes needed to be further understood from the perspective of the key players to see how it contributes to cost. For example, Massachusetts has an excellent universal health care law, under which consumers can choose from about 18 different plans through the Connector; in addition, there are public, private and individual plans, resulting in over 16,000 different plans. Some cost containment could be achieved by having a “shared minimal contract” that sets a high standard of care and captures the essence of basic wellness. To do this, the players and consumers need to find a common language for describing conditions and coverage.

5) Capture the AS IS Definitions.

Since these conditions and coverage are not standardized, it is useful to understand what the current status is. Understanding AS IS definitions helps to capture the many disconnects between groups. For example, while consumers argue about the cost of deductibles, insurance companies might spend more money in order to reduce the high cost of hospitalization. The result is like a balloon filled with water: one end gets leaner, while more pressure is put on the other end of the balloon, the consumer. Capturing the cacophony, instead of the symphony, turned out to be the most valuable part of the work. We discovered we did not have to reach a common understanding; the task was to capture the current status and its impacts.

6) Read background content

In addition to understanding the “cast and drivers,” it is also important to read studies and literature to keep a broad and balanced perspective. Being in rooms with charming and knowledgeable power players can be quite intoxicating, but to keep it honest, we needed to keep reading and to ask honest questions about what each player stood to gain from advocating a certain program. Spending a few hours each week on literature reviews, books, articles, and podcasts on general health care was very important for building our group and individual knowledge base and for developing our facility in the terminology of health care economics. We used reading to compare health care models in other countries (Taiwan, Switzerland, Japan, Canada, Germany, UK, France, and US) and to understand multiple models of healthcare delivery.

7) Capture concepts in simple diagrams

Even within our small, random data collection group, understanding could be quite diverse. Using simple diagrams to capture concepts turned out to be a powerful shared way to come to a common understanding. Bubble maps, graphs, hierarchical diagrams: any visual was useful to clarify information.

8)  If any term is hard to explain with a simple sentence, it probably deserves a taxonomy

“Cost containment” is not trivial, but it is important to understand. And it is almost impossible to explain without learning something about the healthcare system. It is worth the time and effort to create a taxonomy to define the information space or information void, because a void gets filled by misunderstanding or misinformation.

Developing a consumer-focused taxonomy for navigating health care turns out to be valuable work, but it is hard to sustain without a dedicated team and sustained funding. A consumer-focused taxonomy would help navigate the health care debate and could be used by all actors, including insurers, providers, governmental entities and consumers, who want to share information with a confused but curious public.

~ Marlene Rockmore

2010 in review

The stats helper monkeys at WordPress.com mulled over how this blog did in 2010, and here’s a high level summary of its overall blog health:

Healthy blog!

The Blog-Health-o-Meter™ reads “This blog is on fire!”

Crunchy numbers

A Boeing 747-400 passenger jet can hold 416 passengers. This blog was viewed about 8,400 times in 2010. That’s about 20 full 747s.

In 2010, there were 14 new posts, growing the total archive of this blog to 27 posts. There were 17 pictures uploaded, taking up a total of 2mb. That’s about a picture per month.

The busiest day of the year was August 19th with 255 views. The most popular post that day was Skills of a Classy Taxonomist.

Where did they come from?

The top referring sites in 2010 were taxonomy2watch.blogspot.com, taxonomystrategies.com, en.wordpress.com, hedden-information.com, and google.com.

Some visitors came searching, mostly for taxonomy blog, building ontology blog problems, thetaxonomyblog.wordpress.com, taxonomy blogs, and topbraid composer.

Attractions in 2010

These are the posts and pages that got the most views in 2010.

  1. Skills of a Classy Taxonomist, August 2010 (1 comment)
  2. Taxonomies and Modeling, January 2010 (1 comment)
  3. Taxonomy Accordion in Drupal, October 2010
  4. Is GoodRelations a Game Changer?, June 2010 (1 comment)
  5. Facets=Classes=Sets, September 2010

External User-facing Search and Taxonomies

Search has become faster, cheaper and more intelligent since the days of inverted single-word engines, so why not just use a search engine? Why bother with taxonomy? Let’s briefly revisit what search is supposed to do. A search engine needs to make a pretty good guess at what a user wants to find, an unspoken intent expressed in staccato keywords. Search then needs to match the user query with some content (documents, data, digital objects, people with expertise, ad serving, product information) and let the user take some action such as read, buy, forward, share, comment, or browse. Sometimes the match is exact, sometimes the query terms are a partial match, and sometimes there is no match.

In other words, search is not a perfect art. The search engine is not the only term in this equation; the quality of the content and of the query matter too. Good search needs good content, no matter how great the technology.

Someone responsible for search implementation has limited control over two of the key ingredients of search: the technology and the content. This is why taxonomy plays a role: it can help describe concepts that are not in the content or in the metadata about the content. (Metadata is particularly useful for digitizing non-digital objects.) Taxonomy is not always necessary. If you can write custom content with very precise vocabulary using Search Engine Optimization (SEO) techniques, you might not need a taxonomy. But some documents, such as emails or reports, cannot be altered; altering them would be a significant protocol violation, or even illegal. So when is search not enough?

1)  Developing effective measures to assess when search is not enough – the 80/20 rule

As part of some of my early work in faceted taxonomies, I spent some time at MIT on a research project that compared results from a system based on search engine technology alone with results from one where the query could be enhanced by adding taxonomy terms. For this experiment, we had the advantage of using a system, the brainchild of Wendi Pohs, in which two search engines using the same technology processed similar documents and were exposed through a user interface with a simple search box like Google’s. One engine processed news feeds. These feeds were added quickly with no intervention and loaded directly into the search engine. What our research found was that a search engine without a taxonomy, left unattended, flatlined. Recall never improved beyond 75-80%. Lee Romero, who is a keen observer of search, has recently done an excellent blog post observing this same flatline phenomenon.
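
To show the kind of measurement that reveals a flatline, here is a minimal sketch that computes what share of logged queries find at least one indexed term. This definition of the rate, the function names, and the sample data are assumptions for illustration; they are not the method or the data from the MIT project.

```python
def split_queries(query_log, index_terms):
    """Split logged queries into matched and unmatched lists."""
    indexed = {term.lower() for term in index_terms}
    matched, unmatched = [], []
    for query in query_log:
        words = query.lower().split()
        bucket = matched if any(word in indexed for word in words) else unmatched
        bucket.append(query)
    return matched, unmatched


def match_rate(query_log, index_terms):
    """Fraction of logged queries with at least one word in the index."""
    if not query_log:
        return 0.0
    matched, _ = split_queries(query_log, index_terms)
    return len(matched) / len(query_log)


# Made-up sample data.
index_terms = ["taxonomy", "search", "H1N1", "drupal"]
query_log = ["taxonomy tools", "swine flu", "drupal module", "serach", "search tips"]

rate = match_rate(query_log, index_terms)
_, unmatched = split_queries(query_log, index_terms)
print(f"match rate: {rate:.0%}")
print("unmatched queries:", unmatched)
```

The unmatched list is the interesting output: misspellings and emerging terms show up there first, which is where an editor would look for candidate taxonomy terms.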

What to do when you want to do better than 80% and move the flatline

In the same experiment, we created a second engine, using the same software, with a taxonomy function that let us insert taxonomy terms into the index. These terms were selected from query logs and analytic reports: unmatched terms, misspellings, and abbreviations. Adding taxonomy terms had a cost, but it had no impact on the speed or performance of search, since both engines used the same search technology.

The taxonomy was divided into classes such as product, company, or subject. Each term was connected to other terms by user-defined cross-connections (associative terms), which made the system smart enough to infer other relevant terms. At least one of the terms in a linked set had to be tagged by an indexer. So, if a product was tagged, we could infer that the product was <made_by> a company, thus speeding the tagging process. Taggers could override the automated suggestions and/or add new rules, by the way. This way we could ensure exhaustive indexing at low cost and effort. The taxonomy-controlled section paid back this effort: a search on this section would recall content that matched user-query terms about 90% of the time. The taxonomy-controlled part of the database could also keep being improved. We also worked hard to acquire content, good content, in many formats that would improve the quality of the database and thus what goes into an engine.
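
The inference step, where one manual tag leads to suggested tags through associative links, might look roughly like the sketch below. The link table, the relation names other than `made_by`, the helper functions, and the product and company names are all hypothetical; the original system’s rules are not documented here.

```python
# Hypothetical associative links: (term, relation) -> related term.
links = {
    ("WidgetPro", "made_by"): "Acme Corp",
    ("Acme Corp", "located_in"): "Boston",
}


def suggest_tags(tagged, links):
    """Follow associative links from indexer-assigned tags to suggested ones."""
    suggested = set()
    frontier = list(tagged)
    while frontier:
        term = frontier.pop()
        for (source, _relation), target in links.items():
            if source == term and target not in suggested and target not in tagged:
                suggested.add(target)
                frontier.append(target)
    return suggested


def final_tags(tagged, links, rejected=()):
    """Combine manual tags with suggestions, letting the tagger reject some."""
    return set(tagged) | (suggest_tags(tagged, links) - set(rejected))


print(final_tags({"WidgetPro"}, links))
print(final_tags({"WidgetPro"}, links, rejected={"Boston"}))
```

The `rejected` argument stands in for the tagger’s override: suggestions are a starting point, and a person still has the last word.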

By using reports, tools and measurements, we were able to proactively add equivalents and monitor emerging terms.    Dips in performance triggered action to understand what was changing in the user’s world – was it query terms, a search for emerging content, or other unmet needs.

Errors were due to 1) missing content, 2) the wrong application, 3) new terms or spelling errors that could be quickly added to the taxonomy, and 4) new and emerging trends that users were identifying but that had not yet been captured in the taxonomy. All of these issues could be identified and corrected. For example, in the recent flu season, search engines would eventually learn that H1N1 was the preferred term for Swine Flu, but in some cases it was much easier for a trained taxonomy editor to make this connection surgically (especially in a fast-moving news and business cycle). In a search-engine-only scenario, these errors are not always identifiable or actionable.
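
The H1N1 example amounts to an editor maintaining a table of equivalent terms that is applied to incoming queries. A minimal sketch, assuming a plain dictionary as the equivalence table and two hypothetical helpers, `normalize` and `add_equivalent`:

```python
# Hypothetical equivalence table maintained by a taxonomy editor:
# variant (lowercase) -> preferred term.
equivalents = {
    "swine flu": "H1N1",
    "h1n1 flu": "H1N1",
    "h1n1": "H1N1",
}


def clean(text):
    """Lowercase a phrase and collapse repeated whitespace."""
    return " ".join(text.lower().split())


def normalize(query, equivalents):
    """Map a user query to its preferred term when an equivalent is known."""
    return equivalents.get(clean(query), query)


def add_equivalent(equivalents, variant, preferred):
    """Editor action: register a new variant spotted in the query log."""
    equivalents[clean(variant)] = preferred


print(normalize("Swine Flu", equivalents))
add_equivalent(equivalents, "Swine Influenza", "H1N1")
print(normalize("swine influenza", equivalents))
```

Adding one row to the table is the “surgical” fix: it takes effect immediately, without waiting for the engine to learn the connection from usage.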

Set realistic goals and explanations for what taxonomy can do

ROI discussions often start or end with claims like “Taxonomy can increase sales by improving conversion” or “lower costs.” Here are a few reasons that might be more honest and even more compelling:

Help with Ambiguity and add Precision, Use Faceted Navigation: Search engines have a hard time differentiating between key concepts and terms. I remember the early days when a term like “ASK” would bring a search engine to its knees because it couldn’t tell the difference between the name of a system command and a computer company. By sorting terms into facets, we could help differentiate and resolve ambiguity by navigating the user to the right facet and by tagging more precisely. A developer looking for information on Java applications shouldn’t be sent to Java the island. Taxonomy can help keep users searching down paths that are likely to lead to useful results. That’s productivity.

Implement Universal Search: A taxonomy can be implemented independently from the content, which means it can be used across content types (blogs, videos, email), creating a common set of concepts from which to generate user-centered search. That’s efficiency and smart use of limited resources. You need to have common metadata or RDF to take advantage of universal search, but there are standards such as Dublin Core that can help jumpstart that conversation.

Think Scalability and Reuse: Taxonomy can be used across applications, which means a central, faceted taxonomy can be reused by other applications. The best practice, however, is to create smaller taxonomies that are divided into homogeneous facets. Designing monolithic, spaghetti-like taxonomies will, in the end, create more work and bad inferences, and will sour you on the whole project. Reuse and scalability avoid redundant effort. That’s cost savings.

Use Taxonomies to Manage Change: Since taxonomy is independent of the content, you can change the concepts in the taxonomy without impacting the content. Taxonomies are NOT static. For example, many organizations need to change organizational names. These names can be subsumed in a taxonomy without impacting the existing content. It’s a safer and more secure way to handle change.
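
One common way to get this independence is to tag content with stable concept identifiers and keep the names in the taxonomy only. The sketch below assumes that design; the identifiers, names, file names, and the `rename` and `find` helpers are made up for illustration.

```python
# Content is tagged with stable concept IDs, never with display names.
# IDs, names, and documents below are made up for illustration.
taxonomy = {"org-001": {"preferred": "Information Services", "former": []}}
content_tags = {"report-2009.pdf": ["org-001"], "memo-17.txt": ["org-001"]}


def rename(taxonomy, concept_id, new_name):
    """Change a concept's preferred name, keeping the old one as an alias."""
    concept = taxonomy[concept_id]
    concept["former"].append(concept["preferred"])
    concept["preferred"] = new_name


def find(taxonomy, content_tags, name):
    """Find content by a current or former name of a concept."""
    ids = {
        concept_id for concept_id, concept in taxonomy.items()
        if name == concept["preferred"] or name in concept["former"]
    }
    return sorted(doc for doc, tags in content_tags.items() if ids & set(tags))


tags_before = {doc: list(tags) for doc, tags in content_tags.items()}
rename(taxonomy, "org-001", "Digital Services")
print(find(taxonomy, content_tags, "Digital Services"))
print(find(taxonomy, content_tags, "Information Services"))
```

After the rename, the tags on the documents are untouched, and a search on either the old or the new name reaches the same content.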

Create a technical and cost plan to integrate taxonomy while maintaining speed and performance, and not adding to overhead costs.

Implementing taxonomy within search can be done at various price points. A solution like Vivisimo is not within every budget, but there are low-cost and effective alternatives. Here are some technical considerations in adding taxonomy:

  • You don’t need a high-end faceted navigation tool to get the benefits of faceted navigation. Faceted navigation allows a user to narrow, broaden or expand a query at the time of search. This can be done in many CMS systems, including Drupal. WordPress, which is what I use for this blog, has a taxonomy module and allows multiple authors.
  • Add custom fields or metadata for tagging that can be loaded into the search engine to improve search (as SOLR does).
  • If you have the budget and the requirement for high throughput, as in auto-classification and text analytics with nStein, Teragram or Vivisimo, then taxonomy is still very useful for improving the precision of results and for making collections within document sets.

The bottom line is that whichever search engine you use, you should be confident that 80% of the time the user will get what they want. If you need to find ways to improve the user experience beyond that, taxonomy is one highly viable, low-cost and effective option. Taxonomy might be worth a look as a way to insert a pacemaker into the heart of a search engine that seems to have flatlined.

Once you have a backbone with classy taxonomies and metadata, you can then proceed to the creative activity of beautiful designs of navigation paths for your end users.  But keep your eye  For more on search and taxonomies, see also my prior book review of Peter Morville’s  Search Patterns.

~ Marlene Rockmore

Taxonomy Accordion in Drupal

The open source community at Drupal is quickly catching up on how to use its taxonomy module. The latest code module creates a Taxonomy Accordion, aka faceted navigation. What Taxonomy Accordion shares with faceted navigation is good: it lets a user know at a glance what a website is about, how to find information, and also what won’t be found.

A Taxonomy Accordion does more than a faceted navigation (plus Taxonomy Accordion is a great name):

  • Using color and shade, you can graduate the color display so parent terms have one shade and the children have another shade
  • Hierarchies collapse and expand much like a venetian blind or an elegant fan
  • The code is modular and can be integrated as part of the Drupal Taxonomy module

But, as with other open source, there is a requirement to plan and invest in the work that goes on “under the covers.” Here are some of the dirty little secrets, the “work” that tunes a taxonomy accordion or any faceted navigation:

  • Pay attention to user-centered design and validation: The fundamental choice of categories has to make sense to users. Even if you make up an initial set of categories, use a validation process to ensure that the taxonomy makes sense to users. Validation is a two-step process. Part one is an open process, sometimes called an open card sort, where terms are collected from users, content, and other sources, and then organized into a draft of the taxonomy. Part two is a closed process: users are asked to find content using the navigation scheme, to test whether the classes and hierarchy are useful or need to be refined. More importantly, by making validation part of the plan, you become more aware of and attentive to user needs.
  • Use this opportunity to improve tagging and metadata management: Content has to be tagged with terms from the taxonomy, so you need a back-end business process and a metadata store (for example, a database design) to hold the tags and pointers to the associated content. This back-end metadata record can also help in optimizing your search engine, especially an engine that supports faceted search, such as Solr.
  • Understand restrictions and attributes: Some facets are not larger super-classes but attributes (sometimes also called “slot facets” or “datatype properties”) that are used to restrict or narrow a search. In an ecommerce application, these restrictions might be facets such as “Measurement, Color, Availability.” In a content or digital asset application, the restrictions might be “Content Type, Publication Date, Format.” Grouping these terms helps to reduce permutations and complexity in interface design and in writing queries.
  • Foster distributed environments and local control: This is hard to understand, but faceted design is not authoritarian. If the faceted design is based on user needs and a validation process, then it is likely to reflect shared values. It still allows local organizations to develop and manage their own information, and it makes it easier to map that information to process and workflow. For example, a music company might have all its artists map their music to shared facets such as genre. A local social service agency might be asked to map its services to a common public service metadata scheme. Allowing local agencies to update their metadata, tag content, and suggest terms for the taxonomy is a great way to identify user needs and changing requirements.
  • Change and improve: Once categories are established, a change management process needs to be in place to monitor user queries and make sure that the categories and terms remain current and useful. Setting baseline thresholds, or vital statistics (to be discussed in next month’s post), can help in recognizing changing markets, technologies, or user needs.
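
The tagging and restriction points in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Drupal or Solr code: the content records, the facet names, and the `filter_by_facets` and `facet_counts` helpers are all hypothetical.

```python
from collections import Counter

# Hypothetical content records: each item carries one tag per facet.
ITEMS = [
    {"title": "Annual report", "Content Type": "Report",
     "Format": "PDF", "Publication Date": "2010"},
    {"title": "Logo pack", "Content Type": "Digital asset",
     "Format": "PNG", "Publication Date": "2009"},
    {"title": "Travel policy", "Content Type": "Policy",
     "Format": "PDF", "Publication Date": "2010"},
]


def filter_by_facets(items, selected):
    """Keep only the items whose tags match every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(items, facet):
    """Count how many items carry each value of a single facet."""
    return Counter(item[facet] for item in items if facet in item)


# Narrow the collection by two restriction facets at once.
pdfs_2010 = filter_by_facets(ITEMS, {"Format": "PDF", "Publication Date": "2010"})
print([item["title"] for item in pdfs_2010])

# Counts like these are what a faceted interface shows next to each choice.
print(facet_counts(ITEMS, "Format"))
```

Because each restriction is just a key and a value, adding a new facet means adding a tag to the records, not rewriting the query logic.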

An open source faceted navigation should allow implementation at a lower cost. Even with an open source solution like Drupal, which offers flexible options, it pays to invest some attention in understanding the taxonomy business process, because it will lead to a more efficient implementation and a more efficient back-end process.

The Return on Investment (ROI) justifications include not only user interface improvements (fewer clicks to the right content) but also programming cost efficiencies, such as simpler back-end queries. Validation work overlaps with the work of marketing and customer relations, so consider integrating taxonomy validation and governance into existing work processes. Some organizations roll taxonomy management into a knowledge management function that oversees the entire process, from organizing knowledge categories to managing content acquisition and monitoring.

Drupal’s development community has some very sophisticated features that will be available in the upcoming years including ways to visualize and cluster linked data, using RDFa.   Developing faceted navigation and taxonomies is a great way to get ready for an exciting future of visually interesting interfaces that better help users find and share information in complex organizations.

Don’t let the simplicity of the Taxonomy Accordion fool you. Use the accordion as an opportunity to understand user needs and how users look for information, and to make the underlying production, tagging, and databases more efficient and more focused on user needs and high-quality information.

~ Marlene Rockmore

Facets=Classes=Sets

I just returned from an intense training in semantic web technologies through TopQuadrant, and I learned much more about what goes on “under the covers.” The course explained more about how semantic technologies can generate machine-to-machine applications. One important lesson was that facets are similar to classes, which in turn are similar to the mathematical idea of a set. This post discusses why taxonomists and programmers need to think of classes, facets, and sets as similar ideas.

Using semantic tools requires building a conceptual model, which is a collection of classes. Building useful models that are semantically enabled requires learning the basic semantic toolkit:

  • RDF (Resource Description Framework). In RDF, one describes individual members of a class, and designs relations between individual members and between classes. Two related specifications come up often: RDFa, which embeds RDF in web pages, and RDFS (RDF Schema), which can be used to define the ontology (concept mapping) as a schema that represents the underlying data. RDF data forms a directed graph, and every edge in the graph can be written as a triple (subject, predicate, object). Using RDF tools, one can read in a data store such as a spreadsheet and quickly generate a starter taxonomy (which still needs to be validated with use case scenarios).
  • SKOS (Simple Knowledge Organization System) expresses traditional taxonomies in RDF. SKOS handles basic thesaurus-type relations such as broader/narrower concepts, preferred and alternative labels, and related concepts. In SKOS, each concept, including a related concept, has its own uniform resource identifier (URI). SKOS by itself describes a concept only through these labels and relations; richer class modeling is left to OWL.
  • SPARQL is a specialized query language designed to query triple stores. A semantically enabled application is one whose data can be converted into an RDF graph, which can then be displayed visually and queried using SPARQL.
  • OWL (Web Ontology Language) is the underlying language for describing models. OWL is required to handle more complexity, such as restrictions, cardinality, and inferencing.

Almost everything conceptual in RDF, SKOS, and the underlying modeling language OWL, once you get under the covers, will be familiar to taxonomists. Some details can confuse you, but don’t let the lack of consistent naming conventions deter you. For example, a class defined in OWL is an owl:Class, and every individual belongs to the top-level class owl:Thing, while a class defined in RDF Schema is an rdfs:Class. Oh well, confusing, but don’t let that deter you from appreciating the power of this approach. Whatever the name, it is still a class, which is similar to a facet.

Here are some examples of how OWL and taxonomies are similar. Each line starts with the OWL or RDFS term.

subClassOf (rdfs:subClassOf) defines a narrower term in a set.

inverseOf (owl:inverseOf) creates reciprocal relations.

Transitivity (owl:TransitiveProperty) allows navigation of a hierarchy: if A is narrower than B, and B is narrower than C, then A is narrower than C. A SPARQL query that chains through a hierarchy can potentially consist of 2 lines.

Restrictions (owl:Restriction) are similar to slot facets or attributes, which are properties that limit the set.
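
To make the transitivity point concrete, here is a minimal Python sketch of what chaining through a hierarchy means. It is not SPARQL or OWL tooling: the class names and the `ancestors` helper are hypothetical, and for simplicity each class is assumed to have a single parent.

```python
# Hypothetical subClassOf statements, stored as child -> parent.
SUBCLASS_OF = {
    "Bebop": "Jazz",
    "Jazz": "Music",
    "Music": "Creative Work",
}


def ancestors(term, subclass_of):
    """Follow subClassOf links upward and collect every broader class."""
    found = []
    seen = {term}
    while term in subclass_of:
        term = subclass_of[term]
        if term in seen:  # guard against a cycle in messy source data
            break
        seen.add(term)
        found.append(term)
    return found


# Bebop is never stated to be a kind of Music, but it follows from the chain.
print(ancestors("Bebop", SUBCLASS_OF))
```

A reasoner or a SPARQL engine with path support does this walk for you; the sketch only shows why a clean, single-parent hierarchy makes the inferred answers predictable.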

Here are some reasons to utilize classes in semantic technologies as a best practice.  Without implementing classes and modeling, these outcomes would be hard to achieve:

Form follows function: Instead of designing big monolithic hierarchical taxonomies, think in terms of classes or facets, which are groupings of individual members in a set. These smaller, faster sets will be easier to export, import, edit, and share. Perhaps facets should be called fast sets, or fasets! Plus, the facets (classes) can become fields in a web form. The possibilities for reuse and design open many options.

Scalability and Reuse: Since concepts and the associated classes are independent of data and content, the concepts and classes can be changed, such as changing an organization name, renaming key terms, or adapting new ideas, without changing underlying queries and systems architecture. This is scalable.

Change Schema Without Changing Content: Conceptual mapping can be developed independently, and designed and changed in RDF Schema or OWL, without changing the underlying data.

Precision: Because an individual concept can be easily manipulated as a member of a set, or of multiple sets, the concept can have a more accurate definition. For example, take a term like “Chevy Chase.” By associating “Chevy Chase” with the class Person, one can distinguish Chevy Chase, the comedian, from Chevy Chase, Maryland, which belongs to the class Location. Furthermore, ideally each unique concept of Chevy Chase would have its own uniform resource identifier (URI). The concept is created independently of the content and is not tightly coupled into a hierarchy, yet it associates in a clear way with the appropriate facet or class. The same logic can be applied to more amorphous, squishy terms like “Compensation,” “Performance,” “Management,” or “Quality,” which can be deconstructed into more specific variants like “Executive Compensation” vs. “Non-exempt Pay and Benefits.” RDFS can be used to link to the more appropriate term with a unique URI.
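
The “Chevy Chase” example can be sketched with plain tuples standing in for triples. This is an assumption-laden illustration: the example.org URIs, the simplified `label` and `type` predicates, and the `resolve` helper are all hypothetical, not part of any real vocabulary.

```python
# Hypothetical URIs under example.org; a real project would use its own namespace.
TRIPLES = [
    ("http://example.org/id/chevy-chase-actor", "label", "Chevy Chase"),
    ("http://example.org/id/chevy-chase-actor", "type", "Person"),
    ("http://example.org/id/chevy-chase-md", "label", "Chevy Chase"),
    ("http://example.org/id/chevy-chase-md", "type", "Location"),
]


def resolve(label, class_name, triples):
    """Return the URIs that carry this label and belong to this class."""
    labelled = {s for s, p, o in triples if p == "label" and o == label}
    typed = {s for s, p, o in triples if p == "type" and o == class_name}
    return sorted(labelled & typed)


# The same label resolves to different concepts depending on the class.
print(resolve("Chevy Chase", "Person", TRIPLES))
print(resolve("Chevy Chase", "Location", TRIPLES))
```

The label is shared, but each concept keeps its own identifier, so tagging and retrieval can stay precise.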

Facilitate Linked Data: If taxonomies and data can be shared, it is faster to build serious applications that can solve real and acute problems. In our class, we built applications that mapped which free wifi hot spots were next to swimming pools and taquerias in a geographic location, but we also built a serious social policy application that mapped cities in the United States with increases in complaints about housing discrimination based on sexual orientation, national origin, race, and other factors, taking data from multiple reputable sources and applying a common conceptual model.

There are some new challenges for taxonomists, especially in understanding the importance of inferencing. The experience of developers who work with OWL is that many inferencing errors can be traced back to bad, messy taxonomies in which a concept has too many broader terms. In other words, avoid complex polyhierarchies.

To create taxonomies that are ready for the semantic future, the better practice is to arrange concepts into facets (which can be equated with classes or sets) and to avoid complex polyhierarchies (a concept with too many parents). This will allow taxonomies to play well with applications such as user interface design and machine-readable applications. The first step is to stop thinking about a taxonomy as a monolithic hierarchy, and instead to look at it as a collection of classes (or facets), where a class is a set with individual members. If models and taxonomies can be easily built and used to connect data across worksheets and resolve issues, applications based on linked data can be built quickly.

To try semantic tools such as SKOS editors, download a trial copy of TopBraid Composer Free Edition.

~ Marlene Rockmore

Skills of a Classy Taxonomist

At SemTech in June 2010, several speakers, including Professor Deb McGuinness, drew a very clear line between what a taxonomist does and what an ontologist does. Taxonomists build hierarchies, and ontologists determine classes or categories. In other words, ontologies are neat and unambiguous, and taxonomies are a bit messy.

Defining classes, or ontology work, typically precedes building the taxonomy. Defining the classes is like writing a specification for the taxonomy; in fact, defining classes is the same as defining facets. The goal of taxonomists and ontologists should be to define a specific, unambiguous description of a term that helps manage how we find and organize content, so the pathways are clear and specific; adding an ontology ensures that the term is placed in the most specific categories, which reduces ambiguity. I would argue that no taxonomy is useful unless it is faceted – that is, has been divided into classes. Taxonomies work best when their terms share homogenous properties, and when they are smaller and focused.

By using class analysis, or facet analysis,  several problems are solved:

1)       Clarify specific terms by situation or function: If I am interested in Java as a programming language, I want to see material related to Java as software, not as slang for coffee or as an island in Indonesia. If I am looking for “drill bits,” it might be important to understand whether the drill bits are for my home electric screwdriver or for an oil rig. Classes capture these distinctions, and help to create precise, specific tagging and information retrieval.

2)       Ease long-term maintenance issues: Christine Connors points to a simple but common example where taxonomies are built with people’s names included as narrow terms under a role, such as “Hillary Clinton” under “Secretary of State” or “Charles Windsor” under “Prince of Wales.” The problem is that when the people filling these roles change, there is a maintenance headache. A classy taxonomy recognizes that there is a separate class for <people> as an entity, as distinguished from <role>. <People> and <Role> can be connected by a predicate such as <has>. These distinctions are necessary for fast-changing information (such as who is dating whom in an entertainment application, or who owns whom in a business application).

Abstraction: <person> <has> <role>

Instance: <Hillary Clinton> <has> <Secretary of State>
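
The maintenance benefit of keeping people and roles in separate classes can be sketched as follows. Everything here is hypothetical: the sets, the statement format, the `reassign` helper, and the placeholder name “Jane Doe” are illustrations only.

```python
# Two separate classes: neither one is nested under the other.
ROLES = {"Secretary of State", "Prince of Wales"}
PEOPLE = {"Hillary Clinton", "Charles Windsor", "Jane Doe"}  # "Jane Doe" is a placeholder

# Statements of the form (person, "has", role) link the two classes.
statements = {
    ("Hillary Clinton", "has", "Secretary of State"),
    ("Charles Windsor", "has", "Prince of Wales"),
}


def reassign(current, role, new_person):
    """Return a new set of statements in which only new_person holds role."""
    kept = {s for s in current if s[2] != role}
    return kept | {(new_person, "has", role)}


# When the office holder changes, only one link changes.
updated = reassign(statements, "Secretary of State", "Jane Doe")
print(sorted(updated))
```

The role list and the people list are untouched by the change, which is the maintenance saving the example above describes.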

3)    Facilitate sharing  and importing taxonomies: Having taxonomies that are specified by a class description means the taxonomy will be more homogenous, have shared properties, and be more focused.  This will make it easier to import with less cleanup and review.  It will facilitate the use of SKOS for example. Messy taxonomies are harder to merge.

Anyone working with semantic technologies will tell you that most problems in inference happen when hierarchies in source taxonomies create odd associations by inserting a narrow or broad term. A taxonomist needs to be attentive to inferences in order to prevent false statements. Professor Deb McGuinness calls this issue “truth maintenance.”

To keep these categories clear and distinct, ontologists rely on building a conceptual model, or a picture of the domain (see the earlier post on Taxonomies and Modelling). Modeling draws on skills that most taxonomists already have: most taxonomists have been taught how to capture vocabulary and how to identify facets.

Elaine Kendall of Sandpiper Software, which makes a concept-modelling tool, suggested that “one could build an ontology in 2 hours.” With the new generation of tools that can create RDF/OWL from data and content, this statement might be true.

    With good modelling tools that automatically generate RDF/OWL, such as those from TopQuadrant, taxonomists might be able to slide into the needed role of ontologist. To extend their skills, taxonomists need to understand some basic concepts in RDF/OWL: what a class is, what a property is, what a slot facet is, what class inheritance is, what is meant by reciprocation and inverse properties, and how to write a SPARQL query. But more importantly, a classy taxonomist can become a facilitator who builds bridges between user and development communities and helps diagnose and prevent technical problems.

    A taxonomist who is trained in ontologies  should bring the following skills:

    • Create processes to identify the requirements for each class
    • Develop metrics to assess good results
    • Identify what vocabularies are needed, evaluate existing vocabularies, and import and adapt these vocabularies to current needs
    • Ensure the integrity and focus of vocabularies, particularly when they are sourced from an outside vendor
    • Develop processes to keep vocabularies current, and understand how to use metrics to “measure and improve” any vocabulary
    • Work as part of the development team to help identify whether a source vocabulary might be contributing to a false inference

    The taxonomist works with different user communities as well as developers and helps bridge the gap between what users and experts know and what is needed to build a useful application.   A classy taxonomist has a well-rounded set of skills that can work with development teams and user organizations to build intelligent systems.

    Is GoodRelations a Game Changer?

    One ontology worth watching might be GoodRelations, which is being implemented by Best Buy. GoodRelations, the central component of Best Buy’s architecture, was developed by Martin Hepp, who presented at SemTech in San Francisco last week via Skype from Munich, Germany. GoodRelations is a retail ontology that uses RDFa in XHTML webpages to populate a global ontology. But why would a major retailer use this architecture?

    Best Buy discovered that it was impossible to be the top dog in search engine optimization (SEO) in every search category for every product. To compete, they needed finely tuned individual pages. They also wanted to provide immediate content about “open box” (returned) items at local stores. They were looking for a solution that could add more granularity, precision, and localization, but still enable global search and have metadata that was controlled by the enterprise.

    GoodRelations is a retail ontology that offers facets or classes, metadata descriptions, and attributes that are common in the retail industry. It is expressed in RDFa, a way of embedding RDF in web pages. Yahoo SearchMonkey supports RDFa, Facebook’s Open Graph protocol is based on RDFa, and Google Rich Snippets also support RDFa.

    Because there is common metadata, it is easy for employees or customers (who are called “user agents” in the semantic world) to tag content via templates which populate the RDF.  RDFa can be maintained in a corporate or enterprise repository which can be configured as needed for distribution in the enterprise.

    In the GoodRelations RDF, the additional metadata might include price, color, dimensions, model, and other attributes that interest consumers. GoodRelations is an ontology that can be shared over any retail enterprise in any country. The cost per webpage, once implemented, is minimal because “user agents” are familiar with how to complete forms over the web. The RDFa can then be added to a page written in XHTML or HTML5. The HTML code for adding the specific metadata attributes is about 30-50 lines. This creates HTML that has more granularity than a typical <keyword> metatag. The high costs are in the metadata management.
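
    As a rough idea of what such a template might emit, here is a short Python sketch that renders one offer as an RDFa fragment using a few GoodRelations terms (gr:Offering, gr:name, gr:hasPriceSpecification, gr:UnitPriceSpecification, gr:hasCurrency, gr:hasCurrencyValue). This is not Best Buy’s markup: the `offering_rdfa` helper and the product values are hypothetical, and the exact pattern should be checked against the GoodRelations documentation before use.

```python
from html import escape

# GoodRelations namespace, bound to the "gr" prefix in the markup below.
GR_NS = "http://purl.org/goodrelations/v1#"


def offering_rdfa(name, price, currency):
    """Render one offer as an RDFa-annotated HTML fragment."""
    return (
        f'<div xmlns:gr="{GR_NS}" typeof="gr:Offering">\n'
        f'  <span property="gr:name">{escape(name)}</span>\n'
        f'  <div rel="gr:hasPriceSpecification">\n'
        f'    <div typeof="gr:UnitPriceSpecification">\n'
        f'      <span property="gr:hasCurrency" content="{escape(currency)}"></span>\n'
        f'      <span property="gr:hasCurrencyValue">{price:.2f}</span>\n'
        f'    </div>\n'
        f'  </div>\n'
        f'</div>'
    )


# Hypothetical product data, as it might arrive from a web form.
print(offering_rdfa("Open-box 32-inch TV", 249.99, "USD"))
```

    The person filling in the form only supplies the name, price, and currency; the template supplies the shared vocabulary, which is where the consistency comes from.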

    Adding RDFa as metadata to a webpage should be easy to adopt because it works in the current web paradigm. Google supports RDFa markup that can be added to a webpage through Google Rich Snippets. RDFa competes with another format called microformats. The problem is that every domain needs a shared set of metadata attributes to enable search across smaller domains. Google is rolling out examples of RDFa for restaurants, but currently has only 2500 marked-up pages. To see an example of snippets, try a search on Google for “Baked Ziti.” Drupal 7 also offers RDF, and has been implemented in http://www.whitehouse.gov, as part of the Obama Administration transparency initiative.

    Why does this interest me as a classy taxonomist (future ontologist)? Clearly, this technology has evolved to a point of adoption, but further adoption depends on political and organizational work to get other applications to take the risk of trying RDFa. RDFa depends on common adoption of similar metadata. This requires political and organizational skills to define and manage common metadata knowledge models. First, taxonomists understand vocabulary and metadata as a way to capture common knowledge. Second, if this innovation becomes more widely adopted and gains traction, there may be interest in building similar processes in other applications where information has to be shared.

    Further, if RDFa, coupled with ontology and metadata management, makes data management and querying easier through SPARQL, then more attention can be paid to the political and organizational work of getting local agencies to contribute good data and content.

    There is a long way to go to make this vision a reality: browsers have to adopt RDFa, applications have to prove its viability, and ontologies in other domains need to be created. But in the long run, this might be a more democratic way to extend information access on the web.

    However, to move toward this vision, faceted navigation and defining common metadata and taxonomies are a good intermediate step. By creating faceted taxonomies and browsing, and by collecting data, user communities are moving towards understanding which search fields, common language, and unambiguous terms matter to their users. A little semantics goes a long way.

    ~Marlene Rockmore

    Taxo-ology

    This week, I am at the 2010 Semantic Technology conference, where there are technologists who have built ontologies. So this seems like the place to find out what exactly the difference is between an ontology and a taxonomy, and what skills will matter.

    In the ontology world, a taxonomy, strictly speaking, is a hierarchical arrangement of terms. Taxonomists populate term nodes, decide what the form of each term is (along with any variants, equivalents, and semi-equivalents), and create hierarchies. Ontologists do the heavy lifting: they decide what the classes will be, define the links, and generate RDF and OWL.

    But there is a bright spot in this rather dull picture of taxonomy work. The most progressive and insightful taxonomists insist on sorting terms into facets or classes. These facets are derived from an analysis of user needs, content, and domain knowledge. The core of an ontologist’s work is also to define classes or facets and the links between classes. These links can then be inherited or asserted between classes. A taxonomist who hasn’t thought about classes and design will create a taxonomy that looks like spaghetti, and an ontologist who lacks that skill can create an ontology that makes bad inferences and assertions.

    The bottom line is that there is overlap between taxonomy and ontology — so I would like to suggest a term to describe this synergy:  Taxo-ology.    By thinking in terms of Taxo-ology,  we can begin to overlap and have synergy between taxonomists and ontologists:

    • Facets and classes:  Both taxonomists and ontologists need to create classes in which to classify terms.
    • Discipline in Creating Homogenous Hierarchies: Hierarchies, ideally, should have homogenous properties. For example, Secretary of State is an office of the United States government; Hillary Rodham Clinton is filling that role, but it is one of many roles she has had. Christine Connors, a semantic web guru, uses “Prince of Wales” as her example. That role is there whether or not Charles is Prince. It is part of the institution of the British monarchy. Even for the practical reason of long-term maintenance, these entities need to be in their own class (facet) and linked.
    • Greater Use of Linkages using Associative Relationships: Once terms are sorted into homogenous buckets, associative relationships (sometimes with semantic labels for the relationship) can be used to link between classes or between term nodes within a class.
    • Better Skill Sets: Someone who is a Taxo-ologist knows how to use rich ontology tools like TopBraid and understands OWL and XML output, but can also adapt to other tools and content management software, such as auto-categorization. A taxo-ologist can apply the best practices of building classes/facets, creating homogenous hierarchies, and developing associative relationships.
    • Better models for paying Taxo-ologists: Taxonomists sometimes get paid by the number of terms built out, but in the world of taxo-ology, compensation needs to be based on results. Those results range from the strategic (is our organization collecting, sharing, and exchanging information about changing market, technical, and economic conditions?) to the tactical (getting the right SOP at the right time). Search is a great example of how less is more: good taxo-ologists can make smaller, sleeker taxologies that can be used to auto-tag concepts across facets, or smaller taxonomies that have higher matches to user queries because of their use of variants.

    “Taxologist” seems like a good word to help bridge the gap between these disciplines, but there needs to be discussion and synergy between the taxo community and the ontology world. Taxonomists need to apply more discipline to how they do their work and embrace the autocategorization and semantic tools that make it easier to process content. The semantic world can save some time in its development process by learning from the practical experience taxonomists have built by working in enterprises and libraries, doing card sorts, understanding user experience, analyzing content, and merging all that with domain knowledge.

    My goal this week is to find out more about what will help semantic technologies gain more traction, what are the practical, killer applications, and what are the future skills.    Be sure to stop by Christine’s booth to find out more about how ontologists can help with strategic information management and technical integration with semantic web technologies.

    ~ Marlene Rockmore (blogging from SemTech San Francisco 2010)