What next, taxonomy?

Can anyone learn what is happening in a field by following a conference on Twitter? That’s how I decided to follow Taxonomy Bootcamp 2012. I missed networking with a new generation of confident, well-trained taxonomists, but I will nevertheless attempt to identify the themes and challenges facing taxonomists. Please feel free to add to the list or dispute it!

  1. Centralized models don’t scale; think federated, allow local variation.

Taxonomists have jobs because organizations value managing their resources with a common vocabulary, but how the architecture and governance of the vocabulary map to the internal structure of an organization is harder to understand. Speakers urged attendees to allow distributed models that accommodate local variations and make use of local project vocabularies. Using term sets (facets) helps ease taxonomy management. As an example, one tweet about the talk given by Gary Carlson and Pam Green relayed that Microsoft had 23 term sets for its intranet; the most complex was products, and the simplest was confidentiality. This structure helps in setting up access control so that ownership groups can manage these vocabularies at the group and/or term level.
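To make the federated idea concrete, here is a minimal sketch of a term set that has one owning group and room for local project vocabularies. It is an illustration only: the `TermSet` class, the group and project names, and the sample terms are all hypothetical and do not describe Microsoft's actual setup.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TermSet:
    """One facet of the taxonomy, maintained by a single owner group."""
    name: str
    owner_group: str                                   # hypothetical group name
    terms: set[str] = field(default_factory=set)       # shared, enterprise-wide terms
    local_terms: dict[str, set[str]] = field(default_factory=dict)

    def can_edit(self, group: str) -> bool:
        # Access control at the term-set level: only the owning group may edit.
        return group == self.owner_group

    def add_local_term(self, project: str, term: str) -> None:
        # Local variation: a project keeps its own terms next to the shared ones.
        self.local_terms.setdefault(project, set()).add(term)

    def vocabulary_for(self, project: str) -> set[str]:
        # What one project sees: the shared terms plus its own additions.
        return self.terms | self.local_terms.get(project, set())


# Hypothetical example data.
confidentiality = TermSet("Confidentiality", "legal", {"Public", "Internal", "Restricted"})
confidentiality.add_local_term("project-x", "Embargoed")
```

The shared terms stay under central ownership, while each project sees them combined with its own local additions.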

2. Do social media and information sharing have a role in taxonomy, and how do they square with governance, security, control, and confidentiality?

This topic yielded some buzz and offers an interesting dialectic between information access and security. @syndetic tweeted that “HR sees social media as a time waster, and IT sees it as a security problem.” So let’s start by exploring why social media matters to taxonomists. The main point is that information is ambiguous and confusing, and that social media is necessary to help generate, identify, and clarify relevant, lively, topical content now that can be curated later (as opposed to the traditional model of curate, then share). AprilMunden summed up the argument for social media nicely in a tweet: “using folksonomy (user generated taxonomy) should definitely be a part of your initial design AND ongoing governance plan.” Seth Earley, always with a witty insight, said, “Fast changing, requires expertise, ambiguous concepts …business can own; slow to change, more security, unambiguous, let IT own that.” Maybe that’s a strategy for starting the conversation about the business value of tagging and social media.

3. Why a non-expert taxonomist needs to build the taxonomy.

“Experts swim in the deep end of the content.” Tom Reamy made this statement, which set off a flurry of sympathetic tweets. Tom clarified that experts know their content and processes but can get mired in details and in their own point of view. Tom’s statement is provocative because it clarifies the role of the taxonomist, which is to:

  • Apply social listening skills and analytic methods to categorize the diverse needs of a wide range of users
  • Build a well-structured faceted taxonomy that reflects the needs of different constituencies and fits the governance structure of the organization (see point 1)
  • Validate and test the taxonomy
  • Assist with application integration
  • Establish ongoing governance processes, including the use of social media and text analytics, to keep the taxonomy relevant

A taxonomy needs to reflect multiple user needs, which means allowing non-experts to explore topics before they go in for a deep dive. Taxonomists need to work with experts and gain their support by engaging them in validating the taxonomy.

4. What about taxonomy, the user interface, big data, and mobile apps?

Taxonomists sit at an intersection between data/digital assets/content and user interfaces, but it’s not clear how taxonomists apply their skills there. They are not graphic designers or UX experts, and not quite database experts. A few sessions mentioned data visualization and other graphic imagery for exploring content and data, such as mashups of datasets, grids, wheels, graphic organizers, or maps. This is an area where taxonomists, who are not as visually oriented, need to rethink their approach and start to think of themselves as information artists. Taxonomists can advocate for improved techniques, including standards that organize the taxonomy/metadata framework, and for tools that make sharing across applications, organizations, and platforms more efficient, which brings us to point 5.

5. Taxonomy tools must make it easier to import and export vocabularies

Taxonomists know that their vocabularies need to play well with all the applications above, and to serve other needs such as cross-organization information access across SharePoint, XML, relational databases, and legacy products. Taxonomies work in part because they are agnostic: they can work with any number of technologies, because concepts and metadata are kept separate from the content. To play well with others, taxonomy tools need to support import and export of vocabularies in different standards, including SKOS or XML. As KarlaTR tweeted, “If you put something in a taxonomy, you’ve got to get it out.” One option is to try tools that are marketed at Taxonomy Bootcamp, such as TopQuadrant EVN and Smartlogic, which have been ported from the ontology world and now sit alongside Synaptica and Information Access in the tool evaluation process.
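As a rough illustration of what "getting it out" can look like, the sketch below writes a tiny hierarchy as SKOS in Turtle syntax using only the Python standard library. The `skos:Concept`, `skos:prefLabel`, and `skos:broader` terms come from the SKOS standard; the base URI, the `slug` helper, and the sample terms are assumptions made up for the example, and a real tool would need more careful IRI and label handling.

```python
SKOS_PREFIX = "@prefix skos: <http://www.w3.org/2004/02/skos/core#> ."


def slug(label):
    # Naive IRI-safe slug (assumption); real tools need a more careful scheme.
    return label.strip().lower().replace(" ", "-")


def to_skos_turtle(broader_of, base="http://example.org/taxonomy/"):
    """Serialize {term: broader term or None} as SKOS Turtle text."""
    lines = [SKOS_PREFIX, ""]
    for term in sorted(broader_of):
        parent = broader_of[term]
        # Escape backslashes and double quotes so the label is a valid literal.
        label = term.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f"<{base}{slug(term)}> a skos:Concept ;")
        if parent is None:
            lines.append(f'    skos:prefLabel "{label}"@en .')
        else:
            lines.append(f'    skos:prefLabel "{label}"@en ;')
            lines.append(f"    skos:broader <{base}{slug(parent)}> .")
    return "\n".join(lines)


# Hypothetical two-term vocabulary.
vocabulary = {"Products": None, "Operating Systems": "Products"}
turtle = to_skos_turtle(vocabulary)
print(turtle)
```

Because the output is plain SKOS, any tool that imports SKOS should be able to read it back, which is the point of the tweet.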

So what next, taxonomy? It is nice to hear that more taxonomists are surviving because their organizations understand their core roles. The emerging topics and challenges are how to distribute and decentralize (localize) while keeping authority and control; how to collect new content on emerging, current topics; how to visualize; how to be more agile; and how to fit in with new technologies like social media, mobile, and big data. Phew! That’s a challenge. Taxonomists have a chance to build relationships not only between terms, but also with stakeholders, on the way to a compelling, visualized, multidimensional content strategy. Good luck.

Skills of a Classy Taxonomist

At SemTech in June 2010, several speakers, including Professor Deb McGuinness, drew a very clear line between what a taxonomist does and what an ontologist does. Taxonomists build hierarchies, and ontologists determine classes or categories. In other words, ontologies are neat and unambiguous, and taxonomies are a bit messy.

Defining classes, or ontology work, typically precedes building the taxonomy. Defining the classes is like writing a specification for the taxonomy; in fact, defining classes is the same as defining facets. The goal of taxonomists and ontologists should be to define a specific, unambiguous description of a term that helps manage how we find and organize content, so the pathways are clear and specific; adding an ontology ensures that the term is placed in the most specific categories, which reduces ambiguity. I would argue that no taxonomy is useful unless it is faceted, that is, divided into classes. Taxonomies work best when their terms share homogeneous properties, and when they are smaller and focused.

Class analysis, or facet analysis, solves several problems:

1) Clarify specific terms by situation or function: If I am interested in Java as a programming language, I want to see material related to Java as software, not as slang for coffee or as an island in Indonesia. If I am looking for “drill bits,” it might be important to understand whether the drill bits are for my home electric screwdriver or for an oil rig. Classes capture these distinctions and help to create precise, specific tagging and information retrieval.
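A small sketch of the idea, assuming each tag is stored as a (class, term) pair rather than as a bare string. The document titles, class names, and the two lookup functions are invented for illustration.

```python
# Each tag is a (class, term) pair, so the same string can live in
# several classes without colliding. All data here is hypothetical.
documents = {
    "Getting started with generics": {("Programming language", "Java")},
    "Espresso brewing guide": {("Beverage", "Java")},
    "Volcanoes of Indonesia": {("Place", "Java")},
}


def find(cls, term):
    """Retrieve only documents tagged with the term in the given class."""
    return sorted(title for title, tags in documents.items() if (cls, term) in tags)


def find_untyped(term):
    """Retrieve by bare term, ignoring the class (the ambiguous case)."""
    return sorted(
        title for title, tags in documents.items()
        if any(t == term for _, t in tags)
    )


print(find("Programming language", "Java"))
print(find_untyped("Java"))
```

The class-aware lookup returns only the programming article, while the bare-term lookup returns all three documents.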

2) Ease long-term maintenance: Christine Connors points to a simple but common example, in which taxonomies include people’s names as narrow terms under a role, such as “Hillary Clinton” under “Secretary of State” or “Charles Windsor” under “Prince of Wales.” The problem is that when the people filling these roles change, there is a maintenance headache. A classy taxonomy recognizes that there is a separate class for <people> as an entity, as distinguished from <role>. <People> and <Role> can be connected by a predicate such as <isA>. These distinctions are necessary for fast-changing information (such as who is dating whom in an entertainment application, or who owns whom in a business application).

Abstraction: <person> <has> <role>

Instance: Hillary Clinton <is> Secretary of State
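The maintenance benefit can be sketched as follows, assuming people and roles are kept as separate sets and linked only by (person, role) statements. The `RoleRegistry` class and the placeholder successor name are hypothetical.

```python
class RoleRegistry:
    """Keeps <person> and <role> as separate classes linked by statements."""

    def __init__(self):
        self.people = set()
        self.roles = set()
        self.has_role = set()  # (person, role) statements

    def assign(self, person, role):
        """Record <person> <has> <role>, replacing any previous holder."""
        self.people.add(person)
        self.roles.add(role)
        # Drop the old statement for this role; the role itself is untouched.
        self.has_role = {(p, r) for p, r in self.has_role if r != role}
        self.has_role.add((person, role))

    def holder(self, role):
        return next((p for p, r in self.has_role if r == role), None)


registry = RoleRegistry()
registry.assign("Hillary Clinton", "Secretary of State")
# When the office changes hands, only one statement changes.
# "Placeholder Successor" is an invented name for the example.
registry.assign("Placeholder Successor", "Secretary of State")
```

The role hierarchy never has to be edited when the holder changes, and the previous holder remains a valid member of the people class.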

3) Facilitate sharing and importing taxonomies: A taxonomy that is specified by a class description will be more homogeneous, have shared properties, and be more focused. This makes it easier to import, with less cleanup and review. It also facilitates the use of SKOS, for example. Messy taxonomies are harder to merge.

Anyone working with semantic technologies will tell you that most problems in inference happen when hierarchies in source taxonomies create odd associations by inserting a narrow or broad term. A taxonomist needs to be attentive to inferences in order to prevent false statements. Professor Deb McGuinness calls this issue “truth maintenance.”
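Here is a minimal sketch of how such a false statement can arise, assuming an application treats every broader-term link as a transitive "is a" relationship. The "Government roles" term, the class labels, and both helper functions are hypothetical; this is not how any particular reasoner works.

```python
# Hypothetical source taxonomy: a person's name filed as a narrow term under a role.
broader = {
    "Hillary Clinton": "Secretary of State",
    "Secretary of State": "Government roles",
}

# Hypothetical class assignment for each term.
term_class = {
    "Hillary Clinton": "person",
    "Secretary of State": "role",
    "Government roles": "role",
}


def ancestors(term, broader):
    """Follow broader-term links upward and return every inferred ancestor."""
    seen = []
    while term in broader and broader[term] not in seen:
        term = broader[term]
        seen.append(term)
    return seen


def suspicious_links(broader, term_class):
    """Flag links whose two ends belong to different classes."""
    return [(n, b) for n, b in broader.items() if term_class[n] != term_class[b]]


# Read as "is a", the hierarchy implies that a person is a kind of role.
print(ancestors("Hillary Clinton", broader))
print(suspicious_links(broader, term_class))
```

Checking each link against the class of its two ends is one simple way to spot the hierarchy entries that would feed a false inference.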

To keep these categories clear and distinct, ontologists rely on building a conceptual model, or a picture of the domain. Modeling strategies draw on skills that most taxonomists already have: most have been taught how to capture vocabulary and how to identify facets. Check out the earlier blog post Taxonomies and Modelling for more information.

Elaine Kendall of Sandpiper Software, which makes a concept-modelling tool, suggested that “one could build an ontology in 2 hours.” With a new generation of tools that can create RDF/OWL from data and content, this statement might be true.

    With good modelling tools that automatically generate RDF/OWL, such as those from TopQuadrant, taxonomists might be able to slide into the needed role of ontologist. To extend their skills, taxonomists need to understand some basic concepts in RDF/OWL, such as what a class is, what a property is, what a slot facet is, what class inheritance is, what is meant by reciprocation and inverse properties, and how to write a SPARQL query. But more importantly, a classy taxonomist can become a facilitator who helps build bridges between user and development communities and helps diagnose and prevent technical problems.

    A taxonomist who is trained in ontologies should bring the following skills:

    • Create processes to identify the requirements for each class
    • Develop metrics to assess good results
    • Identify what vocabularies are needed, evaluate existing vocabularies, and import and adapt them to current needs
    • Ensure the integrity and focus of vocabularies, particularly when sourced from an outside vendor
    • Develop processes to keep vocabularies current, and understand how to use metrics to “measure and improve” any vocabulary
    • Work as part of the development team to help identify whether a source vocabulary might be contributing to false inferences

    The taxonomist works with different user communities as well as developers and helps bridge the gap between what users and experts know and what is needed to build a useful application. A classy taxonomist has a well-rounded set of skills and can work with development teams and user organizations to build intelligent systems.


    Is GoodRelations a Game Changer?

    One ontology worth watching might be GoodRelations, which is being implemented by Best Buy. GoodRelations, the central component of Best Buy’s architecture, was developed by Martin Hepp, who presented at SemTech in San Francisco last week via Skype from Munich, Germany. It is a retail ontology that uses RDFa embedded in XHTML webpages to populate a global ontology. But why would a major retailer use this architecture?

    Best Buy discovered that it was impossible to be the top dog in search engine optimization (SEO) in every search category for every product. To compete, they needed finely tuned individual pages. They also wanted to provide immediate content about “open box” items, that is, returned items at local stores. They were looking for a solution that could add more granularity, precision, and localization, but still enable global search and have metadata that was controlled by the enterprise.

    GoodRelations is a retail ontology that offers facets or classes, metadata descriptions, and attributes that are common in the retail industry. It is expressed in RDFa, a flavor of RDF that can be embedded in ordinary web pages. Yahoo SearchMonkey supports RDFa, Facebook’s Open Graph will support RDF, and Google Rich Snippets also support RDFa.

    Because there is common metadata, it is easy for employees or customers (who are called “user agents” in the semantic world) to tag content via templates which populate the RDF.  RDFa can be maintained in a corporate or enterprise repository which can be configured as needed for distribution in the enterprise.

    In the GoodRelations RDF, the additional metadata might include price, color, dimensions, model, and other attributes that interest consumers. GoodRelations is an ontology that can be shared across any retail enterprise in any country. The cost per webpage, once implemented, is minimal because “user agents” are familiar with how to complete forms over the web. The RDFa can then be appended to a page written in XHTML or HTML5. The HTML code for adding the specific metadata attributes runs to about 30-50 lines. This creates HTML that has more granularity than a typical <keyword> metatag. The high costs are in the metadata management.
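To give a feel for what a form-driven template might emit, here is a minimal sketch that renders a few form fields as an RDFa fragment using GoodRelations terms. The `gr:` class and property names are taken from the published GoodRelations vocabulary; the `offering_rdfa` function, its parameters, and the sample product are assumptions for illustration, and this is not Best Buy's actual markup.

```python
from html import escape

GR = "http://purl.org/goodrelations/v1#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def offering_rdfa(name, description, price, currency="USD"):
    """Render form input as an RDFa fragment describing one offer."""
    return "\n".join([
        f'<div xmlns:gr="{GR}" xmlns:xsd="{XSD}" typeof="gr:Offering">',
        f'  <span property="gr:name">{escape(name)}</span>',
        f'  <span property="gr:description">{escape(description)}</span>',
        '  <div rel="gr:hasPriceSpecification">',
        '    <div typeof="gr:UnitPriceSpecification">',
        f'      <span property="gr:hasCurrency" content="{escape(currency)}"></span>',
        f'      <span property="gr:hasCurrencyValue" datatype="xsd:float">{price:.2f}</span>',
        '    </div>',
        '  </div>',
        '</div>',
    ])


# Hypothetical product entered through a web form.
snippet = offering_rdfa("Open-box LCD TV", "Returned item, tested & repacked", 499.0)
print(snippet)
```

The person filling in the form never sees the markup; the template adds the shared vocabulary, which is where the enterprise keeps control of the metadata.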

    Adding RDFa as metadata to a webpage should be easy to adopt because it works in the current web paradigm. Google is offering an RDFa markup, called Google Rich Snippets, that can be added to a webpage. Snippets is competing with another format called Microformats. The problem is that every domain needs a shared set of metadata attributes to enable search across smaller domains. Google is rolling out examples of RDFa for restaurants, but currently has only 2500 marked-up pages. To see an example of snippets, try a search on Google for “Baked Ziti.” Drupal 7 also offers RDF, and has been implemented at http://www.whitehouse.gov as part of the Obama Administration transparency initiative.

    Why does this interest me as a classy taxonomist (future ontologist)? Clearly, this technology has evolved to the point of adoption, but further adoption depends on getting other applications to take the risk of trying RDFa. RDFa depends on common adoption of similar metadata. This requires political and organizational skills to define and manage common metadata knowledge models. First, taxonomists understand vocabulary and metadata as a way to capture common knowledge. Second, if this innovation becomes more widely adopted and gains traction, there may be interest in building similar processes in other applications for any information that has to be shared.

    Further, if RDFa, coupled with ontology and metadata management, makes data management and querying easier through SPARQL, then more attention can be paid to the political and organizational work of getting local agencies to contribute good data and content.

    There is a long way to go to make this vision a reality: browsers have to adopt RDFa, applications have to prove its viability, and ontologies in other domains need to be created. But in the long run, this might be a more democratic way to extend information access on the web.

    However, to move toward this vision, faceted navigation and the definition of common metadata and taxonomies are a good intermediate step. By creating faceted taxonomies and browsing, and by collecting data, user communities are moving towards understanding which search fields, common language, and unambiguous terms matter to their users. A little semantics goes a long way.

    ~Marlene Rockmore

    Are Taxonomies converging with Folksonomies?

    Carol Kaesuk Yoon, in an August 11, 2009 New York Times article, discussed how human groups survive by observing, understanding, and classifying their natural world, creating local folk taxonomies that are as intrinsic to survival as water or food. Without the power to order and name life, a person simply does not know how to live in the world. Yoon states, “How to tell the carrot from the cat — which to grate and which to pet? They are utterly lost, anchorless in a strange and confusing world” (http://www.nytimes.com/2009/08/11/science/11naming.html?_r=1&8dpc, accessed August 11, 2009). The article included an interesting discussion of a research study in which college students could decipher what a word meant in a Peruvian native language about 68% of the time because the naming in the folk taxonomy was so descriptive.

    At the start of 2009, CMPros pronounced taxonomy dead. This is a good moment to re-evaluate that audacious claim. If taxonomies undergird the survival of people in pristine environments, can they clarify meaning in a culture awash in technology, economics, social science, health, and medicine?

    For the last five years, Taxonomy Boot Camp, sponsored by Information Today as an extension to the last two days of Enterprise Search Summit West, has provided a comprehensive program demonstrating the use of taxonomies to improve search, govern information, and improve communication. Taxonomy Boot Camp continues to pull together an interesting program of rising stars and established veterans.

    Far from being a post-mortem of taxonomies, this year’s conference program provides an opportunity for a conversation about their future in the context of new and putatively competing disciplines. The conference includes superstars from the realm of folksonomies and ontology. Taxonomy Boot Camp provides an opportunity to find out how some practitioners and organizations have tried to use and re-use legacy taxonomies to order information, while providing innovation in interfaces and processes.

    This year’s keynote speaker, Thomas Vander Wal, Principal, InfoCloud Solutions Inc, who coined the term folksonomy, opens the dialog. Can taxonomies designed for enterprise business and social science organically grow to explain, clarify, modify, and mesh with Web 2.0 social enterprise tools? In other words, can enterprise vocabularies become the folk taxonomies that help describe our modern world? Leslie Owens, of Forrester Research, presents the other keynote on the reuse and repurposing of taxonomies, which may highlight the value of reviving taxonomies in organizations and enterprises.

    Some of this year’s participants are engaged in leading-edge projects:

    Dean Allemang, developer of TopQuadrant, and author of SEMANTIC WEB FOR THE WORKING ONTOLOGIST, will lead a panel about moving beyond broad and narrow terms to semantic relationships. Co-panelists include staff from the Food and Agriculture Organization, World Bank, and Library of Congress. Metadata will be covered in several sessions, including one session on Dublin Core from Mike Crandall of the iSchool at the University of Washington with Marjorie Hlava of Access Innovations, and another discussion with Stephanie Lemieux of Earley and Associates on integration with SharePoint.

    Annie Wang of Deloitte will share her perspective on using taxonomies for large, complex organizational integration.

    Christine Connors of TriviumRLG LLC and Jordan Frank of Traction Software will speak on Linked Data, Web 3.0, and Tagsonomies, and how taxonomies and ontologies can turn tag mush into useful concepts. Their talk will be followed by Stephanie Lemieux and Tom Reamy’s discussion of folksonomy and taxonomy. The hot topic of merging and rescuing existing taxonomies will also be covered. Integration of existing taxonomies will be discussed by four veteran taxonomists, including Heather Hedden, Carol Hert, and Wendi Pohs, followed by a panel on rescuing and repurposing taxonomies that includes Lisa Dawn Colvin from TopQuadrant, Ron Daniels of Taxonomy Strategies, and Jeff Carr of Earley and Associates.

    Taxonomy validation will be presented by Joseph Busch of Taxonomy Strategies, who will describe how a taxonomy was validated over several days of exercises with key stakeholders at the Substance Abuse and Mental Health Services Administration (SAMHSA). Taxonomy and semantic modeling tools will also be on the agenda.

    The conference ends with a dialog about the future of taxonomies led by Wendi Pohs and Daniela Barbosa from Dow Jones. Several pre-conference workshops also provide opportunities to explore topics in more depth with expert practitioners. The full conference program is available in HTML and PDF (http://www.taxonomybootcamp.com/2009/program.shtml).

    Opening a dialog about how the best practices in taxonomy management mesh with the innovations in folksonomy and ontology might help clarify our thinking in turbulent times. Any conference that brings together the taxonomy and semantic web communities provides an opportunity to create energy to move to new architectures, interfaces, and tools. Taxonomy Boot Camp 2009 will be held November 19-20, 2009, at the San Jose McEnery Convention Center in San Jose, CA. For more information and for $200 off the conference registration fee, visit http://tinyurl.com/l4npdv
