What next, taxonomy?

Can anyone learn what is happening in a field by following a conference on Twitter? That’s how I decided to follow Taxonomy Bootcamp 2012. I missed networking with a new generation of confident, well-trained taxonomists, but nevertheless I will attempt to identify the themes and challenges facing taxonomists. Please feel free to add to the list or dispute it!

  1. Centralized models don’t scale; think federated, allow local variation.

Taxonomists have jobs because organizations value managing their resources with a common vocabulary, but how the architecture and governance of that vocabulary map to the internal structure of an organization is harder to understand. Speakers urged attendees to allow distributed models that accommodate local variations and make use of local project vocabularies. Using term sets (facets) helps ease taxonomy management. As an example, one tweet about the talk given by Gary Carlson and Pam Green relayed that Microsoft had 23 term sets for its intranet; the most complex was products, and the simplest was confidentiality. This structure helps in setting up access control so that ownership groups can manage these vocabularies at the group and/or term level.

2. Do social media and information sharing have a role in taxonomy, and how do they square with governance, security, control, and confidentiality?

This topic yielded some buzz and offers an interesting dialectic between information access and security. @syndetic tweeted that “HR sees social media as a time waster, and IT sees it as a security problem.” So let’s start by exploring why social media matters to taxonomists. The main point is that information is ambiguous and confusing, and that social media helps generate, identify, and clarify relevant, lively, topical content now that can be curated later (as opposed to the traditional model of curate, then share). AprilMunden summed up the argument for social media nicely in a tweet: “using folksonomy (user generated taxonomy) should definitely be a part of your initial design AND ongoing governance plan.” Seth Earley, always with a witty insight, said, “Fast changing, requires expertise, ambiguous concepts … business can own; slow to change, more security, unambiguous, let IT own that.” Maybe that’s a strategy for starting the conversation about the business value of tagging and social media.

3. Why a non-expert taxonomist needs to build a taxonomy.

“Experts swim in the deep end of the content.” Tom Reamy made this statement, which set off a flurry of sympathetic tweets. Tom clarified that experts know their content and process but can get mired in details and in their own point of view. Tom’s statement is provocative because it clarifies the role of the taxonomist, which is to:

  • Apply social listening skills and analytic methods to categorize the diverse needs of a wide range of users
  • Build a well-structured faceted taxonomy that reflects the needs of different constituencies and fits the governance structure of the organization (see point 1)
  • Validate and test the taxonomy
  • Assist with application integration
  • Establish ongoing governance processes, including the use of social media and text analytics, to keep the taxonomy relevant

A taxonomy needs to reflect multiple user needs, which means allowing non-experts to explore topics before they go in for a deep dive. Taxonomists need to work with experts and gain their support by engaging them in validating the taxonomy.

4. What about taxonomy, the user interface, big data, and mobile apps?

Taxonomists sit at an intersection between data/digital assets/content and user interfaces, but it’s not clear how taxonomists apply their skills there. They are not graphic designers or UX experts, and not quite database experts. A few sessions mentioned data visualization and other graphic imagery for exploring content and data, such as mashups of datasets, grids, wheels, graphic organizers, or maps. This is an area where taxonomists, who are often not as visually oriented, need to rethink their approach and start to think of themselves as information artists. Taxonomists can advocate for improved techniques, including standards that organize the taxonomy/metadata framework, and for tools that make sharing across applications, organizations, and platforms more efficient, which brings us to point 5.

5. Taxonomy tools must make it easier to import and export vocabularies

Taxonomists know that their vocabularies need to play well with all the applications above as well as with other needs, such as providing cross-organization information access across SharePoint, XML, relational databases, and legacy products. Taxonomies work in part because they are agnostic: they can work with any number of technologies because concepts and metadata are separate from the content. To play well with others, taxonomy tools need to support import and export of vocabularies in different standards, including SKOS or XML. As KarlaTR tweeted, “If you put something in a taxonomy, you’ve got to get it out.” One option is to try tools that are marketed at Taxonomy Bootcamp, such as TopQuadrant EVN and Smartlogic, which have been ported from the ontology world and now sit alongside Synaptica and Information Access as part of the tool evaluation process.
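To make “getting it out” concrete, here is a minimal sketch of exporting a tiny vocabulary as SKOS in Turtle syntax. It is only an illustration: the two terms, the `example.org` namespace, and the function name `to_skos_turtle` are all made up, and a real export would come from the taxonomy tool or an RDF library rather than from string formatting.

```python
# Hypothetical in-memory vocabulary: term id -> preferred label,
# alternative labels, and the id of its broader term (None for a top term).
TAXONOMY = {
    "products": {"label": "Products", "alt": [], "broader": None},
    "software": {"label": "Software", "alt": ["Applications"], "broader": "products"},
}

BASE = "http://example.org/taxonomy/"  # placeholder namespace (an assumption)


def to_skos_turtle(taxonomy, base=BASE):
    """Serialize the vocabulary as SKOS concepts in Turtle syntax.

    Labels are written as-is, so this sketch assumes they contain
    no double quotes or backslashes.
    """
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"@prefix ex: <{base}> .",
        "",
    ]
    for term_id, term in taxonomy.items():
        statements = [f'skos:prefLabel "{term["label"]}"@en']
        statements += [f'skos:altLabel "{alt}"@en' for alt in term["alt"]]
        if term["broader"]:
            statements.append(f"skos:broader ex:{term['broader']}")
        lines.append(f"ex:{term_id} a skos:Concept ;")
        lines.append("    " + " ;\n    ".join(statements) + " .")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_skos_turtle(TAXONOMY))
```

The point of the sketch is that once terms carry stable identifiers plus preferred labels, alternative labels, and broader links, the export format is mostly a matter of serialization.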

So what next, taxonomy? What is nice to hear is that more taxonomists are surviving because their organizations understand their core roles. What are the emerging topics and challenges? How to distribute and decentralize (localize) while keeping authority and control, how to collect new content on emerging, current topics, how to use visualization, how to be more agile, and how to fit in with new technologies like social media, mobile, and big data. Phew! That’s a challenge. Taxonomists have a chance to build relationships not only between terms, but also with stakeholders, on the way to a compelling, visualized, multidimensional content strategy. Good luck.

Facets=Classes=Sets

I just returned from an intense training in semantic web technologies through TopQuadrant, and I learned much more about what goes on “under the covers.” The course explained how semantic technologies can power machine-to-machine applications. One important lesson was that facets are similar to classes, which in turn are similar to the mathematical idea of a set. This post discusses why taxonomists and programmers need to think of classes, facets, and sets as similar ideas.

Using semantic tools requires building a conceptual model, which is a collection of classes. Building useful models that are semantically enabled requires learning the basic semantic toolkit:

  • RDF (Resource Description Framework). In RDF, one creates classes and designs relations between individual members of a class and between classes. Two related specifications come up often: RDFa, which embeds RDF in web pages, and RDFS (RDF Schema), which can be used to define the ontology (concept mapping) as a schema that represents the underlying data. RDF describes directed graphs that are expressed as triples. Using RDF, one can read in a data store such as a spreadsheet and quickly generate a starter taxonomy (which still needs to be validated with use case scenarios).
  • SKOS (Simple Knowledge Organization System) represents traditional taxonomies in RDF. SKOS handles basic thesaurus-type relations such as broader/narrower concepts, alternative labels, and related concepts. In SKOS, the related concept would have its own unique URI (uniform resource identifier). SKOS describes a concept only through broader, narrower, and related links and preferred and alternative labels; SKOS itself offers no property for associating a concept with an OWL class.
  • SPARQL is a specialized query language designed to query triple stores. A semantically enabled application is one whose data can be converted into an RDF graph, which can then be displayed visually and queried using SPARQL.
  • OWL (Web Ontology Language) is the underlying language for describing models. OWL is required to handle more complexity, such as restrictions, cardinality, and inferencing.
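As a rough illustration of the toolkit above, the sketch below stores a few statements as plain (subject, predicate, object) tuples and matches them against a pattern, which is the basic idea behind a triple store and a SPARQL query. It uses no RDF library; the `ex:` identifiers and the `match` helper are invented for the example.

```python
# Each statement is a (subject, predicate, object) triple, the basic unit of RDF.
# All ex: identifiers are made up for illustration.
TRIPLES = [
    ("ex:Sedan", "rdfs:subClassOf", "ex:Car"),
    ("ex:Car", "rdfs:subClassOf", "ex:Vehicle"),
    ("ex:Car", "rdfs:label", "Car"),
    ("ex:Sedan", "rdfs:label", "Sedan"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None plays the role of a variable."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


if __name__ == "__main__":
    # Roughly: SELECT ?s WHERE { ?s rdfs:subClassOf ex:Car }
    for subject, _, _ in match(TRIPLES, p="rdfs:subClassOf", o="ex:Car"):
        print(subject)
```

A real triple store adds indexing, namespaces, and inferencing on top, but the shape of the data and of the question is the same.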

Most everything conceptually in RDF, SKOS, and the underlying modeling language OWL, once you get under the covers, will be familiar to taxonomists. Some details can confuse you, but don’t let the inconsistent naming conventions deter you. For example, the root class in OWL is called owl:Thing, while a class defined in the RDF Schema language is an rdfs:Class. Oh well, confusing, but don’t let that deter you from appreciating the power of this approach. A thing is still a class, which is similar to a facet.

Here are some examples of how OWL and taxonomies are similar. Each line starts with the OWL construct.

SubClassOf defines a narrower term in a set.

InverseOf creates reciprocal relations.

Transitivity allows navigation of a hierarchy, so that if A is a narrower term of B and B is a narrower term of C, then A is a narrower term of C. A SPARQL query that chains through a hierarchy can potentially consist of 2 lines.

Restrictions are similar to slot facets or attributes, which are properties that limit the set.
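Transitivity is the easiest of these to show in code. The sketch below walks a small, made-up subclass hierarchy upward to collect every broader class; the dictionary and the function name `ancestors` are assumptions for illustration. In SPARQL 1.1, a property path such as `rdfs:subClassOf+` expresses the same chaining in a single pattern.

```python
# Hypothetical hierarchy: each class maps to its direct superclasses.
SUBCLASS_OF = {
    "ex:Sedan": ["ex:Car"],
    "ex:Car": ["ex:Vehicle"],
    "ex:Vehicle": [],
}


def ancestors(cls, parents=SUBCLASS_OF):
    """Collect every class reachable by following subclass links upward."""
    seen = set()
    stack = list(parents.get(cls, []))
    while stack:
        current = stack.pop()
        if current not in seen:  # guards against cycles and repeated parents
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


if __name__ == "__main__":
    # Sedan is a Car and a Car is a Vehicle, so Sedan is also a Vehicle.
    print(sorted(ancestors("ex:Sedan")))
```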

Here are some reasons to utilize classes in semantic technologies as a best practice.  Without implementing classes and modeling, these outcomes would be hard to achieve:

Form follows function: Instead of designing big, monolithic, hierarchical taxonomies, think in terms of classes or facets, which are groupings of individual members in a set. These smaller, faster sets will be easier to export, import, edit, and share. Perhaps facets should be called fast sets, or fasets! Plus, the facets (classes) can become fields in a web form. The possibilities for reuse and design open up many options.

Scalability and Reuse: Since concepts and the associated classes are independent of data and content, the concepts and classes can be changed, such as changing an organization name, renaming key terms, or adapting new ideas, without changing underlying queries and systems architecture. This is scalable.

Change Schema Without Changing Content: Conceptual mapping can be developed independently, and designed and changed in the RDF Schema or OWL language, without changing the underlying data.

Precision: Because an individual concept can be easily manipulated as a member of a set, or of multiple sets, the concept can have a more accurate definition. For example, take a term like “Chevy Chase.” By associating “Chevy Chase” with the class Person, one can distinguish Chevy Chase the comedian from Chevy Chase, Maryland, which belongs to the class Location. Furthermore, ideally each unique concept of Chevy Chase would have its own namespace or unique URI (uniform resource identifier). The concept is created independently of the content and without being tightly coupled into a hierarchy, yet it is clearly associated with the appropriate facet or class. The same logic can be applied to more amorphous, squishy terms like “Compensation,” “Performance,” “Management,” or “Quality,” which can be deconstructed into more specific variants like “Executive Compensation” vs. “Non-exempt Pay and Benefits.” RDFS can be used to link to the more appropriate term, each with its own URI.

Facilitate Linked Data: If taxonomies and data can be shared, it is faster to build serious applications that can solve real and acute problems. In our class, we built applications that mapped where free Wi-Fi hot spots were near swimming pools and taquerias, but we also built a serious social policy application that mapped cities in the United States with increases in housing complaints about discrimination based on sexual orientation, national origin, race, and other factors, taking data from multiple reputable sources and applying a common conceptual model.
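The Chevy Chase example and the rename-without-breaking-things argument above can be sketched in a few lines. Everything here is hypothetical: the `example.org` URIs, the record layout, and the helper names are placeholders, not the output of any particular tool.

```python
# Hypothetical concept records keyed by URI; the example.org URIs are placeholders.
CONCEPTS = {
    "http://example.org/id/chevy-chase-person": {"label": "Chevy Chase", "class": "Person"},
    "http://example.org/id/chevy-chase-place": {"label": "Chevy Chase", "class": "Location"},
}


def find_concepts(label, concept_class, concepts=CONCEPTS):
    """Return the URIs whose label and class both match."""
    return [
        uri for uri, record in concepts.items()
        if record["label"] == label and record["class"] == concept_class
    ]


def relabel(concepts, uri, new_label):
    """Return a copy with one label changed; the URI, and anything keyed on it, stays stable."""
    updated = {key: dict(value) for key, value in concepts.items()}
    updated[uri]["label"] = new_label
    return updated


if __name__ == "__main__":
    print(find_concepts("Chevy Chase", "Person"))
    renamed = relabel(CONCEPTS, "http://example.org/id/chevy-chase-place", "Chevy Chase, Maryland")
    print(renamed["http://example.org/id/chevy-chase-place"]["label"])
```

Because content and queries point at the URI rather than the label, the same label can live in two classes, and a label can change without touching anything downstream.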

There are some new challenges for taxonomists, especially in understanding the importance of inferencing. A common observation among developers who work with OWL is that many inferencing errors can be traced back to bad, messy taxonomies where there are too many broad terms. In other words, avoid complex polyhierarchies.

To create taxonomies that are ready for the semantic future, the better practice is to arrange concepts into facets (which can be equated with classes or sets) and to avoid complex polyhierarchies (a concept with too many parents). This will allow taxonomies to play well with applications such as user interface design and machine-readable applications. The first step is to stop thinking about a taxonomy as a monolithic hierarchy and instead look at it as a collection of classes (or facets), where a class is a set with individual members. If models and taxonomies can be easily built and used to connect data across worksheets, applications based on linked data can be built quickly.
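One way to act on the advice about polyhierarchies is to audit a vocabulary for concepts with too many parents. The sketch below is a minimal version of such a check; the sample terms, the threshold of one parent, and the function name are all assumptions chosen for illustration.

```python
# Hypothetical vocabulary: each term maps to its list of broader terms.
BROADER = {
    "Chevy Chase": ["People", "Locations"],
    "Executive Compensation": ["Compensation"],
    "Compensation": [],
}


def too_many_parents(broader, limit=1):
    """List the terms that have more broader terms than `limit`, sorted by name."""
    return sorted(term for term, parents in broader.items() if len(parents) > limit)


if __name__ == "__main__":
    for term in too_many_parents(BROADER):
        print(f"Review: {term} has parents {BROADER[term]}")
```

A flagged term is often two concepts sharing one label, in which case the fix is to split it into separate concepts in separate facets rather than to keep both parents.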

To try semantic tools such as SKOS editors, download a trial copy of TopBraid Composer Free Edition.

~Marlene Rockmore

Skills of a Classy Taxonomist

At SemTech in June 2010, several speakers, including Professor Deb McGuinness, drew a very clear line between what a taxonomist does and what an ontologist does. Taxonomists build hierarchies, and ontologists determine classes or categories. In other words, ontologies are neat and unambiguous, and taxonomies are a bit messy.

Defining classes, or ontology work, typically precedes building the taxonomy. Defining the classes is like writing a specification for the taxonomy; in fact, defining classes is the same as defining facets. The goal of taxonomists and ontologists should be to define a specific, unambiguous description of a term that helps manage how we find and organize content so the pathways are clear and specific; adding an ontology ensures that the term is placed in the most specific categories, which helps ensure clarity and lack of ambiguity. I would argue that no taxonomy is useful unless it is faceted, that is, unless it has been divided into classes. Taxonomies work best when their terms share homogeneous properties, and when they are smaller and focused.

By using class analysis, or facet analysis,  several problems are solved:

1)       Clarify specific terms by situation or function: If I am interested in Java as a programming language, I want to see material related to Java as software, not as slang for coffee or as an island in Indonesia. If I am looking for “drill bits,” it might be important to understand whether the drill bits are for my home electric screwdriver or for an oil rig. Classes capture these distinctions and help to create precise, specific tagging and information retrieval.

2)       Ease long-term maintenance issues: Christine Connors points to a simple but common example where taxonomies are built with people’s names included as narrow terms under a role, such as “Hillary Clinton” under “Secretary of State” or “Charles Windsor” under “Prince of Wales.” The problem is that when the people filling these roles change, there is a maintenance headache. A classy taxonomy recognizes that there is a separate class for <person> as an entity, as distinguished from <role>. <Person> and <role> can be connected by a predicate such as <has>. These distinctions are necessary for fast-changing information (such as who is dating whom in an entertainment application, or who owns whom in a business application).

Abstraction: <person> <has> <role>

Instance: <Hillary Clinton> <has> <Secretary of State>

3)    Facilitate sharing and importing taxonomies: Having taxonomies that are specified by a class description means the taxonomy will be more homogeneous, have shared properties, and be more focused. This will make it easier to import with less cleanup and review. It will facilitate the use of SKOS, for example. Messy taxonomies are harder to merge.
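The maintenance point in item 2 can be sketched as follows: people and roles are kept as separate sets, and a third set of pairs records who holds which role, so a change of office holder touches only the pairs. The set names and the `reassign` helper are illustrative assumptions, and “Jane Doe” is a made-up successor.

```python
# People and roles are separate classes; the link between them is its own data.
PEOPLE = {"Hillary Clinton", "Charles Windsor", "Jane Doe"}  # "Jane Doe" is hypothetical
ROLES = {"Secretary of State", "Prince of Wales"}
HOLDS_ROLE = {
    ("Hillary Clinton", "Secretary of State"),
    ("Charles Windsor", "Prince of Wales"),
}


def reassign(holdings, role, new_person):
    """Return new holdings with `role` moved to `new_person`; other pairs are untouched."""
    kept = {(person, held) for person, held in holdings if held != role}
    return kept | {(new_person, role)}


if __name__ == "__main__":
    updated = reassign(HOLDS_ROLE, "Secretary of State", "Jane Doe")
    for person, role in sorted(updated):
        print(f"<{person}> <has> <{role}>")
```

Neither the list of roles nor the list of people has to be restructured when an office changes hands, which is the headache the narrow-term approach creates.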

Anyone working with semantic technologies will tell you that most problems in inference happen when hierarchies in source taxonomies create odd associations by inserting a narrow or broad term. A taxonomist needs to be attentive to inferences in order to prevent false statements. Professor Deb McGuinness calls this issue “truth maintenance.”

To keep these categories clear and distinct, ontologists rely on building a conceptual model, or a picture of the domain. Modeling strategies draw on skills most taxonomists already have: most taxonomists have been taught how to capture vocabulary and how to identify facets. Check out the earlier blog post Taxonomies and Modelling for more information.

Elisa Kendall of Sandpiper Software, which makes a concept-modelling tool, suggested that “one could build an ontology in 2 hours.” With a new generation of tools that can create RDF/OWL from data and content, this statement might be true.

    With good modelling tools that automatically generate RDF/OWL, such as those from TopQuadrant, taxonomists might be able to slide into the needed role of ontologist. To extend their skills, taxonomists need to understand some basic concepts in RDF/OWL, such as what a class is, what a property is, what a slot facet is, what class inheritance is, what is meant by reciprocal and inverse properties, and how to write a SPARQL query. But more importantly, a classy taxonomist can become a facilitator who builds bridges between user and development communities and helps diagnose and prevent technical problems.

    A taxonomist who is trained in ontologies  should bring the following skills:

    • Create processes to identify the requirements for each class
    • Develop metrics to assess good results
    • Identify what vocabularies are needed, evaluate existing vocabularies, and import and adapt them to current needs
    • Ensure the integrity and focus of vocabularies, particularly when sourced from an outside vendor
    • Develop processes to keep vocabularies current, and understand how to use metrics to “measure and improve” any vocabulary
    • Work as part of the development team to help identify whether a source vocabulary might be contributing to a false inference

    The taxonomist works with different user communities as well as developers and helps bridge the gap between what users and experts know and what is needed to build a useful application.   A classy taxonomist has a well-rounded set of skills that can work with development teams and user organizations to build intelligent systems.


    Thinking in Triples: Quick start for adapting taxonomies for the semantic web

    It’s time to get comfortable with ontologies, RDFa, SPARQL and OWL.    After a few days at  Drupal Design Camp at MIT and SemTech09  in San Jose,   I’m convinced more than ever that  it’s time to start thinking about ontologies.      It’s time to think in triples.

    Why do RDF and ontologies matter? To understand why RDF matters, it might help to define ontology. An ontology consists of concepts that are fully described and where all the ambiguity has been resolved. Extracting meaningful links from databases and putting these concepts in a separate search structure solves many problems. It makes search engine indexes richer than standalone keywords that have no context, it simplifies building indexes for programmers, it allows filtering of data by facets, and it enables visual interfaces (in the future; that’s the dream). Thinking through the conceptual links can give structure to unstructured data so it can be interpreted and analyzed by programs. Yahoo has experimented for the last year with RDF-enhanced search called SearchMonkey. SearchMonkey users add structured metadata to their web pages using XHTML/RDFa, which enhances their search and changes how their data is displayed in the results.

    What’s RDFa? RDF stands for Resource Description Framework. What the ‘a’ stands for was a matter of some debate. Drupal’s Benjamin Melancon said it might be the first version. At SemTech in San Jose, it was suggested that it stands for RDF+XHTML. The W3C’s answer is that it stands for attributes: RDFa is “RDF in attributes.” RDFa is a set of HTML-like attributes that link content on a page to a database schema or data definition for a concept. RDFa is used to create a statement that consists of a subject-predicate-object (or a “triple”). The important thing here is the predicate, which is the link between a subject and an object. For example, in the triple “<person> <has> <skills>”, has is the predicate. Take triples such as “<person> <works at> <company>” or “<celebrity> <is dating> <celebrity>”. The verbs works at and is dating are the predicates. Instead of using the ANSI standard library language of broader and narrower terms, ontologies are implemented using XHTML/RDFa/XML as the enabling technology and can create these lively predicates.

    How does an ontology differ from a taxonomy or thesaurus? Simply put, ontologies allow hierarchical relations just like taxonomies, but there is also some flexibility in defining links or connections between terms. That’s the use of predicates.

    CEOs and CIOs are recognizing the value of taxonomies and ontologies in managing information. Times are changing when business managers start saying that adding or modifying a term in the taxonomy can be a faster change than trying to modify a database. Ontologies and taxonomies are perceived as responsive to changes in concepts, as opposed to databases that have a static structure and a query language that has to be modified through an IT process. The taxonomy can be modified by a “user” or subject matter expert without programming intervention. That’s empowering.

    Here are some simple ways to get started without learning any RDFa:

    • If you have a taxonomy, pay attention to ambiguous terms. Create categories (also called facets) where terms can be placed comfortably. Don’t put square pegs in round holes. For example, if you have a building products application, you can classify “Green” under “Building Products” and under “Color.” Green building products and the color green are 2 separate, distinct concepts, just as Lincoln, the American President, and Lincoln, Nebraska are distinct terms. Don’t forget you are classifying concepts, unique ideas, not keywords! By classifying terms to a category, you give terms context and meaning.
    • Connect terms with links between concepts. No term should stand alone without a relationship to another term or category, and every term should be disambiguated by being linked to larger concepts. Try to have at least 3 touchpoints for your term, such as a broader category, a synonym, and a link or predicate. If you are uncertain about how to classify a term, put it in an “emerging concepts” category while you get some more input about intent. Simple relationships to look for are hierarchical relations such as broad term, parent-child, part-of, or type-of, and synonyms where terms are same-as or very closely related in meaning.
    • Research context and intent! Find out how your users are looking for information. How do they want to use information? What types of analysis are they doing? Collecting this important user-centered research begins to capture awareness of the situational and contextual process. That means that the term has been placed in a context and also reflects intent, or how the term will be used. Context and intent are important to resolving ambiguity. Context is about location, process or role, time, or situation. That means that terminology sits in a context or data structure that captures meaning. For example, think of a term that has local variations, such as “milk shake.” In most of the United States, a milk shake has ice cream, but in Boston, you need to order a “frappe.” Otherwise, you’ll get shaken milk. Intent looks at information from different user perspectives. Got an upcoming New Product Announcement? The engineer cares about how it is built, the CEO looks at the revenue, and the lawyer looks at the contracts and licenses. From each perspective, the term New Product Announcement has a different meaning.
    • Try blogging tools to see how taxonomy works in user interfaces and how easy it is to add and modify concepts on the fly. TypePad, Movable Type, and Drupal blogging software all support RDFa. Drupal can be downloaded from Acquia.com.
    • Try using taxonomy management tools: Test drive a taxonomy tool such as Data Harmony or Synaptica. Try one of the free ontology tools: TopBraid Composer is available for free, as is Protégé from Stanford University. You might find that traditional taxonomy tools such as Data Harmony and Synaptica are sometimes easier to learn and can produce OWL and SKOS output, which is compatible with XHTML.
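The “at least 3 touchpoints” rule of thumb from the list above can be checked mechanically. This is a minimal sketch under stated assumptions: the record layout (broader terms, synonyms, related links), the sample terms, and the function names are invented for illustration, and what counts as a touchpoint is a judgment call for each vocabulary.

```python
# Hypothetical term records: broader categories, synonyms, and other links.
TERMS = {
    "Green (color)": {"broader": ["Color"], "synonyms": [], "related": []},
    "Green building products": {
        "broader": ["Building Products"],
        "synonyms": ["Sustainable building products"],
        "related": ["Energy efficiency"],
    },
}


def touchpoints(entry):
    """Count the connections a term has to other terms or categories."""
    return sum(len(entry[key]) for key in ("broader", "synonyms", "related"))


def under_connected(terms, minimum=3):
    """List terms with fewer than `minimum` touchpoints, sorted by name."""
    return sorted(name for name, entry in terms.items() if touchpoints(entry) < minimum)


if __name__ == "__main__":
    for name in under_connected(TERMS):
        print(f"{name}: only {touchpoints(TERMS[name])} touchpoint(s)")
```

Terms that come back from a check like this are candidates for more research on context and intent, or for a temporary home in an “emerging concepts” category.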

    Here’s the best part. Taking a step back to use good methodology, including understanding information problems, capturing views of information based on user needs, and disambiguating and categorizing terminology, is the best practice for taxonomies in whatever form, regardless of whether the vocabulary is a list, taxonomy, thesaurus, or ontology.

    ~    Marlene Rockmore
