Facets=Classes=Sets

Rdf-graph3
Image via Wikipedia

I just returned from an intense training in semantic web technologies through Top Quadrant and I learned much more about what goes on “under the covers.” The course explained more about how semantic technologies can generate machine to machine applications. One important learning was that facets are similar to classes which is similar to the mathematical idea of a set and discusses why taxonomists and programmers need to think more in terms of classes, facets and sets as similar ideas.

Using semantic tools requires building a conceptual model — which is collection of classes.  To build useful models that are semantically-enable requires learning the basic semantic toolkit:

  • RDF (relational description framework). In RDF, one creates classes, and designs relations between individual members of a class and between classes. RDF comes in two main flavors:  RDFa which is for web-based applications  and RDFs which can be used to generate the ontology (concept mapping) as a schema to represent the underlying data.  RDF is used to create inverted graphs that can be converted to triples. Using RDF, one can read in a data store such as a spreadsheet and quickly generate a starter taxonomy (which still needs to be validated with use case scenarios )
  • SKOS (simple knowledge organization system) converts traditional taxonomies into rdf format. SKOS handles basic thesaurus-type relations such as broader/narrower concepts, alternative labels and related concepts. In SKOS the related concept would have its own unique resource identifier. SKOS can only describe a concept with broader, narrower and alternative labels and preferred labels, and cannot associate a concept with an OWL class.
  • SPARQL is a specialized query language, designed to query triple stores A semantically-enabled applications is one that is converted can be converted into an RDF graph, which can then be visually displayed as a graph and queried using SparQL.
  • OWL (web ontology language) is the underlying language for describing models. OWL is required to handle more complexity such as restrictions, cardinality, and inferencing.

Most everything conceptually in RDF, SKOS, and the underlying programming language OWL, once you get under the covers, will familiar to taxonomists. Some details can confuse you, but don’t let the lack of underlying naming conventions deter you. For example, a class in RDF is called an Owl:Thing. If a class is defined in RDF Schema language can be called an RDFS:Class. Oh well, confusing, but don’t let that deter you from appreciating the power of this approach. A thing is still a class, which is similar to a facet.

Here are some examples of how OWL and taxonomies are similar. The bolded print is the OWL property.

SubClassOf defines narrow term in a set

Inverse of creates reciprocal relations

Transitivity allows navigation of a hierarchy so that if A = B, B=C, the A=C. A SPARQL query that can chain through a hierarchy can potentially consist of 2 lines.

Restrictions are similar to slot facets or attributes which are o properties that limit the set

Here are some reasons to utilize classes in semantic technologies as a best practice.  Without implementing classes and modeling, these outcomes would be hard to achieve:

Form follows function: Instead of designing big monolithic hierarchical taxonomies, thinking in terms of classes or facets, which are groupings of individual members in a set. These smaller, faster sets (fasets, perhaps) will be easier to export, import, edit and share. Perhaps facets should be called fast sets or fasets! Plus the facets (classes) can become fields in a web form. The possibilities for reuse and design opens many options.

Scalability and Reuse: Since concepts and the associated classes are independent of data and content, the concepts and classes can be changed, such as changing an organization name, renaming key terms, or adapting new ideas, without changing underlying queries and systems architecture. This is scalable.

Change Schema Without Changing Content: Developing conceptual mapping can be done independently and designed and changed in the RDF schema or OWL language without changing the underlying data. Precision: Because an individual concept can be easily manipulated as a member of a set, or multiple sets, the concept can have a more accurate definition. For example, take a term like “Chevy Chase.” By associating “Chevy Chase” with a class:Person one can distinguish Chevy Chase, the comedian, from Chevy Chase, Maryland as part of the class: Location. Furthermore, ideally each unique concept of Chevy Chase would have its own namespace or unique resource identifier (URI).

Precision: The ability to create a concept independent of the content without tightly coupling into a hierarchy, but allowing the concept to associate in a clear way with the appropriate facet or class and to get more precision. This same logic can be applied to more amorphous, squishy terms like “Compensation” or “Performance” or “Management” or “Quality” which can be deconstructed into more specific variants like “Executive Compensation” vs “Non-exempt Pay and Benefits” RDFs can be used to link to more appropriate term with a unique URI

Facilitate Linked Data: If taxonomies and data can be shared, it is faster to build serious applications that can solve real and acute problems. In our class, we built applications that mapped free wifi hot spots were next to swimming pools and taquerias in geographic location, but we also did a serious social policy application where we mapped cities in the United States that had increases in complaints about housing due to sexual orientation, national origin, race and other discriminatory practices, taking data from multiple, reputable sources and applying a common conceptual model.

There are some new challenges for taxonomists especially in understanding the importance of inferencing. Developers who work with OWL is that many inferencing errors can be traced back to bad, messy taxonomies where there are too many broad terms — in other words, avoid complex polyhierarchies.

To create taxonomies that are ready for the semantic future, the better practice is to how to arrange concepts into facets (which can be equated with classes or sets and avoiding complex polyhierarchies (a concept with too many parents). This will allow taxonomies to play well with applications such as user interface design and machine readable applications. The first step is to stop thinking about taxonomies as a monolithic hierarchy, but rather to look at taxonomies as a collection of classes (or facets), where a class is a set with individual members. If models and taxonomies can be easily built and used to connect across data worksheets resolving issues, applications based on linked data can be quickly built.

To try  semantic tools such as SKOS editors, download a trial copy of Top Braid Composer Free Edition.

Enhanced by Zemanta~Marlene Rockmore
Advertisements

Skills of a Classy Taxonomist

At SemTech in June 2010,  several speakers including Professor Deb McGuiness drew a very clear line was drawn between what a taxonomist does and what an ontologist does.  Taxonomists build hierarchies, and ontologists determine classes or categories.   In other words, ontologies are neat and unambiguous, and taxonomies are a bit messy.

Defining classes or ontology work  typically precedes building the taxonomy.  Defining the classes is like writing a specification for the taxonomy; in fact defining classes is the same as defining facets.   The goal of a taxonomist and ontologies should be to define a specific, unambiguous description of a term that helps manage how we find and organize content so the pathways are clear and specific; adding an ontology ensures that the term is placed in the most specific categories to help ensure clarity and lack of ambiguity. I would argue that no taxonomy is useful unless it is faceted – that is, has been divided into classes. Taxonomies work best when they share homogenous properties, and when they are smaller and focused.

By using class analysis, or facet analysis,  several problems are solved:

1)       Clarify specific terms by situation or functions: If I am interested in Java as a programming language, I want to see material related to Java as software, not as slang for coffee or  an island in the South Pacific.  If I am looking for “drill bits,”  it might be important to understand if the drill bits are for my home electric screwdriver  or for an oil rig.   Classes capture these distinctions, and help to create precise specific tagging and information retrieval.

2)       Ease longterm  maintenance issues: Christine Connors points to a simple but common example where taxonomies are built where people’s names are included as narrow terms under the role such as “Hillary Clinton” is “Secretary of State”  or “Charles Windsor” is the “Prince of Wales.” The problem is that when people filling these roles change, there is a maintenance headache.   A classy taxonomy recognizes that there is a separate class for <people> as an entity, as distinguished from <role>.  <People> and <Role>  can be connected by a predicate such as <isA>.  These distinctions are necessary for fast-changing information (such as who is dating whom in an entertainment application) or (who owns whom in a business application).

Abstraction <person> <has> <role>

Instance: Hillary Clinton <is>  Secretary of State

3)    Facilitate sharing  and importing taxonomies: Having taxonomies that are specified by a class description means the taxonomy will be more homogenous, have shared properties, and be more focused.  This will make it easier to import with less cleanup and review.  It will facilitate the use of SKOS for example. Messy taxonomies are harder to merge.

Anyone working with semantic technologies will tell you that most problems in inference happen when hierarchies in source taxonomies create odd associations by inserting a narrow or broad term. A taxonomist needs to be attentive to inferences in order to prevent false statements.   Professor Deb McGuiness calls this issue “truth maintenance.”

To keep these categories clear and distinct, ontologists rely on building a conceptual model or a picture of the domain (see earlier post on Taxonomies and modeling.)   Modeling strategies involve skills of most taxonomists.  Most taxonomists have been taught how to capture vocabulary and how to identify facets.  Check out the blog post Taxonomies and Modelling for more information.

Elaine Kendall  of Sandpiper Software, which is a concept-modelling tool.  suggested that “one could build an ontology in 2 hours.”   With new generation of tools that can create RDF/OWL from data and content,  this statement might be true.

    With good modelling tools that automatically generate RDF/OWL,such as TopQuadrant,  taxonomists might  be able to slide into the needed role as ontologists.  Taxonomists need to understand  some basic concepts in RDF/OWL to extend their skills such as what is a class, what is a property and what is a slot facet, what is class inheritance, what is meant by reciprocation and inverse properties and how to write a SPARQL query.  But more importantly,  a classy taxonomist can help become a facilitator to help build bridges between user and development communities and  to help diagnose and prevent technical problems.

    A taxonomist who is trained in ontologies  should bring the following skills:

    • Ability to create processes to identify the requirements for each class,
    • Develop  metrics to assess good results
    • Identify what vocabularies are needed and use skills to evaluate existing vocabularies, import and adapt these vocabularies to the current needs
    • Ensure the integrity and focus of vocabularies particularly when sourced from an outside vendor,
    • Develop processes to keep vocabularies current, and understand how to use metrics to “measure and improve” any vocabularies.
    • To be part of the development team to help identify if a source vocabulary might be part of false inference.

    The taxonomist works with different user communities as well as developers and helps bridge the gap between what users and experts know and what is needed to build a useful application.   A classy taxonomist has a well-rounded set of skills that can work with development teams and user organizations to build intelligent systems.

    Enhanced by Zemanta

    Is GoodRelations a Game Changer?

    One  ontology  worth watching might be GoodRelations, which is being implemented by   Best Buy.      The central component of this architecture was an ontology called GoodRelations developed by Martin Hepp, who presented at SemTech in San Francisco last week via Skype from Munich, Germany.    GoodRelations is a retail ontology which uses RDFa from XHTML webpages to populate global ontology.   But why would a major retailer use this  architecture?

    Best Buy discovered that it was impossible to be the top dog  in search engine optimization (SEO)  in every search category for every product.  To do this, they needed to have finely tuned individual pages.  They also wanted to provide immediate content about “open box” – returned items at local stores.    looking for a solution that could add more granularity, precision and localization, but still enable global search and have metadata that was controlled by the enterprise.

    GoodRelations is a retail ontology, which offers facets or classes, metadata descriptions and attributes  that are common in the retail industry.   It is expressed in RDFa which is a flavor of RDF that works in web browsers.  Yahoo Search Monkey supports RDFa,  Facebook directed graphs will support RDF.  Google snippets also support RDFa.

    Because there is common metadata, it is easy for employees or customers (who are called “user agents” in the semantic world) to tag content via templates which populate the RDF.  RDFa can be maintained in a corporate or enterprise repository which can be configured as needed for distribution in the enterprise.

    In the GoodRelations RDF, the additional metadata might include price, color, dimensions, model and other attributes that interest consumers.  GoodRelations is an ontology that can be shared over any retail enterprise in any country.  The cost per webpage, once implemented, is minimal because “user agents” are familiar with how to complete forms over the web. The RDFa can then be appended to an HTML page written in XHTML or HTML5.  These HTML code for adding the specific metadata attributes is about 30-50 lines.  This creates HTML that has more granularity than a typical <keyword> metatag. The high costs are in the metadata management.

    Adding RDFa as metadata to a webpage should be easy to adopt because it works in the current web paradigm.   Google is offering RDFa markup language that can be appended to a webpage called Google Rich Snippets.  Snippets is competing with the another format called Microformat.  The problem is that every domain needs a shared set of s metadata attributes to enable search across smaller domains.   Google is rolling out examples of RDFa for restaurants, currently only has 2500 markup pages. To see an example of snippets,  try a search on Google for “Baked Ziti.”  Drupal 7 also offers RDF, and has been implemented in http://www.whitehouse.gov, as part of the Obama Administration transparency initiative.

    Why does this interest me as a  classy taxonomist (future ontologist)?  Clearly, this technology has evolved to a point of adoption, but further adoption depends on political and organizational work to get other applications to take the risk to try RDFa.    RDFa depends on common adoption of similar metadata  This requires political and organization skills to define and manage common metadata knowledge models.  First, taxonomists understand vocabulary and metadata as a way to capture common knowledge and shared metadata.  Second, if this innovation becomes more widely adopted and gains traction,  there may be interest in building similar process in other applications in making any information that has to be shared.

    Further, if RDFa coupled with ontology and metadata management, makes data management and querying easier through SPARQL,  then more attention can be paid to the political and organizational work of working with local agencies to contribute good data and content.

    There is a long way to go to make this vision a reality.. browsers have to adopt RDFa, applications have to prove the viability and ontologies in other domains need to be created.  But in the long run, this might be a more democratic way to extend information access on the web.

    However,  to move toward this vision, faceted navigation and defining common metadata and taxonomies is  good intermediate step.  By creating faceted taxonomies and browsing, and collecting data, user communities are moving towards understanding what search fields, common language, and unambiguous terms that matter to their users.  A little semantics goes a long way.

    ~Marlene Rockmore