It’s time to get comfortable with ontologies, RDFa, SPARQL and OWL. After a few days at Drupal Design Camp at MIT and SemTech09 in San Jose, I’m convinced more than ever that it’s time to start thinking about ontologies. It’s time to think in triples.
Why do RDF and ontologies matter? To understand why RDF matters, it helps to define ontology. An ontology consists of concepts that are fully described and where the ambiguity has been resolved. Extracting meaningful links from databases and putting these concepts in a separate search structure solves many problems. It makes search engine indexes richer than standalone keywords that have no context, simplifies building indexes for programmers, allows filtering of data by facets, and enables visual interfaces (in the future; that’s the dream). Thinking through the conceptual links gives structure to unstructured data so it can be interpreted and analyzed by programs. Yahoo has experimented for the last year with an RDF-enhanced search platform called SearchMonkey. SearchMonkey users add structured metadata to their web pages using XHTML/RDFa, which enhances their search listings and changes how their data is displayed in the results.
What’s RDFa? RDF stands for Resource Description Framework. What the ‘a’ stands for was not clear to the people I asked. Drupal’s Benjamin Melancon said it might be the first version. At SemTech in San Jose, it was suggested that it stands for RDF+XHTML. The W3C’s answer is that it stands for attributes: RDFa is “RDF in attributes.” RDFa is a set of attributes added to XHTML markup that link the content on a page to a schema or vocabulary that defines a concept. RDFa is used to create statements that consist of a subject, a predicate, and an object (a “triple”). The important thing here is the predicate, which is the link between a subject and an object. For example, in the triple “<person> <has> <skills>”, has is the predicate. Take triples such as “<person> <works at> <company>” or “<celebrity> <is dating> <celebrity>”. The verbs works at and is dating are the predicates. Instead of being limited to the broader-term and narrower-term relationships of the ANSI/NISO thesaurus standard used in library science, ontologies are implemented using XHTML/RDFa and XML as the enabling technology and can create these lively predicates.
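To make “thinking in triples” concrete, here is a minimal Python sketch of subject-predicate-object statements and a wildcard lookup over them. It is not RDFa and it uses no RDF library; the names `Triple`, `TRIPLES`, and `match`, and the sample people and company, are made up for illustration.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical sample data modeled on the triples in the text.
TRIPLES = [
    Triple("alice", "works at", "Acme"),
    Triple("alice", "has", "taxonomy design"),
    Triple("bob", "works at", "Acme"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that agree with every field given; None is a wildcard."""
    return [
        t
        for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Who works at Acme? Fix the predicate and object, leave the subject open.
employees = [t.subject for t in match(TRIPLES, predicate="works at", obj="Acme")]
print(employees)  # ['alice', 'bob']
```

Leaving one slot open and fixing the other two is the basic move behind a SPARQL query: the predicate is what lets a program ask a question instead of matching a keyword.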
How does an ontology differ from a taxonomy or thesaurus? Simply put, ontologies allow hierarchical relations just like taxonomies, but they also allow flexibility in defining links or connections between terms. That’s what predicates are for.
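The difference can be sketched in a few lines of Python. In this illustration, which assumes a deliberately tiny vocabulary, the taxonomy is a table of broader terms and the ontology is the same table plus freely named predicates. `BROADER`, `LINKS`, `ancestors`, and `links_from` are hypothetical names, not part of any tool.

```python
from typing import Dict, List, Tuple

# Taxonomy: every term points to one broader term, and nothing else.
BROADER: Dict[str, str] = {
    "Abraham Lincoln": "Presidents",
    "Presidents": "People",
    "Lincoln, Nebraska": "Cities",
    "Cities": "Places",
}

# Ontology: the same hierarchy plus links with freely chosen predicates.
LINKS: List[Tuple[str, str, str]] = [
    ("Abraham Lincoln", "is namesake of", "Lincoln, Nebraska"),
    ("Lincoln, Nebraska", "is capital of", "Nebraska"),
]


def ancestors(term: str) -> List[str]:
    """Walk up the broader-term hierarchy, nearest parent first."""
    chain = []
    while term in BROADER:
        term = BROADER[term]
        chain.append(term)
    return chain


def links_from(term: str) -> List[Tuple[str, str]]:
    """Return (predicate, object) pairs for links that start at the term."""
    return [(p, o) for s, p, o in LINKS if s == term]


print(ancestors("Abraham Lincoln"))   # ['Presidents', 'People']
print(links_from("Abraham Lincoln"))  # [('is namesake of', 'Lincoln, Nebraska')]
```

A taxonomy tool can answer only the first question. The second one, which crosses from the People branch to the Places branch, needs a predicate.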
CEOs and CIOs are recognizing the value of taxonomies and ontologies in managing information. Times are changing when business managers start saying that adding or modifying a term in the taxonomy can be a faster change than trying to modify a database. Ontologies and taxonomies are perceived as responsive to changes in concepts, as opposed to databases, which have a static structure and a query language that has to be modified through an IT process. The taxonomy can be modified by a “user” or subject matter expert without programming intervention. That’s empowering.
Here are some simple ways to get started without learning any RDFa:
- If you have a taxonomy, pay attention to ambiguous terms. Create categories (also called facets) where terms can be placed comfortably. Don’t put square pegs in round holes. For example, if you have a building products application, you can classify “Green” under both “Building Products” and “Color.” Green building products and the color green are two separate, distinct concepts, just as Lincoln, the American president, and Lincoln, Nebraska, are distinct terms. Don’t forget you are classifying concepts, unique ideas, not keywords! By assigning terms to a category, you give them context and meaning.
- Connect terms with links between concepts. No term should stand alone without a relationship to another term or category, and every term should be disambiguated by being linked to larger concepts. Try to have at least 3 touchpoints for each term, such as a broader category, a synonym, and a link or predicate. If you are uncertain about how to classify a term, put it in an “emerging concepts” category while you get more input about intent. Simple relationships to look for are hierarchical relations, such as broader term, parent-child, part-of, or type-of, and synonyms, where terms are the same as or very closely related in meaning.
- Research context and intent! How are your users looking for information? How do they want to use information? What types of analysis are they doing? Collecting this user-centered research is how you begin to capture the situation and context in which a term is used. The goal is that each term has been placed in a context and also reflects intent, or how the term will be used. Context and intent are important to resolving ambiguity. Context is about location, process or role, time, or situation. It means the terminology sits in a data structure that captures meaning. For example, think of a term that has local variations, such as “milk shake.” In most of the United States, a milk shake has ice cream, but in Boston, you need to order a “frappe.” Otherwise, you’ll get shaken milk. Intent looks at information from different user perspectives. Got an upcoming New Product Announcement? The engineer cares about how it is built, the CEO looks at the revenue, and the lawyer looks at the contracts and licenses. From each perspective, the term New Product Announcement has a different meaning.
- Try blogging tools to see how taxonomy works in user interfaces and how easy it is to add and modify concepts on the fly. TypePad, Movable Type and Drupal blogging software all support RDFa. Drupal can be downloaded from Acquia.com.
- Try using taxonomy management tools. Test drive a taxonomy tool such as Data Harmony or Synaptica, or try one of the free ontology tools: TopBraid Composer is available for free, as is Protégé from Stanford University. You might find that traditional taxonomy tools such as Data Harmony and Synaptica are sometimes easier to learn, and they can produce OWL and SKOS output, which is compatible with XHTML.
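The first two tips above can be sketched in Python. This is a rough illustration, assuming each concept is identified by its label together with its category, and counting a category, each synonym, and each link as one touchpoint. The `Concept` class, the `needs_review` helper, and all of the sample terms are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Concept:
    """A term plus the context that disambiguates it."""
    label: str
    category: Optional[str] = None  # facet, e.g. "Color"
    synonyms: List[str] = field(default_factory=list)
    links: List[Tuple[str, str]] = field(default_factory=list)  # (predicate, target)

    def touchpoints(self) -> int:
        """Count the category, every synonym, and every link."""
        return (1 if self.category else 0) + len(self.synonyms) + len(self.links)


def needs_review(concepts: List[Concept], minimum: int = 3) -> List[Concept]:
    """Concepts with too few touchpoints, candidates for "emerging concepts"."""
    return [c for c in concepts if c.touchpoints() < minimum]


# Hypothetical vocabulary: the same label appears twice, in two categories.
CONCEPTS = [
    Concept("Green", "Color", synonyms=["Verdant"], links=[("is color of", "Paint")]),
    Concept("Green", "Building Products", synonyms=["Sustainable", "Eco-friendly"]),
    Concept("Frappe"),
]

for concept in needs_review(CONCEPTS):
    print(concept.label, "-> emerging concepts")
```

Because the category is part of each concept, the two Green entries never collide, and the only term flagged for review is the one nobody has connected to anything yet.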
Here’s the best part. Taking a step back to use good methodology, including understanding information problems, capturing views of information based on user needs, and disambiguating and categorizing terminology, is the best practice for vocabularies in whatever form, regardless of whether the vocabulary is a list, taxonomy, thesaurus or ontology.
~ Marlene Rockmore