Web Ontologies. Standards and Formats for Linked-Data and the Semantic Web: RDF (Resource Description Framework), OWL (Web Ontology Language), JSON-LD, etc. [relevant to Solid, Holo and others]

By rhyzom | rhyzom | 27 Mar 2020

I already mentioned Tim Berners-Lee's latest project (Solid) in a recent post. As pointed out there, Linked Data and the Semantic Web are at its core. Basically, it extends and advances the idea of interoperable ontologies as a machine-readable layer of sense-making in a globally connected world. In computer science, an ontology is a formal specification of how we name and define the types and categories of things, and their properties and relationships, as they exist within a particular domain. Or, as Tom Gruber concisely puts it: "An ontology is an explicit specification of a conceptualization." (We'll keep coming back to this further along.)

Linked data, in that sense, is conceptually similar to Google's Knowledge Graph or Facebook's Open Graph (i.e., mapping the relationships and associations between pieces of data and objects). It can relate and correlate available data to answer questions and queries directly (similar to the Wolfram Alpha engine, for example). This discoverability is particularly valuable for interdisciplinary research across different domains and for revealing hidden relationships between things.

The Resource Description Framework (RDF) is the general method for describing and modeling information (metadata) when building web ontologies. RDF extends the linking structure of the Web by using URIs (Uniform Resource Identifiers; URLs are one example) to name the relationships between things, which allows data to be combined, exposed and shared across different applications and platforms. This linking structure forms a directed, labeled graph in which the nodes represent resources and each edge represents the named link or relationship between two of them.
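To make the graph picture concrete, here is a minimal sketch in plain Python (no RDF library). The example.org IRIs and the names alice, bob, knows and worksOn are made up for illustration and do not come from any real vocabulary.

```python
from collections import defaultdict

# Hypothetical example IRIs; none of these come from a real dataset.
EX = "https://example.org/"

triples = [
    (EX + "alice", EX + "knows", EX + "bob"),
    (EX + "alice", EX + "worksOn", EX + "projectX"),
    (EX + "bob", EX + "worksOn", EX + "projectX"),
]

def build_graph(triples):
    """Return {subject: [(predicate, object), ...]}: one labeled edge per triple."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)

def nodes(triples):
    """Every resource that appears as a subject or as an object is a node."""
    return {t[0] for t in triples} | {t[2] for t in triples}

graph = build_graph(triples)
for subject, edges in graph.items():
    for predicate, obj in edges:
        print(f"{subject} --[{predicate}]--> {obj}")
```

Each triple becomes one directed edge whose label is itself a URI, which is what lets two datasets that use the same identifiers be merged by simply putting their triples together.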


The basic building blocks of RDF are semantic triples in subject-predicate-object form, each of which states some declarative fact about the world. Not every fact has to be mapped out explicitly: some can be inferred indirectly from existing triples. RDF data can be written down in several serialization formats, including JSON (RDF/JSON), XML (RDF/XML) and TTL, commonly referred to as "Turtle" (the Terse RDF Triple Language).
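As a rough illustration of what a serialization does, the sketch below writes triples out one per line in the N-Triples style, which is a subset of Turtle. It is a simplification, not a full serializer: it treats any term starting with http:// or https:// as an IRI and everything else as a plain string literal, and the example.org names are hypothetical.

```python
# A triple is (subject, predicate, object). Subjects and predicates are IRIs;
# in this sketch an object is either an IRI or a plain string literal.

def is_iri(term):
    # Simplifying assumption: anything that looks like an http(s) address is an IRI.
    return term.startswith(("http://", "https://"))

def escape_literal(text):
    # Escape backslashes first, then double quotes and newlines.
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def format_term(term):
    return f"<{term}>" if is_iri(term) else f'"{escape_literal(term)}"'

def to_ntriples(triples):
    """One line per triple, each terminated by ' .'"""
    return "\n".join(
        f"{format_term(s)} {format_term(p)} {format_term(o)} ." for s, p, o in triples
    )

example_triples = [
    ("https://example.org/dave", "https://example.org/vocabulary#likes", "https://example.org/cookies"),
    ("https://example.org/dave", "https://example.org/vocabulary#name", "Dave"),
]
print(to_ntriples(example_triples))
```

The same two facts could equally be written as RDF/XML or RDF/JSON; only the notation changes, not the triples.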

Also, strictly speaking, RDF identifies subjects and predicates with IRIs (Internationalized Resource Identifiers) rather than URIs. Identifiers of this kind are a fundamental component of the current Web and the foundation of semantic data. IRIs extend URIs to include characters beyond ASCII, and using globally unique identifiers eliminates ambiguity when data comes from different sources.

SPARQL is the query language of RDF. It is similar to SQL and is used to explore data by querying unknown relationships and performing complex joins of disparate data in a single query. It can also translate RDF data from one ontology to another (with CONSTRUCT queries). RDF triple stores are non-relational databases that store semantic data. They are flexible, NoSQL-type databases that need no schema design upfront. They are a form of graph database in which subjects and objects are stored as nodes and predicates as directed edges (unlike a DAG, the graph may contain cycles). Triple stores are built to scale to large datasets and, combined with a reasoner, can uncover hidden relationships in the data.
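The core idea behind a SPARQL query, matching triple patterns with variables and joining on the shared variables, can be sketched in a few lines of plain Python. This is a toy evaluator for illustration only, not how an actual triple store works internally; the data and the short names standing in for IRIs are invented.

```python
# Hypothetical data; short strings stand in for full IRIs to keep the sketch readable.
TRIPLES = {
    ("alice", "worksOn", "projectX"),
    ("bob", "worksOn", "projectX"),
    ("carol", "worksOn", "projectY"),
    ("projectX", "fundedBy", "acme"),
}

def is_var(term):
    return term.startswith("?")

def match_pattern(pattern, triple, bindings):
    """Try to extend `bindings` so that `pattern` equals `triple`; return None on failure."""
    result = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check it against an earlier binding.
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return result

def query(patterns, triples):
    """Evaluate a list of triple patterns as a join over shared variables."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions

# Roughly: SELECT ?person WHERE { ?person worksOn ?project . ?project fundedBy acme }
results = query([("?person", "worksOn", "?project"), ("?project", "fundedBy", "acme")], TRIPLES)
print(sorted(r["?person"] for r in results))  # ['alice', 'bob']
```

The second pattern reuses ?project, so it acts as the join: only people whose project is funded by acme survive.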

The Semantic Web, on the other hand, is a parallel layer that implements the above (embedded in HTML/XHTML and/or using RDF/XML as its building blocks) to make networked data on the Web machine-readable and thus easy to manage, crawl, search and reach.

A recently launched and quickly growing collaborative research and note-taking tool, roamresearch.com ("a note-taking tool for networked thought"), uses linked data and relationship graphs in much the same way.

RDFS: RDF Schema

RDF Schema (RDFS) is a language for writing ontologies. An ontology is a model of (a relevant part of) the world that lists the types of objects, the relationships that connect them, and the constraints on the ways that objects and relationships can be combined. A simple example of an ontology (though not written in RDFS syntax):

class: Person
class: Project
property: worksOn

worksOn domain Person
worksOn range Project

This says that, in our model of the world, we only care about people and projects. People can work on projects, but not the other way around.
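One nuance is worth a small sketch: in RDFS itself, domain and range behave as inference rules rather than as hard constraints. If something works on something else, a reasoner concludes that the first is a Person and the second a Project. The Python below applies just those two rules; the names molly and cookieTracker are made up, and for simplicity each property is assumed to have at most one domain and one range.

```python
# The ontology from the text, written as triples. Names are shortened for readability.
schema = {
    ("worksOn", "rdfs:domain", "Person"),
    ("worksOn", "rdfs:range", "Project"),
}

# A hypothetical fact that says nothing about types.
data = {("molly", "worksOn", "cookieTracker")}

def infer_types(schema, data):
    """Apply the RDFS domain and range rules once to every data triple."""
    domains = {p: c for p, rel, c in schema if rel == "rdfs:domain"}
    ranges = {p: c for p, rel, c in schema if rel == "rdfs:range"}
    inferred = set()
    for s, p, o in data:
        if p in domains:
            # The subject of the property belongs to the property's domain class.
            inferred.add((s, "rdf:type", domains[p]))
        if p in ranges:
            # The object of the property belongs to the property's range class.
            inferred.add((o, "rdf:type", ranges[p]))
    return inferred

for triple in sorted(infer_types(schema, data)):
    print(triple)
```

Nobody ever stated that molly is a Person; that fact falls out of the schema.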

RDFa: Resource Description Framework in Attributes

RDFa (Resource Description Framework in Attributes) adds a set of attribute-level extensions to HTML, XHTML and various XML-based document types for embedding rich metadata within Web documents. Its mapping to the RDF data model makes it possible to embed RDF triples (subject-predicate-object expressions) within XHTML documents and to have compliant user agents extract them. As mentioned, RDF uses URIs/IRIs (those being how we identify things on the Web) to specify subjects and predicates (URLs being one common type of URI/IRI). Below is how one would express "Dave likes cookies" as a triple in N3 notation; RDFa embeds the same kind of triple in markup attributes:

@prefix pref: <https://example.org/vocabulary#> .
<#dave> pref:likes <#cookies> .

The @prefix line tells us what the short-hand stands for in all the QNames (short-hand URIs) in the document.
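What a parser does with that @prefix line can be sketched as a simple lookup. In the Python below, the pref prefix is taken from the example above, while the document address in BASE is a hypothetical stand-in (relative terms like <#dave> are resolved against wherever the document actually lives).

```python
from urllib.parse import urljoin

PREFIXES = {"pref": "https://example.org/vocabulary#"}
BASE = "https://example.org/people"  # hypothetical document address

def expand(term, prefixes=PREFIXES, base=BASE):
    """Turn <relative>, <absolute> or prefix:local into a full IRI."""
    if term.startswith("<") and term.endswith(">"):
        # Angle brackets hold an IRI; relative ones are resolved against the base.
        return urljoin(base, term[1:-1])
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        # Short-hand form: swap the prefix for the namespace it stands for.
        return prefixes[prefix] + local
    raise ValueError(f"cannot expand term: {term!r}")

print(expand("pref:likes"))  # https://example.org/vocabulary#likes
print(expand("<#dave>"))     # https://example.org/people#dave
```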

What is JSON-LD

JSON-LD is JavaScript Object Notation for Linking Data. It is an extension of JSON, the format commonly used to exchange information between websites and browsers: a JSON document is basically a set of key-value pairs that is both human-readable and machine-parsable. JSON-LD introduces a simple concept of context (designated as @context in the JSON) that maps keys to IRIs and so resolves ambiguities that may arise when data from different sources is interpreted together. Another useful feature is global identifiers, written as @id and usually following the @context line: the @id gives the thing described by the JSON-LD object an identifier that is valid everywhere, not just within one document. JSON-LD is interconvertible with RDF.
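To see how @context and @id turn ordinary JSON into triples, here is a toy sketch using only Python's json module. It is nowhere near a full JSON-LD processor (it handles flat documents only, and it keeps unmapped keys as they are), and the example.org IRIs are placeholders.

```python
import json

# A hypothetical JSON-LD document; the example.org IRIs are placeholders.
doc = {
    "@context": {
        "name": "https://example.org/vocabulary#name",
        "likes": "https://example.org/vocabulary#likes",
    },
    "@id": "https://example.org/people#dave",
    "name": "Dave",
    "likes": "cookies",
}

def expand_keys(document):
    """Replace each key with the IRI its @context assigns to it (toy version)."""
    context = document.get("@context", {})
    expanded = {}
    for key, value in document.items():
        if key == "@context":
            continue
        # Keywords such as @id, and keys the context does not map, stay unchanged.
        expanded[context.get(key, key)] = value
    return expanded

def to_triples(document):
    """Read the expanded document as (subject, predicate, object) triples."""
    expanded = expand_keys(document)
    subject = expanded["@id"]
    return [(subject, p, o) for p, o in expanded.items() if p != "@id"]

print(json.dumps(expand_keys(doc), indent=2))
for triple in to_triples(doc):
    print(triple)
```

Two documents that both say "name" but map it to different IRIs in their contexts no longer collide once the keys are expanded, which is the ambiguity the context is there to resolve.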

Solid makes extensive use of JSON-LD and provides a JavaScript-based DSL (a domain-specific language called LDflex) specifically designed for accessing and handling data in Solid pods through LDflex expressions.

OWL: Web Ontology Language

Ontologies themselves are specified in OWL. The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies, which are a formal way to describe taxonomies and classification networks and essentially define the structure of knowledge for various domains: the nouns represent classes of objects and the verbs represent relations between the objects. The OWL languages are characterized by formal semantics. They are built upon RDF, the World Wide Web Consortium's (W3C) standard, and are commonly serialized as RDF/XML. Both OWL and RDF have attracted significant academic, medical and commercial interest.

The W3C announced OWL 2 in 2009, and it soon found its way into semantic editors such as Protégé and semantic reasoners such as Pellet, RacerPro, FaCT++ and HermiT. A semantic reasoner (a.k.a. reasoning engine, rules engine or simply reasoner) is a piece of software able to infer logical consequences from a set of asserted facts or axioms.
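What "inferring logical consequences from asserted facts" means can be shown with a tiny forward-chaining loop in Python. This is only a sketch of the general idea with two hand-written rules; it is not how any of the reasoners named above are implemented, and the facts (Historian, Researcher, molly) are invented for the example.

```python
# Hypothetical asserted facts; short names stand in for IRIs.
facts = {
    ("Historian", "subClassOf", "Researcher"),
    ("Researcher", "subClassOf", "Person"),
    ("molly", "type", "Historian"),
}

def apply_rules(facts):
    """One pass over two rules; returns consequences (some may already be known)."""
    new = set()
    for a, rel1, b in facts:
        for c, rel2, d in facts:
            if rel2 != "subClassOf" or b != c:
                continue
            if rel1 == "subClassOf":
                # Rule 1: the subclass relation is transitive.
                new.add((a, "subClassOf", d))
            elif rel1 == "type":
                # Rule 2: an instance of a class is an instance of its superclasses.
                new.add((a, "type", d))
    return new

def closure(facts):
    """Keep applying the rules until nothing new is derived (a fixpoint)."""
    known = set(facts)
    while True:
        derived = apply_rules(known) - known
        if not derived:
            return known
        known |= derived

for triple in sorted(closure(facts) - facts):
    print(triple)
```

Three facts go in and three more come out, including that molly is a Person, which was never asserted anywhere.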

Here's an example of OWL with RDF graph.

Ontologies and RDF datasets (links and resources)

The ontology-based Research Group@IITM has a repository of available ontologies and RDF datasets here. But one particular resource I have myself been interested in is the Seshat Global History Databank (downloadable here), which is used extensively in cliodynamics research.

Digital humanities

Ontologies, RDF, linked data and the Semantic Web are particularly valuable in academia and are central to what is known today as the digital humanities (DH), an area of scholarly activity that intersects digital technologies with the disciplines of the humanities. It opens up new ways of doing research and scholarship that involve more collaborative, multi-disciplinary and computationally engaged research, experimentation, demonstration, teaching and publishing.


The DIKW pyramid (also known as the DIKW hierarchy or DIKW information pyramid) is a loose model for representing the structural and functional relationships between data, information, knowledge and wisdom. Basically, raw data only becomes useful as information (the meaning produced from the bare facts available), which shows the kinds of relationships between different pieces of data such that they constitute some valid statement or observation about the world. Knowledge then involves the synthesis of multiple and diverse sources of information over time and the organization and processing of accumulated experience and information (constituting the domain of expertise and skill).

On this, check these two posts:

Digital Humanities: Racing with the Machines
and this Reddit post of mine from a while ago: Seshat: Global History Databank. Web Ontologies, Digital Humanities & A Possible Ceptr/Holochain Implementation

