Aesopica, Part 1: General Introduction

2018-09-10

Aesop's Fables, also known as the Aesopica, are an ancient collection of stories that have been passed down to the modern day. These stories are of diverse origins and cover a wide variety of themes. Although originally intended for an adult audience, in later times they were often used for the education of children.

One such story is the tale of the Fox and the Stork. There are many versions of this fable, but the overall outline is generally as follows:

"The fox invited the stork to dinner. At the dinner, soup was served from a shallow plate that the fox could eat from but the hungry stork could not even taste. In turn, the stork invited the fox to dinner. Dinner was served in a narrow-mouthed jug filled with crumbled food. This time the fox could not reach the food, while the stork ate."

1884 fountain design depicting the story of the Fox and the Stork by Catalan sculptor Eduard Batiste Alentorn in Barcelona

The intention of stories such as these, as well as of text in general, is to convey meaning. However, in addition to humans, a new audience for text has emerged in recent years: machines. To serve this new audience, a set of technologies has been developed to convey the meaning of text in a precise and unambiguous way that is easily understandable for humans and machines alike. Many of these methods fall under the umbrella of the Semantic Web. The goal of the Semantic Web is to create a web of data where the meaning of the information is both human and machine understandable.

One of the cornerstone technologies for conveying information for this purpose is Linked Data, and in particular the Resource Description Framework (RDF) standard that defines how this Linked Data can be expressed. I have written a short introduction to Linked Data before, but to summarize: it allows information to be expressed as a set of facts. These facts take the form of subject, predicate, object triples. A set of such facts is often called a knowledge base; alternatively, it can be seen as a knowledge graph in which the facts define the nodes and edges.

In a Linked Data representation the story of the Fox and the Stork would look something like this:

@base <http://www.newresalhaider.com/ontologies/aesop/foxstork/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<fox> rdf:type <animal>.
<stork> rdf:type <animal>.
<fox> <gives-invitation> <invitation1>.
<invitation1> <has-invited> <stork>.
<invitation1> <has-food> <soup>.
<invitation1> <serves-using> <shallow-plate>.
<stork> <gives-invitation> <invitation2>.
<invitation2> <has-invited> <fox>.
<invitation2> <has-food> <crumbled-food>.
<invitation2> <serves-using> <narrow-mouthed-jug>.
<fox> <can-eat-food-served-using> <shallow-plate>.
<fox> <can-not-eat-food-served-using> <narrow-mouthed-jug>.
<stork> <can-eat-food-served-using> <narrow-mouthed-jug>.
<stork> <can-not-eat-food-served-using> <shallow-plate>.

This is written in the Turtle syntax of RDF. Other syntaxes are also available to represent Linked Data, for example JSON-LD, which is based on JSON.

To summarize what this Linked Data format does in this scenario: it uses Uniform Resource Identifiers (URIs) to define the subject, predicate and object of each fact. This makes it possible to precisely and unambiguously define and link the meaning of these elements. For example, the fact that the fox is a type of animal could be expressed by the triple with the full URIs: http://www.newresalhaider.com/ontologies/aesop/foxstork/fox http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.newresalhaider.com/ontologies/aesop/foxstork/animal . Because writing the full URIs can be quite cumbersome, the Turtle syntax offers two kinds of shorthand. One can define a base URI for the current document, http://www.newresalhaider.com/ontologies/aesop/foxstork/, as well as prefixes for other namespaces, such as rdf: for http://www.w3.org/1999/02/22-rdf-syntax-ns#, so that any URI beginning with one of these fragments can be written in shortened form.

Put together, this format still describes the original story, albeit restructured into separate facts.

There exist many tools for handling Linked Data such as the above story. For example, APIs such as Jena can aid in the creation, storage and querying of data made available in such a fashion. Of course, more and better tools and techniques are always welcome. In this article in particular we hope to describe how we can use the Clojure programming language to work with Linked Data.

Clojure is a language that offers a lot of benefits. The focus on manipulating pure data, with immutable data structures and functional programming, provides an excellent way to organize code. The ability to interoperate with the Java and JavaScript ecosystems allows for the use of many mature libraries as well as many avenues for deployment.

Using the data manipulation capabilities of Clojure to enable the Semantic Web seems like a natural combination. Some previous work has also explored this area, notably EDN-LD, which provides a convention and a library for working with Linked Data.

In this article we will also explore how we can use Clojure to interact with Linked Data. In our case we will focus on the creation of Linked Data from a Clojure environment, and we might adopt different conventions compared to previous work, so we start with a fresh implementation.

In Clojure, information is directly represented as data, as opposed to being encapsulated in various other abstractions such as objects. A large subset of Clojure's data elements also forms a data format called the Extensible Data Notation (EDN). The built-in elements in this notation are nil, booleans, strings, characters, symbols, keywords, integers, floating-point numbers, lists, vectors, maps and sets. The meaning behind most of these elements is relatively straightforward, so we only give a brief summary of them here, with some examples.

Nil

An empty or non-existent element is represented by nil.

Booleans

A boolean value can be true or false.

Strings

Strings are written in double quotes, for example: "This sentence is a string.".

Characters

Characters represent single characters and are preceded by a backslash, for example \c or \newline.

Symbols

Symbols represent identifiers, written as a sequence of characters (with a few additional rules). Examples are foo, clojure.core and clojure.string/split. As these examples show, in Clojure they are used, among other things, to refer to namespaces and functions. Another interesting feature, as the clojure.string/split example shows, is that symbols themselves can be namespaced, which helps to organize them and avoid name collisions.

Keywords

Keywords are very similar to symbols but they are identifiers that refer to themselves. They are constructed much like symbols, but with a leading :. Examples of keywords are :fruit or :company.persons/name.

Integers and Floats

Integers and floats (floating-point numbers) are used, as expected, to write numbers, for example 3 or 4.5.

All of the elements described above can be put in collections.

Lists

Lists are a sequence of values enclosed in (), for example (2 "A string." false).

Vectors

Vectors are a sequence of values enclosed in [], for example [true nil :company/name]. They are designed for random access to their elements.

Sets

Sets are collections of unique values enclosed in #{}, such as #{:fruit 2}.

Maps

Finally, maps are collections of key-value pairs enclosed in curly braces {}, for example {:name "John Smith", :age 4}, where each key is unique. Of course, collections can also be nested inside any type of collection.
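
As a quick check of the summary above, the following sketch reads each of the example values with clojure.edn/read-string and reports which kind of element it becomes. The names edn-kind, examples and kinds are made up for this illustration; only clojure.edn and the core predicates come from Clojure itself (boolean? requires Clojure 1.9 or later).

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical helper: read one EDN value from a string and name its kind.
(defn edn-kind [s]
  (let [v (edn/read-string s)]
    (cond
      (nil? v)     :nil
      (boolean? v) :boolean
      (string? v)  :string
      (char? v)    :character
      (symbol? v)  :symbol
      (keyword? v) :keyword
      (integer? v) :integer
      (float? v)   :float
      (list? v)    :list
      (vector? v)  :vector
      (set? v)     :set
      (map? v)     :map
      :else        :other)))

;; The example values used in the descriptions above, as EDN text.
(def examples
  ["nil"
   "true"
   "\"This sentence is a string.\""
   "\\c"
   "clojure.string/split"
   ":company.persons/name"
   "3"
   "4.5"
   "(2 \"A string.\" false)"
   "[true nil :company/name]"
   "#{:fruit 2}"
   "{:name \"John Smith\", :age 4}"])

(def kinds (mapv edn-kind examples))

;; A namespaced keyword can be split into its two parts.
(def kw (edn/read-string ":company.persons/name"))
(def kw-parts [(namespace kw) (name kw)])
```

Here kinds lists the twelve element types in the order they were introduced, and kw-parts holds the namespace and the name of the keyword as two strings.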

Using these elements of EDN, we can build an EDN-based version of the story of the Fox and the Stork, using some conventions.

Given that in many practical cases we are probably going to shorten URIs with prefixes when writing, we can use a keyword to denote each element. Where we would use the base prefix, we can just use a regular, non-namespaced keyword, e.g. :fox, and where we would refer to any other prefix we can use a namespaced keyword, e.g. :rdf/type. A full fact could then be described with a relatively straightforward vector, for example [:fox :rdf/type :animal], and the knowledge base with a set of facts such as #{[:fox :rdf/type :animal] [:stork :rdf/type :animal]}.
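
One benefit of this convention is that a knowledge base is ordinary Clojure data, so the core sequence functions already work on it. The sketch below is only an illustration under that assumption: match-facts is a hypothetical helper, not part of any library, that treats nil in a pattern as a wildcard.

```clojure
;; A small knowledge base: a set of [subject predicate object] vectors.
(def facts
  #{[:fox :rdf/type :animal]
    [:stork :rdf/type :animal]
    [:fox :can-eat-food-served-using :shallow-plate]
    [:stork :can-eat-food-served-using :narrow-mouthed-jug]})

;; Hypothetical helper: return the facts matching a [s p o] pattern,
;; where nil in any position matches everything.
(defn match-facts [facts [s p o]]
  (set (filter (fn [[fs fp fo]]
                 (and (or (nil? s) (= s fs))
                      (or (nil? p) (= p fp))
                      (or (nil? o) (= o fo))))
               facts)))

;; Every subject that is declared to be an animal.
(def animals
  (set (map first (match-facts facts [nil :rdf/type :animal]))))

;; Because the knowledge base is a set, adding a known fact changes nothing.
(def unchanged? (= facts (conj facts [:fox :rdf/type :animal])))
```

Using a set rather than a vector for the knowledge base matches the intuition that stating the same fact twice adds no information.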

Of course, this means that in addition to the facts we also need some data for the context, in which we store the base and the other prefixes together with the URIs they map to, so that we can build a fully equivalent Linked Data representation. The context will be a map with the relevant prefixes as keys, and nil as the key for the base prefix. For the above example, the following context is enough to resolve all the full URIs:

{nil "http://www.newresalhaider.com/ontologies/aesop/foxstork/"
:rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#"}
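
To make the role of the context concrete, here is a minimal sketch of how a keyword could be turned back into a full URI. The functions expand-keyword and expand-fact are hypothetical names chosen for this example, and the sketch assumes that simple string concatenation is good enough, which holds for this context because the base ends in / and the rdf namespace ends in #. Proper relative URI resolution is more involved.

```clojure
(def context
  {nil  "http://www.newresalhaider.com/ontologies/aesop/foxstork/"
   :rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#"})

;; Hypothetical helper: look up the keyword's namespace as a prefix in the
;; context (nil for non-namespaced keywords, i.e. the base) and append the
;; keyword's name to the URI found there.
(defn expand-keyword [context k]
  (let [prefix (some-> (namespace k) keyword)
        uri    (get context prefix)]
    (when-not uri
      (throw (ex-info "Unknown prefix" {:prefix prefix :keyword k})))
    (str uri (name k))))

;; Expand every element of a [subject predicate object] fact.
(defn expand-fact [context fact]
  (mapv #(expand-keyword context %) fact))

(def expanded (expand-fact context [:fox :rdf/type :animal]))
```

With this context, expanded holds the same three full URIs that were spelled out earlier for the fact that the fox is a type of animal.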

Putting everything together, to have a full Linked Data graph we need a context and a set of facts, so the overall structure will be a map where these are both defined:

  {::aes/context
   {nil "http://www.newresalhaider.com/ontologies/aesop/foxstork/"
    :rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#"}
   ::aes/facts
   #{[:fox :rdf/type :animal]
     [:stork :rdf/type :animal]
     [:fox :gives-invitation :invitation1]
     [:invitation1 :has-invited :stork]
     [:invitation1 :has-food :soup]
     [:invitation1 :serves-using :shallow-plate]
     [:stork :gives-invitation :invitation2]
     [:invitation2 :has-invited :fox]
     [:invitation2 :has-food :crumbled-food]
     [:invitation2 :serves-using :narrow-mouthed-jug]
     [:fox :can-eat-food-served-using :shallow-plate]
     [:fox :can-not-eat-food-served-using :narrow-mouthed-jug]
     [:stork :can-eat-food-served-using :narrow-mouthed-jug]
     [:stork :can-not-eat-food-served-using :shallow-plate]}}

I have started a small library, named Aesopica, for manipulating Linked Data structures written this way. It is in a very early stage, and its main functionality at the moment is translating Linked Data written this way into the Turtle format described above.
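
To give an impression of what such a translation involves, here is a rough sketch. It is not Aesopica's actual code or API: all function names are made up for this illustration, the map uses plain :context and :facts keys instead of namespaced ones, and it only handles facts made entirely of keywords (no literals such as strings or numbers).

```clojure
(require '[clojure.string :as str])

(def graph
  {:context {nil  "http://www.newresalhaider.com/ontologies/aesop/foxstork/"
             :rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#"}
   :facts   #{[:fox :rdf/type :animal]
              [:fox :gives-invitation :invitation1]}})

;; One @base or @prefix line per context entry, with the base (nil) first.
(defn context->turtle [context]
  (for [[prefix uri] (sort-by (comp str key) context)]
    (if (nil? prefix)
      (str "@base <" uri "> .")
      (str "@prefix " (name prefix) ": <" uri "> ."))))

;; Namespaced keywords become prefixed names, others relative URIs.
(defn term->turtle [k]
  (if-let [prefix (namespace k)]
    (str prefix ":" (name k))
    (str "<" (name k) ">")))

(defn fact->turtle [fact]
  (str (str/join " " (map term->turtle fact)) " ."))

;; Directives, a blank line, then the facts sorted for stable output.
(defn graph->turtle [{:keys [context facts]}]
  (str/join "\n" (concat (context->turtle context)
                         [""]
                         (sort (map fact->turtle facts)))))

(println (graph->turtle graph))
```

Sorting the facts is a choice made here only to get reproducible output, since a Clojure set has no guaranteed iteration order.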

Of course, there are a lot of other elements of Linked Data that need to be represented in this format and that we have not tackled yet. In addition, there are a large number of Clojure libraries that could be used to make writing and using Linked Data in this fashion easier. How these features could be achieved, however, is a story for another time.