Aesopica, Part 4: Basic Conversion to Other Formats

2018-12-18

This article is the fourth part of a series examining the use of the Clojure language for representing Linked Data, with examples from Aesop's stories. It describes some basic conversions from our Clojure representation of Linked Data to other formats, such as Turtle, TriG, N-Quads and JSON-LD.

In previous articles of this series, we created a Clojure-based syntax for defining Linked Data. To make this syntax a viable member of the Linked Data ecosystem, it is important to provide conversion functionality to other Linked Data formats. This allows users of the associated Aesopica library to create and use Linked Data in a Clojure-based environment and convert it, when needed, to other formats. To implement this functionality we made use of Clojure's Java interop and Apache Jena, and made it available in the Aesopica library.

To start off we begin with the basic example of "The Fox and The Stork" story that we introduced initially. The Clojure representation of this is as follows:

(def fox-and-stork-edn
  {::aes/context
   {nil "http://www.newresalhaider.com/ontologies/aesop/foxstork/"
    :rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#"}
   ::aes/facts
   #{[:fox :rdf/type :animal]
     [:stork :rdf/type :animal]
     [:fox :gives-invitation :invitation1]
     [:invitation1 :has-invited :stork]
     [:invitation1 :has-food :soup]
     [:invitation1 :serves-using :shallow-plate]
     [:stork :gives-invitation :invitation2]
     [:invitation2 :has-invited :fox]
     [:invitation2 :has-food :crumbled-food]
     [:invitation2 :serves-using :narrow-mouthed-jug]
     [:fox :can-eat-food-served-using :shallow-plate]
     [:fox :can-not-eat-food-served-using :narrow-mouthed-jug]
     [:stork :can-eat-food-served-using :narrow-mouthed-jug]
     [:stork :can-not-eat-food-served-using :shallow-plate]}})

The full details of this representation are explained in our previous work. Here we only briefly summarize its main elements. The example above defines a knowledge base containing a set of facts that describe "The Fox and The Stork" story. Each fact is a triple of a subject, a predicate and an object. These elements are all represented by a URI, but for human readability and ease of use in Clojure they are identified by (namespaced) keywords. The context map, which maps each namespace keyword to its URI prefix, allows us to transform the namespaced keywords into full URIs when required. Of course, more elements are possible in Linked Data/RDF and in this representation as well, such as literal values, but this summary should suffice for this article.
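To make the role of the context map more concrete, here is a minimal sketch of how a keyword could be expanded into a full URI. The names example-context and expand-keyword are hypothetical and are not taken from the Aesopica API; the sketch also only covers keywords, not literal values.

```clojure
;; Hypothetical illustration, not the Aesopica implementation.
;; A context maps a namespace keyword (or nil for the base) to a URI prefix.
(def example-context
  {nil  "http://www.newresalhaider.com/ontologies/aesop/foxstork/"
   :rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#"})

(defn expand-keyword
  "Looks up the keyword's namespace in the context (nil when the keyword
  has no namespace) and appends the keyword's name to the prefix found."
  [context kw]
  (let [prefix (some-> (namespace kw) keyword)]
    (str (get context prefix) (name kw))))

(expand-keyword example-context :fox)
;; => "http://www.newresalhaider.com/ontologies/aesop/foxstork/fox"

(expand-keyword example-context :rdf/type)
;; => "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
```

A keyword without a namespace, such as :fox, falls back to the nil entry of the context, which plays the role of the base prefix.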

Now in order to explain how this conversion is done, assuming no familiarity with Clojure or a similar language, two new concepts are required.

First, it is important to note that in the above example we bind the Linked Data representation of "The Fox and The Stork" to the fox-and-stork-edn variable. Although this is not necessary for creating the knowledge base, it allows us to reuse this definition in the code, and in this article as well, without writing out the full representation each time.

The second concept that we make use of is how Clojure functions are called. In Clojure, invoking a function has the general form (function-name param1 param2 ...). For example, let's assume that the function for translating from our Clojure representation to Turtle is conv/convert-to-turtle. Here conv is a shorthand for Aesopica's conversion namespace. Given this, the translation to a Turtle string representation of the Linked Data can be invoked by:

(conv/convert-to-turtle fox-and-stork-edn)

The resulting string representation shows the same Linked Data knowledge base in Turtle syntax:

"@base <http://www.newresalhaider.com/ontologies/aesop/foxstork/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<fox> rdf:type <animal>.
<stork> rdf:type <animal>.
<fox> <gives-invitation> <invitation1>.
<invitation1> <has-invited> <stork>.
<invitation1> <has-food> <soup>.
<invitation1> <serves-using> <shallow-plate>.
<stork> <gives-invitation> <invitation2>.
<invitation2> <has-invited> <fox>.
<invitation2> <has-food> <crumbled-food>.
<invitation2> <serves-using> <narrow-mouthed-jug>.
<fox> <can-eat-food-served-using> <shallow-plate>.
<fox> <can-not-eat-food-served-using> <narrow-mouthed-jug>.
<stork> <can-eat-food-served-using> <narrow-mouthed-jug>.
<stork> <can-not-eat-food-served-using> <shallow-plate>."

This representation has a similar form to our Clojure-based notation. A set of facts is represented, and prefixes and/or a base prefix are used to enable easy reading and writing of the triples.

Now let us look at some other formats and conversions.

TriG is an extension of the Turtle format that enables "named graphs". This was the topic of a previous article, but here it suffices to say that by associating a set of facts with a specific graph, we make it easy to add metadata to these facts. To show this conversion, we use an example that relies on this notion:

(def fox-and-stork-named-graph-edn
  {::aes/context
   {nil "http://www.newresalhaider.com/ontologies/aesop/foxstork/"
    :rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    :time "http://www.w3.org/2006/time#"}
   ::aes/facts
   #{[:fox :rdf/type :animal]
     [:stork :rdf/type :animal]
     [:fox :gives-invitation :invitation1 :dinner1]
     [:invitation1 :has-invited :stork :dinner1]
     [:invitation1 :has-food :soup :dinner1]
     [:invitation1 :serves-using :shallow-plate :dinner1]
     [:stork :gives-invitation :invitation2 :dinner2]
     [:invitation2 :has-invited :fox :dinner2]
     [:invitation2 :has-food :crumbled-food :dinner2]
     [:invitation2 :serves-using :narrow-mouthed-jug :dinner2]
     [:invitation1 :serves-using :narrow-mouthed-jug :dinner2]
     [:dinner1 :time/before :dinner2]}})

As one can see, here we simply extend our triple based representation of facts to include either triples or quads. In a quad the last element is the graph name identifier of the graph the fact is a member of.
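As a small sketch of what this mixed representation means in practice, the facts can be split by graph simply by looking at their optional fourth element. The names example-facts, graph-of and facts-by-graph are hypothetical and only serve this illustration; they are not part of the Aesopica library.

```clojure
;; Hypothetical illustration: a fact is a triple [s p o] or a quad [s p o g].
(def example-facts
  #{[:fox :rdf/type :animal]
    [:fox :gives-invitation :invitation1 :dinner1]
    [:stork :gives-invitation :invitation2 :dinner2]
    [:dinner1 :time/before :dinner2]})

(defn graph-of
  "Returns the graph name of a quad, or nil for a plain triple."
  [fact]
  (nth fact 3 nil))

(defn facts-by-graph
  "Groups facts by graph name; triples end up under the nil key,
  which stands in for the default graph."
  [facts]
  (group-by graph-of facts))

(keys (facts-by-graph example-facts))
;; contains nil, :dinner1 and :dinner2 (in no particular order)
```

Grouping like this mirrors the way TriG collects the facts of each named graph together, with the unnamed facts forming the default graph.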

Translating this representation to TriG can be done by:

(conv/convert-to-trig fox-and-stork-named-graph-edn)

Which results in the following string representation that is TriG formatted:

{ <http://www.newresalhaider.com/ontologies/aesop/foxstork/fox>
        <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>  <http://www.newresalhaider.com/ontologies/aesop/foxstork/animal> .
<http://www.newresalhaider.com/ontologies/aesop/foxstork/dinner1>
        <http://www.w3.org/2006/time#before>  <http://www.newresalhaider.com/ontologies/aesop/foxstork/dinner2> .
<http://www.newresalhaider.com/ontologies/aesop/foxstork/stork>
        <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>  <http://www.newresalhaider.com/ontologies/aesop/foxstork/animal> .
}

<http://www.newresalhaider.com/ontologies/aesop/foxstork/dinner2> {
    <http://www.newresalhaider.com/ontologies/aesop/foxstork/invitation2>
            <http://www.newresalhaider.com/ontologies/aesop/foxstork/serves-using>  <http://www.newresalhaider.com/ontologies/aesop/foxstork/narrow-mouthed-jug> ;
            <http://www.newresalhaider.com/ontologies/aesop/foxstork/has-food>  <http://www.newresalhaider.com/ontologies/aesop/foxstork/crumbled-food> ;
            <http://www.newresalhaider.com/ontologies/aesop/foxstork/has-invited>  <http://www.newresalhaider.com/ontologies/aesop/foxstork/fox> .
    <http://www.newresalhaider.com/ontologies/aesop/foxstork/stork>
            <http://www.newresalhaider.com/ontologies/aesop/foxstork/gives-invitation>  <http://www.newresalhaider.com/ontologies/aesop/foxstork/invitation2> .
    <http://www.newresalhaider.com/ontologies/aesop/foxstork/invitation1>
            <http://www.newresalhaider.com/ontologies/aesop/foxstork/serves-using>  <http://www.newresalhaider.com/ontologies/aesop/foxstork/narrow-mouthed-jug> .
}

<http://www.newresalhaider.com/ontologies/aesop/foxstork/dinner1> {
    <http://www.newresalhaider.com/ontologies/aesop/foxstork/fox>
            <http://www.newresalhaider.com/ontologies/aesop/foxstork/gives-invitation>  <http://www.newresalhaider.com/ontologies/aesop/foxstork/invitation1> .
    <http://www.newresalhaider.com/ontologies/aesop/foxstork/invitation1>
            <http://www.newresalhaider.com/ontologies/aesop/foxstork/has-food>  <http://www.newresalhaider.com/ontologies/aesop/foxstork/soup> ;
            <http://www.newresalhaider.com/ontologies/aesop/foxstork/serves-using>  <http://www.newresalhaider.com/ontologies/aesop/foxstork/shallow-plate> ;
            <http://www.newresalhaider.com/ontologies/aesop/foxstork/has-invited>  <http://www.newresalhaider.com/ontologies/aesop/foxstork/stork> .
}

There are a number of differences between the above TriG output and the Clojure representation, but also between it and the previous Turtle output. Probably the most apparent is that this output uses no prefixes: all URIs are written out in full. Both Turtle and TriG are flexible in whether they abbreviate URIs with prefixes or not. This is left entirely up to the author, or in this case to the specific way the conversion has been implemented. Another difference is how graphs are identified. Instead of using a quad-like format to denote the graph to which each fact belongs, the facts of a graph are grouped together. For example, in the form <http://www.newresalhaider.com/ontologies/aesop/foxstork/dinner2> { ... }, all the facts inside the curly braces belong to the <http://www.newresalhaider.com/ontologies/aesop/foxstork/dinner2> graph. Finally, a somewhat similar construction is used to abbreviate a group of triples that all share the same subject. Instead of writing each fact out fully, "predicate lists" are used to match a single subject with a series of predicate and object pairs, separated by semicolons. This is quite a nice feature, and something similar is definitely on the list of future improvements to the Clojure notation, although care must be taken, as such shorthands can make the definition a bit more complex.
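The predicate list shorthand can be sketched in a few lines of Clojure. This is only an illustration of the idea, not how the Jena-based conversion works internally; the names dinner1-triples, term, predicate-list and abbreviate are hypothetical, and terms are rendered as relative references in angle brackets, as in the earlier Turtle output.

```clojure
(require '[clojure.string :as str])

;; Hypothetical illustration: three triples sharing the same subject.
(def dinner1-triples
  [[:invitation1 :has-food :soup]
   [:invitation1 :serves-using :shallow-plate]
   [:invitation1 :has-invited :stork]])

(defn term
  "Renders a keyword as a relative reference in angle brackets."
  [kw]
  (str "<" (name kw) ">"))

(defn predicate-list
  "Writes the subject once, followed by its predicate and object pairs
  separated by semicolons."
  [subject triples]
  (str (term subject) " "
       (str/join " ; "
                 (map (fn [[_ p o]] (str (term p) " " (term o))) triples))
       " ."))

(defn abbreviate
  "Groups triples by subject and renders one statement per subject."
  [triples]
  (map (fn [[subject group]] (predicate-list subject group))
       (group-by first triples)))

(abbreviate dinner1-triples)
;; => ("<invitation1> <has-food> <soup> ; <serves-using> <shallow-plate> ; <has-invited> <stork> .")
```

The three facts collapse into one statement, which is shorter to read but, as noted above, requires a slightly more involved grammar to parse.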

Speaking of complexity, an interesting format created with the purpose of being very simple is N-Quads. This is a straightforward, line-based syntax where each fact is represented by a single line. It is actually an extension of the N-Triples format, with support added for named graphs. The conversion of our named graph example using the function invocation:

(conv/convert-to-nquads fox-and-stork-named-graph-edn)

would give us the following output:

<http://www.newresalhaider.com/ontologies/aesop/foxstork/fox> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.newresalhaider.com/ontologies/aesop/foxstork/animal> .
<http://www.newresalhaider.com/ontologies/aesop/foxstork/dinner1> <http://www.w3.org/2006/time#before> <http://www.newresalhaider.com/ontologies/aesop/foxstork/dinner2> .
<http://www.newresalhaider.com/ontologies/aesop/foxstork/stork> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.newresalhaider.com/ontologies/aesop/foxstork/animal> .
<http://www.newresalhaider.com/ontologies/aesop/foxstork/invitation2> <http://www.newresalhaider.com/ontologies/aesop/foxstork/serves-using> <http://www.newresalhaider.com/ontologies/aesop/foxstork/narrow-mouthed-jug> <http://www.newresalhaider.com/ontologies/aesop/foxstork/dinner2> .
<http://www.newresalhaider.com/ontologies/aesop/foxstork/invitation2> <http://www.newresalhaider.com/ontologies/aesop/foxstork/has-food> <http://www.newresalhaider.com/ontologies/aesop/foxstork/crumbled-food> <http://www.newresalhaider.com/ontologies/aesop/foxstork/dinner2> .
<http://www.newresalhaider.com/ontologies/aesop/foxstork/invitation2> <http://www.newresalhaider.com/ontologies/aesop/foxstork/has-invited> <http://www.newresalhaider.com/ontologies/aesop/foxstork/fox> <http://www.newresalhaider.com/ontologies/aesop/foxstork/dinner2> .
<http://www.newresalhaider.com/ontologies/aesop/foxstork/stork> <http://www.newresalhaider.com/ontologies/aesop/foxstork/gives-invitation> <http://www.newresalhaider.com/ontologies/aesop/foxstork/invitation2> <http://www.newresalhaider.com/ontologies/aesop/foxstork/dinner2> .
<http://www.newresalhaider.com/ontologies/aesop/foxstork/invitation1> <http://www.newresalhaider.com/ontologies/aesop/foxstork/serves-using> <http://www.newresalhaider.com/ontologies/aesop/foxstork/narrow-mouthed-jug> <http://www.newresalhaider.com/ontologies/aesop/foxstork/dinner2> .
<http://www.newresalhaider.com/ontologies/aesop/foxstork/fox> <http://www.newresalhaider.com/ontologies/aesop/foxstork/gives-invitation> <http://www.newresalhaider.com/ontologies/aesop/foxstork/invitation1> <http://www.newresalhaider.com/ontologies/aesop/foxstork/dinner1> .
<http://www.newresalhaider.com/ontologies/aesop/foxstork/invitation1> <http://www.newresalhaider.com/ontologies/aesop/foxstork/has-food> <http://www.newresalhaider.com/ontologies/aesop/foxstork/soup> <http://www.newresalhaider.com/ontologies/aesop/foxstork/dinner1> .
<http://www.newresalhaider.com/ontologies/aesop/foxstork/invitation1> <http://www.newresalhaider.com/ontologies/aesop/foxstork/serves-using> <http://www.newresalhaider.com/ontologies/aesop/foxstork/shallow-plate> <http://www.newresalhaider.com/ontologies/aesop/foxstork/dinner1> .
<http://www.newresalhaider.com/ontologies/aesop/foxstork/invitation1> <http://www.newresalhaider.com/ontologies/aesop/foxstork/has-invited> <http://www.newresalhaider.com/ontologies/aesop/foxstork/stork> <http://www.newresalhaider.com/ontologies/aesop/foxstork/dinner1> .

As one can see, this format does not use prefixes: each fact is a triple or a quad on a single line ending with a dot, with each element's URI written out in full. This way of writing facts is similar to the Clojure-based notation, the main difference being that the Clojure notation does use prefixes for URI abbreviation. This simplicity contrasts with the flexibility of the Turtle format, which can be more terse but is more complex to parse and generate. It also shows that a separate N-Triples converter is not really needed: as long as the original knowledge base does not use any named graphs, the N-Quads output will be the same as the N-Triples output.
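The simplicity of the format shows in how little code a hand-written serializer would need. The following is a minimal sketch under the assumption that every element of a fact is a keyword (literals are not handled); it is not the Jena-based conversion used by Aesopica, and the names example-context, expand-keyword and fact->nquads-line are hypothetical.

```clojure
(require '[clojure.string :as str])

;; Hypothetical illustration, not the Aesopica implementation.
(def example-context
  {nil   "http://www.newresalhaider.com/ontologies/aesop/foxstork/"
   :rdf  "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   :time "http://www.w3.org/2006/time#"})

(defn expand-keyword
  "Expands a (namespaced) keyword into a full URI string via the context."
  [context kw]
  (let [prefix (some-> (namespace kw) keyword)]
    (str (get context prefix) (name kw))))

(defn fact->nquads-line
  "Renders a triple or quad of keywords as one line: every element
  becomes a full URI in angle brackets, and the line ends with a dot."
  [context fact]
  (str (str/join " " (map #(str "<" (expand-keyword context %) ">") fact))
       " ."))

(fact->nquads-line example-context [:dinner1 :time/before :dinner2])
;; => "<http://www.newresalhaider.com/ontologies/aesop/foxstork/dinner1> <http://www.w3.org/2006/time#before> <http://www.newresalhaider.com/ontologies/aesop/foxstork/dinner2> ."
```

The same function handles triples and quads, because a quad only adds a fourth URI before the closing dot.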

The final format that we aim to convert to is JSON-LD. This is a format based on the JavaScript Object Notation (JSON), which allows for very easy interoperability with JSON-based tools.

Converting can be done with the following invocation:

(conv/convert-to-json-ld fox-and-stork-named-graph-edn)

resulting in the following JSON representation:

{
  "@graph" : [ {
    "@graph" : [ {
      "@id" : "http://www.newresalhaider.com/ontologies/aesop/foxstork/fox",
      "http://www.newresalhaider.com/ontologies/aesop/foxstork/gives-invitation" : {
        "@id" : "http://www.newresalhaider.com/ontologies/aesop/foxstork/invitation1"
      }
    }, {
      "@id" : "http://www.newresalhaider.com/ontologies/aesop/foxstork/invitation1",
      "http://www.newresalhaider.com/ontologies/aesop/foxstork/has-food" : {
        "@id" : "http://www.newresalhaider.com/ontologies/aesop/foxstork/soup"
      },
      "http://www.newresalhaider.com/ontologies/aesop/foxstork/has-invited" : {
        "@id" : "http://www.newresalhaider.com/ontologies/aesop/foxstork/stork"
      },
      "http://www.newresalhaider.com/ontologies/aesop/foxstork/serves-using" : {
        "@id" : "http://www.newresalhaider.com/ontologies/aesop/foxstork/shallow-plate"
      }
    } ],
    "@id" : "http://www.newresalhaider.com/ontologies/aesop/foxstork/dinner1",
    "http://www.w3.org/2006/time#before" : {
      "@id" : "http://www.newresalhaider.com/ontologies/aesop/foxstork/dinner2"
    }
  }, {
    "@graph" : [ {
      "@id" : "http://www.newresalhaider.com/ontologies/aesop/foxstork/invitation1",
      "http://www.newresalhaider.com/ontologies/aesop/foxstork/serves-using" : {
        "@id" : "http://www.newresalhaider.com/ontologies/aesop/foxstork/narrow-mouthed-jug"
      }
    }, {
      "@id" : "http://www.newresalhaider.com/ontologies/aesop/foxstork/invitation2",
      "http://www.newresalhaider.com/ontologies/aesop/foxstork/has-food" : {
        "@id" : "http://www.newresalhaider.com/ontologies/aesop/foxstork/crumbled-food"
      },
      "http://www.newresalhaider.com/ontologies/aesop/foxstork/has-invited" : {
        "@id" : "http://www.newresalhaider.com/ontologies/aesop/foxstork/fox"
      },
      "http://www.newresalhaider.com/ontologies/aesop/foxstork/serves-using" : {
        "@id" : "http://www.newresalhaider.com/ontologies/aesop/foxstork/narrow-mouthed-jug"
      }
    }, {
      "@id" : "http://www.newresalhaider.com/ontologies/aesop/foxstork/stork",
      "http://www.newresalhaider.com/ontologies/aesop/foxstork/gives-invitation" : {
        "@id" : "http://www.newresalhaider.com/ontologies/aesop/foxstork/invitation2"
      }
    } ],
    "@id" : "http://www.newresalhaider.com/ontologies/aesop/foxstork/dinner2"
  }, {
    "@id" : "http://www.newresalhaider.com/ontologies/aesop/foxstork/fox",
    "@type" : "http://www.newresalhaider.com/ontologies/aesop/foxstork/animal"
  }, {
    "@id" : "http://www.newresalhaider.com/ontologies/aesop/foxstork/stork",
    "@type" : "http://www.newresalhaider.com/ontologies/aesop/foxstork/animal"
  } ]
}

The biggest benefit of this format is its compatibility with JSON-based tools and techniques. Regular JSON parsers, encoders and other tooling will just work, giving the format a very wide reach. In a similar way, our Clojure-based approach uses EDN as its basis. EDN is a subset of Clojure's syntax, namely its notation for data values, and is used by Datomic and others as a data transfer format.
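As a small illustration of what EDN-based tooling gives us, a set of facts stored as text can be read back with the standard clojure.edn reader that ships with Clojure. The names facts-as-text and facts are made up for this sketch.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical illustration: facts written as plain EDN text,
;; for example as they might be stored in a file.
(def facts-as-text
  "#{[:fox :rdf/type :animal]
     [:stork :rdf/type :animal]
     [:fox :gives-invitation :invitation1 :dinner1]}")

;; Reading yields ordinary Clojure data: a set of vectors of keywords.
(def facts (edn/read-string facts-as-text))

(count facts)
;; => 3

(contains? facts [:fox :rdf/type :animal])
;; => true

;; Printing and reading again gives back the same value.
(= facts (edn/read-string (pr-str facts)))
;; => true
```

No custom parser is needed, in the same way that JSON-LD documents can be handled by any regular JSON parser.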

To summarize, we have seen how converting Linked Data from the Clojure representation to various other formats using the Aesopica library is just a function invocation away. We have also looked at some of the differences between the various syntaxes, notably the benefits that they provide: Turtle/TriG offer a lot of flexibility and shorthands for reading and writing, N-Quads offers simplicity of notation, and JSON-LD offers compatibility with an existing and well-used standard. The Clojure representation is aimed at creating a new, and hopefully interesting, blend. It makes use of prefixes for easy reading and writing by human users, similar to what is possible in Turtle. It has the simplicity of fact representation as triples and quads, as in N-Quads. Finally, it uses a common, albeit not nearly as widespread, standard as a basis, so it can make use of EDN-based tooling.

One interesting element that the Turtle and TriG formats provide is their various shorthands for reading and writing. We believe this is a very useful feature, but of course the trade-off between such shorthands and the simplicity of the notation must be taken into account. The form that such shorthands will take is therefore the topic of another article.

Note that previous articles in this series can also be found on this site.

As always, the functionality detailed in these articles can be found in the Aesopica library for using Clojure to write Linked Data.