The Great AI'Tuin

2025-04-12

The Turtle language, along with the Semantic Web, offers a powerful way to represent and work with knowledge across virtually any domain. Unfortunately, it is often overlooked in practice for several reasons: limited tooling, unfamiliarity with Semantic Web concepts, and the fact that much knowledge still exists in less structured formats, such as natural language.

However, in the age of large language models (LLMs) this no longer has to be the case. We finally have a way to bridge the gap between unstructured language and structured knowledge, and Turtle in particular is well suited to representing that structured knowledge. In this article I will demonstrate how LLMs can be used to translate natural language texts into the Turtle format and why this might be beneficial. To make this fun, I will use some examples derived from the Discworld series of books by Terry Pratchett. The Discworld books, which I highly recommend, are a series of fantasy novels, often with satirical themes, set on the Discworld. The Discworld is flat and rests upon the backs of four elephants, who in turn stand on a gigantic spacefaring turtle named the Great A'Tuin. This alludes to the World Turtle myth, which makes it a very apt topic when explaining the uses of the Turtle language.

A depiction of the Great A'Tuin carrying the elephants that carry Discworld. Due to the limitations of image generation, only three of the four elephants are shown; the fourth is on a coffee break. 😃

Before diving into examples, let’s briefly look at what the Turtle language actually is. Turtle (short for Terse RDF Triple Language) is a compact, readable format for expressing information as triples: subject–predicate–object statements. It’s part of the RDF (Resource Description Framework) family, designed for describing relationships between entities in a way that’s both machine readable and human friendly.

A triple might look like this:

:book1 dct:title "Small Gods" .

This reads as: book1 has the title "Small Gods". In most cases we want to describe several facts, such as the genre of the book or a textual summary of it, and for that we use multiple triples. Together these triples form a rich knowledge graph that enables both humans and software to understand and reason about a domain.
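To make the idea of a triple tangible outside of Turtle, here is a minimal Python sketch that stores triples as plain tuples. It is only an illustration: the strings stand in for real RDF terms, the `objects` helper is a name made up for this example, and the creator and genre facts are borrowed from the larger example later in this article.

```python
# A triple is just (subject, predicate, object); a graph is a set of triples.
# The strings below are stand-ins for RDF terms, not a real RDF data model.
graph = {
    (":book1", "dct:title", "Small Gods"),
    (":book1", "dct:creator", "Terry Pratchett"),
    (":book1", "schema:genre", "satire"),
}


def objects(graph, subject, predicate):
    """Return all objects stated for a given subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


title = objects(graph, ":book1", "dct:title")
print(title)  # ['Small Gods']
```

Each additional fact about the book is simply one more tuple in the set, which is all a knowledge graph is at its core.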

In particular Turtle excels at:

📌 Formalizing concepts (e.g., by description of the book we mean it having a title, a creator, etc.)

🔗 Connecting related information (e.g., all books in a series)

🔍 Supporting precise structured queries (e.g., “find all books belonging to the crime genre”)

🧠 Enabling inference and integration with other data sources (e.g., use a commonly used definition of describing a book)

Now, let’s see how this works in practice with a Discworld-based example. First we describe some information both in natural language and in Turtle, and then go over the differences. We start with descriptions of three Discworld books in natural language, in this case generated by ChatGPT:

The Colour of Magic: "A bumbling wizard and a naive tourist embark on a chaotic journey across the Disc, in the book that kicks off the Discworld series."

Guards! Guards!: "A ragtag city watch uncovers a dangerous conspiracy involving dragons and secret societies, all while stumbling toward reluctant heroism."

Small Gods: "A forgotten god and his lone believer confront religious dogma, power, and philosophical questions in a deeply thoughtful standalone story."

Given the information in this article about the Discworld series of books and these descriptions, we can represent this information in the Turtle language as follows (also generated by an LLM from the previous text):

@prefix dct: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/book/> .
@prefix schema: <http://schema.org/> .

ex:colour_of_magic a schema:Book ;
    dct:title "The Colour of Magic" ;
    dct:creator "Terry Pratchett" ;
    schema:isPartOf ex:discworld_series ;
    schema:description "A bumbling wizard and a naive tourist embark on a chaotic journey across the Disc, in the book that kicks off the Discworld series." ;
    schema:genre "fantasy", "parody", "adventure" .

ex:guards_guards a schema:Book ;
    dct:title "Guards! Guards!" ;
    dct:creator "Terry Pratchett" ;
    schema:isPartOf ex:discworld_series ;
    schema:description "A ragtag city watch uncovers a dangerous conspiracy involving dragons and secret societies, all while stumbling toward reluctant heroism." ;
    schema:genre "satire", "crime", "dragons", "power structures" .

ex:small_gods a schema:Book ;
    dct:title "Small Gods" ;
    dct:creator "Terry Pratchett" ;
    schema:isPartOf ex:discworld_series ;
    schema:description "A forgotten god and his lone believer confront religious dogma, power, and philosophical questions in a deeply thoughtful standalone story." ;
    schema:genre "faith", "philosophy", "belief", "satire" .

ex:discworld_series a schema:BookSeries ;
    dct:title "Discworld" ;
    dct:creator "Terry Pratchett" .
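The exact prompt used to produce the Turtle above is not shown in this article. As a rough, hypothetical sketch of how such a request could be assembled, the Python snippet below builds a prompt string that pins down the prefixes the model should use. The function name `build_turtle_prompt` and the wording of the instructions are assumptions made for illustration, and no particular LLM API is called.

```python
# Prefixes taken from the Turtle example in this article.
PREFIXES = {
    "dct": "http://purl.org/dc/terms/",
    "ex": "http://example.org/book/",
    "schema": "http://schema.org/",
}


def build_turtle_prompt(text, prefixes=PREFIXES):
    """Assemble a (hypothetical) instruction asking an LLM for Turtle output."""
    header = "\n".join(f"@prefix {p}: <{uri}> ." for p, uri in prefixes.items())
    return (
        "Translate the following text into RDF using Turtle syntax.\n"
        "Use only these prefixes and output nothing but Turtle:\n\n"
        f"{header}\n\n"
        f"Text:\n{text}\n"
    )


prompt = build_turtle_prompt(
    'Small Gods: "A forgotten god and his lone believer confront religious '
    'dogma, power, and philosophical questions in a deeply thoughtful '
    'standalone story."'
)
print(prompt)
```

Fixing the prefixes up front is a design choice: it nudges the model towards existing vocabularies instead of inventing its own, and it makes the output easier to check afterwards.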

Let's now go over the elements of the Turtle representation above.

As mentioned before, each statement in Turtle expresses a triple: a simple fact linking a subject to an object via a predicate (or property). For instance:

ex:colour_of_magic a schema:Book ;

This says that ex:colour_of_magic (our identifier for the book) is a schema:Book, a class from the Schema.org vocabulary. The a is shorthand for rdf:type, indicating class membership. This is a common pattern in RDF and one of the ways Turtle stays concise and readable. Turtle syntax also has a nice feature for the case where we have a series of triples such as:

ex:discworld_series a schema:BookSeries .
ex:discworld_series dct:title "Discworld" .
ex:discworld_series dct:creator "Terry Pratchett" .

that have a common subject, we do not have to repeat ourselves and can use the shorthand:

ex:discworld_series a schema:BookSeries ;
    dct:title "Discworld" ;
    dct:creator "Terry Pratchett" .
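The shorthand is purely a matter of grouping by subject. A small Python sketch shows the idea, assuming triples are given as tuples of already-formatted Turtle terms; `to_shorthand` is a name invented for this illustration and not part of any RDF library.

```python
from collections import defaultdict

# Triples as (subject, predicate, object), with terms already in Turtle form.
triples = [
    ("ex:discworld_series", "a", "schema:BookSeries"),
    ("ex:discworld_series", "dct:title", '"Discworld"'),
    ("ex:discworld_series", "dct:creator", '"Terry Pratchett"'),
]


def to_shorthand(triples):
    """Group triples by subject and join predicate-object pairs with ';'."""
    by_subject = defaultdict(list)
    for s, p, o in triples:
        by_subject[s].append(f"{p} {o}")
    blocks = []
    for s, pairs in by_subject.items():
        # One subject, then its predicate-object pairs, closed by a final '.'.
        blocks.append(f"{s} " + " ;\n    ".join(pairs) + " .")
    return "\n\n".join(blocks)


shorthand = to_shorthand(triples)
print(shorthand)
```

Running this prints the same three-line block as the shorthand above: the subject appears once, each `;` continues with another predicate for that subject, and the final `.` closes the group.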

The identifiers that make use of the prefixes are actually abbreviations of full URIs, such as http://example.org/book/colour_of_magic. In Turtle you can use prefixes to keep things short and tidy. The prefixes, and what they stand for, are given at the start of the Turtle definition:

@prefix dct: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/book/> .
@prefix schema: <http://schema.org/> .

Here the ex prefix is something we devised for this example, but the other prefixes refer to existing vocabularies that we can reuse: Dublin Core and Schema.org. This allows us to not reinvent the wheel, since vocabularies for describing books, creators of things, etc. already exist. This is a big benefit for interoperability that you normally do not get out of the box with many other formalisms (such as JSON). Any existing tool that understands these common vocabularies can interpret our small RDF description.
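The expansion from a prefixed name to a full URI is simple string concatenation. Here is a minimal Python sketch of that step, using the three prefixes from this article; the `expand` function is a hypothetical helper for illustration, and a real Turtle parser handles more cases (escapes, relative URIs, the empty prefix).

```python
# Prefix table mirroring the @prefix declarations in this article.
PREFIXES = {
    "dct": "http://purl.org/dc/terms/",
    "ex": "http://example.org/book/",
    "schema": "http://schema.org/",
}


def expand(name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'ex:small_gods' into a full URI."""
    prefix, _, local = name.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return prefixes[prefix] + local


print(expand("ex:colour_of_magic"))  # http://example.org/book/colour_of_magic
print(expand("dct:title"))           # http://purl.org/dc/terms/title
```

Because dct:title always expands to the same URI, two independently written descriptions that both use it are talking about the same property, which is where the interoperability comes from.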

Speaking of tooling, as mentioned previously, the set of triples creates a knowledge graph of interconnected information within a domain. We can also visualize this as a graph, for example using the isSemantic RDF Visualizer:

A graph visualization of the Turtle description of the Discworld books.

The key benefit of Turtle, especially in a world of LLMs, is that it allows us to be very precise and explicit about the semantics we are describing. For example, if we just said that we have a description of the Discworld books, it could be very unclear what exactly is included: the description, author, date of publishing, genres, etc. If we instead have a Turtle representation of the information, even one generated initially by an LLM, both humans and software can check precisely what information we describe and what we do not.

This explicitness is what allows us to run very precise queries on the semantics. Using SPARQL, a query language for this type of knowledge graph, we can find the book that has crime as part of its genre definition:

PREFIX schema: <http://schema.org/>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?bookTitle WHERE {
  ?book schema:genre "crime" ;
        dct:title ?bookTitle .
}

This query would return the following when run on our knowledge graph:

"Guards! Guards!"

Such queries keep working even if we have a lot more of these book definitions. They are evaluated against the graph itself, without the potential for hallucinations or inaccuracies that sometimes occur when using LLMs and natural language.
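To make concrete what a query engine does with the two triple patterns in the SPARQL query, here is a minimal Python sketch that mimics them over a subset of our graph. It is not a SPARQL implementation: the tuples stand in for RDF terms, only some of the genre triples are included, and `titles_with_genre` is a name made up for this example.

```python
# Subset of the article's graph as (subject, predicate, object) tuples.
graph = [
    ("ex:colour_of_magic", "dct:title", "The Colour of Magic"),
    ("ex:colour_of_magic", "schema:genre", "fantasy"),
    ("ex:colour_of_magic", "schema:genre", "parody"),
    ("ex:colour_of_magic", "schema:genre", "adventure"),
    ("ex:guards_guards", "dct:title", "Guards! Guards!"),
    ("ex:guards_guards", "schema:genre", "satire"),
    ("ex:guards_guards", "schema:genre", "crime"),
    ("ex:small_gods", "dct:title", "Small Gods"),
    ("ex:small_gods", "schema:genre", "philosophy"),
    ("ex:small_gods", "schema:genre", "satire"),
]


def titles_with_genre(graph, genre):
    """Mimic: ?book schema:genre <genre> ; dct:title ?bookTitle ."""
    # First pattern: bind ?book to every subject that carries the genre.
    books = {s for s, p, o in graph if p == "schema:genre" and o == genre}
    # Second pattern: look up the title of each bound ?book.
    return sorted(o for s, p, o in graph if p == "dct:title" and s in books)


print(titles_with_genre(graph, "crime"))   # ['Guards! Guards!']
print(titles_with_genre(graph, "satire"))  # ['Guards! Guards!', 'Small Gods']
```

The answer depends only on which triples are in the graph, so the same question always gives the same result.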

As one can see, the Semantic Web and Turtle format offer powerful ways to structure and connect information. The main drawback used to be that knowledge is often represented in natural language, or other less accessible formats. This made the benefits of such knowledge graphs hard to realize.

Now, large language models can act as bridges, translating rich, human-authored natural language into structured formats like Turtle. This opens up new opportunities for building precise, interoperable knowledge graphs from books, documentation, or even casual writing. It allows humans, or even software systems, to make very precise corrections to this graph and to run very precise queries on it, without fear of inaccuracies.

If you ever have the chance, give this approach using Turtle a go. It can feel quite magical!