Spec-stacular Spider-Man

2018-04-29

Spider-Man is one of the most iconic heroes of the Marvel universe. Created by Stan Lee and Steve Ditko, Spider-Man is a regular teenager named Peter Parker who, after being bitten by a radioactive spider, gains abilities such as the proportional strength of a spider, wall crawling and a spider sense that detects upcoming danger. One of the biggest draws of Spider-Man is that, although he is a superhero who has fought villains ranging from cosmic beings to petty criminals, he also has to deal with regular everyday problems such as money issues, school life and the pressure of a job.

The Spectacular Spider-Man © Marvel Entertainment

In software there are also everyday problems which one has to tackle before one can defeat the villains of the domain at hand. One of these everyday problems is data validation: the process of ensuring that the elements of the data are correct. This has to be done in pretty much every domain that works with actual data. Consider the financial domain, where a financial product can only be made available if the request fulfills the right requirements. If the request is not written correctly, it needs to be denied. In the legal and regulatory domains, information that is required by a law must be provided, otherwise costly corrections or fines can follow. Another good example is the clinical domain, where a patient's data needs to be transferred to an application. Here it is essential that the data fulfills the requirements for requesting a clinical procedure or a medication, as any mistake can have a huge negative impact on the health of the patient.

One relatively recent tool that can be used to solve this problem is the clojure.spec library for the Clojure programming language. In this article we aim to explain, alongside Spider-Man, how specs can be used to tackle the data validation problem in a spectacular way. As this library relies on the Clojure language, some knowledge of Clojure is needed. To make this article understandable to those without such prior expertise, we first introduce some aspects of Clojure. In particular we focus on two features: the way information (data) is represented and the fact that it is a Lisp.

In Clojure data is represented with relatively few elements that are combined together. Take for example a scenario where we want to create a profile of Spider-Man, as taken from the Marvel wiki entry on Spider-Man.

The full name of Spider-Man can be represented in text form as a string. Like in many other languages the text is placed in between quotation marks.

"Peter Benjamin Parker"

To represent his relative power in the Marvel universe, we use natural numbers (we leave the concepts and issues surrounding very large or floating point numbers out of this article). In the case of Spider-Man his durability is 3, the same value used in his full profile later on:

3

Of course, having the values for the name and the durability of Spider-Man simply floating around makes the representation somewhat incomplete, as they are not attached to the concepts of "name" or "durability". Just like Spider-Man needs buildings to swing from, we need a representation that links the values with what they represent. In Clojure, keywords are often used for this purpose.

:real-name
:spider-man-spec.core/name
::durability

Keywords are symbolic identifiers. Think of them as labels that you use much like strings, but with some special powers attached. They are text prefixed by ":", as can be seen in the keyword :real-name. They can be namespace-qualified, as in :spider-man-spec.core/name, which denotes the keyword name in the spider-man-spec.core namespace. Namespaces allow us to modularize our data and code by grouping them together under a single identifier. In our case this is spider-man-spec.core. This namespacing ensures that our definitions of the concepts "name", "real-name", "durability", etc. remain distinct from any other use of similar concepts. Finally, when writing code inside the namespace itself we can shorten the keyword with "::", as in ::durability, which is read as :spider-man-spec.core/durability. From another namespace that has an alias for ours, the alias can be placed after the "::" instead.
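The behaviour of plain and namespace-qualified keywords described above can be tried out at a REPL. The following is a minimal sketch, assuming the code is evaluated inside a namespace called spider-man-spec.core; the var names are made up for illustration.

```clojure
(ns spider-man-spec.core)

;; A plain keyword has a name but no namespace.
(def plain-name (name :real-name))            ; "real-name"
(def plain-ns (namespace :real-name))         ; nil

;; Inside spider-man-spec.core, ::durability is read as the
;; fully qualified keyword :spider-man-spec.core/durability.
(def same-keyword?
  (= ::durability :spider-man-spec.core/durability))   ; true

(def qualified-ns (namespace ::durability))   ; "spider-man-spec.core"
```

Because the "::" form is resolved when the code is read, the same ::durability typed in a different namespace would produce a different keyword.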

Keywords come with some nice implementation details, such as fast equality checks and some other powers we will show later. This makes them the preferred keys in data structures such as maps. Speaking of maps, they describe information as key-value pairs, written between curly braces as in the small example below.

  {::name "Spider-Man"
   ::real-name "Peter Benjamin Parker"}

The curly brackets around the pairs mark a map in Clojure. In the above example ::name and ::real-name are the keys, and "Spider-Man" and "Peter Benjamin Parker" are their respective values.
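To see how such a map is used, here is a small sketch of looking values up by key. It assumes the spider-man-spec.core namespace; the var name names is hypothetical.

```clojure
(ns spider-man-spec.core)

(def names
  {::name "Spider-Man"
   ::real-name "Peter Benjamin Parker"})

;; Look a value up with get ...
(get names ::name)                 ; "Spider-Man"

;; ... or use the keyword itself as a function of the map.
(::real-name names)                ; "Peter Benjamin Parker"

;; A missing key gives nil, or a default when one is supplied.
(::durability names)               ; nil
(::durability names :unknown)      ; :unknown
```

Being callable as a lookup function is one of the "special powers" that makes keywords convenient as map keys.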

Maps are just one of the ways one can describe a collection of elements. You also have sets, collections in which each element is unique. Sets are written with a hash sign followed by curly brackets, "#{}". In the example below we list the current and former affiliations of Spider-Man.

{::current-affiliations #{"Avengers"}
 ::former-affiliations #{"Secret Defenders" "New Fantastic Four" "The Outlaws"}}

Note how sets are used within maps to represent this knowledge. This is actually a common way to represent knowledge in Clojure: you combine the various data representations directly. This way you can have a list containing maps, with keywords as keys and values that contain maps and strings, where the maps contain numbers, etc. You have these data structures in pretty much all commonly used programming languages. Where Clojure differs from many is that it puts almost no sugaring or abstraction on top.
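As a small illustration of working with sets nested inside maps, consider the sketch below. It assumes the spider-man-spec.core namespace, and the var names are only for illustration.

```clojure
(ns spider-man-spec.core)

(def affiliations
  {::current-affiliations #{"Avengers"}
   ::former-affiliations #{"Secret Defenders" "New Fantastic Four" "The Outlaws"}})

;; Membership test on a set.
(contains? (::current-affiliations affiliations) "Avengers")   ; true

;; Adding an element that is already present changes nothing.
(conj (::current-affiliations affiliations) "Avengers")        ; #{"Avengers"}

;; get-in walks a path of keys through nested maps.
(def profile-fragment {::affiliations affiliations})
(count (get-in profile-fragment [::affiliations ::former-affiliations]))   ; 3
```

The same handful of functions (get, get-in, contains?, conj, count) works regardless of which domain the data comes from.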

Spider-Man swinging around the city. © Marvel Studios

Just as Spider-Man is often at his best when he is just being "plain old Spidey", having data represented this way has some nice advantages. The biggest is simplicity. Instead of learning to work with specific wrappers, objects, prototypes, etc. on top of this data, which can differ between applications and libraries, it is enough to learn once how to handle and manipulate maps, lists and sets. This knowledge can then be reused in any domain, and it frees up the attention of the programmer to focus on the domain problem rather than on the exact way the data was wrapped up in a library.

This of course also means that a system such as clojure.spec, which aims at data validation in Clojure, has to handle the above-mentioned style of composition well. But before we get ahead of ourselves, let's finish up by providing the profile of Spider-Man.

(def spider-man-profile
  {::name "Spider-Man"
   ::real-name "Peter Benjamin Parker"
   ::identity ::secret
   ::affiliations {::current-affiliations #{"Avengers"}
                   ::former-affiliations #{"Secret Defenders" "New Fantastic Four" "The Outlaws"}}
   ::power-grid {::durability 3
                 ::energy 4
                 ::fighting 5
                 ::intelligence 4
                 ::speed 5
                 ::strength 4}})


(def vulture-profile
  {::name "Vulture"
   ::real-name "Adrian Toomes"
   ::identity ::publicly-known
   ;; No current affiliations: an empty set, so the value has the
   ;; same type as for Spider-Man.
   ::affiliations {::current-affiliations #{}
                   ::former-affiliations #{"Sinister Twelve" "Sinister Six"}}
   ::power-grid {::durability 4
                 ::energy 3
                 ::fighting 4
                 ::intelligence 4
                 ::speed 5
                 ::strength 3}})

(def spider-man-characters [spider-man-profile vulture-profile])

Oh no, our spider sense should be tingling. It is Vulture who has shown up in our list of Spider-Man characters. In addition, we just introduced some new elements in our example that need some explanation for readers new to Clojure.

Uh oh, Vulture must be up to no good if he shows up here. © Marvel Studios

The first is the use of square brackets [], which indicate a vector, Clojure's everyday list-like collection. This is an ordered collection of elements, in this case spider-man-profile and vulture-profile, that, unlike a set, can contain the same element more than once.

The other new type of element is the use of parentheses alongside def, as in (def spider-man-characters ...). Expressions of this type, called symbolic expressions, or s-expressions for short, are a characteristic of the Lisp family of languages to which Clojure belongs. In a Lisp, parts of the program are either atoms, such as 5, "Peter Benjamin Parker" or true, or s-expressions, where the first element between the parentheses is a function and the rest are its arguments, for example (+ 1 3). While atoms evaluate to themselves, an s-expression evaluates to the result of calling the function with the given arguments. In the case of (+ 1 3) that result is 4. You can also nest s-expressions, such as (- (+ 1 3) 2), which evaluates to 2.

You might be thinking, "Wait, if everything is either an atom or an s-expression, what kind of villainous things are those strange brackets that one has to use to create a set, vector or map!". For all the simplicity in Clojure, it does make use of some syntactic sugar. Vectors, Clojure's everyday list-like collection, can be written [spider-man-profile vulture-profile] as a shorthand for the s-expression (vector spider-man-profile vulture-profile). Similar functions exist for maps and sets as well (hash-map and hash-set), and a list proper can be created with (list spider-man-profile vulture-profile).
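The correspondence between the bracket literals and ordinary function calls can be checked directly. This is a minimal sketch using only functions from clojure.core:

```clojure
;; Nested s-expressions are evaluated from the inside out.
(- (+ 1 3) 2)                      ; 2

;; Each literal has an equivalent function call.
(= [1 2 3] (vector 1 2 3))         ; true
(= {:a 1} (hash-map :a 1))         ; true
(= #{1 2} (hash-set 1 2))          ; true

;; Square brackets build a vector, which is distinct from a list
;; even though both are ordered collections.
(vector? [1 2 3])                  ; true
(list? [1 2 3])                    ; false
(list? (list 1 2 3))               ; true
```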

Much like Spider-Man, who for all his powers still has to juggle school and a job and make practical decisions, Clojure has to make them as well. In this case, because certain things such as maps, sets and vectors are used so often, it offers a shorter syntax for creating them. This does make the language slightly more complex, but in the author's view, it pays off.

Another matter of practicality is that, while we could nest the two profiles directly into the collection, we can also give them names. The def form does exactly this: it binds a name to a value and makes that name part of the current namespace. For example, if the current namespace is spider-man-spec.core, then a def of vulture-profile can be referred to as spider-man-spec.core/vulture-profile from other namespaces, and simply as vulture-profile in the current namespace. This allows us to break up the overall data into smaller parts.

Now we have finally described the profiles of both Spider-Man and Vulture, but are they correct? The clojure.spec library uses the notion of a spec for this. In its simplest form a spec is a predicate: a function of a single argument that returns a truthy value if the spec holds and a falsey one if it does not (in most cases simply true or false).

In essence this allows for many existing functions to be used as specs. For example the already existing function string? checks whether a particular value is a string or not.

In order to check whether a value is valid for a particular spec we can use the s/valid? function. Here s stands for the namespace of the spec library, clojure.spec.alpha, which we have aliased as s. By calling s/valid? we are calling the valid? function of that namespace.

(s/valid? string? "Spider-Man")

The above function call checks whether "Spider-Man" is indeed a string and returns true, as it is. On the other hand, if we check whether a number is valid for this spec, using (s/valid? string? 6), we get false instead.
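For readers who want to try this at a REPL, the sketch below shows the require that sets up the s alias together with a few checks. The namespace name is the one assumed throughout this article.

```clojure
(ns spider-man-spec.core
  (:require [clojure.spec.alpha :as s]))

;; Any one-argument predicate can serve as a spec.
(s/valid? string? "Spider-Man")    ; true
(s/valid? string? 6)               ; false
(s/valid? pos-int? 6)              ; true
(s/valid? pos-int? -6)             ; false
```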

Another way to use a spec is to explain why a value is wrong. For example, we can call the function s/explain-data with the spec and an incorrect value to get a map back with an explanation. The function call:

(s/explain-data string? 6)

would result in a map that, reduced to its essentials, reads:

{:val 6 :predicate :clojure.spec.alpha/unknown}

The above example clearly shows the value on which the spec has failed, but it denotes the predicate as unknown with :clojure.spec.alpha/unknown. The solution to this is to provide a name for the spec, which the system can use to pinpoint where things fail. We can register any spec under a name using s/def. For example the forms:

(s/def ::name string?)

(s/def ::real-name string?)

will register the two specs under the keywords ::name and ::real-name in the current namespace, i.e. under :spider-man-spec.core/name and :spider-man-spec.core/real-name respectively.

Now if we ask for an explanation of why the spec ::real-name does not allow the value 6, we get:

{:val 6 :predicate :spider-man-spec.core/real-name}

where the predicate now identifies the spec that was not fulfilled.
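The exact shape of the map returned by s/explain-data depends on the version of the library. As a hedged sketch, assuming a recent release of clojure.spec.alpha, the individual failures are stored under the ::s/problems key, and each of them records the offending value and the chain of named specs that led to it. The var name problem is made up for illustration.

```clojure
(ns spider-man-spec.core
  (:require [clojure.spec.alpha :as s]))

(s/def ::real-name string?)

;; Take the first (and here only) reported problem.
(def problem
  (first (::s/problems (s/explain-data ::real-name 6))))

(:val problem)    ; 6
(:via problem)    ; [:spider-man-spec.core/real-name]

;; For a valid value there is nothing to explain.
(s/explain-data ::real-name "Peter Benjamin Parker")   ; nil
```

The :via entry is what registering the spec under a name buys us: it lets the explanation point at ::real-name instead of an anonymous predicate.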

Specs can also be created in other ways. For example a set of values indicating the correct values can be used as a spec.

(s/def ::identity #{::secret ::publicly-known})

The above code defines a spec for identity as having two possible values: either ::secret or ::publicly-known.
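A quick sketch of this set-based spec in action, again assuming the spider-man-spec.core namespace:

```clojure
(ns spider-man-spec.core
  (:require [clojure.spec.alpha :as s]))

(s/def ::identity #{::secret ::publicly-known})

;; Only members of the set are valid.
(s/valid? ::identity ::secret)           ; true
(s/valid? ::identity ::publicly-known)   ; true
(s/valid? ::identity ::no-dual-identity) ; false
(s/valid? ::identity "secret")           ; false, a string is not the keyword
```

This works because a Clojure set can be called as a function that returns the element when it is present and nil otherwise.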

Specs can be defined for collections as well. The specs for current and former affiliations are:

(s/def ::current-affiliations (s/coll-of string? :kind set?))

(s/def ::former-affiliations (s/coll-of string? :kind set?))

These specs state that both current and former affiliations have to be sets of strings. The affiliations part of a profile is actually a map containing both current and former affiliations. This is defined as the spec:

(s/def ::affiliations (s/keys :req [::current-affiliations] :opt [::former-affiliations]))

which makes it a requirement for affiliations to contain current-affiliations, while any former affiliations are optional.
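The following sketch exercises these collection and map specs on a few hand-written values. It assumes the spec keys are spelled exactly like the keys used in the profiles, ::current-affiliations and ::former-affiliations, since s/keys finds the spec for a key by its name.

```clojure
(ns spider-man-spec.core
  (:require [clojure.spec.alpha :as s]))

(s/def ::current-affiliations (s/coll-of string? :kind set?))
(s/def ::former-affiliations (s/coll-of string? :kind set?))
(s/def ::affiliations
  (s/keys :req [::current-affiliations] :opt [::former-affiliations]))

;; The collection must be a set, and every element a string.
(s/valid? ::current-affiliations #{"Avengers"})    ; true
(s/valid? ::current-affiliations ["Avengers"])     ; false, a vector
(s/valid? ::current-affiliations #{:avengers})     ; false, not a string

;; The required key must be present; the optional one may be absent.
(s/valid? ::affiliations {::current-affiliations #{}})                 ; true
(s/valid? ::affiliations {::former-affiliations #{"Sinister Six"}})    ; false

;; An optional key is still checked when it is present.
(s/valid? ::affiliations {::current-affiliations #{}
                          ::former-affiliations "Sinister Six"})       ; false
```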

For checking whether Spider-Man has a valid profile we can use the s/valid? function again. We use the following code to do just that:

    (let [spider-man-affiliations (:spider-man-spec.core/affiliations spider-man-profile)]
      (s/valid? :spider-man-spec.core/affiliations spider-man-affiliations))

The let form is new here. What it essentially does is look up the affiliations in spider-man-profile and temporarily bind them to the name spider-man-affiliations. This allows us to use a shorthand when calling functions, instead of writing out everything in a single line.

This value is valid according to the spec. As the spec and the value we are checking get more complex, it can also be useful to obtain the exact value that has passed the spec. In such cases we can use s/conform. The call:

(s/conform ::affiliations (::affiliations spider-man-profile))

returns the map:

#:spider-man-spec.core{:current-affiliations #{"Avengers"}, :former-affiliations #{"The Outlaws" "Secret Defenders" "New Fantastic Four"}}

Note that this map is printed using namespace map syntax, a feature that lets us write the shared namespace once in front of the map instead of repeating it for every keyword inside.
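It is also worth knowing what s/conform does when the value does not match. The sketch below, assuming the spider-man-spec.core namespace, shows that the special keyword :clojure.spec.alpha/invalid is returned in that case, which can be tested with s/invalid?.

```clojure
(ns spider-man-spec.core
  (:require [clojure.spec.alpha :as s]))

(s/def ::current-affiliations (s/coll-of string? :kind set?))

;; A matching value is returned as the conformed result.
(s/conform ::current-affiliations #{"Avengers"})   ; #{"Avengers"}

;; A non-matching value yields :clojure.spec.alpha/invalid.
(s/invalid? (s/conform ::current-affiliations ["Avengers"]))   ; true
(s/invalid? (s/conform ::current-affiliations #{"Avengers"}))  ; false
```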

The final aspect of each profile, the power grid, is also something that can be given a spec. Each of the powers can only take a whole number value from 1 to 7 inclusive. We can specify this with the following specs:

(s/def ::power-value (s/and pos-int? #(>= % 1) #(<= % 7)))

(s/def ::durability ::power-value)
(s/def ::energy ::power-value)
(s/def ::fighting ::power-value)
(s/def ::intelligence ::power-value)
(s/def ::speed ::power-value)
(s/def ::strength ::power-value)

Note that we use s/and to combine three specs: that the value should be a positive integer, greater than or equal to 1, and less than or equal to 7. Such a combined spec can then be (re-)used like any other.
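The individual power specs only take effect when the map that contains them is itself checked with s/keys. The article does not show a spec for the power grid as a whole, so the ::power-grid definition below is an assumption added for illustration: a sketch of how the six powers could be tied together and tested.

```clojure
(ns spider-man-spec.core
  (:require [clojure.spec.alpha :as s]))

(s/def ::power-value (s/and pos-int? #(>= % 1) #(<= % 7)))

(s/def ::durability ::power-value)
(s/def ::energy ::power-value)
(s/def ::fighting ::power-value)
(s/def ::intelligence ::power-value)
(s/def ::speed ::power-value)
(s/def ::strength ::power-value)

;; Assumption: a power grid must contain all six powers.
(s/def ::power-grid
  (s/keys :req [::durability ::energy ::fighting
                ::intelligence ::speed ::strength]))

(s/valid? ::power-value 5)      ; true
(s/valid? ::power-value 0)      ; false, not positive
(s/valid? ::power-value 8)      ; false, above 7
(s/valid? ::power-value 3.5)    ; false, not an integer

(def spider-man-power-grid
  {::durability 3 ::energy 4 ::fighting 5
   ::intelligence 4 ::speed 5 ::strength 4})

(s/valid? ::power-grid spider-man-power-grid)                      ; true
(s/valid? ::power-grid (assoc spider-man-power-grid ::strength 9)) ; false
(s/valid? ::power-grid (dissoc spider-man-power-grid ::speed))     ; false
```

Without a registered ::power-grid spec, a profile spec that lists ::power-grid as a required key would only check that the key is present, not what is inside it.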

We can combine all the previous specs together to specify a profile:

(s/def ::profile (s/keys :req [::name ::real-name ::identity ::affiliations ::power-grid]))

For this spec, both Spider-Man and Vulture are valid profiles. However, this is a problem, as it does not allow us to differentiate between a hero and a villain. Of course we do not want Vulture to get into the same places as Spider-Man can. We must fight him, much like Spider-Man, but in our own way: by creating a spec for which the Spider-Man profile is a valid value, but that of Vulture is not.

Spider-Man vs Vulture © Marvel Entertainment

While we could require that only a person with the name "Spider-Man" can fulfill our new "hero-spec", this would be too restrictive. Instead we are going to spec an Avenger profile, so Spider-Man and all his friends can join in, while villains such as Vulture are kept out.

The requirement for an Avenger in our system is that anyone with the current affiliation of "Avengers" is an Avenger. We can describe this requirement as a spec, using a function defined for this:

(defn is-avenger? [profile]
  (contains? (::current-affiliations (::affiliations profile)) "Avengers"))

(s/def ::avenger-profile (s/and ::profile is-avenger?))

Now we can check whether a profile is a valid Avenger, which will be true for Spider-Man but not for Vulture. Finally, we can get rid of this villain that showed up in our tutorial. In addition, this spec will also make sure that all current members of the Avengers are valid, so Spider-Man can fight freely alongside them.
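To see the whole thing working end to end, the sketch below puts the specs together and filters the characters down to the valid Avengers. It is an abridged version under a few assumptions: the power grids are shortened to a single entry and left without a spec of their own, Vulture's current affiliations are written as an empty set, and the var name avengers is made up for illustration.

```clojure
(ns spider-man-spec.core
  (:require [clojure.spec.alpha :as s]))

(s/def ::name string?)
(s/def ::real-name string?)
(s/def ::identity #{::secret ::publicly-known})
(s/def ::current-affiliations (s/coll-of string? :kind set?))
(s/def ::former-affiliations (s/coll-of string? :kind set?))
(s/def ::affiliations
  (s/keys :req [::current-affiliations] :opt [::former-affiliations]))
(s/def ::profile
  (s/keys :req [::name ::real-name ::identity ::affiliations ::power-grid]))

(defn is-avenger? [profile]
  (contains? (get-in profile [::affiliations ::current-affiliations])
             "Avengers"))

(s/def ::avenger-profile (s/and ::profile is-avenger?))

(def spider-man-profile
  {::name "Spider-Man"
   ::real-name "Peter Benjamin Parker"
   ::identity ::secret
   ::affiliations {::current-affiliations #{"Avengers"}}
   ::power-grid {::durability 3}})   ; abridged

(def vulture-profile
  {::name "Vulture"
   ::real-name "Adrian Toomes"
   ::identity ::publicly-known
   ::affiliations {::current-affiliations #{}
                   ::former-affiliations #{"Sinister Twelve" "Sinister Six"}}
   ::power-grid {::durability 4}})   ; abridged

;; Both are valid profiles, but only one is a valid Avenger.
(s/valid? ::profile vulture-profile)             ; true
(s/valid? ::avenger-profile spider-man-profile)  ; true
(s/valid? ::avenger-profile vulture-profile)     ; false

;; Keep only the characters that pass the Avenger spec.
(def avengers
  (filterv #(s/valid? ::avenger-profile %)
           [spider-man-profile vulture-profile]))

(mapv ::name avengers)   ; ["Spider-Man"]
```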

Our spec answers the question posed in this cover: Spider-Man is indeed an Avenger © Marvel Entertainment

So there we have it, a brief look at using the spec library to validate data. There are many things that I have not touched on, such as the ability to generate values based on a spec, other ways to compose specs, etc.

Nonetheless I hope this article gives a solid introduction, and maybe sparks an interest in using the spec library, even if one does not have a Clojure or even a heavy programming background. The source code snippets are available at: Spider-Man-Spec.

If you have a data validation problem, by all means take a swing at it with the spec library. I am convinced that the results you get will be nothing short of spectacular.