Parsing S-Expressions with Kotlin


I have been a long-time student of electronics. I am still a rank amateur, but along the way I became interested in the EDA software used to design circuits and printed circuit boards. Things have come a long way since OrCAD (and it is still around!), and nowadays I prefer to use KiCad.

Regardless, all EDA tools have more or less come to a consensus on what file formats to use. The first stage in any electronics project is the circuit design, and for that, EDA packages rely on collections of symbols, each of which represents a particular component. For example, an integrated circuit is represented as a rectangle with a line coming out for each of the pins, and each of those pins is labelled in accordance with the IC’s datasheet.

The consensus on symbol files is to use a specific format that is based on s-expressions, which come from the days when keyboards did not have semicolons or curly braces.

An S-expression (mutatis mutandis in KiCad) is one of:

- A symbol: just a bare string of characters.
- A number: either an integer or a float.
- A string: a string of characters in quotes.
- A cons list: a parentheses-delimited list of… s-expressions.

I repurposed the term cons list from the Lisp language, which uses S-expressions since it predates the invention of programming language syntax.

Let’s look at an example:

 (font 12.5 12.5)

The above is an s-expression, specifically a cons list containing three s-expressions: a symbol and two floats.

Computer myopia

The first challenge in the game is that computers are near-sighted. At their very core, they can really only consider one small thing at a time. In this case, a CPU can really only consider a single character at a time. That means that, for instance, in the example above we are first only going to see the '(' character, then the 'f' character, and so on. We have to devise a strategy for making sense of that font expression one character at a time.
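To make the one-character-at-a-time view concrete, here is a minimal sketch that walks the font expression and labels each character in isolation. The `describe` helper and its categories are hypothetical, made up for illustration; this is not a tokenizer yet, just a look at what the program sees at each step.

```kotlin
// Hypothetical helper: says what a single character looks like on its own,
// without any knowledge of the characters before or after it.
fun describe(ch: Char): String = when {
    ch == '(' -> "open paren"
    ch == ')' -> "close paren"
    ch.isWhitespace() -> "whitespace"
    ch.isDigit() || ch == '.' -> "part of a number?"
    else -> "part of a symbol?"
}

fun main() {
    val input = "(font 12.5 12.5)"
    // The loop only ever holds one character in hand at a time.
    for ((index, ch) in input.withIndex()) {
        println("$index: '$ch' -> ${describe(ch)}")
    }
}
```

Note the question marks: a lone character is rarely enough to decide anything, which is why we need a strategy that remembers what came before.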

But what does it mean for us to “make sense” of the expression?

Ultimately, we want to produce a data structure that represents the data we read in. For our font example, it looks easy… we could do, for instance:

data class Font(val xSize: Float, val ySize: Float)

But now let’s consider a slightly more complex expression:

(effects
   (font 12.5 12.5)
   (hide yes)
)

… and it is clear that things are a bit more complicated: s-expressions can contain s-expressions themselves.
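One way to picture that nesting is a small recursive type. The sketch below is only an illustration: the names `SExpr`, `Atom`, `ConsList`, `listExpr` and `depth` are hypothetical, and atoms are kept as raw text rather than being split into symbols, numbers and strings.

```kotlin
// Hypothetical model: an s-expression is either an atom or a list of s-expressions.
sealed interface SExpr {
    // Atoms are kept as raw text here to keep the sketch short.
    data class Atom(val text: String) : SExpr
    data class ConsList(val items: List<SExpr>) : SExpr
}

// Small helper to build a list without spelling out listOf() each time.
fun listExpr(vararg items: SExpr): SExpr = SExpr.ConsList(items.toList())

// Counts how deeply lists are nested; an atom has depth 0.
fun depth(expr: SExpr): Int = when (expr) {
    is SExpr.Atom -> 0
    is SExpr.ConsList -> 1 + (expr.items.maxOfOrNull { depth(it) } ?: 0)
}

fun main() {
    // (effects (font 12.5 12.5) (hide yes)), built by hand
    val effects = listExpr(
        SExpr.Atom("effects"),
        listExpr(SExpr.Atom("font"), SExpr.Atom("12.5"), SExpr.Atom("12.5")),
        listExpr(SExpr.Atom("hide"), SExpr.Atom("yes")),
    )
    println(depth(effects)) // prints 2: one outer list containing inner lists
}
```

Because a `ConsList` holds more `SExpr` values, anything that processes it (like `depth` here) ends up being recursive too.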

So, to start simple, which is after all the point of software engineering, let’s scale down what we mean by “make sense”.

For now, we can stick to the easy(er) cases of symbol, number and string, and let’s also detect when we have pesky parentheses to deal with. What we are doing here is converting a sequence of characters into a sequence of tokens, which are things that have a lexical meaning (a meaning derived from their “spelling”), and sometimes even a value.

Let’s see how we could implement this in Kotlin:

 sealed interface Token { // 1
     data object LPAREN : Token // 2
     data object RPAREN : Token
     data class SYMBOL(val name: String) : Token
     data class STRING(val value: String) : Token
     sealed interface NUM : Token { // 3
         companion object {
             // toNum() is a String extension that we will look at later
             fun new(value: String) = value.toNum() // 4
         }
         data class INT(val value: Int) : NUM
         data class FLOAT(val value: Float) : NUM
     }
 }

Some comments:

1. A sealed interface in Kotlin is like a Java enum, in that the compiler will complain if, for example, we do not check for all interface inheritors in a when statement, which protects us if we forget one. Unlike an enum, whose members must all have the same properties, each inheritor can carry its own data.
2. A data object is like a regular object, but the compiler generates a toString() method for the object (in this case, it returns "LPAREN").
3. Sealed interfaces can be inherited by other sealed interfaces, turtles all the way down.
4. This is a Rust-style constructor for Token.NUM from a String. We’ll take a look later, since it uses one of my favorite Kotlin features: functions with receivers (extension functions).
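To see the first two comments in action, the sketch below restates the token types in compact form, writes out the tokens for (font 12.5 12.5) by hand, and renders each one with an exhaustive `when`. The `render` function is hypothetical, and since `toNum()` has not been shown yet, the companion object is left out and the numbers are constructed directly as `FLOAT`. It assumes Kotlin 1.9 or later, which is where `data object` became available.

```kotlin
// Compact restatement of the token types, so the example stands on its own.
sealed interface Token {
    data object LPAREN : Token
    data object RPAREN : Token
    data class SYMBOL(val name: String) : Token
    data class STRING(val value: String) : Token
    sealed interface NUM : Token {
        data class INT(val value: Int) : NUM
        data class FLOAT(val value: Float) : NUM
    }
}

// Hypothetical helper: no `else` branch is needed because Token is sealed,
// and removing any branch below becomes a compile error.
fun render(token: Token): String = when (token) {
    Token.LPAREN -> "("
    Token.RPAREN -> ")"
    is Token.SYMBOL -> token.name
    is Token.STRING -> "\"${token.value}\""
    is Token.NUM.INT -> token.value.toString()
    is Token.NUM.FLOAT -> token.value.toString()
}

fun main() {
    // Tokens for (font 12.5 12.5), written by hand rather than produced by a tokenizer.
    val tokens = listOf(
        Token.LPAREN,
        Token.SYMBOL("font"),
        Token.NUM.FLOAT(12.5f),
        Token.NUM.FLOAT(12.5f),
        Token.RPAREN,
    )
    println(tokens.first()) // the generated toString() prints LPAREN
    println(tokens.joinToString(" ") { render(it) })
}
```

If a new kind of token is added to the interface later, `render` stops compiling until it handles the new case, which is exactly the safety net described in the first comment.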

Next we’ll look at how we can start turning our Strings into sequences of tokens.