Lisp bytecode compiler | Alessandro Scandone

This project consists in an implementation of a lisp dialect. There are different approaches when implementing a programming language:

The naive one, writing a program that directly interprets the AST (the data structure that represents a given program) and returns the result. It's the closest approach to operational semantics but has a huge overhead.
Precompiling it into an intermediate language, and execute it via a virtual machine. This approach is used (among the others) by Java (using the well known JVM), CPython, Erlang.
Emitting native code. This is the most performant approach but is extremely complex and has to be re-implemented for each architecture (unless technologies like LLVM are used)

The approach I used is the second. In particular, the kind of virtual machine used is a stack machine.

Using such a machine, the operation 10 + 20 would be encoded in the following operation codes:

PUSH 10
PUSH 20
ADD

which, in turn, would be executed, with the following steps:

Initially, the stack is empty first step

First, the value 10 is fetched and pushed into the stack incrementing the instruction pointer so that it points to the following instruction.

second step Then, the value 20 is fetched and pushed into the stack.

third step Then, the ADD opcode, pops the last two values into the stack, and pushes their sum.

fourth step

The opcodes are usually encoded as bytecode instructions in order to more efficiently pack memory.

There are many different opcodes in order to be able to implement all the language constructs, such as conditional jumps, constants lookups, function calls, and any other kinds of primitives.

The language

Here's a Fibonacci implementation - the hello world of functional languages - using the implemented language:

(defun fibonacci (n)
  (cond
    (eq? 0 n) 1)
    (eq? 1 n) 1)
    (otherwise (+
                  (fibonacci (- n 1))
                  (fibonacci (- n 2)))))

This program can actually be seen as a list, whose the first element is the symbol defun, the second is the symbol fibonacci, and the third is itself a list.

This property is often referred as homoiconicity, and allows the language to define and use powerful metaprogramming constructs called macros. Unlike languages like C, that simply allow one to create some code "templates", lisp has the power of the whole language to extend its own semantics. For example, here's the definition of the defun construct:

(defmacro defun (name params &rest body)
  (list ’def name
    (cons ’lambda (cons params body))))

Using this macro, the following syntax:

(defun add-1 (x) (+ 1 x))

Is transformed to primitive constructs during compilation time:

(def add-1 (lambda (x) (+ 1 x)))

In order to facilitate the reading, it's often used a template-like syntax

(defmacro defun (name params &rest body)
  `(def ,name (lambda ,params ,@body)))

But, once again, this is simply syntax sugar for regular lisp syntax:

(defmacro defun (name params &rest body)
  (backquote
    (def (unquote name)
      (lambda (unquote params)
        (unquote-splicing body)))))

And backquote, unquote, unquote-splicing are themselves macros.

Wrapping up

To recap all the above, here are the main steps the interpreter executes:

read the source code and parse it to s-expressions
interpret the parsed AST and compile it emitting the needed opcodes
- while doing this, expand the macros using step 3 and repeat step 2
run the outputted opcodes via the virtual machine