Dust is a functional, dynamically typed language with a minimal core AST. You can read more about its features in the repo or in the language documentation. Here, instead, I'm going to focus on the implementation details and the interesting problems behind it.
Dust is implemented using the same interpreter architecture described here, but using Rust instead.
TL;DR: a bytecode interpreter compiles the code to a pseudo-assembly representation, and then executes it via a virtual machine.
There are few reasons why Rust is a great fit for a similar task:
- Expressive and pragmatic type system makes complex tasks comprehensible. Working with data structures such as ASTs wouldn't be feasible otherwise
- The error handling is type-safe (since errors are just data) but Rust's constructs (like
?
) make it very pragmatic to handle all the edge cases - No garbage collection means that you can have a low-level control if needed - and implement garbage collection yourself
- Great macro-powered libraries to create a CLI
- Rust allows to target the web browser using webassembly as compilation target
- And of course, great speed. Not crucial for the compiler but crucial for the runtime
Garbage collection
Since Rust itself is not garbage-collected, a custom implementation was needed. A classic choice is the mark and sweep algorithm. However that's what's called a "stop the world" garbage collection algorithm: whenever the garbage collection phase begins, every other calculation is blocked. That can result in unpredictable runtime performances. Since Rust offers it as a safe abstraction, a natural choice is the reference counting algorithm - that prevents the problems above. A downside of this is that it can lead to memory leaks if there are cycles. Due to Dust's semantics, however, this cannot happen.
Formatting
Believe it or not, pretty-printing a program is a particularly difficult task. Bob Nystrom - one of the dart lang contributors - mentions it as the hardest program he's ever written. I implemented a non-recursive version of the algorithm proposed in Christian Lindig's "Strictly Pretty" paper, that defines an algebra to formalize the pretty-printing process. The paper is, in turn, a strict implementation of Philip Wadler "A prettier printer's" paper and its the approach used by Elixir's Inspect.Algebra module or from the well-known Javascript's de-facto formatter Prettier.
Parsing
Parsing is implemented via the Pratt parsing algorithm. You can read a very approachable explanation in this blog post (from one of the main Rust analyzer and the Rust intellij plugin contributors). Other beginner-friendly resources are Bob Nystrom's crafting interpreters and Thorsten Ball's Writing an interpreter in Go.
Tail call optimization
In a functional language whose the only looping mechanism is recursion, is crucial to be able to tail-recur without oveflowing the call stack. Due to the simplicity of the AST (stripped of the syntax sugar), it's easy to perform static analysis to check when a call is in tail position.
Here's the complete set of rules used by the static analysis algorithm
fn arg_1, ..., arg_n { body }
- body is in tail position
f(x_1, ..., x_n)
- Neither
f
norx_i
(for every i) are in tail position - if the node is in tail position, this is considered a tail call
- Neither
x && y
x
is not in tail positiony
is in tail position when the node is in tail position
{ a; b }
a
is not in tail positionb
is in tail position when the node is in tail position- subsequent
;
expressions are just sugar nested;
expressions
{ let name = value; body }
value
is not in tail positionbody
is in tail position when the node is in tail position- subsequent
let
expressions are just sugar nested let expressions
if b { x } else { y }
b
is not in tail positionx
andy
are in tail position when the node is in tail position
Conclusion
Implementing a programming language is a beatiful problem, filled of interesting sub-problems to solve. The language is still at a very early POC stage, and I hope to update this page as I implement new features.