Crate inc
An incremental Scheme compiler
A tiny Scheme to x86 assembly compiler, as described in the paper An Incremental Approach to Compiler Construction by Abdulaziz Ghuloum.
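The premise is easiest to see in the paper's very first step, which compiles a program consisting of a single integer literal into a function that returns it. The sketch below is a hypothetical illustration, not inc's code: it assumes x86-64, AT&T syntax, and a C driver that calls a symbol named `scheme_entry` (the name used in the paper) and prints whatever is left in `%rax`.

```rust
/// Step 1 of the incremental approach: a "program" is a single integer
/// literal, and the compiled code simply returns it.
/// Assumptions: x86-64, AT&T syntax, and an entry label called
/// `scheme_entry` (the paper's name; inc's own label may differ).
fn compile_program(n: i64) -> String {
    let mut asm = String::new();
    asm.push_str("    .text\n");
    asm.push_str("    .globl scheme_entry\n");
    asm.push_str("scheme_entry:\n");
    // Load the constant into %rax, where the C driver expects the result.
    asm.push_str(&format!("    movq ${}, %rax\n", n));
    asm.push_str("    ret\n");
    asm
}

fn main() {
    print!("{}", compile_program(42));
}
```

Later steps in the paper replace the raw integer with a tagged representation and grow the input language one construct at a time.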
Where do I get started? 🕵️‍♀️
Read the first few sections of the paper to understand the premise.
Background Reading 📚
This project contains a lot of C, Rust and x86 assembly; these are some good places to start learning them.
- x86 module documentation contains links to a few good x86 tutorials.
- How to C in 2016 is a pretty good C refresher.
- The Rust Programming Language book is a good place to start learning Rust.
This project also uses a lot of iterators, so Effectively Using Iterators In Rust might be useful as well.
Misc
Micro blogs & lessons learned 🤷
1. Debugging with GDB
Debugging generated assembly (which is occasionally wrong) without a debugger is pretty hard, and it is absolutely worth the effort to get familiar with GDB. GDB doesn't work on OS X despite the several dozen blog posts that claim otherwise, and this project would have been impossible without it. It is easier to set up remote debugging with Docker than to fight code signing on OS X.
Build the image
$ docker build . -t inc:latest
Run the container in privileged mode and expose a port
$ docker run --rm -it --privileged -p 8080:8080 inc
Inside the container, compile the program you want to debug to build the executable
/inc# echo "(let ((f (lambda (x) (+ x 1)))) (f 41))" | cargo run -q
Start a remote debugging session
/inc# gdbserver 127.0.0.1:8080 ./inc
Start GDB on the host machine with the custom .gdbinit file
$ cat .gdbinit
set startup-with-shell off
target remote 127.0.0.1:8080
This should work from the command line as well as from Emacs.
$ gdb
Reading /inc/inc from remote target...
warning: File transfers from remote targets can be slow. Use "set sysroot" to access files locally instead.
Reading /inc/inc from remote target...
Reading symbols from target:/inc/inc...
Reading /lib64/ld-linux-x86-64.so.2 from remote target...
Reading /lib64/ld-linux-x86-64.so.2 from remote target...
Reading /lib64/5dfd7b95be4ba386fd71080accae8c0732b711.debug from remote target...
Reading /lib64/.debug/5dfd7b95be4ba386fd71080accae8c0732b711.debug from remote target...
Reading /usr/local/Cellar/gdb/8.3/lib/debug//lib64/5dfd7b95be4ba386fd71080accae8c0732b711.debug from remote target...
Reading /usr/local/Cellar/gdb/8.3/lib/debug/lib64//5dfd7b95be4ba386fd71080accae8c0732b711.debug from remote target...
Reading target:/usr/local/Cellar/gdb/8.3/lib/debug/lib64//5dfd7b95be4ba386fd71080accae8c0732b711.debug from remote target...
0x00007ffff7fd6090 in ?? () from target:/lib64/ld-linux-x86-64.so.2
(gdb)
2. All the different kinds of functions
While implementing stdlib functions, I noticed that they fall into a few different levels, which closely resemble the kind of privilege they have.
Low-level primitives have access to everything, including register allocation. Runtime functions know about the memory layout of objects. A Scheme function is far more limited and can only see the high-level functional constructs. Whenever possible, a function should be implemented at the highest level possible: prefer Scheme over Rust, both for safety and as a kind of self-referential check.
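One way to read the rule "implement a function at the highest level possible" is as a small decision over what the function needs. The Rust sketch below only illustrates that rule: the `Level` and `Needs` types and the `highest_level` function are hypothetical and do not exist in inc, and the model leaves out the "sort of primitives", which could be written in Scheme but are kept in the compiler for speed.

```rust
/// The implementation levels described above, ordered from most to least
/// privileged. Deriving `Ord` makes `Stdlib` compare as the "highest" level.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Level {
    /// Rust code inside the compiler; can see register allocation.
    Primitive,
    /// C or asm; knows the memory layout of objects.
    Runtime,
    /// Plain Scheme; only sees high-level functional constructs.
    Stdlib,
}

/// Hypothetical description of what a function needs from the system.
struct Needs {
    /// Must control code generation (registers, inlined constants, ...).
    codegen: bool,
    /// Must read or write the raw memory layout of objects.
    memory_layout: bool,
}

/// Pick the highest (least privileged) level that can still express the function.
fn highest_level(needs: &Needs) -> Level {
    if needs.codegen {
        Level::Primitive
    } else if needs.memory_layout {
        Level::Runtime
    } else {
        Level::Stdlib
    }
}

fn main() {
    let make_string = Needs { codegen: true, memory_layout: true };
    let string_length = Needs { codegen: false, memory_layout: true };
    let pure_scheme = Needs { codegen: false, memory_layout: false };
    println!("{:?}", highest_level(&make_string));
    println!("{:?}", highest_level(&string_length));
    println!("{:?}", highest_level(&pure_scheme));
}
```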
Primitives
These are things you really have to build into the core of the compiler, and they are written in Rust. primitives::string::make is a pretty good example, since inlining string constants is not something you could do with Scheme.
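Inlining a string constant means writing the bytes of the literal into the generated assembly itself, which no Scheme code running at runtime can do. The sketch below is hypothetical and is not the actual `primitives::string::make`: it assumes GNU as directives and the length-prefixed layout described for the `strings` module, with the length stored as an 8-byte `.quad`.

```rust
/// Emit a string constant directly into the generated assembly, laid out
/// as a length prefix followed by the UTF-8 bytes.
/// Hypothetical: the label, the 8-byte `.quad` length and the alignment are
/// assumptions, not necessarily what inc emits.
fn inline_string(label: &str, s: &str) -> String {
    let bytes = s.as_bytes();
    let mut asm = String::new();
    // Align to 8 bytes (2^3) so the length word is naturally aligned.
    asm.push_str("    .p2align 3\n");
    asm.push_str(&format!("{}:\n", label));
    // Length prefix, counted in bytes.
    asm.push_str(&format!("    .quad {}\n", bytes.len()));
    // The raw UTF-8 bytes; `.byte` avoids escaping quotes or backslashes.
    if !bytes.is_empty() {
        let list: Vec<String> = bytes.iter().map(|b| b.to_string()).collect();
        asm.push_str(&format!("    .byte {}\n", list.join(", ")));
    }
    asm
}

fn main() {
    print!("{}", inline_string("str0", "hi"));
}
```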
Sort of primitives
All the math! You don't really have to implement + and * in Rust, but doing so lets the compiler avoid treating them as function calls and emit a single efficient instruction directly. I'd consider a compiler that performs basic math during compilation (constant folding) to be a form of interpretation; inc doesn't do this, but it would be fairly trivial to implement.
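A minimal sketch of both ideas, assuming the fixnum encoding from the paper (integers shifted left by 2 with a `0b00` tag, which may not match inc's `immediate` module) and AT&T x86-64 syntax. Because the tag is zero, `+` on two tagged fixnums needs no untagging and compiles to a single `addq`; folding goes one step further and computes the sum inside the compiler. `compile_add` is a hypothetical helper, not inc's API.

```rust
/// Hypothetical fixnum encoding (the paper's): shift left by 2 so the low
/// two tag bits are 0b00. inc's actual encoding may differ.
const FIXNUM_SHIFT: u32 = 2;

fn tag(n: i64) -> i64 {
    n << FIXNUM_SHIFT
}

fn untag(x: i64) -> i64 {
    // Arithmetic shift keeps the sign of negative fixnums.
    x >> FIXNUM_SHIFT
}

/// Compile `(+ a b)` for two small integer literals (small enough for a
/// 32-bit immediate), leaving the tagged result in %rax.
/// Without folding, `+` is open-coded as one `addq` instead of a call.
/// With folding, the sum is computed at compile time, leaving one load.
fn compile_add(a: i64, b: i64, fold: bool) -> String {
    if fold {
        format!("    movq ${}, %rax\n", tag(a + b))
    } else {
        format!("    movq ${}, %rax\n    addq ${}, %rax\n", tag(a), tag(b))
    }
}

fn main() {
    // With a zero tag, (a << 2) + (b << 2) == (a + b) << 2.
    assert_eq!(tag(3) + tag(4), tag(7));
    assert_eq!(untag(tag(-5)), -5);
    print!("{}", compile_add(3, 4, false));
    print!("{}", compile_add(3, 4, true));
}
```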
Runtime
Functions like string-length understand the memory layout of objects and are probably easiest to write in C or asm. Because of the currently odd 'everything on the stack' calling convention, string-length is written in asm instead of C, but it must be rewritten in C for simplicity once FFI works.
All syscalls and FFI probably belong at this same level.
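Why string-length needs to know the memory layout can be shown with a safe Rust model of a heap string. This is a sketch under assumptions, not inc's runtime: it models the object as a byte buffer holding an 8-byte little-endian length followed by the UTF-8 bytes (in the spirit of the `strings` module), and `make_string_object` and `string_length` are hypothetical names.

```rust
use std::convert::TryInto;

/// Hypothetical heap layout for a string: an 8-byte little-endian length
/// (in bytes), followed by the UTF-8 bytes themselves.
fn make_string_object(s: &str) -> Vec<u8> {
    let mut obj = (s.len() as u64).to_le_bytes().to_vec();
    obj.extend_from_slice(s.as_bytes());
    obj
}

/// A runtime-level `string-length`: it only works because it knows where
/// the length lives. Returns None if the buffer is too short for a header.
fn string_length(obj: &[u8]) -> Option<u64> {
    let header: [u8; 8] = obj.get(..8)?.try_into().ok()?;
    Some(u64::from_le_bytes(header))
}

fn main() {
    let obj = make_string_object("hello");
    println!("{:?}", string_length(&obj));
}
```

One design consequence: if the prefix counts bytes, as it does here, it differs from the character count that R7RS `string-length` reports for non-ASCII strings, so a real runtime has to decide which one to store.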
Stdlib
AFAIU there shouldn't be any difference between user-defined functions and stdlib functions implemented in Scheme.
Modules
| Module | Description |
|---|---|
| cli | Command line interface for inc |
| compiler | Inc compiler |
| core | Core types shared by most of the program |
| immediate | Runtime representation of typed Scheme values |
| lambda | Scheme functions |
| parser | A Scheme parser written with nom |
| primitives | Scheme functions implemented within the compiler rather than the runtime |
| runtime | Runtime functions implemented in C or asm |
| strings | A string is a blob of UTF-8 encoded bytes prefixed with its length |
| x86 | A general purpose x86 library |
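For the `immediate` module, the paper's encoding gives a feel for what a runtime representation of typed values means: every immediate fits in one machine word, and its low bits say what type it is. The constants below are the ones from Ghuloum's paper; inc's `immediate` module may use different shifts and tags, so treat this as an illustrative sketch with hypothetical `encode`/`decode` functions.

```rust
/// Immediate encodings from the paper. Assumption: inc's `immediate`
/// module may use different shifts and tags.
const FIXNUM_SHIFT: u32 = 2;
const FIXNUM_MASK: i64 = 0b11;
const FIXNUM_TAG: i64 = 0b00;

const CHAR_SHIFT: u32 = 8;
const CHAR_MASK: i64 = 0xFF;
const CHAR_TAG: i64 = 0x0F;

const FALSE: i64 = 0x2F;
const TRUE: i64 = 0x6F;
const NIL: i64 = 0x3F;

#[derive(Debug, PartialEq)]
enum Value {
    Int(i64),
    Char(char),
    Bool(bool),
    Nil,
}

/// Pack a value into a single tagged machine word.
fn encode(v: &Value) -> i64 {
    match v {
        Value::Int(n) => (*n << FIXNUM_SHIFT) | FIXNUM_TAG,
        Value::Char(c) => ((*c as u32 as i64) << CHAR_SHIFT) | CHAR_TAG,
        Value::Bool(true) => TRUE,
        Value::Bool(false) => FALSE,
        Value::Nil => NIL,
    }
}

/// Recover the value by inspecting the low tag bits first.
fn decode(x: i64) -> Option<Value> {
    if (x & FIXNUM_MASK) == FIXNUM_TAG {
        Some(Value::Int(x >> FIXNUM_SHIFT))
    } else if (x & CHAR_MASK) == CHAR_TAG {
        char::from_u32((x >> CHAR_SHIFT) as u32).map(Value::Char)
    } else {
        match x {
            TRUE => Some(Value::Bool(true)),
            FALSE => Some(Value::Bool(false)),
            NIL => Some(Value::Nil),
            _ => None,
        }
    }
}

fn main() {
    for v in vec![Value::Int(-7), Value::Char('λ'), Value::Bool(true), Value::Nil] {
        let bits = encode(&v);
        println!("{:?} -> {:#x}", v, bits);
        assert_eq!(decode(bits), Some(v));
    }
}
```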