Crate inc

An Incremental scheme compiler

A tiny scheme to x86 asm compiler as described in the paper An Incremental Approach to Compiler Construction by Abdulaziz Ghuloum.

Where do I get started? 🕵️‍♀️

Read the first few sections of the paper to understand the premise.

Background Reading 📚

There is a lot of C, Rust and x86 assembly here and these are some good places to start learning them.

This project also uses a lot of iterators, so Effectively Using Iterators In Rust might be useful as well.

Misc

Micro blogs & lessons learned 🤷

1. Debugging with GDB

Debugging (occasionally wrong) generated assembly without a debugger is pretty hard, and it is absolutely worth the effort to get familiar with GDB. I could not get GDB to work on OSX despite the several dozen blogs that claim it does, and this project would be impossible without it. It is easier to set up remote debugging with Docker than to fight code signing on OSX.

Build the image

$ docker build . -t inc:latest

Run the container in privileged mode and expose a port

$ docker run --rm -it --privileged -p 8080:8080 inc

Inside the container, compile the program you want to debug to build the executable

/inc# echo "(let ((f (lambda (x) (+ x 1)))) (f 41))" | cargo run -q

Start a remote debugging session

/inc# gdbserver 127.0.0.1:8080 ./inc

Start GDB on the host machine with the custom .gdbinit file

$ cat .gdbinit

set startup-with-shell off
target remote 127.0.0.1:8080

This should work with the GDB CLI as well as with Emacs

$ gdb

Reading /inc/inc from remote target...
warning: File transfers from remote targets can be slow. Use "set sysroot" to access files locally instead.
Reading /inc/inc from remote target...
Reading symbols from target:/inc/inc...
Reading /lib64/ld-linux-x86-64.so.2 from remote target...
Reading /lib64/ld-linux-x86-64.so.2 from remote target...
Reading /lib64/5dfd7b95be4ba386fd71080accae8c0732b711.debug from remote target...
Reading /lib64/.debug/5dfd7b95be4ba386fd71080accae8c0732b711.debug from remote target...
Reading /usr/local/Cellar/gdb/8.3/lib/debug//lib64/5dfd7b95be4ba386fd71080accae8c0732b711.debug from remote target...
Reading /usr/local/Cellar/gdb/8.3/lib/debug/lib64//5dfd7b95be4ba386fd71080accae8c0732b711.debug from remote target...
Reading target:/usr/local/Cellar/gdb/8.3/lib/debug/lib64//5dfd7b95be4ba386fd71080accae8c0732b711.debug from remote target...
0x00007ffff7fd6090 in ?? () from target:/lib64/ld-linux-x86-64.so.2
(gdb)

Screenshot of GDB running in Emacs over remote protocol

2. All the different kinds of functions

While implementing stdlib functions, I noticed that they belong to a few different levels - closely resembling the kind of privilege they have.

The low level primitives get access to everything, including the register allocation. The runtime functions know about the memory layout of objects. A scheme function is far more limited and can only see the high level functional constructs. When possible, a function should be implemented at the highest level available: prefer scheme over Rust, both for safety and as a kind of self-referential check on the compiler.

Primitives

These are things you really have to build into the core of the compiler and are written in Rust. primitives::string::make is a pretty good example since inlining the string constants is not something you could do with scheme.

Sort of primitives

All the math! You don't really have to implement + and ** in Rust, but doing so lets the compiler avoid treating them as function calls and emit a single efficient instruction immediately. I'd consider a compiler performing basic math during compilation as a form of interpretation. inc doesn't do this, but it would be fairly trivial to implement.
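To make "performing basic math during compilation" concrete, here is a minimal sketch of constant folding. The `Expr` type and the `fold` function are hypothetical and exist only for this illustration; they are not inc's real AST or API, and inc itself does not do this.

```rust
/// A deliberately tiny expression tree. Hypothetical, not inc's real AST.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Number(i64),
    Identifier(String),
    /// A call such as `(+ a b)`: operator name plus arguments.
    Call(String, Vec<Expr>),
}

/// Fold calls to `+` and `*` whose arguments are all numeric literals.
/// Everything else is rebuilt unchanged, with its arguments folded.
fn fold(expr: Expr) -> Expr {
    match expr {
        Expr::Call(op, args) => {
            // Fold the arguments first so nested constants collapse bottom-up.
            let args: Vec<Expr> = args.into_iter().map(fold).collect();

            // Collect the literal values; `None` if any argument is not a literal.
            let literals: Option<Vec<i64>> = args
                .iter()
                .map(|arg| match arg {
                    Expr::Number(n) => Some(*n),
                    _ => None,
                })
                .collect();

            // Checked arithmetic: on overflow the call is left for runtime.
            let folded = match (op.as_str(), literals) {
                ("+", Some(ns)) => ns.iter().try_fold(0i64, |acc, n| acc.checked_add(*n)),
                ("*", Some(ns)) => ns.iter().try_fold(1i64, |acc, n| acc.checked_mul(*n)),
                _ => None,
            };

            match folded {
                Some(n) => Expr::Number(n),
                None => Expr::Call(op, args),
            }
        }
        // Literals and identifiers are already as simple as they can be.
        other => other,
    }
}
```

With this sketch, `(+ 1 (* 2 3))` folds to the literal `7`, while `(+ x (+ 1 2))` only becomes `(+ x 3)` because `x` is not known at compile time.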

Runtime

Functions like string-length understand the memory layout of the objects and are probably easiest to write in C or ASM. Because of the currently odd 'everything in stack' calling convention, they are written in asm instead of C, but should be rewritten in C for simplicity once FFI works.

All syscalls and FFI probably belong here in the same level.
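As an illustration of what "understanding the memory layout" means, the strings module describes a string as a blob of UTF-8 encoded bytes prefixed with its length. The sketch below models that layout in Rust. The 8 byte little-endian prefix, the choice to count bytes rather than characters, and the names `encode`, `string_length` and `decode` are all assumptions made for this example; the real runtime function is written in asm and may differ.

```rust
use std::convert::TryFrom;

/// Width of the length prefix in bytes. An assumption for this sketch.
const PREFIX: usize = 8;

/// Lay out a string as `[length: u64 little-endian][UTF-8 bytes]`.
/// The length counts bytes, not characters (also an assumption).
fn encode(s: &str) -> Vec<u8> {
    let mut blob = Vec::with_capacity(PREFIX + s.len());
    blob.extend_from_slice(&(s.len() as u64).to_le_bytes());
    blob.extend_from_slice(s.as_bytes());
    blob
}

/// What a runtime `string-length` could do: read the prefix and never
/// scan the payload. Returns `None` if the blob cannot hold a prefix.
fn string_length(blob: &[u8]) -> Option<u64> {
    let mut prefix = [0u8; PREFIX];
    prefix.copy_from_slice(blob.get(..PREFIX)?);
    Some(u64::from_le_bytes(prefix))
}

/// Recover the text, checking that the prefix fits the payload and that
/// the payload is valid UTF-8.
fn decode(blob: &[u8]) -> Option<&str> {
    let len = usize::try_from(string_length(blob)?).ok()?;
    let end = PREFIX.checked_add(len)?;
    std::str::from_utf8(blob.get(PREFIX..end)?).ok()
}
```

Reading the length is a single fixed-offset load, which is why a function like this is a natural fit for the runtime level rather than for scheme.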

Stdlib

AFAIU there shouldn't be a difference between user defined functions and functions shipped as a stdlib implemented in scheme.

Modules

cli

Command line interface for inc

compiler

Inc Compiler

core

Core types shared by most of the program

immediate

Runtime representation of typed scheme values

lambda

Scheme functions

parser

A scheme parser in nom.

primitives

Scheme functions implemented within the compiler rather than the runtime.

runtime

Runtime functions implemented in C or ASM

strings

A string is a blob of UTF-8 encoded bytes prefixed with its length.

x86

A general purpose x86 library.