1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141
#![feature(slice_patterns)] /*! # An Incremental scheme compiler A tiny scheme to x86 asm compiler as described in the paper [An Incremental Approach to Compiler Construction][paper] by Abdulaziz Ghuloum. ## Where do I get started? 🕵️♀️ Read the first few sections of the paper to understand the premise. ## Background Reading 📚 There is a lot of C, Rust and x86 assembly here and these are some good places to start learning them. - [x86 module documentation](./x86/index.html) contains links to a few good x86 tutorials. - [How to C in 2016](https://matt.sh/howto-c) is a pretty good C refresher. - [The Rust Programming language][book] book is a good place to start learning rust. This project also uses a lot of iterators, so [Effectively Using Iterators In Rust][iter] might be useful as well ## Misc Micro blogs & lessons learned 🤷 ### 1. Debugging with GDB Debugging (occasionally wrong) generated assembly without a debugger is pretty hard and it is absolutely worth the effort getting familiar with gdb. GDB doesn't work on OSX despite the several dozens of blogs that claim otherwise and this project would be impossible without gdb. It is easier to setup remote debugging with docker than fight code signing on osx. Build the image $ docker build . -t inc:latest Run the container in privileged mode and expose a port $ docker run --rm -it --privileged -p 8080:8080 inc Run the program you want to debug in the container and build the executable /inc# echo "(let ((f (lambda (x) (+ x 1)))) (f 41))" | cargo run -q Start a remote debugging session /inc# gdbserver 127.0.0.1:8080 ./inc Start GDB on the host machine with the custom `.gdbinit` file $ cat .gdbinit set startup-with-shell off target remote 127.0.0.1:8080 This should work with the CLI as well as Emacs $ gdb Reading /inc/inc from remote target... warning: File transfers from remote targets can be slow. Use "set sysroot" to access files locally instead. Reading /inc/inc from remote target... Reading symbols from target:/inc/inc... Reading /lib64/ld-linux-x86-64.so.2 from remote target... Reading /lib64/ld-linux-x86-64.so.2 from remote target... Reading /lib64/5dfd7b95be4ba386fd71080accae8c0732b711.debug from remote target... Reading /lib64/.debug/5dfd7b95be4ba386fd71080accae8c0732b711.debug from remote target... Reading /usr/local/Cellar/gdb/8.3/lib/debug//lib64/5dfd7b95be4ba386fd71080accae8c0732b711.debug from remote target... Reading /usr/local/Cellar/gdb/8.3/lib/debug/lib64//5dfd7b95be4ba386fd71080accae8c0732b711.debug from remote target... Reading target:/usr/local/Cellar/gdb/8.3/lib/debug/lib64//5dfd7b95be4ba386fd71080accae8c0732b711.debug from remote target... 0x00007ffff7fd6090 in ?? () from target:/lib64/ld-linux-x86-64.so.2 (gdb) ![Screenshot of GDB running in Emacs over remote protocol][screenshot] ### 2. All the different kind of functions While implementing stdlib functions, I noticed that they belong to a few different levels - closely resembling the kind of privilege they have. The low level primitives get access to everything, including the register allocation. The runtime functions know about the memory layout of objects. A scheme function is far more limited and can only see the high level functional constructs. When possible a function should be implemented in the highest level possible - prefer scheme over rust for safety and kind of a self referential check. **Primitives** These are things you really have to build into the core of the compiler and are written in Rust. `primitives::string::make` is a pretty good example since inlining the string constants is not something you could do with scheme. **Sort of primitives** All the math! You don't really have to implement + and ** in Rust, but it allows the compiler to not treat them as function calls and emit a single efficient instruction immediately. I'd consider a compiler performing basic math during compilation as form of interpretation - inc doesn't do this, but is fairly trivial to implement. **Runtime** Functions like `string-length` understand the memory layout of the objects and is probably easiest done in C or ASM. Because of the currently odd 'everything in stack' calling convention, this is written in asm instead of C, but must be rewritten in C for simplicity once FFI works. All syscalls and FFI probably belong here in the same level. **Stdlib** AFAIU there shouldn't be a difference b/w user defined functions and functions shipped as a stdlib implemented in scheme. [Chez]: https://www.scheme.com [book]: https://doc.rust-lang.org/book/#the-rust-programming-language [iter]: https://hermanradtke.com/2015/06/22/effectively-using-iterators-in-rust.html [paper]: https://github.com/jaseemabid/inc/blob/master/docs/paper.pdf [rkt]: https://github.com/jaseemabid/inc/commit/a8ab1e6c7506023e59ddcf11cfabe53fbaa5c00a [rust]: https://github.com/jaseemabid/inc/commit/cc333332a5f20dc9de168954808d363621bd0c97 [screenshot]: https://raw.githubusercontent.com/jaseemabid/inc/master/docs/gdb.png */ pub mod cli; pub mod compiler; pub mod core; pub mod immediate; pub mod lambda; pub mod parser; pub mod primitives; pub mod runtime; pub mod strings; pub mod x86;