
[](https://github.com/fuad1502/oonta/actions/workflows/CI.yml)
*Oonta* is a compiler front-end for the [OCaml programming
language](https://ocaml.org): it generates [LLVM intermediate representation
(IR)](https://llvm.org/docs/LangRef.html) from OCaml source code.
*Oonta* uses the [JJIK](https://github.com/fuad1502/jjik) parser generator and
[JLEK](https://github.com/fuad1502/jlek) lexer generator to perform the parsing
and lexing stages. For building the IR, *Oonta* does not depend on the LLVM
API.
> [!IMPORTANT]
> This project is still a work in progress, many OCaml features are not yet
> supported. For example, polymorphic functions and types, and modules are not
> yet supported. Additionally, the runtime does not yet provide a garbage
> collection service. See the issues tab for the list of work items.
> [!NOTE]
> This project is part of the ["Compiler
> Toys"](https://github.com/fuad1502/compiler_toys) project, originally meant
> as a learning exercise on Compilers.
## Quick Start
```sh
cargo install oonta
cat << EOF > main.ml
let rec factorial x = if x <= 1 then 1 else x * factorial (x - 1)
let a = factorial 5
let () = print_int a
EOF
oonta --opt --exec main.ml
./main.out
# 120
```
## Benchmark
Two benchmarks are provided:
Merge sort (`benchmark/merge_sort.ml`):
```ocaml
type list =
| Empty
| Cat of (int * list)
let rec map f lst =
match lst with
| Empty -> ()
| Cat (x, l) ->
let () = f x in
map f l
let rec create_lst_aux n acc =
match n with
| 0 -> acc
| _ -> create_lst_aux (n - 1) (Cat (read_int (), acc))
let create_lst n = create_lst_aux n Empty
let rec split_aux lst acc_left acc_right =
match lst with
| Empty -> (acc_left, acc_right)
| Cat (head, rest) -> split_aux rest acc_right (Cat (head, acc_left))
let split lst = split_aux lst Empty Empty
let rec merge left right =
match (left, right) with
| Empty, right -> right
| lest, Empty -> lest
| Cat (left_head, left_rest), Cat (right_head, right_rest) ->
if left_head <= right_head then
Cat (left_head, merge left_rest right)
else
Cat (right_head, merge left right_rest)
let rec merge_sort lst =
match lst with
| Empty -> lst
| Cat (_, Empty) -> lst
| _ ->
let splitted_lst = split lst in
let left =
match splitted_lst with
| lst, _ -> lst
in
let right =
match splitted_lst with
| _, lst -> lst
in
merge (merge_sort left) (merge_sort right)
let n = read_int ()
let lst = create_lst n
let sorted_lst = merge_sort lst
let () = map print_int sorted_lst
```
And insertion sort (`benchmark/insertion_sort.ml`):
```ocaml
type list =
| Empty
| Cat of (int * list)
let rec map f lst =
match lst with
| Empty -> ()
| Cat (x, l) ->
let () = f x in
map f l
let rec create_lst_aux n acc =
match n with
| 0 -> acc
| _ -> create_lst_aux (n - 1) (Cat (read_int (), acc))
let create_lst n = create_lst_aux n Empty
let rec insert elem lst =
match lst with
| Empty -> Cat (elem, lst)
| Cat (head, tail) ->
if elem <= head then
Cat (elem, lst)
else
Cat (head, insert elem tail)
let rec insertion_sort lst =
match lst with
| Empty -> lst
| Cat (head, tail) -> insert head (insertion_sort tail)
let n = read_int ()
let lst = create_lst n
let sorted_lst = insertion_sort lst
let () = map print_int sorted_lst
```
Execute the following command inside the repository root folder to run the
benchmarks.
```sh
python3 benchmark/benchmark.py 1000000 0 # run the merge_sort.ml benchmark on a list with 1 million elements
python3 benchmark/benchmark.py 10000 0 # run the insertion_sort.ml benchmark on a list with 10000 elements
```
> [!IMPORTANT]
> Currently, for running the benchmark on large inputs, we have to increase the
> stack size limit with `ulimit -s unlimited`
On my machine (AMD Ryzen™ 7 7700X × 16), the result of running the above
benchmarks are as follows:
```text
Benchmarking: merge_sort.ml
Elapsed time (./benchmark/ocamlopt.out): 0.8710 seconds
Elapsed time (./benchmark/oonta.out): 0.7323 seconds
> 1.19 times faster
Benchmarking: insertion_sort.ml
Elapsed time (./benchmark/ocamlopt.out): 0.1323 seconds
Elapsed time (./benchmark/oonta.out): 0.3363 seconds
> 2.54 times slower
```
## Dependencies
The `oonta` command provides the `--opt, -O`, `--compile / -c` and `--exec /
-e` options to optimize the generated IR, compile the generated IR to an object
code and executable, respectively. Internally, `oonta` will invoke the
following commands:
```sh
# with --opt
opt -S -O3 -o <.ll file> <.ll file>
# with --compile
llc -O3 -relocation-model=pic --filetype=obj -o <output> <.ll file>
# with --exec
clang -o <output> <.o file> -loonta_runtime
```
On Ubuntu, install the `llvm` package to make those commands available.
```sh
sudo apt install llvm
```
> [!NOTE]
> I will be working on my own LLVM backend as part of my compiler learning
> journey! ✨
> [!WARNING]
> The generated IR uses [opaque
> pointers](https://llvm.org/docs/OpaquePointers.html). If your LLVM version is
> older than version 15, the `--compile` / `--exec` options might not work.
Additionally, with `--exec`, you need the *Oonta runtime library*:
`liboonta_runtime.a`. If you're running an x64 linux machine, you can obtain
the static library from the [release page]. If not, you would need to build the
runtime from source by following the guide [below](#building-from-source).
> [!NOTE]
> Currently the runtime only provides allocation requests using *bump
> allocation*. Garbage collection service is not yet available.
## User Guide
```sh
oonta --help
```
## Feature Highlights
### Debug compile phases
Use the `--verbose / -v` option to debug each compile phase.
```sh
cat << EOF > main.ml
let rec factorial x = if x <= 1 then 1 else x * factorial (x - 1)
let () = print_int (factorial 5)
EOF
oonta --exec -v main.ml
```
```text
=> Lexing & Parsing Start
=> Lexing & Parsing End (1 ms)
=> Build AST Start
factorial =
FunExpr
├─▸ parameters: [x]
├─▸ captures: []
├─▸ recursive: yes
└─▸ body:
CondExpr
├─▸ condition:
│ BinOpExpr
│ ├─▸ operator: <=
│ ├─▸ lhs:
│ │ VarExpr ("x")
│ └─▸ rhs:
│ LiteralExpr (1)
├─▸ then expr:
│ LiteralExpr (1)
└─▸ else expr:
BinOpExpr
├─▸ operator: *
├─▸ lhs:
│ VarExpr ("x")
└─▸ rhs:
ApplicationExpr
├─▸ function:
│ VarExpr ("factorial")
└─▸ binds:
└─▸ (0)
BinOpExpr
├─▸ operator: -
├─▸ lhs:
│ VarExpr ("x")
└─▸ rhs:
LiteralExpr (1)
() =
ApplicationExpr
├─▸ function:
│ VarExpr ("print_int")
└─▸ binds:
└─▸ (0)
ApplicationExpr
├─▸ function:
│ VarExpr ("factorial")
└─▸ binds:
└─▸ (0)
LiteralExpr (5)
=> Build AST End (0 ms)
=> Resolve types Start
Top level bindings:
factorial: (int -> int)
=> Resolve types End (0 ms)
=> Transform application expressions Start
=> Transform application expressions End (0 ms)
=> Build LLVM module Start
=> Build LLVM module End (0 ms)
=> Write LLVM module Start
=> Write LLVM module End (1 ms)
=> LLVM backend Start
=> LLVM backend End (117 ms)
```
### Error reporting
```text
Error: cannot infer expression type: Unbound value foo
```
```text
Line 1|let rec f x = f
^
Error: cannot infer expression type: Cannot unify 'b with ('a -> 'b)
```
```text
Error: cannot bind expression of type int to ()
```
## Building from source
1. Install C++ build dependencies
```
sudo apt install build-essential cmake
```
2. Install `cargo` tool:
```sh
3. Clone repository.
```sh
git clone https://github.com/fuad1502/oonta.git
```
4. Build `liboonta_runtime.a`
```
cmake -S runtime -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
```
5. Install `liboonta_runtime.a`
```
sudo cmake --install build
```
6. Build `oonta`.
```sh
cargo build
cargo test
```
## Why is it called Oonta?
*Oonta*, is based on the Indonesian word *unta*, which translates to "camel".