oonta 0.2.0 - Docs.rs

Oonta is a compiler front-end for the OCaml programming language: it generates LLVM intermediate representation (IR) from OCaml source code.

Oonta uses the JJIK parser generator and JLEK lexer generator to perform the parsing and lexing stages. For building the IR, Oonta does not depend on the LLVM API.

[!IMPORTANT] This project is still a work in progress, many OCaml features are not yet supported. For example, polymorphic functions and types, and modules are not yet supported. Additionally, the runtime does not yet provide a garbage collection service. See the issues tab for the list of work items.

[!NOTE] This project is part of the "Compiler Toys" project, originally meant as a learning exercise on Compilers.

Quick Start

cargo install oonta
cat << EOF > main.ml
let rec factorial x = if x <= 1 then 1 else x * factorial (x - 1)
let a = factorial 5
let () = print_int a
EOF
oonta --opt --exec main.ml
./main.out
# 120

Benchmark

Two benchmarks are provided:

Merge sort (benchmark/merge_sort.ml):

type list =
  | Empty
  | Cat of (int * list)

let rec map f lst =
  match lst with
  | Empty -> ()
  | Cat (x, l) ->
      let () = f x in
      map f l

let rec create_lst_aux n acc =
  match n with
  | 0 -> acc
  | _ -> create_lst_aux (n - 1) (Cat (read_int (), acc))

let create_lst n = create_lst_aux n Empty

let rec split_aux lst acc_left acc_right =
  match lst with
  | Empty -> (acc_left, acc_right)
  | Cat (head, rest) -> split_aux rest acc_right (Cat (head, acc_left))

let split lst = split_aux lst Empty Empty

let rec merge left right =
  match (left, right) with
  | Empty, right -> right
  | lest, Empty -> lest
  | Cat (left_head, left_rest), Cat (right_head, right_rest) ->
      if left_head <= right_head then
        Cat (left_head, merge left_rest right)
      else
        Cat (right_head, merge left right_rest)

let rec merge_sort lst =
  match lst with
  | Empty -> lst
  | Cat (_, Empty) -> lst
  | _ ->
      let splitted_lst = split lst in
      let left =
        match splitted_lst with
        | lst, _ -> lst
      in
      let right =
        match splitted_lst with
        | _, lst -> lst
      in
      merge (merge_sort left) (merge_sort right)

let n = read_int ()
let lst = create_lst n
let sorted_lst = merge_sort lst
let () = map print_int sorted_lst

And insertion sort (benchmark/insertion_sort.ml):

type list =
  | Empty
  | Cat of (int * list)

let rec map f lst =
  match lst with
  | Empty -> ()
  | Cat (x, l) ->
      let () = f x in
      map f l

let rec create_lst_aux n acc =
  match n with
  | 0 -> acc
  | _ -> create_lst_aux (n - 1) (Cat (read_int (), acc))

let create_lst n = create_lst_aux n Empty

let rec insert elem lst =
  match lst with
  | Empty -> Cat (elem, lst)
  | Cat (head, tail) ->
      if elem <= head then
        Cat (elem, lst)
      else
        Cat (head, insert elem tail)

let rec insertion_sort lst =
  match lst with
  | Empty -> lst
  | Cat (head, tail) -> insert head (insertion_sort tail)

let n = read_int ()
let lst = create_lst n
let sorted_lst = insertion_sort lst
let () = map print_int sorted_lst

Execute the following command inside the repository root folder to run the benchmarks.

python3 benchmark/benchmark.py 1000000 0 # run the merge_sort.ml benchmark on a list with 1 million elements
python3 benchmark/benchmark.py 10000 0 # run the insertion_sort.ml benchmark on a list with 10000 elements

[!IMPORTANT] Currently, for running the benchmark on large inputs, we have to increase the stack size limit with ulimit -s unlimited

On my machine (AMD Ryzen™ 7 7700X × 16), the result of running the above benchmarks are as follows:

Benchmarking: merge_sort.ml
Elapsed time (./benchmark/ocamlopt.out): 0.8710 seconds
Elapsed time (./benchmark/oonta.out): 0.7323 seconds
> 1.19 times faster

Benchmarking: insertion_sort.ml
Elapsed time (./benchmark/ocamlopt.out): 0.1323 seconds
Elapsed time (./benchmark/oonta.out): 0.3363 seconds
> 2.54 times slower

Dependencies

The oonta command provides the --opt, -O, --compile / -c and --exec / -e options to optimize the generated IR, compile the generated IR to an object code and executable, respectively. Internally, oonta will invoke the following commands:

# with --opt
opt -S -O3 -o <.ll file> <.ll file>
# with --compile
llc -O3 -relocation-model=pic --filetype=obj -o <output> <.ll file>
# with --exec
clang -o <output> <.o file> -loonta_runtime

On Ubuntu, install the llvm package to make those commands available.

sudo apt install llvm

[!NOTE] I will be working on my own LLVM backend as part of my compiler learning journey! ✨

[!WARNING] The generated IR uses opaque pointers. If your LLVM version is older than version 15, the --compile / --exec options might not work.

Additionally, with --exec, you need the Oonta runtime library: liboonta_runtime.a. If you're running an x64 linux machine, you can obtain the static library from the [release page]. If not, you would need to build the runtime from source by following the guide below.

[!NOTE] Currently the runtime only provides allocation requests using bump allocation. Garbage collection service is not yet available.

User Guide

oonta --help

Feature Highlights

Debug compile phases

Use the --verbose / -v option to debug each compile phase.

cat << EOF > main.ml
let rec factorial x = if x <= 1 then 1 else x * factorial (x - 1)
let () = print_int (factorial 5)
EOF
oonta --exec -v main.ml

=> Lexing & Parsing Start
=> Lexing & Parsing End (1 ms)
=> Build AST Start
factorial = 
FunExpr
├─▸ parameters: [x]
├─▸ captures: []
├─▸ recursive: yes
└─▸ body:
    CondExpr
    ├─▸ condition:
    │   BinOpExpr
    │   ├─▸ operator: <=
    │   ├─▸ lhs:
    │   │   VarExpr ("x")
    │   └─▸ rhs:
    │       LiteralExpr (1)
    ├─▸ then expr:
    │   LiteralExpr (1)
    └─▸ else expr:
        BinOpExpr
        ├─▸ operator: *
        ├─▸ lhs:
        │   VarExpr ("x")
        └─▸ rhs:
            ApplicationExpr
            ├─▸ function:
            │   VarExpr ("factorial")
            └─▸ binds:
                └─▸ (0)
                    BinOpExpr
                    ├─▸ operator: -
                    ├─▸ lhs:
                    │   VarExpr ("x")
                    └─▸ rhs:
                        LiteralExpr (1)

() = 
ApplicationExpr
├─▸ function:
│   VarExpr ("print_int")
└─▸ binds:
    └─▸ (0)
        ApplicationExpr
        ├─▸ function:
        │   VarExpr ("factorial")
        └─▸ binds:
            └─▸ (0)
                LiteralExpr (5)

=> Build AST End (0 ms)
=> Resolve types Start
Top level bindings:
factorial: (int -> int)
=> Resolve types End (0 ms)
=> Transform application expressions Start
=> Transform application expressions End (0 ms)
=> Build LLVM module Start
=> Build LLVM module End (0 ms)
=> Write LLVM module Start
=> Write LLVM module End (1 ms)
=> LLVM backend Start
=> LLVM backend End (117 ms)

Error reporting

Line   1|let x = foo 3
                 ^--
Error: cannot infer expression type: Unbound value foo

Line   1|let rec f x = f
                       ^
Error: cannot infer expression type: Cannot unify 'b with ('a -> 'b)

Line   1|let () = 1 + 2
                  ^----
Error: cannot bind expression of type int to ()

Building from source

Install C++ build dependencies

sudo apt install build-essential cmake

Install cargo tool:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Clone repository.

git clone https://github.com/fuad1502/oonta.git

Build liboonta_runtime.a

cmake -S runtime -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build

Install liboonta_runtime.a

sudo cmake --install build

Build oonta.

cargo build
cargo test

Why is it called Oonta?

Oonta, is based on the Indonesian word unta, which translates to "camel".