
Table of Contents
- Introduction
- Quick Start (Ubuntu)
- Benchmark against
ocamlopt - User Guide
- Dependencies
- Feature Highlights
- Building from source
- Why is it called Oonta?
Introduction
Oonta is a compiler front-end for a subset of the OCaml programming language: it generates LLVM intermediate representation (IR) from OCaml source code.
Oonta uses the JJIK parser generator and JLEK lexer generator to perform the parsing and lexing stages. For building the IR, Oonta does not depend on the LLVM API.
This project is still a work in progress, many OCaml features are not yet supported. For example, modules, classes and objects are not yet supported. Additionally, the runtime does not yet provide a garbage collection service. Checkout this section for a complete list of currently supported language features.
[!NOTE] One could argue that with the many OCaml key features currently missing, it is not appropriate to call it "a compiler for a subset of the OCaml language", since it is more akin to simpler dialects of the ML programming language family, such as SML or Caml. However, the implementation and syntax reference has always been OCaml. Moreover, the final goal is to support as much OCaml features possible.
[!NOTE] This project is part of the "Compiler Toys" project, originally meant as a learning exercise on Compilers.
Quick Start (Ubuntu)
Install Rust and LLVM (sudo apt install llvm), then:
Benchmark against ocamlopt
Two benchmarks are provided:
Merge sort (benchmark/merge_sort.ml):
match lst (
let ( = f x in
map f l
match n acc
create_lst_aux
create_lst_aux n Empty
match lst
split_aux rest acc_right
split_aux lst Empty Empty
match right
lest
if left_head <= right_head then
Cat
else
Cat
match lst lst
lst
let splitted_lst = split lst in
let left =
match splitted_lst lst
in
let right =
match splitted_lst lst
in
merge
let n = read_int (
let lst = create_lst n
let sorted_lst = merge_sort lst
let ( = map print_int sorted_lst
Insertion sort (benchmark/insertion_sort.ml):
match lst (
let ( = f x in
map f l
match n acc
create_lst_aux
create_lst_aux n Empty
match lst Cat
if elem <= head then
Cat
else
Cat
match lst lst
insert head
let n = read_int (
let lst = create_lst n
let sorted_lst = insertion_sort lst
let ( = map print_int sorted_lst
And polymorphic list comparison (benchmark/polymorphic_compare.ml):
match n acc
create_lst_aux
create_lst_aux n Empty
if lst_a > lst_b then
print_int 1
else if lst_a = lst_b then
print_int 0
else
print_int 2
let n = read_int (
let lst = create_lst n
let ( = compare lst lst
Execute the following command inside the repository root folder to run the benchmarks.
[!WARNING] Currently, for running the merge sort benchmark on large inputs, increase the stack size limit with
ulimit -s unlimited
On my Ubuntu machine (AMD Ryzen™ 7 7700X × 16), with ocamlopt version 5.4.0
and LLVM version 20.1.8, the result of running the above benchmarks are as
follows:
Benchmarking: merge_sort.ml
Elapsed time (./benchmark/ocamlopt.out): 0.8747 seconds
Elapsed time (./benchmark/oonta.out): 0.7445 seconds
> 14.88% faster
Benchmarking: insertion_sort.ml
Elapsed time (./benchmark/ocamlopt.out): 0.1325 seconds
Elapsed time (./benchmark/oonta.out): 0.3481 seconds
> 262.68% slower
Benchmarking: polymorphic_compare.ml
Elapsed time (./benchmark/ocamlopt.out): 0.1026 seconds
Elapsed time (./benchmark/oonta.out): 0.0635 seconds
> 38.14% faster
User Guide
Dependencies
With no command line options given, the oonta command will simply generates
LLVM IR without requiring any runtime dependencies. However, it provides
--opt, -O, --compile / -c and --exec / -e options to optimize the
generated IR, compile the generated IR to an object code and executable,
respectively, which requires certain dependencies to function. Internally,
oonta will invoke the following commands when given those options:
# with --opt
# with --compile
# with --exec
On Ubuntu, install the llvm package to make opt, llc, and clang
available.
[!WARNING] The generated IR uses opaque pointers. If the LLVM version is older than version 15, the
--opt / --compile/--execoptions might not work. If you're on Debian/Ubuntu, see https://apt.llvm.org/ for instructions on installing other versions.
Additionally, with --exec, you need the Oonta runtime library:
liboonta_runtime.a. If you're running an x64 linux machine, you can obtain
the static library from the release
page. If not, you would need to
build the runtime from source by following the guide
below.
[!NOTE] Currently the runtime only provides memory allocation requests using bump allocation. Garbage collection service is not yet available.
Feature Highlights
Supported OCaml language features
[!NOTE] This list is by no means complete. Unsupported features were intentionally not fully expanded for brevity.
| Feature | Supported? |
|---|---|
| Basic data types | Integers and booleans |
| Type inference | Yes |
| Global definitions | Yes |
| Local definitions | Yes |
| First class functions | Yes |
| Closures | Yes |
| Partial applications | Yes |
| Recursive definitions | Yes |
| Parametric variants | Yes |
| Polymorphic functions | Yes |
| Anonymous functions | Yes |
| Conditionals | Yes |
| Pattern matching | Yes. However, no exhaustiveness check. |
| Mutually recursive definitions | No |
| Records | No |
| Imperative features | No |
| Exceptions | No |
| Lazy expressions | No |
| Module system | No |
| Classes and objects | No |
| Labeled arguments | No |
| Polymorphic variants | No |
Compile-time monomorphization
Oonta handles (parameteric) polymorphic function through compile-time monomorphization. It therefore generates more optimized code, at the cost of code bloat. This also allows removing the need for pointer tag.
For example, for the following OCaml code:
if x > y then
f x
else
f y
let ( = apply_on_greater print_int 3 2
It will monomorphize apply_on_greater with type (('a -> 'b) -> 'a -> 'a -> 'b) into ((int -> unit) -> int -> int -> unit) and generate the following
LLVM IR:
define ccc void @oonta.apply_on_greater.int.unit.fun(ptr %f, i64 %x, i64 %y, ptr %env) {
entry:
%r = icmp sgt i64 %x, %y
br i1 %r, label %then, label %else
then:
%r0 = load ptr, ptr %f
call ccc void %r0(i64 %x, ptr %f)
br label %follow
else:
%r1 = load ptr, ptr %f
call ccc void %r1(i64 %y, ptr %f)
br label %follow
follow:
ret void
}
Debug compile phases
Use the --verbose / -v option to debug each compile phase.
=> Lexing & Parsing Start
=> Lexing & Parsing End (0 ms)
=> Build AST Start
apply_on_greater =
FunExpr
├─▸ parameters: [f, x, y]
├─▸ captures: []
├─▸ recursive: no
└─▸ body:
CondExpr
├─▸ condition:
│ BinOpExpr
│ ├─▸ operator: >
│ ├─▸ lhs:
│ │ VarExpr ("x")
│ └─▸ rhs:
│ VarExpr ("y")
├─▸ then expr:
│ ApplicationExpr
│ ├─▸ function:
│ │ VarExpr ("f")
│ └─▸ binds:
│ └─▸ (0)
│ VarExpr ("x")
└─▸ else expr:
ApplicationExpr
├─▸ function:
│ VarExpr ("f")
└─▸ binds:
└─▸ (0)
VarExpr ("y")
() =
ApplicationExpr
├─▸ function:
│ VarExpr ("apply_on_greater")
└─▸ binds:
├─▸ (0)
│ VarExpr ("print_int")
├─▸ (1)
│ LiteralExpr (3)
└─▸ (2)
LiteralExpr (2)
=> Build AST End (0 ms)
=> Infer types Start
Top level bindings:
apply_on_greater: (('a -> 'b) -> 'a -> 'a -> 'b)
=> Infer types End (0 ms)
=> Currying expressions Start
=> Currying expressions End (0 ms)
=> Monomorphization Start
Monomorphing 'apply_on_greater' (('a -> 'b) -> 'a -> 'a -> 'b) into ((int -> unit) -> int -> int -> unit):
'a -> int
'b -> unit
=> Monomorphization End (0 ms)
=> Build LLVM module Start
=> Build LLVM module End (0 ms)
=> Write LLVM module Start
=> Write LLVM module End (0 ms)
=> Optimize LLVM IR Start
=> Optimize LLVM IR End (12 ms)
=> LLVM backend Start
=> LLVM backend End (37 ms)
Error reporting
Line 1|let x = foo 3
^--
Error: cannot infer expression type: Unbound value foo
Line 1|let rec f x = f
^
Error: cannot infer expression type: Cannot unify 'b with ('a -> 'b)
Line 1|let () = 1 + 2
^----
Error: cannot bind expression of type int to ()
Building from source
- Install C++ build dependencies
sudo apt install build-essential cmake
- Install
cargotool
|
- Clone repository
- Build
liboonta_runtime.a
cmake -S runtime -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build
- Install
liboonta_runtime.a
sudo cmake --install build
- Build
oonta
Why is it called Oonta?
Oonta, is based on the Indonesian word unta, which translates to "camel".