kamo/
lib.rs

1#![cfg_attr(docsrs, feature(doc_cfg))]
2//! A library to assist in the creation of an interpreter or compiler and its
3//! associated runtime.
4//!
5//! This is the third release of the Kamo project. This crate adds type-system
6//! and type-checking support to the project. The type system is used to infer
7//! the types of the intermediate representation and the AST. The type-checker
8//! is used to check the types of the intermediate representation and the AST.
9//!
10//! This is the second release of the Kamo project. This crate contains the
11//! parser library and the memory management library, which also includes the
12//! runtime representation of values.
13//!
14//! ## Memory Management
15//!
16//! The [`Mutator`](crate::mem::Mutator) implements automatic memory management.
17//! It holds an [`Arena`](crate::mem::Arena) for each type. Memory collection is
18//! done by a mark and sweep garbage collector.
19//!
20//! The mutator implementation is thread local. The arena allocator is
21//! implemented as a vector of [`Bucket`](crate::mem::Bucket)s. Each bucket
22//! holds a vector of [`Slot`](crate::mem::Slot)s. The slots are used to store
23//! the values. Buckets have a fixed size and are allocated on the heap. The
24//! buckets are allocated on demand when the slots are full.
25//!
26//! The hierarchy of types is as follows:
27//!
28//! * Mutator contains Arenas.
29//! * Arena contains Buckets.
30//! * Bucket contains Slots.
31//! * Slot contains a value.
32//! * a value is represented by a Pointer.
33//!
34//! Values managed by the mutator must be traceable. The mutator uses the
35//! [`Trace`](crate::mem::Trace) trait to trace the values. The `Trace` trait is
36//! implemented for all types that can contain values managed by the mutator.
37//!
38//! [`Pointer`](crate::mem::Pointer)s use reference counting to keep track of
39//! the number of references to the value. When the number of references reaches
40//! zero the value may be subject to garbage collection. To avoid cycles
41//! [`Pair`](crate::value::Pair)s, also known as `Cons` cells, and
42//! [`Vector`](crate::value::Vector)s do not hold locks to their elements.
43//! Instead they are traced by the mutator when memory collection is done. This
44//! is the reason why a drop of the reference count of a pointer to zero does
45//! not immediately free the value.
46//!
47//! The garbage collector is implemented as a mark and sweep collector,
48//! [`collect_garbage()`](crate::mem::Mutator::collect_garbage()). All values
49//! which hold a lock are roots. The garbage collector traces all roots and
50//! marks all values that are reachable from the roots. All values that are not
51//! marked are unreachable and are freed. Tracing is done iteratively by
52//! repeatedly calling the [`trace()`](crate::mem::Trace::trace()) method on the
53//! values and adding the values to a work list. This continues until the work
54//! list is empty.
55//!
56//! ### `Allocator` Trait
57//!
58//! The mutator uses the global allocator to allocate memory for buckets,
59//! arenas, vectors and strings etc. Dynamic memory allocation is done by the
60//! allocator provided by the [`std::alloc`] crate. Since the implementation of
61//! the allocator provided by the standard library is highly optimized it would
62//! be a waste of effort to reimplement it.
63//!
64//! As a consequence the mutator is not able to free memory when the global
65//! allocator fails to allocate memory. This is because the mutator does not
66//! know which memory is allocated by the global allocator. The mutator only
67//! knows which memory is allocated by its arenas. Collection of memory is
68//! triggered only by the allocation pressure.
69//!
70//! When the allocator trait is stabilized the mutator will be able to use the
71//! the trait to mitigate allocation failures by freeing memory when the global
72//! allocator fails.
73//!
74//! ## Runtime Values
75//!
76//! [`Value`](crate::value::Value)s can either hold immediate values or pointers
77//! to values allocated in the mutator.
78//!
79//! The following immediate values are supported:
80//!
81//! * `nil`, the empty list
82//! * `bool`
83//! * `char`, a Unicode scalar value
84//! * `i64`
85//! * `f64`
86//!
87//! The following pointer values are supported:
88//!
89//! * `pair`, a pair of values, also known as `Cons` cell
90//! * `symbol`, a symbol, an interned string
91//! * `string`, a UTF-8 string
92//! * `vector`, a vector of values
93//! * `bytevector`, a vector of bytes
94//!
95//! Value provides a safe interface to the values in the runtime. It is
96//! implemented as a wrapper around [`ValueKind`](crate::value::ValueKind). It
97//! provides methods to convert the value into a Rust type, to check the type of
98//! the value, and to visit the value. Heap values are automatically locked and
99//! unlocked when necessary.
100//!
101//! Value implements the visitor pattern. The [`Visitor`](crate::value::Visitor)
102//! trait is used to visit the value. When implementing the `Visitor` trait it
103//! must be considered that the value may be cyclic. The visitor must keep track
104//! of the values it has already visited to avoid infinite loops.
105//!
106//! The `Visitor` trait is used to implement printing of values. The `Value`
107//! itself does not implement printing. Instead the
108//! [`Display`](std::fmt::Display) trait is implemented for a specific vistor.
109//! This makes it possible to implement different printing strategies for
110//! values.
111//!
112//! An implementation of the `Visitor` trait is provided to print values in a
113//! Scheme-like syntax,
114//! [`SimplePrinterVisitor`](crate::value::SimplePrinterVisitor).
115//!
116//! ## Combination Parser
117//!
118//! The parser library is a general purpose parser library that can be used to
119//! parse any language. It is designed to be used in conjunction with the future
120//! parts of this project, but it can also be used on its own.
121//!
122//! The memory management library is a library for automatic memory management.
123//! Values are allocated in a mutator which holds an arena allocator for each
124//! type. Memory collection is done by a mark and sweep garbage collector.
125//!
126//! The future parts of this project will extend this crate with a library for
127//! compile time evaluation and transformation into an intermediate
128//! representation, an implemenation of an interpreter for the intermediate
129//! representation, and a reference implementation of Scheme. The implementation
130//! will be based on the R7RS standard and will implement a subset of the
131//! standard.
132//!
133//! The design goal of this project is to create a framework that can be used to
134//! create a language interpreter or compiler. AST nodes will be created by the
135//! parser and then passed to the evaluator. The AST is represented as symbolic
136//! expressions (s-expressions) and is designed to be language agnostic. The
137//! evaluator will then evaluate the AST into a combination of s-expressions and
138//! intermediate representation. This is the compile time representation of the
139//! program. The intermediate representation can then be either interpreted or
140//! compiled into a target representation.
141//!
142//! ### Parser Example
143//!
144//! Example of a parser that parses a Scheme-like byte-vector and returns it as
145//! `Vec<u8>`.
146//!
147//! The grammar is defined as follows:
148//!
149//! ```text
150//! ByteVector = "#u8(" Byte* ')'
151//! Byte       = <any exact integer between 0 and 255>
152//! ```
153//!
154//! The parser is defined as follows:
155//!
156//! ```rust
157//! use kamo::parser::{code, prelude::*, Input, ParseError, ParseResult, Span};
158//!
159//! fn main() {
160//!     assert_eq!(
161//!         parse_bytevec(Input::new("#u8(0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15)")),
162//!         Ok((
163//!             vec![0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
164//!             Input::new("")
165//!         ))
166//!     );
167//! }
168//!
169//! fn parse_bytevec(input: Input<'_>) -> ParseResult<'_, Vec<u8>> {
170//!     context_as(
171//!         delimited(
172//!             pair(tag("#u8("), ascii::whitespace0),
173//!             list0(parse_bytevec_value, ascii::whitespace1),
174//!             context_as(
175//!                 preceded(ascii::whitespace0, char(')')),
176//!                 code::ERR_CUSTOM + 2,
177//!                 "expecting a closing parenthesis: )",
178//!             ),
179//!         ),
180//!         code::ERR_CUSTOM + 1,
181//!         "expecting a byte vector: #u8(<byte>*)",
182//!     )(input)
183//! }
184//!
185//! fn parse_bytevec_value(input: Input<'_>) -> ParseResult<'_, u8> {
186//!     let result = preceded(
187//!         ascii::whitespace0,
188//!         any((
189//!             parse_binary_natural,
190//!             parse_octal_natural,
191//!             parse_decimal_natural,
192//!             parse_hexadecimal_natural,
193//!             parse_natural,
194//!         )),
195//!     )(input);
196//!
197//!     match result {
198//!         Ok((value, cursor)) => {
199//!             if value > u8::MAX as u64 {
200//!                 Err(ParseError::new(
201//!                     Span::new(input.position(), cursor.position()),
202//!                     code::ERR_CUSTOM + 3,
203//!                     "expecting a byte value: 0..255",
204//!                 ))
205//!             } else {
206//!                 Ok((value as u8, cursor))
207//!             }
208//!         }
209//!         Err(err) => Err(err),
210//!     }
211//! }
212//!
213//! #[inline]
214//! fn parse_binary_natural(input: Input<'_>) -> ParseResult<u64> {
215//!     preceded(tag("#b"), literal::natural(literal::Radix::Binary))(input)
216//! }
217//!
218//! #[inline]
219//! fn parse_octal_natural(input: Input<'_>) -> ParseResult<u64> {
220//!     preceded(tag("#o"), literal::natural(literal::Radix::Octal))(input)
221//! }
222//!
223//! #[inline]
224//! fn parse_decimal_natural(input: Input<'_>) -> ParseResult<u64> {
225//!     preceded(tag("#d"), parse_natural)(input)
226//! }
227//!
228//! #[inline]
229//! fn parse_hexadecimal_natural(input: Input<'_>) -> ParseResult<u64> {
230//!     preceded(tag("#x"), literal::natural(literal::Radix::Hexadecimal))(input)
231//! }
232//!
233//! #[inline]
234//! fn parse_natural(input: Input<'_>) -> ParseResult<u64> {
235//!     literal::natural(literal::Radix::Decimal)(input)
236//! }
237//! ```
238//!
239//! ## Type System
240//!
241//! The type system is available when the `types` feature is enabled. It is not
242//! enabled by default. When the type system is enabled the [`mod@env`] module is
243//! also available. It is used by the
244//! [`TypeChecker`](crate::types::TypeChecker).
245//!
246//! Types are defined by a type code which is a byte array. The type code is
247//! used to represent the type of a value. This allows for a simple and
248//! efficient way to represent types. The syntax of the type code is
249//! straightforward and easy to understand and parse. The type code allows the
250//! construction of complex types with multiple levels of nesting by combining
251//! simpler types.
252//!
253//! For more information see the [`types`] module.
254//!
255//! ## Environment
256//!
257//! The environment module is available when the `types` feature is enabled. It
258//! is not enabled by default.
259//!
260//! The environment defines the scopes and bindings of an interpreter. The
261//! environment is used to store the bindings of variables and functions. During
262//! the execution or compilation of a program, the environment is used to lookup
263//! the bindings of variables and functions. The environment is also used to
264//! define new bindings.
265//!
266//! The [`TypeChecker`](crate::types::TypeChecker) uses the environment to store
267//! or to lookup the types of variables and functions.
268//!
269//! ## Macros
270//!
271//! Macros are available when the `macros` feature is enabled. It is not
272//! enabled by default.
273//!
274//! The macros are used to parse a string literal into a single or an array of
275//! `kamo::value::Value`s. The macros are defined as follows:
276//!
277//! - `sexpr!(<mutator>, <expression>)` - A macro for parsing a single
278//!   s-expression from a string. It returns a `kamo::value::Value`.
279//!
280//! - `sexpr_file!(<mutator>, <filename>)` - A macro for parsing multiple
281//!   s-expressions from a file. It returns an array of `kamo::value::Value`s.
282//!   The array may be empty.
283//!  
284//! - `sexpr_script!(<mutator>, <expressions>)` - A macro for parsing multiple
285//!   s-expressions from a string. It returns an array of `kamo::value::Value`s.
286//!   The array may be empty.
287//!  
288//! The macros all take an optional `MutatorRef` identifier. This is used to allocate
289//! values on the heap. If the expression does not contain any values that need to
290//! be allocated on the heap, then the `Mutator` identifier can be omitted.
291//!  
292//! The syntax for the macros is as defined by the Scheme standard R7RS for the
293//! `read` procedure. The syntactic definition is the `<datum>` in section
294//! [`7.1.2 External representations`](https://standards.scheme.org/official/r7rs.pdf)
295//! of the standard.
296//!  
297//! The syntax deviations from the standard are:
298//!  
299//! - The extactness (`#e` and `#i`) of numbers is not supported. Floating-point
300//!   numbers are always inexact and integers are always exact.
301//!  
302//! - Numbers may only be signed 64-bit integers or IEEE 754 double precision
303//!   floating-point numbers. The standard allows for arbitrary precision
304//!   integers, rationals and complex numbers.
305//!  
306//! - Labels are not supported.
307//!  
308//! - The `#;` comment syntax is only supported in the macros which parse
309//!   multiple s-expressions. The `#;` comment syntax may not be nested.
310//!  
311//! - Character literals defined by a hex escape sequence may have 1 to 6
312//!   digits. The standard excepts 1 or more digits. The code must be a valid
313//!   Unicode code point.
314//!  
315//! The parser is implemented with the [`pest`](https://crates.io/crates/pest)
316//! crate. The grammar is defined in `src/sexpr/sexpr.pest`. This is necessary
317//! because the combination parser library defined in `kamo::parser` cannot be
318//! used here. It would be cyclic dependency. There will be an implementation
319//! of a parser for s-expressions in the `kamo::form` module in the future. It
320//! will be based on the `kamo::parser` crate and will be used by the scheme
321//! interpreter.
322//!  
323//! ### Examples
324//!  
325//! ```rust
326//! use kamo::{mem::Mutator, value::{print, Value}};
327//! use kamo_macros::sexpr;
328//!  
329//! let m = Mutator::new_ref();
330//! let value = sexpr!(m, "(1 2 3)");
331//!  
332//! assert_eq!(print(value).to_string(), "(1 2 3)");
333//! ```
334//!  
335//! ```rust
336//! use kamo::value::{print, Value};
337//! use kamo_macros::sexpr_file;
338//!  
339//! let values: &[Value] = &sexpr_file!("tests/macros/values.scm");
340//!  
341//! assert_eq!(values.len(), 3);
342//! assert_eq!(print(values[0].clone()).to_string(), "()");
343//! assert_eq!(print(values[1].clone()).to_string(), "100");
344//! assert_eq!(print(values[2].clone()).to_string(), "#t");
345//!
346//! let values: &[Value] = &sexpr_file!("tests/macros/empty.scm");
347//! assert_eq!(values.len(), 0);
348//! ```
349//!  
350//! ```rust
351//! use kamo::{mem::Mutator, value::{print, Value}};
352//! use kamo_macros::sexpr_script;
353//!  
354//! let m = Mutator::new_ref();
355//! let values: &[Value] = &sexpr_script!(m, "(define a 1)\n(define b 2)\n(+ a b)");
356//!  
357//! assert_eq!(values.len(), 3);
358//! assert_eq!(print(values[0].clone()).to_string(), "(define a 1)");
359//! assert_eq!(print(values[1].clone()).to_string(), "(define b 2)");
360//! assert_eq!(print(values[2].clone()).to_string(), "(+ a b)");
361//!
362//! let values: &[Value] = &sexpr_script!("");
363//! assert_eq!(values.len(), 0);
364//! ```
365//!
366//! ## Feature List
367//!
368//! - [x] Module `kamo::parser` for parsing UTF-8 text. A parser combinator
369//! library for parsing UTF-8 text in a safe and mostly zero-copy way.
370//! - [x] Module `kamo::mem` for automatic memory management. Values are allocated
371//! in a mutator which holds an arena allocator for each type. Memory collection
372//! is done by a mark and sweep garbage collector.
373//! - [x] Module `kamo::value` for values. Values can either hold immediate values
374//! or pointers to values allocated in the mutator.
375//! - [x] Module `kamo::types` for types. The type system is used to infer the
376//! types of the intermediate representation and the AST.
377//! - [ ] Module `kamo::eval` for evaluation. The evaluator processes an AST, which
378//! is an symbolic expression tree, and evaluates it to an intermediate
379//! representation. The intermediate representation can then be interpreted or
380//! compiled to a target representation. The interpreter is generic and can be
381//! used to interpret any intermediate representation.
382//! - [ ] Module `kamo::repl` for a read-eval-print-loop. The REPL is used to
383//! interactively evaluate expressions and is generic and can be used to
384//! evaluate any intermediate representation.
385//! - [ ] Module `kamo::lang::scheme` for the Scheme language. The Scheme language
386//! is implemented as a library on top of the `kamo` modules. It implements a
387//! subset of the R7RS standard.
388
389#[cfg(feature = "types")]
390#[macro_use]
391extern crate lazy_static;
392
393#[cfg(feature = "types")]
394#[cfg_attr(docsrs, doc(cfg(feature = "types")))]
395pub mod env;
396pub mod form;
397pub mod mem;
398pub mod parser;
399#[cfg(feature = "types")]
400#[cfg_attr(docsrs, doc(cfg(feature = "types")))]
401pub mod types;
402pub mod value;
403
404mod position;
405pub use position::Position;
406
407#[cfg(feature = "macros")]
408#[cfg_attr(docsrs, doc(cfg(feature = "macros")))]
409pub use kamo_macros::{sexpr, sexpr_file, sexpr_script};
410
411#[cfg(all(test, feature = "macros"))]
412mod tests {
413    use super::*;
414    use super::{
415        mem::Mutator,
416        value::{print, Value},
417    };
418
419    #[test]
420    fn sexpr_macro() {
421        let m = Mutator::new_ref();
422
423        let value = sexpr!(r#"()"#);
424        assert_eq!(print(value).to_string(), r"()");
425
426        let value = sexpr!(r#"#true"#);
427        assert_eq!(print(value).to_string(), r"#t");
428
429        let value = sexpr!(r#"#t"#);
430        assert_eq!(print(value).to_string(), r"#t");
431
432        let value = sexpr!(r#"#false"#);
433        assert_eq!(print(value).to_string(), r"#f");
434
435        let value = sexpr!(r#"#f"#);
436        assert_eq!(print(value).to_string(), r"#f");
437
438        let value = sexpr!(r#"#\a"#);
439        assert_eq!(print(value).to_string(), r"#\a");
440
441        let value = sexpr!(r#"#\a"#);
442        assert_eq!(print(value).to_string(), r"#\a");
443
444        let value = sexpr!(r#"100"#);
445        assert_eq!(print(value).to_string(), r"100");
446
447        let value = sexpr!(r#".5"#);
448        assert_eq!(print(value).to_string(), r"0.5");
449
450        let value = sexpr!(r#"+inf.0"#);
451        assert_eq!(print(value).to_string(), r"+inf.0");
452
453        let value = sexpr!(r#"#b101"#);
454        assert_eq!(print(value).to_string(), r"5");
455
456        let value = sexpr!(m, r#""Hello World!""#);
457        assert_eq!(print(value).to_string(), r#""Hello World!""#);
458
459        let value = sexpr!(m, r#"hello"#);
460        assert_eq!(print(value).to_string(), r"hello");
461
462        let value = sexpr!(m, r#"#u8( 1 2 3)"#);
463        assert_eq!(print(value).to_string(), r"#u8(1 2 3)");
464
465        let value = sexpr!(m, r#"'(+ 1 2)"#);
466        assert_eq!(print(value).to_string(), r"(quote (+ 1 2))");
467
468        let value = sexpr!(m, r#",(+ 1 2)"#);
469        assert_eq!(print(value).to_string(), r"(unquote (+ 1 2))");
470
471        let value = sexpr!(m, r#"`(+ 1 2)"#);
472        assert_eq!(print(value).to_string(), r"(quasiquote (+ 1 2))");
473
474        let value = sexpr!(m, r#",@(+ 1 2)"#);
475        assert_eq!(print(value).to_string(), r"(unquote-splicing (+ 1 2))");
476
477        let value = sexpr!(m, r#"#(+ 1 2)"#);
478        assert_eq!(print(value).to_string(), r"#(+ 1 2)");
479    }
480
481    #[test]
482    fn sexpr_file_success() {
483        let values: &[Value] = &sexpr_file!("tests/macros/values.scm");
484
485        assert_eq!(values.len(), 3);
486        assert_eq!(print(values[0].clone()).to_string(), "()");
487        assert_eq!(print(values[1].clone()).to_string(), "100");
488        assert_eq!(print(values[2].clone()).to_string(), "#t");
489
490        let values: &[Value] = &sexpr_file!("tests/macros/empty.scm");
491
492        assert_eq!(values.len(), 0);
493    }
494
495    #[test]
496    fn sexpr_script_success() {
497        let m = Mutator::new_ref();
498        let values: &[Value] = &sexpr_script!(m, "(define a 1)\n(define b 2)\n(+ a b)");
499
500        assert_eq!(values.len(), 3);
501        assert_eq!(print(values[0].clone()).to_string(), "(define a 1)");
502        assert_eq!(print(values[1].clone()).to_string(), "(define b 2)");
503        assert_eq!(print(values[2].clone()).to_string(), "(+ a b)");
504
505        let values: &[Value] = &sexpr_script!("");
506
507        assert_eq!(values.len(), 0);
508    }
509}