Struct Compiler

pub struct Compiler {
    pub chunks: Vec<Chunk>,
    pub labels: HashMap<Label, Option<(usize, usize)>>,
    pub sym_table: SymbolTable,
    /* private fields */
}

A compiler that provides tools to generate bytecode instructions for cas-rs’s virtual machine (see Vm).

Note: If you’re looking to run a CalcScript program, you should use the Vm struct found in cas-vm instead.

This is the main entry point to the compiler. The compiler translates a CalcScript AST (produced by cas_parser) into a series of bytecode Chunks, which can then be executed by cas-rs’s Vm. It is also responsible for most of CalcScript’s semantics: lexical scoping and symbol resolution, value stack layout, and chunk generation. These details are described later in this documentation, but are not important if you’re just looking to run a program.

To compile a complete program, it is recommended that you use Compiler::compile_program, which is the easiest way to ensure the resulting bytecode is valid. However, there are also a number of other methods that can be used to manually compile CalcScript. When taking this approach, there is one important rule to keep in mind that the compiler would otherwise handle for you. Understanding it requires a quick explanation of Vm’s value stack.

During execution, Vm uses a value stack to keep track of the values that the bytecode produces and consumes. The compiler must ensure that the instructions it generates manipulate the value stack correctly.

The most important rule is that the value stack must have exactly one value on it when the program finishes executing. This is the value that is returned by the program (printed when using the cas-rs REPL). When manually compiling a program, you must ensure that each statement’s instructions leave no value on the stack when the statement completes, except for the last statement in a block or chunk.

(Note that failing to uphold this rule will never result in undefined behavior; it will most likely either panic or result in an error during execution.)

Most CalcScript programs consist of a sequence of statements, for example:

x = 3
y = 4
z = hypot(x, y)

In this case, the compiler generates these instructions:

use cas_compiler::Compiler;
use cas_parser::parser::Parser;

let ast = Parser::new("x = 3
y = 4
z = hypot(x, y)").try_parse_full_many().unwrap();

let compiler = Compiler::compile_program(ast).unwrap();

use cas_compiler::{item::Symbol, Instruction, InstructionKind::*};
assert_eq!(compiler.chunks[0].instructions, vec![
    // x = 3
    Instruction { kind: LoadConst(3.into()), spans: vec![] },
    Instruction { kind: StoreVar(0), spans: vec![] },
    Instruction { kind: Drop, spans: vec![] },

    // y = 4
    Instruction { kind: LoadConst(4.into()), spans: vec![] },
    Instruction { kind: StoreVar(1), spans: vec![] },
    Instruction { kind: Drop, spans: vec![] },

    // z = hypot(x, y)
    Instruction { kind: LoadVar(Symbol::User(0)), spans: vec![22..23] },
    Instruction { kind: LoadVar(Symbol::User(1)), spans: vec![25..26] },
    Instruction { kind: LoadVar(Symbol::Builtin("hypot")), spans: vec![16..21] },
    Instruction { kind: Call(2), spans: vec![16..22, 26..27, 22..23, 25..26] },
    Instruction { kind: StoreVar(2), spans: vec![] }
]);

Notice that each statement is terminated by an InstructionKind::Drop instruction, except for the last one. For example, the first statement, x = 3, loads the constant 3 onto the stack. The InstructionKind::StoreVar instruction stores the value into the variable x (0), but does not remove it from the stack. The Drop instruction then removes the value from the stack, leaving the stack empty. (It would be more efficient to use InstructionKind::AssignVar in this case, but the compiler does not implement this behavior yet.)

The final statement, z = hypot(x, y), stores the computed value into the variable z (2), but does not drop the value from the stack, making it the final value on the stack, and thus the return value of the program.

You need to be mindful of this behavior when manually compiling programs.
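
To make the stack rule concrete, here is a minimal, self-contained sketch of a toy stack machine. It is not cas-rs’s Vm: the ToyInstr enum, the run and program functions, and the f64-only values are assumptions made for illustration, and the builtin hypot call is collapsed into a single instruction. It replays the shape of the instruction sequence above and returns the final stack.

```rust
/// A toy instruction set loosely modeled on the listing above (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ToyInstr {
    /// Push a constant onto the value stack.
    LoadConst(f64),
    /// Copy the top of the stack into a variable slot without popping it.
    StoreVar(usize),
    /// Push the value of a variable slot onto the stack.
    LoadVar(usize),
    /// Pop two values and push their hypotenuse (stands in for a builtin call).
    Hypot,
    /// Pop and discard the top of the stack.
    Drop,
}

/// Runs the instructions and returns the final value stack.
/// Panics on stack underflow or when reading an unset variable.
fn run(instrs: &[ToyInstr]) -> Vec<f64> {
    let mut stack: Vec<f64> = Vec::new();
    let mut vars: Vec<Option<f64>> = Vec::new();
    for instr in instrs {
        match *instr {
            ToyInstr::LoadConst(value) => stack.push(value),
            ToyInstr::StoreVar(id) => {
                let top = *stack.last().expect("stack underflow");
                if vars.len() <= id {
                    vars.resize(id + 1, None);
                }
                vars[id] = Some(top);
            }
            ToyInstr::LoadVar(id) => {
                let value = vars
                    .get(id)
                    .copied()
                    .flatten()
                    .expect("uninitialized variable");
                stack.push(value);
            }
            ToyInstr::Hypot => {
                let b = stack.pop().expect("stack underflow");
                let a = stack.pop().expect("stack underflow");
                stack.push(a.hypot(b));
            }
            ToyInstr::Drop => {
                stack.pop().expect("stack underflow");
            }
        }
    }
    stack
}

/// The program `x = 3`, `y = 4`, `z = hypot(x, y)`, with a `Drop` after
/// every statement except the last.
fn program() -> Vec<ToyInstr> {
    use ToyInstr as I;
    vec![
        I::LoadConst(3.0), I::StoreVar(0), I::Drop,
        I::LoadConst(4.0), I::StoreVar(1), I::Drop,
        I::LoadVar(0), I::LoadVar(1), I::Hypot, I::StoreVar(2),
    ]
}
```

Calling run(&program()) leaves a single value on the toy stack, the result of hypot(3, 4). Filtering the Drop instructions out of the same program leaves three values instead, which is the kind of imbalance the rule above guards against.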

Fields

chunks: Vec<Chunk>

The bytecode chunks generated by the compiler.

The entire program is represented as multiple chunks of bytecode, where each chunk represents a function body. The first chunk represents the implicit “main” function.

labels: HashMap<Label, Option<(usize, usize)>>

Labels generated by the compiler, mapped to the index of the instruction they reference.

When created, labels aren’t associated with any instruction. Before the bytecode is executed, the compiler will resolve these labels to the actual instruction indices.
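
The two-phase life of a label (created unbound, bound later) can be sketched with a small standalone table. ToyLabel and ToyLabels are hypothetical names, and reading the tuple as (chunk index, instruction index) is an assumption; the field documentation above only says that labels map to the instruction they reference.

```rust
use std::collections::HashMap;

/// A hypothetical stand-in for the compiler's label type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ToyLabel(usize);

/// A toy label table. `None` means "created but not yet bound"; the tuple is
/// assumed here to be `(chunk index, instruction index)`.
#[derive(Debug, Default)]
struct ToyLabels {
    map: HashMap<ToyLabel, Option<(usize, usize)>>,
    next_id: usize,
}

impl ToyLabels {
    /// Creates a fresh label that does not point anywhere yet.
    fn new_unassociated(&mut self) -> ToyLabel {
        let label = ToyLabel(self.next_id);
        self.next_id += 1;
        self.map.insert(label, None);
        label
    }

    /// Binds a label to a location once that location is known.
    fn bind(&mut self, label: ToyLabel, chunk: usize, instr: usize) {
        self.map.insert(label, Some((chunk, instr)));
    }

    /// Looks up the location of a label, failing if it was never bound.
    fn resolve(&self, label: ToyLabel) -> Result<(usize, usize), String> {
        match self.map.get(&label) {
            Some(Some(location)) => Ok(*location),
            Some(None) => Err(format!("{:?} was never bound", label)),
            None => Err(format!("{:?} is unknown", label)),
        }
    }
}
```

In this sketch, resolving a label before bind is called yields an error rather than a bogus index, which mirrors why every label needs an instruction before the bytecode runs.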

sym_table: SymbolTable

A symbol table that maps identifiers to information about the values they represent.

This is used to store information about variables and functions that are defined in the program.

Implementations

impl Compiler

pub fn new() -> Self

Creates a new compiler.

pub fn compile<T: Compile>(expr: T) -> Result<Self, Error>

Compiles the given value (of any type that implements Compile) into a sequence of Instructions.

pub fn compile_program(stmts: Vec<Stmt>) -> Result<Self, Error>

Compiles multiple statements into a sequence of Instructions.

pub fn with_state<F, G>( &mut self, modify_state: F, compile: G, ) -> Result<(), Error>
where F: FnOnce(&mut CompilerState), G: FnOnce(&mut Self) -> Result<(), Error>,

Creates a new compilation scope with the given modified state. Compilation that occurs in this scope will then use the modified state.
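
One way to picture this pattern is as "save, modify, compile, restore". The sketch below is an assumption about the general shape, not the actual implementation: ToyState, its in_loop flag, ToyCompiler, and compile_break are all hypothetical, and whether the real method restores state in exactly this way is not stated here.

```rust
/// Hypothetical compiler state; the real `CompilerState` fields are not shown here.
#[derive(Debug, Clone, Default, PartialEq)]
struct ToyState {
    /// Whether code is currently being compiled inside a loop body.
    in_loop: bool,
}

#[derive(Debug, Default)]
struct ToyCompiler {
    state: ToyState,
    /// A log of what was compiled, standing in for emitted bytecode.
    log: Vec<String>,
}

impl ToyCompiler {
    /// Runs `compile` with a modified copy of the state, then restores the
    /// previous state whether or not compilation succeeded.
    fn with_state<F, G>(&mut self, modify_state: F, compile: G) -> Result<(), String>
    where
        F: FnOnce(&mut ToyState),
        G: FnOnce(&mut Self) -> Result<(), String>,
    {
        let saved = self.state.clone();
        modify_state(&mut self.state);
        let result = compile(self);
        self.state = saved;
        result
    }

    /// Compiles a `break`, which this toy only accepts inside a loop.
    fn compile_break(&mut self) -> Result<(), String> {
        if self.state.in_loop {
            self.log.push("jump to loop end".to_string());
            Ok(())
        } else {
            Err("`break` outside of a loop".to_string())
        }
    }
}
```

With this toy, compiler.with_state(|s| s.in_loop = true, |c| c.compile_break()) succeeds, while calling compile_break directly before or after it fails, because the modified state is only visible inside the closure.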

pub fn chunk(&self) -> &Chunk

Returns an immutable reference to the current chunk.

pub fn chunk_mut(&mut self) -> &mut Chunk

Returns a mutable reference to the current chunk.

pub fn add_item(&mut self, symbol: &LitSym, item: Item) -> Result<(), Error>

Add an item to the symbol table at the current scope.

If the name of the item to add matches that of a builtin item, one of the following will occur:

  • If this function is called from the global scope, an OverrideBuiltinConstant or OverrideBuiltinFunction error is returned.
  • If this function is called anywhere else, the symbol table is updated with the new item. This item shadows the existing builtin, meaning the builtin will not be accessible until the scope in which this item was declared ends.
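
The two cases above can be sketched with a toy scope stack. Everything here is hypothetical (ToyTable, TOY_BUILTINS, the String error, and the is_shadowed helper); it only illustrates "reject in the global scope, shadow elsewhere" and does not reproduce the real SymbolTable or its error types.

```rust
use std::collections::HashMap;

/// Names treated as builtins by this toy; the real list lives in cas-rs.
const TOY_BUILTINS: &[&str] = &["pi", "e", "hypot"];

/// A toy symbol table: a stack of scopes, with the global scope at index 0.
#[derive(Debug)]
struct ToyTable {
    scopes: Vec<HashMap<String, usize>>,
    next_id: usize,
}

impl ToyTable {
    fn new() -> Self {
        ToyTable { scopes: vec![HashMap::new()], next_id: 0 }
    }

    /// Enters a nested scope, such as a function body.
    fn push_scope(&mut self) {
        self.scopes.push(HashMap::new());
    }

    /// Leaves the innermost nested scope; the global scope is never popped.
    fn pop_scope(&mut self) {
        if self.scopes.len() > 1 {
            self.scopes.pop();
        }
    }

    /// Declares `name` in the current scope and returns its identifier.
    /// Declaring a builtin name is rejected in the global scope only.
    fn add_item(&mut self, name: &str) -> Result<usize, String> {
        let is_global = self.scopes.len() == 1;
        if is_global && TOY_BUILTINS.iter().any(|builtin| *builtin == name) {
            return Err(format!("cannot override builtin `{}`", name));
        }
        let id = self.next_id;
        self.next_id += 1;
        self.scopes
            .last_mut()
            .expect("global scope always exists")
            .insert(name.to_string(), id);
        Ok(id)
    }

    /// Returns true if `name` currently refers to a user item rather than a builtin.
    fn is_shadowed(&self, name: &str) -> bool {
        self.scopes.iter().any(|scope| scope.contains_key(name))
    }
}
```

In this sketch, add_item("pi") fails on a fresh table, succeeds after push_scope, and the shadowing disappears again once pop_scope removes the nested scope.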

pub fn new_scope<F>(&mut self, f: F) -> Result<(), Error>
where F: FnOnce(&mut Compiler) -> Result<(), Error>,

Creates a new scope in the symbol table. Within the provided function, all compiler methods that add or mutate symbols will do so in the new scope.

The scope is popped off the symbol table stack when the function returns. If no symbols were added to the scope, it will not be added to the symbol table.

pub fn new_chunk<F>( &mut self, header: &FuncHeader, f: F, ) -> Result<NewChunk, Error>
where F: FnOnce(&mut Compiler) -> Result<(), Error>,

Creates a new chunk and a scope for compilation. Within the provided function, all compiler methods that add or edit instructions will do so in the new chunk.

Returns the unique identifier for the function and the index of the new chunk, which will be used to add corresponding InstructionKind::LoadConst and InstructionKind::StoreVar instructions to the parent chunk.

pub fn add_symbol(&mut self, symbol: &LitSym) -> Result<usize, Error>

Adds a symbol to the symbol table at the current scope.

This is a shortcut for Compiler::add_item that creates a new Item::Symbol from the given symbol and returns the unique identifier for the symbol.

Manual compilation

Compiler::add_symbol can be used to declare the existence of uninitialized variables. This is useful for creating a symbol and acquiring its unique identifier in order to manipulate it in a virtual machine.

If you do this, you must ensure that the symbol is initialized before it is used. This can be done in the virtual machine. See the cas-vm crate for an example.

pub fn resolve_user_symbol_or_insert( &mut self, symbol: &LitSym, ) -> Result<usize, Error>

Resolves a path to a user-created symbol, inserting it into the symbol table if it doesn’t exist.

If the symbol name matches that of a builtin constant, one of the following will occur:

  • If this function is called from the global scope, an OverrideBuiltinConstant error is returned.
  • If this function is called anywhere else, the symbol table is updated with the new symbol. This symbol shadows the existing builtin constant, meaning the builtin will not be accessible until the scope in which this symbol was declared ends.

Returns the unique identifier for the symbol, which can be used to reference the symbol in the bytecode.

pub fn resolve_symbol(&mut self, symbol: &LitSym) -> Result<Symbol, Error>

Resolves a path to a symbol without inserting it into the symbol table. If the symbol is determined to be captured from a parent scope, the enclosing function will be marked as capturing the symbol.

Returns the unique identifier for the symbol, or an error if the symbol is not found within the current scope.
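
Capture marking can be illustrated with a toy resolver that walks the scope stack from the inside out. ToyScope, its fields, and the free function resolve are hypothetical, and marking every function scope that lies between the use and the declaration is an assumption of this sketch; the description above only says that the enclosing function is marked.

```rust
use std::collections::{HashMap, HashSet};

/// One lexical scope in the toy resolver.
#[derive(Debug, Default)]
struct ToyScope {
    /// Symbols declared directly in this scope, mapped to their identifiers.
    symbols: HashMap<String, usize>,
    /// True if this scope is the body of a function.
    is_function: bool,
    /// Identifiers this function captures from enclosing scopes.
    captures: HashSet<usize>,
}

/// Resolves `name` from the innermost scope outward without inserting it.
/// Every function scope nested inside the declaring scope is marked as
/// capturing the symbol.
fn resolve(scopes: &mut [ToyScope], name: &str) -> Result<usize, String> {
    // Find the declaring scope, searching from the innermost scope outward.
    let found = scopes
        .iter()
        .enumerate()
        .rev()
        .find_map(|(depth, scope)| scope.symbols.get(name).map(|&id| (depth, id)));
    let (depth, id) = found.ok_or_else(|| format!("unknown symbol `{}`", name))?;

    // Function scopes between the declaration and the use capture the symbol.
    for scope in scopes[depth + 1..].iter_mut().filter(|s| s.is_function) {
        scope.captures.insert(id);
    }
    Ok(id)
}
```

With a global scope declaring x and a nested function scope declaring y, resolving x from inside the function records x in that function’s captures, resolving y records nothing, and resolving an undeclared name returns an error without touching any scope.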

pub fn add_instr(&mut self, instruction: impl Into<Instruction>)

Adds an instruction to the current chunk with no associated source code span.

pub fn add_instr_with_spans( &mut self, instruction: impl Into<Instruction>, spans: Vec<Range<usize>>, )

Adds an instruction to the current chunk with one or more associated source code spans.

pub fn replace_instr(&mut self, idx: usize, instruction: Instruction)

Replaces an instruction at the given index in the current chunk with a new instruction.

pub fn new_unassociated_label(&mut self) -> Label

Creates a unique label with no associated instruction. This label can be used to reference a specific instruction in the bytecode.

pub fn new_end_label(&mut self) -> Label

Creates a unique label pointing to the end of the currently generated bytecode in the current chunk.

If Compile::compile is called immediately after this method, the label will point to the first instruction generated by that compilation.

pub fn set_end_label(&mut self, label: Label)

Associates the given label with the end of the currently generated bytecode.

This is useful for creating labels that point to the end of a loop, for example.
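
As a rough illustration of how the three label methods might cooperate when compiling a loop, here is a standalone sketch. ToyChunk, ToyInstr, the Jump and JumpIfFalse instructions, and emit_loop are hypothetical; the method names are borrowed from the signatures above, but their bodies are assumptions and labels are plain usize ids.

```rust
use std::collections::HashMap;

/// Toy instructions; jump operands are label ids until resolved.
#[derive(Debug, Clone, PartialEq)]
enum ToyInstr {
    Op(&'static str),
    Jump(usize),
    JumpIfFalse(usize),
}

#[derive(Debug, Default)]
struct ToyChunk {
    instrs: Vec<ToyInstr>,
    /// Label id -> instruction index, or `None` while unbound.
    labels: HashMap<usize, Option<usize>>,
}

impl ToyChunk {
    /// Creates a label with no associated instruction.
    fn new_unassociated_label(&mut self) -> usize {
        let id = self.labels.len();
        self.labels.insert(id, None);
        id
    }

    /// Creates a label bound to the index the next emitted instruction will get.
    fn new_end_label(&mut self) -> usize {
        let id = self.new_unassociated_label();
        self.set_end_label(id);
        id
    }

    /// Binds an existing label to the current end of the chunk.
    fn set_end_label(&mut self, label: usize) {
        let end = self.instrs.len();
        self.labels.insert(label, Some(end));
    }
}

/// Emits `while cond { body }` and returns the (start, end) labels.
fn emit_loop(chunk: &mut ToyChunk) -> (usize, usize) {
    let start = chunk.new_end_label(); // points at the condition
    let end = chunk.new_unassociated_label(); // bound after the body
    chunk.instrs.push(ToyInstr::Op("cond"));
    chunk.instrs.push(ToyInstr::JumpIfFalse(end));
    chunk.instrs.push(ToyInstr::Op("body"));
    chunk.instrs.push(ToyInstr::Jump(start));
    chunk.set_end_label(end); // one past the last loop instruction
    (start, end)
}
```

The forward jump out of the loop is emitted before its target exists, which is why the end label starts out unassociated and is only bound once the body has been emitted.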

Trait Implementations

impl Clone for Compiler

fn clone(&self) -> Compiler

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Compiler

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for Compiler

fn default() -> Self

Returns the “default value” for a type.
