pub struct Compiler {
    pub chunks: Vec<Chunk>,
    pub labels: HashMap<Label, Option<(usize, usize)>>,
    pub sym_table: SymbolTable,
    /* private fields */
}
A compiler that provides tools to generate bytecode instructions for cas-rs’s virtual machine (see Vm).
Note: If you’re looking to run a CalcScript program, you should use the Vm struct found in cas-vm instead.
This is the main entry point to the compiler. The compiler translates a CalcScript AST (produced by cas_parser) into a series of bytecode Chunks, which can then be executed by cas-rs’s Vm. It is also largely responsible for managing CalcScript’s semantics through lexical scoping and symbol resolution, value stack layout, and chunk generation. These details are described later in this documentation, but they are not important if you’re just looking to run a program.
To compile a complete program, it is recommended that you use Compiler::compile_program, which is the easiest way to ensure the resulting bytecode is valid. However, there are also a number of other methods that can be used to compile CalcScript manually. There is one important rule to keep in mind when taking this approach (one that the compiler would otherwise handle for you), and it requires a quick explanation of Vm’s value stack.
During execution, Vm uses a value stack to keep track of the values produced and consumed by the bytecode. The compiler must ensure that the instructions it generates manipulate the value stack correctly.
The most important rule is that the value stack must have exactly one value on it when the program finishes executing. This is the value returned by the program (and printed when using the cas-rs REPL). When manually compiling a program, you must ensure that each statement’s instructions leave no value on the stack when the statement completes, except for the last statement in a block or chunk.
(Note that failing to uphold this rule will never result in undefined behavior; it will most likely either cause a panic or result in an error during execution.)
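A minimal sketch can make this rule concrete. The `Op` enum and `run` function below are hypothetical, invented for illustration rather than taken from cas-vm: `StoreVar` copies the top of the stack into a variable slot without popping it, `Drop` pops it, and `run` rejects any program that does not finish with exactly one value on the stack.

```rust
/// Hypothetical, simplified instructions used only to illustrate the
/// value-stack rule; these are not cas-compiler's real `InstructionKind`s.
#[derive(Debug, Clone, PartialEq)]
pub enum Op {
    /// Push a constant onto the value stack.
    LoadConst(f64),
    /// Push the value of variable slot `n` onto the stack.
    LoadVar(usize),
    /// Copy the top of the stack into variable slot `n` without popping it.
    StoreVar(usize),
    /// Pop and discard the top of the stack.
    Drop,
}

/// Executes `ops` and returns the program's result. Fails if an instruction
/// underflows the stack, reads an unset variable, or if anything other than
/// exactly one value is left on the stack at the end. An out-of-range
/// variable slot panics.
pub fn run(ops: &[Op], num_vars: usize) -> Result<f64, String> {
    let mut stack: Vec<f64> = Vec::new();
    let mut vars: Vec<Option<f64>> = vec![None; num_vars];
    for op in ops {
        match op {
            Op::LoadConst(c) => stack.push(*c),
            Op::LoadVar(n) => {
                let value = vars[*n].ok_or_else(|| format!("variable {n} is uninitialized"))?;
                stack.push(value);
            }
            Op::StoreVar(n) => {
                // Copy, do not pop: the value stays on the stack.
                let top = *stack.last().ok_or("StoreVar on an empty stack")?;
                vars[*n] = Some(top);
            }
            Op::Drop => {
                stack.pop().ok_or("Drop on an empty stack")?;
            }
        }
    }
    // The program's return value is the single value left on the stack.
    match stack.as_slice() {
        [result] => Ok(*result),
        rest => Err(format!("expected 1 value on the stack, found {}", rest.len())),
    }
}
```

With this model, `[LoadConst(3.0), StoreVar(0)]` returns `3.0`, while appending a `Drop` leaves the stack empty and makes `run` fail, which is exactly the mistake the rule guards against.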
Most CalcScript programs consist of a sequence of statements, for example:
x = 3
y = 4
z = hypot(x, y)
In this case, the compiler generates these instructions:
use cas_compiler::Compiler;
use cas_parser::parser::Parser;
let ast = Parser::new("x = 3
y = 4
z = hypot(x, y)").try_parse_full_many().unwrap();
let compiler = Compiler::compile_program(ast).unwrap();
use cas_compiler::{item::Symbol, Instruction, InstructionKind::*};
assert_eq!(compiler.chunks[0].instructions, vec![
// x = 3
Instruction { kind: LoadConst(3.into()), spans: vec![] },
Instruction { kind: StoreVar(0), spans: vec![] },
Instruction { kind: Drop, spans: vec![] },
// y = 4
Instruction { kind: LoadConst(4.into()), spans: vec![] },
Instruction { kind: StoreVar(1), spans: vec![] },
Instruction { kind: Drop, spans: vec![] },
// z = hypot(x, y)
Instruction { kind: LoadVar(Symbol::User(0)), spans: vec![22..23] },
Instruction { kind: LoadVar(Symbol::User(1)), spans: vec![25..26] },
Instruction { kind: LoadVar(Symbol::Builtin("hypot")), spans: vec![16..21] },
Instruction { kind: Call(2), spans: vec![16..22, 26..27, 22..23, 25..26] },
Instruction { kind: StoreVar(2), spans: vec![] }
]);
Notice that each statement is terminated by an InstructionKind::Drop instruction, except for the last one. For example, the first statement, x = 3, loads the constant 3 onto the stack. The InstructionKind::StoreVar instruction stores the value into the variable x (0), but does not remove it from the stack. The Drop instruction then removes the value from the stack, leaving the stack empty. (It would be more efficient to use InstructionKind::AssignVar in this case, but the compiler does not implement this behavior yet.)
The final statement, z = hypot(x, y), stores the computed value into the variable z (2), but does not drop the value from the stack, making it the final value on the stack and thus the return value of the program.
You need to be mindful of this behavior when manually compiling programs.
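A compiler that emits statements one at a time can follow this rule mechanically. The sketch below uses a hypothetical `Op` enum and `compile_block` helper (neither is part of cas-compiler) and appends a `Drop` after every statement except the last, reproducing the pattern in the instruction listing above.

```rust
/// Hypothetical simplified instructions (not cas-compiler's real types).
#[derive(Debug, Clone, PartialEq)]
pub enum Op {
    LoadConst(f64),
    StoreVar(usize),
    Drop,
}

/// Concatenates each statement's instructions, inserting a `Drop` after every
/// statement except the last, so only the last statement's value remains.
///
/// Assumes every statement pushes exactly one value. An empty block leaves
/// nothing on the stack; a real compiler would need to push some unit value
/// there, which this sketch does not model.
pub fn compile_block(stmts: Vec<Vec<Op>>) -> Vec<Op> {
    let last = stmts.len().saturating_sub(1);
    let mut out = Vec::new();
    for (i, stmt) in stmts.into_iter().enumerate() {
        out.extend(stmt);
        if i != last {
            out.push(Op::Drop);
        }
    }
    out
}
```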
Fields
chunks: Vec<Chunk>
The bytecode chunks generated by the compiler.
The entire program is represented as multiple chunks of bytecode, where each chunk represents a function body. The first chunk represents the implicit “main” function.
labels: HashMap<Label, Option<(usize, usize)>>
Labels generated by the compiler, mapped to the index of the instruction they reference.
When created, labels aren’t associated with any instruction. Before the bytecode is executed, the compiler will resolve these labels to the actual instruction indices.
sym_table: SymbolTable
A symbol table that maps identifiers to information about the values they represent.
This is used to store information about variables and functions that are defined in the program.
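As a rough illustration of label resolution, the following sketch rewrites label-based jumps into index-based ones. It assumes that the `(usize, usize)` pair stored in `labels` is a (chunk index, instruction index) pair, which the field documentation does not state explicitly; `Label`, `Op`, and `resolve_labels` are hypothetical, not cas-compiler's own definitions.

```rust
use std::collections::HashMap;

/// Hypothetical label handle (the real `Label` type may differ).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Label(pub usize);

/// Hypothetical simplified instructions; jumps name a label until resolved.
#[derive(Debug, Clone, PartialEq)]
pub enum Op {
    LoadConst(f64),
    Drop,
    /// Jump to a label (unresolved form).
    Jump(Label),
    /// Jump to an instruction index in the same chunk (resolved form).
    JumpTo(usize),
}

/// Rewrites every `Jump(label)` into `JumpTo(index)`, assuming each label maps
/// to `Some((chunk index, instruction index))` and only targets its own chunk.
pub fn resolve_labels(
    chunks: &mut [Vec<Op>],
    labels: &HashMap<Label, Option<(usize, usize)>>,
) -> Result<(), String> {
    for (chunk_idx, chunk) in chunks.iter_mut().enumerate() {
        for op in chunk.iter_mut() {
            if let Op::Jump(label) = *op {
                match labels.get(&label) {
                    Some(Some((c, i))) if *c == chunk_idx => *op = Op::JumpTo(*i),
                    Some(Some(_)) => return Err(format!("{label:?} targets another chunk")),
                    Some(None) => return Err(format!("{label:?} was never associated")),
                    None => return Err(format!("unknown label {label:?}")),
                }
            }
        }
    }
    Ok(())
}
```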
Implementations
impl Compiler
pub fn compile<T: Compile>(expr: T) -> Result<Self, Error>
Compiles the given type into a sequence of Instructions.
pub fn compile_program(stmts: Vec<Stmt>) -> Result<Self, Error>
Compiles multiple statements into a sequence of Instructions.
pub fn with_state<F, G>(
    &mut self,
    modify_state: F,
    compile: G,
) -> Result<(), Error>
Creates a new compilation scope with the given modified state. Compilation that occurs in this scope will then use the modified state.
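The save, modify, and restore pattern that this method describes can be sketched as follows. `State`, its `inside_loop` field, and this `Compiler` are hypothetical simplifications; the real method's trait bounds and error type are not shown on this page, so the sketch uses a generic error type `E`.

```rust
/// Hypothetical compilation state; the field is illustrative only.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct State {
    /// Whether the code being compiled is inside a loop body.
    pub inside_loop: bool,
}

#[derive(Debug, Default)]
pub struct Compiler {
    pub state: State,
    /// Records what the compile closure observed, for demonstration.
    pub log: Vec<String>,
}

impl Compiler {
    /// Runs `compile` with a modified copy of the state, then restores the
    /// previous state before returning, even if `compile` returned an error.
    pub fn with_state<F, G, E>(&mut self, modify_state: F, compile: G) -> Result<(), E>
    where
        F: FnOnce(&mut State),
        G: FnOnce(&mut Compiler) -> Result<(), E>,
    {
        let saved = self.state.clone();
        modify_state(&mut self.state);
        let result = compile(self);
        self.state = saved;
        result
    }
}
```

Restoring the saved state unconditionally keeps a modification such as "we are inside a loop" from leaking into code compiled after the scope ends.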
pub fn add_item(&mut self, symbol: &LitSym, item: Item) -> Result<(), Error>
Adds an item to the symbol table at the current scope.
If the name of the item matches that of a builtin item, one of the following will occur:
- If this function is called from the global scope, an OverrideBuiltinConstant or OverrideBuiltinFunction error is returned.
- If this function is called anywhere else, the symbol table will successfully be updated with the new item. This item shadows the existing builtin, meaning the builtin will not be accessible until the scope in which this item was declared ends.
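The shadowing rule above can be sketched with a symbol table made of a stack of hash maps, with the global scope at the bottom. `SymbolTable`, `BUILTINS`, and the string error are hypothetical; per the description above, the real method returns OverrideBuiltinConstant or OverrideBuiltinFunction instead.

```rust
use std::collections::HashMap;

/// Hypothetical builtin names for this sketch.
const BUILTINS: &[&str] = &["pi", "e", "hypot"];

/// A stack of scopes; `scopes[0]` is the global scope.
#[derive(Debug)]
pub struct SymbolTable {
    scopes: Vec<HashMap<String, usize>>,
    next_id: usize,
}

impl SymbolTable {
    pub fn new() -> Self {
        Self { scopes: vec![HashMap::new()], next_id: 0 }
    }

    pub fn push_scope(&mut self) {
        self.scopes.push(HashMap::new());
    }

    /// Pops the innermost scope; the global scope is never popped.
    pub fn pop_scope(&mut self) {
        if self.scopes.len() > 1 {
            self.scopes.pop();
        }
    }

    pub fn add_item(&mut self, name: &str) -> Result<usize, String> {
        // Overriding a builtin is only an error at global scope.
        if self.scopes.len() == 1 && BUILTINS.contains(&name) {
            return Err(format!("cannot override builtin `{name}` at global scope"));
        }
        let id = self.next_id;
        self.next_id += 1;
        // Inserting into the innermost scope shadows any builtin of the same
        // name until that scope is popped.
        self.scopes
            .last_mut()
            .expect("the global scope always exists")
            .insert(name.to_string(), id);
        Ok(id)
    }

    /// Looks `name` up from the innermost scope outward. `None` means the name
    /// refers to a builtin (if it is one) or is undefined.
    pub fn resolve(&self, name: &str) -> Option<usize> {
        self.scopes.iter().rev().find_map(|scope| scope.get(name).copied())
    }
}
```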
pub fn new_scope<F>(&mut self, f: F) -> Result<(), Error>
Creates a new scope in the symbol table. Within the provided function, all compiler
methods that add or mutate symbols will do so in the new scope.
The scope is popped off the symbol table stack when the function returns. If no symbols were added to the scope, it will not be added to the symbol table.
pub fn new_chunk<F>(
    &mut self,
    header: &FuncHeader,
    f: F,
) -> Result<NewChunk, Error>
Creates a new chunk and a scope for compilation. Within the provided function, all compiler methods that add or edit instructions will do so in the new chunk.
Returns the unique identifier for the function and the index of the new chunk, which will be used to add corresponding InstructionKind::LoadConst and InstructionKind::StoreVar instructions to the parent chunk.
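The chunk-switching behavior can be sketched with a compiler that tracks which chunk `add_instr` appends to. This is a hypothetical simplification: it returns only the new chunk's index rather than a NewChunk value, and it ignores the function header and the accompanying scope.

```rust
/// Hypothetical simplified instructions.
#[derive(Debug, Clone, PartialEq)]
pub enum Op {
    LoadConst(f64),
    Return,
}

#[derive(Debug)]
pub struct Compiler {
    pub chunks: Vec<Vec<Op>>,
    /// Index of the chunk that `add_instr` currently appends to.
    current: usize,
}

impl Compiler {
    pub fn new() -> Self {
        // Chunk 0 plays the role of the implicit "main" function.
        Self { chunks: vec![Vec::new()], current: 0 }
    }

    pub fn add_instr(&mut self, op: Op) {
        self.chunks[self.current].push(op);
    }

    /// Appends a fresh chunk, redirects `add_instr` to it while `f` runs,
    /// then switches back to the parent chunk. Returns the new chunk's index.
    pub fn new_chunk<F>(&mut self, f: F) -> Result<usize, String>
    where
        F: FnOnce(&mut Compiler) -> Result<(), String>,
    {
        let parent = self.current;
        self.chunks.push(Vec::new());
        let idx = self.chunks.len() - 1;
        self.current = idx;
        let result = f(self);
        self.current = parent;
        result.map(|()| idx)
    }
}
```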
pub fn add_symbol(&mut self, symbol: &LitSym) -> Result<usize, Error>
Adds a symbol to the symbol table at the current scope.
This is a shortcut for Compiler::add_item that creates a new Item::Symbol from the given symbol and returns the unique identifier for the symbol.
Manual compilation
Compiler::add_symbol can be used to declare the existence of uninitialized variables. This is useful for creating a symbol and acquiring its unique identifier in order to manipulate it in a virtual machine.
If you do this, you must ensure that the symbol is initialized before it is used. This can be done in the virtual machine. See the cas-vm crate for an example.
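The following sketch shows the idea with hypothetical types that are not cas-compiler's or cas-vm's API: the compiler hands out an id for an uninitialized variable, and the host initializes that id in the VM before running the bytecode. Running before initialization fails with an error.

```rust
use std::collections::HashMap;

/// Hypothetical simplified instructions (not cas-compiler's real ones).
#[derive(Debug, Clone, PartialEq)]
pub enum Op {
    LoadConst(f64),
    LoadVar(usize),
    Add,
}

/// Hypothetical compiler fragment that only hands out symbol ids.
#[derive(Debug, Default)]
pub struct Compiler {
    pub symbols: HashMap<String, usize>,
    next_id: usize,
}

impl Compiler {
    /// Declares `name` without initializing it and returns its unique id.
    pub fn add_symbol(&mut self, name: &str) -> usize {
        let id = self.next_id;
        self.next_id += 1;
        self.symbols.insert(name.to_string(), id);
        id
    }
}

/// Hypothetical VM whose variables start out uninitialized; the host must set
/// them before the bytecode reads them.
#[derive(Debug, Default)]
pub struct Vm {
    pub vars: HashMap<usize, f64>,
}

impl Vm {
    /// Runs `ops` and returns the value left on top of the stack.
    pub fn run(&self, ops: &[Op]) -> Result<f64, String> {
        let mut stack: Vec<f64> = Vec::new();
        for op in ops {
            match op {
                Op::LoadConst(c) => stack.push(*c),
                Op::LoadVar(id) => {
                    let value = self
                        .vars
                        .get(id)
                        .ok_or_else(|| format!("variable {id} used before initialization"))?;
                    stack.push(*value);
                }
                Op::Add => {
                    let rhs = stack.pop().ok_or("stack underflow")?;
                    let lhs = stack.pop().ok_or("stack underflow")?;
                    stack.push(lhs + rhs);
                }
            }
        }
        stack.pop().ok_or_else(|| "empty stack".to_string())
    }
}
```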
pub fn resolve_user_symbol_or_insert(
    &mut self,
    symbol: &LitSym,
) -> Result<usize, Error>
Resolves a path to a user-created symbol, inserting it into the symbol table if it doesn’t exist.
If the symbol name matches that of a builtin constant, one of the following will occur:
- If this function is called from the global scope, an OverrideBuiltinConstant error is returned.
- If this function is called anywhere else, the symbol table will successfully be updated with the new symbol. This symbol shadows the existing builtin constant, meaning the builtin will not be accessible until the scope in which this symbol was declared ends.
Returns the unique identifier for the symbol, which can be used to reference the symbol in the bytecode.
pub fn resolve_symbol(&mut self, symbol: &LitSym) -> Result<Symbol, Error>
Resolves a path to a symbol without inserting it into the symbol table. If the symbol is determined to be captured from a parent scope, the enclosing function will be marked as capturing the symbol.
Returns the unique identifier for the symbol, or an error if the symbol is not found within the current scope.
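A hypothetical sketch of capture detection: each function being compiled gets its own frame, and a name found in an enclosing frame rather than the current one is recorded as a capture of the current function. The real compiler's bookkeeping is likely more involved (nested scopes within a function, captures through several levels of nesting), which this sketch does not model; `FuncFrame` and `Resolver` are invented names.

```rust
use std::collections::{HashMap, HashSet};

/// One function being compiled: its local names and the outer ids it captures.
#[derive(Debug, Default)]
pub struct FuncFrame {
    pub locals: HashMap<String, usize>,
    pub captures: HashSet<usize>,
}

#[derive(Debug)]
pub struct Resolver {
    /// `frames[0]` is the implicit main function; the last frame is current.
    /// Assumed to be non-empty.
    pub frames: Vec<FuncFrame>,
}

impl Resolver {
    /// Resolves `name` without inserting it. If it is found in an enclosing
    /// function rather than the current one, the current function records it
    /// as a capture.
    pub fn resolve_symbol(&mut self, name: &str) -> Result<usize, String> {
        let current = self.frames.len() - 1;
        for depth in (0..=current).rev() {
            let found = self.frames[depth].locals.get(name).copied();
            if let Some(id) = found {
                if depth != current {
                    self.frames[current].captures.insert(id);
                }
                return Ok(id);
            }
        }
        Err(format!("`{name}` is not defined in this scope"))
    }
}
```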
pub fn add_instr(&mut self, instruction: impl Into<Instruction>)
Adds an instruction to the current chunk with no associated source code span.
pub fn add_instr_with_spans(
    &mut self,
    instruction: impl Into<Instruction>,
    spans: Vec<Range<usize>>,
)
Adds an instruction to the current chunk with one or more associated source code spans.
pub fn replace_instr(&mut self, idx: usize, instruction: Instruction)
Replaces an instruction at the given index in the current chunk with a new instruction.
pub fn new_unassociated_label(&mut self) -> Label
Creates a unique label with no associated instruction. This label can be used to reference a specific instruction in the bytecode.
pub fn new_end_label(&mut self) -> Label
Creates a unique label pointing to the end of the currently generated bytecode in the current chunk.
When this method is called and Compile::compile is called immediately after, the label will point to the first instruction generated by the compilation.
pub fn set_end_label(&mut self, label: Label)
Associates the given label with the end of the currently generated bytecode.
This is useful for creating labels that point to the end of a loop, for example.
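These label methods combine naturally when compiling a loop. The sketch below uses hypothetical types and a made-up instruction layout, not necessarily the one cas-compiler emits for CalcScript loops: `new_end_label` marks the loop's first instruction before anything is emitted, a forward-declared label from `new_unassociated_label` is used by the exit jump, and `set_end_label` later points that label just past the loop.

```rust
use std::collections::HashMap;

/// Hypothetical label handle.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct Label(pub usize);

/// Hypothetical simplified instructions.
#[derive(Debug, Clone, PartialEq)]
pub enum Op {
    LoadVar(usize),
    LoadConst(f64),
    Drop,
    Jump(Label),
    JumpIfFalse(Label),
}

#[derive(Debug, Default)]
pub struct Compiler {
    pub instrs: Vec<Op>,
    /// Label -> instruction index; `None` until the label is associated.
    pub labels: HashMap<Label, Option<usize>>,
}

impl Compiler {
    pub fn new_unassociated_label(&mut self) -> Label {
        let label = Label(self.labels.len());
        self.labels.insert(label, None);
        label
    }

    /// Label for the index the next emitted instruction will occupy.
    pub fn new_end_label(&mut self) -> Label {
        let label = self.new_unassociated_label();
        self.set_end_label(label);
        label
    }

    pub fn set_end_label(&mut self, label: Label) {
        self.labels.insert(label, Some(self.instrs.len()));
    }

    /// Emits `while <cond_var> { <body_const> }`. The value that the loop
    /// expression itself produces is ignored in this sketch.
    pub fn compile_while(&mut self, cond_var: usize, body_const: f64) {
        let start = self.new_end_label(); // points at the condition
        let end = self.new_unassociated_label(); // target not known yet
        self.instrs.push(Op::LoadVar(cond_var));
        self.instrs.push(Op::JumpIfFalse(end));
        self.instrs.push(Op::LoadConst(body_const));
        self.instrs.push(Op::Drop); // discard the body's value each iteration
        self.instrs.push(Op::Jump(start));
        self.set_end_label(end); // first index after the loop
    }
}
```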