# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
QuickPython is a lightweight Python bytecode VM written in Rust. It compiles Python source to custom bytecode and executes it on a stack-based virtual machine. The project prioritizes being small, fast to start, and easy to extend over full CPython compatibility.
**Current version:** 0.1.2
**Test coverage:** 254 tests passing
## Development Commands
### Building
```bash
cargo build # Debug build
cargo build --release # Release build
cargo build --workspace # Build all workspace members (includes quickpython-llm, quickpython-demo)
```
### Testing
```bash
cargo test # Run all tests
cargo test --lib # Run only library tests (254 tests)
cargo test test_name # Run specific test by name
cargo test async # Run all async-related tests
cargo test --workspace # Run tests for all workspace members
```
### Code Quality
```bash
cargo fmt # Format code (REQUIRED before commit)
cargo clippy --workspace -- -D warnings # Lint code (REQUIRED before commit)
```
**IMPORTANT:** All commits must pass both `cargo fmt` and `cargo clippy --workspace -- -D warnings` with no errors. Run these commands before committing any code changes.
### Running
```bash
cargo run -- run script.py # Execute Python file
cargo run -- compile script.py # Compile to bytecode (.pyq)
cargo run -- compile script.py -o out.pyq
cargo run --release -- run script.py # Run with optimizations
```
### Examples
```bash
cargo run -- run examples/async_sleep.py
cargo run -- run examples/11_comprehensive.py
```
## Architecture
### Core Pipeline
```
Python Source → rustpython_parser → AST → Compiler → ByteCode → VM → Result
```
The project uses RustPython's parser for the frontend and implements a custom bytecode compiler and stack-based VM.
### Key Components
**src/compiler.rs** - AST → ByteCode compiler
- Uses `rustpython_parser` to parse Python source into AST
- Compiles AST to custom bytecode instructions
- Handles function definitions, control flow, comprehensions, async/await
- Maintains local variable tracking and loop stack for break/continue
**src/vm.rs** - Stack-based virtual machine executor (2500+ lines)
- Main execution loop with giant match on `Instruction` enum
- Stack-based architecture: operands pushed/popped from `Vec<Value>`
- Frame-based function calls with separate `locals` per frame
- Exception handling with try/except/finally blocks
- Async/await support integrated with Tokio runtime
- **Critical**: Await instruction uses `Runtime::block_on` to maintain sync API while supporting async operations
**src/value.rs** - Runtime value representation
- `enum Value` represents all Python values (Int, Float, Bool, String, List, Dict, Tuple, etc.)
- Reference-counted heap objects (List, Dict, etc.) using `Rc<RefCell<...>>`
- Special types: Coroutine (async functions), AsyncSleep (for asyncio.sleep)
- Type objects for isinstance() checks
**src/bytecode.rs** - Bytecode instruction set
- Simple fixed-size instruction enum (not yet optimized with variable-length encoding)
- Instructions include: arithmetic ops, comparisons, control flow, function calls, await, etc.
- `MakeFunction` includes `is_async: bool` flag for async functions
**src/context.rs** - Public API facade
- `Context` provides high-level `eval()` method for running Python code
- Manages VM instance and global variables HashMap
- Extension module registration via `register_extension_module()`
**src/serializer.rs** - Bytecode serialization (.pyq files)
- Serializes bytecode to binary format for faster loading
- Not all instructions are serializable yet (returns errors for unsupported ones)
**src/builtins/** - Built-in modules (json, os, re, asyncio)
- Each module exports a `create_module()` function returning a `Module`
- Functions are `NativeFunction` types (Rust function pointers)
- `asyncio.rs`: asyncio.sleep() returns `Value::AsyncSleep` that VM's Await instruction recognizes
### Value System Design
Current implementation uses `enum Value` with `Rc<RefCell<...>>` for heap objects (not yet NaN boxing as described in spec/architecture.md). The spec describes future NaN boxing optimization, but current code is simpler:
```rust
pub enum Value {
Int(i32),
Float(f64),
Bool(bool),
None,
String(String),
List(Rc<RefCell<ListValue>>),
Dict(Rc<RefCell<HashMap<DictKey, Value>>>),
Tuple(Rc<Vec<Value>>),
Function(Function),
Coroutine(Function, Vec<Value>), // async function + captured args
AsyncSleep(f64), // Special marker for asyncio.sleep
// ... more types
}
```
### Async/Await Implementation
**Two-phase approach:**
1. Async functions return `Coroutine` objects when called (not immediately executed)
2. `await` expression executes the coroutine
**Key insight:** VM maintains sync API by using Tokio's `Runtime::block_on` in Await instruction:
```rust
Instruction::Await => {
match coroutine {
Value::Coroutine(func, args) => {
// Create new frame and execute synchronously
}
Value::AsyncSleep(seconds) => {
// Create Tokio runtime and block_on async sleep
let rt = Runtime::new()?;
rt.block_on(async {
tokio::time::sleep(Duration::from_secs_f64(seconds)).await;
});
}
}
}
```
This allows:
- ✅ True async I/O (real sleep using Tokio)
- ✅ Maintains synchronous `Context::eval()` API
- ✅ No breaking changes to existing code
- ⚠️ Creates new runtime per await (could be optimized with shared runtime)
### Module System
Three types of modules:
1. **Builtin modules** (json, os, re, asyncio) - compiled into binary
- Registered in `src/builtins/mod.rs`
- VM checks `is_builtin_module()` during import
2. **Extension modules** - registered at runtime via `Context::register_extension_module()`
- See `quickpython-llm/` and `quickpython-demo/` for examples
- Extension creates Module with functions/attributes
3. **Python modules** - not yet implemented
## Testing Strategy
Tests are in `src/main.rs` under `#[cfg(test)] mod tests`.
Each task (Tasks 001-039) has associated tests. When adding features:
1. Add tests in the `tests` module
2. Test both success cases and error cases
3. For async features, include timing verification (use `std::time::Instant`)
4. Update test count in README and CHANGELOG when adding tests
## Common Patterns
### Adding a new builtin function
1. Add to appropriate module in `src/builtins/*.rs`:
```rust
fn my_function(args: Vec<Value>) -> Result<Value, Value> {
if args.len() != 1 {
return Err(Value::error(ExceptionType::TypeError, "..."));
}
Ok(Value::Int(42))
}
```
2. Register in `create_module()`:
```rust
module.attributes.insert("my_func".to_string(), Value::NativeFunction(my_function));
```
### Adding a new bytecode instruction
1. Add variant to `Instruction` enum in `src/bytecode.rs`
2. Add compiler support in `src/compiler.rs` (in appropriate `compile_*` method)
3. Add VM execution in `src/vm.rs` (in main match statement around line 700-2000)
4. Add serializer support in `src/serializer.rs` (or return error if not serializable)
### Desugaring complex syntax
List comprehensions and f-strings are **desugared** to simpler instructions during compilation rather than having dedicated bytecode. See `compile_expr()` for examples:
- List comprehensions → for loop + append
- F-strings → concatenation of str() calls
This keeps bytecode simpler at cost of larger bytecode size.
## Workspace Structure
- **quickpython/** - Main library and CLI
- **quickpython-llm/** - Extension module example (wraps OpenAI-compatible APIs)
- **quickpython-demo/** - Demo application using quickpython-llm extension
Extension modules are separate crates that create `Module` objects and register them with Context.
## Important Notes
- **Value equality:** Lists and tuples use content equality (not pointer equality)
- **String iteration:** Implemented for for-loops and list comprehensions (Task 039)
- **Type objects:** `list`, `dict`, `int`, etc. are reserved keywords (can't use as variable names)
- **Iterator safety:** VM detects list modification during iteration (raises RuntimeError)
- **Async execution:** Sequential only (no concurrent execution of multiple coroutines yet)
## Task System
Tasks are documented in `tasks/*.md`. Each task has:
- Implementation description
- Test cases
- Status (completed/pending)
When implementing new tasks:
1. Create task file in `tasks/NNN-feature-name.md`
2. Implement feature across relevant files
3. Add comprehensive tests
4. Update CHANGELOG.md
5. Mark task as completed
## Spec Documents
`spec/` contains design documents (some aspirational):
- `spec/architecture.md` - Describes NaN boxing and other future optimizations (not all implemented)
- `spec/import-system.md` - Module import mechanism
- `spec/exception-system.md` - Exception handling design
**Note:** Spec files describe ideal architecture; actual implementation is simpler in v0.1.1.
## Programming Principles
This project follows the engineering philosophy of Linus Torvalds and John Carmack:
### Code Quality Over Cleverness
- **Simple, obvious code beats clever code** - If it's hard to understand, it's wrong
- **Readability first** - Code is read far more often than written
- **No premature optimization** - Make it work, make it right, then make it fast
- **Avoid abstraction layers** - Every layer has a cost; only add them when clearly beneficial
### Performance Through Simplicity
- **Measure, don't guess** - Profile before optimizing
- **Data structure choice matters more than algorithm tricks** - Cache-friendly data structures win
- **Keep hot paths simple** - The VM main loop should be straightforward, not clever
### Engineering Discipline
- **Fix the root cause, not symptoms** - Understand the problem before coding the solution
- **Delete code aggressively** - The best code is no code; remove unused features/abstractions
- **Small, focused changes** - Large refactors are where bugs hide
- **When in doubt, keep it simple** - Complexity is the enemy of reliability
### Specific to This Project
- **Desugaring over new bytecode** - F-strings and list comprehensions desugar to simple instructions
- **Sync API with async internals** - Use `block_on` rather than forcing callers to be async
- **Reference counting over complex GC** - Simple, predictable, deterministic
- **Content equality over pointer equality** - Lists/tuples compare by value, not identity
When adding features, ask:
1. Can this be implemented by desugaring to existing instructions?
2. Does this add a new abstraction layer, or simplify an existing one?
3. Will this still make sense when reading the code in 6 months?
4. Is there a simpler way that's "good enough"?