typed-ski 0.8.0

SKI calculus evaluator in Rust compiled to WASM

typed-ski Rust (WASM arena)

This directory contains the Rust implementation of the arena allocator and reducer used by typed-ski's WASM evaluators.

Overview

This Rust code targets wasm32-unknown-unknown and runs inside WebAssembly.

It provides:

  • Hash-consing arena allocation with structural sharing for memory efficiency
  • Iterative reduction algorithm to prevent stack overflow on deep expressions
  • Gas-based preemption via cooperative yield points for fair worker scheduling
  • Lock-free ring buffers (io_uring-style) for efficient inter-thread communication
  • SharedArrayBuffer support for true parallel evaluation across Web Workers
  • Dynamic arena growth with seqlock-based synchronization

Architecture

The arena allocator supports two modes:

  • Single-threaded (Heap Mode): Uses local WASM memory with lazy allocation
  • Multi-threaded (SAB Mode): Uses WebAssembly shared memory (SharedArrayBuffer) to enable concurrent access from multiple Web Workers, with atomic operations and lock-free synchronization for thread safety

Hash consing is implemented using a hash table with chaining, ensuring that expressions with the same structure (same left and right children) are automatically deduplicated. The arena can grow dynamically up to a fixed maximum capacity.
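As a sketch of the idea (field names and the hash mix here are illustrative, not the crate's actual layout), a single-threaded hash-consing arena with separate chaining might look like:

```rust
// Minimal single-threaded sketch of hash consing with separate chaining.
// Names are illustrative; the real arena lives in src/arena.rs.
struct Arena {
    left: Vec<u32>,
    right: Vec<u32>,
    next: Vec<u32>,    // per-node collision chain
    buckets: Vec<u32>, // bucket heads; EMPTY marks an empty bucket
}

const EMPTY: u32 = u32::MAX;

impl Arena {
    fn new(nbuckets: usize) -> Self {
        // nbuckets must be a power of two so `& (len - 1)` works as a mask
        Arena { left: vec![], right: vec![], next: vec![], buckets: vec![EMPTY; nbuckets] }
    }

    fn hash(l: u32, r: u32) -> u32 {
        // cheap avalanche-style mix of the child pair
        let mut h = ((l as u64) << 32) | r as u64;
        h ^= h >> 33;
        h = h.wrapping_mul(0xff51_afd7_ed55_8ccd);
        h ^= h >> 33;
        h as u32
    }

    /// Return the canonical node ID for (l, r), allocating only if unseen.
    fn alloc_cons(&mut self, l: u32, r: u32) -> u32 {
        let b = (Self::hash(l, r) as usize) & (self.buckets.len() - 1);
        // walk the chain: a structurally equal node is reused as-is
        let mut cur = self.buckets[b];
        while cur != EMPTY {
            let i = cur as usize;
            if self.left[i] == l && self.right[i] == r {
                return cur;
            }
            cur = self.next[i];
        }
        // miss: push a fresh node onto the front of the bucket chain
        let id = self.left.len() as u32;
        self.left.push(l);
        self.right.push(r);
        self.next.push(self.buckets[b]);
        self.buckets[b] = id;
        id
    }
}
```

Allocating the same (left, right) pair twice returns the same ID, which is what turns trees into shared DAGs.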

Thread Safety and Locking

In SAB (SharedArrayBuffer) mode, the arena uses lock-free and atomic-based concurrency control to enable high-performance concurrent access:

Optimistic Concurrency Control

Hash-consing allocation uses optimistic concurrency control:

  1. Optimistic read: Traverse bucket chain without locking
  2. Pre-allocate ID: Use top.fetch_add() to reserve slot atomically
  3. CAS insertion: Attempt to insert into hash table bucket
  4. Graceful failure: If CAS fails, slot becomes a "hole" (kind = 0)

Correctness guarantee: Canonical node IDs are always returned. Readers treat non-NonTerm kinds as terminal values, so holes don't affect evaluation.
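Assuming pre-sized atomic arrays and a single hash bucket (the real code in src/arena.rs uses u8 kind/sym arrays, many buckets, and handles growth), the four steps above can be sketched as:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

const EMPTY: u32 = u32::MAX;
const NONTERM: u32 = 2; // illustrative kind value; 0 marks a "hole"

// Optimistic hash-consing insert: scan, reserve, CAS, abandon-on-failure.
fn alloc_cons(
    top: &AtomicU32,
    bucket: &AtomicU32,
    left: &[AtomicU32],
    right: &[AtomicU32],
    next: &[AtomicU32],
    kind: &[AtomicU32],
    l: u32,
    r: u32,
) -> u32 {
    loop {
        // 1. optimistic read: walk the chain without taking a lock
        let head = bucket.load(Ordering::Acquire);
        let mut cur = head;
        while cur != EMPTY {
            let i = cur as usize;
            if left[i].load(Ordering::Relaxed) == l && right[i].load(Ordering::Relaxed) == r {
                return cur; // canonical node already present
            }
            cur = next[i].load(Ordering::Acquire);
        }
        // 2. pre-allocate: reserve a fresh slot atomically
        let id = top.fetch_add(1, Ordering::AcqRel);
        let i = id as usize;
        left[i].store(l, Ordering::Relaxed);
        right[i].store(r, Ordering::Relaxed);
        next[i].store(head, Ordering::Relaxed);
        kind[i].store(NONTERM, Ordering::Release);
        // 3. CAS insertion into the bucket head
        if bucket
            .compare_exchange(head, id, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
        {
            return id;
        }
        // 4. graceful failure: another thread raced us; the slot becomes a
        // hole and we retry, since the winner may have inserted exactly (l, r)
        kind[i].store(0, Ordering::Release);
    }
}
```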

Lock-Free Ring Buffers

Inter-thread communication uses lock-free ring buffers (io_uring-style):

  • Submission Queue (SQ): Main thread → Worker communication
  • Completion Queue (CQ): Worker → Main thread results
  • Atomic operations: Lock-free producer/consumer patterns
  • Sequence numbers: ABA prevention in concurrent access
  • Blocking waits: Efficient WASM atomic wait/notify
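A minimal single-producer/single-consumer sketch of this scheme (Vyukov-style per-slot sequence numbers, as in io_uring; field names and layout are illustrative, and the blocking wait/notify path is omitted):

```rust
use std::sync::atomic::{AtomicU32, AtomicU64, Ordering};

// SPSC ring with per-slot sequence numbers for ABA prevention.
struct Ring {
    seq: Vec<AtomicU32>,  // per-slot sequence: tracks which "lap" owns the slot
    data: Vec<AtomicU64>, // payloads, e.g. packed (req_id, node_id)
    head: AtomicU32,      // consumer cursor
    tail: AtomicU32,      // producer cursor
    mask: u32,
}

impl Ring {
    fn new(cap: u32) -> Self {
        // cap must be a power of two so `& mask` wraps indices
        Ring {
            seq: (0..cap).map(AtomicU32::new).collect(),
            data: (0..cap).map(|_| AtomicU64::new(0)).collect(),
            head: AtomicU32::new(0),
            tail: AtomicU32::new(0),
            mask: cap - 1,
        }
    }

    /// Producer side; returns false when full (caller may spin or atomic-wait).
    fn push(&self, v: u64) -> bool {
        let t = self.tail.load(Ordering::Relaxed);
        let slot = (t & self.mask) as usize;
        if self.seq[slot].load(Ordering::Acquire) != t {
            return false; // consumer has not freed this slot yet
        }
        self.data[slot].store(v, Ordering::Relaxed);
        self.seq[slot].store(t.wrapping_add(1), Ordering::Release); // publish
        self.tail.store(t.wrapping_add(1), Ordering::Relaxed);
        true
    }

    /// Consumer side; returns None when empty.
    fn pop(&self) -> Option<u64> {
        let h = self.head.load(Ordering::Relaxed);
        let slot = (h & self.mask) as usize;
        if self.seq[slot].load(Ordering::Acquire) != h.wrapping_add(1) {
            return None; // nothing published here yet
        }
        let v = self.data[slot].load(Ordering::Relaxed);
        // mark the slot free for the producer's next lap (h + capacity)
        self.seq[slot].store(h.wrapping_add(self.mask).wrapping_add(1), Ordering::Release);
        self.head.store(h.wrapping_add(1), Ordering::Relaxed);
        Some(v)
    }
}
```

The acquire load on `seq` pairs with the release store on the other side, so a popped payload is always fully visible to the consumer.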

Resize Synchronization

Arena growth uses a seqlock-style approach:

  • Stop-the-world pauses: All threads spin during resize operations
  • Sequence lock: Odd values indicate resize in progress, even values indicate stable
  • Reverse-order copying: Prevents overlap issues during memory migration
  • Bucket rebuild: Hash table reconstructed after resize
  • Poisoning on failure: Unrecoverable errors set poison sequences
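The odd/even protocol can be sketched as follows (names are illustrative; the real counter is the `resize_seq` field in the arena header, and poisoning is omitted):

```rust
use std::hint;
use std::sync::atomic::{AtomicU32, Ordering};

// Seqlock-style resize gate: odd = resize in progress, even = stable.
struct ResizeSeq(AtomicU32);

impl ResizeSeq {
    /// Writer side: enter the resize window (sequence becomes odd).
    fn begin_resize(&self) -> u32 {
        self.0.fetch_add(1, Ordering::AcqRel) + 1
    }

    /// Writer side: leave the resize window (sequence becomes even again).
    fn end_resize(&self) {
        self.0.fetch_add(1, Ordering::Release);
    }

    /// Reader side: spin until no resize is running, return the stable sequence.
    fn wait_stable(&self) -> u32 {
        loop {
            let s = self.0.load(Ordering::Acquire);
            if s & 1 == 0 {
                return s;
            }
            hint::spin_loop();
        }
    }
}
```

A reader that sees the sequence change across an access knows a resize ran underneath it and retries.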

Key Optimizations

1. Hash-Consing (Structural Sharing)

The arena uses hash consing to deduplicate identical subexpressions:

  • Avalanche hashing: Fast, high-quality hash function for node pairs
  • Separate chaining: Collision resolution in hash table buckets
  • Memory efficiency: DAG representation instead of tree (common subexpressions shared)
  • O(1) lookups: Fast deduplication during expression construction

2. Iterative Reduction Algorithm

Avoids recursion stack overflow on deep expressions:

  • Explicit stack management: Uses Continuation nodes instead of call stack
  • Two-phase reduction: Left child first, then right child
  • Memory recycling: Dead continuation frames reused immediately
  • Scalable depth: No theoretical limit on expression depth
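An illustrative sketch of explicit-stack reduction over a plain enum tree (the actual reducer works over arena node IDs with recycled Continuation frames, which this sketch does not model):

```rust
use std::rc::Rc;

// Leftmost-outermost SKI reduction with an explicit spine stack: no recursion,
// so depth is bounded by heap, not the call stack.
#[derive(Debug, PartialEq)]
enum Term { S, K, I, App(Rc<Term>, Rc<Term>) }
use Term::*;

fn app(a: Rc<Term>, b: Rc<Term>) -> Rc<Term> { Rc::new(App(a, b)) }

/// Reduce with a gas budget; returns the (possibly partial) result.
fn reduce(mut t: Rc<Term>, mut gas: u32) -> Rc<Term> {
    let mut spine: Vec<Rc<Term>> = Vec::new(); // head's arguments, innermost last
    loop {
        match &*t {
            App(f, a) => { // unwind the left spine, pushing arguments
                let (f, a) = (f.clone(), a.clone());
                spine.push(a);
                t = f;
            }
            I if !spine.is_empty() => { // I x → x
                if gas == 0 { break; }
                gas -= 1;
                t = spine.pop().unwrap();
            }
            K if spine.len() >= 2 => { // K x y → x
                if gas == 0 { break; }
                gas -= 1;
                let x = spine.pop().unwrap();
                let _ = spine.pop();
                t = x;
            }
            S if spine.len() >= 3 => { // S x y z → x z (y z)
                if gas == 0 { break; }
                gas -= 1;
                let x = spine.pop().unwrap();
                let y = spine.pop().unwrap();
                let z = spine.pop().unwrap();
                t = app(app(x, z.clone()), app(y, z));
            }
            _ => break, // head normal form or under-applied combinator
        }
    }
    // reapply any remaining arguments (innermost first)
    for a in spine.into_iter().rev() {
        t = app(t, a);
    }
    t
}
```

For example, S K K I reduces to K I (K I) and then to I.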

3. Preemptive Multitasking

Enables fair scheduling in parallel evaluation:

  • Gas-based yielding: Workers yield control when traversal budget exhausted
  • Suspension nodes: Capture complete evaluation state for resumption
  • Cooperative scheduling: Prevents worker starvation
  • State serialization: Evaluation context stored directly in arena
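A schematic of the gas/yield protocol with a stand-in step function (the real reducer steps over arena nodes and stores its saved state in a Suspension node inside the arena):

```rust
// Gas-based yielding: run at most `budget` steps, then hand control back to
// the scheduler with a resumable handle instead of monopolizing the worker.
enum StepResult {
    Done(u32),                 // node ID of the reduced result root
    Suspended { resume: u32 }, // node ID of a Suspension to resubmit later
}

/// `step` performs one reduction step, returning None at normal form.
fn run_with_gas(mut root: u32, budget: u32, step: impl Fn(u32) -> Option<u32>) -> StepResult {
    for _ in 0..budget {
        match step(root) {
            Some(next) => root = next,             // one step performed
            None => return StepResult::Done(root), // normal form reached
        }
    }
    // gas exhausted: yield so other submissions get a fair share of the worker
    StepResult::Suspended { resume: root }
}
```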

4. Lock-Free Ring Buffers

High-performance inter-thread communication via the SQ/CQ rings, as described under "Lock-Free Ring Buffers" above.

Arena Node Types

The arena supports four node types for different purposes:

  • Terminal: SKI combinators (S, K, I) - leaf nodes
  • NonTerm: Function application (left right) - core expressions
  • Continuation: Stack frames for iterative reduction - optimization
  • Suspension: Paused evaluation state for multitasking - optimization
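A possible encoding of these kinds, plus the "hole" left by a lost allocation race (the actual discriminant values live in src/arena.rs and may differ):

```rust
// Illustrative node-kind encoding; kind = 0 as a "hole" follows the
// optimistic-allocation scheme described earlier in this document.
#[allow(dead_code)]
#[repr(u8)]
enum Kind {
    Hole = 0,         // abandoned slot from a failed CAS insert
    Terminal = 1,     // S, K, I leaf (symbol stored in `sym`)
    NonTerm = 2,      // application node: (left_id right_id)
    Continuation = 3, // explicit stack frame for the iterative reducer
    Suspension = 4,   // paused evaluation state for gas-based yielding
}
```

Readers treat anything other than NonTerm as terminal, which is why holes are harmless during evaluation.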

Execution Model (Host ↔ Workers)

  • The host process (JS/TS) is responsible for building expressions in the arena (via allocTerminal/allocCons).
  • The host then submits work to workers using the SQ/CQ rings.
  • Workers run an exported infinite loop (workerLoop) that:
    • blocks waiting for SQ entries (memory.atomic.wait32)
    • reduces the submitted root using the iterative algorithm with gas limits
    • yields control via Suspensions when gas exhausted (preemptive multitasking)
    • enqueues a CQ completion when reduction finishes or yields

Correlation IDs

Work is correlated using a 32-bit req_id chosen by the host.

  • SQ entry: { node_id: u32, req_id: u32 }
  • CQ entry: { node_id: u32, req_id: u32 }

Completions may arrive out-of-order; req_id is how the host matches results to callers.
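Under that layout (high 32 bits req_id, low 32 bits node_id, -1 for an empty CQ), an encode/decode pair on the host side might look like the following sketch; note that a completion whose fields are both all-ones would collide with the -1 sentinel, so all-ones IDs are best avoided:

```rust
// Pack/unpack the i64 returned by hostPull: (req_id << 32) | node_id.
fn pack(req_id: u32, node_id: u32) -> i64 {
    (((req_id as u64) << 32) | node_id as u64) as i64
}

fn unpack(v: i64) -> Option<(u32, u32)> {
    if v == -1 {
        return None; // CQ empty
    }
    let v = v as u64;
    Some(((v >> 32) as u32, v as u32)) // (req_id, node_id)
}
```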

Thread Safety & Concurrency

Ring Buffer Concurrency

  • Single producer/consumer: Each ring has dedicated thread roles
  • ABA prevention: Sequence numbers prevent concurrent access issues
  • Memory barriers: Proper acquire/release ordering for visibility
  • Wait/notify: Efficient blocking with WASM atomic operations

Shared Memory Safety

  • Typed array views: JS-side access to arena arrays
  • Cache coherence: 64-byte alignment prevents false sharing
  • Poisoning on failure: Unrecoverable errors set poison sequences
  • Cross-thread visibility: SharedArrayBuffer enables true parallelism

Performance Characteristics

Time Complexity

  • Allocation: O(1) amortized (hash table lookup + potential growth)
  • Hash-consing lookup: O(1) average case (hash table with chaining)
  • Reduction step: O(depth) with iterative algorithm (no stack overflow)
  • Ring operations: O(1) (lock-free with atomic CAS)

Space Complexity

  • Per node: ~18 bytes across the SoA arrays (two u8 fields + four u32 fields), plus amortized hash-bucket overhead
  • Hash table: O(capacity) for buckets and collision chains
  • Structural sharing: Significant savings for common subexpressions
  • Memory efficiency: ~60% less memory than naive tree representation

Scalability

  • Expression depth: Unlimited (iterative vs recursive)
  • Parallel workers: Near-linear scaling with CPU cores for independent reductions
  • Memory growth: Dynamic resizing with seqlock synchronization
  • Concurrent allocations: Lock-free with optimistic concurrency

Memory Layout

The shared arena is laid out as header + fixed rings + SoA arrays.

Header (SabHeader)

  • Magic number, ring parameters, memory offsets
  • Capacity, bucket masks, atomic counters
  • max_steps, resize_seq, top for coordination

Rings (64-byte aligned)

  • SQ (Submission Queue): Host → Worker communication
  • CQ (Completion Queue): Worker → Host results
  • Lock-free with sequence numbers for ABA prevention

SoA Arrays (Struct of Arrays)

  • kind: u8[] - Node type (Terminal/NonTerm/Continuation/Suspension)
  • sym: u8[] - Symbol/mode data
  • left_id: u32[] - Left child pointers
  • right_id: u32[] - Right child pointers
  • hash32: u32[] - Hash values for deduplication
  • next_idx: u32[] - Hash table collision chains
  • buckets: u32[] - Hash table bucket heads
  • Terminal cache: Fast access to S/K/I nodes

The JS/TS side constructs typed array views over these arrays for fast read-only decoding and direct memory access.

Building

The build is managed by Nix. Cargo.toml is generated by the flake.

nix run .#generate-cargo
cargo build --target wasm32-unknown-unknown --release

WASM Exports

Arena / reducer

  • initArena(capacity: u32) -> u32: allocate and initialize a shared arena region (returns base pointer or error code).
  • connectArena(ptr: u32) -> u32: connect an instance to an existing arena (returns 1 on success).
  • reset()
  • allocTerminal(sym: u32) -> u32
  • allocCons(left: u32, right: u32) -> u32
  • arenaKernelStep(expr: u32) -> u32
  • reduce(expr: u32, max: u32) -> u32

SQ/CQ interface

  • hostSubmit(node_id: u32, req_id: u32, max_steps: u32) -> u32
    • returns 0 ok, 1 SQ full, 2 not connected
    • max_steps is the reduction limit for this specific request (eliminates race conditions)
  • hostPull() -> i64
    • returns -1 if CQ empty, otherwise a packed value:
      • high 32 bits: req_id
      • low 32 bits: node_id (result root)
  • workerLoop() -> !: blocking loop for worker instances (never returns).

Debug

  • debugLockState() -> u32: returns resize_seq & 1 (1 if resizing).
  • getArenaMode() -> u32: 0 heap mode, 1 SAB mode.
  • debugGetArenaBaseAddr() -> u32
  • debugCalculateArenaSize(capacity: u32) -> u32

WASM Imports

  • memory: WebAssembly.Memory (provided via env.memory) - The shared memory instance (must be created with shared: true for SAB mode)
  • js_allow_block() -> i32 - Returns 0 for main thread (spin-only) or 1 for worker threads (can block). This is instance-local and not stored in shared memory to avoid state collisions.

Structure

  • src/lib.rs: crate entry point
  • src/arena.rs: arena allocator + rings + reducer