Testing utilities for the ferrum inference engine.
- Mocks: MockModelExecutor, MockSampler, MockKvCacheManager, MockTokenizer
- Configurable: ConfigurableModelExecutor (specific token sequences, EOS)
- Bench: BenchmarkResult, percentile calculation, JSON output
- Paged: PagedAttentionExecutor with real paged KV cache
All components are hardware-independent (CPU-only, no GPU required).
Re-exports
pub use paged_executor::PagedAttentionExecutor;
pub use paged_executor::PagedExecutorConfig;
Modules
- bench
- Benchmark result types and utilities.
- op_diff
- Cross-backend op-diff harness — PLAYBOOK § 3 L1.
- paged_executor
- Model executor that uses PagedAttention KV cache.
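The percentile calculation mentioned for the bench module can be illustrated with a small standalone sketch. This page does not say which percentile method the module uses, so the nearest-rank method below is an assumption, and the function name `percentile` is hypothetical rather than part of this crate's API.

```rust
/// Nearest-rank percentile of `samples` for `p` in [0, 100].
/// Hypothetical helper for illustration; not the crate's own function.
/// Returns `None` for an empty slice or an out-of-range `p`.
fn percentile(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    // Sort a copy so the caller's data is left untouched.
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest-rank: rank = ceil(p / 100 * n), clamped to at least 1.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}
```

With latency samples of 10, 20, 30, 40 and 50 ms, this sketch reports 30 ms as the 50th percentile and 50 ms as the 99th. Other definitions (for example, linear interpolation between ranks) give different values on small samples.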
Structs
- ConfigurableModelExecutor
- Model executor that produces a configurable sequence of tokens.
- MockKvCacheHandle
- Mock KV cache handle: tracks block metadata without allocating real memory.
- MockKvCacheManager
- Mock KV cache manager: tracks allocations in memory, simulates block limits.
- MockModelExecutor
- Mock model executor that simulates prefill/decode with configurable latency. No model weights, no GPU; pure async simulation.
- MockSampler
- Greedy sampler: always picks the token with the highest logit. Deterministic, no temperature or top-k.
- MockTensor
- A mock tensor that stores shape and optional f32 data. No GPU, no Candle; pure Rust.
- MockTensorFactory
- Mock tensor factory implementing TensorFactory without any ML backend.
- MockTokenizer
- Mock tokenizer: splits on whitespace, assigns sequential token IDs. EOS token is vocab_size - 1.
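The behaviour described for MockSampler and MockTokenizer can be sketched in a few lines of standalone Rust. This is not the crate's implementation: `greedy_argmax` and `WhitespaceTokenizer` are hypothetical names, and the tie-breaking rule, the per-word ID assignment, and the wrap-around when the vocabulary fills up are all assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Greedy selection: index of the largest logit, or `None` if `logits` is empty.
/// Assumptions of this sketch: ties go to the lowest index, and inputs contain no NaN.
fn greedy_argmax(logits: &[f32]) -> Option<u32> {
    let mut best: Option<(usize, f32)> = None;
    for (i, &x) in logits.iter().enumerate() {
        // Replace the current best only on a strictly larger logit.
        if best.map_or(true, |(_, b)| x > b) {
            best = Some((i, x));
        }
    }
    best.map(|(i, _)| i as u32)
}

/// Whitespace tokenizer that hands out IDs in order of first appearance.
/// The last ID, `vocab_size - 1`, is reserved for EOS.
struct WhitespaceTokenizer {
    vocab_size: u32,
    ids: HashMap<String, u32>,
}

impl WhitespaceTokenizer {
    fn new(vocab_size: u32) -> Self {
        assert!(vocab_size >= 2, "need at least one word ID plus EOS");
        Self { vocab_size, ids: HashMap::new() }
    }

    fn eos_token_id(&self) -> u32 {
        self.vocab_size - 1
    }

    fn encode(&mut self, text: &str) -> Vec<u32> {
        text.split_whitespace()
            .map(|word| {
                // Assumption: once all non-EOS IDs are used, new words wrap around.
                let next = self.ids.len() as u32 % (self.vocab_size - 1);
                *self.ids.entry(word.to_string()).or_insert(next)
            })
            .collect()
    }
}
```

Because both pieces are deterministic, a test can assert exact token IDs and exact sampled tokens without loading a real vocabulary or model, which is the main reason to use mocks like these in place of real components.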