RuvLLM ESP32 - Tiny LLM Inference for Microcontrollers
This crate provides a minimal inference engine designed for ESP32 and similar resource-constrained microcontrollers.
§Constraints
- ~520KB SRAM available
- 4-16MB flash for model storage
- No floating-point unit on base ESP32 (ESP32-S3 has one)
- Single/dual core @ 240MHz
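To make these limits concrete, here is a rough sketch that estimates the INT8 weight footprint of a small transformer and compares it with the SRAM figure above. The `Shape` struct, the constants, and both functions are hypothetical names for illustration and are not part of this crate's API. The parameter count uses the usual approximation (embedding table, four attention projections, and two feed-forward matrices per layer) and ignores biases and normalization weights.

```rust
/// Hypothetical model shape, for illustration only (not the crate's `ModelConfig`).
pub struct Shape {
    pub vocab: usize,  // vocabulary size
    pub dim: usize,    // embedding / hidden dimension
    pub hidden: usize, // feed-forward inner dimension
    pub layers: usize, // number of transformer blocks
}

/// Approximate SRAM figure from the constraints list (~520KB).
pub const SRAM_BYTES: usize = 520 * 1024;

/// Approximate weight size in bytes at one byte per parameter (INT8).
pub const fn int8_weight_bytes(s: &Shape) -> usize {
    let embed = s.vocab * s.dim;    // token embedding table
    let attn = 4 * s.dim * s.dim;   // Q, K, V and output projections
    let ffn = 2 * s.dim * s.hidden; // up and down projections
    embed + s.layers * (attn + ffn)
}

/// True if the weights plus a caller-chosen reserve (activations, stack,
/// buffers) fit within the assumed SRAM budget.
pub const fn fits_in_sram(s: &Shape, reserve_bytes: usize) -> bool {
    int8_weight_bytes(s) + reserve_bytes <= SRAM_BYTES
}
```

For example, a shape with `vocab: 512`, `dim: 64`, `hidden: 128`, `layers: 2` needs about 96KB of weights under this estimate. Larger models would have to stay in flash and be read through memory-mapped access rather than copied into SRAM.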
§Features
- INT8 quantized inference
- Fixed-point arithmetic option
- Tiny transformer blocks
- Memory-mapped model loading
- Optional ESP32-S3 SIMD acceleration
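As a minimal illustration of what INT8 quantized inference involves, the sketch below performs symmetric per-tensor quantization and an integer dot product with a 32-bit accumulator. It is not the crate's `QuantizedTensor` implementation. The function names are hypothetical, and the code uses `std` (`Vec`, `f32::round`) for brevity, so it is best read as a host-side example rather than `no_std` firmware code.

```rust
/// Symmetric per-tensor INT8 quantization (illustrative, hypothetical helper).
/// Each value is mapped to round(x / scale), clamped to [-127, 127].
/// Returns the quantized values and the scale needed to dequantize.
pub fn quantize_i8(values: &[f32]) -> (Vec<i8>, f32) {
    // The largest magnitude determines the scale; an all-zero input falls back to 1.0.
    let max_abs = values.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = values
        .iter()
        .map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

/// Integer dot product. Products are accumulated in i32 so that
/// sums of i8 * i8 terms do not overflow for realistic vector lengths.
pub fn dot_i8(a: &[i8], b: &[i8]) -> i32 {
    a.iter().zip(b).map(|(&x, &y)| x as i32 * y as i32).sum()
}

/// Convert an integer accumulator back to a real value using both scales.
pub fn dequantize_dot(acc: i32, scale_a: f32, scale_b: f32) -> f32 {
    acc as f32 * scale_a * scale_b
}
```

The inner loop only touches 8-bit operands and a 32-bit accumulator, so the floating-point work is limited to one multiply per output when the result is rescaled.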
Re-exports§
pub use micro_inference::MicroEngine;
pub use micro_inference::InferenceConfig;
pub use micro_inference::InferenceResult;
pub use quantized::QuantizedTensor;
pub use quantized::QuantizationType;
pub use model::TinyModel;
pub use model::ModelConfig;
pub use optimizations::BinaryVector;
pub use optimizations::BinaryEmbedding;
pub use optimizations::hamming_distance;
pub use optimizations::hamming_similarity;
pub use optimizations::ProductQuantizer;
pub use optimizations::PQCode;
pub use optimizations::SoftmaxLUT;
pub use optimizations::ExpLUT;
pub use optimizations::DistanceLUT;
pub use optimizations::MicroLoRA;
pub use optimizations::LoRAConfig;
pub use optimizations::SparseAttention;
pub use optimizations::AttentionPattern;
pub use optimizations::LayerPruner;
pub use optimizations::PruningConfig;
pub use federation::FederationConfig;
pub use federation::FederationMode;
pub use federation::FederationSpeedup;
pub use federation::PipelineNode;
pub use federation::PipelineConfig;
pub use federation::PipelineRole;
pub use federation::FederationMessage;
pub use federation::MessageType;
pub use federation::ChipId;
pub use federation::FederationCoordinator;
pub use federation::ClusterTopology;
pub use federation::MicroFastGRNN;
pub use federation::MicroGRNNConfig;
pub use federation::SpeculativeDecoder;
pub use federation::DraftVerifyConfig;
Modules§
- attention
- Attention mechanisms for ESP32
- benchmark
- Benchmark Suite for RuvLLM ESP32
- diagnostics
- Error Diagnostics with Fix Suggestions
- embedding
- Embedding operations for ESP32
- federation
- Federation Module for Multi-ESP32 Distributed Inference
- micro_inference
- Micro Inference Engine for ESP32
- model
- Model definition and loading for ESP32
- models
- Model Zoo - Pre-quantized Models for RuvLLM ESP32
- optimizations
- Advanced Optimizations from Ruvector
- ota
- Over-the-Air (OTA) Update System for RuvLLM ESP32
- prelude
- Prelude for common imports
- quantized
- Quantized tensor operations for memory-efficient inference
- ruvector
- RuVector Integration for ESP32
Enums§
- Error
- Error types for ESP32 inference
- Esp32Variant
- Memory budget for ESP32 variants