§RuvLLM - LLM Serving Runtime with Ruvector Integration
RuvLLM is an edge-focused LLM serving runtime designed for portable, high-performance inference across heterogeneous hardware. It integrates with Ruvector for intelligent memory capabilities, enabling continuous self-improvement through SONA learning.
§Architecture
RuvLLM uses Ruvector as a unified memory layer with three distinct roles:
- Policy Memory Store: Learned thresholds and parameters for runtime decisions
- Session State Index: Multi-turn conversation state with KV cache references
- Witness Log Index: Audit logging with semantic search capabilities
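To make these three roles concrete, the standalone sketch below models the kind of state each store holds: learned thresholds for runtime decisions, per-session turns with KV cache references, and append-only audit records carrying an embedding for semantic search. This is not the crate's API; every name in it is illustrative.
use std::collections::HashMap;
// Policy memory: learned thresholds keyed by policy name.
struct PolicyMemory { thresholds: HashMap<String, f32> }
// Session state: conversation turns plus references into a paged KV cache.
struct SessionRecord { turns: Vec<String>, kv_cache_pages: Vec<u32> }
// Witness log: an audit event with an embedding for semantic lookup.
struct WitnessRecord { event: String, embedding: Vec<f32> }
fn main() {
    let mut policies = PolicyMemory { thresholds: HashMap::new() };
    policies.thresholds.insert("kv_quantize_above_tokens".into(), 4096.0);
    let mut session = SessionRecord { turns: Vec::new(), kv_cache_pages: vec![0, 1, 2] };
    session.turns.push("Hello, world!".into());
    let witness = WitnessRecord {
        event: "routing: local backend selected".into(),
        embedding: vec![0.0; 8], // a real log would embed the event text for HNSW search
    };
    println!(
        "{} policies, {} turns, {} KV pages, event '{}' ({} dims)",
        policies.thresholds.len(),
        session.turns.len(),
        session.kv_cache_pages.len(),
        witness.event,
        witness.embedding.len()
    );
}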
§Key Components
- PagedAttention: Memory-efficient attention mechanism with page tables
- TwoTierKvCache: FP16 tail + quantized store for optimal memory/quality tradeoff
- AdapterManager: LoRA adapter loading and hot-swapping
- SessionManager: Session lifecycle and state management
- PolicyStore: Ruvector-backed policy storage with semantic search
- WitnessLog: Audit logging with HNSW-indexed semantic search
- SonaIntegration: Three-tier learning loop integration
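The two-tier KV cache idea can be illustrated independently of the crate's actual TwoTierKvCache API. The sketch below is a simplification under stated assumptions: names, tail length, and precisions are illustrative, and it uses an f32 tail with i8 quantization in place of the FP16 tail + quantized store. Recent entries stay at full precision; once the tail overflows, the oldest entry is demoted to a scaled 8-bit form.
const TAIL_LEN: usize = 4;
// A quantized entry: one scale per vector plus 8-bit values.
struct QuantizedEntry { scale: f32, data: Vec<i8> }
struct TwoTierSketch {
    tail: Vec<Vec<f32>>,        // full-precision recent entries
    store: Vec<QuantizedEntry>, // quantized older entries
}
impl TwoTierSketch {
    fn new() -> Self { Self { tail: Vec::new(), store: Vec::new() } }
    fn push(&mut self, entry: Vec<f32>) {
        self.tail.push(entry);
        if self.tail.len() > TAIL_LEN {
            let old = self.tail.remove(0); // demote the oldest full-precision entry
            self.store.push(quantize(&old));
        }
    }
}
// Symmetric absmax quantization to i8 with a per-vector scale.
fn quantize(v: &[f32]) -> QuantizedEntry {
    let max = v.iter().fold(0.0_f32, |m, x| m.max(x.abs())).max(f32::EPSILON);
    let scale = max / 127.0;
    QuantizedEntry { scale, data: v.iter().map(|x| (x / scale).round() as i8).collect() }
}
fn main() {
    let mut cache = TwoTierSketch::new();
    for t in 0..8 {
        cache.push(vec![t as f32 * 0.1; 16]);
    }
    println!(
        "tail: {} full-precision entries, store: {} quantized entries ({} bytes each)",
        cache.tail.len(),
        cache.store.len(),
        cache.store.last().map(|e| e.data.len()).unwrap_or(0)
    );
}
The intent mirrors the crate's description: recent tokens keep full precision while older context is stored more cheaply.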
§Example
use ruvllm::{RuvLLMConfig, RuvLLMEngine};

fn main() -> Result<(), ruvllm::RuvLLMError> {
    // Create engine with default configuration
    let config = RuvLLMConfig::default();
    let engine = RuvLLMEngine::new(config)?;

    // Create a session
    let session = engine.create_session("user-123")?;

    // Process a request
    let response = engine.process(&session, "Hello, world!")?;
    let _ = response;
    Ok(())
}
Re-exports§
pub use adapter_manager::{AdapterManager, LoraAdapter, AdapterConfig};
pub use autodetect::{SystemCapabilities, Platform, Architecture, CpuFeatures, GpuCapabilities, GpuBackend, CoreInfo, ComputeBackend, InferenceConfig};
pub use lora::{MicroLoRA, MicroLoraConfig, TargetModule, AdaptFeedback, AdapterRegistry, AdapterPool, AdapterComposer, CompositionStrategy, TrainingPipeline, TrainingConfig, EwcRegularizer, LearningRateSchedule};
pub use backends::{create_backend, DeviceType, DType, GenerateParams, GeneratedToken, LlmBackend, ModelArchitecture, ModelConfig, ModelInfo, Quantization, SpecialTokens, StreamEvent, TokenStream, Tokenizer, CandleBackend, AsyncTokenStream, LlmBackendAsync};
pub use error::{RuvLLMError, Result};
pub use kv_cache::{TwoTierKvCache, KvCacheConfig, CacheTier, CacheQuantization, KvCacheStats, PooledKvCache, PooledKvBlock, PooledKvCacheStats};
pub use memory_pool::{InferenceArena, ArenaStats, BufferPool, BufferSize, PooledBuffer, BufferPoolStats, ScratchSpaceManager, ScratchSpace, ScratchStats, MemoryManager, MemoryManagerConfig, MemoryManagerStats, CACHE_LINE_SIZE, DEFAULT_ALIGNMENT};
pub use paged_attention::{PagedAttention, PagedAttentionConfig, PageTable, PageBlock};
pub use policy_store::{PolicyStore, PolicyEntry, PolicyType, QuantizationPolicy, RouterPolicy};
pub use session::{SessionManager, Session, SessionConfig};
pub use session_index::{SessionIndex, SessionState, KvCacheReference};
pub use sona::{SonaIntegration, SonaConfig, LearningLoop};
pub use claude_flow::{ClaudeFlowAgent, ClaudeFlowTask, AgentRouter, AgentType, RoutingDecision as AgentRoutingDecision, TaskClassifier, TaskType, ClassificationResult, FlowOptimizer, OptimizationConfig, OptimizationResult, HnswRouter, HnswRouterConfig, HnswRouterStats, HnswRoutingResult, HnswDistanceMetric, TaskPattern, HybridRouter, ClaudeModel, MessageRole, ContentBlock, Message, ClaudeRequest, ClaudeResponse, UsageStats, StreamToken, StreamEvent as ClaudeStreamEvent, QualityMonitor, ResponseStreamer, StreamStats, ContextWindow, ContextManager, AgentState, AgentContext, WorkflowStep, WorkflowResult, StepResult, AgentCoordinator, CoordinatorStats, CostEstimator, LatencyTracker, LatencySample, LatencyStats as ClaudeLatencyStats, ComplexityFactors, ComplexityWeights, ComplexityScore, TaskComplexityAnalyzer, AnalyzerStats as ModelAnalyzerStats, SelectionCriteria, ModelRoutingDecision, ModelSelector, SelectorStats, ModelRouter, HooksIntegration, HooksConfig, PreTaskInput, PreTaskResult, PostTaskInput, PostTaskResult, PreEditInput, PreEditResult, PostEditInput, PostEditResult, SessionState as HooksSessionState, SessionEndResult, SessionMetrics, PatternMatch, QualityAssessment, LearningMetrics};
pub use optimization::{InferenceMetrics, MetricsCollector, MetricsSnapshot, MovingAverage, LatencyHistogram, RealtimeOptimizer, RealtimeConfig, BatchSizeStrategy, KvCachePressurePolicy, TokenBudgetAllocation, SpeculativeConfig, OptimizationDecision, SonaLlm, SonaLlmConfig, TrainingSample, AdaptationResult, LearningLoopStats, ConsolidationStrategy, OptimizationTrigger};
pub use tokenizer::{RuvTokenizer, ChatMessage, ChatTemplate, Role, TokenizerSpecialTokens, StreamingDecodeBuffer};
pub use speculative::{SpeculativeDecoder, SpeculativeConfig as SpeculativeDecodingConfig, SpeculativeStats, AtomicSpeculativeStats, VerificationResult, SpeculationTree, TreeNode, softmax, log_softmax, sample_from_probs, top_k_filter, top_p_filter};
pub use witness_log::{WitnessLog, WitnessEntry, LatencyBreakdown, RoutingDecision, AsyncWriteConfig, WitnessLogStats};
pub use gguf::{GgufFile, GgufModelLoader, GgufHeader, GgufValue, GgufQuantType, TensorInfo, QuantizedTensor, ModelConfig as GgufModelConfig, GgufLoader, LoadConfig, LoadProgress, LoadedWeights, LoadedTensor, TensorCategory, TensorNameMapper, StreamingLoader, ModelInitializer, ModelWeights, LayerWeights, WeightTensor, QuantizedWeight, ProgressModelBuilder};
pub use hub::{ModelDownloader, DownloadConfig, DownloadProgress, DownloadError, ChecksumVerifier, ModelUploader, UploadConfig, UploadProgress, UploadError, ModelMetadata, RuvLtraRegistry, ModelInfo as HubModelInfo, ModelSize, QuantizationLevel, HardwareRequirements, get_model_info, ModelCard, ModelCardBuilder, TaskType as HubTaskType, Framework, License, DatasetInfo, MetricResult, ProgressBar, ProgressIndicator, ProgressStyle, ProgressCallback, MultiProgress, HubError, default_cache_dir, get_hf_token};
pub use serving::{InferenceRequest, RequestId, Priority, RequestState, RunningRequest, CompletedRequest, FinishReason, TokenOutput, BatchedRequest, BatchStats, ScheduledBatch, IterationPlan, PrefillTask, DecodeTask, TokenBudget, KvCacheManager, KvCachePoolConfig, KvCacheAllocation, KvCacheManagerStats, ContinuousBatchScheduler, IterationScheduler, SchedulerConfig, SchedulerStats, RequestQueue, PreemptionMode, PriorityPolicy, ServingEngine, ServingEngineConfig, ServingMetrics, GenerationResult};
pub use quantize::{RuvltraQuantizer, QuantConfig, TargetFormat, quantize_ruvltra_q4, quantize_ruvltra_q5, quantize_ruvltra_q8, dequantize_for_ane, estimate_memory_q4, estimate_memory_q5, estimate_memory_q8, MemoryEstimate, Q4KMBlock, Q5KMBlock, Q8Block, QuantProgress, QuantStats};
pub use training::{ClaudeTaskDataset, ClaudeTaskExample, TaskCategory, TaskMetadata, ComplexityLevel, DomainType, DatasetConfig, AugmentationConfig, DatasetGenerator, DatasetStats, GrpoConfig, GrpoOptimizer, GrpoSample, GrpoStats, GrpoUpdateResult, GrpoBatch, SampleGroup, McpToolTrainer, McpTrainingConfig, ToolTrajectory, TrajectoryStep, TrajectoryBuilder, StepBuilder, TrajectoryMetadata, TrainingResult, TrainingStats, TrainingCheckpoint, EvaluationMetrics, ToolCallDataset, ToolCallExample, ToolDatasetConfig, ToolDatasetStats, McpToolDef, ToolParam, ParamType, DifficultyLevel, DifficultyWeights, McpToolCategory};
pub use models::{RuvLtraConfig, AneOptimization, QuantizationType, MemoryLayout, RuvLtraModel, RuvLtraAttention, RuvLtraMLP, RuvLtraDecoderLayer, RuvLtraModelInfo, AneDispatcher};
pub use capabilities::{RuvectorCapabilities, HNSW_AVAILABLE, ATTENTION_AVAILABLE, GRAPH_AVAILABLE, GNN_AVAILABLE, SONA_AVAILABLE, SIMD_AVAILABLE, PARALLEL_AVAILABLE, gate_feature, gate_feature_or};
pub use ruvector_integration::{RuvectorIntegration, IntegrationConfig, IntegrationStats, UnifiedIndex, VectorMetadata, IndexStats, SearchResultWithMetadata, IntelligenceLayer, IntelligentRoutingDecision, IntelligenceLayerStats};
pub use quality::{QualityMetrics, QualityWeights, QualityDimension, QualitySummary, TrendDirection, QualityScoringEngine, ScoringConfig, ScoringContext, QualityHistory, ComparisonResult, TrendAnalysis, ImprovementRecommendation, CoherenceValidator, CoherenceConfig, SemanticConsistencyResult, ContradictionResult, CoherenceViolation, LogicalFlowResult, DiversityAnalyzer, DiversityConfig, DiversityResult, DiversificationSuggestion, ModeCollapseResult, SchemaValidator, JsonSchemaValidator, TypeValidator, RangeValidator, FormatValidator, CombinedValidator, ValidationResult, ValidationError, ValidationCombinator};
pub use context::{AgenticMemory, AgenticMemoryConfig, MemoryType, WorkingMemory, WorkingMemoryConfig, TaskContext, ScratchpadEntry, AttentionWeights, EpisodicMemory, EpisodicMemoryConfig, Episode, EpisodeMetadata, EpisodeTrajectory, CompressedEpisode, IntelligentContextManager, ContextManagerConfig, PreparedContext, PriorityScorer, ContextElement, ElementPriority, SemanticToolCache, SemanticCacheConfig, CachedToolResult, CacheStats, ClaudeFlowMemoryBridge, ClaudeFlowBridgeConfig, SyncResult};
pub use reflection::{ReflectiveAgent, ReflectionStrategy, ReflectionConfig, RetryConfig, ExecutionContext, ExecutionResult, Reflection, PreviousAttempt, BaseAgent, ReflectiveAgentStats, ConfidenceChecker, ConfidenceConfig, ConfidenceLevel, WeakPoint, RevisionResult, ConfidenceCheckRecord, ConfidenceFactorWeights, WeaknessType, ErrorPatternLearner, ErrorPatternLearnerConfig, ErrorPattern, ErrorCluster, RecoveryStrategy, RecoverySuggestion, ErrorCategory, RecoveryOutcome, SimilarError, ErrorLearnerStats, Perspective, CorrectnessChecker, CompletenessChecker, ConsistencyChecker, CritiqueResult, CritiqueIssue, IssueCategory, UnifiedCritique, PerspectiveConfig};
pub use reasoning_bank::{ReasoningBank, ReasoningBankConfig, ReasoningBankStats, Trajectory as ReasoningTrajectory, TrajectoryStep as ReasoningTrajectoryStep, TrajectoryRecorder, TrajectoryId, StepOutcome, PatternStore, PatternStoreConfig, Pattern, PatternCategory, PatternSearchResult, PatternStats, Verdict as ReasoningVerdict, RootCause, VerdictAnalyzer, FailurePattern as VerdictFailurePattern, RecoveryStrategy as VerdictRecoveryStrategy, PatternConsolidator, ConsolidationConfig, FisherInformation, ImportanceScore, MemoryDistiller, DistillationConfig, CompressedTrajectory, KeyLesson};
pub use rlm::{RlmConfig, RecursiveConfig, RecursiveConfigBuilder, AggregationStrategy, ConfigValidationError, DecompositionConfig, RlmController, RlmStats, RlmStatsSnapshot, QueryResult, MemoryEntry as RlmMemoryEntry, MemoryMetadata as RlmMemoryEntryMetadata, SourceAttribution, ControllerTokenUsage, QueryDecomposer, DecompositionResult, DecomposerStats, DecomposerStatsSnapshot, DecomposerStrategy, DecomposerSubQuery, QueryType, AnswerSynthesizer, SynthesisResult, RlmEnvironment, NativeEnvironment, EnvironmentConfig, EnvironmentType, RlmMemory, MemoryConfig as RlmMemoryConfig, MemorySearchResult as RlmMemorySearchResult, MemoryStoreEntry, MemoryStoreMetadata, LlmBackendTrait, MemoryStore as RlmMemoryStore, TraitsGenerationParams, TraitsGenerationOutput, TraitsFinishReason, TraitsMemorySpan, TraitsMemoryMetadata, TraitsQueryContext, TraitsQueryDecomposition, TraitsRlmAnswer, TraitsSubAnswer, TraitsSubQuery, TraitsDecompositionStrategy, TraitsTokenUsage, RlmModelInfo, MemoryId, MemorySpan, QueryId, AnswerId, Query, QueryConstraints, QueryContext, QueryDecomposition, DecompositionStrategy, SubQuery, SubAnswer, RlmAnswer, GenerationParams as RlmGenerationParams, GenerationOutput as RlmGenerationOutput, FinishReason as RlmFinishReason, TokenUsage as RlmTokenUsage, RuvLtraRlmBackend, RuvLtraRlmConfig, RuvLtraEnvironment, RuvLtraEnvConfig, KvCache as RlmKvCache, KvCacheEntry as RlmKvCacheEntry, KvCacheStats as RlmKvCacheStats, EmbeddingPooling, RuvLtraBackendStats, RuvLtraMemoryStore};
pub use types::*;
Modules§
- adapter_manager - LoRA Adapter Manager
- autodetect - Intelligent Auto-Detection System for RuvLLM
- backends - LLM inference backends for RuvLLM
- capabilities - Ruvector Capabilities Detection
- claude_flow - Claude Flow Integration for RuvLTRA
- context - Context Management System for RuvLLM
- error - Error types for RuvLLM
- evaluation - RuvLLM Evaluation Harness
- gguf - GGUF Model Format Loader for RuvLLM
- hub - HuggingFace Hub integration for RuvLTRA model management
- kernels - NEON-Optimized LLM Kernels for Mac M4 Pro
- kv_cache - Two-Tier KV Cache Implementation
- lora - MicroLoRA Fine-tuning Pipeline for Real-time Per-request Adaptation
- memory_pool - Memory Pool and Arena Allocator for High-Performance Inference
- models - Model Architectures for RuvLLM
- optimization - Real-time Optimization System for RuvLLM
- paged_attention - Paged Attention Mechanism
- policy_store - Policy Memory Store
- quality - Multi-dimensional Quality Scoring Framework for RuvLLM
- quantize - Quantization Pipeline for RuvLTRA Models
- reasoning_bank - ReasoningBank - Production-grade learning from Claude trajectories
- reflection - Self-Reflection Architecture for RuvLLM
- rlm - Recursive Language Model (RLM) Integration
- ruvector_integration - Ruvector Integration Layer
- serving - Continuous Batching Serving Module
- session - Session State Management
- session_index - Session State Index
- sona - SONA Learning Integration for RuvLLM
- speculative - Speculative Decoding for Accelerated Inference
- tokenizer - Tokenizer Integration for RuvLLM
- training - Training Module
- types - Common types used across RuvLLM
- witness_log - Witness Log Index
Macros§
- with_attention
- with_gnn
- with_graph
- with_hnsw - Feature availability check macros for conditional compilation
Structs§
- RuvLLMConfig - RuvLLM engine configuration.
- RuvLLMEngine - Main RuvLLM engine for LLM inference with intelligent memory.