Crate ruvllm

§RuvLLM - LLM Serving Runtime with Ruvector Integration

RuvLLM is an edge-focused LLM serving runtime designed for portable, high-performance inference across heterogeneous hardware. It integrates with Ruvector for intelligent memory capabilities, enabling continuous self-improvement through SONA learning.

§Architecture

RuvLLM uses Ruvector as a unified memory layer with three distinct roles:

  • Policy Memory Store: Learned thresholds and parameters for runtime decisions
  • Session State Index: Multi-turn conversation state with KV cache references
  • Witness Log Index: Audit logging with semantic search capabilities
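
The three roles above can be pictured as partitions of one store. The following is only a minimal sketch under stated assumptions: `MemoryRole`, `UnifiedMemory`, and the brute-force `nearest` lookup are hypothetical names invented for illustration and are not the Ruvector or RuvLLM API. The sketch shows a single backing structure whose entries are tagged by role, so that a similarity query in one role never returns entries from another.

```rust
use std::collections::HashMap;

/// Hypothetical tag for the three roles listed above (not a RuvLLM type).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum MemoryRole {
    Policy,
    Session,
    Witness,
}

/// Hypothetical unified store: one structure, partitioned by role.
#[derive(Default)]
struct UnifiedMemory {
    entries: HashMap<MemoryRole, Vec<(String, Vec<f32>)>>,
}

impl UnifiedMemory {
    /// Store an embedding under a key within one role.
    fn insert(&mut self, role: MemoryRole, key: &str, embedding: Vec<f32>) {
        self.entries
            .entry(role)
            .or_default()
            .push((key.to_string(), embedding));
    }

    /// Return the key of the closest entry within a single role,
    /// using squared Euclidean distance and a linear scan.
    fn nearest(&self, role: MemoryRole, query: &[f32]) -> Option<&str> {
        self.entries
            .get(&role)?
            .iter()
            .min_by(|a, b| dist(&a.1, query).total_cmp(&dist(&b.1, query)))
            .map(|(key, _)| key.as_str())
    }
}

/// Squared Euclidean distance between two vectors of equal length.
fn dist(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

fn main() {
    let mut mem = UnifiedMemory::default();
    mem.insert(MemoryRole::Policy, "quant-threshold", vec![0.9, 0.1]);
    mem.insert(MemoryRole::Session, "user-123/turn-4", vec![0.2, 0.8]);
    mem.insert(MemoryRole::Witness, "request-log-17", vec![0.25, 0.75]);

    // The query is scoped to the witness role, so the session entry is ignored.
    println!("{:?}", mem.nearest(MemoryRole::Witness, &[0.2, 0.8]));
}
```

A real vector index would replace the linear scan; the point of the sketch is only the role-scoped lookup.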

§Key Components

§Example

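use ruvllm::{Result, RuvLLMConfig, RuvLLMEngine};

// `?` needs an enclosing function that returns a `Result`;
// `ruvllm::Result` is the crate's re-exported result alias.
fn main() -> Result<()> {
    // Create an engine with the default configuration
    let config = RuvLLMConfig::default();
    let engine = RuvLLMEngine::new(config)?;

    // Create a session for a user
    let session = engine.create_session("user-123")?;

    // Process a request within that session
    let response = engine.process(&session, "Hello, world!")?;

    Ok(())
}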

Re-exports§

pub use adapter_manager::AdapterManager;
pub use adapter_manager::LoraAdapter;
pub use adapter_manager::AdapterConfig;
pub use autodetect::SystemCapabilities;
pub use autodetect::Platform;
pub use autodetect::Architecture;
pub use autodetect::CpuFeatures;
pub use autodetect::GpuCapabilities;
pub use autodetect::GpuBackend;
pub use autodetect::CoreInfo;
pub use autodetect::ComputeBackend;
pub use autodetect::InferenceConfig;
pub use lora::MicroLoRA;
pub use lora::MicroLoraConfig;
pub use lora::TargetModule;
pub use lora::AdaptFeedback;
pub use lora::AdapterRegistry;
pub use lora::AdapterPool;
pub use lora::AdapterComposer;
pub use lora::CompositionStrategy;
pub use lora::TrainingPipeline;
pub use lora::TrainingConfig;
pub use lora::EwcRegularizer;
pub use lora::LearningRateSchedule;
pub use backends::create_backend;
pub use backends::DeviceType;
pub use backends::DType;
pub use backends::GenerateParams;
pub use backends::GeneratedToken;
pub use backends::LlmBackend;
pub use backends::ModelArchitecture;
pub use backends::ModelConfig;
pub use backends::ModelInfo;
pub use backends::Quantization;
pub use backends::SharedBackend;
pub use backends::SpecialTokens;
pub use backends::StreamEvent;
pub use backends::TokenStream;
pub use backends::Tokenizer;
pub use backends::CandleBackend;
pub use backends::AsyncTokenStream;
pub use backends::LlmBackendAsync;
pub use error::RuvLLMError;
pub use error::Result;
pub use kv_cache::TwoTierKvCache;
pub use kv_cache::KvCacheConfig;
pub use kv_cache::CacheTier;
pub use kv_cache::CacheQuantization;
pub use kv_cache::KvCacheStats;
pub use kv_cache::PooledKvCache;
pub use kv_cache::PooledKvBlock;
pub use kv_cache::PooledKvCacheStats;
pub use memory_pool::InferenceArena;
pub use memory_pool::ArenaStats;
pub use memory_pool::BufferPool;
pub use memory_pool::BufferSize;
pub use memory_pool::PooledBuffer;
pub use memory_pool::BufferPoolStats;
pub use memory_pool::ScratchSpaceManager;
pub use memory_pool::ScratchSpace;
pub use memory_pool::ScratchStats;
pub use memory_pool::MemoryManager;
pub use memory_pool::MemoryManagerConfig;
pub use memory_pool::MemoryManagerStats;
pub use memory_pool::CACHE_LINE_SIZE;
pub use memory_pool::DEFAULT_ALIGNMENT;
pub use paged_attention::PagedAttention;
pub use paged_attention::PagedAttentionConfig;
pub use paged_attention::PageTable;
pub use paged_attention::PageBlock;
pub use policy_store::PolicyStore;
pub use policy_store::PolicyEntry;
pub use policy_store::PolicyType;
pub use policy_store::QuantizationPolicy;
pub use policy_store::RouterPolicy;
pub use session::SessionManager;
pub use session::Session;
pub use session::SessionConfig;
pub use session_index::SessionIndex;
pub use session_index::SessionState;
pub use session_index::KvCacheReference;
pub use sona::SonaIntegration;
pub use sona::SonaConfig;
pub use sona::LearningLoop;
pub use claude_flow::ClaudeFlowAgent;
pub use claude_flow::ClaudeFlowTask;
pub use claude_flow::AgentRouter;
pub use claude_flow::AgentType;
pub use claude_flow::RoutingDecision as AgentRoutingDecision;
pub use claude_flow::TaskClassifier;
pub use claude_flow::TaskType;
pub use claude_flow::ClassificationResult;
pub use claude_flow::FlowOptimizer;
pub use claude_flow::OptimizationConfig;
pub use claude_flow::OptimizationResult;
pub use claude_flow::HnswRouter;
pub use claude_flow::HnswRouterConfig;
pub use claude_flow::HnswRouterStats;
pub use claude_flow::HnswRoutingResult;
pub use claude_flow::HnswDistanceMetric;
pub use claude_flow::TaskPattern;
pub use claude_flow::HybridRouter;
pub use claude_flow::ClaudeModel;
pub use claude_flow::MessageRole;
pub use claude_flow::ContentBlock;
pub use claude_flow::Message;
pub use claude_flow::ClaudeRequest;
pub use claude_flow::ClaudeResponse;
pub use claude_flow::UsageStats;
pub use claude_flow::StreamToken;
pub use claude_flow::StreamEvent as ClaudeStreamEvent;
pub use claude_flow::QualityMonitor;
pub use claude_flow::ResponseStreamer;
pub use claude_flow::StreamStats;
pub use claude_flow::ContextWindow;
pub use claude_flow::ContextManager;
pub use claude_flow::AgentState;
pub use claude_flow::AgentContext;
pub use claude_flow::WorkflowStep;
pub use claude_flow::WorkflowResult;
pub use claude_flow::StepResult;
pub use claude_flow::AgentCoordinator;
pub use claude_flow::CoordinatorStats;
pub use claude_flow::CostEstimator;
pub use claude_flow::LatencyTracker;
pub use claude_flow::LatencySample;
pub use claude_flow::LatencyStats as ClaudeLatencyStats;
pub use claude_flow::ComplexityFactors;
pub use claude_flow::ComplexityWeights;
pub use claude_flow::ComplexityScore;
pub use claude_flow::TaskComplexityAnalyzer;
pub use claude_flow::AnalyzerStats as ModelAnalyzerStats;
pub use claude_flow::SelectionCriteria;
pub use claude_flow::ModelRoutingDecision;
pub use claude_flow::ModelSelector;
pub use claude_flow::SelectorStats;
pub use claude_flow::ModelRouter;
pub use claude_flow::HooksIntegration;
pub use claude_flow::HooksConfig;
pub use claude_flow::PreTaskInput;
pub use claude_flow::PreTaskResult;
pub use claude_flow::PostTaskInput;
pub use claude_flow::PostTaskResult;
pub use claude_flow::PreEditInput;
pub use claude_flow::PreEditResult;
pub use claude_flow::PostEditInput;
pub use claude_flow::PostEditResult;
pub use claude_flow::SessionState as HooksSessionState;
pub use claude_flow::SessionEndResult;
pub use claude_flow::SessionMetrics;
pub use claude_flow::PatternMatch;
pub use claude_flow::QualityAssessment;
pub use claude_flow::LearningMetrics;
pub use optimization::InferenceMetrics;
pub use optimization::MetricsCollector;
pub use optimization::MetricsSnapshot;
pub use optimization::MovingAverage;
pub use optimization::LatencyHistogram;
pub use optimization::RealtimeOptimizer;
pub use optimization::RealtimeConfig;
pub use optimization::BatchSizeStrategy;
pub use optimization::KvCachePressurePolicy;
pub use optimization::TokenBudgetAllocation;
pub use optimization::SpeculativeConfig;
pub use optimization::OptimizationDecision;
pub use optimization::SonaLlm;
pub use optimization::SonaLlmConfig;
pub use optimization::TrainingSample;
pub use optimization::AdaptationResult;
pub use optimization::LearningLoopStats;
pub use optimization::ConsolidationStrategy;
pub use optimization::OptimizationTrigger;
pub use tokenizer::RuvTokenizer;
pub use tokenizer::ChatMessage;
pub use tokenizer::ChatTemplate;
pub use tokenizer::Role;
pub use tokenizer::TokenizerSpecialTokens;
pub use tokenizer::StreamingDecodeBuffer;
pub use speculative::SpeculativeDecoder;
pub use speculative::SpeculativeConfig as SpeculativeDecodingConfig;
pub use speculative::SpeculativeStats;
pub use speculative::AtomicSpeculativeStats;
pub use speculative::VerificationResult;
pub use speculative::SpeculationTree;
pub use speculative::TreeNode;
pub use speculative::softmax;
pub use speculative::log_softmax;
pub use speculative::sample_from_probs;
pub use speculative::top_k_filter;
pub use speculative::top_p_filter;
pub use witness_log::WitnessLog;
pub use witness_log::WitnessEntry;
pub use witness_log::LatencyBreakdown;
pub use witness_log::RoutingDecision;
pub use witness_log::AsyncWriteConfig;
pub use witness_log::WitnessLogStats;
pub use gguf::GgufFile;
pub use gguf::GgufModelLoader;
pub use gguf::GgufHeader;
pub use gguf::GgufValue;
pub use gguf::GgufQuantType;
pub use gguf::TensorInfo;
pub use gguf::QuantizedTensor;
pub use gguf::ModelConfig as GgufModelConfig;
pub use gguf::GgufLoader;
pub use gguf::LoadConfig;
pub use gguf::LoadProgress;
pub use gguf::LoadedWeights;
pub use gguf::LoadedTensor;
pub use gguf::TensorCategory;
pub use gguf::TensorNameMapper;
pub use gguf::StreamingLoader;
pub use gguf::ModelInitializer;
pub use gguf::ModelWeights;
pub use gguf::LayerWeights;
pub use gguf::WeightTensor;
pub use gguf::QuantizedWeight;
pub use gguf::ProgressModelBuilder;
pub use hub::ModelDownloader;
pub use hub::DownloadConfig;
pub use hub::DownloadProgress;
pub use hub::DownloadError;
pub use hub::ChecksumVerifier;
pub use hub::ModelUploader;
pub use hub::UploadConfig;
pub use hub::UploadProgress;
pub use hub::UploadError;
pub use hub::ModelMetadata;
pub use hub::RuvLtraRegistry;
pub use hub::ModelInfo as HubModelInfo;
pub use hub::ModelSize;
pub use hub::QuantizationLevel;
pub use hub::HardwareRequirements;
pub use hub::get_model_info;
pub use hub::ModelCard;
pub use hub::ModelCardBuilder;
pub use hub::TaskType as HubTaskType;
pub use hub::Framework;
pub use hub::License;
pub use hub::DatasetInfo;
pub use hub::MetricResult;
pub use hub::ProgressBar;
pub use hub::ProgressIndicator;
pub use hub::ProgressStyle;
pub use hub::ProgressCallback;
pub use hub::MultiProgress;
pub use hub::HubError;
pub use hub::default_cache_dir;
pub use hub::get_hf_token;
pub use serving::InferenceRequest;
pub use serving::RequestId;
pub use serving::Priority;
pub use serving::RequestState;
pub use serving::RunningRequest;
pub use serving::CompletedRequest;
pub use serving::FinishReason;
pub use serving::TokenOutput;
pub use serving::BatchedRequest;
pub use serving::BatchStats;
pub use serving::ScheduledBatch;
pub use serving::IterationPlan;
pub use serving::PrefillTask;
pub use serving::DecodeTask;
pub use serving::TokenBudget;
pub use serving::KvCacheManager;
pub use serving::KvCachePoolConfig;
pub use serving::KvCacheAllocation;
pub use serving::KvCacheManagerStats;
pub use serving::ContinuousBatchScheduler;
pub use serving::IterationScheduler;
pub use serving::SchedulerConfig;
pub use serving::SchedulerStats;
pub use serving::RequestQueue;
pub use serving::PreemptionMode;
pub use serving::PriorityPolicy;
pub use serving::ServingEngine;
pub use serving::ServingEngineConfig;
pub use serving::ServingMetrics;
pub use serving::GenerationResult;
pub use quantize::RuvltraQuantizer;
pub use quantize::QuantConfig;
pub use quantize::TargetFormat;
pub use quantize::quantize_ruvltra_q4;
pub use quantize::quantize_ruvltra_q5;
pub use quantize::quantize_ruvltra_q8;
pub use quantize::dequantize_for_ane;
pub use quantize::estimate_memory_q4;
pub use quantize::estimate_memory_q5;
pub use quantize::estimate_memory_q8;
pub use quantize::MemoryEstimate;
pub use quantize::Q4KMBlock;
pub use quantize::Q5KMBlock;
pub use quantize::Q8Block;
pub use quantize::QuantProgress;
pub use quantize::QuantStats;
pub use training::ClaudeTaskDataset;
pub use training::ClaudeTaskExample;
pub use training::TaskCategory;
pub use training::TaskMetadata;
pub use training::ComplexityLevel;
pub use training::DomainType;
pub use training::DatasetConfig;
pub use training::AugmentationConfig;
pub use training::DatasetGenerator;
pub use training::DatasetStats;
pub use training::GrpoConfig;
pub use training::GrpoOptimizer;
pub use training::GrpoSample;
pub use training::GrpoStats;
pub use training::GrpoUpdateResult;
pub use training::GrpoBatch;
pub use training::SampleGroup;
pub use training::McpToolTrainer;
pub use training::McpTrainingConfig;
pub use training::ToolTrajectory;
pub use training::TrajectoryStep;
pub use training::TrajectoryBuilder;
pub use training::StepBuilder;
pub use training::TrajectoryMetadata;
pub use training::TrainingResult;
pub use training::TrainingStats;
pub use training::TrainingCheckpoint;
pub use training::EvaluationMetrics;
pub use training::ToolCallDataset;
pub use training::ToolCallExample;
pub use training::ToolDatasetConfig;
pub use training::ToolDatasetStats;
pub use training::McpToolDef;
pub use training::ToolParam;
pub use training::ParamType;
pub use training::DifficultyLevel;
pub use training::DifficultyWeights;
pub use training::McpToolCategory;
pub use models::RuvLtraConfig;
pub use models::AneOptimization;
pub use models::QuantizationType;
pub use models::MemoryLayout;
pub use models::RuvLtraModel;
pub use models::RuvLtraAttention;
pub use models::RuvLtraMLP;
pub use models::RuvLtraDecoderLayer;
pub use models::RuvLtraModelInfo;
pub use models::AneDispatcher;
pub use capabilities::RuvectorCapabilities;
pub use capabilities::HNSW_AVAILABLE;
pub use capabilities::ATTENTION_AVAILABLE;
pub use capabilities::GRAPH_AVAILABLE;
pub use capabilities::GNN_AVAILABLE;
pub use capabilities::SONA_AVAILABLE;
pub use capabilities::SIMD_AVAILABLE;
pub use capabilities::PARALLEL_AVAILABLE;
pub use capabilities::gate_feature;
pub use capabilities::gate_feature_or;
pub use ruvector_integration::RuvectorIntegration;
pub use ruvector_integration::IntegrationConfig;
pub use ruvector_integration::IntegrationStats;
pub use ruvector_integration::UnifiedIndex;
pub use ruvector_integration::VectorMetadata;
pub use ruvector_integration::IndexStats;
pub use ruvector_integration::SearchResultWithMetadata;
pub use ruvector_integration::IntelligenceLayer;
pub use ruvector_integration::IntelligentRoutingDecision;
pub use ruvector_integration::IntelligenceLayerStats;
pub use quality::QualityMetrics;
pub use quality::QualityWeights;
pub use quality::QualityDimension;
pub use quality::QualitySummary;
pub use quality::TrendDirection;
pub use quality::QualityScoringEngine;
pub use quality::ScoringConfig;
pub use quality::ScoringContext;
pub use quality::QualityHistory;
pub use quality::ComparisonResult;
pub use quality::TrendAnalysis;
pub use quality::ImprovementRecommendation;
pub use quality::CoherenceValidator;
pub use quality::CoherenceConfig;
pub use quality::SemanticConsistencyResult;
pub use quality::ContradictionResult;
pub use quality::CoherenceViolation;
pub use quality::LogicalFlowResult;
pub use quality::DiversityAnalyzer;
pub use quality::DiversityConfig;
pub use quality::DiversityResult;
pub use quality::DiversificationSuggestion;
pub use quality::ModeCollapseResult;
pub use quality::SchemaValidator;
pub use quality::JsonSchemaValidator;
pub use quality::TypeValidator;
pub use quality::RangeValidator;
pub use quality::FormatValidator;
pub use quality::CombinedValidator;
pub use quality::ValidationResult;
pub use quality::ValidationError;
pub use quality::ValidationCombinator;
pub use context::AgenticMemory;
pub use context::AgenticMemoryConfig;
pub use context::MemoryType;
pub use context::WorkingMemory;
pub use context::WorkingMemoryConfig;
pub use context::TaskContext;
pub use context::ScratchpadEntry;
pub use context::AttentionWeights;
pub use context::EpisodicMemory;
pub use context::EpisodicMemoryConfig;
pub use context::Episode;
pub use context::EpisodeMetadata;
pub use context::EpisodeTrajectory;
pub use context::CompressedEpisode;
pub use context::IntelligentContextManager;
pub use context::ContextManagerConfig;
pub use context::PreparedContext;
pub use context::PriorityScorer;
pub use context::ContextElement;
pub use context::ElementPriority;
pub use context::SemanticToolCache;
pub use context::SemanticCacheConfig;
pub use context::CachedToolResult;
pub use context::CacheStats;
pub use context::ClaudeFlowMemoryBridge;
pub use context::ClaudeFlowBridgeConfig;
pub use context::SyncResult;
pub use reflection::ReflectiveAgent;
pub use reflection::ReflectionStrategy;
pub use reflection::ReflectionConfig;
pub use reflection::RetryConfig;
pub use reflection::ExecutionContext;
pub use reflection::ExecutionResult;
pub use reflection::Reflection;
pub use reflection::PreviousAttempt;
pub use reflection::BaseAgent;
pub use reflection::ReflectiveAgentStats;
pub use reflection::ConfidenceChecker;
pub use reflection::ConfidenceConfig;
pub use reflection::ConfidenceLevel;
pub use reflection::WeakPoint;
pub use reflection::RevisionResult;
pub use reflection::ConfidenceCheckRecord;
pub use reflection::ConfidenceFactorWeights;
pub use reflection::WeaknessType;
pub use reflection::ErrorPatternLearner;
pub use reflection::ErrorPatternLearnerConfig;
pub use reflection::ErrorPattern;
pub use reflection::ErrorCluster;
pub use reflection::RecoveryStrategy;
pub use reflection::RecoverySuggestion;
pub use reflection::ErrorCategory;
pub use reflection::RecoveryOutcome;
pub use reflection::SimilarError;
pub use reflection::ErrorLearnerStats;
pub use reflection::Perspective;
pub use reflection::CorrectnessChecker;
pub use reflection::CompletenessChecker;
pub use reflection::ConsistencyChecker;
pub use reflection::CritiqueResult;
pub use reflection::CritiqueIssue;
pub use reflection::IssueCategory;
pub use reflection::UnifiedCritique;
pub use reflection::PerspectiveConfig;
pub use reasoning_bank::ReasoningBank;
pub use reasoning_bank::ReasoningBankConfig;
pub use reasoning_bank::ReasoningBankStats;
pub use reasoning_bank::Trajectory as ReasoningTrajectory;
pub use reasoning_bank::TrajectoryStep as ReasoningTrajectoryStep;
pub use reasoning_bank::TrajectoryRecorder;
pub use reasoning_bank::TrajectoryId;
pub use reasoning_bank::StepOutcome;
pub use reasoning_bank::PatternStore;
pub use reasoning_bank::PatternStoreConfig;
pub use reasoning_bank::Pattern;
pub use reasoning_bank::PatternCategory;
pub use reasoning_bank::PatternSearchResult;
pub use reasoning_bank::PatternStats;
pub use reasoning_bank::Verdict as ReasoningVerdict;
pub use reasoning_bank::RootCause;
pub use reasoning_bank::VerdictAnalyzer;
pub use reasoning_bank::FailurePattern as VerdictFailurePattern;
pub use reasoning_bank::RecoveryStrategy as VerdictRecoveryStrategy;
pub use reasoning_bank::PatternConsolidator;
pub use reasoning_bank::ConsolidationConfig;
pub use reasoning_bank::FisherInformation;
pub use reasoning_bank::ImportanceScore;
pub use reasoning_bank::MemoryDistiller;
pub use reasoning_bank::DistillationConfig;
pub use reasoning_bank::CompressedTrajectory;
pub use reasoning_bank::KeyLesson;
pub use rlm::RlmConfig;
pub use rlm::RecursiveConfig;
pub use rlm::RecursiveConfigBuilder;
pub use rlm::AggregationStrategy;
pub use rlm::ConfigValidationError;
pub use rlm::DecompositionConfig;
pub use rlm::RlmController;
pub use rlm::RlmStats;
pub use rlm::RlmStatsSnapshot;
pub use rlm::QueryResult;
pub use rlm::MemoryEntry as RlmMemoryEntry;
pub use rlm::MemoryMetadata as RlmMemoryEntryMetadata;
pub use rlm::SourceAttribution;
pub use rlm::ControllerTokenUsage;
pub use rlm::QueryDecomposer;
pub use rlm::DecompositionResult;
pub use rlm::DecomposerStats;
pub use rlm::DecomposerStatsSnapshot;
pub use rlm::DecomposerStrategy;
pub use rlm::DecomposerSubQuery;
pub use rlm::QueryType;
pub use rlm::AnswerSynthesizer;
pub use rlm::SynthesisResult;
pub use rlm::RlmEnvironment;
pub use rlm::NativeEnvironment;
pub use rlm::EnvironmentConfig;
pub use rlm::EnvironmentType;
pub use rlm::RlmMemory;
pub use rlm::MemoryConfig as RlmMemoryConfig;
pub use rlm::MemorySearchResult as RlmMemorySearchResult;
pub use rlm::MemoryStoreEntry;
pub use rlm::MemoryStoreMetadata;
pub use rlm::LlmBackendTrait;
pub use rlm::MemoryStore as RlmMemoryStore;
pub use rlm::TraitsGenerationParams;
pub use rlm::TraitsGenerationOutput;
pub use rlm::TraitsFinishReason;
pub use rlm::TraitsMemorySpan;
pub use rlm::TraitsMemoryMetadata;
pub use rlm::TraitsQueryContext;
pub use rlm::TraitsQueryDecomposition;
pub use rlm::TraitsRlmAnswer;
pub use rlm::TraitsSubAnswer;
pub use rlm::TraitsSubQuery;
pub use rlm::TraitsDecompositionStrategy;
pub use rlm::TraitsTokenUsage;
pub use rlm::RlmModelInfo;
pub use rlm::MemoryId;
pub use rlm::MemorySpan;
pub use rlm::QueryId;
pub use rlm::AnswerId;
pub use rlm::Query;
pub use rlm::QueryConstraints;
pub use rlm::QueryContext;
pub use rlm::QueryDecomposition;
pub use rlm::DecompositionStrategy;
pub use rlm::SubQuery;
pub use rlm::SubAnswer;
pub use rlm::RlmAnswer;
pub use rlm::GenerationParams as RlmGenerationParams;
pub use rlm::GenerationOutput as RlmGenerationOutput;
pub use rlm::FinishReason as RlmFinishReason;
pub use rlm::TokenUsage as RlmTokenUsage;
pub use rlm::RuvLtraRlmBackend;
pub use rlm::RuvLtraRlmConfig;
pub use rlm::RuvLtraEnvironment;
pub use rlm::RuvLtraEnvConfig;
pub use rlm::KvCache as RlmKvCache;
pub use rlm::KvCacheEntry as RlmKvCacheEntry;
pub use rlm::KvCacheStats as RlmKvCacheStats;
pub use rlm::EmbeddingPooling;
pub use rlm::RuvLtraBackendStats;
pub use rlm::RuvLtraMemoryStore;
pub use types::*;
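
Several entries in the list above use `pub use ... as ...`, for example `claude_flow::RoutingDecision as AgentRoutingDecision` next to `witness_log::RoutingDecision`. This is the usual way to flatten modules into the crate root when two modules define a type with the same name. The sketch below shows the pattern in isolation. The module bodies and the `agent` and `model` fields are placeholders invented for the example and say nothing about the real types.

```rust
// Stand-in modules: each defines its own `RoutingDecision`.
mod claude_flow {
    #[derive(Debug, PartialEq)]
    pub struct RoutingDecision {
        pub agent: &'static str, // hypothetical field
    }
}

mod witness_log {
    #[derive(Debug, PartialEq)]
    pub struct RoutingDecision {
        pub model: &'static str, // hypothetical field
    }
}

// One type keeps its name at the root; the other is exported under an alias,
// so both can be named from the root without a collision.
pub use claude_flow::RoutingDecision as AgentRoutingDecision;
pub use witness_log::RoutingDecision;

fn main() {
    let agent_choice = AgentRoutingDecision { agent: "coder" };
    let logged_choice = RoutingDecision { model: "small" };

    // The alias is only a second path to the same type.
    let same: claude_flow::RoutingDecision = agent_choice;
    println!("{} / {}", same.agent, logged_choice.model);
}
```

Without the alias, the second `pub use` of `RoutingDecision` would fail to compile because the name would be defined twice in the same namespace.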

Modules§

adapter_manager
LoRA Adapter Manager
autodetect
Intelligent Auto-Detection System for RuvLLM
backends
LLM inference backends for RuvLLM
capabilities
Ruvector Capabilities Detection
claude_flow
Claude Flow Integration for RuvLTRA
context
Context Management System for RuvLLM
error
Error types for RuvLLM
evaluation
RuvLLM Evaluation Harness
gguf
GGUF Model Format Loader for RuvLLM
hub
HuggingFace Hub integration for RuvLTRA model management
kernels
NEON-Optimized LLM Kernels for Mac M4 Pro
kv_cache
Two-Tier KV Cache Implementation
lora
MicroLoRA Fine-tuning Pipeline for Real-time Per-request Adaptation
memory_pool
Memory Pool and Arena Allocator for High-Performance Inference
models
Model Architectures for RuvLLM
optimization
Real-time Optimization System for RuvLLM
paged_attention
Paged Attention Mechanism
policy_store
Policy Memory Store
quality
Multi-dimensional Quality Scoring Framework for RuvLLM
quantize
Quantization Pipeline for RuvLTRA Models
reasoning_bank
ReasoningBank - Production-grade learning from Claude trajectories
reflection
Self-Reflection Architecture for RuvLLM
rlm
Recursive Language Model (RLM) Integration
ruvector_integration
Ruvector Integration Layer
serving
Continuous Batching Serving Module
session
Session State Management
session_index
Session State Index
sona
SONA Learning Integration for RuvLLM
speculative
Speculative Decoding for Accelerated Inference
tokenizer
Tokenizer Integration for RuvLLM
training
Training Module
types
Common types used across RuvLLM
witness_log
Witness Log Index

Macros§

with_attention
with_gnn
with_graph
with_hnsw
Feature availability check macros for conditional compilation
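
The signatures of these macros are not shown on this page. As a rough illustration of what a feature availability macro for conditional compilation can look like, here is a sketch under stated assumptions: `with_hnsw_sketch` is a hypothetical stand-in, the Cargo feature name `hnsw` is assumed, and the two-argument form (enabled expression, fallback expression) is a guess rather than the crate's actual interface.

```rust
/// Hypothetical stand-in for a feature-check macro (not the crate's macro).
/// Expands to the first expression when the assumed `hnsw` feature is
/// enabled, and to the second expression otherwise.
macro_rules! with_hnsw_sketch {
    ($enabled:expr, $fallback:expr) => {{
        #[cfg(feature = "hnsw")]
        {
            $enabled
        }
        #[cfg(not(feature = "hnsw"))]
        {
            $fallback
        }
    }};
}

/// Pick a search strategy at compile time.
fn search_strategy() -> &'static str {
    with_hnsw_sketch!("hnsw", "linear-scan")
}

fn main() {
    println!("search strategy: {}", search_strategy());
}
```

Because the inactive branch is removed before type checking, it may refer to items that only exist when the feature is on, which is the main advantage over a runtime `if cfg!(...)` check.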

Structs§

RuvLLMConfig
RuvLLM engine configuration.
RuvLLMEngine
Main RuvLLM engine for LLM inference with intelligent memory.