AllSource Core - High-Performance Event Store
AI-native event store built in Rust with columnar storage, schema validation, event replay, and stream processing
ðŊ What is AllSource?
AllSource is a high-performance event store designed for modern event-sourcing and CQRS architectures. Built with a polyglot architecture:
- Rust Core (this service): High-performance event store engine with columnar storage
- Go Control Plane (
services/control-plane): Orchestration, monitoring, and management layer
The Rust core provides blazing-fast event ingestion (469K events/sec) and sub-microsecond queries, while the Go control plane handles cluster coordination and operational tasks.
Current Version: v0.6.0 (Rust Core - Clean Architecture Phase 2) | v0.1.0 (Go Control Plane)
âĻ Features
ð Core Event Store (v0.1)
- Immutable Event Log: Append-only storage with complete audit trail
- Time-Travel Queries: Query entity state as of any timestamp
- Concurrent Indexing: Lock-free indexing using
DashMapfor O(1) lookups - Real-time Projections: Built-in materialized views with custom projection support
- High Performance: 469K+ events/sec throughput, sub-millisecond queries
- Type-Safe API: Strong typing with Rust's ownership system
ðū Persistence & Durability (v0.2)
- Parquet Columnar Storage: Apache Arrow-based storage for analytics
- Write-Ahead Log (WAL): Crash recovery with full durability guarantees
- Snapshot System: Point-in-time snapshots with automatic optimization
- Automatic Compaction: Background file merging for storage efficiency
- WebSocket Streaming: Real-time event broadcasting to connected clients
- Advanced Analytics: Event frequency, correlation analysis, statistical summaries
ð Schema Registry (v0.5)
- JSON Schema Validation: Enforce event contracts at ingestion time
- Schema Versioning: Automatic version management with compatibility checking
- Compatibility Modes: Backward, Forward, Full, or None
- Breaking Change Prevention: Validate schema evolution before deployment
- Subject Organization: Group schemas by domain or event type
ð Event Replay & Projections (v0.5)
- Point-in-Time Replay: Replay events from any timestamp
- Projection Rebuilding: Reconstruct materialized views with progress tracking
- Batch Processing: Configurable batch sizes for optimal performance
- Async Execution: Non-blocking background replay operations
- Cancellable Operations: Stop replays gracefully with proper cleanup
- Progress Metrics: Real-time statistics (events/sec, percentage complete)
⥠Stream Processing (v0.5)
- 6 Built-in Operators:
- Filter: eq, ne, gt, lt, contains operations
- Map: Transform field values (uppercase, lowercase, trim, math)
- Reduce: Aggregations (count, sum, avg, min, max) with grouping
- Window: Time-based aggregations (tumbling, sliding, session)
- Enrich: External data lookup and enrichment
- Branch: Conditional event routing
- Stateful Processing: Thread-safe state management for aggregations
- Window Buffers: Automatic time-based event eviction
- Pipeline Statistics: Track processing metrics per pipeline
- Integrated Processing: Events flow through pipelines during ingestion
ðïļ SierraDB-Inspired Production Patterns (NEW)
Based on battle-tested patterns from SierraDB, a production-grade event store:
PartitionKey - Fixed Partition Architecture
- â 32 fixed partitions for single-node deployment (scalable to 1024+)
- â Consistent hashing ensures same entity always maps to same partition
- â Sequential writes within partitions for ordering guarantees
- â Ready for horizontal scaling with node assignment
- â 6 comprehensive tests covering distribution and consistency
EventStream - Gapless Version Guarantees
- â Watermark system tracks "highest continuously confirmed sequence"
- â Prevents gaps that would break event sourcing guarantees
- â Optimistic locking prevents concurrent modification conflicts
- â Version numbers start at 1 and increment sequentially
- â 9 tests covering versioning, concurrency, and gap detection
EventStreamRepository - Infrastructure Implementation
- â Thread-safe in-memory implementation with parking_lot RwLock
- â Partition-aware stream storage and retrieval
- â Watermark tracking and gapless verification
- â Optimistic locking enforcement at repository level
- â 8 comprehensive tests covering all operations
StorageIntegrity - Corruption Prevention
- â SHA-256 checksums for data integrity
- â WAL segment verification
- â Parquet file integrity checking
- â Batch verification with progress reporting
- â 8 tests covering checksums and file verification
7-Day Stress Tests - Production Hardening
- â Continuous ingestion tests (7 days, 1 hour, 5 minutes configs)
- â Memory leak detection
- â Corruption detection over time
- â Partition balance monitoring
- â 4 tests + configurable long-running tests
Why These Patterns?
| Pattern | SierraDB's Lesson | Our Benefit |
|---|---|---|
| Fixed Partitions | Sequential writes enable gapless sequences | Horizontal scaling without complex coordination |
| Gapless Versions | Watermark prevents data gaps | Consistent event sourcing guarantees |
| Optimistic Locking | Detect concurrent modifications | Safe concurrent access without heavy locks |
Implemented:
- â Storage integrity checksums (prevent silent corruption)
- â 7-day continuous stress tests (production confidence)
Coming Next:
- ⥠Zero-copy deserialization (performance optimization)
- ð Batch processing optimizations
- ð Lock-free data structures
ð Performance Benchmarks
Measured on Apple Silicon M-series (release build):
| Operation | Throughput/Latency | Details |
|---|---|---|
| Event Ingestion | 442-469K events/sec | Single-threaded |
| Entity Query | 11.9 Ξs | Indexed lookup |
| Type Query | 2.4 ms | Cross-entity scan |
| State Reconstruction | 3.5 Ξs | With snapshots |
| State Reconstruction | 3.8 Ξs | Without snapshots |
| Concurrent Writes (8 workers) | 8.0 ms/batch | 100 events/batch |
| Parquet Batch Write | 3.5 ms | 1000 events |
| Snapshot Creation | 130 Ξs | Per entity |
| WAL Sync Writes | 413 ms | 100 syncs |
Test Coverage: 250 tests - 100% passing
- Domain Layer: 177 tests (Value Objects, Entities, Business Rules, SierraDB Patterns)
- PartitionKey: 6 tests (consistent hashing, distribution, node assignment)
- EventStream: 9 tests (gapless versioning, optimistic locking, watermarks)
- Application Layer: 20 tests (Use Cases, DTOs)
- Infrastructure Layer: 53 tests (API, Storage, Services, Repository Implementations)
- InMemoryEventStreamRepository: 8 tests (SierraDB pattern implementation)
- StorageIntegrity: 8 tests (checksum verification, WAL/Parquet integrity)
- 7-Day Stress Tests: 4 tests (long-running corruption detection)
ð§ API Endpoints (38 Total)
Core Event Store
# Health check
# Ingest event
# Query events
&limit=100
# Entity state
# Statistics
WebSocket Streaming (v0.2)
# Real-time event stream
Analytics (v0.2)
# Event frequency analysis
&bucket_size=3600
# Statistical summary
# Event correlation
&event_b=order.placed
Snapshots (v0.2)
# Create snapshot
# List snapshots
# Get latest snapshot
Compaction (v0.2)
# Trigger manual compaction
# Get compaction stats
Schema Registry (v0.5)
# Register schema
# List subjects
# Get schema
# List versions
# Validate event
# Set compatibility mode
Event Replay (v0.5)
# Start replay
# List replays
# Get progress
# Cancel replay
# Delete replay
Stream Processing Pipelines (v0.5)
# Register pipeline
# List pipelines
# Get pipeline
# Remove pipeline
# Get all stats
# Get pipeline stats
# Reset pipeline state
ð Project Structure (Clean Architecture)
Following Clean Architecture principles with clear separation of concerns:
services/core/src/
âââ main.rs # Application entry point
âââ lib.rs # Library exports
âââ error.rs # Error types and Result
â
âââ domain/ # ðïļ DOMAIN LAYER (Business Logic)
â âââ entities/ # Core business entities
â â âââ event.rs # Event entity (162 tests)
â â âââ tenant.rs # Multi-tenancy entity
â â âââ schema.rs # Schema registry entity
â â âââ projection.rs # Projection entity
â â âââ event_stream.rs # ð Gapless event stream (9 tests)
â âââ value_objects/ # Self-validating value objects
â âââ tenant_id.rs # Tenant identifier
â âââ event_type.rs # Event type validation
â âââ entity_id.rs # Entity identifier
â âââ partition_key.rs # ð Fixed partitioning (6 tests)
â
âââ application/ # ðŊ APPLICATION LAYER (Use Cases)
â âââ dto/ # Data Transfer Objects
â â âââ event_dto.rs # Event request/response DTOs
â â âââ tenant_dto.rs # Tenant DTOs
â â âââ schema_dto.rs # Schema DTOs
â â âââ projection_dto.rs # Projection DTOs
â âââ use_cases/ # Application business logic
â âââ ingest_event.rs # Event ingestion (3 tests)
â âââ query_events.rs # Event queries (4 tests)
â âââ manage_tenant.rs # Tenant management (5 tests)
â âââ manage_schema.rs # Schema operations (4 tests)
â âââ manage_projection.rs # Projection ops (4 tests)
â
âââ infrastructure/ # ð§ INFRASTRUCTURE LAYER (Technical)
âââ repositories/ # ð Repository implementations (SierraDB)
â âââ in_memory_event_stream_repository.rs # Thread-safe, partitioned (8 tests)
âââ persistence/ # ð Storage integrity (SierraDB)
â âââ storage_integrity.rs # Checksum verification (8 tests)
âââ api.rs # REST API endpoints (38 endpoints)
âââ store.rs # Event store implementation
âââ storage.rs # Parquet columnar storage
âââ wal.rs # Write-ahead log
âââ snapshot.rs # Snapshot management
âââ compaction.rs # Storage compaction
âââ index.rs # High-performance indexing
âââ projection.rs # Real-time projections
âââ analytics.rs # Analytics engine
âââ websocket.rs # WebSocket streaming
âââ schema.rs # Schema validation service
âââ replay.rs # Event replay engine
âââ pipeline.rs # Stream processing
âââ backup.rs # Backup management
âââ auth.rs # Authentication/Authorization
âââ rate_limit.rs # Rate limiting
âââ tenant.rs # Tenant service
âââ metrics.rs # Metrics collection
âââ middleware.rs # HTTP middleware
âââ config.rs # Configuration
âââ tenant_api.rs # Tenant API handlers
âââ auth_api.rs # Auth API handlers
tests/
âââ integration_tests.rs # End-to-end tests
âââ stress_tests/ # ð Long-running stress tests (SierraDB)
âââ seven_day_stress.rs # 7-day corruption detection (4 tests)
benches/
âââ performance_benchmarks.rs # Performance benchmarks
ðïļ Clean Architecture Benefits
Domain Layer (Core Business Logic)
- â Pure business rules with zero external dependencies
- â Value Objects enforce invariants at construction time
- â Entities contain rich domain behavior
- â SierraDB patterns for production-grade event streaming
- â 177 comprehensive domain tests (including 15 SierraDB tests)
Application Layer (Orchestration)
- â Use Cases coordinate domain entities
- â DTOs isolate external contracts from domain
- â Clear input/output boundaries
- â 20 use case tests covering all scenarios
Infrastructure Layer (Technical Details)
- â Pluggable implementations (can swap storage, APIs)
- â Framework and library dependencies isolated
- â Domain and Application layers remain pure
- â 37 infrastructure integration tests
ð Quick Start
Installation
# Clone the repository
# Build
# Run tests
# Run benchmarks
Running the Server
# Development mode
# Production mode (optimized)
# With debug logging
RUST_LOG=debug
# Custom port (modify main.rs)
# Default: 0.0.0.0:8080
Example: Ingest Events
Example: Query Events
# Get all events for an entity
# Time-travel query
# Query by type
Example: Register Schema (v0.5)
Example: Start Replay (v0.5)
Example: Create Pipeline (v0.5)
â Quality Gates
AllSource Core enforces strict quality gates to maintain code quality, consistency, and reliability.
See: QUALITY_GATES.md for complete documentation.
Quick Start
# Run all quality gates (recommended before commit)
# Auto-fix formatting and sorting
# Full CI pipeline locally
Quality Checks Enforced
- â Code Formatting (rustfmt) - Consistent style across codebase
- â Code Quality (clippy) - Zero warnings, catch anti-patterns
- â Dependency Sorting (cargo-sort) - Alphabetically sorted Cargo.toml
- â Test Coverage - All tests must pass
- â Build Verification - Release build must succeed
CI/CD Integration
Quality gates run automatically on:
- Every push to
mainordevelop - Every pull request
- Enforced before merge
Workflow: .github/workflows/rust-quality.yml
Configuration Files
.clippy.toml- Clippy linter configuration (MSRV: 1.70.0)rustfmt.toml- Code formatting rules (max width: 100)cargo-sort.toml- Dependency sorting configurationMakefile- Quality gate commands
ð§Š Testing
# Run all tests
# Run unit tests only
# Run integration tests
# Run specific test
# Run with output
# Run with logging
RUST_LOG=debug
ð Benchmarking
# Run all benchmarks
# Run specific benchmark suite
# View results
ðïļ Architecture
System Architecture
AllSource uses a polyglot architecture with specialized services:
âââââââââââââââââââââââââââââââââââââââââââ
â Go Control Plane (Port 8081) â
â âĒ Cluster Management â
â âĒ Metrics Aggregation â
â âĒ Operation Orchestration â
â âĒ Health Monitoring â
âââââââââââââââââââââââââââââââââââââââââââ
â HTTP Client
âž
âââââââââââââââââââââââââââââââââââââââââââ
â Rust Event Store (Port 8080) â
â âĒ Event Ingestion (469K/sec) â
â âĒ Query Engine (<12Ξs) â
â âĒ Schema Registry â
â âĒ Stream Processing â
â âĒ Event Replay â
âââââââââââââââââââââââââââââââââââââââââââ
â
âž
âââââââââââââââââââââââââââââââââââââââââââ
â Storage Layer â
â âĒ Parquet (Columnar) â
â âĒ WAL (Durability) â
â âĒ Snapshots (Point-in-time) â
â âĒ In-Memory Indexes â
âââââââââââââââââââââââââââââââââââââââââââ
Event Flow
When an event is ingested:
- Validation - Check event integrity
- WAL Write - Durable write-ahead log (v0.2)
- Indexing - Update entity/type indexes
- Projections - Update materialized views
- Pipelines - Real-time stream processing (v0.5)
- Parquet Storage - Columnar persistence (v0.2)
- In-Memory Store - Fast access layer
- WebSocket Broadcast - Real-time streaming (v0.2)
- Auto-Snapshots - Create snapshots if needed (v0.2)
Storage Architecture
Storage Layer:
âââ In-Memory Events (Vec<Event>)
âââ DashMap Indexes (entity_id, event_type)
âââ Parquet Files (columnar storage)
âââ WAL Segments (append-only logs)
âââ Snapshots (point-in-time state)
Concurrency Model
- Lock-free Indexes: DashMap for entity/type lookups
- RwLock: parking_lot RwLock for event list
- Atomic Counters: Lock-free statistics tracking
- Async Runtime: Tokio for background tasks (replay, compaction)
ðŊ Roadmap
â v0.1 - Core Event Store (COMPLETED)
- In-memory event storage
- DashMap-based indexing
- Entity state reconstruction
- Basic projections
- REST API
- Query by entity/type/time
â v0.2 - Persistence & Durability (COMPLETED)
- Parquet columnar storage
- Write-ahead log (WAL)
- Snapshot system
- Automatic compaction
- WebSocket streaming
- Advanced analytics
â v0.5 - Schema & Processing (COMPLETED)
- Schema registry with versioning
- Event replay engine
- Projection rebuilding
- Stream processing pipelines
- Stateful aggregations
- Window operations
â v0.6 - Clean Architecture Refactoring (PHASE 2 COMPLETED)
- Phase 1: Domain Layer (162 tests)
- Value Objects (TenantId, EventType, EntityId)
- Domain Entities (Event, Tenant, Schema, Projection)
- Business rule enforcement
- Self-validating types
- Phase 2: Application Layer (20 tests)
- DTOs for all operations
- Tenant management use cases
- Schema management use cases
- Projection management use cases
- Clean separation from domain
- Phase 3: Infrastructure Layer (IN PROGRESS)
- Repository pattern implementation
- API layer refactoring
- Service layer extraction
- Dependency injection
ð v0.7 - Performance & Optimization (PLANNED)
- Zero-copy deserialization optimization
- SIMD-accelerated queries
- Memory-mapped Parquet files
- Adaptive indexing strategies
- Query result caching
- Compression tuning
ð v0.8 - Advanced Features (PLANNED)
- Multi-tenancy support (Domain layer complete)
- Event encryption at rest
- Audit logging
- Retention policies
- Data archival
- Backup/restore (Partially implemented)
ð v1.0 - Distributed & Cloud-Native (PLANNED)
- Distributed replication
- Multi-region support
- Consensus protocol (Raft)
- Arrow Flight RPC
- Kubernetes operators
- Cloud-native deployment
- Horizontal scaling
- Load balancing
ðŪ Future Enhancements (BACKLOG)
- GraphQL API
- WASM plugin system
- Change Data Capture (CDC)
- Time-series optimization
- Machine learning integrations
- Real-time anomaly detection
- Event sourcing templates
- Visual query builder
ðŽ Technical Decisions
Why Rust?
- Performance: Zero-cost abstractions, no GC pauses
- Safety: Ownership model prevents data races
- Concurrency: Fearless concurrency with Send/Sync
- Ecosystem: Excellent libraries (Tokio, Axum, Arrow)
Why DashMap?
- Lock-free concurrent HashMap
- Better than
RwLock<HashMap>for reads - Sharded internally for minimal contention
- O(1) lookups with concurrent access
Why Apache Arrow/Parquet?
- Industry-standard columnar format
- Zero-copy data access
- SIMD-accelerated operations
- Excellent compression ratios
- Interoperable with DataFusion, Polars, DuckDB
Why Tokio + Axum?
- High-performance async runtime
- Type-safe request handlers
- Excellent ecosystem integration
- Low overhead, fast routing
Why parking_lot?
- Smaller and faster than std::sync::RwLock
- No poisoning - simpler error handling
- Better performance under contention
- Widely used in production Rust
ð Usage Examples
Programmatic API
use ;
use json;
// Create store
let store = new;
// Ingest events
let event = new;
store.ingest?;
// Query events
let request = QueryEventsRequest ;
let events = store.query?;
// Reconstruct state
let state = store.reconstruct_state?;
println!;
// Time-travel query
let timestamp = now - hours;
let past_state = store.reconstruct_state?;
println!;
Custom Projection
use Projection;
use Event;
use Value;
use RwLock;
use HashMap;
ð Troubleshooting
Port Already in Use
# Find process using port 8080
# Kill process
Slow Performance
# Always use release mode for benchmarks
# Check system resources
Memory Issues
# Monitor memory usage
RUST_LOG=allsource_core=debug
# Reduce batch sizes in config
# Adjust snapshot_config.max_events_before_snapshot
Test Failures
# Clean build
# Check for stale processes
# Verbose test output
ð Resources
- Event Sourcing Pattern
- CQRS Pattern
- Apache Arrow
- Parquet Format
- Tokio Async Runtime
- Axum Web Framework
ðĪ Contributing
Contributions are welcome! Areas of interest:
- Performance optimizations
- Additional projection types
- Query optimization
- Documentation improvements
- Bug fixes and tests
ð License
MIT License - see LICENSE file for details
AllSource Core - Event sourcing, done right
Built with ðĶ Rust | Clean Architecture | Made for Production
Version 0.6.0 | 469K events/sec | 219 tests passing | Phase 2 Complete