AllSource Core - High-Performance Event Store
AI-native event store built in Rust with columnar storage, schema validation, event replay, and stream processing
๐ฏ What is AllSource?
AllSource is a high-performance event store designed for modern event-sourcing and CQRS architectures. Built with a polyglot architecture:
- Rust Core (this service): High-performance event store engine with columnar storage
- Go Control Plane (
services/control-plane): Orchestration, monitoring, and management layer
The Rust core provides blazing-fast event ingestion (469K events/sec) and sub-microsecond queries, while the Go control plane handles cluster coordination and operational tasks.
Current Version: v0.7.1 ยท crates.io ยท docs.rs
Installation
allsource-core is distributed as a library crate via crates.io.
Add to Your Project
Or add to your Cargo.toml (pin to minor version for stability):
[]
# allsource-core: High-performance event store
# Pin to minor version - allows patch updates only
= "0.7"
Version Pinning Best Practice: We recommend
"0.7"(minor version) rather than"0.7.1"(exact) or"0"(major only). This allows automatic patch updates while avoiding breaking changes.
Usage Patterns
- Embedded Library - Import directly into your Rust application
- Standalone Server - Build your own binary using the library
- Serverless - Deploy in AWS Lambda or similar environments
โจ Features
๐ Core Event Store (v0.1)
- Immutable Event Log: Append-only storage with complete audit trail
- Time-Travel Queries: Query entity state as of any timestamp
- Concurrent Indexing: Lock-free indexing using
DashMapfor O(1) lookups - Real-time Projections: Built-in materialized views with custom projection support
- High Performance: 469K+ events/sec throughput, sub-millisecond queries
- Type-Safe API: Strong typing with Rust's ownership system
๐พ Persistence & Durability (v0.2)
- Parquet Columnar Storage: Apache Arrow-based storage for analytics
- Write-Ahead Log (WAL): Crash recovery with full durability guarantees
- Snapshot System: Point-in-time snapshots with automatic optimization
- Automatic Compaction: Background file merging for storage efficiency
- WebSocket Streaming: Real-time event broadcasting to connected clients
- Advanced Analytics: Event frequency, correlation analysis, statistical summaries
๐ Schema Registry (v0.5)
- JSON Schema Validation: Enforce event contracts at ingestion time
- Schema Versioning: Automatic version management with compatibility checking
- Compatibility Modes: Backward, Forward, Full, or None
- Breaking Change Prevention: Validate schema evolution before deployment
- Subject Organization: Group schemas by domain or event type
๐ Event Replay & Projections (v0.5)
- Point-in-Time Replay: Replay events from any timestamp
- Projection Rebuilding: Reconstruct materialized views with progress tracking
- Batch Processing: Configurable batch sizes for optimal performance
- Async Execution: Non-blocking background replay operations
- Cancellable Operations: Stop replays gracefully with proper cleanup
- Progress Metrics: Real-time statistics (events/sec, percentage complete)
๐ Security & Multi-tenancy (v0.7)
- Authentication: JWT tokens with RBAC, API keys for services
- Authorization: Role-based access control (Admin, Operator, Reader)
- Rate Limiting: Per-tenant configurable limits
- IP Filtering: Global allowlist/blocklist support
- Audit Logging: Comprehensive audit trail with PostgreSQL support
- Tenant Isolation: Repository-level isolation with quotas
- Serverless Ready: Graceful shutdown, PORT env var, optimized containers
โก Stream Processing (v0.5)
- 6 Built-in Operators:
- Filter: eq, ne, gt, lt, contains operations
- Map: Transform field values (uppercase, lowercase, trim, math)
- Reduce: Aggregations (count, sum, avg, min, max) with grouping
- Window: Time-based aggregations (tumbling, sliding, session)
- Enrich: External data lookup and enrichment
- Branch: Conditional event routing
- Stateful Processing: Thread-safe state management for aggregations
- Window Buffers: Automatic time-based event eviction
- Pipeline Statistics: Track processing metrics per pipeline
- Integrated Processing: Events flow through pipelines during ingestion
๐๏ธ Clean Architecture
The codebase follows a strict layered architecture for maintainability and testability:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Infrastructure Layer โ
โ (HTTP handlers, WebSocket, persistence, security) โ
โ infrastructure::web, infrastructure::persistence, โ
โ infrastructure::security, infrastructure::repositories โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ Application Layer โ
โ (Use cases, services, DTOs) โ
โ application::use_cases, application::services, โ
โ application::dto โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโค
โ Domain Layer โ
โ (Entities, value objects, repository traits) โ
โ domain::entities, domain::value_objects, โ
โ domain::repositories โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Module Organization:
| Layer | Path | Contents |
|---|---|---|
| Domain | domain::entities |
Event, Tenant, Schema, Projection, AuditEvent, EventStream |
| Domain | domain::value_objects |
EntityId, TenantId, EventType, PartitionKey |
| Domain | domain::repositories |
Repository trait definitions (no implementations) |
| Application | application::use_cases |
IngestEvent, QueryEvents, ManageTenant, ManageSchema |
| Application | application::services |
AuditLogger, Pipeline, Replay, Analytics |
| Application | application::dto |
Request/Response DTOs for API boundaries |
| Infrastructure | infrastructure::web |
HTTP handlers, WebSocket, API routes |
| Infrastructure | infrastructure::persistence |
Storage, WAL, Snapshots, Compaction, Index |
| Infrastructure | infrastructure::security |
Auth, Middleware, Rate Limiting |
| Infrastructure | infrastructure::repositories |
In-memory, PostgreSQL, RocksDB implementations |
Dependency Rule: Inner layers never depend on outer layers. Domain has zero external dependencies.
๐๏ธ SierraDB-Inspired Production Patterns (NEW)
Based on battle-tested patterns from SierraDB, a production-grade event store:
PartitionKey - Fixed Partition Architecture
- โ 32 fixed partitions for single-node deployment (scalable to 1024+)
- โ Consistent hashing ensures same entity always maps to same partition
- โ Sequential writes within partitions for ordering guarantees
- โ Ready for horizontal scaling with node assignment
- โ 6 comprehensive tests covering distribution and consistency
EventStream - Gapless Version Guarantees
- โ Watermark system tracks "highest continuously confirmed sequence"
- โ Prevents gaps that would break event sourcing guarantees
- โ Optimistic locking prevents concurrent modification conflicts
- โ Version numbers start at 1 and increment sequentially
- โ 9 tests covering versioning, concurrency, and gap detection
EventStreamRepository - Infrastructure Implementation
- โ Thread-safe in-memory implementation with parking_lot RwLock
- โ Partition-aware stream storage and retrieval
- โ Watermark tracking and gapless verification
- โ Optimistic locking enforcement at repository level
- โ 8 comprehensive tests covering all operations
StorageIntegrity - Corruption Prevention
- โ SHA-256 checksums for data integrity
- โ WAL segment verification
- โ Parquet file integrity checking
- โ Batch verification with progress reporting
- โ 8 tests covering checksums and file verification
7-Day Stress Tests - Production Hardening
- โ Continuous ingestion tests (7 days, 1 hour, 5 minutes configs)
- โ Memory leak detection
- โ Corruption detection over time
- โ Partition balance monitoring
- โ 4 tests + configurable long-running tests
Why These Patterns?
| Pattern | SierraDB's Lesson | Our Benefit |
|---|---|---|
| Fixed Partitions | Sequential writes enable gapless sequences | Horizontal scaling without complex coordination |
| Gapless Versions | Watermark prevents data gaps | Consistent event sourcing guarantees |
| Optimistic Locking | Detect concurrent modifications | Safe concurrent access without heavy locks |
Implemented:
- โ Storage integrity checksums (prevent silent corruption)
- โ 7-day continuous stress tests (production confidence)
Coming Next:
- โก Zero-copy deserialization (performance optimization)
- ๐ Batch processing optimizations
- ๐ Lock-free data structures
๐ Performance Benchmarks
Measured on Apple Silicon M-series (release build):
| Operation | Throughput/Latency | Details |
|---|---|---|
| Event Ingestion | 442-469K events/sec | Single-threaded |
| Entity Query | 11.9 ฮผs | Indexed lookup |
| Type Query | 2.4 ms | Cross-entity scan |
| State Reconstruction | 3.5 ฮผs | With snapshots |
| State Reconstruction | 3.8 ฮผs | Without snapshots |
| Concurrent Writes (8 workers) | 8.0 ms/batch | 100 events/batch |
| Parquet Batch Write | 3.5 ms | 1000 events |
| Snapshot Creation | 130 ฮผs | Per entity |
| WAL Sync Writes | 413 ms | 100 syncs |
Test Coverage: 250 tests - 100% passing
- Domain Layer: 177 tests (Value Objects, Entities, Business Rules, SierraDB Patterns)
- PartitionKey: 6 tests (consistent hashing, distribution, node assignment)
- EventStream: 9 tests (gapless versioning, optimistic locking, watermarks)
- Application Layer: 20 tests (Use Cases, DTOs)
- Infrastructure Layer: 53 tests (API, Storage, Services, Repository Implementations)
- InMemoryEventStreamRepository: 8 tests (SierraDB pattern implementation)
- StorageIntegrity: 8 tests (checksum verification, WAL/Parquet integrity)
- 7-Day Stress Tests: 4 tests (long-running corruption detection)
๐ง API Endpoints (38 Total)
Core Event Store
# Health check
# Ingest event
# Query events
&limit=100
# Entity state
# Statistics
WebSocket Streaming (v0.2)
# Real-time event stream
Analytics (v0.2)
# Event frequency analysis
&bucket_size=3600
# Statistical summary
# Event correlation
&event_b=order.placed
Snapshots (v0.2)
# Create snapshot
# List snapshots
# Get latest snapshot
Compaction (v0.2)
# Trigger manual compaction
# Get compaction stats
Schema Registry (v0.5)
# Register schema
# List subjects
# Get schema
# List versions
# Validate event
# Set compatibility mode
Event Replay (v0.5)
# Start replay
# List replays
# Get progress
# Cancel replay
# Delete replay
Stream Processing Pipelines (v0.5)
# Register pipeline
# List pipelines
# Get pipeline
# Remove pipeline
# Get all stats
# Get pipeline stats
# Reset pipeline state
๐ Project Structure (Clean Architecture)
Following Clean Architecture principles with clear separation of concerns:
services/core/src/
โโโ main.rs # Application entry point
โโโ lib.rs # Library exports
โโโ error.rs # Error types and Result
โ
โโโ domain/ # ๐๏ธ DOMAIN LAYER (Business Logic)
โ โโโ entities/ # Core business entities
โ โ โโโ event.rs # Event entity (162 tests)
โ โ โโโ tenant.rs # Multi-tenancy entity
โ โ โโโ schema.rs # Schema registry entity
โ โ โโโ projection.rs # Projection entity
โ โ โโโ event_stream.rs # ๐ Gapless event stream (9 tests)
โ โโโ value_objects/ # Self-validating value objects
โ โโโ tenant_id.rs # Tenant identifier
โ โโโ event_type.rs # Event type validation
โ โโโ entity_id.rs # Entity identifier
โ โโโ partition_key.rs # ๐ Fixed partitioning (6 tests)
โ
โโโ application/ # ๐ฏ APPLICATION LAYER (Use Cases)
โ โโโ dto/ # Data Transfer Objects
โ โ โโโ event_dto.rs # Event request/response DTOs
โ โ โโโ tenant_dto.rs # Tenant DTOs
โ โ โโโ schema_dto.rs # Schema DTOs
โ โ โโโ projection_dto.rs # Projection DTOs
โ โโโ use_cases/ # Application business logic
โ โโโ ingest_event.rs # Event ingestion (3 tests)
โ โโโ query_events.rs # Event queries (4 tests)
โ โโโ manage_tenant.rs # Tenant management (5 tests)
โ โโโ manage_schema.rs # Schema operations (4 tests)
โ โโโ manage_projection.rs # Projection ops (4 tests)
โ
โโโ infrastructure/ # ๐ง INFRASTRUCTURE LAYER (Technical)
โโโ repositories/ # ๐ Repository implementations (SierraDB)
โ โโโ in_memory_event_stream_repository.rs # Thread-safe, partitioned (8 tests)
โโโ persistence/ # ๐ Storage integrity (SierraDB)
โ โโโ storage_integrity.rs # Checksum verification (8 tests)
โโโ api.rs # REST API endpoints (38 endpoints)
โโโ store.rs # Event store implementation
โโโ storage.rs # Parquet columnar storage
โโโ wal.rs # Write-ahead log
โโโ snapshot.rs # Snapshot management
โโโ compaction.rs # Storage compaction
โโโ index.rs # High-performance indexing
โโโ projection.rs # Real-time projections
โโโ analytics.rs # Analytics engine
โโโ websocket.rs # WebSocket streaming
โโโ schema.rs # Schema validation service
โโโ replay.rs # Event replay engine
โโโ pipeline.rs # Stream processing
โโโ backup.rs # Backup management
โโโ auth.rs # Authentication/Authorization
โโโ rate_limit.rs # Rate limiting
โโโ tenant.rs # Tenant service
โโโ metrics.rs # Metrics collection
โโโ middleware.rs # HTTP middleware
โโโ config.rs # Configuration
โโโ tenant_api.rs # Tenant API handlers
โโโ auth_api.rs # Auth API handlers
tests/
โโโ integration_tests.rs # End-to-end tests
โโโ stress_tests/ # ๐ Long-running stress tests (SierraDB)
โโโ seven_day_stress.rs # 7-day corruption detection (4 tests)
benches/
โโโ performance_benchmarks.rs # Performance benchmarks
๐๏ธ Clean Architecture Benefits
Domain Layer (Core Business Logic)
- โ Pure business rules with zero external dependencies
- โ Value Objects enforce invariants at construction time
- โ Entities contain rich domain behavior
- โ SierraDB patterns for production-grade event streaming
- โ 177 comprehensive domain tests (including 15 SierraDB tests)
Application Layer (Orchestration)
- โ Use Cases coordinate domain entities
- โ DTOs isolate external contracts from domain
- โ Clear input/output boundaries
- โ 20 use case tests covering all scenarios
Infrastructure Layer (Technical Details)
- โ Pluggable implementations (can swap storage, APIs)
- โ Framework and library dependencies isolated
- โ Domain and Application layers remain pure
- โ 37 infrastructure integration tests
๐ Quick Start
Option 1: Use as a Library (Recommended)
Add to your Cargo.toml:
[]
= "0.7" # Pin to minor version
Or:
Option 2: Build from Source
# Clone the repository
# Build
# Run tests
# Run benchmarks
Running the Server
# Development mode
# Production mode (optimized)
# With debug logging
RUST_LOG=debug
# Custom port (modify main.rs)
# Default: 0.0.0.0:8080
Example: Ingest Events
Example: Query Events
# Get all events for an entity
# Time-travel query
# Query by type
Example: Register Schema (v0.5)
Example: Start Replay (v0.5)
Example: Create Pipeline (v0.5)
โ Quality Gates
AllSource Core enforces strict quality gates to maintain code quality, consistency, and reliability.
See: QUALITY_GATES.md for complete documentation.
Quick Start
# Run all quality gates (recommended before commit)
# Auto-fix formatting and sorting
# Full CI pipeline locally
Quality Checks Enforced
- โ Code Formatting (rustfmt) - Consistent style across codebase
- โ Code Quality (clippy) - Zero warnings, catch anti-patterns
- โ Dependency Sorting (cargo-sort) - Alphabetically sorted Cargo.toml
- โ Test Coverage - All tests must pass
- โ Build Verification - Release build must succeed
CI/CD Integration
Quality gates run automatically on:
- Every push to
mainordevelop - Every pull request
- Enforced before merge
Workflow: .github/workflows/rust-quality.yml
Configuration Files
.clippy.toml- Clippy linter configuration (MSRV: 1.70.0)rustfmt.toml- Code formatting rules (max width: 100)cargo-sort.toml- Dependency sorting configurationMakefile- Quality gate commands
๐งช Testing
# Run all tests
# Run unit tests only
# Run integration tests
# Run specific test
# Run with output
# Run with logging
RUST_LOG=debug
๐ Benchmarking
# Run all benchmarks
# Run specific benchmark suite
# View results
๐๏ธ Architecture
System Architecture
AllSource uses a polyglot architecture with specialized services:
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Go Control Plane (Port 8081) โ
โ โข Cluster Management โ
โ โข Metrics Aggregation โ
โ โข Operation Orchestration โ
โ โข Health Monitoring โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ HTTP Client
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Rust Event Store (Port 8080) โ
โ โข Event Ingestion (469K/sec) โ
โ โข Query Engine (<12ฮผs) โ
โ โข Schema Registry โ
โ โข Stream Processing โ
โ โข Event Replay โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ
โผ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Storage Layer โ
โ โข Parquet (Columnar) โ
โ โข WAL (Durability) โ
โ โข Snapshots (Point-in-time) โ
โ โข In-Memory Indexes โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Event Flow
When an event is ingested:
- Validation - Check event integrity
- WAL Write - Durable write-ahead log (v0.2)
- Indexing - Update entity/type indexes
- Projections - Update materialized views
- Pipelines - Real-time stream processing (v0.5)
- Parquet Storage - Columnar persistence (v0.2)
- In-Memory Store - Fast access layer
- WebSocket Broadcast - Real-time streaming (v0.2)
- Auto-Snapshots - Create snapshots if needed (v0.2)
Storage Architecture
Storage Layer:
โโโ In-Memory Events (Vec<Event>)
โโโ DashMap Indexes (entity_id, event_type)
โโโ Parquet Files (columnar storage)
โโโ WAL Segments (append-only logs)
โโโ Snapshots (point-in-time state)
Concurrency Model
- Lock-free Indexes: DashMap for entity/type lookups
- RwLock: parking_lot RwLock for event list
- Atomic Counters: Lock-free statistics tracking
- Async Runtime: Tokio for background tasks (replay, compaction)
๐ฏ Roadmap
โ v0.1 - Core Event Store (COMPLETED)
- In-memory event storage
- DashMap-based indexing
- Entity state reconstruction
- Basic projections
- REST API
- Query by entity/type/time
โ v0.2 - Persistence & Durability (COMPLETED)
- Parquet columnar storage
- Write-ahead log (WAL)
- Snapshot system
- Automatic compaction
- WebSocket streaming
- Advanced analytics
โ v0.5 - Schema & Processing (COMPLETED)
- Schema registry with versioning
- Event replay engine
- Projection rebuilding
- Stream processing pipelines
- Stateful aggregations
- Window operations
โ v0.6 - Clean Architecture Refactoring (PHASE 2 COMPLETED)
- Phase 1: Domain Layer (162 tests)
- Value Objects (TenantId, EventType, EntityId)
- Domain Entities (Event, Tenant, Schema, Projection)
- Business rule enforcement
- Self-validating types
- Phase 2: Application Layer (20 tests)
- DTOs for all operations
- Tenant management use cases
- Schema management use cases
- Projection management use cases
- Clean separation from domain
- Phase 3: Infrastructure Layer (IN PROGRESS)
- Repository pattern implementation
- API layer refactoring
- Service layer extraction
- Dependency injection
โ v0.7 - Security & Cloud-Native (COMPLETED)
- Security Infrastructure
- JWT authentication with role-based access control (RBAC)
- API key authentication for service-to-service
- Rate limiting per tenant
- IP filtering (global allowlist/blocklist)
- Request ID tracking for audit trail
- Multi-tenancy with Quotas
- Tenant isolation at repository level
- Configurable quotas (events/day, storage, rate limits)
- Tenant activation/deactivation
- Audit Logging
- Comprehensive audit events for all operations
- PostgreSQL audit repository
- In-memory audit repository for testing
- Serverless & Cloud Support
- Graceful shutdown (SIGTERM handling)
- PORT environment variable support
- Fly.io deployment configurations
- Google Cloud Run configurations
- Helm charts for Kubernetes
- Performance Optimizations
- Arena-based memory pooling (2-5ns allocations)
- SIMD JSON parsing
- Lock-free batch processing
๐ v0.8 - Advanced Features (PLANNED)
- Multi-tenancy support (Complete)
- Audit logging (Complete)
- Event encryption at rest
- Retention policies
- Data archival
- Backup/restore enhancements
๐ v1.0 - Distributed & Cloud-Native (PLANNED)
- Distributed replication
- Multi-region support
- Consensus protocol (Raft)
- Arrow Flight RPC
- Kubernetes operators
- Cloud-native deployment
- Horizontal scaling
- Load balancing
๐ฎ Future Enhancements (BACKLOG)
- GraphQL API
- WASM plugin system
- Change Data Capture (CDC)
- Time-series optimization
- Machine learning integrations
- Real-time anomaly detection
- Event sourcing templates
- Visual query builder
๐ฌ Technical Decisions
Why Rust?
- Performance: Zero-cost abstractions, no GC pauses
- Safety: Ownership model prevents data races
- Concurrency: Fearless concurrency with Send/Sync
- Ecosystem: Excellent libraries (Tokio, Axum, Arrow)
Why DashMap?
- Lock-free concurrent HashMap
- Better than
RwLock<HashMap>for reads - Sharded internally for minimal contention
- O(1) lookups with concurrent access
Why Apache Arrow/Parquet?
- Industry-standard columnar format
- Zero-copy data access
- SIMD-accelerated operations
- Excellent compression ratios
- Interoperable with DataFusion, Polars, DuckDB
Why Tokio + Axum?
- High-performance async runtime
- Type-safe request handlers
- Excellent ecosystem integration
- Low overhead, fast routing
Why parking_lot?
- Smaller and faster than std::sync::RwLock
- No poisoning - simpler error handling
- Better performance under contention
- Widely used in production Rust
๐ Usage Examples
Programmatic API
use ;
use json;
// Create store
let store = new;
// Ingest events
let event = new;
store.ingest?;
// Query events
let request = QueryEventsRequest ;
let events = store.query?;
// Reconstruct state
let state = store.reconstruct_state?;
println!;
// Time-travel query
let timestamp = now - hours;
let past_state = store.reconstruct_state?;
println!;
Custom Projection
use Projection;
use Event;
use Value;
use RwLock;
use HashMap;
๐ Troubleshooting
Port Already in Use
# Find process using port 8080
# Kill process
Slow Performance
# Always use release mode for benchmarks
# Check system resources
Memory Issues
# Monitor memory usage
RUST_LOG=allsource_core=debug
# Reduce batch sizes in config
# Adjust snapshot_config.max_events_before_snapshot
Test Failures
# Clean build
# Check for stale processes
# Verbose test output
๐ Resources
- Event Sourcing Pattern
- CQRS Pattern
- Apache Arrow
- Parquet Format
- Tokio Async Runtime
- Axum Web Framework
๐ค Contributing
Contributions are welcome! Areas of interest:
- Performance optimizations
- Additional projection types
- Query optimization
- Documentation improvements
- Bug fixes and tests
๐ License
MIT License - see LICENSE file for details
AllSource Core - Event sourcing, done right
Built with ๐ฆ Rust | Clean Architecture | Made for Production
Version 0.7.1 | 469K events/sec | 470+ tests passing | Security & Cloud-Native