# Changelog
All notable changes to AllSource Core will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [0.8.0] - 2026-02-03
### Added
#### Clean Architecture Refactoring (v1.1)
**Domain Layer** - Complete domain-driven design implementation:
- **Value Objects**: `ArticleId`, `CreatorId`, `EmbeddingVector`, `EventId`, `ForkId`, `Money`, `ProjectionName`, `SchemaSubject`, `StreamName`, `TransactionId`, `Version`, `WalletAddress`
- **Entities**: `AccessToken`, `Creator`, `EventStoreFork`, `PaywallArticle`, `Transaction`
- **Aggregates**: `EventStream` with watermark tracking
**Application Layer** - Use cases and services:
- **Use Cases**: `ManageAccess`, `ManageArticle`, `ManageCreator`, `ManageFork`, `ProcessPayment`, `SemanticSearch`
- **Services**: `CreatorCoordinator`, `EventCoordinator`, `PaymentCoordinator`, `VectorSearch`
- **DTOs**: Complete DTO layer for all entities (AccessToken, Article, Common, Creator, Event, Filter, Fork, Transaction, VectorSearch)
**Infrastructure Layer** - Repository implementations:
- **Repositories**: In-memory implementations for AccessToken, Article, Creator, EventStream, Fork, Transaction, VectorSearch
- **RocksDB**: Event stream repository with persistence
- **Web Handlers**: Article, Creator, Fork, Payment handlers with full CRUD
- **DI Container**: `ServiceContainer` with `ContainerBuilder` for dependency injection
#### Production Hardening (SierraDB Patterns)
**Storage Integrity** (`storage_integrity.rs`):
- SHA-256 checksums for data verification
- Per-segment WAL integrity checks
- Per-file Parquet checksums
- Corruption detection on startup
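
The record-then-verify flow behind these checks can be sketched roughly as follows. This is a minimal illustration, not the contents of `storage_integrity.rs`: the `Manifest` type and its methods are hypothetical, and FNV-1a is used only as a dependency-free stand-in for SHA-256.

```rust
use std::collections::HashMap;

/// FNV-1a 64-bit hash. Stand-in only: the changelog entry specifies SHA-256.
fn checksum(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for byte in data {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Hypothetical manifest: segment name -> checksum recorded at write time.
struct Manifest {
    expected: HashMap<String, u64>,
}

impl Manifest {
    fn new() -> Self {
        Manifest { expected: HashMap::new() }
    }

    /// Called when a segment is written.
    fn record(&mut self, name: &str, data: &[u8]) {
        self.expected.insert(name.to_string(), checksum(data));
    }

    /// Called on startup: returns the sorted names of segments that are
    /// missing or whose bytes no longer match the recorded checksum.
    fn verify(&self, segments: &HashMap<String, Vec<u8>>) -> Vec<String> {
        let mut corrupted: Vec<String> = self
            .expected
            .iter()
            .filter(|(name, sum)| segments.get(*name).map(|d| checksum(d)) != Some(**sum))
            .map(|(name, _)| name.clone())
            .collect();
        corrupted.sort();
        corrupted
    }
}
```

A startup routine would call `verify` once per storage directory and refuse to serve, or trigger repair, if the returned list is non-empty.
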
**Partition Monitoring**:
- Per-partition event count tracking
- Partition write latency histograms
- Hot partition detection (alerts when a partition exceeds 2x the average load)
- Prometheus metrics exposure
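
The hot partition rule above (more than 2x the average load) reduces to a comparison against the mean. A minimal sketch, assuming load is measured as a per-partition event count; the function name and signature are hypothetical:

```rust
/// Returns the indices of partitions whose event count exceeds
/// `factor` times the mean count across all partitions.
fn hot_partitions(counts: &[u64], factor: f64) -> Vec<usize> {
    if counts.is_empty() {
        return Vec::new();
    }
    let total: u64 = counts.iter().sum();
    let average = total as f64 / counts.len() as f64;
    counts
        .iter()
        .enumerate()
        .filter(|(_, count)| (**count as f64) > factor * average)
        .map(|(index, _)| index)
        .collect()
}
```

With counts of `[100, 100, 100, 900]` the average is 300, so only partition 3 crosses the 2x threshold of 600.
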
**7-Day Stress Test Suite** (`tests/stress_tests/seven_day_stress.rs`):
- Configurable duration (7 days default, 1 hour short mode)
- 10K events/sec sustained load
- Memory leak detection
- Partition balance verification
- Watermark consistency checks
#### Native Search Capabilities
**Vector Search Engine** (`vector_search.rs`):
- Semantic similarity search using embeddings
- HNSW index for fast approximate nearest neighbor search
- Event embedding generation
- Configurable similarity thresholds
**Integration with MCP**:
- `semantic_search_events` tool for AI agents
- `hybrid_search` combining vector and keyword search
#### Web API Enhancements
**New Endpoints**:
- `POST /api/v1/articles` - Create article
- `GET /api/v1/articles/:id` - Get article
- `PUT /api/v1/articles/:id` - Update article
- `DELETE /api/v1/articles/:id` - Delete article
- `POST /api/v1/creators` - Create creator
- `GET /api/v1/creators/:id` - Get creator
- `POST /api/v1/forks` - Create event store fork
- `GET /api/v1/forks/:id` - Get fork state
- `POST /api/v1/payments` - Process payment
- `GET /api/v1/search/semantic` - Semantic search
#### Development Mode Support
**`ALLSOURCE_DEV_MODE` Environment Variable**:
- Bypass authentication and rate limiting for local development
- Auto-injects admin context (`dev-user`, `dev-tenant`, `Admin` role)
- Warning logged on startup when enabled
- Documented in `docs/SECURITY.md`
### Changed
- Refactored persistence layer to use repository pattern
- Updated WebSocket handlers for cleaner separation of concerns
- Improved schema validation with new `SchemaSubject` value object
- Enhanced pipeline processing with domain events
### Fixed
- **Middleware Order**: Fixed a critical bug where the rate limit middleware ran before the auth middleware
  - In Axum, layers added through successive `Router::layer` calls execute bottom-to-top (the last layer added runs first)
  - Rate limit middleware now correctly runs after auth middleware
  - Auth middleware populates `AuthContext` before the rate limiting checks run
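
The ordering rule is easy to get backwards, so here is a framework-free model of it. This is not Axum code: `layer` and `run_order` are hypothetical helpers that only mimic how each added layer wraps everything added before it.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Handler = Box<dyn Fn(&str)>;
type Log = Rc<RefCell<Vec<&'static str>>>;

/// Wraps `inner` so that `name` is logged before the request is passed on.
fn layer(name: &'static str, log: Log, inner: Handler) -> Handler {
    Box::new(move |request: &str| {
        log.borrow_mut().push(name);
        inner(request);
    })
}

/// Adds layers in the given order, sends one request through the stack,
/// and returns the order in which the layers and the handler ran.
fn run_order(layers_in_add_order: &[&'static str]) -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let handler_log = Rc::clone(&log);
    let mut stack: Handler =
        Box::new(move |_request: &str| handler_log.borrow_mut().push("handler"));
    for &name in layers_in_add_order {
        // Each new layer becomes the outermost wrapper.
        stack = layer(name, Rc::clone(&log), stack);
    }
    stack("GET /api/v1/events/query");
    let order = log.borrow().clone();
    order
}
```

Calling `run_order(&["rate_limit", "auth"])` runs `auth` first, then `rate_limit`, then the handler, which is the corrected order: the auth layer has to be added last so that it executes first.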
### Performance
- Maintained 726K events/sec throughput
- Clean architecture adds minimal overhead (~2-3%)
- Repository pattern enables easier testing and mocking
### Technical Details
- 15+ new domain value objects with validation
- 6+ new entities following DDD patterns
- 7+ new repository implementations
- Full test coverage for domain layer
- SOLID principles throughout
---
## [0.7.0] - 2025-12-06
### Added
#### Serverless & Cloud-Native Support
- **Graceful Shutdown**: Proper SIGTERM/SIGINT signal handling for serverless platforms
  - `shutdown_signal()` handler in `api_v1.rs`
  - `with_graceful_shutdown()` integration with Axum server
  - Clean shutdown logging for observability
- **PORT Environment Variable**: Standard serverless port configuration
  - Fallback chain: `ALLSOURCE_PORT` → `PORT` (Cloud Run, Fly.io standard)
  - `HOST` environment variable support
- **Optimized Dockerfile**: Serverless-ready container image
  - `cargo-chef` for optimal dependency caching
  - Stripped binaries for smaller images (~50% reduction)
  - `tini` init system for proper signal propagation
  - `MALLOC_ARENA_MAX=2` for reduced memory footprint
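
The port fallback chain above could be resolved along these lines. This is a sketch under assumptions: `resolve_port` is a hypothetical helper, and skipping an unparsable value in favour of the next variable is a design choice made here, not documented behaviour.

```rust
/// Resolves the listen port using the chain `ALLSOURCE_PORT` -> `PORT` -> default.
/// `lookup` abstracts the environment so the logic can be tested without
/// mutating process-wide state.
fn resolve_port<F>(lookup: F, default: u16) -> u16
where
    F: Fn(&str) -> Option<String>,
{
    ["ALLSOURCE_PORT", "PORT"]
        .iter()
        .filter_map(|&key| lookup(key))
        // Assumption: a value that is not a valid u16 is ignored.
        .find_map(|value| value.trim().parse::<u16>().ok())
        .unwrap_or(default)
}
```

At startup this would be called as `resolve_port(|key| std::env::var(key).ok(), 8080)`, where 8080 is the default port introduced in this release.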
#### Cloud Deployment Configurations
- **Fly.io**: `fly.toml` with auto-scaling and health checks
- **Google Cloud Run**: Knative service YAML with startup probes
- **Helm Charts**: Complete Kubernetes deployment (`deploy/helm/allsource/`)
  - Core deployment, service, PVC
  - Query Service deployment with secrets
  - Ingress, ServiceMonitor, PodDisruptionBudget
- **Kustomize**: Standalone K8s manifests (`deploy/k8s/`)
- **Docker Compose**: Full stack with Prometheus/Grafana monitoring
#### Projection State API (Query Service Integration)
- `GET /api/v1/projections` - List all projections
- `GET /api/v1/projections/:name` - Get projection details
- `GET /api/v1/projections/:name/:entity_id/state` - Get entity state
- `PUT /api/v1/projections/:name/:entity_id/state` - Save entity state
- `POST /api/v1/projections/:name/bulk` - Bulk get states
- DashMap-backed storage with **11.9 μs** access latency
### Fixed
- **PostgreSQL Audit Repository**: `IpAddr` type compatibility with sqlx
  - Converted to `Option<String>` for binding and reading
- **PostgreSQL Audit Repository**: Lifetime issue with `actor_id_only`
  - Fixed by converting to owned `String`
### Changed
- Container images now use non-root user (UID 1000)
- Health checks disabled in Dockerfile (platforms provide their own)
- Default port changed to 8080 for serverless compatibility
---
## [0.6.0] - 2025-12-05
### Added
#### SIMD-Accelerated JSON Parsing (`simd_json.rs`)
- Zero-copy JSON parsing using `simd-json` library
- 2-3x faster parsing with SIMD instructions (AVX2, SSE4.2, NEON)
- `SimdJsonParser` for high-throughput deserialization
- `ZeroCopyJson` for read-only field access without full deserialization
- `BatchEventParser` for efficient batch processing
- `SimdJsonStats` for tracking parsing throughput and errors
- **Performance: 824K events/sec, 112 MB/s throughput**
#### Lock-Free Data Structures (`lock_free/`)
- `LockFreeEventQueue`: MPMC queue using crossbeam `ArrayQueue`
  - ~10-20ns push/pop operations
  - **Performance: 41M push/sec, 3.4M pop/sec**
- `ShardedEventQueue`: Distributed queue for high contention scenarios
  - 16 shards by default for cache-line optimization
  - Batch push/pop operations
  - **Performance: 1.06M events/sec with 4 threads**
- `LockFreeMetrics`: Atomic counters for zero-contention monitoring
  - Min/max/avg latency tracking with CAS operations
  - ~5-10ns per metric update
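
Min/max tracking with CAS operations, as mentioned for `LockFreeMetrics`, generally follows the pattern below. The `LatencyStats` type is a hypothetical stand-in built only on `std::sync::atomic`; it does not reproduce the project's actual struct.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical lock-free latency tracker.
struct LatencyStats {
    count: AtomicU64,
    total_ns: AtomicU64,
    min_ns: AtomicU64,
    max_ns: AtomicU64,
}

impl LatencyStats {
    fn new() -> Self {
        LatencyStats {
            count: AtomicU64::new(0),
            total_ns: AtomicU64::new(0),
            min_ns: AtomicU64::new(u64::MAX),
            max_ns: AtomicU64::new(0),
        }
    }

    fn record(&self, latency_ns: u64) {
        self.count.fetch_add(1, Ordering::Relaxed);
        self.total_ns.fetch_add(latency_ns, Ordering::Relaxed);

        // CAS loop: retry until our value is stored or a smaller one is observed.
        let mut current = self.min_ns.load(Ordering::Relaxed);
        while latency_ns < current {
            match self.min_ns.compare_exchange_weak(
                current, latency_ns, Ordering::Relaxed, Ordering::Relaxed,
            ) {
                Ok(_) => break,
                Err(observed) => current = observed,
            }
        }

        // Same pattern for the maximum.
        let mut current = self.max_ns.load(Ordering::Relaxed);
        while latency_ns > current {
            match self.max_ns.compare_exchange_weak(
                current, latency_ns, Ordering::Relaxed, Ordering::Relaxed,
            ) {
                Ok(_) => break,
                Err(observed) => current = observed,
            }
        }
    }

    /// Average latency, or `None` before the first sample.
    fn avg_ns(&self) -> Option<u64> {
        let count = self.count.load(Ordering::Relaxed);
        if count == 0 {
            None
        } else {
            Some(self.total_ns.load(Ordering::Relaxed) / count)
        }
    }
}
```

The standard library also offers `AtomicU64::fetch_min` and `fetch_max`, which do the same job in one call; the explicit loop is shown here because it makes the compare-and-swap step visible.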
#### Batch Processing Pipeline (`batch_processor.rs`)
- `BatchProcessor`: High-throughput ingestion combining SIMD + lock-free queues
- Configurable batch sizes (1K-50K events)
- Three configuration presets:
  - `default()`: Balanced for general use
  - `high_throughput()`: Maximum events/sec (50K batch, 10M queue)
  - `low_latency()`: Quick responses (1K batch, 100K queue)
- `RawEventData`: Efficient event deserialization struct
- **Performance: 335K events/sec (single), 726K events/sec (4 threads)**
#### Arena Memory Pooling (`arena_pool.rs`)
- Thread-local arena pools using `bumpalo` allocator
- `get_arena()`: Get recycled arena from pool (~10-20ns)
- `PooledArena`: RAII wrapper with automatic pool return
- `ScopedArena`: Convenient scoped allocation pattern
- `SizedBufferPool`: Pre-allocated buffers for specific sizes
- 99%+ arena recycle rate
- **Performance: 28.5M allocations/sec**
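
The RAII return-to-pool idea behind `PooledArena` can be illustrated without `bumpalo`. In this sketch a plain `Vec<u8>` stands in for an arena, and `PooledBuffer`, `get_buffer`, and `pooled_count` are hypothetical names chosen for the example.

```rust
use std::cell::RefCell;
use std::ops::{Deref, DerefMut};

thread_local! {
    // Per-thread free list: never shared, so no locking is needed.
    static POOL: RefCell<Vec<Vec<u8>>> = RefCell::new(Vec::new());
}

/// RAII guard that hands its buffer back to the thread-local pool on drop.
struct PooledBuffer {
    buffer: Option<Vec<u8>>,
}

/// Takes a recycled buffer from the pool, or allocates a new one if it is empty.
fn get_buffer() -> PooledBuffer {
    let buffer = POOL.with(|pool| pool.borrow_mut().pop()).unwrap_or_default();
    PooledBuffer { buffer: Some(buffer) }
}

/// Number of idle buffers currently held by this thread's pool.
fn pooled_count() -> usize {
    POOL.with(|pool| pool.borrow().len())
}

impl Deref for PooledBuffer {
    type Target = Vec<u8>;
    fn deref(&self) -> &Vec<u8> {
        self.buffer.as_ref().expect("buffer is present until drop")
    }
}

impl DerefMut for PooledBuffer {
    fn deref_mut(&mut self) -> &mut Vec<u8> {
        self.buffer.as_mut().expect("buffer is present until drop")
    }
}

impl Drop for PooledBuffer {
    fn drop(&mut self) {
        if let Some(mut buffer) = self.buffer.take() {
            buffer.clear(); // discard contents, keep the allocation
            POOL.with(|pool| pool.borrow_mut().push(buffer));
        }
    }
}
```

Because the buffer is cleared rather than freed, the next `get_buffer()` call on the same thread reuses the existing allocation, which is where a high recycle rate comes from.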
#### Performance Validation Tests (`performance_test.rs`)
- Comprehensive benchmark suite for all optimizations
- Tests for SIMD parsing, queue throughput, batch processing
- Concurrent performance validation
- Sustained throughput testing
### Changed
- Added `simd-json = "0.14"` dependency for SIMD acceleration
- Added `bumpalo = "3.16"` dependency for arena allocation
- Added `From<SimdJsonError>` impl for `AllSourceError`
- Clean Architecture module organization maintained
### Performance
- **Lock-Free Queue (push)**: 41.1M ops/sec
- **Arena Allocations**: 28.5M allocs/sec
- **Lock-Free Queue (pop)**: 3.4M ops/sec
- **Sharded Queue (concurrent)**: 2.5M ops/sec
- **SIMD JSON Parsing**: 824K events/sec
- **Full Pipeline (4 threads)**: 726K events/sec
- **Sustained Throughput**: 418K events/sec
- **Batch Processor**: 335K events/sec
### Technical Details
- All new modules follow Clean Architecture patterns
- Full test coverage with 57+ new tests
- Debug and release mode performance thresholds
- Thread-safe implementations with `Send + Sync`
---
## [0.5.0] - 2025-10-20
### Added
#### Schema Registry
- JSON Schema-based event validation system
- Automatic schema versioning with compatibility checking
- 4 compatibility modes: None, Backward, Forward, Full
- Subject-based schema organization
- 6 new REST API endpoints for schema management:
  - `POST /api/v1/schemas` - Register schema
  - `GET /api/v1/schemas` - List subjects
  - `GET /api/v1/schemas/:subject` - Get schema
  - `GET /api/v1/schemas/:subject/versions` - List versions
  - `POST /api/v1/schemas/validate` - Validate event
  - `PUT /api/v1/schemas/:subject/compatibility` - Set compatibility mode
#### Event Replay Engine
- Point-in-time event replay functionality
- Projection rebuilding with progress tracking
- Configurable batch processing
- Async background execution using Tokio
- Cancellable replay operations
- 5 replay statuses: Pending, Running, Completed, Failed, Cancelled
- Real-time progress metrics (events/sec, percentage complete)
- 5 new REST API endpoints for replay management:
  - `POST /api/v1/replay` - Start replay
  - `GET /api/v1/replay` - List replays
  - `GET /api/v1/replay/:replay_id` - Get progress
  - `POST /api/v1/replay/:replay_id/cancel` - Cancel replay
  - `DELETE /api/v1/replay/:replay_id` - Delete replay
#### Stream Processing Pipelines
- 6 pipeline operators:
  - **Filter**: eq, ne, gt, lt, contains operations
  - **Map**: uppercase, lowercase, trim, multiply, add transformations
  - **Reduce**: count, sum, avg, min, max aggregations with grouping
  - **Window**: tumbling, sliding, session windows for time-based aggregations
  - **Enrich**: external data lookup and enrichment (placeholder)
  - **Branch**: conditional event routing
- Stateful processing with thread-safe state management
- Window buffers with automatic time-based eviction
- Pipeline statistics tracking
- Integrated pipeline processing into event ingestion flow
- 7 new REST API endpoints for pipeline management:
  - `POST /api/v1/pipelines` - Register pipeline
  - `GET /api/v1/pipelines` - List pipelines
  - `GET /api/v1/pipelines/:pipeline_id` - Get pipeline
  - `DELETE /api/v1/pipelines/:pipeline_id` - Remove pipeline
  - `GET /api/v1/pipelines/stats` - All pipeline stats
  - `GET /api/v1/pipelines/:pipeline_id/stats` - Pipeline stats
  - `PUT /api/v1/pipelines/:pipeline_id/reset` - Reset state
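
Of the window types listed for the **Window** operator, the tumbling window is the simplest: every event falls into exactly one fixed-size, non-overlapping bucket. A minimal sketch, assuming millisecond timestamps and a count aggregation; `tumbling_counts` is a hypothetical name, not the pipeline API:

```rust
use std::collections::BTreeMap;

/// Counts events per tumbling window. A timestamp belongs to the window
/// that starts at the largest multiple of `window_ms` not exceeding it.
fn tumbling_counts(timestamps_ms: &[u64], window_ms: u64) -> BTreeMap<u64, usize> {
    assert!(window_ms > 0, "window size must be positive");
    let mut windows = BTreeMap::new();
    for &ts in timestamps_ms {
        let window_start = ts - ts % window_ms;
        *windows.entry(window_start).or_insert(0) += 1;
    }
    windows
}
```

For timestamps `[0, 999, 1000, 1500, 3200]` and a 1000 ms window, the buckets starting at 0 and 1000 each hold two events and the bucket starting at 3000 holds one. A sliding window differs in that buckets overlap, so one event can be counted in several of them.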
### Changed
- Enhanced event ingestion flow to include pipeline processing
- Updated `ProjectionManager::get_projection()` to return cloned `Arc` instead of reference
- Improved event ingestion performance by 4-14% with pipeline integration optimizations
### Performance
- Ingestion: 442-469K events/sec (single-threaded)
- Entity query: 11.9 μs
- State reconstruction: 3.5 μs (with snapshots)
- 48 tests passing (33 unit + 15 integration)
---
## [0.2.0] - 2025-01-15
### Added
#### Persistent Storage
- Apache Parquet columnar storage for events
- Write-Ahead Log (WAL) for crash recovery and durability
- Automatic compaction with 3 strategies:
  - Size-based compaction
  - Count-based compaction
  - Age-based compaction
- Point-in-time snapshot system
- Automatic snapshot creation based on configurable thresholds
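
The three compaction strategies above map naturally onto an enum with one trigger check. The sketch below is illustrative only: `FileStats`, `CompactionStrategy`, and the threshold fields are hypothetical, and the actual trigger conditions may differ.

```rust
use std::time::Duration;

/// Hypothetical summary of the files that are candidates for compaction.
struct FileStats {
    total_bytes: u64,
    file_count: usize,
    oldest_file_age: Duration,
}

/// One variant per strategy; the threshold fields are assumptions.
enum CompactionStrategy {
    SizeBased { max_total_bytes: u64 },
    CountBased { max_files: usize },
    AgeBased { max_age: Duration },
}

impl CompactionStrategy {
    /// Returns true when the strategy's threshold has been exceeded.
    fn should_compact(&self, stats: &FileStats) -> bool {
        match self {
            CompactionStrategy::SizeBased { max_total_bytes } => {
                stats.total_bytes > *max_total_bytes
            }
            CompactionStrategy::CountBased { max_files } => stats.file_count > *max_files,
            CompactionStrategy::AgeBased { max_age } => stats.oldest_file_age > *max_age,
        }
    }
}
```

A background task could evaluate `should_compact` periodically and merge small files into larger ones whenever it returns `true`.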
#### Real-time Streaming
- WebSocket server for real-time event broadcasting
- Client connection management
- Event subscription and filtering
#### Advanced Analytics
- Event frequency analysis with time bucketing
- Event correlation analysis
- Statistical summaries (count, avg, min, max)
- Time-window aggregations
#### API Endpoints
- 18 new REST API endpoints:
  - WebSocket: `WS /api/v1/events/stream`
  - Analytics: `/api/v1/analytics/*` (3 endpoints)
  - Snapshots: `/api/v1/snapshots/*` (3 endpoints)
  - Compaction: `/api/v1/compaction/*` (2 endpoints)
### Changed
- Event ingestion now writes to WAL first for durability
- State reconstruction optimized with snapshot fallback
- Enhanced storage architecture with multiple layers
### Performance
- 10-15% improvement in ingestion throughput with Parquet batching
- 100x faster state reconstruction with snapshots
- Zero data loss on crashes with WAL
---
## [0.1.0] - 2024-12-01
### Added
#### Core Event Store
- In-memory event storage with `Vec<Event>`
- Immutable append-only event log
- Event ID generation with UUID v7
- ISO 8601 timestamp support
- JSON payload storage
#### Indexing System
- DashMap-based concurrent indexing
- Entity ID index (O(1) lookup)
- Event type index (O(1) lookup)
- Event ID index (direct access)
- Thread-safe concurrent updates
#### Query Engine
- Query by entity ID
- Query by event type
- Time-travel queries with `as_of` parameter
- Time range filtering with `since`/`until`
- Result limiting
- Entity state reconstruction
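
A time-travel query with `as_of` amounts to replaying only the events recorded up to a cutoff. The sketch below makes several simplifying assumptions: timestamps are plain milliseconds rather than ISO 8601 strings, each event sets one integer field, and the `Event` struct and `state_as_of` function are hypothetical.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical event: sets `field` to `value` on one entity.
struct Event {
    entity_id: String,
    timestamp_ms: u64,
    field: String,
    value: i64,
}

fn event(entity_id: &str, timestamp_ms: u64, field: &str, value: i64) -> Event {
    Event {
        entity_id: entity_id.to_string(),
        timestamp_ms,
        field: field.to_string(),
        value,
    }
}

/// Reconstructs an entity's state by folding its events in append order.
/// With `as_of_ms` set, events after the cutoff are ignored.
fn state_as_of(events: &[Event], entity_id: &str, as_of_ms: Option<u64>) -> HashMap<String, i64> {
    let mut state = HashMap::new();
    for event in events {
        let in_range = as_of_ms.map_or(true, |cutoff| event.timestamp_ms <= cutoff);
        if event.entity_id == entity_id && in_range {
            // Last write wins for each field.
            state.insert(event.field.clone(), event.value);
        }
    }
    state
}
```

Because the log is append-only and immutable, the same cutoff always yields the same state, which is what makes this kind of query reproducible.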
#### Projections
- Real-time projection system
- Built-in projections:
  - `EntitySnapshotProjection` (current state per entity)
  - `EventCounterProjection` (event type statistics)
- Custom projection trait for user-defined projections
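
A user-defined projection might look roughly like the sketch below. The trait shape shown here (`name` plus `apply`) is an assumption made for illustration; the actual trait signature is not given in this changelog, and `Event`, `EventTypeCounter`, and `replay` are hypothetical.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical event carrying only the field this example needs.
struct Event {
    event_type: String,
}

/// Assumed trait shape: a projection is updated once per ingested event.
trait Projection {
    fn name(&self) -> &str;
    fn apply(&mut self, event: &Event);
}

/// Example projection that counts events per type.
#[derive(Default)]
struct EventTypeCounter {
    counts: HashMap<String, u64>,
}

impl Projection for EventTypeCounter {
    fn name(&self) -> &str {
        "event_type_counter"
    }

    fn apply(&mut self, event: &Event) {
        *self.counts.entry(event.event_type.clone()).or_insert(0) += 1;
    }
}

/// Feeds a batch of events through a projection in order.
fn replay<P: Projection>(projection: &mut P, events: &[Event]) {
    for event in events {
        projection.apply(event);
    }
}
```

Keeping `apply` free of I/O means a projection can be rebuilt at any time by replaying the event log from the start.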
#### REST API
- 8 initial REST API endpoints:
  - `GET /health` - Health check
  - `POST /api/v1/events` - Ingest event
  - `GET /api/v1/events/query` - Query events
  - `GET /api/v1/entities/:entity_id/state` - Get entity state
  - `GET /api/v1/entities/:entity_id/snapshot` - Get snapshot
  - `GET /api/v1/stats` - System statistics
#### Error Handling
- Comprehensive error types
- HTTP status code mapping
- Type-safe error handling with `Result<T, AllSourceError>`
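
The combination of typed errors and HTTP status mapping can be sketched as follows. `StoreError` and its variants are hypothetical and do not mirror the real `AllSourceError`; the example only shows the general pattern of returning `Result<T, E>` and deriving a status code from the error variant.

```rust
use std::fmt;

/// Hypothetical error type with one variant per failure class.
#[derive(Debug)]
enum StoreError {
    NotFound(String),
    Validation(String),
    Storage(String),
}

impl StoreError {
    /// Maps each variant to the HTTP status a handler would return.
    fn status_code(&self) -> u16 {
        match self {
            StoreError::NotFound(_) => 404,
            StoreError::Validation(_) => 400,
            StoreError::Storage(_) => 500,
        }
    }
}

impl fmt::Display for StoreError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            StoreError::NotFound(id) => write!(f, "entity not found: {id}"),
            StoreError::Validation(msg) => write!(f, "invalid request: {msg}"),
            StoreError::Storage(msg) => write!(f, "storage failure: {msg}"),
        }
    }
}

impl std::error::Error for StoreError {}

/// Example of a fallible operation: parsing a `limit` query parameter.
fn parse_limit(raw: &str) -> Result<usize, StoreError> {
    raw.parse::<usize>()
        .map_err(|e| StoreError::Validation(format!("invalid limit '{raw}': {e}")))
}
```

Centralising the mapping in one `status_code` method keeps handlers free of status logic and means a new variant cannot be added without the compiler forcing a decision about its status.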
#### Testing
- 10+ unit tests
- 5+ integration tests
- Performance benchmarks with Criterion
### Performance
- 100K+ events/sec ingestion
- Sub-millisecond entity queries
- Concurrent read/write support
---
## [Unreleased]
### Planned for v0.9 - Production Hardening
- [ ] Dependency injection container completion (US-009)
- [ ] Storage integrity checks with checksums (US-010)
- [ ] Partition monitoring and alerting (US-011)
- [ ] 7-day continuous stress test suite (US-012)
- [ ] SIMD event filtering (US-025)
### Planned for v1.0 - Distributed & Cloud-Native
- [ ] Distributed replication (Raft consensus)
- [ ] Multi-region support
- [ ] Horizontal scaling
- [ ] Arrow Flight RPC
- [ ] Kubernetes operators
- [ ] Load balancing
- [ ] Prometheus metrics enhancements
- [ ] OpenTelemetry tracing improvements

**Target**: 1M+ events/sec (single node), 10M+ events/sec (distributed)
### Future Considerations
- [ ] GraphQL API
- [ ] WASM plugin system
- [ ] Change Data Capture (CDC)
- [ ] Time-series optimization
- [ ] Machine learning integrations
- [ ] Real-time anomaly detection
- [ ] Event sourcing templates
- [ ] Visual query builder
---
## Version History
| Version | Date | Status | Highlights |
|---------|------|--------|------------|
| [0.8.0] | 2026-02-03 | ✅ Current | Clean Architecture, DDD, Vector Search, Native Search |
| [0.7.0] | 2025-12-06 | ✅ Stable | Serverless support, Projection State API |
| [0.6.0] | 2025-12-05 | ✅ Stable | SIMD JSON, lock-free queues, batch processing, arena pools |
| [0.5.0] | 2025-10-20 | ✅ Stable | Schema registry, event replay, stream processing |
| [0.2.0] | 2025-01-15 | ✅ Stable | Parquet storage, WAL, snapshots, analytics |
| [0.1.0] | 2024-12-01 | ✅ Stable | Core event store, indexing, projections |
---
## Upgrade Notes
### Upgrading from 0.5.0 to 0.6.0
**Breaking Changes**: None
**New Features**: All performance optimizations are opt-in and transparent.
**Dependencies**:
- Added `simd-json = "0.14"` for SIMD-accelerated JSON parsing
- Added `bumpalo = "3.16"` for arena memory allocation
**New Modules** (all in `infrastructure::persistence`):
- `simd_json`: SIMD JSON parsing utilities
- `lock_free::sharded_queue`: High-throughput sharded queue
- `batch_processor`: Batch processing pipeline
- `arena_pool`: Thread-local arena memory pools
- `performance_test`: Performance validation tests
**Migration Steps**:
1. Update dependencies: `cargo update`
2. Rebuild: `cargo build --release`
3. Run tests: `cargo test`
4. No data migration required
5. Optionally integrate new batch processing APIs for higher throughput
**Performance Improvements**:
- JSON parsing: 2-3x faster with SIMD
- Queue operations: 10-100x faster with lock-free structures
- Memory allocation: 5-10x faster with arena pooling
- Full pipeline: 726K events/sec (up from 469K)
### Upgrading from 0.2.0 to 0.5.0
**Breaking Changes**: None
**New Features**: All new features are opt-in and don't affect existing functionality.
**Configuration**:
- New `SchemaRegistryConfig` added to `EventStoreConfig` (defaults provided)
- New managers: `ReplayManager` and `PipelineManager` (automatically initialized)
**API Changes**:
- 18 new API endpoints (6 schema, 5 replay, 7 pipeline; all additive)
- Existing endpoints unchanged
**Migration Steps**:
1. Update dependencies: `cargo update`
2. Rebuild: `cargo build --release`
3. Run tests: `cargo test`
4. No data migration required
### Upgrading from 0.1.0 to 0.2.0
**Breaking Changes**: None
**New Features**: All new features are opt-in.
**Configuration**:
- New optional storage configuration for Parquet persistence
- New optional WAL configuration
- New optional snapshot configuration
**Migration Steps**:
1. Update dependencies
2. Rebuild application
3. Optionally configure persistent storage
4. No data migration required for in-memory mode
---
## Contributing
Please read [CONTRIBUTING.md](CONTRIBUTING.md) for details on our code of conduct and the process for submitting pull requests.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.