allsource-core 0.19.2

High-performance event store core built in Rust
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
# Changelog

All notable changes to AllSource Core will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.8.0] - 2026-02-03

### Added

#### Clean Architecture Refactoring (v1.1)

**Domain Layer** - Complete domain-driven design implementation:
- **Value Objects**: `ArticleId`, `CreatorId`, `EmbeddingVector`, `EventId`, `ForkId`, `Money`, `ProjectionName`, `SchemaSubject`, `StreamName`, `TransactionId`, `Version`, `WalletAddress`
- **Entities**: `AccessToken`, `Creator`, `EventStoreFork`, `PaywallArticle`, `Transaction`
- **Aggregates**: `EventStream` with watermark tracking

**Application Layer** - Use cases and services:
- **Use Cases**: `ManageAccess`, `ManageArticle`, `ManageCreator`, `ManageFork`, `ProcessPayment`, `SemanticSearch`
- **Services**: `CreatorCoordinator`, `EventCoordinator`, `PaymentCoordinator`, `VectorSearch`
- **DTOs**: Complete DTO layer for all entities (AccessToken, Article, Common, Creator, Event, Filter, Fork, Transaction, VectorSearch)

**Infrastructure Layer** - Repository implementations:
- **Repositories**: In-memory implementations for AccessToken, Article, Creator, EventStream, Fork, Transaction, VectorSearch
- **RocksDB**: Event stream repository with persistence
- **Web Handlers**: Article, Creator, Fork, Payment handlers with full CRUD
- **DI Container**: `ServiceContainer` with `ContainerBuilder` for dependency injection

#### Production Hardening (SierraDB Patterns)

**Storage Integrity** (`storage_integrity.rs`):
- SHA-256 checksums for data verification
- Per-segment WAL integrity checks
- Per-file Parquet checksums
- Corruption detection on startup

**Partition Monitoring**:
- Per-partition event count tracking
- Partition write latency histograms
- Hot partition detection (>2x average load alerts)
- Prometheus metrics exposure

**7-Day Stress Test Suite** (`tests/stress_tests/seven_day_stress.rs`):
- Configurable duration (7 days default, 1 hour short mode)
- 10K events/sec sustained load
- Memory leak detection
- Partition balance verification
- Watermark consistency checks

#### Native Search Capabilities

**Vector Search Engine** (`vector_search.rs`):
- Semantic similarity search using embeddings
- HNSW index for fast approximate nearest neighbor search
- Event embedding generation
- Configurable similarity thresholds

**Integration with MCP**:
- `semantic_search_events` tool for AI agents
- `hybrid_search` combining vector and keyword search

#### Web API Enhancements

**New Endpoints**:
- `POST /api/v1/articles` - Create article
- `GET /api/v1/articles/:id` - Get article
- `PUT /api/v1/articles/:id` - Update article
- `DELETE /api/v1/articles/:id` - Delete article
- `POST /api/v1/creators` - Create creator
- `GET /api/v1/creators/:id` - Get creator
- `POST /api/v1/forks` - Create event store fork
- `GET /api/v1/forks/:id` - Get fork state
- `POST /api/v1/payments` - Process payment
- `GET /api/v1/search/semantic` - Semantic search

#### Development Mode Support

**`ALLSOURCE_DEV_MODE` Environment Variable**:
- Bypass authentication and rate limiting for local development
- Auto-injects admin context (`dev-user`, `dev-tenant`, `Admin` role)
- Warning logged on startup when enabled
- Documented in `docs/SECURITY.md`

### Changed
- Refactored persistence layer to use repository pattern
- Updated WebSocket handlers for cleaner separation of concerns
- Improved schema validation with new `SchemaSubject` value object
- Enhanced pipeline processing with domain events

### Fixed
- **Middleware Order**: Fixed critical bug where rate limit middleware ran before auth middleware
  - In Tower/Axum, layers execute bottom-to-top
  - Rate limit middleware now correctly runs after auth middleware
  - Auth middleware properly populates `AuthContext` before rate limiting checks

### Performance
- Maintained 726K events/sec throughput
- Clean architecture adds minimal overhead (~2-3%)
- Repository pattern enables easier testing and mocking

### Technical Details
- 15+ new domain value objects with validation
- 6+ new entities following DDD patterns
- 7+ new repository implementations
- Full test coverage for domain layer
- SOLID principles throughout

---

## [0.7.0] - 2025-12-06

### Added

#### Serverless & Cloud-Native Support
- **Graceful Shutdown**: Proper SIGTERM/SIGINT signal handling for serverless platforms
  - `shutdown_signal()` handler in `api_v1.rs`
  - `with_graceful_shutdown()` integration with Axum server
  - Clean shutdown logging for observability
- **PORT Environment Variable**: Standard serverless port configuration
  - Fallback chain: `ALLSOURCE_PORT``PORT` (Cloud Run, Fly.io standard)
  - `HOST` environment variable support
- **Optimized Dockerfile**: Serverless-ready container image
  - `cargo-chef` for optimal dependency caching
  - Stripped binaries for smaller images (~50% reduction)
  - `tini` init system for proper signal propagation
  - `MALLOC_ARENA_MAX=2` for reduced memory footprint

#### Cloud Deployment Configurations
- **Fly.io**: `fly.toml` with auto-scaling and health checks
- **Google Cloud Run**: Knative service YAML with startup probes
- **Helm Charts**: Complete Kubernetes deployment (`deploy/helm/allsource/`)
  - Core deployment, service, PVC
  - Query Service deployment with secrets
  - Ingress, ServiceMonitor, PodDisruptionBudget
- **Kustomize**: Standalone K8s manifests (`deploy/k8s/`)
- **Docker Compose**: Full stack with Prometheus/Grafana monitoring

#### Projection State API (Query Service Integration)
- `GET /api/v1/projections` - List all projections
- `GET /api/v1/projections/:name` - Get projection details
- `GET /api/v1/projections/:name/:entity_id/state` - Get entity state
- `PUT /api/v1/projections/:name/:entity_id/state` - Save entity state
- `POST /api/v1/projections/:name/bulk` - Bulk get states
- DashMap-backed storage with **11.9μs** access latency

### Fixed
- **PostgreSQL Audit Repository**: `IpAddr` type compatibility with sqlx
  - Converted to `Option<String>` for binding and reading
- **PostgreSQL Audit Repository**: Lifetime issue with `actor_id_only`
  - Fixed by converting to owned `String`

### Changed
- Container images now use non-root user (UID 1000)
- Health checks disabled in Dockerfile (platforms provide their own)
- Default port changed to 8080 for serverless compatibility

---

## [0.6.0] - 2025-12-05

### Added

#### SIMD-Accelerated JSON Parsing (`simd_json.rs`)
- Zero-copy JSON parsing using `simd-json` library
- 2-3x faster parsing with SIMD instructions (AVX2, SSE4.2, NEON)
- `SimdJsonParser` for high-throughput deserialization
- `ZeroCopyJson` for read-only field access without full deserialization
- `BatchEventParser` for efficient batch processing
- `SimdJsonStats` for tracking parsing throughput and errors
- **Performance: 824K events/sec, 112 MB/s throughput**

#### Lock-Free Data Structures (`lock_free/`)
- `LockFreeEventQueue`: MPMC queue using crossbeam ArrayQueue
  - ~10-20ns push/pop operations
  - **Performance: 41M push/sec, 3.4M pop/sec**
- `ShardedEventQueue`: Distributed queue for high contention scenarios
  - 16 shards by default for cache-line optimization
  - Batch push/pop operations
  - **Performance: 1.06M events/sec with 4 threads**
- `LockFreeMetrics`: Atomic counters for zero-contention monitoring
  - Min/max/avg latency tracking with CAS operations
  - ~5-10ns per metric update

#### Batch Processing Pipeline (`batch_processor.rs`)
- `BatchProcessor`: High-throughput ingestion combining SIMD + lock-free queues
- Configurable batch sizes (1K-50K events)
- Three configuration presets:
  - `default()`: Balanced for general use
  - `high_throughput()`: Maximum events/sec (50K batch, 10M queue)
  - `low_latency()`: Quick responses (1K batch, 100K queue)
- `RawEventData`: Efficient event deserialization struct
- **Performance: 335K events/sec (single), 726K events/sec (4 threads)**

#### Arena Memory Pooling (`arena_pool.rs`)
- Thread-local arena pools using `bumpalo` allocator
- `get_arena()`: Get recycled arena from pool (~10-20ns)
- `PooledArena`: RAII wrapper with automatic pool return
- `ScopedArena`: Convenient scoped allocation pattern
- `SizedBufferPool`: Pre-allocated buffers for specific sizes
- 99%+ arena recycle rate
- **Performance: 28.5M allocations/sec**

#### Performance Validation Tests (`performance_test.rs`)
- Comprehensive benchmark suite for all optimizations
- Tests for SIMD parsing, queue throughput, batch processing
- Concurrent performance validation
- Sustained throughput testing

### Changed
- Added `simd-json = "0.14"` dependency for SIMD acceleration
- Added `bumpalo = "3.16"` dependency for arena allocation
- Added `From<SimdJsonError>` impl for `AllSourceError`
- Clean Architecture module organization maintained

### Performance
- **Lock-Free Queue (push)**: 41.1M ops/sec
- **Arena Allocations**: 28.5M allocs/sec
- **Lock-Free Queue (pop)**: 3.4M ops/sec
- **Sharded Queue (concurrent)**: 2.5M ops/sec
- **SIMD JSON Parsing**: 824K events/sec
- **Full Pipeline (4 threads)**: 726K events/sec
- **Sustained Throughput**: 418K events/sec
- **Batch Processor**: 335K events/sec

### Technical Details
- All new modules follow Clean Architecture patterns
- Full test coverage with 57+ new tests
- Debug and release mode performance thresholds
- Thread-safe implementations with `Send + Sync`

---

## [0.5.0] - 2025-10-20

### Added

#### Schema Registry
- JSON Schema-based event validation system
- Automatic schema versioning with compatibility checking
- 4 compatibility modes: None, Backward, Forward, Full
- Subject-based schema organization
- 6 new REST API endpoints for schema management:
  - `POST /api/v1/schemas` - Register schema
  - `GET /api/v1/schemas` - List subjects
  - `GET /api/v1/schemas/:subject` - Get schema
  - `GET /api/v1/schemas/:subject/versions` - List versions
  - `POST /api/v1/schemas/validate` - Validate event
  - `PUT /api/v1/schemas/:subject/compatibility` - Set compatibility mode

#### Event Replay Engine
- Point-in-time event replay functionality
- Projection rebuilding with progress tracking
- Configurable batch processing
- Async background execution using Tokio
- Cancellable replay operations
- 5 replay statuses: Pending, Running, Completed, Failed, Cancelled
- Real-time progress metrics (events/sec, percentage complete)
- 5 new REST API endpoints for replay management:
  - `POST /api/v1/replay` - Start replay
  - `GET /api/v1/replay` - List replays
  - `GET /api/v1/replay/:replay_id` - Get progress
  - `POST /api/v1/replay/:replay_id/cancel` - Cancel replay
  - `DELETE /api/v1/replay/:replay_id` - Delete replay

#### Stream Processing Pipelines
- 6 pipeline operators:
  - **Filter**: eq, ne, gt, lt, contains operations
  - **Map**: uppercase, lowercase, trim, multiply, add transformations
  - **Reduce**: count, sum, avg, min, max aggregations with grouping
  - **Window**: tumbling, sliding, session windows for time-based aggregations
  - **Enrich**: external data lookup and enrichment (placeholder)
  - **Branch**: conditional event routing
- Stateful processing with thread-safe state management
- Window buffers with automatic time-based eviction
- Pipeline statistics tracking
- Integrated pipeline processing into event ingestion flow
- 7 new REST API endpoints for pipeline management:
  - `POST /api/v1/pipelines` - Register pipeline
  - `GET /api/v1/pipelines` - List pipelines
  - `GET /api/v1/pipelines/:pipeline_id` - Get pipeline
  - `DELETE /api/v1/pipelines/:pipeline_id` - Remove pipeline
  - `GET /api/v1/pipelines/stats` - All pipeline stats
  - `GET /api/v1/pipelines/:pipeline_id/stats` - Pipeline stats
  - `PUT /api/v1/pipelines/:pipeline_id/reset` - Reset state

### Changed
- Enhanced event ingestion flow to include pipeline processing
- Updated `ProjectionManager::get_projection()` to return cloned `Arc` instead of reference
- Improved event ingestion performance by 4-14% with pipeline integration optimizations

### Performance
- Ingestion: 442-469K events/sec (single-threaded)
- Entity query: 11.9 μs
- State reconstruction: 3.5 μs (with snapshots)
- 48 tests passing (33 unit + 15 integration)

---

## [0.2.0] - 2025-01-15

### Added

#### Persistent Storage
- Apache Parquet columnar storage for events
- Write-Ahead Log (WAL) for crash recovery and durability
- Automatic compaction with 3 strategies:
  - Size-based compaction
  - Count-based compaction
  - Age-based compaction
- Point-in-time snapshot system
- Automatic snapshot creation based on configurable thresholds

#### Real-time Streaming
- WebSocket server for real-time event broadcasting
- Client connection management
- Event subscription and filtering

#### Advanced Analytics
- Event frequency analysis with time bucketing
- Event correlation analysis
- Statistical summaries (count, avg, min, max)
- Time-window aggregations

#### API Endpoints
- 18 new REST API endpoints:
  - WebSocket: `WS /api/v1/events/stream`
  - Analytics: `/api/v1/analytics/*` (3 endpoints)
  - Snapshots: `/api/v1/snapshots/*` (3 endpoints)
  - Compaction: `/api/v1/compaction/*` (2 endpoints)

### Changed
- Event ingestion now writes to WAL first for durability
- State reconstruction optimized with snapshot fallback
- Enhanced storage architecture with multiple layers

### Performance
- 10-15% improvement in ingestion throughput with Parquet batching
- 100x faster state reconstruction with snapshots
- Zero data loss on crashes with WAL

---

## [0.1.0] - 2024-12-01

### Added

#### Core Event Store
- In-memory event storage with `Vec<Event>`
- Immutable append-only event log
- Event ID generation with UUID v7
- ISO 8601 timestamp support
- JSON payload storage

#### Indexing System
- DashMap-based concurrent indexing
- Entity ID index (O(1) lookup)
- Event type index (O(1) lookup)
- Event ID index (direct access)
- Thread-safe concurrent updates

#### Query Engine
- Query by entity ID
- Query by event type
- Time-travel queries with `as_of` parameter
- Time range filtering with `since`/`until`
- Result limiting
- Entity state reconstruction

#### Projections
- Real-time projection system
- Built-in projections:
  - EntitySnapshotProjection (current state per entity)
  - EventCounterProjection (event type statistics)
- Custom projection trait for user-defined projections

#### REST API
- 8 initial REST API endpoints:
  - `GET /health` - Health check
  - `POST /api/v1/events` - Ingest event
  - `GET /api/v1/events/query` - Query events
  - `GET /api/v1/entities/:entity_id/state` - Get entity state
  - `GET /api/v1/entities/:entity_id/snapshot` - Get snapshot
  - `GET /api/v1/stats` - System statistics

#### Error Handling
- Comprehensive error types
- HTTP status code mapping
- Type-safe error handling with `Result<T, AllSourceError>`

#### Testing
- 10+ unit tests
- 5+ integration tests
- Performance benchmarks with Criterion

### Performance
- 100K+ events/sec ingestion
- Sub-millisecond entity queries
- Concurrent read/write support

---

## [Unreleased]

### Planned for v0.9 - Production Hardening

- [ ] Dependency injection container completion (US-009)
- [ ] Storage integrity checks with checksums (US-010)
- [ ] Partition monitoring and alerting (US-011)
- [ ] 7-day continuous stress test suite (US-012)
- [ ] SIMD event filtering (US-025)

### Planned for v1.0 - Distributed & Cloud-Native

- [ ] Distributed replication (Raft consensus)
- [ ] Multi-region support
- [ ] Horizontal scaling
- [ ] Arrow Flight RPC
- [ ] Kubernetes operators
- [ ] Load balancing
- [ ] Prometheus metrics enhancements
- [ ] OpenTelemetry tracing improvements

Target: 1M+ events/sec (single node), 10M+ events/sec (distributed)

### Future Considerations

- [ ] GraphQL API
- [ ] WASM plugin system
- [ ] Change Data Capture (CDC)
- [ ] Time-series optimization
- [ ] Machine learning integrations
- [ ] Real-time anomaly detection
- [ ] Event sourcing templates
- [ ] Visual query builder

---

## Version History

| Version | Date | Status | Highlights |
|---------|------|--------|------------|
| [0.8.0] | 2026-02-03 | ✅ Current | Clean Architecture, DDD, Vector Search, Native Search |
| [0.7.0] | 2025-12-06 | ✅ Stable | Serverless support, Projection State API |
| [0.6.0] | 2025-12-05 | ✅ Stable | SIMD JSON, lock-free queues, batch processing, arena pools |
| [0.5.0] | 2025-10-20 | ✅ Stable | Schema registry, event replay, stream processing |
| [0.2.0] | 2025-01-15 | ✅ Stable | Parquet storage, WAL, snapshots, analytics |
| [0.1.0] | 2024-12-01 | ✅ Stable | Core event store, indexing, projections |

---

## Upgrade Notes

### Upgrading from 0.5.0 to 0.6.0

**Breaking Changes**: None

**New Features**: All performance optimizations are opt-in and transparent.

**Dependencies**:
- Added `simd-json = "0.14"` for SIMD-accelerated JSON parsing
- Added `bumpalo = "3.16"` for arena memory allocation

**New Modules** (all in `infrastructure::persistence`):
- `simd_json`: SIMD JSON parsing utilities
- `lock_free::sharded_queue`: High-throughput sharded queue
- `batch_processor`: Batch processing pipeline
- `arena_pool`: Thread-local arena memory pools
- `performance_test`: Performance validation tests

**Migration Steps**:
1. Update dependencies: `cargo update`
2. Rebuild: `cargo build --release`
3. Run tests: `cargo test`
4. No data migration required
5. Optionally integrate new batch processing APIs for higher throughput

**Performance Improvements**:
- JSON parsing: 2-3x faster with SIMD
- Queue operations: 10-100x faster with lock-free structures
- Memory allocation: 5-10x faster with arena pooling
- Full pipeline: 726K events/sec (up from 469K)

### Upgrading from 0.2.0 to 0.5.0

**Breaking Changes**: None

**New Features**: All new features are opt-in and don't affect existing functionality.

**Configuration**:
- New `SchemaRegistryConfig` added to `EventStoreConfig` (defaults provided)
- New managers: `ReplayManager` and `PipelineManager` (automatically initialized)

**API Changes**:
- 12 new API endpoints (all additive)
- Existing endpoints unchanged

**Migration Steps**:
1. Update dependencies: `cargo update`
2. Rebuild: `cargo build --release`
3. Run tests: `cargo test`
4. No data migration required

### Upgrading from 0.1.0 to 0.2.0

**Breaking Changes**: None

**New Features**: All new features are opt-in.

**Configuration**:
- New optional storage configuration for Parquet persistence
- New optional WAL configuration
- New optional snapshot configuration

**Migration Steps**:
1. Update dependencies
2. Rebuild application
3. Optionally configure persistent storage
4. No data migration required for in-memory mode

---

## Contributing

Please read [CONTRIBUTING.md](CONTRIBUTING.md) for details on our code of conduct and the process for submitting pull requests.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.