Embeddenator TestKit
Comprehensive testing utilities, performance benchmarking, and integration tests for embeddenator VSA operations.
Independent component extracted from the Embeddenator monolithic repository. Part of the Embeddenator workspace.
Repository: https://github.com/tzervas/embeddenator-testkit
Features
- Test Data Generators: Create random and deterministic sparse vectors for reproducible testing
- Performance Metrics: Granular timing, memory tracking, and throughput measurements
- Integrity Validation: Verify VSA operation properties and detect data corruption
- Chaos Injection: Test resilience with bitflip injection, erasures, and noise
- Test Fixtures: Generate synthetic datasets with various patterns and sizes
- Test Harness: Manage temporary directories and coordinate complex test scenarios
- Integration Tests: Cross-component testing (enabled via the integration feature)
Installation
Add to your Cargo.toml:
[dev-dependencies]
embeddenator-testkit = { path = "../embeddenator-testkit" }
Usage Examples
Generate Test Vectors
use embeddenator_testkit::*;
use rand::thread_rng;
// Generate a random sparse vector
let mut rng = thread_rng();
let vec = random_sparse_vec(/* ... */);
// Generate a deterministic vector for reproducible tests
let vec = deterministic_sparse_vec(/* ... */);
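To illustrate what "deterministic" means here, the following is a minimal, self-contained sketch of seeded sparse vector generation. It does not use the testkit API: `SparseVec`, `splitmix64`, and `deterministic_sparse_vec_sketch` are hypothetical names, and the representation (sorted +1 and -1 index lists) is an assumption about how a sparse ternary vector might be stored.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a sparse ternary vector:
/// sorted indices holding +1 and sorted indices holding -1.
#[derive(Debug, Clone, PartialEq)]
pub struct SparseVec {
    pub pos: Vec<usize>,
    pub neg: Vec<usize>,
}

/// SplitMix64: a tiny seeded generator, so the same seed always
/// yields the same sequence of values.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Build a vector of dimension `dim` with `nnz` non-zero entries.
/// The first half of the sampled indices become +1, the rest -1.
pub fn deterministic_sparse_vec_sketch(dim: usize, nnz: usize, seed: u64) -> SparseVec {
    assert!(nnz <= dim, "cannot place more non-zeros than dimensions");
    let mut state = seed;
    let mut seen = BTreeSet::new();
    let mut order = Vec::with_capacity(nnz);
    // Rejection-sample until `nnz` distinct indices have been drawn.
    while order.len() < nnz {
        let idx = (splitmix64(&mut state) % dim as u64) as usize;
        if seen.insert(idx) {
            order.push(idx);
        }
    }
    let mut neg = order.split_off(nnz / 2);
    let mut pos = order;
    pos.sort_unstable();
    neg.sort_unstable();
    SparseVec { pos, neg }
}
```

Calling the function twice with the same `(dim, nnz, seed)` gives identical vectors, which is the property a reproducible test relies on.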
Performance Measurement
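use embeddenator_testkit::*;
let mut metrics = TestMetrics::new(/* ... */);
// Time an operation explicitly
metrics.start_timing(/* ... */);
let result = a.bind(/* ... */);
metrics.stop_timing(/* ... */);
// Or time a closure
let result = metrics.time_operation(/* ... */);
println!(/* ... */);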
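As a rough picture of what closure-based timing and summary statistics involve, here is a standard-library-only sketch. `TimingSketch` and its methods are hypothetical and are not the testkit's `TestMetrics` or `TimingStats`; the percentile uses the nearest-rank method, which is one of several common definitions.

```rust
use std::time::{Duration, Instant};

/// Hypothetical collector of per-operation timings.
#[derive(Default)]
pub struct TimingSketch {
    samples: Vec<Duration>,
}

impl TimingSketch {
    /// Run `op`, record how long it took, and pass its result through.
    pub fn time_operation<T>(&mut self, op: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = op();
        self.samples.push(start.elapsed());
        out
    }

    /// Record an externally measured duration.
    pub fn record(&mut self, d: Duration) {
        self.samples.push(d);
    }

    /// Arithmetic mean of all samples, or `None` if there are none.
    pub fn mean(&self) -> Option<Duration> {
        if self.samples.is_empty() {
            return None;
        }
        let total: Duration = self.samples.iter().sum();
        Some(total / self.samples.len() as u32)
    }

    /// Nearest-rank percentile for `p` in 0..=100 (50.0 gives a median).
    pub fn percentile(&self, p: f64) -> Option<Duration> {
        if self.samples.is_empty() {
            return None;
        }
        let mut sorted = self.samples.clone();
        sorted.sort();
        let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
        Some(sorted[rank.clamp(1, sorted.len()) - 1])
    }
}
```

Reporting a high percentile alongside the mean helps separate steady-state cost from occasional outliers such as allocator or scheduler noise.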
Integrity Validation
use embeddenator_testkit::*;
let validator = IntegrityValidator::new(/* ... */);
// Validate bitsliced vector invariants
let report = validator.validate_bitsliced(/* ... */);
assert!(/* ... */);
// Validate bind commutativity
let report = validator.validate_bind_invariants(/* ... */);
println!(/* ... */);
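The kind of property a bind invariant check verifies can be sketched without the testkit. The example below assumes bind is the elementwise product of ternary vectors, which is a common VSA choice but not confirmed as this crate's definition; `Ternary` and `check_bind_invariants` are hypothetical names.

```rust
use std::collections::BTreeSet;

/// Hypothetical ternary vector: index sets holding +1 and -1.
#[derive(Debug, Clone, PartialEq)]
pub struct Ternary {
    pub pos: BTreeSet<usize>,
    pub neg: BTreeSet<usize>,
}

impl Ternary {
    pub fn new(pos: &[usize], neg: &[usize]) -> Self {
        Self {
            pos: pos.iter().copied().collect(),
            neg: neg.iter().copied().collect(),
        }
    }

    /// Assumed bind: elementwise product. Equal signs give +1,
    /// opposite signs give -1, and a zero on either side gives zero.
    pub fn bind(&self, other: &Self) -> Self {
        let both = |x: &BTreeSet<usize>, y: &BTreeSet<usize>| -> BTreeSet<usize> {
            x.intersection(y).copied().collect()
        };
        let mut pos = both(&self.pos, &other.pos);
        pos.extend(both(&self.neg, &other.neg));
        let mut neg = both(&self.pos, &other.neg);
        neg.extend(both(&self.neg, &other.pos));
        Self { pos, neg }
    }

    /// No index may be +1 and -1 at the same time.
    pub fn is_well_formed(&self) -> bool {
        self.pos.is_disjoint(&self.neg)
    }
}

/// Return a description of every violated invariant (empty means all hold).
pub fn check_bind_invariants(a: &Ternary, b: &Ternary) -> Vec<String> {
    let mut failures = Vec::new();
    let ab = a.bind(b);
    if ab != b.bind(a) {
        failures.push("bind is not commutative".to_string());
    }
    if !ab.is_well_formed() {
        failures.push("bind result has an index in both pos and neg".to_string());
    }
    failures
}
```

Returning a list of failures rather than panicking on the first one lets a report show every violated property at once.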
Chaos Testing
use embeddenator_testkit::*;
let injector = ChaosInjector::new(/* ... */);
// Inject bitflips for resilience testing
let mut vec = vec.clone();
let flipped = injector.inject_bitflips(/* ... */);
// Create a corrupted copy
let corrupted = injector.corrupt_copy(/* ... */); // 5% error rate
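For intuition, seeded bitflip injection on a raw byte buffer might look like the sketch below. `BitflipSketch` is hypothetical and operates on `&mut [u8]`, which is an assumption; the testkit's `ChaosInjector` may work on vectors directly and expose a different signature.

```rust
use std::collections::BTreeSet;

/// Hypothetical seeded injector built on a xorshift64 generator.
pub struct BitflipSketch {
    state: u64,
}

impl BitflipSketch {
    /// xorshift state must never be zero, so force the low bit on.
    pub fn new(seed: u64) -> Self {
        Self { state: seed | 1 }
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }

    /// Flip `count` distinct bits of `data` in place (capped at the
    /// number of bits available) and return the flipped bit positions.
    pub fn inject_bitflips(&mut self, data: &mut [u8], count: usize) -> BTreeSet<usize> {
        let total_bits = data.len() * 8;
        let count = count.min(total_bits);
        let mut flipped = BTreeSet::new();
        while flipped.len() < count {
            let bit = (self.next_u64() % total_bits as u64) as usize;
            // Only flip a bit the first time it is drawn, so no flip cancels out.
            if flipped.insert(bit) {
                data[bit / 8] ^= 1u8 << (bit % 8);
            }
        }
        flipped
    }
}
```

Because the generator is seeded, a failing resilience test can be replayed with the same corruption pattern, and the returned positions make it possible to assert exactly which bits changed.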
Test Dataset Generation
use embeddenator_testkit::*;
// Create test data with a specific pattern
let data = create_test_data(/* ... */); // 100MB
// Create a test harness with automatic cleanup
let harness = TestHarness::new(/* ... */);
let dataset_dir = harness.create_dataset(/* ... */); // 500MB dataset
// Create specific files
let file = harness.create_file(/* ... */);
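"Automatic cleanup" is typically implemented by removing the temporary directory when the harness value is dropped. The sketch below shows that pattern with the standard library only; `TempHarness`, its directory naming scheme, and the repeating byte pattern are all assumptions for illustration, not the testkit's `TestHarness`.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::process;
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical harness that owns a scratch directory.
pub struct TempHarness {
    root: PathBuf,
}

impl TempHarness {
    /// Create a fresh directory under the system temp dir. The name mixes
    /// the process id and a timestamp to make collisions unlikely.
    pub fn new(label: &str) -> io::Result<Self> {
        let nanos = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_nanos())
            .unwrap_or(0);
        let name = format!("{}-{}-{}", label, process::id(), nanos);
        let root = std::env::temp_dir().join(name);
        fs::create_dir_all(&root)?;
        Ok(Self { root })
    }

    pub fn path(&self) -> &Path {
        &self.root
    }

    /// Write `size` bytes of a simple repeating pattern to `name`.
    pub fn create_file(&self, name: &str, size: usize) -> io::Result<PathBuf> {
        let path = self.root.join(name);
        let bytes: Vec<u8> = (0..size).map(|i| (i % 251) as u8).collect();
        fs::write(&path, bytes)?;
        Ok(path)
    }
}

impl Drop for TempHarness {
    /// Best-effort cleanup: errors are ignored because `drop` cannot fail.
    fn drop(&mut self) {
        let _ = fs::remove_dir_all(&self.root);
    }
}
```

Tying cleanup to `Drop` means the directory is removed even when a test returns early or panics and unwinds, although not if the process aborts.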
Module Overview
generators
- random_sparse_vec() - Generate random sparse vectors
- deterministic_sparse_vec() - Reproducible vector generation
- sparse_dot() - Reference dot product implementation
- generate_noise_pattern() - Synthetic noise data
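A reference dot product is useful as a slow but obviously correct oracle for checking optimized implementations. The following sketch shows one way to write it for sparse ternary vectors; the `TernaryRef` layout (sorted, deduplicated +1 and -1 index slices) and both function names are assumptions, not the signature of the testkit's `sparse_dot()`.

```rust
use std::cmp::Ordering;

/// Hypothetical borrowed view of a sparse ternary vector:
/// (sorted +1 indices, sorted -1 indices).
pub type TernaryRef<'a> = (&'a [usize], &'a [usize]);

/// Count the indices shared by two sorted, deduplicated slices
/// using a linear merge.
fn overlap(a: &[usize], b: &[usize]) -> i64 {
    let (mut i, mut j, mut n) = (0, 0, 0);
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                n += 1;
                i += 1;
                j += 1;
            }
        }
    }
    n
}

/// Dot product: matching signs contribute +1, opposite signs -1.
pub fn sparse_dot_sketch(a: TernaryRef<'_>, b: TernaryRef<'_>) -> i64 {
    overlap(a.0, b.0) + overlap(a.1, b.1) - overlap(a.0, b.1) - overlap(a.1, b.0)
}

/// Cosine similarity. For ternary entries the squared norm equals the
/// number of non-zeros. Returns 0.0 if either vector is all zeros.
pub fn cosine_sketch(a: TernaryRef<'_>, b: TernaryRef<'_>) -> f64 {
    let nnz_a = (a.0.len() + a.1.len()) as f64;
    let nnz_b = (b.0.len() + b.1.len()) as f64;
    if nnz_a == 0.0 || nnz_b == 0.0 {
        return 0.0;
    }
    sparse_dot_sketch(a, b) as f64 / (nnz_a * nnz_b).sqrt()
}
```

A test can then assert that a packed or SIMD implementation agrees with this reference on generated vectors.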
metrics
- TestMetrics - Performance measurement and statistics
- TimingStats - Timing analysis (mean, median, percentiles)
integrity
- IntegrityValidator - Verify VSA operation properties
- IntegrityReport - Validation results and diagnostics
chaos
- ChaosInjector - Inject errors for resilience testing
- Bitflip, erasure, and corruption utilities
fixtures
- TestDataPattern - Data pattern types
- create_test_data() - Generate test data
- create_test_dataset() - Multi-file test datasets
harness
- TestHarness - Unified test management
- Temporary directory handling
- Performance metric collection
Migrated from Monolithic Repo
This testkit extracts and consolidates test utilities from:
- embeddenator/src/testing/mod.rs - Performance metrics and integrity validation
- embeddenator/tests/common/bt_migration.rs - Vector generators and helpers
- embeddenator/tests/qa_comprehensive.rs - Test harness and dataset generation
- Various test modules - Fixture patterns and utilities
Performance Baselines
Example measurements from v0.20.0-alpha.1 on Intel i7-14700K, 46GB RAM. Your results will vary based on hardware, data patterns, and system load.
- Bundle: ~43ns (sparse), ~32µs (dense packed)
- Bind: ~11ns (sparse), ~20µs (dense packed)
- Cosine: ~7ns (sparse), ~14µs (dense packed)
- Ingestion: ~15 MB/s (2GB dataset)
- Extraction: ~41 MB/s (2GB dataset)
Run cargo bench to establish baselines for your system.
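When comparing your own runs against the figures above, it can help to convert raw measurements into the same units. The helpers below are a small sketch of that arithmetic; the function names are hypothetical, and 1 MB is assumed to mean 1,000,000 bytes, which may differ from the convention used for the published numbers.

```rust
use std::time::Duration;

/// Throughput in MB/s, assuming 1 MB = 1_000_000 bytes.
/// A zero `elapsed` yields infinity, so callers should guard against it.
pub fn throughput_mb_per_s(bytes: u64, elapsed: Duration) -> f64 {
    bytes as f64 / 1_000_000.0 / elapsed.as_secs_f64()
}

/// Operations per second implied by a mean per-operation latency.
pub fn ops_per_second(per_op: Duration) -> f64 {
    1.0 / per_op.as_secs_f64()
}
```

For example, processing 2,000,000,000 bytes in 100 seconds corresponds to 20 MB/s under this definition.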
Testing
Run the testkit's own tests:
cargo test
Run integration tests (cross-component workflows):
cargo test --features integration
Run with performance output visible:
cargo test -- --nocapture
License
MIT