embeddenator-testkit 0.21.0

Comprehensive testing utilities and performance benchmarking for embeddenator VSA operations
Documentation

Embeddenator TestKit

Comprehensive testing utilities, performance benchmarking, and integration tests for embeddenator VSA operations.

Independent component extracted from the Embeddenator monolithic repository. Part of the Embeddenator workspace.

Repository: https://github.com/tzervas/embeddenator-testkit

Features

  • Test Data Generators: Create random and deterministic sparse vectors for reproducible testing
  • Performance Metrics: Granular timing, memory tracking, and throughput measurements
  • Integrity Validation: Verify VSA operation properties and detect data corruption
  • Chaos Injection: Test resilience with bitflip injection, erasures, and noise
  • Test Fixtures: Generate synthetic datasets with various patterns and sizes
  • Test Harness: Manage temporary directories and coordinate complex test scenarios
  • Integration Tests: Cross-component testing (enabled via integration feature)

Installation

Add to your Cargo.toml:

[dev-dependencies]
embeddenator-testkit = { path = "../embeddenator-testkit" }

Usage Examples

Generate Test Vectors

use embeddenator_testkit::*;
use rand::thread_rng;

// Generate random sparse vector
let mut rng = thread_rng();
let vec = random_sparse_vec(&mut rng, 10000, 200);

// Generate deterministic vector for reproducible tests
let vec = deterministic_sparse_vec(10000, 200, 42);

Performance Measurement

use embeddenator_testkit::*;

let mut metrics = TestMetrics::new("bind_operation");

// Time an operation
metrics.start_timing();
let result = a.bind(&b);
metrics.stop_timing();

// Or use closure
let result = metrics.time_operation(|| a.bind(&b));

println!("{}", metrics.summary());

Integrity Validation

use embeddenator_testkit::*;

let validator = IntegrityValidator::new();

// Validate bitsliced vector invariants
let report = validator.validate_bitsliced(&vec);
assert!(report.is_ok());

// Validate bind commutativity
let report = validator.validate_bind_invariants(&a, &b);
println!("{}", report.summary());

Chaos Testing

use embeddenator_testkit::*;

let injector = ChaosInjector::new(42);

// Inject bitflips for resilience testing
let mut vec = vec.clone();
let flipped = injector.inject_bitflips(&mut vec, 10);

// Create corrupted copy
let corrupted = injector.corrupt_copy(&vec, 0.05); // 5% error rate

Test Dataset Generation

use embeddenator_testkit::*;

// Create test data with specific pattern
let data = create_test_data(100, TestDataPattern::Random); // 100MB

// Create test harness with automatic cleanup
let harness = TestHarness::new();
let dataset_dir = harness.create_dataset(500); // 500MB dataset

// Create specific files
let file = harness.create_file("test.bin", b"test data");

Module Overview

generators

  • random_sparse_vec() - Generate random sparse vectors
  • deterministic_sparse_vec() - Reproducible vector generation
  • sparse_dot() - Reference dot product implementation
  • generate_noise_pattern() - Synthetic noise data

metrics

  • TestMetrics - Performance measurement and statistics
  • TimingStats - Timing analysis (mean, median, percentiles)

integrity

  • IntegrityValidator - Verify VSA operation properties
  • IntegrityReport - Validation results and diagnostics

chaos

  • ChaosInjector - Inject errors for resilience testing
  • Bitflip, erasure, and corruption utilities

fixtures

  • TestDataPattern - Data pattern types
  • create_test_data() - Generate test data
  • create_test_dataset() - Multi-file test datasets

harness

  • TestHarness - Unified test management
  • Temporary directory handling
  • Performance metric collection

Migrated from Monolithic Repo

This testkit extracts and consolidates test utilities from:

  • embeddenator/src/testing/mod.rs - Performance metrics and integrity validation
  • embeddenator/tests/common/bt_migration.rs - Vector generators and helpers
  • embeddenator/tests/qa_comprehensive.rs - Test harness and dataset generation
  • Various test modules - Fixture patterns and utilities

Performance Baselines

Example measurements from v0.20.0-alpha.1 on Intel i7-14700K, 46GB RAM. Your results will vary based on hardware, data patterns, and system load.

  • Bundle: ~43ns (sparse), ~32µs (dense packed)
  • Bind: ~11ns (sparse), ~20µs (dense packed)
  • Cosine: ~7ns (sparse), ~14µs (dense packed)
  • Ingestion: ~15 MB/s (2GB dataset)
  • Extraction: ~41 MB/s (2GB dataset)

Run cargo bench to establish baselines for your system.

Testing

Run the testkit's own tests:

cargo test --manifest-path embeddenator-testkit/Cargo.toml

Run integration tests (cross-component workflows):

cargo test --manifest-path embeddenator-testkit/Cargo.toml --features integration

Run with performance output visible:

cargo test --features integration --test performance_integration -- --show-output

License

MIT