# ruvector-profiler

Memory, power, and latency profiling hooks with CSV emitters: the observability layer for attention benchmarking.

| Dimension | What It Measures | Output |
|---|---|---|
| Memory | RSS, KV-cache, activations, temp buffers | MemoryReport + CSV |
| Power | Wattage samples, trapezoidal energy integration | EnergyResult + CSV |
| Latency | p50/p95/p99, mean, std | LatencyStats + CSV |
| Config | SHA-256 fingerprint of all parameters | Reproducibility hash |
## Overview
This crate instruments benchmark runs along three profiling dimensions (memory
pressure, energy consumption, and latency distribution) and exports the results
to CSV files for downstream analysis. It is the observability layer in the
ruvector attention benchmarking pipeline, sitting between the attention
operators (`ruvector-attn-mincut`) and the analysis/plotting stage.
Every benchmark run is tagged with a SHA-256 config fingerprint so that results are reproducible and auditable across machines.
## Modules

| Module | Purpose |
|---|---|
| `memory` | `MemoryTracker` with RSS snapshots and peak tracking |
| `power` | `PowerTracker` with `PowerSource` trait and trapezoidal integration |
| `latency` | `LatencyStats` computing p50/p95/p99 from `LatencyRecord` samples |
| `csv_emitter` | `write_results_csv`, `write_latency_csv`, `write_memory_csv` |
| `config_hash` | `BenchConfig` with SHA-256 fingerprinting for reproducibility |
## Usage Example: Full Benchmark Loop

```rust
use ruvector_profiler::*;

// Tag this run with a reproducible fingerprint
let config = BenchConfig::default(); // fill in with your run parameters
println!("config hash: {}", config_hash(&config));

// Set up trackers
let mut mem = MemoryTracker::new();
let source = MockPowerSource::default();
let mut pwr = PowerTracker::new(source);

let mut latencies = Vec::new();
for i in 0..1000 {
    // Run one attention step here, then record a sample:
    // latencies.push(LatencyRecord { .. });
    mem.snapshot();
    pwr.sample();
}

// Aggregate
let stats = compute_latency_stats(&latencies);
let report = mem.report();
let energy = pwr.energy();
println!("p95: {} us | peak RSS: {} B | energy: {} J",
    stats.p95_us, report.peak_rss, energy.total_joules);

// Export to CSV
write_latency_csv("latency.csv", &latencies).unwrap();
write_memory_csv("memory.csv", &report).unwrap();
```
## Memory Profiling

`MemoryTracker` captures RSS snapshots via `/proc/self/status` on Linux (zero
fallback on other platforms). Each `MemorySnapshot` records:
| Field | Description |
|---|---|
| `peak_rss_bytes` | Resident set size at capture time |
| `kv_cache_bytes` | Estimated KV-cache allocation |
| `activation_bytes` | Activation tensor memory |
| `temp_buffer_bytes` | Temporary working buffers |
| `timestamp_us` | Microsecond UNIX timestamp |
`MemoryTracker::report()` aggregates snapshots into a `MemoryReport` with
`peak_rss`, `mean_rss`, `kv_cache_total`, and `activation_total`.
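The `/proc/self/status` mechanism can be sketched as below. The `VmRSS` field and its kB unit are standard procfs behavior; the helper names `parse_vmrss_bytes` and `current_rss_bytes` are illustrative, not the crate's API.

```rust
use std::fs;

/// Parse the "VmRSS:" line of a /proc/self/status dump into bytes.
/// (Illustrative helper; not part of ruvector-profiler's public API.)
fn parse_vmrss_bytes(status: &str) -> Option<u64> {
    for line in status.lines() {
        if let Some(rest) = line.strip_prefix("VmRSS:") {
            // Line format: "VmRSS:     4096 kB" -- the value is in kilobytes
            let kb: u64 = rest.trim().trim_end_matches("kB").trim().parse().ok()?;
            return Some(kb * 1024);
        }
    }
    None
}

/// Current RSS on Linux; None elsewhere (matching the tracker's zero fallback).
fn current_rss_bytes() -> Option<u64> {
    fs::read_to_string("/proc/self/status")
        .ok()
        .and_then(|s| parse_vmrss_bytes(&s))
}
```

On non-Linux platforms the file read fails and the value degrades to the zero fallback described above.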
## Power Profiling

`PowerTracker` collects wattage readings from any `PowerSource` implementation.
Energy is computed via trapezoidal integration over the sample timeline, yielding
an `EnergyResult` with `total_joules`, `mean_watts`, `peak_watts`, and
`duration_s`. A `MockPowerSource` is provided for deterministic tests.
```rust
use ruvector_profiler::power::PowerSource;
```
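The trapezoidal integration reduces to summing trapezoid areas between adjacent (timestamp, watts) samples. A minimal standalone sketch, independent of the crate's `PowerTracker` API (the `trapezoidal_joules` helper is illustrative):

```rust
/// Integrate (timestamp_s, watts) samples into joules via the trapezoid rule.
fn trapezoidal_joules(samples: &[(f64, f64)]) -> f64 {
    samples
        .windows(2)
        .map(|w| {
            let (t0, p0) = w[0];
            let (t1, p1) = w[1];
            // Area of one trapezoid: dt * average power over the interval
            (t1 - t0) * (p0 + p1) / 2.0
        })
        .sum()
}
```

For example, samples at 10 W, 10 W, and 20 W taken one second apart integrate to 10 J + 15 J = 25 J.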
## Latency Profiling

`compute_latency_stats` takes a slice of `LatencyRecord` and returns
`LatencyStats` with `p50_us`, `p95_us`, `p99_us`, `mean_us`, `std_us`, and
sample count `n`. Records need not be pre-sorted.
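For reference, a nearest-rank percentile over unsorted samples can be computed as below; `percentile_us` is a hypothetical helper, and the crate's exact interpolation scheme may differ.

```rust
/// Nearest-rank percentile of unsorted microsecond samples.
/// (Illustrative; not the crate's actual implementation.)
fn percentile_us(samples: &[u64], p: f64) -> u64 {
    assert!(!samples.is_empty(), "need at least one sample");
    let mut sorted = samples.to_vec();
    sorted.sort_unstable(); // records need not arrive pre-sorted
    let rank = ((p / 100.0) * (sorted.len() - 1) as f64).round() as usize;
    sorted[rank]
}
```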
## CSV Output Formats

### `write_results_csv`: aggregate summary

```csv
setting,coherence_delta,kv_cache_reduction,peak_mem_reduction,energy_reduction,p95_latency_us,accuracy
mincut_l0.5_t2,-0.003,0.25,0.18,0.12,1150,0.994
```

### `write_latency_csv`: per-sample latency

```csv
sample_id,wall_time_us,kernel_time_us,seq_len
0,850,780,128
```

### `write_memory_csv`: per-snapshot memory

```csv
timestamp_us,peak_rss_bytes,kv_cache_bytes,activation_bytes,temp_buffer_bytes
1700000000,4194304,1048576,2097152,524288
```
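The per-sample latency schema above amounts to a header line plus one formatted row per record. The `latency_csv` helper below is a hand-rolled sketch of that shape, not the crate's actual `write_latency_csv` implementation (which writes to a file path):

```rust
/// Render latency rows in the write_latency_csv schema.
/// Each tuple is (sample_id, wall_time_us, kernel_time_us, seq_len).
fn latency_csv(rows: &[(u64, u64, u64, u32)]) -> String {
    let mut out = String::from("sample_id,wall_time_us,kernel_time_us,seq_len\n");
    for &(id, wall, kernel, seq) in rows {
        out.push_str(&format!("{},{},{},{}\n", id, wall, kernel, seq));
    }
    out
}
```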
## Config Fingerprinting

`BenchConfig` captures all parameters defining a benchmark run. `config_hash`
produces a 64-character SHA-256 hex digest of the JSON-serialized config.
```rust
use ruvector_profiler::config_hash::{config_hash, BenchConfig};

let config = BenchConfig::default(); // fill in with your run parameters
assert_eq!(config_hash(&config).len(), 64); // SHA-256 hex digest
```
## Integration with run_mincut_bench.sh

The `scripts/run_mincut_bench.sh` script orchestrates a full benchmark run:
```text
run_mincut_bench.sh
+-- cargo build --release (-p attn-mincut, coherence, profiler)
+-- Baseline softmax run       --> baseline.csv
+-- Grid search (lambda x tau) --> per-setting CSV + witness JSONL
+-- Aggregate metrics          --> results.csv
+-- Pack witness bundle        --> witness.rvf
```
CSV files follow the schemas above. Use `config_hash` to link results back to
their exact configuration.
### Step 1: Set up config and trackers

```rust
use ruvector_profiler::*;

let config = BenchConfig::default(); // fill in with your run parameters
println!("config hash: {}", config_hash(&config));

let mut mem_tracker = MemoryTracker::new();
let power_source = MockPowerSource::default();
let mut power_tracker = PowerTracker::new(power_source);
```
### Step 2: Run benchmark loop

```rust
let mut latencies = Vec::new();
for i in 0..1000 {
    // Run one attention step here, then record a sample:
    // latencies.push(LatencyRecord { .. });
    mem_tracker.snapshot();
    power_tracker.sample();
}
```
### Step 3: Export results

```rust
let stats = compute_latency_stats(&latencies);
let report = mem_tracker.report();
let energy = power_tracker.energy();

write_latency_csv("latency.csv", &latencies).unwrap();
write_memory_csv("memory.csv", &report).unwrap();
println!("p95: {} us", stats.p95_us);
```
### Step 4: Use the benchmark script

```bash
# Full grid search: 1000 samples x 6 settings
scripts/run_mincut_bench.sh

# Custom grid (see the script header for supported flags)
```
### Expected output structure

```text
results/mincut-bench/
  csv/
    baseline.csv          # Softmax reference
    mincut_l0.3_t0.csv    # Per-setting results
    mincut_l0.3_t2.csv
    ...
    results.csv           # Aggregate comparison
  witness/
    mincut_l0.3_t0.jsonl  # SHA-256 witness chains
    witness.rvf           # RVF-packed bundle
  figs/                   # Generated plots
```
## Related Crates

| Crate | Role |
|---|---|
| `ruvector-attn-mincut` | Attention operators being profiled |
| `ruvector-coherence` | Quality metrics fed into `ResultRow` |
| `ruvector-solver` | Sublinear solvers for graph analytics |
## License

Licensed under the MIT License.