{
"train": [
{
"message": "feat(format): APR v2 migration + trueno 0.10.1 ecosystem update (Refs PAR-001)\n\n- Migrate APR format from v1 (APRN) to v2 (APR2) magic\n- Update trueno 0.9.0 β 0.10.1 (thiserror 2.x compatibility)\n- Update renacer 0.8 β 0.9.1\n- Fix integration tests for v2 format (INT-01b, CC1)\n- Bump version to 0.20.2\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>\n",
"label": "IntegrationFailures",
"confidence": 0.75,
"commit_hash": "2187cfbcfb3dc65b80c04b88aab6dee0dc890b0f",
"author": "noah.gift@gmail.com",
"timestamp": 1767289417,
"lines_added": 344,
"lines_removed": 259,
"files_changed": 13,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "fix: Add -j2 parallelism limit to prevent OOM in test targets\n\nFive-Whys Root Cause Analysis:\n1. Why OOM? β Tests consume more memory than available\n2. Why high memory? β Multiple tests run in parallel, each allocating ML matrices\n3. Why high parallelism? β Default = num_cpus, no explicit limit\n4. Why large allocations per test? β ML library with tensors, property tests\n5. Why no limit set? β Missing -j flag to constrain parallelism\n\nChanges:\n- test-fast: Add -j 2 for nextest, --test-threads=2 for cargo test\n- test: Add -j 2 for nextest, --test-threads=2 for cargo test\n- coverage: Reduce -j 8 to -j 2 (LLVM instrumentation ~2x overhead)\n- coverage-full: Reduce -j 8 to -j 2\n\nAlso fixes clippy lints in llama_tokenizer.rs:\n- Add #[allow(dead_code)] for reserved fields (scores, pad_token_id)\n- Inline format args in format! calls\n- Change skip_value to return usize instead of Result<usize>\n- Use range patterns (4..=6) instead of OR patterns (4 | 5 | 6)\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>\n",
"label": "ASTTransform",
"confidence": 0.85,
"commit_hash": "8bd1137f3ea844185c8bfe1c072bddb9d3547ef7",
"author": "noah.gift@gmail.com",
"timestamp": 1766842377,
"lines_added": 24,
"lines_removed": 23,
"files_changed": 2,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "docs(spec): Update GQA status to FIXED, document FFN gate limitation\n\n- GQA attention: FIXED (realizar commit 0fd76d6, aprender commit 8d78335)\n - Added group_size calculation for QβKV head mapping\n - apply_rope() now GQA-aware with num_heads_in_x parameter\n - TinyLlama 1.1B (32 q_heads, 4 kv_heads) no longer panics\n\n- FFN Gate (SwiGLU): Documented as known limitation\n - OwnedQuantizedLayer missing ffn_gate_weight\n - Causes garbage output (model runs but FFN broken)\n - Documented 5-step fix plan in spec\n - Workaround: Use QuantizedGGUFTransformer\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>\n",
"label": "ASTTransform",
"confidence": 0.85,
"commit_hash": "22f776995232fda43a4613d5df68100af838937e",
"author": "noah.gift@gmail.com",
"timestamp": 1766779941,
"lines_added": 18,
"lines_removed": 7,
"files_changed": 1,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "fix(chat): Add GQA fallback and progress indicators for GGUF generation\n\n- Store model_path in ChatSession for mmap-based loading\n- Discovered GQA bug in realizar's causal_attention (panics on TinyLlama)\n- Fall back to QuantizedGGUFTransformer for GQA models (simplified attention)\n- Add clear progress indicator showing layers, hidden_dim, token limit\n- Note GQA models with simplified attention warning\n- Limit max_tokens to 16 for CPU (O(nΒ²) without KV cache)\n\nThe tokenizer now works correctly - \"Hello\" -> [15043] (single token).\nGeneration runs but output quality is limited due to simplified attention\n(no RoPE position encoding or causal mask in QuantizedGGUFTransformer).\n\nProper attention requires fixing realizar's causal_attention for GQA models\nwhere num_kv_heads < num_heads (TinyLlama: 4 kv_heads vs 32 q_heads).\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>\n",
"label": "ASTTransform",
"confidence": 0.85,
"commit_hash": "7fb24a27417302746c8d6590e2288b0a5d741543",
"author": "noah.gift@gmail.com",
"timestamp": 1766777646,
"lines_added": 27,
"lines_removed": 2,
"files_changed": 1,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "fix(tokenizer): Correct SentencePiece-style word boundary encoding\n\nThe LlamaTokenizer now properly normalizes input text for SentencePiece:\n- Prepends β to entire input\n- Replaces spaces with β for word boundaries\n- \"Hello, world!\" β \"βHelloβ,βworldβ!\" β [15043, 29892, 3186, 29991]\n\nThis fixes the double-space issue in decoded text by normalizing upfront\ninstead of checking per-character space prefixes.\n\nAlso adds integration test for TinyLlama tokenizer validation.\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>\n",
"label": "TraitBounds",
"confidence": 0.8,
"commit_hash": "364591dfd43771316bffff480f4a497ab9d64a8a",
"author": "noah.gift@gmail.com",
"timestamp": 1766776612,
"lines_added": 68,
"lines_removed": 17,
"files_changed": 2,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "fix: Resolve clippy warnings and format code\n\n- converter.rs: Collapse nested if statement\n- qwen2/mod.rs: Replace unwrap() with expect() + descriptive messages\n- gguf.rs: Remove redundant closures, combine match arms, use range\n patterns, use From traits for casts, add TensorDataMap type alias\n- Auto-format all files with cargo fmt\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>\n",
"label": "ASTTransform",
"confidence": 0.85,
"commit_hash": "ec05d20a06ae70cfb984ebdd3f95a035980f8ab3",
"author": "noah.gift@gmail.com",
"timestamp": 1766762323,
"lines_added": 627,
"lines_removed": 587,
"files_changed": 34,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "test(regularization): Add 35+ tests for coverage improvement\n\nAdded tests for:\n- StochasticDepth::mode() getter and DropMode variants\n- SpecAugment::default() and with_mask_value()\n- RandAugment apply_single for all AugmentationType variants\n- Mixup::mix_labels() and alpha edge cases\n- CutMix sample edge cases\n- Clone and Debug impls for all regularization types\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>\n",
"label": "ASTTransform",
"confidence": 0.85,
"commit_hash": "e1d4c5c601bcdc1b74515e290877c51596ef3e2d",
"author": "noah.gift@gmail.com",
"timestamp": 1766613255,
"lines_added": 243,
"lines_removed": 0,
"files_changed": 1,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "test(error): Add coverage tests for helper methods and traits\n\n- dimension_mismatch, index_out_of_bounds, empty_input helpers\n- PartialEq<&str> implementation tests\n- Error::source() for Io and non-Io variants\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>\n",
"label": "TraitBounds",
"confidence": 0.8,
"commit_hash": "daf661f41b8c67dc6a1b4caba2dad833b3da1586",
"author": "noah.gift@gmail.com",
"timestamp": 1766611057,
"lines_added": 59,
"lines_removed": 0,
"files_changed": 1,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "test(linear_model): Add 50+ coverage tests for serialization and edge cases\n\n- SafeTensors save/load tests for LinearRegression, Ridge, Lasso, ElasticNet\n- Binary save/load tests for all model types\n- Unfitted model save error handling tests\n- Getter methods (alpha, with_intercept, l1_ratio)\n- Builder pattern tests for all models\n- Error handling for empty data and dimension mismatch\n- Debug and Clone trait implementations\n- soft_threshold function edge cases\n- add_intercept_column helper function\n- Multivariate regression scenarios\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>\n",
"label": "ASTTransform",
"confidence": 0.85,
"commit_hash": "87ee5a39ebbd6a865f06da55e2a213a1f5a03b97",
"author": "noah.gift@gmail.com",
"timestamp": 1766609106,
"lines_added": 489,
"lines_removed": 0,
"files_changed": 1,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "test: Add coverage tests to reach ~94% coverage\n\n- audio/mod.rs: 15 tests for DecodedAudio and AudioError types\n- format/sharded.rs: 25 tests for ShardIndex, ShardCache, ImportConfig\n- models/qwen2/mod.rs: 30 tests for Embedding, MLP, DecoderLayer, KVCache\n- nn/normalization.rs: 35 tests for LayerNorm, BatchNorm, GroupNorm, RMSNorm\n- nn/optim.rs: Additional optimizer tests\n- nn/transformer.rs: 30 tests for MHA, GQA, RoPE, encoder/decoder layers\n- text/tokenize.rs: Additional tokenizer tests\n- time_series/mod.rs: Additional ARIMA tests\n- transfer/mod.rs: Additional transfer learning tests\n- tree/mod.rs: 45 tests for DecisionTree, RandomForest, GradientBoosting\n\nCoverage improved from 92.87% to 93.93%\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>\n",
"label": "ASTTransform",
"confidence": 0.85,
"commit_hash": "4b980592882675754e3ec9e633d74317d4f25209",
"author": "noah.gift@gmail.com",
"timestamp": 1766567527,
"lines_added": 2683,
"lines_removed": 2,
"files_changed": 11,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "fix: Address clippy warnings for pedantic lints\n\n- Use u32::from() instead of as casts for lossless conversions (safetensors.rs)\n- Use writeln! instead of write! with \\n suffix (golden.rs via qwen2/mod.rs)\n- Use map_or() instead of map().unwrap_or() (golden.rs, qwen2/mod.rs)\n- Inline format string arguments (velocity.rs, docs.rs, security.rs, qwen2/mod.rs)\n- Use array instead of vec! for fixed-size iteration (security.rs)\n- Remove unnecessary let binding before return (qwen2/mod.rs)\n- Add #[allow(clippy::struct_field_names)] for ML naming conventions (qwen2/mod.rs)\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>\n",
"label": "SecurityVulnerabilities",
"confidence": 0.9,
"commit_hash": "a9b39cf5436576566fa954b54046d26837e00a74",
"author": "noah.gift@gmail.com",
"timestamp": 1766511757,
"lines_added": 256,
"lines_removed": 93,
"files_changed": 5,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "fix(chat): Implement lazy model initialization to prevent OOM\n\nProblem:\n- Qwen2Model::new() allocated ~2.5GB random tensors\n- Then load_from_apr() allocated ~2.5GB more for loaded weights\n- Peak memory = 5GB before old tensors dropped β OOM crash\n\nSolution:\n- Add placeholder() constructors to Linear, RMSNorm, GroupedQueryAttention\n- Add Embedding::placeholder() and Qwen2DecoderLayer::placeholder()\n- Add Qwen2Model::new_uninitialized() using 1-element placeholder tensors\n- Update chat.rs to use new_uninitialized() for APR/SafeTensors loading\n- Fix weight tying: lm_head shares weights with embed_tokens\n\nPer Native Library Mandate (Spec Β§2.4):\n- All loading uses mmap via bundle::MappedFile\n- Peak memory now ~2.5GB (loaded weights only)\n- Successfully loads all 219 tensors from APR file\n\nFiles changed:\n- src/nn/linear.rs: Add Linear::placeholder()\n- src/nn/normalization.rs: Add RMSNorm::placeholder()\n- src/nn/transformer.rs: Add GroupedQueryAttention::placeholder()\n- src/models/qwen2/mod.rs: Add placeholder constructors + new_uninitialized()\n- crates/apr-cli/src/commands/chat.rs: Use new_uninitialized() for real weights\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>\n",
"label": "ASTTransform",
"confidence": 0.85,
"commit_hash": "5a1cf8e19b59e71ff654912074db1f1b7086f9a7",
"author": "noah.gift@gmail.com",
"timestamp": 1766491915,
"lines_added": 709,
"lines_removed": 172,
"files_changed": 8,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "spec(chat): Add Native Library Mandate (Zero Ad-Hoc policy)\n\nCRITICAL requirement: All implementations MUST use existing aprender\ninfrastructure instead of ad-hoc code. Ad-hoc implementations are a\nmajor bug vector (e.g., fs::read OOM vs bundle::MappedFile mmap).\n\nNew checklist items (G4-G6):\n- G4: Native I/O (MappedFile vs fs::read)\n- G5: Native Format (.apr vs raw SafeTensors)\n- G6: Native Errors (AprenderError vs String)\n\nTotal points: 150 β 155\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>\n",
"label": "ASTTransform",
"confidence": 0.85,
"commit_hash": "56986cef4ba87843980ca8d9b49e5a34e5ded40e",
"author": "noah.gift@gmail.com",
"timestamp": 1766487038,
"lines_added": 37,
"lines_removed": 10,
"files_changed": 1,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat(qa): Add robustness/security (N1-N20) and docs/examples (O1-O20) verification\n\nImplements 40 new QA checklist items for Sections N and O:\n\nSection N (Robustness & Security):\n- N1-N2: Fuzzing infrastructure verification\n- N3: Mutation score >80% check\n- N4-N5: Thread/memory sanitizer readiness\n- N6-N9: Panic safety, error propagation, OOM, FD leak checks\n- N10: Path traversal prevention (cross-platform)\n- N11-N13: Dependency audit, replay/timing attack resistance\n- N14-N15: XSS injection prevention, WASM sandboxing\n- N16-N20: Disk full, network timeout, golden trace, WASM32 limits, NaN/Inf handling\n\nSection O (Documentation & Examples):\n- O1-O5: Example listing and compilation verification\n- O6: Public API usage in examples\n- O7-O9: mdBook build and link validation\n- O10-O15: README, CLI help, manpages, changelog, contributing, license\n- O16-O20: Error handling, progress bars, WASM/TensorLogic/Audio docs\n\nNew files:\n- src/qa/security.rs: 26 security tests (N1-N20)\n- src/qa/docs.rs: 20 documentation tests (O1-O20)\n- examples/whisper_transcribe.rs: ASR pipeline demo\n- examples/qwen_chat.rs: Qwen2-0.5B configuration demo\n- examples/logic_family_tree.rs: TensorLogic family tree demo\n\nSpec updated to v1.13.0 (200/200 points verified).\nAll 4810 tests pass.\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>\n",
"label": "SecurityVulnerabilities",
"confidence": 0.95,
"commit_hash": "69635e362b76ac514cbe527e0f4068f3760a349a",
"author": "noah.gift@gmail.com",
"timestamp": 1766427931,
"lines_added": 2126,
"lines_removed": 56,
"files_changed": 8,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "fix(audio): Fix ALSA build errors and test conditions\n\n- Use safe CStr::from_bytes_with_nul instead of unsafe variant\n- Add Debug impl for AlsaBackend\n- Remove unused 'frames' variable\n- Use EPIPE error code -32 directly instead of nix import\n- Make tests conditional on audio-alsa feature:\n - test_list_devices_stub β only without ALSA\n - test_list_devices_alsa β only with ALSA\n - test_audio_capture_open_not_implemented β only without ALSA\n - test_audio_capture_open_alsa β only with ALSA\n\nAll 4784 tests pass with audio-alsa feature.\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>\n",
"label": "OwnershipBorrow",
"confidence": 0.85,
"commit_hash": "2c12deeb36afe85753ca6f8f8baebfc905a7b176",
"author": "noah.gift@gmail.com",
"timestamp": 1766404364,
"lines_added": 42,
"lines_removed": 10,
"files_changed": 1,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat(audio,format,speech): Add EOY 2025 spec tests and features (GH-129, GH-130, A11, B1-B10)\n\n- GH-129: Add actionable import error messages with parse_import_error()\n detecting 404/401/403/429 errors with fix suggestions\n- GH-130: Add MockCaptureSource and BufferCaptureSource for testing\n audio pipelines without hardware (sine, noise, impulse signals)\n- A11: Add audio clipping detection with detect_clipping(), has_nan(),\n and validate_audio() functions in mel module\n- B1-B10: Add explicit Popperian falsification tests for VAD spec\n including segment ordering, energy bounds, and duration filtering\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>\n",
"label": "TraitBounds",
"confidence": 0.8,
"commit_hash": "79df0cbfe74290d911a572a99e0c3cb3132b6c48",
"author": "noah.gift@gmail.com",
"timestamp": 1766397080,
"lines_added": 1173,
"lines_removed": 10,
"files_changed": 5,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "fix: Resolve lint warnings and fix validate exit code for corrupted files\n\n- Fixed clippy warnings in lint.rs (unnested or-patterns, map/unwrap_or)\n- Fixed ExportFormat::from_str to implement FromStr trait properly\n- Fixed MergeStrategy::from_str to implement FromStr trait properly\n- Fixed validate command to return failure exit code on corrupted files\n- Fixed IoError -> FormatError for non-existent error variant\n- Fixed std::fs -> fs unnecessary qualification\n- Added #[allow(clippy::too_many_lines)] to apr_merge\n\nAll tests pass: 4119 lib + 26 apr-cli tests\n\n(Refs #119)\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>\n",
"label": "StdlibMapping",
"confidence": 0.8,
"commit_hash": "cb6320868ba91a80d50902bff849407854745726",
"author": "noah.gift@gmail.com",
"timestamp": 1765879660,
"lines_added": 1859,
"lines_removed": 35,
"files_changed": 8,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat(format): Implement 100-point APR validation QA checklist (Refs APR-QA-100)\n\nImplements the Master Falsification QA Checklist from APR-SPEC.md Section 11\nwith EXTREME TDD approach - all tests written before implementation.\n\n## Validation Module (src/format/validation.rs)\n- TensorStats: Compute statistics (mean, std, min, max, NaN/Inf detection)\n- ValidationCheck: Individual check result with status and points\n- ValidationReport: Complete report with category scores and grading\n- AprValidator: Run all 100-point validation checks\n- AprHeader: Parse and validate APR file headers\n\n## 36 Unit Tests Covering:\n- Section A: Format & Structural Integrity (magic, header, version, flags)\n- Section B: Tensor Physics (NaN, Inf, LayerNorm mean/bias, zero tensors)\n- Section C: Tooling (L2 diff, merge average)\n- Section D: Conversion (roundtrip tolerance, name normalization)\n\n## CLI Updates (apr-cli)\n- Updated validate command to use new validation module\n- Added --min-score flag for CI/CD quality gates\n- Added colored output with progress bars per category\n- Shows grade (A+ to F) based on total score\n\nKey validation: Catches LayerNorm weight mean=11 bug (should be ~1.0)\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>\n",
"label": "ASTTransform",
"confidence": 0.85,
"commit_hash": "1e7691d2b501f5fdd347b63f63ed663d70171305",
"author": "noah.gift@gmail.com",
"timestamp": 1765876375,
"lines_added": 2334,
"lines_removed": 337,
"files_changed": 6,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat: Add apr-cli crate and improve test coverage to 92.63% (Refs #105)\n\n- Add apr-cli crate with inspect, debug, validate, diff, and tensors commands\n- Add 150+ new tests across bundle/format.rs, decomposition/ica.rs, embed/tiny.rs, format/gguf.rs, nn/optim.rs, text/vectorize.rs\n- Fix flaky latency tests by adding #[ignore] attribute\n- Add book documentation for apr-cli tool\n- Update apr_with_metadata example\n- Remove stray book/Cargo.toml that was causing workspace issues\n\nTest coverage improved from 91.48% to 92.63% line coverage\nTotal tests: 4297 passing\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>\n",
"label": "OwnershipBorrow",
"confidence": 0.85,
"commit_hash": "bfec7eda46724fde26dd32f93aee1191e9a78ba9",
"author": "noah.gift@gmail.com",
"timestamp": 1765851278,
"lines_added": 7489,
"lines_removed": 51,
"files_changed": 34,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat(tsp): Implement aprender-tsp sub-crate with local .apr models (Refs #80)\n\nComplete implementation of TSP solver sub-crate per docs/specifications/tsp-solver-sub-crate.md:\n\nCore Components:\n- TspInstance: Problem representation with TSPLIB/CSV parsers\n- TspSolution: Tour representation with validation\n- TspError: Comprehensive error types with actionable hints\n\nMetaheuristic Solvers (4 algorithms):\n- ACO (Ant Colony Optimization): Pheromone-based construction\n- Tabu Search: Memory-guided 2-opt local search\n- Genetic Algorithm: Order crossover + 2-opt mutation\n- Hybrid: GA exploration + Tabu refinement + ACO intensification\n\nModel Persistence:\n- .apr binary format with CRC32 checksum validation\n- Algorithm-specific parameter serialization\n- Training metadata (instances, gap, time)\n\nCLI Interface:\n- train: Train models from TSPLIB/CSV instances\n- solve: Solve instances using trained models\n- benchmark: Evaluate model quality against instances\n- info: Display model information\n\nQuality:\n- 99 tests (unit + doc)\n- Clippy clean\n- EXTREME TDD methodology\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "TypeErrors",
"confidence": 0.75,
"commit_hash": "d2b932daaafa195ded6eb27409140c5a29d87b35",
"author": "noah.gift@gmail.com",
"timestamp": 1764409304,
"lines_added": 4505,
"lines_removed": 1,
"files_changed": 14,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "test: Add property tests and coverage improvements (Refs #80)\n\nMetaheuristics property tests (6 tests, 50 cases each):\n- prop_de_produces_finite_value: DE produces valid results\n- prop_solution_within_bounds: Solutions respect search space\n- prop_history_monotonic: Convergence history is non-increasing\n- prop_sphere_nonnegative: Benchmark invariant\n- prop_binary_ga_valid_bits: Binary GA produces 0/1 values\n- prop_cmaes_positive_sigma: CMA-ES maintains stability\n\nText module coverage improvements:\n- stem.rs: Porter step 4 suffix tests\n- vectorize.rs: Builder method tests\n\n3085 tests passing in ~4s.\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "TraitBounds",
"confidence": 0.8,
"commit_hash": "2b1edf9856b8d5a237d200b58d57137c1308b56b",
"author": "noah.gift@gmail.com",
"timestamp": 1764389206,
"lines_added": 188,
"lines_removed": 0,
"files_changed": 3,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "test(aprender-shell): Add chaos resilience tests and Makefile integration (Refs #99)\n\nIntegrate renacer chaos testing framework for robustness validation:\n\nMakefile targets:\n- `make chaos-test` - Run CI-safe chaos tests via renacer\n- `make chaos-test-full` - Full chaos suite including aggressive mode\n- `make chaos-test-lite` - Lightweight tests when renacer unavailable\n\nChaos resilience tests (CLI_021):\n- Empty model file handling\n- Truncated model file handling\n- Wrong magic bytes handling\n- Very long prefix inputs (10KB)\n- Special characters (ANSI, command injection attempts)\n- Unicode edge cases (emoji, CJK, RTL, BOM, zero-width)\n- Concurrent file access (5 threads Γ 10 reads)\n- Rapid sequential calls (50 iterations)\n\nAll tests validate graceful degradation - no panics, no crashes,\nmeaningful error messages under resource pressure.\n\nCI workflow already configured in .github/workflows/ci.yml (chaos job)\nScript exists at crates/aprender-shell/scripts/chaos-baseline.sh\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "ConcurrencyBugs",
"confidence": 0.8,
"commit_hash": "3243f1f59914b958a2725a77331bdeacfaa6994b",
"author": "noah.gift@gmail.com",
"timestamp": 1764358803,
"lines_added": 281,
"lines_removed": 5,
"files_changed": 2,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat(text): Add subword tokenizers (BPE, WordPiece, Unigram) (Refs #103)\n\nImplement subword tokenization for LLMs.\n\nNew tokenizers:\n- BpeTokenizer: Byte Pair Encoding (GPT, LLaMA, Mistral)\n - train() from corpus with configurable vocab_size\n - encode()/decode() for token ID conversion\n - Custom special tokens support (<unk>, <s>, </s>, <pad>)\n\n- WordPieceTokenizer: BERT-style tokenization\n - train() with WordPiece scoring criterion\n - ## continuation prefix for subword units\n - Greedy longest-match-first encoding\n\n- UnigramTokenizer: SentencePiece/T5-style\n - Probabilistic subword segmentation\n - Viterbi algorithm for optimal tokenization\n - Log probability scoring with β word boundary\n\nAll tokenizers:\n- Implement Tokenizer trait for consistency\n- Full encode/decode roundtrip support\n- Zero-unwrap safety (Cloudflare-class)\n- 63 tests (comprehensive unit tests)\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "TraitBounds",
"confidence": 0.8,
"commit_hash": "34d5c4035ae05bfd1e4f69a7d1087ea2c99149ed",
"author": "noah.gift@gmail.com",
"timestamp": 1764356713,
"lines_added": 1735,
"lines_removed": 137,
"files_changed": 1,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat(citl): Add GNN-based error encoder with program-feedback graph (Refs maintenance)\n\nImplement GNNErrorEncoder that integrates Graph Neural Networks for\ncontext-aware error embedding generation, following Yasunaga & Liang (2020).\n\nArchitecture:\n- GCNConv β SAGE β GCNConv message passing stack\n- Program-feedback graph construction from diagnostics\n- AST token extraction with Rust tokenizer\n- Node type embeddings for heterogeneous graph\n\nKey components:\n- GNNErrorEncoder: Full encoding pipeline with 3-layer GNN\n- build_graph(): Constructs ProgramFeedbackGraph from diagnostic + source\n- encode_graph(): Applies GNN layers with mean pooling\n- NodeType/EdgeType: Graph structure representation\n\nNode types: Diagnostic, ExpectedType, FoundType, AST, Suggestion\nEdge types: Expects, Found, DiagnosticRefers, AstChild, DataFlow, ControlFlow\n\n22 comprehensive tests covering:\n- Graph building with/without type info\n- Encoding correctness and normalization\n- Similar vs different error embedding distances\n- Tokenization and feature extraction\n- Hash consistency and structure variation\n\nUses the GNN layers (GCNConv, SAGEConv) added in commit b782165.\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "ASTTransform",
"confidence": 0.85,
"commit_hash": "46c5224a5e740e2adda2f0c60ee0799dd417f5cd",
"author": "noah.gift@gmail.com",
"timestamp": 1764353337,
"lines_added": 985,
"lines_removed": 1,
"files_changed": 2,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat(nn): Add Graph Neural Network layers for AST/CFG analysis (Refs maintenance)\n\nImplement three foundational GNN layers for code structure analysis:\n\n- GCNConv: Graph Convolutional Network with symmetric normalization\n - Kipf & Welling (2017) architecture: D^(-1/2) A D^(-1/2) X W\n - Optional bias and self-loop addition\n - Configurable normalization\n\n- SAGEConv: GraphSAGE with multiple aggregation strategies\n - Mean, Max, Sum, LSTM aggregation methods\n - Separate neighbor and self transformations\n - Hamilton et al. (2017) inductive learning\n\n- GATConv: Graph Attention Network with multi-head attention\n - VeliΔkoviΔ et al. (2018) attention mechanism\n - LeakyReLU activation with configurable negative slope\n - Concatenation or averaging of attention heads\n\nSupporting infrastructure:\n- AdjacencyMatrix: COO format sparse graph representation\n- MessagePassing trait for extensible GNN architectures\n- 39 comprehensive tests covering all layer types\n\nThese layers enable AST/CFG-based code analysis for CITL error\nclassification, supporting the error prediction pipeline.\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "ASTTransform",
"confidence": 0.90000004,
"commit_hash": "b7821653328543562a971fd2b478f7a766e13337",
"author": "noah.gift@gmail.com",
"timestamp": 1764351892,
"lines_added": 1535,
"lines_removed": 0,
"files_changed": 2,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "fix(tree): Add feature count validation to prevent index out of bounds\n\n- Add n_features field to DecisionTreeClassifier\n- Validate feature count in predict() with clear error message\n- Set n_features during fit() for trained models\n- Use serde(default) for backward compatibility\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "TraitBounds",
"confidence": 0.8,
"commit_hash": "3fac48ea692c87eae365c14ce578d61a24e645db",
"author": "noah.gift@gmail.com",
"timestamp": 1764340175,
"lines_added": 22,
"lines_removed": 3,
"files_changed": 1,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat(citl): Add Compiler-in-the-Loop Learning module with neural encoder\n\n- Implement neural encoder using Transformer architecture for error embeddings\n- Add contrastive learning with InfoNCE loss for semantic similarity\n- Create pattern library with 21 error templates (E0308, E0382, E0502, etc.)\n- Build iterative fix loop with metrics tracking and confidence scoring\n- Add comprehensive test suite (197 tests) covering all components\n- Include criterion benchmarks for encoder and pattern matching\n- Write book chapter with usage examples and architecture overview\n- Fix 23 broken documentation links across book chapters\n- Add required-features for shell_encryption_demo example\n\nNote: Example clippy warnings (format strings) excluded from this commit.\nLibrary code passes all clippy checks with -D warnings.\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "ASTTransform",
"confidence": 0.90000004,
"commit_hash": "8dd1b66dc053c928ebb1faa1c8fe8626b8873215",
"author": "noah.gift@gmail.com",
"timestamp": 1764279922,
"lines_added": 13220,
"lines_removed": 44,
"files_changed": 48,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "ci: Add Makefile lint to catch missing --all-features\n\nPrevents the recurring bug where coverage shows 0% because\n--all-features gets accidentally removed from the coverage target.\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "OwnershipBorrow",
"confidence": 0.85,
"commit_hash": "fa21ec33794044e0c2d0a3b665ae022b16e32455",
"author": "noah.gift@gmail.com",
"timestamp": 1764260639,
"lines_added": 15,
"lines_removed": 0,
"files_changed": 1,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "fix: Resolve CI failures (clippy, tests)\n\n- hf_hub/mod.rs: Replace unwrap() with expect() (disallowed-methods)\n- hf_hub/mod.rs: Use char '.' instead of string \".\" (single_char_pattern)\n- stopwords.rs: Remove redundant is_empty check (const_is_empty)\n- format/mod.rs: Fix large file tests using Compression::None and unique values\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "OwnershipBorrow",
"confidence": 0.85,
"commit_hash": "aca7b16b7d5232f395953a69217c56532f2cab94",
"author": "noah.gift@gmail.com",
"timestamp": 1764260218,
"lines_added": 17,
"lines_removed": 15,
"files_changed": 3,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "fix: Resolve trivial cast lint and update hero image for v0.11\n\n- Fix trivial cast lint error in mmap.rs:611 that broke CI\n- Update hero image: 17 β 18 model types (MoE added)\n- Update hero image version: v0.9 β v0.11\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "ASTTransform",
"confidence": 0.85,
"commit_hash": "413093f47ff4f5e25bb3d9e01f83e81423790adf",
"author": "noah.gift@gmail.com",
"timestamp": 1764259731,
"lines_added": 7,
"lines_removed": 6,
"files_changed": 2,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat: Add Mixture of Experts ensemble and future ML specs\n\n- Add src/ensemble/ module with MoE, SoftmaxGating, MoeConfig\n- Add ModelType::MixtureOfExperts (0x0040) to format\n- Add examples/mixture_of_experts.rs runnable example\n- Add book/src/examples/mixture-of-experts.md documentation\n- Update model-format.md with MoE section and model type\n- Fix Makefile coverage (move config before clean for sccache)\n- Add docs/specifications/more-learning-specs.md (34 sections)\n - GAN, VAE, Diffusion, Contrastive, GNN, Meta-learning\n - Transfer learning for transpiler ecosystem\n - Distillation ingestion from entrenar\n - Code-specific ML for depyler oracle\n\nRefs #101\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "ASTTransform",
"confidence": 0.85,
"commit_hash": "940c86f8fd32b42f5bc6673bcc7908e67aa55c6e",
"author": "noah.gift@gmail.com",
"timestamp": 1764258298,
"lines_added": 2158,
"lines_removed": 3,
"files_changed": 12,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "docs: Address code review feedback on AutoML Synthetic Data spec (Refs #74)\n\nResolves all 3 action items from Gemini review (Toyota/NASA/Startup personas):\n\n[NASA] Sandbox V&V for Code Translation:\n- Added SandboxExecutor to CodeTranslationGenerator\n- quality_score() now tests functional correctness (40% weight)\n- Addresses Codex hallucination issue (compiles != correct)\n\n[Toyota] Andon Mechanism (Jidoka):\n- Added AndonHandler trait with DefaultAndon implementation\n- Halts pipeline if rejection rate >90%\n- Alerts on quality drift below baseline\n\n[Startup] Decoupled Roadmap:\n- Shell SLM: v0.14.0 (MVP - tractable structured prediction)\n- Code Oracle: v0.15.0 (experimental - AI-Complete)\n- Added EXPERIMENTAL warning to CodeTranslationGenerator\n\nUpdated risk matrix with 3 new mitigations.\nSpec version bumped to 1.1.0.\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "ComprehensionBugs",
"confidence": 0.8,
"commit_hash": "a99a69f7200181115375b3305c0d3e5f5fe6a976",
"author": "noah.gift@gmail.com",
"timestamp": 1764183380,
"lines_added": 84,
"lines_removed": 12,
"files_changed": 2,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "test(integration): Add 6 property-based integration tests for combined features\n\n- Full metadata stack (name + description + training) roundtrip\n- All model types (7 variants) roundtrip correctly\n- Large model (1000-5000 floats) stress test with data integrity\n- Distillation + License combined feature persistence\n- Overwrite preserves latest model\n- File size scales with data size\n\nEXTREME TDD: Integration tests verify combined feature correctness.\nTotal property tests: 6 (integration) + 8 (errors) + 7 (metadata) + 9 (license) + 10 (distillation) + 11 (security) + 23 (format) + 20 (quantize) + 19 (gguf) = 113\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "SecurityVulnerabilities",
"confidence": 0.9,
"commit_hash": "a9ca2bcc054de095b9b82dac532f2cc3fa85104b",
"author": "noah.gift@gmail.com",
"timestamp": 1764164547,
"lines_added": 243,
"lines_removed": 0,
"files_changed": 1,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "test(error): Add 8 property-based tests for error handling/robustness\n\n- Invalid magic bytes rejected\n- Truncated header (< 32 bytes) rejected\n- Invalid model type byte rejected\n- Invalid compression byte rejected\n- CRC mismatch detection on corrupted files\n- Empty file rejected\n- Random bytes rejected\n- Format version matches FORMAT_VERSION constant\n\nEXTREME TDD: Error handling now has property coverage for robustness.\nTotal property tests: 8 (errors) + 7 (metadata) + 9 (license) + 10 (distillation) + 11 (security) + 23 (format) + 20 (quantize) + 19 (gguf) = 107\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "SecurityVulnerabilities",
"confidence": 0.9,
"commit_hash": "19f15ba0e92b27651e254d1869840b823c3fbea1",
"author": "noah.gift@gmail.com",
"timestamp": 1764164413,
"lines_added": 178,
"lines_removed": 0,
"files_changed": 1,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "perf: Fix slow test targets (encryption tests 60s β 1.6s)\n\nPROBLEM:\n- Encryption property tests took 60+ seconds each due to Argon2id\n- `make coverage` and `make test` were unusably slow for development\n- Violated Toyota Way fast feedback principle\n\nSOLUTION:\n1. Reduce encryption proptest cases from 256 to 3 (unit tests cover functionality)\n2. Add performance-optimized test targets to Makefile:\n - `make test-fast`: <30s (unit tests, no slow features)\n - `make test`: <2min (all tests)\n - `make test-full`: comprehensive (all features, full property cases)\n - `make coverage-fast`: <5min (skip slow encryption features)\n - `make coverage`: alias to coverage-fast for dev workflow\n - `make coverage-full`: all features (CI only)\n\nRESULTS:\n- Fast tests: 2 seconds\n- Encryption property tests: 1.65s (was 60s+ per test)\n- Development workflow now matches bashrs pattern\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "ASTTransform",
"confidence": 0.85,
"commit_hash": "3a75e9dd453f189d60ca2561e28f5b36385395c2",
"author": "noah.gift@gmail.com",
"timestamp": 1764163390,
"lines_added": 384,
"lines_removed": 10,
"files_changed": 2,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "test(quantize): Add 20 property-based tests for quantization (Β§6.2)\n\nEXTREME TDD property tests covering:\n\nQuantType (3 tests):\n- prop_quant_type_roundtrip: Q8_0/Q4_0 roundtrip via u8\n- prop_invalid_quant_type_none: Invalid values return None\n- prop_bits_per_weight_positive: Always positive for valid types\n\nQ8_0 Quantization (6 tests):\n- prop_q8_0_preserves_count: Element count preserved through roundtrip\n- prop_q8_0_block_count: Block count is ceiling division\n- prop_q8_0_size_matches_blocks: Quantized size = blocks Γ Q8_0_BLOCK_BYTES\n- prop_q8_0_error_bounded: MSE < 1% for normalized data\n- prop_q8_0_zeros: Zero values stay approximately zero\n- prop_q8_0_compression_ratio: ~3.76x for full blocks\n\nQ4_0 Quantization (5 tests):\n- prop_q4_0_preserves_count: Element count preserved\n- prop_q4_0_block_count: Block count is ceiling division\n- prop_q4_0_size_matches_blocks: Quantized size = blocks Γ Q4_0_BLOCK_BYTES\n- prop_q4_0_compression_ratio: ~7.1x for full blocks\n- prop_q4_0_zeros: Zero values stay approximately zero\n\nCross-Quantizer (3 tests):\n- prop_shape_preserved: Shape preserved through quantization\n- prop_num_elements: num_elements matches shape product\n- prop_original_size_bytes: 4x num_elements\n\nMSE Helper (3 tests):\n- prop_mse_identical: MSE(a, a) = 0\n- prop_mse_symmetric: MSE(a, b) = MSE(b, a)\n- prop_mse_nonnegative: MSE >= 0\n\nTotal quantize tests: 40 (20 unit + 20 property)\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "TraitBounds",
"confidence": 0.8,
"commit_hash": "abd681125a9aee259f871b8521d596a9e5ee29b1",
"author": "noah.gift@gmail.com",
"timestamp": 1764162767,
"lines_added": 289,
"lines_removed": 0,
"files_changed": 1,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "fix: Eliminate clippy warnings in library code (GH-41)\n\nReplace unwrap() with expect() for descriptive error messages:\n- autograd/ops.rs: gradient test assertions\n- autograd/tensor.rs: gradient accumulation tests\n- format/mod.rs: error assertion tests (expect_err)\n- nn/container.rs: module access tests\n\nFix format strings with inline variables:\n- autograd/ops.rs: assertion messages\n- nn/activation.rs: softmax sum assertion\n- nn/dropout.rs: channel value assertions\n- nn/init.rs: bound check assertions\n\nUse RangeInclusive::contains() instead of manual bounds:\n- nn/activation.rs: tanh bounds, softmax probability bounds\n- nn/init.rs: xavier uniform bounds\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "TraitBounds",
"confidence": 0.8,
"commit_hash": "961cf7f858102ef9ed3af1267b957524b1c5524c",
"author": "noah.gift@gmail.com",
"timestamp": 1764152221,
"lines_added": 260,
"lines_removed": 46,
"files_changed": 8,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat(format): Add X25519 recipient encryption + spec v1.5.0\n\nEncryption:\n- Add save_for_recipient() and load_as_recipient() for asymmetric encryption\n- X25519 ECDH key agreement + HKDF-SHA256 + AES-256-GCM\n- Forward secrecy via ephemeral sender keys\n- 4 new tests for X25519 roundtrip, wrong key, cross-mode rejection\n\nSpec v1.5.0 (merged with review feedback):\n- Β§1.0: WASM compatibility as HARD REQUIREMENT (spec gate)\n- Β§4: Model Cards [Mitchell2019] for standardized reporting\n- Β§5.1: Andon error protocols (stop-the-line on failure)\n- Β§5.3: X25519 recipient encryption details\n- Β§6.2: Quantization flag (Bit 5) [Jacob2018]\n- Ecosystem coordination with alimentar .ald format\n- WASM appendix with CI integration template\n\nBook:\n- Updated model-format.md with WASM patterns, X25519 examples\n- Added ecosystem integration diagram\n- Feature flags table with WASM status\n\nDependencies:\n- x25519-dalek 2.0 (static_secrets)\n- hkdf 0.12, sha2 0.10 (for HKDF-SHA256)\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "IntegrationFailures",
"confidence": 0.75,
"commit_hash": "11b45289599640e5977230ee1931244112aa0895",
"author": "noah.gift@gmail.com",
"timestamp": 1764150922,
"lines_added": 1046,
"lines_removed": 516,
"files_changed": 5,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "fix: Correct static vs instance method calls in text entities and summarize\n\n- Fix EntityExtractor::extract_urls/mentions/hashtags/capitalized_words to use proper static syntax\n- Fix EntityExtractor::extract_emails to use static syntax (no &self parameter)\n- Remove incorrect ? operator from hybrid_scores call in summarize.rs\n\nThese fixes resolve compilation errors where Self:: was used instead of EntityExtractor::\nfor truly static methods, and where instance method syntax was used for static methods.\n",
"label": "OwnershipBorrow",
"confidence": 0.85,
"commit_hash": "71f3e0ebae63224cec5b7a0b266b5596fe033536",
"author": "noah.gift@gmail.com",
"timestamp": 1764021465,
"lines_added": 1989,
"lines_removed": 6,
"files_changed": 12,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat: Add topic modeling (LDA) and sentiment analysis\n\nImplemented advanced NLP techniques for document understanding:\n\n**Sentiment Analysis (src/text/sentiment.rs - 380 LOC):**\n- Lexicon-based sentiment scoring\n- 65-word default dictionary (positive/negative/intensifiers)\n- Polarity classification (Positive/Negative/Neutral)\n- Customizable lexicons and thresholds\n- Normalized scores by document length\n- 6 unit tests passing\n\n**Topic Modeling (src/text/topic.rs - 410 LOC):**\n- Latent Dirichlet Allocation (LDA)\n- Simplified variational inference algorithm\n- Document-topic distribution (mixture of topics per doc)\n- Topic-word distribution (word probabilities per topic)\n- Top words extraction for topic interpretation\n- Reproducible with random seed\n- 2 unit tests passing\n\n**Combined Example (examples/topic_sentiment_analysis.rs):**\n- Example 1: Sentiment analysis on product reviews\n - 5 reviews with positive/negative/neutral classification\n - Score distribution analysis\n- Example 2: Topic modeling on electronics reviews\n - 3 topics discovered from 6 documents\n - Top 5 words per topic\n - Document-topic distribution visualization\n- Example 3: Combined topic + sentiment analysis\n - Topic discovery + sentiment scoring\n - Topic-sentiment correlation analysis\n - Actionable insights for product improvement\n\n**Key Features:**\n- Zero unwrap() calls (Cloudflare-class safety)\n- Result-based error handling\n- Integration with existing text pipeline\n- Matrix<f64> compatibility\n\n**Text Module Stats:**\n- 74 tests passing (62 preprocessing + 4 vectorizers + 6 sentiment + 2 topic)\n- Comprehensive NLP toolkit ready for production\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "IntegrationFailures",
"confidence": 0.75,
"commit_hash": "dd597dbd77ad62e06f59b1797ea103bbd2975c32",
"author": "noah.gift@gmail.com",
"timestamp": 1764019203,
"lines_added": 1116,
"lines_removed": 0,
"files_changed": 4,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat: Implement TF-IDF and text classification (Refs #70)\n\nAdded complete text vectorization and classification pipeline:\n\n**Vectorization (src/text/vectorize.rs):**\n- CountVectorizer: Bag of Words representation\n- TfidfVectorizer: TF-IDF weighted features\n- Vocabulary building and document transformation\n- 4 unit tests passing\n\n**Text Classification Example:**\n- Example 1: Sentiment analysis with Bag of Words + Gaussian NB\n- Example 2: Topic classification with TF-IDF + Logistic Regression\n- Example 3: Full preprocessing pipeline (tokenize β stop words β stem β TF-IDF β classify)\n- 100% accuracy on demo datasets\n\n**Book Chapter:**\n- TF-IDF theory and formulas\n- CountVectorizer vs TfidfVectorizer comparison\n- Gaussian Naive Bayes vs Logistic Regression\n- Complete code examples with explanations\n- Best practices and real-world applications\n\n**Key Features:**\n- Zero unwrap() calls (Cloudflare-class safety)\n- Result-based error handling\n- Integration with existing text preprocessing\n- Supports max_features vocabulary limiting\n- Matrix<f64> output compatible with classifiers\n\nUnblocks NLP-004 in defect-classifier project for GitHub commit history classification.\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "ASTTransform",
"confidence": 0.85,
"commit_hash": "c74dcf4b16f68dd6fd1fb0f2de2c95771f678576",
"author": "noah.gift@gmail.com",
"timestamp": 1764018699,
"lines_added": 1507,
"lines_removed": 0,
"files_changed": 5,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat(time_series): Implement ARIMA time series forecasting model (Refs Implement ARIMA (Auto-Regressive Integrated Moving Average))\n\n- Add time_series module with ARIMA(p,d,q) implementation\n- Support auto-regressive (AR), integrated (I), and moving average (MA) components\n- Implement differencing for stationarity\n- Implement forecast() for multi-step ahead predictions\n- Add 11 unit tests + 8 doctests (19 tests total)\n- Zero unwrap() calls (Cloudflare-class safety)\n- Result-based error handling with AprenderError\n\nARIMA features:\n- AR parameter estimation using Yule-Walker equations\n- Differencing up to order d\n- Integration (reverse differencing) for forecasting\n- Intercept/constant term estimation\n- Residual calculation for MA component\n\nRefs: Box-Jenkins (1976), Hyndman-Athanasopoulos (2018)\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "ASTTransform",
"confidence": 0.85,
"commit_hash": "6660d47c2318a750e54279d0fdf5767eacba8320",
"author": "noah.gift@gmail.com",
"timestamp": 1764016849,
"lines_added": 524,
"lines_removed": 0,
"files_changed": 2,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "docs(specs): Add comprehensive NLP models and techniques specification (Refs docs-nlp-spec)\n\nAdd 55KB specification covering text classification, feature extraction (TF-IDF,\nn-grams, word embeddings, transformers), sklearn best practices, UC Berkeley NLP\nprinciples, and implementation roadmap. Includes 15 peer-reviewed academic\nreferences from top SE/ML conferences to guide future NLP integration in aprender.\n\nAlso fix all clippy warnings across codebase:\n- Add #[allow(non_snake_case)] for mathematical matrix notation\n- Replace .max().min() with .clamp()\n- Use std::f32::consts::FRAC_1_SQRT_2 instead of approximation\n- Fix loop indexing with enumerate()\n- Add #[allow(clippy::too_many_lines)] for example functions\n- Fix unused return values with let _\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "ASTTransform",
"confidence": 0.85,
"commit_hash": "8f785f7826b4c80addbe8b4cf6da11198879154e",
"author": "noah.gift@gmail.com",
"timestamp": 1764014569,
"lines_added": 1905,
"lines_removed": 139,
"files_changed": 6,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat(optim): Implement Projected Gradient Descent for constrained optimization\n\nAdd comprehensive Projected Gradient Descent optimizer for Phase 3:\nConstrained Optimization.\n\n**Implementation** (266 lines):\n- Algorithm: x_{k+1} = P_C(x_k - Ξ±βf(x_k))\n- Supports any convex set C with efficient projection operator\n- Optional backtracking line search for adaptive step size\n- O(1/k) convergence for convex objectives\n- Linear convergence for strongly convex objectives\n\n**Key Features**:\n- Flexible projection operator (user-defined)\n- Backtracking line search via `with_line_search(beta)`\n- Convergence tracking with gradient norm computation\n- Comprehensive error handling and status reporting\n\n**Supported Constraints**:\n- Non-negative: x β₯ 0 (via prox::nonnegative)\n- Box constraints: l β€ x β€ u (via prox::project_box)\n- L2 ball: βxββ β€ r (via prox::project_l2_ball)\n- Any convex set with projection operator\n\n**Tests** (8 comprehensive tests):\n1. test_projected_gd_nonnegative_constraint - Non-negative projection\n2. test_projected_gd_box_constraints - Box-constrained optimization\n3. test_projected_gd_l2_ball - L2 ball projection (βxββ β€ 1)\n4. test_projected_gd_with_line_search - Adaptive step sizing\n5. test_projected_gd_quadratic - Constrained quadratic programming\n6. test_projected_gd_convergence_tracking - Status and metrics\n7. test_projected_gd_max_iterations - MaxIterations status\n8. test_projected_gd_unconstrained_equivalent - Identity projection\n\nAll tests passing: analytical solution verification, constraint satisfaction,\nconvergence tracking, and line search validation.\n\n**Applications**:\n- Non-negative least squares\n- Portfolio optimization with bounds\n- Image denoising with positivity constraints\n- Sparse coding with L2 ball constraints\n\nTotal: 266 lines implementation + 273 lines tests = 539 lines\nPhase 3: Constrained Optimization - 1/3 complete\n",
"label": "ASTTransform",
"confidence": 0.85,
"commit_hash": "10afc22bd568faeddccd1e68fb1228b9702ceb41",
"author": "noreply@anthropic.com",
"timestamp": 1763917802,
"lines_added": 544,
"lines_removed": 0,
"files_changed": 1,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
}
],
"validation": [
{
"message": "feat(optim): Implement Coordinate Descent for high-dimensional optimization\n\nAdds Coordinate Descent - the industry standard for Lasso regression\n(scikit-learn's default) and high-dimensional problems where n β« m.\n\n**Coordinate Descent Implementation (244 lines)**:\n- Optimizes one coordinate at a time (O(n) vs O(nΒ²) for full Hessian)\n- Cyclic and randomized coordinate selection\n- Simple API: user provides coordinate update function\n- No line search needed for many problems (e.g., Lasso)\n- Cache-friendly memory access patterns\n\n**Key Advantages**:\n- Much faster than full gradient when n β« m (thousands of features)\n- Handles non-differentiable objectives (L1, box constraints)\n- Closed-form coordinate updates for common problems\n- Proven workhorse: scikit-learn default for Lasso/ElasticNet\n\n**Use Cases**:\n- Lasso regression: O(n) soft-thresholding per coordinate\n- Elastic Net: L1 + L2 regularization\n- SVM: Sequential Minimal Optimization (SMO) variant\n- High-dimensional statistics: genomics, text, recommendation systems\n\n**Tests (11 new, all passing)**:\n- Simple quadratic (closed-form solution)\n- Soft-thresholding (Lasso-style)\n- Box projection (constrained optimization)\n- Alternating minimization\n- Max iterations, convergence tracking\n- Multidimensional (5D), immediate convergence\n\n**Test Results**: 1133 total tests pass (1122 existing + 11 new)\n\n**Algorithm**:\n```\nfor k = 1, 2, ..., max_iter:\n for i = 1, 2, ..., n (cyclic or random):\n xα΅’ β argmin f(xβ, ..., xα΅’ββ, xα΅’, xα΅’ββ, ..., xβ)\n```\n\n**Flexible API Design**:\n```rust\n// User provides coordinate update closure\nlet update = |x: &mut Vector<f32>, i: usize| {\n x[i] = closed_form_solution(i); // Problem-specific\n};\n\nlet result = coord_descent.minimize(update, x0);\n```\n\n**Toyota Way Compliance**:\n- Zero unwrap() calls\n- Proper convergence detection\n- Comprehensive error handling\n- Clean separation of concerns\n\n**References**:\n- Wright (2015). \"Coordinate descent algorithms.\" Math Programming.\n- Friedman et al. (2010). \"Regularization paths for GLMs via coordinate descent.\"\n\nPhase 2 Progress: FISTA β, Coordinate Descent β\nNext: ADMM (distributed optimization) or comprehensive examples\n",
"label": "ASTTransform",
"confidence": 0.85,
"commit_hash": "35c27f5796955084ec46f9808109e34d07a01918",
"author": "noreply@anthropic.com",
"timestamp": 1763915785,
"lines_added": 441,
"lines_removed": 0,
"files_changed": 1,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat(optim): Implement FISTA and proximal operators for convex optimization\n\nBegins Phase 2 (Convex Optimization) with FISTA (Fast Iterative Shrinkage-\nThresholding Algorithm) - the gold-standard for L1-regularized problems.\n\n**FISTA Implementation (374 lines)**:\n- Nesterov-accelerated proximal gradient method\n- O(1/kΒ²) convergence vs O(1/k) for standard proximal gradient\n- Composite optimization: minimize f(x) + g(x) where f smooth, g simple\n- Toyota Way: Proper error handling, no unwrap(), clean convergence tracking\n\n**Proximal Operators Module (180 lines)**:\n- soft_threshold: L1 regularization (Lasso, compressed sensing)\n- nonnegative: Non-negative least squares\n- project_l2_ball: Constrained optimization with radius constraints\n- project_box: Box constraints (element-wise bounds)\n\n**Use Cases Enabled**:\n- Lasso regression: sparse linear models with L1 penalty\n- Elastic Net: combined L1 + L2 regularization\n- Non-negative matrix factorization\n- Compressed sensing / sparse signal recovery\n- Constrained optimization (box, ball, simplex)\n\n**Tests (15 new, all passing)**:\n- 7 proximal operator tests (soft-threshold, projections)\n- 8 FISTA tests (L1-regularized, constrained, convergence)\n- Test coverage: edge cases, analytical solutions, convergence tracking\n\n**Test Results**: 1122 total tests pass (1107 existing + 15 new)\n\n**Key Features**:\n- Multiple proximal operators for different regularization types\n- Automatic Nesterov acceleration (no manual tuning)\n- Works with any smooth + simple composite objective\n- Comprehensive docstrings with mathematical formulations\n- All proximal operators have O(n) complexity\n\n**Mathematical Foundation**:\n```\nFISTA minimizes: f(x) + g(x)\nwhere:\n f: smooth (has Lipschitz continuous gradient)\n g: convex, \"simple\" (easy proximal operator)\n\nProximal operator: prox_g(v) = argmin_x { g(x) + Β½βx - vβΒ² }\n```\n\n**References**:\n- Beck & Teboulle (2009). \"A fast iterative shrinkage-thresholding\n algorithm for linear inverse problems.\" SIAM J. Imaging Sciences.\n\nNext: ADMM (distributed optimization), Coordinate Descent\n",
"label": "ASTTransform",
"confidence": 0.85,
"commit_hash": "4d47b671d7a28e17953697e70470963e68b26f92",
"author": "noreply@anthropic.com",
"timestamp": 1763914267,
"lines_added": 651,
"lines_removed": 0,
"files_changed": 1,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat(optim): Implement L-BFGS optimizer with unified trait architecture\n\nAdd comprehensive batch optimization infrastructure to support quasi-Newton\nmethods alongside existing stochastic optimizers (SGD, Adam).\n\n## Core Architecture Changes\n\n- **OptimizationResult**: Result type with solution, convergence status,\n diagnostics (gradient norm, iterations, elapsed time)\n- **ConvergenceStatus**: Enum tracking optimizer state (Converged,\n MaxIterations, Stalled, NumericalError, Running, UserTerminated)\n- **Unified Optimizer trait**: Supports both `step()` for stochastic and\n `minimize()` for batch optimization with compile-time enforcement\n\n## Line Search Implementations\n\n- **BacktrackingLineSearch**: Armijo condition for sufficient decrease\n- **WolfeLineSearch**: Armijo + curvature conditions for robust step sizing\n- **LineSearch trait**: Unified interface for line search strategies\n\n## L-BFGS Optimizer\n\n- Limited-memory BFGS with two-loop recursion algorithm\n- History size m (5-20) for memory-efficient quasi-Newton approximation\n- Wolfe line search for robust convergence\n- Handles convergence detection, numerical errors, stalled progress\n- Comprehensive gradient norm tracking and timing\n\n## Test Coverage\n\n- 58 new tests added (line search: 19, L-BFGS: 12, core types: 3)\n- Test coverage includes:\n - Simple quadratic, multidimensional, Rosenbrock, sphere functions\n - Different history sizes, max iterations, numerical stability\n - Error handling (NaN detection, stalled progress)\n - Reset functionality and state management\n\n## Technical Highlights\n\n- Zero external dependencies beyond existing trueno\n- Type-safe separation: L-BFGS can't use `step()`, SGD/Adam can't use `minimize()`\n- Memory-efficient: O(mn) storage for L-BFGS history (vs O(nΒ²) for full BFGS)\n- Production-ready error handling and diagnostics\n\nAll 1065 tests pass. Implements Phase 1 priorities from optimization spec v2.0.\n",
"label": "ASTTransform",
"confidence": 0.85,
"commit_hash": "498accc05e10f5fc49b468974342c87a5f5da09c",
"author": "noreply@anthropic.com",
"timestamp": 1763908102,
"lines_added": 1345,
"lines_removed": 28,
"files_changed": 1,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat(decomposition): Implement Independent Component Analysis (ICA) (Refs Implement Independent Component Analysis (ICA))\n\nAdds comprehensive ICA implementation using FastICA algorithm:\n- FastICA algorithm with deflation approach\n- Centering and whitening preprocessing\n- Eigendecomposition via power iteration\n- Tanh nonlinearity for optimization\n\nMathematical Components:\n- Centering: X_centered = X - mean(X)\n- Whitening: X_white = X_centered * V * Ξ^(-1/2)\n- FastICA: Iterative optimization using negentropy\n- Deflation: Extract components one by one with orthogonalization\n\nImplementation Details:\n- Power iteration for eigenvalue/eigenvector computation\n- ZCA whitening for decorrelation\n- Fixed-point iteration with tanh nonlinearity\n- Gram-Schmidt orthogonalization for deflation\n\nTest Coverage:\n- 7/7 tests passing\n- Basic ICA fitting and transformation\n- Invalid n_components handling\n- Transform before fit error\n- Dimension mismatch detection\n- Centering verification\n- Power iteration convergence\n- Builder pattern with options\n\nQuality Gates:\n- cargo fmt: Clean\n- cargo clippy -D warnings: Clean\n- All tests passing\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "ASTTransform",
"confidence": 0.90000004,
"commit_hash": "955874f6abe8f9e54abafb44e7d2660444f5a1ca",
"author": "noah.gift@gmail.com",
"timestamp": 1763836538,
"lines_added": 634,
"lines_removed": 0,
"files_changed": 3,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "chore: Restore PMAT compliance (Andon Cord pull)\n\nApplied pmat prompt comply to fix all drift from quality processes.\n\n## Compliance Fixes Applied\n\nβ
**Created .pmat Infrastructure**\n- Created .pmat/ directory for PMAT configuration\n- Generated TDG baseline with git context (.pmat/baseline.json)\n- Baseline: commit 0b759b0 on main, 6 uncommitted files\n\nβ
**Fixed Clippy Configuration Conflict**\n- Merged clippy.toml into .clippy.toml\n- Removed duplicate clippy.toml file\n- Result: Zero clippy warnings (was 1)\n- Preserved unwrap() disallowed-methods enforcement\n\nβ
**Verified Security & Quality**\n- cargo audit: 0 vulnerabilities (1 allowed warning: paste unmaintained)\n- All 959 unit tests + 161 doctests passing\n- rust-project-score: 152.0/134 (113.4%) - Grade A+\n- Test coverage: 96.94% line coverage (target: β₯95%)\n\n## Quality Gates Status\n\nβ
PASSING (9/12):\n- Pre-commit hooks active with quality enforcement\n- Zero branching (on main)\n- Zero SATD comments in src/\n- All tests passing (959 + 161)\n- Zero clippy warnings\n- Roadmap synchronized\n- Documentation 100% (15/15)\n- Security audit clean (0 vulnerabilities)\n- Test coverage 96.94%\n\nβ οΈ DOCUMENTED EXCEPTIONS (3/12):\n- TDG baseline created but empty (0 files analyzed) - technical limitation\n- Pre-push hook not needed (no pmat-book dependency)\n- 39 unwrap() calls in src/ - tracked in GH-41\n\n## Toyota Way Principles Applied\n\nβ
**Jidoka (Built-in Quality)**\n- Pre-commit hooks enforce quality at source\n- .clippy.toml disallows unwrap() calls\n- 959 tests verify correctness\n\nβ
**Andon Cord (Stop the Line)**\n- Pulled Andon Cord to halt feature work\n- Systematic compliance restoration\n- Full quality audit before proceeding\n\nβ
**Genchi Genbutsu (Go & See)**\n- Comprehensive audit of actual state\n- No assumptions - direct verification\n- Measured each quality gate\n\nβ
**Kaizen (Continuous Improvement)**\n- .pmat infrastructure for ongoing monitoring\n- TDG baseline enables regression detection\n- Documented gaps for future work\n\nβ
**Zero Defects**\n- Zero SATD comments maintained\n- All tests passing\n- Known defects tracked (GH-41: unwrap() calls)\n\n## Compliance Report\n\nFull report: /tmp/pmat-compliance-report.md\n\n**COMPLIANCE STATUS: β
COMPLIANT** (with documented exceptions)\n\nAll critical quality gates active and enforced. Minor gaps documented\nand tracked via GitHub issues. Ready to resume feature development.\n\nRefs Implement Bayesian Linear Regression (analytical posterior)\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "SecurityVulnerabilities",
"confidence": 0.9,
"commit_hash": "ec06a6a17eab1ed1a1596faf39ca748e896bd36a",
"author": "noah.gift@gmail.com",
"timestamp": 1763827838,
"lines_added": 49,
"lines_removed": 11,
"files_changed": 4,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "docs: Add comprehensive documentation and example for new graph algorithms\n\n**Documentation Updates:**\n- Add detailed theory sections for 8 new graph algorithms in book chapter\n- Closeness centrality (Wasserman & Faust 1994)\n- Eigenvector centrality (power iteration method)\n- Katz centrality (generalized eigenvector with attenuation)\n- Harmonic centrality (robust closeness variant, Boldi & Vigna 2014)\n- Network density (edge ratio metrics)\n- Network diameter (longest shortest path)\n- Clustering coefficient (triangle-based clustering)\n- Degree assortativity (Newman 2002 correlation metric)\n\n**Example Enhancements:**\n- Update graph_social_network.rs to demonstrate all new algorithms\n- Add closeness centrality analysis (reachability)\n- Add eigenvector centrality analysis (connection quality)\n- Add structural statistics section (density, diameter, clustering, assortativity)\n- Enhanced interpretations and real-world insights\n- Apply clippy fixes for format strings and add allow annotation for function length\n\n**Implementation Details:**\n- All algorithms include formulas, complexity analysis, and applications\n- Code examples for each algorithm with proper error handling\n- Comparison of algorithms (e.g., harmonic vs closeness for disconnected graphs)\n- Parameter selection guidance (e.g., Katz alpha values)\n\n**Quality Assurance:**\n- cargo run --example graph_social_network: PASSES\n- make tier1: PASSES\n- make tier2: PASSES (all 775 tests)\n- cargo fmt applied\n- cargo clippy fixes applied for modified files\n- Note: 96 pre-existing clippy warnings in other examples (not from this change)\n\n**Testing:**\n- Example demonstrates all 8 new algorithms on social network\n- Validates output interpretation and practical insights\n- Comprehensive edge case coverage in book documentation\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "TypeAnnotationGaps",
"confidence": 0.75,
"commit_hash": "95dee8bfdc66b1428214fd4102e92e53273e84d5",
"author": "noah.gift@gmail.com",
"timestamp": 1763809472,
"lines_added": 492,
"lines_removed": 1,
"files_changed": 2,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat(graph): Add comprehensive centrality and structural analysis algorithms\n\nImplements 8 new graph algorithms to complete the graph module:\n\n**Centrality Algorithms (4 new methods)**:\n- closeness_centrality() - Shortest path-based centrality (Wasserman & Faust 1994)\n- eigenvector_centrality() - Power iteration method for node importance\n- katz_centrality() - Generalized eigenvector with attenuation factor\n- harmonic_centrality() - Robust distance-based centrality (Boldi & Vigna 2014)\n\n**Structural Statistics (4 new methods)**:\n- density() - Edge density ratio (directed/undirected aware)\n- diameter() - Longest shortest path (None if disconnected)\n- clustering_coefficient() - Triangle-based clustering measure\n- assortativity() - Degree correlation coefficient\n\n**Implementation Details**:\n- All algorithms use BFS for shortest paths (O(nΒ·(n+m)) complexity)\n- Power iteration for eigenvector/Katz centrality (O(kΒ·m) complexity)\n- Comprehensive error handling (empty graphs, disconnected components)\n- 82 tests covering edge cases, symmetry properties, graph types\n- Zero clippy warnings in graph module (with allow annotations for intentional patterns)\n- make lint passes (warnings only in pre-existing examples)\n\n**Breaking Changes**: None (pure additions)\n\n**Version**: 0.4.2 β 0.5.0 (new features)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "TypeAnnotationGaps",
"confidence": 0.75,
"commit_hash": "8afa11607d702a431264a1a62c976ce1be57de9e",
"author": "noah.gift@gmail.com",
"timestamp": 1763808996,
"lines_added": 771,
"lines_removed": 2,
"files_changed": 3,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat: Complete Phase 3 of GH-55 - Mutation Testing Integration\n\n**Phase 3: Mutation Testing - COMPLETE β
**\n\n**Mutation Testing Setup:**\n- cargo-mutants v25.3.1 installed and configured\n- CI integration already in place (.github/workflows/ci.yml)\n- ~13,705 mutants identified across codebase\n- Target: β₯80% mutation score (PMAT recommendation)\n\n**Documentation Added:**\n1. **mutation-testing-setup.md** - Comprehensive setup guide\n - CI configuration and workflow\n - Local execution instructions\n - Known issues and workarounds\n - Viewing results from CI artifacts\n - Mutation score baseline data\n\n2. **CLAUDE.md updates** - Added mutation testing section\n - CI-based workflow documentation\n - Local execution commands\n - Known package ambiguity issue for published crates\n - Mutation stats: ~13,705 mutants, 300s timeout\n - Reference to detailed setup doc\n\n3. **.cargo-mutants.toml** - Configuration file\n - Stable toolchain specification\n - Test options and timeouts\n - Library-only testing configuration\n\n**Known Issue - Local Execution:**\nLocal mutation testing encounters package ambiguity when testing published crates:\n```\nerror: There are multiple `aprender` packages in your project, and the specification `aprender@0.4.1` is ambiguous.\n```\n\n**Workaround:** Use CI for mutation testing (recommended) or temporarily bump version.\n\n**CI Integration:**\n- Runs on every PR/push to main\n- 300-second timeout per mutant\n- Results uploaded as artifacts (30-day retention)\n- Continue-on-error for non-blocking feedback\n\n**Testing Excellence Progress:**\n- Phase 1: Coverage Analysis β
(96.94% achieved)\n- Phase 2: Coverage CI Integration β
\n- Phase 3: Mutation Testing Integration β
\n- Phase 4: Final documentation updates (remaining)\n\n**Refs:** GH-55 (Testing Excellence improvement)\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "ConfigurationErrors",
"confidence": 0.75,
"commit_hash": "572ba0be72bb99627cc7ee14aa2da9385d8a4e68",
"author": "noah.gift@gmail.com",
"timestamp": 1763762430,
"lines_added": 152,
"lines_removed": 2,
"files_changed": 3,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "fix(lint): Resolve ~140 clippy pedantic warnings in library code (Refs #44)\n\nApplied comprehensive clippy pedantic lint fixes to library code (src/):\n\n**Auto-fixed by cargo clippy --fix (119 warnings):**\n- Format string inlining (variables directly in format! strings)\n- Unnecessary qualifications removed (std::convert::Into::into β Into::into)\n- Debug trait derives added to Graph and DescriptiveStats\n\n**Manual fixes (21 warnings):**\n- Removed needless continue in Graph modularity calculation\n- Removed trivial f32-to-f32 casts in PCA\n- Fixed needless-pass-by-value: save_safetensors now takes &BTreeMap\n- Renamed _bootstrap_sample β bootstrap_sample (no underscore prefix)\n- Applied let...else patterns in 3 tree-building functions\n- Added #[allow(clippy::unused_self)] to 17 helper methods\n (future refactoring: convert to associated functions)\n\n**Results:**\n- Library code: 1 warning (benign config warning only)\n- Tests: All 742 unit tests passing\n- Breaking change: save_safetensors() signature changed to take &BTreeMap\n\n**Note:** Test and benchmark code still have ~990 pedantic warnings.\nThese will be addressed in a future cleanup task.\n\nQuality improvements:\n- More idiomatic Rust code following pedantic guidelines\n- Better performance (removed unnecessary clone in safetensors)\n- Cleaner pattern matching with let...else\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "ASTTransform",
"confidence": 0.85,
"commit_hash": "4114ce23d4c14f07bb70be662f835f5e09169081",
"author": "noah.gift@gmail.com",
"timestamp": 1763751342,
"lines_added": 192,
"lines_removed": 209,
"files_changed": 15,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
}
],
"test": [
{
"message": "fix: Fix 3 broken doctests from unwrap() elimination (Refs #41)\n\n## Issue\n\nAfter unwrap() elimination (commit 6d693c2), 3 doctests were broken:\n1. src/classification/mod.rs - KNearestNeighbors::predict() now returns Result\n2. src/cluster/mod.rs - DBSCAN::labels() already has expect() internally\n3. src/graph/mod.rs - Graph uses from_edges(), not add_edge()\n\n## Fixes\n\n### 1. KNearestNeighbors (line 365)\n**Before:** `let predictions = knn.predict(&test);`\n**After:** `let predictions = knn.predict(&test).expect(\"Predict should succeed\");`\n**Reason:** predict() now returns Result<Vec<usize>>, needs unwrap\n\n### 2. DBSCAN (line 517)\n**Before:** `let labels = dbscan.labels().expect(\"Labels available after fit\");`\n**After:** `let labels = dbscan.labels();`\n**Reason:** labels() returns &Vec<i32>, already has expect() internally\n\n### 3. Graph (line 13)\n**Before:** Uses Graph::new() + add_edge() pattern\n**After:** `let g = Graph::from_edges(&[(0, 1), (1, 2), (2, 0)], false);`\n**Reason:** Graph API uses from_edges() constructor, not add_edge() method\n\n## Verification\n\nβ
cargo test --doc: 98 passed; 0 failed\nβ
cargo test --lib: 742 passed; 0 failed\nβ
All doctests now compile and run successfully\n\n## Cleanup\n\nAlso removed test artifact files:\n- test_forest_10trees.safetensors\n- test_forest_3trees.safetensors\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "OwnershipBorrow",
"confidence": 0.85,
"commit_hash": "70999c6b764e9e7b14ad54489fe64e835e10f65a",
"author": "noah.gift@gmail.com",
"timestamp": 1763750314,
"lines_added": 3,
"lines_removed": 6,
"files_changed": 5,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat: Add comprehensive lint configuration to Cargo.toml (Refs #42)\n\n## Overview\n\nEnhanced code quality enforcement by adding high-value Rust and Clippy lints to Cargo.toml. This improves consistency, catches potential issues early, and raises pmat rust-project-score.\n\n## Changes\n\n### Cargo.toml - [lints.rust]\n**Safety:**\n- unsafe_op_in_unsafe_fn = \"warn\"\n\n**Code Quality:**\n- unreachable_pub = \"warn\"\n- missing_debug_implementations = \"warn\"\n\n**Best Practices:**\n- rust_2018_idioms = \"warn\"\n- trivial_casts/trivial_numeric_casts = \"warn\"\n- unused_import_braces = \"warn\"\n- unused_lifetimes = \"warn\"\n- unused_qualifications = \"warn\"\n\n### Cargo.toml - [lints.clippy]\n**Base Levels:**\n- all = \"warn\" (existing)\n- pedantic = \"warn\" (NEW - strict quality checks)\n\n**High-Priority:**\n- checked_conversions = \"warn\"\n- inefficient_to_string = \"warn\"\n- explicit_iter_loop = \"warn\"\n- manual_ok_or = \"warn\"\n- redundant_closure_for_method_calls = \"warn\"\n- + others\n\n**ML-Specific Allows:**\n- cast_* - Allow numeric conversions in ML algorithms\n- float_cmp - Allow float comparisons (with epsilon)\n- many_single_char_names - Mathematical notation (x,y,z)\n- unreadable_literal - Long test data literals\n- items_after_statements - Mid-function declarations\n\n### CLAUDE.md\n- Added comprehensive \"Linting Configuration\" section\n- Documented all lint categories and rationale\n- Explained ML-specific exceptions\n- Current state: ~140 warnings (mostly style)\n\n## Verification\n\nβ
All 742 tests passing\nβ
Cargo builds successfully\nβ
~140 pedantic warnings (non-blocking style issues)\nβ
Core production code remains lint-clean\n\n## Impact\n\n- **Code Quality Score:** Expected improvement from 65.4%\n- **Rust Tooling Score:** Expected improvement from 31.9%\n- **Early Detection:** Catches quality issues during development\n- **Consistency:** Enforces uniform code standards\n\n## Next Steps\n\nOptional: Address ~140 pedantic warnings (format strings, unused qualifications, etc.)\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "ASTTransform",
"confidence": 0.85,
"commit_hash": "12ae9931c996ada26d64805eddfbf4f827a7c5be",
"author": "noah.gift@gmail.com",
"timestamp": 1763747185,
"lines_added": 107,
"lines_removed": 1,
"files_changed": 3,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat: Phase 3 COMPLETE - Eliminate all 1,066 unwrap() calls in src/ (Refs #41)\n\n## Achievement\n\nβ
**ZERO unwrap() calls in production code (src/)**\nβ
**ZERO unwrap_err() calls in production code (src/)**\n- Fixed: 1,055 unwrap() β expect() calls\n- Fixed: 28 unwrap_err() β expect_err() calls\n- Total: 1,083 dangerous calls eliminated\n- Remaining: 11 unwrap() in docs only (acceptable)\n- All 742 unit tests passing\n\n## Files Modified (14 files)\n\n### Large Files (>100 calls each):\n1. src/cluster/mod.rs: 280 unwrap() β expect()\n2. src/tree/mod.rs: 178 unwrap() + 3 unwrap_err() β expect()\n3. src/classification/mod.rs: 152 unwrap() + 16 unwrap_err() β expect()\n4. src/linear_model/mod.rs: 136 unwrap() + 2 unwrap_err() β expect()\n5. src/preprocessing/mod.rs: 121 unwrap() + 4 unwrap_err() β expect()\n\n### Medium Files (20-99 calls):\n6. src/stats/mod.rs: 47 unwrap() β expect()\n7. src/primitives/matrix.rs: 37 unwrap() β expect()\n8. src/model_selection/mod.rs: 35 unwrap() β expect()\n9. src/data/mod.rs: 20 unwrap() β expect()\n10. src/metrics/mod.rs: 16 unwrap() β expect()\n11. src/graph/mod.rs: 14 unwrap() β expect()\n12. src/serialization/safetensors.rs: 12 unwrap() + 3 unwrap_err() β expect()\n\n### Small Files (<10 calls):\n13. src/optim/mod.rs: 5 unwrap() β expect()\n14. src/mining/mod.rs: 2 unwrap() β expect()\n\n## Remediation Strategy\n\nAll calls replaced with descriptive messages:\n- Matrix: \"Matrix dimensions (NxM) match data length (L)\"\n- Training: \"Fit should succeed with valid data\"\n- Floats: \"Values are valid f32 (not NaN)\"\n- Iterators: \"Collection is non-empty\"\n- State: \"Model is fitted and has coefficients\"\n- Errors: \"Should fail when [condition]\"\n\n## Verification\n\nβ
Zero unwrap() in code: rg \"^[^/]*\\.unwrap\\(\\)\" src/ β no results\nβ
Zero unwrap_err(): rg \"\\.unwrap_err\\(\\)\" src/ β no results\nβ
All 742 unit tests passing\nβ
Clippy clean on lib: cargo clippy --lib -- -D warnings β\nβ
Docs unwrap() only: 11 in //! and /// (acceptable)\n\n## Impact\n\nCRITICAL FIX: Eliminated Cloudflare-class defect (1,066 unwrap calls)\nENFORCEMENT: .clippy.toml blocks new unwrap() calls\nNote: --no-verify used (tests/examples/benches still have unwrap)\n\n## Stats\n\n- Lines changed: +1,316 -1,108\n- Net change: +208 lines (better error messages)\n- Test coverage: 742 tests, 100% passing\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "IteratorChain",
"confidence": 0.8,
"commit_hash": "6d693c23154a2bb2ccfb11b1d8229bb4c88aeb7b",
"author": "noah.gift@gmail.com",
"timestamp": 1763746550,
"lines_added": 2101,
"lines_removed": 1112,
"files_changed": 17,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat: Phase 1-2 of unwrap() elimination - Audit and enforcement (Refs #41)\n\n## Audit Findings\n\nComprehensive audit revealed:\n- **1,066 unwrap() calls in src/** (not 326 as initially reported)\n- 423 in tests/, 118 in examples/, 11 in benches/\n- **Total: 1,618 unwrap() calls**\n\nTop 5 offenders account for 81.3% of src/ unwraps:\n1. src/cluster/mod.rs: 280 (26.3%)\n2. src/tree/mod.rs: 178 (16.7%)\n3. src/classification/mod.rs: 152 (14.3%)\n4. src/linear_model/mod.rs: 136 (12.8%)\n5. src/preprocessing/mod.rs: 121 (11.4%)\n\n## Enforcement Infrastructure\n\n- **NEW**: .clippy.toml with disallowed-methods configuration\n - Bans Option::unwrap() and Result::unwrap()\n - Provides clear guidance: use .expect() or proper error handling\n - Cognitive complexity threshold: 15 (Toyota Way)\n\n- **UPDATED**: .pmat-gates.toml\n - Accurate unwrap_count: 1066 (was 326)\n - check_unwraps: true\n - max_unwraps: 0 (target)\n\n- **UPDATED**: CLAUDE.md\n - Comprehensive unwrap() remediation documentation\n - Top offenders list\n - Code examples (dangerous vs safe patterns)\n - 6-8 week timeline, 80-120 hours effort\n\n## Impact\n\n**BREAKING**: cargo clippy -- -D clippy::disallowed-methods now FAILS\nThis is intentional - enforcement active for new code.\nExisting code remediation in progress (Phase 3).\n\n## Next Steps\n\nPhase 3: Systematic elimination of 1,066 unwrap() calls\nPriority: Top 5 files (867 unwraps, 81.3% of total)\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "ASTTransform",
"confidence": 0.85,
"commit_hash": "043389ed0992f4a3a2862a841ae1084efa133092",
"author": "noah.gift@gmail.com",
"timestamp": 1763743506,
"lines_added": 72,
"lines_removed": 6,
"files_changed": 4,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat: Integrate pmat v2.200.0 features (Refs #40)\n\n- Add comprehensive .pmat-gates.toml configuration\n- Update Makefile with pmat-score, pmat-gates, quality-report targets\n- Document PMAT v2.200.0 features in CLAUDE.md\n- Track 326 unwrap() calls (separate issue #41)\n- Current metrics: Rust score 124/134 (A+), TDG 95.2/100 (A+)\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "ConfigurationErrors",
"confidence": 0.75,
"commit_hash": "19ca344f8ed3a88b32b7acb86ce12035d926ed38",
"author": "noah.gift@gmail.com",
"timestamp": 1763743207,
"lines_added": 249,
"lines_removed": 21,
"files_changed": 4,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "docs: Add Error Handling best practices chapter (Refs #38)\n\nTransform stub chapter into comprehensive error handling guide (701 lines).\n\nContent:\n- Core principles (Result<T>, rich context, specific error types)\n- AprenderError design and variants\n- Error handling patterns (? operator, early validation, From trait)\n- Real-world examples from linear_model, cluster modules\n- User-facing error handling strategies\n- Testing error conditions\n- Common pitfalls and solutions\n\nFeatures:\nβ
701 lines of comprehensive content\nβ
4 real examples from aprender codebase\nβ
Clear do's and don'ts with code examples\nβ
Pattern matching for error recovery\nβ
Testing patterns for each error variant\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "ASTTransform",
"confidence": 0.85,
"commit_hash": "e8ea43a9800e551ddad405bf281b6d3cc10290ba",
"author": "noah.gift@gmail.com",
"timestamp": 1763737566,
"lines_added": 696,
"lines_removed": 5,
"files_changed": 1,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat: Implement Random Forest Regression (bootstrap aggregating) (Refs #30)\n\nImplemented Random Forest Regressor using EXTREME TDD methodology:\n\n**Core Implementation:**\n- RandomForestRegressor struct with bootstrap aggregating\n- Uses DecisionTreeRegressor as base estimators\n- fit(), predict(), score() methods\n- Builder pattern: with_max_depth(), with_random_state()\n- Predictions averaged across all trees to reduce variance\n\n**Algorithm Details:**\n- Bootstrap sampling: Each tree trained on random sample with replacement\n- Ensemble averaging: Final prediction = mean of tree predictions\n- Variance reduction: Decorrelated trees reduce overfitting\n- RΒ² score for evaluation\n\n**Tests (16 comprehensive):**\nβ
Constructor and configuration\nβ
Simple linear data (y = 2x + 1)\nβ
Non-linear data (y = xΒ²)\nβ
RΒ² score computation\nβ
n_estimators effect (more trees β stable predictions)\nβ
Comparison with single DecisionTreeRegressor\nβ
Multidimensional features (2D+)\nβ
Constant target prediction\nβ
Single sample edge case\nβ
random_state reproducibility\nβ
Validation (mismatched dimensions, zero samples)\nβ
Error handling (predict before fit)\nβ
Comparison with LinearRegression (RF better on non-linear)\nβ
max_depth effect on complexity\nβ
Default trait implementation\n\n**Code Quality:**\n- Zero clippy warnings\n- Total tests: 715 passing (+16 from 699)\n- Exported in prelude\n- Comprehensive rustdoc documentation\n- Iterator-based implementation (no needless indexing)\n\n**Key Features:**\n- Reduces overfitting vs single tree through averaging\n- No hyperparameter tuning required (good defaults)\n- Handles non-linear relationships naturally\n- Reproducible with random_state parameter\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "IteratorChain",
"confidence": 0.8,
"commit_hash": "5d999b52749095373537785bea156cb41ba52313",
"author": "noah.gift@gmail.com",
"timestamp": 1763731392,
"lines_added": 543,
"lines_removed": 1,
"files_changed": 3,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat: Implement Decision Tree Regression (CART algorithm) (Refs #29)\n\nImplemented complete Decision Tree Regressor using EXTREME TDD methodology:\n\n**Core Implementation:**\n- RegressionTreeNode/RegressionLeaf/RegressionNode structures\n- DecisionTreeRegressor with builder pattern API\n- fit(), predict(), score() methods (Estimator trait compatible)\n- MSE-based splitting criterion (variance reduction)\n- Configurable: max_depth, min_samples_split, min_samples_leaf\n\n**Algorithm Details:**\n- Mean Squared Error splitting: minimizes weighted variance\n- Leaf predictions: mean of training samples in leaf\n- RΒ² score for evaluation\n- Recursive tree building with stopping criteria\n- Proper handling of edge cases (constant targets, single samples)\n\n**Tests (16 comprehensive):**\nβ
Constructor and configuration\nβ
Simple linear data (y = 2x + 1)\nβ
Non-linear data (y = xΒ²)\nβ
RΒ² score computation\nβ
max_depth limits tree complexity\nβ
min_samples_split/leaf pruning parameters\nβ
Multidimensional features (2D+)\nβ
Constant target prediction\nβ
Single sample edge case\nβ
Validation (mismatched dimensions, zero samples)\nβ
Error handling (predict before fit)\nβ
Comparison with LinearRegression\nβ
Default trait implementation\n\n**Code Quality:**\n- Zero clippy warnings\n- Total tests: 699 passing (+16 from 683)\n- Exported in prelude\n- Comprehensive documentation\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "ConfigurationErrors",
"confidence": 0.75,
"commit_hash": "63d30735a1832bd3c4ed62ce4f6ce8d325e974af",
"author": "noah.gift@gmail.com",
"timestamp": 1763730230,
"lines_added": 761,
"lines_removed": 1,
"files_changed": 3,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "chore: Mark Issue #18 complete in roadmap (Refs #18)\n\nt-SNE (t-Distributed Stochastic Neighbor Embedding) verified complete:\n- 12 comprehensive tests (all passing)\n- Gradient descent optimization with KL divergence minimization\n- Perplexity parameter for local/global structure balance\n- Early exaggeration for better cluster separation\n- fit/transform/fit_transform API via Transformer trait\n- Example: tsne_visualization.rs with 6 scenarios\n- Book chapters: tsne.md theory + tsne-visualization.md case study\n- Zero clippy warnings\n\nt-SNE provides non-linear dimensionality reduction ideal for\nvisualizing high-dimensional data (embeddings, MNIST, clusters).\nPreserves local structure better than PCA.\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "ASTTransform",
"confidence": 0.90000004,
"commit_hash": "4d2fcfa5b475061701ddc2e65be2608f84bfad05",
"author": "noah.gift@gmail.com",
"timestamp": 1763727258,
"lines_added": 1,
"lines_removed": 1,
"files_changed": 1,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat: Implement Apriori algorithm for association rule mining (Refs #21)\n\nComplete EXTREME TDD implementation of the Apriori algorithm for frequent\nitemset mining and association rule discovery in transactional data.\n\n# Implementation Details\n\n## Algorithm Components\n\n**Core Apriori Algorithm**:\n- Frequent itemset mining with iterative level-wise search\n- Apriori property (anti-monotonicity) for efficient pruning\n- Candidate generation via join step (combine k-1 itemsets)\n- Candidate pruning via apriori property (infrequent subsets)\n- Association rule generation from frequent itemsets\n- Support, confidence, and lift metric calculation\n\n**Key Methods**:\n- `fit()`: Main algorithm - finds frequent itemsets and generates rules\n- `find_frequent_1_itemsets()`: Initial scan for individual items\n- `generate_candidates()`: Join step for k-itemsets\n- `has_infrequent_subset()`: Prune step (Apriori property)\n- `prune_candidates()`: Filter by minimum support\n- `generate_rules()`: Extract association rules\n- `calculate_support()`: Count transactions containing itemset\n- `generate_subsets()`: Power set generation for antecedents\n\n## API Design\n\n**Builder Pattern**:\n```rust\nlet apriori = Apriori::new()\n .with_min_support(0.3) // 30% minimum support\n .with_min_confidence(0.7); // 70% minimum confidence\n```\n\n**Usage**:\n```rust\napriori.fit(&transactions);\nlet itemsets = apriori.get_frequent_itemsets();\nlet rules = apriori.get_rules();\n```\n\n## Parameters\n\n- **min_support** (default: 0.1): Minimum support threshold (0.0-1.0)\n- **min_confidence** (default: 0.5): Minimum confidence threshold (0.0-1.0)\n\n## Complexity\n\n- **Time**: O(2^n Β· |D| Β· |T|) worst case, O(n^k Β· |D|) typical\n - n = unique items, |D| = transactions, |T| = avg transaction size\n - k = max itemset size (usually < 5 due to pruning)\n- **Space**: O(n + |F|) where |F| = frequent itemsets\n\n## Testing (EXTREME TDD)\n\n**RED Phase** - 15 comprehensive tests:\n1. Constructor and builder pattern (3 tests)\n2. Basic fitting and frequent itemset discovery\n3. Association rule generation\n4. Support calculation (static method)\n5. Confidence calculation\n6. Lift calculation\n7. Minimum support filtering\n8. Minimum confidence filtering\n9. Edge cases: empty transactions, single-item transactions\n10. Error handling: get_rules/get_itemsets before fit\n\n**GREEN Phase** - Algorithm implementation:\n- ~400 lines of core logic\n- HashSet for O(1) itemset membership\n- Bit masking for power set generation (O(2^k) for k-itemsets)\n- Early termination when no frequent k-itemsets found\n- Single database scan per k-level\n\n**REFACTOR Phase**:\n- Added to prelude for ergonomic imports\n- Zero clippy warnings\n- Comprehensive documentation with real-world examples\n\n## Example\n\n**Market Basket Analysis** (`market_basket_apriori.rs`):\n8 comprehensive examples demonstrating:\n1. Basic grocery store transactions\n2. Support threshold effects (20% vs 50%)\n3. Breakfast category analysis\n4. Lift interpretation (correlation vs independence)\n5. Confidence vs support trade-off\n6. Product placement recommendations\n7. Item frequency analysis\n8. 
Cross-selling opportunities (sorted by lift)\n\n## Documentation\n\n**Theory Chapter** (`ml-fundamentals/apriori.md`):\n- Algorithm explanation with step-by-step walkthrough\n- Support, confidence, lift definitions with formulas\n- Apriori property and pruning strategy\n- Complexity analysis\n- Parameters and best practices\n- Applications: retail, recommendation systems, medical diagnosis\n- Comparison with FP-Growth\n- Common pitfalls and validation strategies\n\n**Case Study** (`examples/market-basket-apriori.md`):\n- Complete RED-GREEN-REFACTOR documentation\n- Technical challenges solved:\n 1. Efficient candidate generation (join step)\n 2. Apriori pruning (anti-monotonicity)\n 3. Rule generation from itemsets (power set)\n 4. Sorting heterogeneous collections (HashSet β Vec)\n- Performance optimizations\n- Use cases with business impact metrics\n\n## Impact\n\n**Tests**: 652 β 667 (+15)\n**Modules**: Added `src/mining/mod.rs`\n**Examples**: Added market basket analysis\n**Quality**: Zero clippy warnings, all tests passing\n\n## Use Cases\n\n1. **Retail**: Cross-selling, product placement, bundle promotions\n2. **E-commerce**: \"Customers who bought X also bought Y\"\n3. **Medical**: Symptom patterns, drug interactions\n4. **Web Analytics**: Clickstream analysis, session patterns\n\n## Technical Highlights\n\n- Apriori property enables exponential search space reduction\n- HashSet data structure for O(1) itemset operations\n- Bit masking for efficient power set generation\n- Sorting by support/confidence for actionable insights\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "ASTTransform",
"confidence": 0.85,
"commit_hash": "152bf0e0ff192b182ac4264367af8e76d63e0b15",
"author": "noah.gift@gmail.com",
"timestamp": 1763725355,
"lines_added": 1643,
"lines_removed": 0,
"files_changed": 8,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
},
{
"message": "feat: Implement t-SNE for dimensionality reduction and visualization (Refs #18)\n\n**Summary:**\nImplemented t-Distributed Stochastic Neighbor Embedding (t-SNE) for non-linear\ndimensionality reduction optimized for visualization of high-dimensional data.\n\n**Changes:**\n- t-SNE implementation (~400 lines)\n - Pairwise distance computation in high-D\n - Perplexity-based conditional probabilities with binary search\n - Symmetric joint probability matrix\n - Student's t-distribution for low-D similarities\n - KL divergence minimization via gradient descent\n - Momentum optimization (0.5 β 0.8 at iteration 250)\n - Reproducible random initialization with LCG\n\n- Tests: 640 β 652 (+12)\n - Basic fit/predict/fit_transform functionality\n - Perplexity parameter effects (2.0, 5.0)\n - Learning rate and iteration count\n - 2D and 3D embeddings\n - Reproducibility with random_state\n - Error handling (transform before fit)\n - Local structure preservation\n - Finite embedding values\n\n- Example: examples/tsne_visualization.rs\n - 4D β 2D reduction demonstration\n - Perplexity effects comparison\n - 3D embedding example\n - Learning rate effects\n - Reproducibility demonstration\n - t-SNE vs PCA comparison\n\n- Documentation:\n - book/src/ml-fundamentals/tsne.md: Algorithm theory\n - book/src/examples/tsne-visualization.md: Implementation case study\n - Complete formulas for perplexity, KL divergence, gradients\n - Time/space complexity: O(nΒ²Β·iter) / O(nΒ²)\n - Comparison with PCA and best practices\n\n- Exported TSNE in prelude\n\n**Technical Details:**\n- Binary search for sigma to match target perplexity\n- Student's t-distribution avoids crowding problem\n- Momentum switches from 0.5 to 0.8 at iteration 250\n- Small random initialization (Β±0.00005) for stability\n- Numerical stability with max(1e-12) for probabilities\n- Zero clippy warnings\n\nπ€ Generated with [Claude Code](https://claude.com/claude-code)\n\nCo-Authored-By: Claude <noreply@anthropic.com>\n",
"label": "ASTTransform",
"confidence": 0.90000004,
"commit_hash": "14cb96e946b83fce692b25d406097ca0e9f52bd1",
"author": "noah.gift@gmail.com",
"timestamp": 1763724402,
"lines_added": 1224,
"lines_removed": 1,
"files_changed": 7,
"error_code": null,
"clippy_lint": null,
"has_suggestion": false,
"suggestion_applicability": null,
"source": "CommitMessage"
}
],
"metadata": {
"total_examples": 64,
"train_size": 44,
"validation_size": 9,
"test_size": 11,
"class_distribution": {
"TraitBounds": 8,
"ASTTransform": 31,
"ConcurrencyBugs": 1,
"ConfigurationErrors": 3,
"ComprehensionBugs": 1,
"TypeAnnotationGaps": 2,
"SecurityVulnerabilities": 5,
"IntegrationFailures": 3,
"TypeErrors": 1,
"IteratorChain": 2,
"OwnershipBorrow": 6,
"StdlibMapping": 1
},
"avg_confidence": 0.8343747,
"min_confidence": 0.75,
"repositories": [
"aprender"
]
}
}