rust-rule-miner
Automatic rule discovery from historical data using association rule mining, sequential pattern mining, and graph-based pattern matching.
Discover business rules, recommendations, and patterns from your data without manual rule authoring!
Features
Core Features
- Association Rule Mining - Discover "If X then Y" patterns (Apriori, FP-Growth algorithms)
- Sequential Pattern Mining - Find time-ordered patterns (A → B → C)
- Graph-Based Patterns - Model entity relationships and discover complex patterns
- Quality Metrics - Confidence, Support, Lift, Conviction scores for each rule
- Engine Integration - Direct execution with rust-rule-engine (enabled by default)
- Excel/CSV Loading - Stream large datasets from Excel (.xlsx) and CSV files with ultra-low memory using excelstream
- ColumnMapping - Flexible field selection and multi-field pattern mining from CSV/Excel
- GRL Export - Export rules to GRL format for external rule engines
- Visualization - Export graphs to DOT format for Graphviz
Additional Features (opt-in)
- PostgreSQL Streaming (postgres feature) - Stream and mine data directly from PostgreSQL
- Cloud Storage (cloud feature) - Load data from AWS S3 and HTTP endpoints
Quick Start
use ;
use Utc;
// 1. Create transactions with items you want to mine patterns from
// Each transaction contains: ID, items (the values to find patterns in), timestamp
let transactions = vec!;
// The miner will find patterns like: "Laptop" often appears with "Mouse"
// 2. Configure mining parameters
let config = MiningConfig ;
// 3. Mine association rules
let mut miner = new;
miner.add_transactions?;
let rules = miner.mine_association_rules?;
// 4. Display discovered rules
for rule in &rules
// Output:
// Rule: ["Laptop"] => ["Mouse"]
// Confidence: 100.0%
// Support: 75.0%
// Lift: 1.33
Installation
Default installation (includes rust-rule-engine for execution):
[dependencies]
rust-rule-miner = "0.2.2"
Mining-only (without engine, just export to GRL):
[dependencies]
rust-rule-miner = { version = "0.2.2", default-features = false }
With additional features:
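[dependencies]
# Add PostgreSQL streaming support
rust-rule-miner = { version = "0.2.2", features = ["postgres"] }
# Add cloud storage support (S3, HTTP)
rust-rule-miner = { version = "0.2.2", features = ["cloud"] }
# Combine all features
rust-rule-miner = { version = "0.2.2", features = ["postgres", "cloud"] }
# Mining-only + PostgreSQL (without engine)
rust-rule-miner = { version = "0.2.2", default-features = false, features = ["postgres"] }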
Loading Data from Excel/CSV
Stream large datasets with constant memory usage using excelstream:
use ;
// Specify which columns to mine: transaction_id, items, timestamp
let mapping = simple;
// Load from CSV file (ultra-fast, ~1.2M rows/sec)
let transactions = from_csv?;
// Load from Excel file (.xlsx)
let transactions = from_excel?; // 0 = first sheet
// Mine rules from loaded data
let mut miner = new;
miner.add_transactions?;
let rules = miner.mine_association_rules?;
Memory usage: ~3-35 MB regardless of file size!
Mining Different Fields (New in v0.2.0+)
No preprocessing needed! Use ColumnMapping to mine any fields directly:
ColumnMapping API:
// Single field mining
simple
// Multi-field mining (combine multiple columns)
multi_field
Examples:
use ;
// CSV: customer_id, product_name, category, price, location, timestamp
// 0 1 2 3 4 5
// Option 1: Mine product names (column 1)
let mapping = simple; // tx_id=0, items=1, timestamp=5
let transactions = from_csv?;
// Option 2: Mine categories (column 2)
let mapping = simple; // tx_id=0, items=2, timestamp=5
let transactions = from_csv?;
// Option 3: Mine product + category combined
let mapping = multi_field;
let transactions = from_csv?;
// Items: "Laptop::Electronics", "Mouse::Accessories"
// Option 4: Mine product + category + location
let mapping = multi_field;
let transactions = from_csv?;
// Items: "Laptop::Electronics::US", "Mouse::Accessories::UK"
Multi-field zipping: If your CSV has comma-separated values in multiple columns:
customer_id,products,categories,locations,timestamp
123,"Laptop,Mouse","Electronics,Accessories","US,US",2024-01-01
The miner will automatically zip them together:
let mapping = multi_field;
// Result: ["Laptop::Electronics::US", "Mouse::Accessories::US"]
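The zipping behaviour described above can be sketched with the standard library alone. This is not the crate's implementation: `zip_fields` is a hypothetical helper, and stopping at the shortest column is an assumption about how uneven columns would be handled.

```rust
/// Splits each column value on commas and zips the pieces position by
/// position, joining them with "::" as in the README's example output.
/// Assumption: zipping stops at the shortest column.
pub fn zip_fields(columns: &[&str]) -> Vec<String> {
    // One Vec of trimmed parts per column.
    let split: Vec<Vec<&str>> = columns
        .iter()
        .map(|col| col.split(',').map(str::trim).collect())
        .collect();

    // Number of complete items that every column can contribute to.
    let len = split.iter().map(Vec::len).min().unwrap_or(0);

    (0..len)
        .map(|i| {
            split
                .iter()
                .map(|parts| parts[i])
                .collect::<Vec<_>>()
                .join("::")
        })
        .collect()
}
```

Calling `zip_fields(&["Laptop,Mouse", "Electronics,Accessories", "US,US"])` yields the two items shown in the result comment above.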
Use Cases
1. E-commerce Product Recommendations
use ;
// Load historical purchase data from CSV (transaction_id, items, timestamp)
let mapping = simple;
let transactions = from_csv?;
// Configure mining parameters
let config = MiningConfig ;
// Discover: "Customers who bought X also bought Y"
let mut miner = new;
miner.add_transactions?;
let rules = miner.mine_association_rules?;
// Result: Laptop (85%) → Mouse, Keyboard (75%) → Monitor
2. Fraud Detection Pattern Discovery
use ;
// Configure for fraud detection
let config = MiningConfig ;
// Find patterns unique to fraud cases
let mut fraud_miner = new;
fraud_miner.add_transactions?;
let patterns = fraud_miner.mine_association_rules?;
// Result: IP_mismatch + unusual_time + high_amount → fraud (90%)
3. Medical Diagnosis Support
// Discover: "Symptoms A, B, C → Likely Disease X"
let medical_miner = new;
4. Sequential Pattern Mining
use ;
use Duration;
// Find time-ordered patterns
let config = MiningConfig ;
let mut miner = new;
miner.add_transactions?;
let sequential_patterns = miner.find_sequential_patterns?;
// Result: Laptop → (2 days) → Mouse → (5 days) → Laptop Bag
Engine Integration
Execute mined rules in real-time with built-in rust-rule-engine support (included by default).
Two-Phase Approach:
- Mining Phase: Apply quality criteria (min_support, min_confidence, min_lift) to filter rules
- Execution Phase: Execute pre-filtered high-quality rules in real-time
use ;
use ;
// Load historical data (transaction_id, items, timestamp)
let mapping = simple;
let transactions = from_csv?;
// PHASE 1: Mine rules with quality criteria
let config = MiningConfig ;
let mut miner = new;
miner.add_transactions?;
let rules = miner.mine_association_rules?; // Only high-quality rules
// PHASE 2: Load filtered rules into engine and execute
let mut engine = new;
engine.load_rules?; // Loads only the filtered rules from Phase 1
// Execute in real-time
let facts = facts_from_cart;
let result = engine.execute?;
if let Some = result.get
Key Point: Mining criteria are applied during mine_association_rules(), not during execution. The engine only executes pre-filtered high-quality rules.
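To make the two-phase split concrete, here is a std-only sketch of the idea. `MinedRule`, `Thresholds`, `filter_rules`, and `recommend` are hypothetical names used for illustration and are not the crate's types or API; the sketch only shows thresholds being applied once up front, with the execution step touching nothing but the surviving rules.

```rust
/// Hypothetical single-item rule with its quality metrics.
#[derive(Debug, Clone, PartialEq)]
pub struct MinedRule {
    pub antecedent: String,
    pub consequent: String,
    pub support: f64,
    pub confidence: f64,
    pub lift: f64,
}

/// Hypothetical quality criteria, mirroring min_support / min_confidence / min_lift.
pub struct Thresholds {
    pub min_support: f64,
    pub min_confidence: f64,
    pub min_lift: f64,
}

/// Phase 1: keep only rules that meet every threshold.
pub fn filter_rules(rules: Vec<MinedRule>, t: &Thresholds) -> Vec<MinedRule> {
    rules
        .into_iter()
        .filter(|r| {
            r.support >= t.min_support
                && r.confidence >= t.min_confidence
                && r.lift >= t.min_lift
        })
        .collect()
}

/// Phase 2: apply the pre-filtered rules to a cart. No thresholds are
/// checked here; a rule fires when its antecedent is in the cart and its
/// consequent is not.
pub fn recommend<'a>(rules: &'a [MinedRule], cart: &[&str]) -> Vec<&'a str> {
    rules
        .iter()
        .filter(|r| {
            cart.contains(&r.antecedent.as_str()) && !cart.contains(&r.consequent.as_str())
        })
        .map(|r| r.consequent.as_str())
        .collect()
}
```

Because filtering happens once, the per-request work in `recommend` is proportional to the number of retained rules rather than to everything the miner found.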
Flexible GRL Export for Any Domain
No more hardcoded field names! Configure for any use case:
use ;
// E-commerce
let ecommerce_config = custom;
let grl = to_grl_with_config;
// Fraud detection
let fraud_config = custom;
let grl = to_grl_with_config;
// Security
let security_config = custom;
let grl = to_grl_with_config;
See examples/flexible_domain_mining.rs for complete examples across multiple domains.
Generated GRL (rust-rule-engine v1.15.0+ with += operator):
// Auto-generated rules from pattern mining
// Generated: 2026-01-03 14:00:00 UTC
// Rule #1: Laptop → Mouse
// Confidence: 85.7% | Support: 60.0% | Lift: 1.43
rule "Mined_Laptop_Implies_Mouse" salience 85 no-loop {
when
ShoppingCart.items contains "Laptop" &&
!(Recommendation.items contains "Mouse")
then
Recommendation.items += "Mouse"; // Array append operator (v1.15.0+)
LogMessage("Rule fired: confidence 85.7%");
}
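For readers who want to see how a rule of this shape could be rendered from a mined pair, the following sketch builds the same text with `format!`. It is not the crate's exporter: `rule_to_grl` is a hypothetical function, the `ShoppingCart` and `Recommendation` field names are copied from the sample above, and deriving `salience` by truncating the confidence percentage is an assumption inferred from that sample (85.7% and salience 85).

```rust
/// Renders a single-item rule as GRL text shaped like the sample above.
/// Assumption: salience is the confidence percentage, truncated to an integer.
/// Item names are inserted verbatim, so names with spaces or quotes would
/// need sanitising before being used in the rule name.
pub fn rule_to_grl(antecedent: &str, consequent: &str, confidence: f64) -> String {
    let salience = (confidence * 100.0) as u32;
    format!(
        r#"rule "Mined_{a}_Implies_{c}" salience {s} no-loop {{
    when
        ShoppingCart.items contains "{a}" &&
        !(Recommendation.items contains "{c}")
    then
        Recommendation.items += "{c}";
}}"#,
        a = antecedent,
        c = consequent,
        s = salience
    )
}
```

`rule_to_grl("Laptop", "Mouse", 0.857)` produces a rule named `Mined_Laptop_Implies_Mouse` with salience 85.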
Algorithms
1. Apriori (Classic)
- Best for: Small to medium datasets (<10k transactions)
- Pros: Simple, easy to understand, breadth-first search
- Cons: Can be slow with many unique items
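The breadth-first search mentioned above can be illustrated with a deliberately simplified, std-only sketch. It is not the crate's implementation: `apriori` and `support` here are hypothetical functions, and the sketch omits the classic subset-pruning step, relying only on level-wise joins plus a support count.

```rust
use std::collections::{BTreeSet, HashSet};

type Itemset = BTreeSet<String>;

/// Fraction of transactions that contain every item of `set`.
fn support(transactions: &[HashSet<String>], set: &Itemset) -> f64 {
    if transactions.is_empty() {
        return 0.0;
    }
    let hits = transactions
        .iter()
        .filter(|tx| set.iter().all(|item| tx.contains(item)))
        .count();
    hits as f64 / transactions.len() as f64
}

/// Level-wise search: frequent k-itemsets are joined pairwise into
/// (k+1)-item candidates, which survive only if they reach `min_support`.
pub fn apriori(transactions: &[HashSet<String>], min_support: f64) -> Vec<Itemset> {
    // Level 1: every distinct item as a singleton, filtered by support.
    let mut level: BTreeSet<Itemset> = transactions
        .iter()
        .flatten()
        .map(|item| BTreeSet::from([item.clone()]))
        .filter(|set| support(transactions, set) >= min_support)
        .collect();

    let mut frequent = Vec::new();
    while !level.is_empty() {
        frequent.extend(level.iter().cloned());
        let mut next = BTreeSet::new();
        for a in &level {
            for b in &level {
                let candidate: Itemset = a.union(b).cloned().collect();
                // Keep only unions that grow the set by exactly one item.
                if candidate.len() == a.len() + 1
                    && support(transactions, &candidate) >= min_support
                {
                    next.insert(candidate);
                }
            }
        }
        level = next;
    }
    frequent
}
```

The nested loop over each level is where the cost noted under "Cons" comes from: with many unique items, the number of candidate pairs to join and count grows quickly.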
2. FP-Growth (Recommended)
- Best for: Large datasets (10k+ transactions)
- Pros: Faster than Apriori, no candidate generation
- Cons: More complex, uses more memory
3. Sequential Pattern Mining
- Best for: Time-ordered event sequences
- Features: Supports time windows, gap constraints
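As an illustration of a gap constraint, the sketch below checks whether one customer's time-ordered events contain a pattern with a bounded gap between consecutive matches. It is not the crate's API: `matches_sequence` is a hypothetical function, timestamps are plain `u64` values in whatever unit the caller chooses (days in the usage note), and events are assumed to be sorted by time.

```rust
/// Returns true if `pattern` occurs in `events` in order, with at most
/// `max_gap` time units between consecutive matched events.
/// Assumption: `events` is sorted by timestamp in ascending order.
pub fn matches_sequence(events: &[(u64, &str)], pattern: &[&str], max_gap: u64) -> bool {
    match_from(events, pattern, max_gap, None)
}

/// Backtracking search: tries every event that could match the next pattern
/// item, because the earliest match is not always the one that satisfies
/// the gap constraint for the items that follow.
fn match_from(
    events: &[(u64, &str)],
    pattern: &[&str],
    max_gap: u64,
    prev_time: Option<u64>,
) -> bool {
    let (first, rest) = match pattern.split_first() {
        Some(parts) => parts,
        None => return true, // Every pattern item has been matched.
    };
    for (idx, &(time, item)) in events.iter().enumerate() {
        if let Some(prev) = prev_time {
            // Events are sorted, so all later events are out of range too.
            if time.saturating_sub(prev) > max_gap {
                break;
            }
        }
        if item == *first && match_from(&events[idx + 1..], rest, max_gap, Some(time)) {
            return true;
        }
    }
    false
}
```

With events `[(0, "Laptop"), (2, "Mouse"), (7, "Laptop Bag")]` measured in days, the pattern Laptop, Mouse, Laptop Bag matches for `max_gap = 5` but not for `max_gap = 4`, since the second gap is exactly five days.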
Quality Metrics
Each discovered rule includes:
- Confidence: P(B|A) - How often B happens when A happens
- Support: P(A ∧ B) - How common the pattern is overall
- Lift: Confidence / P(B) - Correlation strength (>1: positive, <1: negative)
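- Conviction: (1 - P(B)) / (1 - Confidence) - How much more often A would occur without B if the two were independent (>1: A implies B more reliably than chance; infinite when Confidence is 100%)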
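The four metrics can be computed directly from transaction counts. The sketch below uses only the standard library; `RuleMetrics`, `transaction`, and `rule_metrics` are hypothetical names, not the crate's types, and conviction uses the textbook definition (1 - P(B)) / (1 - confidence).

```rust
use std::collections::HashSet;

/// Quality metrics for a rule `antecedent => consequent`.
#[derive(Debug, Clone, PartialEq)]
pub struct RuleMetrics {
    pub support: f64,
    pub confidence: f64,
    pub lift: f64,
    pub conviction: f64,
}

/// Builds a transaction from a list of item names.
pub fn transaction<'a>(items: &[&'a str]) -> HashSet<&'a str> {
    items.iter().copied().collect()
}

fn contains_all(tx: &HashSet<&str>, items: &[&str]) -> bool {
    items.iter().all(|item| tx.contains(*item))
}

/// Number of transactions containing every item in `items`.
fn count_containing(transactions: &[HashSet<&str>], items: &[&str]) -> f64 {
    transactions
        .iter()
        .filter(|tx| contains_all(tx, items))
        .count() as f64
}

/// Returns `None` when the antecedent or the consequent never occurs,
/// because the ratios would be undefined.
pub fn rule_metrics(
    transactions: &[HashSet<&str>],
    antecedent: &[&str],
    consequent: &[&str],
) -> Option<RuleMetrics> {
    let n = transactions.len() as f64;
    let both: Vec<&str> = antecedent.iter().chain(consequent).copied().collect();
    let count_a = count_containing(transactions, antecedent);
    let count_b = count_containing(transactions, consequent);
    let count_ab = count_containing(transactions, &both);
    if count_a == 0.0 || count_b == 0.0 {
        return None;
    }
    let support = count_ab / n; // P(A and B)
    let confidence = count_ab / count_a; // P(B | A)
    let p_b = count_b / n;
    let lift = confidence / p_b;
    let conviction = if confidence >= 1.0 {
        f64::INFINITY // The rule never fails in this data.
    } else {
        (1.0 - p_b) / (1.0 - confidence)
    };
    Some(RuleMetrics { support, confidence, lift, conviction })
}
```

For a hypothetical four-transaction dataset in which Laptop and Mouse appear together three times and the fourth transaction contains neither, `rule_metrics` gives support 0.75, confidence 1.0, and lift 1/0.75 ≈ 1.33 for Laptop => Mouse, which is consistent with the figures in the Quick Start output.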
Performance
Benchmarks with default config (min_support=0.05, min_confidence=0.6):
| Dataset Size | Algorithm | Time | Memory | Throughput |
|---|---|---|---|---|
| 100 transactions | Apriori | ~10-20ms | ~5 MB | 5-10K tx/s |
| 1,000 transactions | Apriori | ~100-200ms | ~10-15 MB | 5-10K tx/s |
| 10,000 transactions | Apriori | ~1-2s | ~30-50 MB | 5-10K tx/s |
| 100,000 transactions | Apriori | ~10-20s | ~200-500 MB | 5-10K tx/s |
Notes:
- Performance varies with min_support threshold (lower = slower)
- Memory usage depends on number of unique items and patterns
- excelstream provides constant ~3-35 MB memory during data loading
- See docs/PERFORMANCE.md for detailed benchmarks
Integration with rust-rule-engine
This crate is designed to work seamlessly with rust-rule-engine v1.15.0+:
- Mine rules from historical data (this crate)
- Export to GRL format with += array append operator
- Execute rules with RETE algorithm (rust-rule-engine)
- Explain decisions with backward chaining (rust-rule-engine)
Requirements: rust-rule-engine v1.15.0 or higher (for += operator support)
[dependencies]
rust-rule-engine = "1.15.0" # Required for += array append in GRL
Examples
See examples/ directory:
Basic Examples:
- 01_simple_ecommerce.rs - Simple e-commerce with engine execution
- 02_medium_complexity.rs - Medium complexity patterns with RETE
- 03_advanced_large_dataset.rs - Large-scale mining with statistics
- 04_load_from_excel_csv.rs - Loading data from Excel/CSV with ColumnMapping
- basic_mining.rs - Basic association rule mining
Engine Integration:
- integration_with_engine.rs - Simple MiningRuleEngine API
- integration_with_rete.rs - High-performance RETE engine
- flexible_domain_mining.rs - Multi-domain examples (fraud, security, content)
Advanced Features:
- postgres_stream_mining.rs - PostgreSQL streaming + mining (requires postgres feature)
- performance_test.rs - Performance benchmarking
- cloud_demo.rs - Cloud storage integration (requires cloud feature)
- excelstream_demo.rs - Excel streaming examples
Roadmap
Completed (v0.2.0):
- Apriori algorithm
- Association rule generation
- Quality metrics (confidence, support, lift)
- GRL export with flexible field configuration
- Engine integration (rust-rule-engine)
- PostgreSQL streaming support
- Multi-domain support (e-commerce, fraud, security)
- Excel/CSV data loading
- Column mapping configuration - Select and combine fields from multi-column data
Planned:
- FP-Growth algorithm optimization
- Sequential pattern mining
- Graph pattern matching
- Incremental mining (update rules with new data)
- Multi-level mining (category hierarchies)
- Negative pattern mining
- Real-time streaming with LISTEN/NOTIFY
- Rule versioning and A/B testing
- WebAssembly support
Documentation
Getting Started:
v0.2.0 Engine Integration:
- Engine Integration Summary - Complete v0.2.0 overview
- Integration Guide - Detailed API guide with examples
- PostgreSQL Streaming - Database integration tutorial
Advanced Topics:
- Integration with Web Frameworks - Actix-Web, Axum, production deployment
- Performance Tuning - Benchmarks and optimization
- Algorithm Details - Technical algorithm documentation
- Advanced Usage
Contributing
Contributions welcome! See CONTRIBUTING.md.
License
MIT License - see LICENSE file.
Research & References
- Apriori: Agrawal & Srikant (VLDB 1994) - "Fast Algorithms for Mining Association Rules"
- FP-Growth: Han et al. (SIGMOD 2000) - "Mining Frequent Patterns without Candidate Generation"
- Sequential Patterns: Agrawal & Srikant (ICDE 1995) - "Mining Sequential Patterns"
Related Projects
- rust-rule-engine - Production rule engine with RETE algorithm
- mlxtend - Python ML library (inspiration)
Built with ❤️ in Rust 🦀