lawkit
Multi-law statistical analysis toolkit - uncover hidden patterns and detect anomalies automatically
English README | Japanese README | Chinese README
Why lawkit?
Traditional tools analyze one pattern at a time. lawkit analyzes multiple statistical laws together to give you the complete picture. It automatically detects conflicts, runs faster with parallel processing, and provides clear insights.
Designed for modern automation with JSON, CSV, and other structured outputs that work perfectly with AI tools and automated workflows. Ideal for fraud detection, data quality checks, and business intelligence.
```bash
# Single law analysis - Benford Law fraud detection with visual charts
lawkit benf data.csv

# Pareto Analysis with Lorenz curve visualization
lawkit pareto sales.csv

# Multi-law integration analysis
lawkit analyze data.csv
```
Key Features
- Multi-Law Analysis: Benford, Pareto, Zipf, Normal, Poisson distributions with smart integration
- Visual Charts: ASCII bar charts showing digit distributions, Lorenz curves, probability plots, and histograms
- International Support: Parse numbers in 5 languages (EN, JP, CN, HI, AR) with rich output formats
- Advanced Analytics: Time series analysis, outlier detection (LOF, Isolation Forest, DBSCAN), meta-chaining
- High Performance: Rust-powered parallel processing optimized for large datasets
Performance
Real benchmark results, measured on an AMD Ryzen 5 PRO 4650U, are available in the Performance Guide.
How It Works
Core Analysis Engine

```mermaid
graph TB
    A[Input Data<br/>CSV, JSON, Excel, PDF...] --> B[Parse & Validate<br/>5 Language Support]
    B --> C1[Benford Law<br/>Fraud Detection]
    B --> C2[Pareto Analysis<br/>80/20 Rule]
    B --> C3[Zipf Law<br/>Frequency Analysis]
    B --> C4[Normal Distribution<br/>Quality Control]
    B --> C5[Poisson Distribution<br/>Rare Events]
    C1 --> D1[Statistical Scores]
    C2 --> D2[Gini Coefficient]
    C3 --> D3[Correlation Analysis]
    C4 --> D4[Normality Tests]
    C5 --> D5[Event Modeling]
    D1 --> E[Integration Engine<br/>Conflict Detection]
    D2 --> E
    D3 --> E
    D4 --> E
    D5 --> E
    E --> F1[Risk Assessment<br/>Critical/High/Medium/Low]
    E --> F2[Smart Recommendations<br/>Primary/Secondary Laws]
    E --> F3[Advanced Outliers<br/>LOF, Isolation Forest, DBSCAN]
    E --> F4[Time Series Analysis<br/>Trends, Seasonality, Anomalies]
    F1 --> G[Comprehensive Report<br/>lawkit/JSON/CSV/YAML/XML]
    F2 --> G
    F3 --> G
    F4 --> G
```
Three-Stage Analysis Workflow

```mermaid
graph LR
    subgraph "Stage 1: Basic Analysis"
        A[lawkit analyze<br/>Multi-law Integration] --> A1[Overall Quality Score<br/>Law Compatibility<br/>Initial Insights]
    end
    subgraph "Stage 2: Validation"
        A1 --> B[lawkit validate<br/>Data Quality Checks]
        B --> B1[Consistency Analysis<br/>Cross-validation<br/>Reliability Assessment]
    end
    subgraph "Stage 3: Deep Diagnosis"
        B1 --> C[lawkit diagnose<br/>Conflict Detection]
        C --> C1[Detailed Root Cause<br/>Resolution Strategies<br/>Risk Assessment]
    end
    style A stroke:#2196f3,stroke-width:2px
    style B stroke:#9c27b0,stroke-width:2px
    style C stroke:#ff9800,stroke-width:2px
```

analyze → validate → diagnose: Start with a broad overview, then check data quality, and finally investigate any specific problems.
lawkit looks at your data from multiple angles at once, then combines what it finds to give you clear insights and practical recommendations.
Specification
Supported Statistical Laws
Benford Law - Fraud Detection
The first digit of naturally occurring numbers follows a specific distribution (1 appears ~30%, 2 appears ~18%, etc.). Deviations often indicate data manipulation, making it invaluable for:
- Financial auditing: Detecting manipulated accounting records
- Election monitoring: Identifying vote count irregularities
- Scientific data validation: Spotting fabricated research data
- Tax fraud detection: Finding altered income/expense reports
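The expected first-digit distribution is easy to compute directly from log10(1 + 1/d); here is a minimal sketch in plain Rust (illustrative, independent of lawkit's own implementation):

```rust
// Expected Benford first-digit probabilities: P(d) = log10(1 + 1/d).
fn benford_expected() -> [f64; 9] {
    let mut p = [0.0f64; 9];
    for d in 1..=9 {
        p[d - 1] = (1.0 + 1.0 / d as f64).log10();
    }
    p
}

fn main() {
    for (i, p) in benford_expected().iter().enumerate() {
        // digit 1: 30.1%, digit 2: 17.6%, ... tailing off to 4.6% for digit 9
        println!("digit {}: {:>4.1}%", i + 1, p * 100.0);
    }
}
```

Comparing observed first-digit frequencies against these values is the core of a Benford test.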
Pareto Analysis - 80/20 Principle
The famous "80/20 rule" where 80% of effects come from 20% of causes. Essential for:
- Business optimization: Identifying top customers, products, or revenue sources
- Resource allocation: Focusing effort on high-impact areas
- Quality management: Finding the few defects causing most problems
- Wealth distribution analysis: Understanding economic inequality patterns
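A quick way to check for a Pareto pattern in your own data is to compute the share of the total contributed by the top 20% of items. A small Rust sketch (hypothetical revenue figures, not lawkit code):

```rust
// Fraction of the total contributed by the top 20% of values.
fn top20_share(values: &[f64]) -> f64 {
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| b.partial_cmp(a).unwrap()); // descending
    let total: f64 = sorted.iter().sum();
    let k = ((sorted.len() as f64) * 0.2).ceil() as usize;
    sorted.iter().take(k).sum::<f64>() / total
}

fn main() {
    // Hypothetical revenue per customer: a few accounts dominate.
    let revenue = [800.0, 50.0, 40.0, 30.0, 20.0, 20.0, 15.0, 10.0, 10.0, 5.0];
    println!("top 20% contribute {:.0}% of revenue", top20_share(&revenue) * 100.0);
    // -> top 20% contribute 85% of revenue
}
```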
Zipf Law - Frequency Power Laws
Word frequencies follow a predictable pattern where the nth most common word appears 1/n as often as the most common word. Useful for:
- Content analysis: Analyzing text patterns and authenticity
- Market research: Understanding brand mention distributions
- Language processing: Detecting artificial or generated text
- Social media analysis: Identifying unusual posting patterns
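The expected rank-frequency curve follows directly from the law: if the most common item has frequency f(1), rank n is expected at f(1)/n. A tiny Rust sketch (illustrative only):

```rust
// Expected Zipf frequencies: f(n) = f(1) / n for ranks 1..=ranks.
fn zipf_expected(f1: f64, ranks: usize) -> Vec<f64> {
    (1..=ranks).map(|n| f1 / n as f64).collect()
}

fn main() {
    // If the most frequent word covers 1.74% of the text:
    for (i, f) in zipf_expected(1.74, 5).iter().enumerate() {
        println!("rank {}: expected {:.2}%", i + 1, f);
    }
    // rank 2 -> 0.87%, rank 3 -> 0.58%: a 1/n decay
}
```

Large deviations of observed frequencies from this 1/n curve suggest artificial or generated text.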
Normal Distribution - Statistical Foundation
The bell-curve distribution that appears throughout nature and human behavior. Critical for:
- Quality control: Detecting manufacturing defects and process variations
- Performance analysis: Evaluating test scores, measurements, and metrics
- Risk assessment: Understanding natural variation vs. anomalies
- Process improvement: Establishing control limits and specifications
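A standard way to establish control limits is to flag anything outside mean ± 3σ. This Rust sketch shows the idea (an illustrative assumption, not lawkit's exact method):

```rust
// Mean, population standard deviation, and 3-sigma control limits.
fn control_limits(xs: &[f64]) -> (f64, f64, f64) {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let sd = (xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n).sqrt();
    (mean - 3.0 * sd, mean, mean + 3.0 * sd)
}

fn main() {
    let measurements = [9.8, 10.1, 10.0, 9.9, 10.2];
    let (lcl, mean, ucl) = control_limits(&measurements);
    // Any reading outside [LCL, UCL] is a candidate anomaly.
    println!("LCL={:.2}  mean={:.2}  UCL={:.2}", lcl, mean, ucl);
}
```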
Poisson Distribution - Rare Event Modeling
Models the probability of rare events occurring in fixed time/space intervals. Essential for:
- System reliability: Predicting failure rates and maintenance needs
- Customer service: Modeling call center traffic and wait times
- Network analysis: Understanding packet loss and connection patterns
- Healthcare monitoring: Tracking disease outbreaks and incident rates
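The underlying model is the Poisson probability mass function, P(X=k) = λ^k e^(-λ) / k!. A small Rust sketch (the rate 2.27 is just an example value):

```rust
// Poisson probability of exactly k events at rate lambda: λ^k e^(-λ) / k!.
fn poisson_pmf(lambda: f64, k: u32) -> f64 {
    let k_factorial: f64 = (1..=k).map(|i| i as f64).product(); // 1.0 when k = 0
    lambda.powi(k as i32) * (-lambda).exp() / k_factorial
}

fn main() {
    // With roughly 2.27 events per interval:
    let lambda = 2.27;
    let p0 = poisson_pmf(lambda, 0);
    let p1 = poisson_pmf(lambda, 1);
    println!("P(X=0)={:.3}  P(X=1)={:.3}  P(X>=2)={:.3}", p0, p1, 1.0 - p0 - p1);
}
```

A variance/mean ratio far from 1 in the observed counts signals departure from this model.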
Types of Analysis
- Single law analysis
- Multi-law comparison and integration
- Advanced outlier detection (LOF, Isolation Forest, DBSCAN)
- Time series analysis and trend detection
- Data generation for testing and validation
Output Formats
lawkit outputs results in multiple formats for different use cases:
- lawkit Format (Default): Human-readable analysis results
- JSON/CSV/YAML/TOML/XML: Machine-readable structured formats for automation, integration, and data processing
Installation
CLI Tool
```bash
# From crates.io (recommended)
cargo install lawkit

# Or download a prebuilt binary from the project's GitHub releases page
```
Rust Library
```toml
# In your Cargo.toml
[dependencies]
lawkit-core = "2.1"
```

```rust
// Illustrative sketch; exact module paths and signatures are in the crate docs.
use lawkit_core::laws::benford::analyze_benford;
use lawkit_core::common::input::parse_text_input;

let numbers = parse_text_input(input_text)?;
let result = analyze_benford(&numbers)?;
println!("{:?}", result);
```
Package Integrations
lawkit is also distributed as Node.js and Python packages; see the respective package registries for installation instructions.
Basic Usage
Single Law Analysis with Visual Charts
Input file names below are placeholders:

```bash
# Benford Law - fraud detection with digit distribution chart
lawkit benf data.csv

# Pareto Analysis - 80/20 rule with Lorenz curve visualization
lawkit pareto sales.csv

# Normal Distribution - quality control with histogram
lawkit normal measurements.csv

# Zipf Law - rank-frequency distribution with power law analysis
lawkit zipf document.txt
```

Sample Zipf rank-frequency chart (bars scaled to the top rank):

```
# 1: ██████████████████████████████████████████████████ 1.74% (expected: 1.74%)
# 2: ███████████████████████████████████ 1.22% (expected: 0.87%)
# 3: ██████████████████████████████ 1.04% (expected: 0.58%)
# 4: █████████████████████████ 0.87% (expected: 0.43%)
# 5: █████████████████████████ 0.87% (expected: 0.35%)
# 6: ████████████████████ 0.70% (expected: 0.29%)
# 7: ████████████████████ 0.70% (expected: 0.25%)
# 8: ████████████████████ 0.70% (expected: 0.22%)
# 9: ████████████████████ 0.70% (expected: 0.19%)
#10: ████████████████████ 0.70% (expected: 0.17%)
```

```bash
# Poisson Distribution - rare events with probability chart
lawkit poisson events.csv
```

Sample Poisson summary:

```
P(X=0)=0.103, P(X=1)=0.234, P(X≥2)=0.662
λ=2.27, Variance/Mean=0.774
```
Three-Stage Analysis Workflow
We recommend the analyze → validate → diagnose approach for thorough data analysis:
```bash
# Stage 1: Basic multi-law analysis
lawkit analyze data.csv

# Stage 2: Data validation with consistency checks
lawkit validate data.csv

# Stage 3: Deep conflict analysis and recommendations
lawkit diagnose data.csv
```
Advanced Usage
The options shown are illustrative; run `lawkit --help` for the exact flags your version supports.

```bash
# Generate test data that follows a given law
lawkit generate benf --samples 1000

# Built-in time series analysis
# Returns: trend analysis, seasonality detection, changepoints, forecasts
lawkit normal timeseries.csv

# Advanced filtering and analysis
lawkit benf data.csv --filter ">=100"

# Pipeline usage
cat data.csv | lawkit benf

# Meta-chaining with diffx for time series analysis
lawkit analyze jan.csv --format json > jan.json
lawkit analyze feb.csv --format json > feb.json
diffx jan.json feb.json

# Continuous monitoring pipeline
for f in monthly/*.csv; do
  lawkit benf "$f" --format json >> results.jsonl
done
```
Meta-Chaining: Tracking Long-Term Pattern Evolution
Meta-chaining combines lawkit's built-in time series analysis with diffx for long-term pattern tracking:

```mermaid
graph LR
    A[Jan Data] -->|lawkit| B[Jan Analysis]
    C[Feb Data] -->|lawkit| D[Feb Analysis]
    E[Mar Data] -->|lawkit| F[Mar Analysis]
    B -->|diffx| G[Period Differences<br/>Jan→Feb]
    D -->|diffx| G
    D -->|diffx| H[Period Differences<br/>Feb→Mar]
    F -->|diffx| H
    G -->|long-term trend| I[Pattern<br/>Evolution]
    H -->|long-term trend| I
    style I stroke:#0288d1,stroke-width:3px
```
Built-in Time Series Analysis (single dataset):
- Trend detection with R-squared analysis
- Automatic seasonality detection and decomposition
- Changepoint identification (level, trend, variance shifts)
- Forecasting with confidence intervals
- Anomaly detection and data quality assessment
Meta-chaining with diffx (multiple time periods):
- Period Differences: Changes in statistical results between adjacent periods (e.g., Jan→Feb changes)
- Pattern Evolution: Long-term statistical pattern development trends (e.g., year-long changes)
- Gradual drift in Benford compliance (potential fraud buildup)
- Cross-period anomaly comparison
- Historical pattern baseline establishment
Documentation
For comprehensive guides, examples, and API documentation:
User Guide - Installation, usage, and examples
CLI Reference - Complete command documentation
Statistical Laws Guide - Detailed analysis examples
Performance Guide - Optimization and large datasets
International Support - Multi-language number parsing
Contributing
We welcome contributions! Please see our Contributing Guide for details.
License
This project is licensed under the MIT License - see the LICENSE file for details.