lawkit
Multi-law statistical analysis toolkit - uncover hidden patterns and detect anomalies automatically
English README | Japanese README | Chinese README

Why lawkit?
Traditional tools analyze one pattern at a time. lawkit analyzes multiple statistical laws together to give you the complete picture. It automatically detects conflicts, runs faster with parallel processing, and provides clear insights.
Designed for modern automation with JSON, CSV, and other structured outputs that work perfectly with AI tools and automated workflows. Ideal for fraud detection, data quality checks, and business intelligence.
$ lawkit benf financial_data.csv
Benford Law Analysis Results
Dataset: financial_data.csv
Numbers analyzed: 2500
Risk Level: Low [LOW]
First Digit Distribution:
1: ██████████████████████████████ 30.1% (expected: 30.1%)
2: ██████████████████ 17.6% (expected: 17.6%)
3: █████████████ 12.5% (expected: 12.5%)
4: ██████████ 9.7% (expected: 9.7%)
5: ████████ 7.9% (expected: 7.9%)
6: ███████ 6.7% (expected: 6.7%)
7: ██████ 5.8% (expected: 5.8%)
8: █████ 5.1% (expected: 5.1%)
9: █████ 4.6% (expected: 4.6%)
Statistical Tests:
Chi-square: 1.34 (p-value: 0.995)
Mean Absolute Deviation: 0.8%
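The two tests above can be reproduced with the standard formulas; a minimal Rust sketch (illustrative, not lawkit's internals; the digit counts are hypothetical):

```rust
// Chi-square and mean absolute deviation of observed first-digit counts
// against Benford's expected distribution P(d) = log10(1 + 1/d).
fn benford_expected() -> [f64; 9] {
    core::array::from_fn(|i| ((i as f64 + 2.0) / (i as f64 + 1.0)).log10())
}

fn chi_square(observed: &[u64; 9]) -> f64 {
    let n = observed.iter().sum::<u64>() as f64;
    benford_expected()
        .iter()
        .zip(observed.iter())
        .map(|(p, &o)| {
            let e = p * n; // expected count for this digit
            (o as f64 - e).powi(2) / e
        })
        .sum()
}

fn mad_percent(observed: &[u64; 9]) -> f64 {
    let n = observed.iter().sum::<u64>() as f64;
    let sum: f64 = benford_expected()
        .iter()
        .zip(observed.iter())
        .map(|(p, &o)| (o as f64 / n - p).abs())
        .sum();
    sum / 9.0 * 100.0
}

fn main() {
    // Hypothetical counts for 1000 numbers, close to Benford's expectation.
    let obs = [301u64, 176, 125, 97, 79, 67, 58, 51, 46];
    println!("Chi-square: {:.2}", chi_square(&obs));
    println!("MAD: {:.2}%", mad_percent(&obs));
}
```

A small chi-square (and a MAD well under a few percent) means the observed digits track the Benford curve closely.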
$ lawkit pareto sales_data.csv
Pareto Principle (80/20 Rule) Analysis Results
Dataset: sales_data.csv
Numbers analyzed: 1000
[LOW] Dataset analysis
Lorenz Curve (Cumulative Distribution):
10%: ███ 5.2% cumulative
20%: ██████████ 20.1% cumulative
30%: ██████████████████ 35.4% cumulative
40%: ████████████████████████ 48.9% cumulative
50%: ███████████████████████████████ 61.7% cumulative
80/20 Rule: Top 20% owns 79.2% of total wealth (Ideal: 80.0%, Ratio: 0.99)
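The 80/20 figure is simply the cumulative share held by the top 20% of values; a minimal Rust sketch of that check (the sales figures are made up; this is not lawkit's implementation):

```rust
// Share of the total held by the top 20% of values (the "80/20" check).
fn top20_share(values: &mut [f64]) -> f64 {
    // Sort descending so the largest values come first.
    values.sort_by(|a, b| b.partial_cmp(a).unwrap());
    let total: f64 = values.iter().sum();
    let k = ((values.len() as f64) * 0.2).ceil() as usize;
    values[..k].iter().sum::<f64>() / total
}

fn main() {
    let mut sales = [500.0, 120.0, 80.0, 60.0, 40.0, 30.0, 20.0, 15.0, 10.0, 5.0];
    let share = top20_share(&mut sales);
    println!("Top 20% owns {:.1}% of total", share * 100.0);
}
```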
$ lawkit analyze --laws all data.csv
Statistical Laws Integration Analysis
Dataset: data.csv
Numbers Analyzed: 1000
Laws Executed: 5 (benf, pareto, zipf, normal, poisson)
Integration Metrics:
Overall Quality Score: 0.743
Consistency Score: 0.823
Conflicts Detected: 2
Recommendation Confidence: 0.892
Key Features
- Multi-Law Analysis: Benford, Pareto, Zipf, Normal, Poisson distributions with smart integration
- Visual Charts: ASCII bar charts showing digit distributions, Lorenz curves, probability plots, and histograms
- International Support: Parse numbers in 5 languages (EN, JP, CN, HI, AR) with rich output formats
- Advanced Analytics: Time series analysis, outlier detection (LOF, Isolation Forest, DBSCAN), meta-chaining
- High Performance: Rust-powered parallel processing optimized for large datasets
Performance
Benchmarked on an AMD Ryzen 5 PRO 4650U, comparing `other-tool data.csv`, `lawkit benf data.csv`, and `lawkit analyze data.csv` on the same dataset.
How It Works
Core Analysis Engine
```mermaid
graph TB
    A[Input Data<br/>CSV, JSON, Excel, PDF...] --> B[Parse & Validate<br/>5 Language Support]
    B --> C1[Benford Law<br/>Fraud Detection]
    B --> C2[Pareto Analysis<br/>80/20 Rule]
    B --> C3[Zipf Law<br/>Frequency Analysis]
    B --> C4[Normal Distribution<br/>Quality Control]
    B --> C5[Poisson Distribution<br/>Rare Events]
    C1 --> D1[Statistical Scores]
    C2 --> D2[Gini Coefficient]
    C3 --> D3[Correlation Analysis]
    C4 --> D4[Normality Tests]
    C5 --> D5[Event Modeling]
    D1 --> E[Integration Engine<br/>Conflict Detection]
    D2 --> E
    D3 --> E
    D4 --> E
    D5 --> E
    E --> F1[Risk Assessment<br/>Critical/High/Medium/Low]
    E --> F2[Smart Recommendations<br/>Primary/Secondary Laws]
    E --> F3[Advanced Outliers<br/>LOF, Isolation Forest, DBSCAN]
    E --> F4[Time Series Analysis<br/>Trends, Seasonality, Anomalies]
    F1 --> G[Comprehensive Report<br/>lawkit/JSON/CSV/YAML/XML]
    F2 --> G
    F3 --> G
    F4 --> G
```
Three-Stage Analysis Workflow
```mermaid
graph LR
    subgraph "Stage 1: Basic Analysis"
        A[lawkit analyze<br/>Multi-law Integration] --> A1[Overall Quality Score<br/>Law Compatibility<br/>Initial Insights]
    end
    subgraph "Stage 2: Validation"
        A1 --> B[lawkit validate<br/>Data Quality Checks]
        B --> B1[Consistency Analysis<br/>Cross-validation<br/>Reliability Assessment]
    end
    subgraph "Stage 3: Deep Diagnosis"
        B1 --> C[lawkit diagnose<br/>Conflict Detection]
        C --> C1[Detailed Root Cause<br/>Resolution Strategies<br/>Risk Assessment]
    end
    style A stroke:#2196f3,stroke-width:2px
    style B stroke:#9c27b0,stroke-width:2px
    style C stroke:#ff9800,stroke-width:2px
```
analyze → validate → diagnose: Start with a broad overview, then check data quality, and finally investigate any specific problems.
lawkit looks at your data from multiple angles at once, then combines what it finds to give you clear insights and practical recommendations.
Specification
Supported Statistical Laws
Benford Law - Fraud Detection
The first digit of naturally occurring numbers follows a specific distribution (1 appears ~30%, 2 appears ~18%, etc.). Deviations often indicate data manipulation, making it invaluable for:
- Financial auditing: Detecting manipulated accounting records
- Election monitoring: Identifying vote count irregularities
- Scientific data validation: Spotting fabricated research data
- Tax fraud detection: Finding altered income/expense reports
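The expected distribution comes from Benford's formula P(d) = log10(1 + 1/d); a small illustrative Rust sketch (not lawkit's code) that prints the table and extracts leading digits:

```rust
// Expected Benford share for each leading digit, plus digit extraction.
fn first_digit(mut x: f64) -> u32 {
    x = x.abs();
    while x >= 10.0 { x /= 10.0; }
    while x > 0.0 && x < 1.0 { x *= 10.0; }
    x as u32
}

fn main() {
    for d in 1..=9u32 {
        let p = (1.0 + 1.0 / d as f64).log10() * 100.0;
        println!("{}: {:.1}%", d, p); // 1: 30.1%, 2: 17.6%, ..., 9: 4.6%
    }
    assert_eq!(first_digit(2500.0), 2);
    assert_eq!(first_digit(0.0347), 3);
}
```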
Pareto Analysis - 80/20 Principle
The famous "80/20 rule" where 80% of effects come from 20% of causes. Essential for:
- Business optimization: Identifying top customers, products, or revenue sources
- Resource allocation: Focusing effort on high-impact areas
- Quality management: Finding the few defects causing most problems
- Wealth distribution analysis: Understanding economic inequality patterns
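Concentration metrics such as the Gini coefficient follow directly from the sorted values; an illustrative Rust sketch using the standard rank formula (not lawkit's implementation):

```rust
// Gini coefficient: 0.0 = perfect equality, values near 1.0 = extreme
// concentration. Standard sorted-rank formula with x ascending:
// G = (2 * sum(i * x_i)) / (n * sum(x)) - (n + 1) / n
fn gini(values: &mut [f64]) -> f64 {
    values.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = values.len() as f64;
    let total: f64 = values.iter().sum();
    let weighted: f64 = values
        .iter()
        .enumerate()
        .map(|(i, x)| (i as f64 + 1.0) * x)
        .sum();
    2.0 * weighted / (n * total) - (n + 1.0) / n
}

fn main() {
    let mut equal = [10.0, 10.0, 10.0, 10.0];
    let mut skewed = [1.0, 2.0, 3.0, 94.0];
    println!("equal:  {:.3}", gini(&mut equal));  // 0.000
    println!("skewed: {:.3}", gini(&mut skewed)); // 0.700
}
```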
Zipf Law - Frequency Power Laws
Word frequencies follow a predictable pattern where the nth most common word appears 1/n as often as the most common word. Useful for:
- Content analysis: Analyzing text patterns and authenticity
- Market research: Understanding brand mention distributions
- Language processing: Detecting artificial or generated text
- Social media analysis: Identifying unusual posting patterns
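That 1/n relationship is straightforward to compute; an illustrative Rust sketch with hypothetical word counts:

```rust
// Under Zipf's law, the rank-r item appears about 1/r as often as rank 1.
fn zipf_expected(rank1_count: f64, ranks: u32) -> Vec<f64> {
    (1..=ranks).map(|r| rank1_count / r as f64).collect()
}

fn main() {
    // If the most common word appears 1000 times:
    for (i, c) in zipf_expected(1000.0, 5).iter().enumerate() {
        println!("rank {}: ~{:.0} occurrences", i + 1, c);
    }
}
```

Real corpora deviate from this ideal curve; large deviations are what flag artificial or generated text.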
Normal Distribution - Statistical Foundation
The bell-curve distribution that appears throughout nature and human behavior. Critical for:
- Quality control: Detecting manufacturing defects and process variations
- Performance analysis: Evaluating test scores, measurements, and metrics
- Risk assessment: Understanding natural variation vs. anomalies
- Process improvement: Establishing control limits and specifications
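Control limits are conventionally set at μ ± 3σ around the mean; a minimal Rust sketch (the sample measurements are made up):

```rust
// Mean, population standard deviation, and 3-sigma control limits.
fn mean_std(xs: &[f64]) -> (f64, f64) {
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / n;
    (mean, var.sqrt())
}

fn main() {
    let measurements = [98.0, 100.0, 102.0, 99.0, 101.0];
    let (mu, sigma) = mean_std(&measurements);
    // Points outside [mu - 3*sigma, mu + 3*sigma] get flagged as anomalies.
    println!("mu = {:.2}, sigma = {:.2}", mu, sigma);
    println!("control limits: [{:.2}, {:.2}]", mu - 3.0 * sigma, mu + 3.0 * sigma);
}
```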
Poisson Distribution - Rare Event Modeling
Models the probability of rare events occurring in fixed time/space intervals. Essential for:
- System reliability: Predicting failure rates and maintenance needs
- Customer service: Modeling call center traffic and wait times
- Network analysis: Understanding packet loss and connection patterns
- Healthcare monitoring: Tracking disease outbreaks and incident rates
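Poisson probabilities follow P(X = k) = λ^k e^(-λ) / k!; a self-contained Rust sketch (λ here is chosen purely for illustration):

```rust
// Poisson probability mass function.
fn poisson_pmf(lambda: f64, k: u32) -> f64 {
    let mut factorial = 1.0;
    for i in 1..=k {
        factorial *= i as f64;
    }
    lambda.powi(k as i32) * (-lambda).exp() / factorial
}

fn main() {
    let lambda = 2.0; // e.g. an average of 2 events per interval
    for k in 0..=4 {
        println!("P(X = {}) = {:.3}", k, poisson_pmf(lambda, k));
    }
}
```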
Types of Analysis
- Single law analysis
- Multi-law comparison and integration
- Advanced outlier detection (LOF, Isolation Forest, DBSCAN)
- Time series analysis and trend detection
- Data generation for testing and validation
Output Formats
lawkit outputs results in multiple formats for different use cases:
- lawkit Format (Default): Human-readable analysis results
- JSON/CSV/YAML/TOML/XML: Machine-readable structured formats for automation, integration, and data processing
Installation
CLI Tool
cargo install lawkit

# Or download a pre-built binary
wget https://github.com/kako-jun/lawkit/releases/latest/download/lawkit-linux-x86_64.tar.gz
tar -xzf lawkit-linux-x86_64.tar.gz
Rust Library
```toml
[dependencies]
lawkit-core = "2.1"
```

```rust
use lawkit_core::laws::benford::analyze_benford;
use lawkit_core::common::input::parse_text_input;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let numbers = parse_text_input("123 456 789")?;
    let result = analyze_benford(&numbers, "data.txt", false)?;
    println!("Chi-square: {}", result.chi_square);
    Ok(())
}
```
Package Integrations
npm install lawkit-js
pip install lawkit-python
lawkit-download-binary
Basic Usage
Single Law Analysis with Visual Charts
$ lawkit benf financial_data.csv
First Digit Distribution:
1: ████████████████████ 20.0% (expected: 30.1%)
2: ███████████ 11.4% (expected: 17.6%)
3: ██████ 5.7% (expected: 12.5%)
4: ██████████████ 14.3% (expected: 9.7%)
5: █████████████████ 17.1% (expected: 7.9%)
6: ███ 2.9% (expected: 6.7%)
7: ██████████████ 14.3% (expected: 5.8%)
8: ███████████ 11.4% (expected: 5.1%)
9: ███ 2.9% (expected: 4.6%)
$ lawkit pareto sales_data.csv
Lorenz Curve (Cumulative Distribution):
7%: █████████████ 26.0% cumulative
13%: ███████████████████████ 46.9% cumulative
27%: ████████████████████████████████████ 72.9% cumulative
33%: ████████████████████████████████████████ 80.7% cumulative
47%: █████████████████████████████████████████████ 89.8% cumulative
80/20 Rule: Top 20% owns 62.5% of total wealth (Ideal: 80.0%, Ratio: 0.78)
$ lawkit normal measurements.csv
Distribution Histogram:
71.36- 76.99: █ 2.7%
76.99- 82.61: ██████ 11.5%
82.61- 88.24: █████████████████ 34.0%
88.24- 93.87: ███████████████████████████████████ 69.8%
93.87- 99.50: ██████████████████████████████████████████████████ 100.0%
Distribution: μ=99.50, σ=9.38, Range: [71.36, 127.64]
$ lawkit poisson event_counts.csv
Probability Distribution:
P(X= 0): ██████████████████ 0.180
P(X= 1): ███████████████████████████████ 0.309
P(X= 2): ███████████████████████████ 0.265
P(X= 3): ███████████████ 0.151
P(X= 4): ███████ 0.065
Key Probabilities: P(X=0)=0.180, P(X=1)=0.309, P(X≥2)=0.511
Three-Stage Analysis Workflow
We recommend the analyze → validate → diagnose approach for thorough data analysis:
$ lawkit analyze --laws all data.csv
Statistical Laws Integration Analysis
Dataset: data.csv
Numbers analyzed: 1000
Laws executed: 5 (benford, pareto, zipf, normal, poisson)
Integration Metrics:
Overall Quality: 0.743
Consistency: 0.823
Conflicts Detected: 2
Recommendation Confidence: 0.892
Law Results:
Benford Law: 0.652
Pareto Principle: 0.845
Zipf Law: 0.423
Normal Distribution: 0.912
Poisson Distribution: 0.634
Conflicts:
[CONFLICT] Benford Law score 0.652 significantly deviates from expected 0.500 - deviation 30.4%
Likely Cause: Different distribution assumptions
Suggestion: Focus on Zipf analysis for frequency data
Risk Assessment: [MEDIUM]
$ lawkit validate --laws benf,pareto,normal transactions.csv --consistency-check
Data Validation and Consistency Analysis
Dataset: transactions.csv
Numbers analyzed: 2500
Laws validated: 3 (benford, pareto, normal)
Validation Results:
Data Quality Score: 0.891
Cross-validation Consistency: 0.943
Statistical Reliability: HIGH
Individual Law Validation:
[PASS] Benford Law validation (Score: 0.834, p-value: 0.023)
[PASS] Pareto Principle validation (Gini: 0.78, Alpha: 2.12)
[WARNING] Normal Distribution validation (Shapiro-Wilk: 0.032)
Consistency Analysis:
Benford-Pareto Agreement: 0.912 (HIGH)
Benford-Normal Agreement: 0.643 (MEDIUM)
Pareto-Normal Agreement: 0.587 (MEDIUM)
Data Quality Assessment: RELIABLE (Validation Score: 0.891)
$ lawkit diagnose --laws all suspicious_data.csv --report detailed
Detailed Conflict Detection and Diagnostic Report
Dataset: suspicious_data.csv
Numbers analyzed: 1500
Laws analyzed: 5 (benford, pareto, zipf, normal, poisson)
[CONFLICT] 3 Critical Issues Detected
Critical Conflict Laws: Benford Law vs Normal Distribution
Conflict Score: 0.847 (HIGH)
Description: Benford Law and Normal Distribution show significantly different
evaluations (difference: 0.623) with structural differences in:
confidence_level ("high" → "low"), score_category ("good" → "poor")
Root Cause: Benford Law indicates potential data manipulation while Normal
suggests legitimate natural distribution pattern
Resolution: Investigate data source integrity; consider temporal analysis
to identify manipulation periods
Critical Conflict Laws: Pareto Principle vs Poisson Distribution
Conflict Score: 0.793 (HIGH)
Description: Power law distribution conflicts with discrete event modeling
Root Cause: Data contains mixed patterns (continuous wealth distribution
and discrete event counts)
Resolution: Segment data by type before analysis; apply Pareto Principle to amounts,
Poisson Distribution to frequencies
Critical Conflict Laws: Zipf Law vs Normal Distribution
Conflict Score: 0.651 (MEDIUM)
Description: Frequency-based analysis conflicts with continuous distribution
Root Cause: Dataset may contain both textual frequency data and numerical measurements
Resolution: Separate frequency analysis from statistical distribution testing
Risk Assessment: [CRITICAL] (Multiple fundamental conflicts detected)
Recommendation: Manual data review required before automated decision-making
Advanced Usage
# Generate sample data following a given law
lawkit generate pareto --samples 1000 > test_data.txt
lawkit generate normal --mean 100 --stddev 15 --samples 500

# Built-in time series analysis over a 12-period window
lawkit normal monthly_sales.csv --enable-timeseries --timeseries-window 12

# Filter input values and pick an output format
lawkit analyze --laws all --filter ">=1000" financial_data.xlsx
lawkit benf sales_data.csv --format xml

# Read numbers from stdin
cat raw_numbers.txt | lawkit benf -
lawkit generate zipf --samples 10000 | lawkit analyze --laws all -

# Compare analyses across years with diffx
lawkit benf sales_2023.csv > analysis_2023.txt
lawkit benf sales_2024.csv > analysis_2024.txt
diffx analysis_2023.txt analysis_2024.txt

# Meta-chaining: compare monthly analyses in sequence
for month in {01..12}; do
  lawkit analyze --laws all sales_2024_${month}.csv > analysis_${month}.txt
done
diffx analysis_*.txt --chain
Meta-Chaining: Tracking Long-Term Pattern Evolution
Meta-chaining combines lawkit's built-in time series analysis with diffx for long-term pattern tracking:
```mermaid
graph LR
    A[Jan Data] -->|lawkit| B[Jan Analysis]
    C[Feb Data] -->|lawkit| D[Feb Analysis]
    E[Mar Data] -->|lawkit| F[Mar Analysis]
    B -->|diffx| G[Period Differences<br/>Jan→Feb]
    D -->|diffx| G
    D -->|diffx| H[Period Differences<br/>Feb→Mar]
    F -->|diffx| H
    G -->|long-term trend| I[Pattern<br/>Evolution]
    H -->|long-term trend| I
    style I stroke:#0288d1,stroke-width:3px
```
Built-in Time Series Analysis (single dataset):
- Trend detection with R-squared analysis
- Automatic seasonality detection and decomposition
- Changepoint identification (level, trend, variance shifts)
- Forecasting with confidence intervals
- Anomaly detection and data quality assessment
Meta-chaining with diffx (multiple time periods):
- Period Differences: Changes in statistical results between adjacent periods (e.g., Jan→Feb changes)
- Pattern Evolution: Long-term statistical pattern development trends (e.g., year-long changes)
- Gradual drift in Benford compliance (potential fraud buildup)
- Cross-period anomaly comparison
- Historical pattern baseline establishment
Documentation
For comprehensive guides, examples, and API documentation:
- User Guide - Installation, usage, and examples
- CLI Reference - Complete command documentation
- Statistical Laws Guide - Detailed analysis examples
- Performance Guide - Optimization and large datasets
- International Support - Multi-language number parsing
Contributing
We welcome contributions! Please see our Contributing Guide for details.
License
This project is licensed under the MIT License - see the LICENSE file for details.