§u-insight
Statistical analysis and data profiling engine with C FFI bindings.
u-insight transforms raw tabular data into actionable statistical insights. It operates in two distinct layers:
- Profiling — tolerates dirty data, reports data quality and statistics
- Analysis — requires clean data, discovers patterns and relationships
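The split between the two layers can be illustrated with a minimal, self-contained sketch. It does not use the crate's API: `ColumnProfile`, `profile_column`, and `pearson` are hypothetical names chosen for illustration, and the real profiling and analysis modules may be structured quite differently. The idea is only that a profiling-style function accepts missing or non-finite cells and reports on them, while an analysis-style function rejects such input up front.

```rust
/// Summary produced by a profiling-style pass.
/// Hypothetical type for illustration; not the crate's own struct.
#[derive(Debug, PartialEq)]
pub struct ColumnProfile {
    pub total: usize,
    pub missing: usize,
    pub mean: Option<f64>,
}

/// Profiling: tolerates dirty data. Missing (`None`) and non-finite cells
/// are counted as missing instead of causing a failure.
pub fn profile_column(cells: &[Option<f64>]) -> ColumnProfile {
    let valid: Vec<f64> = cells
        .iter()
        .flatten()
        .copied()
        .filter(|v| v.is_finite())
        .collect();
    let mean = if valid.is_empty() {
        None
    } else {
        Some(valid.iter().sum::<f64>() / valid.len() as f64)
    };
    ColumnProfile {
        total: cells.len(),
        missing: cells.len() - valid.len(),
        mean,
    }
}

/// Analysis: requires clean data. Returns an error instead of silently
/// skipping bad values. Computes the Pearson correlation coefficient.
pub fn pearson(x: &[f64], y: &[f64]) -> Result<f64, String> {
    if x.len() != y.len() || x.len() < 2 {
        return Err("inputs must have equal length of at least 2".to_string());
    }
    if x.iter().chain(y).any(|v| !v.is_finite()) {
        return Err("inputs must not contain NaN or infinite values".to_string());
    }
    let n = x.len() as f64;
    let mean_x = x.iter().sum::<f64>() / n;
    let mean_y = y.iter().sum::<f64>() / n;
    let (mut sxy, mut sxx, mut syy) = (0.0, 0.0, 0.0);
    for (a, b) in x.iter().zip(y) {
        let (dx, dy) = (a - mean_x, b - mean_y);
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if sxx == 0.0 || syy == 0.0 {
        return Err("correlation is undefined for a constant column".to_string());
    }
    Ok(sxy / (sxx * syy).sqrt())
}
```

In this sketch a caller would run `profile_column` first to see how many cells are unusable, clean or drop those rows, and only then pass plain `f64` slices to `pearson`.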
§Modules
- dataframe — Column-major tabular data model (DataFrame, Column, DataType)
- csv_parser — CSV parsing with automatic type inference
- json_parser — JSON parsing with automatic type inference
- profiling — Column-level and dataset-level data profiling
- analysis — Correlation (Pearson/Spearman), regression (simple/multiple OLS), Cramér’s V
- clustering — K-Means++ (auto-K, Gap Statistic), Mini-Batch K-Means, DBSCAN, Hierarchical (4 linkages), HDBSCAN
- distribution — ECDF, histogram, QQ-plot, normality tests (KS, JB, Shapiro-Wilk, Anderson-Darling), Grubbs, distribution fitting
- pca — Principal Component Analysis (dimensionality reduction)
- isolation_forest — Isolation Forest anomaly detection (Liu et al. 2008)
- lof — Local Outlier Factor (LOF) density-based anomaly detection
- mahalanobis — Mahalanobis distance multivariate outlier detection
- feature_importance — Composite importance, ANOVA F-test, Mutual Information, Permutation Importance
- ffi — C FFI bindings (32 functions, 20 structs, auto-generated C header via cbindgen)
- error — Error types
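The csv_parser and json_parser entries above mention automatic type inference. One common way to do this, shown here as a rough sketch rather than the crate's actual algorithm, is to try the most specific type first and fall back to text. `InferredType` and `infer_column_type` are hypothetical names; the crate's own `DataType` enum may have more variants and different rules (for example around empty cells or date handling).

```rust
/// Hypothetical stand-in for a column type; not the crate's `DataType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InferredType {
    Boolean,
    Numeric,
    Text,
}

/// Infer a column type from its raw string cells.
/// Empty cells are ignored (an assumption of this sketch); a column with
/// no usable cells falls back to `Text`.
pub fn infer_column_type(cells: &[&str]) -> InferredType {
    let non_empty: Vec<&str> = cells
        .iter()
        .map(|c| c.trim())
        .filter(|c| !c.is_empty())
        .collect();
    if non_empty.is_empty() {
        return InferredType::Text;
    }
    // Most specific first: every cell must be "true" or "false".
    let all_bool = non_empty
        .iter()
        .all(|c| c.eq_ignore_ascii_case("true") || c.eq_ignore_ascii_case("false"));
    if all_bool {
        return InferredType::Boolean;
    }
    // Next: every cell must parse as f64. Note that Rust's f64 parser
    // also accepts strings such as "NaN" and "inf".
    if non_empty.iter().all(|c| c.parse::<f64>().is_ok()) {
        return InferredType::Numeric;
    }
    InferredType::Text
}
```

Checking booleans before numbers keeps the rules unambiguous, and requiring every non-empty cell to match means a single stray value demotes the whole column to text.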
§Quick Start
use u_insight::csv_parser::CsvParser;
use u_insight::dataframe::DataType;
let csv = "name,value,active\nAlice,1.5,true\nBob,2.3,false\nCharlie,3.1,true\n";
let df = CsvParser::new().parse_str(csv).unwrap();
assert_eq!(df.row_count(), 3);
assert_eq!(df.column_count(), 3);
// Type inference: name=Text, value=Numeric, active=Boolean
let schema = df.schema();
assert_eq!(schema[1].1, DataType::Numeric);
assert_eq!(schema[2].1, DataType::Boolean);
Modules§
- analysis
- Analysis module for clean, preprocessed data.
- clustering
- Clustering algorithms: K-Means, DBSCAN, Hierarchical Agglomerative, and HDBSCAN.
- csv_parser
- CSV parser with automatic type inference.
- dataframe
- Column-major DataFrame for tabular data.
- distribution
- Distribution analysis module.
- error
- Error types for u-insight.
- feature_importance
- Feature importance analysis and selection.
- ffi
- C FFI bindings for u-insight.
- isolation_forest
- Isolation Forest for multivariate anomaly detection.
- json_parser
- JSON parser with automatic type inference.
- lof
- Local Outlier Factor (LOF) for density-based anomaly detection.
- mahalanobis
- Mahalanobis distance for multivariate outlier detection.
- pca
- Principal Component Analysis (PCA).
- profiling
- Column-level and dataset-level data profiling.