Crate u_insight

§u-insight

Statistical analysis and data profiling engine with C FFI bindings.

u-insight turns raw tabular data into data-quality reports, descriptive statistics, and detected patterns. It operates in two distinct layers:

  • Profiling — tolerates dirty data, reports data quality and statistics
  • Analysis — requires clean data, discovers patterns and relationships
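The difference between the two layers can be illustrated with a small standalone sketch. It does not use the u-insight API: `ColumnSummary`, `profile_column`, and `strict_mean` are hypothetical names invented for this example, and the crate's own profiling and analysis types may look quite different. The tolerant pass counts invalid cells and keeps going, while the strict pass rejects the column at the first bad cell.

```rust
/// Hypothetical summary type for this sketch (not part of u-insight).
#[derive(Debug, PartialEq)]
pub struct ColumnSummary {
    pub total: usize,
    pub missing: usize,
    pub mean: Option<f64>,
}

/// Profiling-style pass: cells that are empty, unparseable, or
/// non-finite are counted as missing instead of causing a failure.
pub fn profile_column(cells: &[&str]) -> ColumnSummary {
    let valid: Vec<f64> = cells
        .iter()
        .filter_map(|c| c.trim().parse::<f64>().ok())
        .filter(|v| v.is_finite())
        .collect();
    let mean = if valid.is_empty() {
        None
    } else {
        Some(valid.iter().sum::<f64>() / valid.len() as f64)
    };
    ColumnSummary {
        total: cells.len(),
        missing: cells.len() - valid.len(),
        mean,
    }
}

/// Analysis-style pass: the first invalid cell aborts the computation.
pub fn strict_mean(cells: &[&str]) -> Result<f64, String> {
    if cells.is_empty() {
        return Err("empty column".to_string());
    }
    let mut sum = 0.0;
    for (i, c) in cells.iter().enumerate() {
        let v = c
            .trim()
            .parse::<f64>()
            .map_err(|_| format!("row {i}: not a number: {c:?}"))?;
        if !v.is_finite() {
            return Err(format!("row {i}: non-finite value"));
        }
        sum += v;
    }
    Ok(sum / cells.len() as f64)
}
```

For the column `["1.0", "", "3.0", "n/a"]`, `profile_column` reports two missing cells and a mean over the two valid ones, whereas `strict_mean` returns an error at the empty cell.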

§Modules

  • dataframe — Column-major tabular data model (DataFrame, Column, DataType)
  • csv_parser — CSV parsing with automatic type inference
  • json_parser — JSON parsing with automatic type inference
  • profiling — Column-level and dataset-level data profiling
  • analysis — Correlation (Pearson/Spearman), regression (simple/multiple OLS), Cramér’s V
  • clustering — K-Means++ (auto-K, Gap Statistic), Mini-Batch K-Means, DBSCAN, Hierarchical (4 linkages), HDBSCAN
  • distribution — ECDF, histogram, QQ-plot, normality tests (KS, JB, Shapiro-Wilk, Anderson-Darling), Grubbs, distribution fitting
  • pca — Principal Component Analysis (dimensionality reduction)
  • isolation_forest — Isolation Forest anomaly detection (Liu et al. 2008)
  • lof — Local Outlier Factor (LOF) density-based anomaly detection
  • mahalanobis — Mahalanobis distance multivariate outlier detection
  • feature_importance — Composite importance, ANOVA F-test, Mutual Information, Permutation Importance
  • ffi — C FFI bindings (32 functions, 20 structs, auto-generated C header via cbindgen)
  • error — Error types
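As a reminder of what the Pearson correlation listed under the analysis module computes, here is a minimal standalone sketch in plain Rust. The function `pearson` is a hypothetical helper written for this example; it is not the crate's API, and the crate's implementation may handle edge cases differently.

```rust
/// Pearson correlation coefficient of two samples (illustrative helper,
/// not part of u-insight).
///
/// Returns `None` when the slices differ in length, contain fewer than
/// two points, or when either sample has zero variance.
pub fn pearson(x: &[f64], y: &[f64]) -> Option<f64> {
    if x.len() != y.len() || x.len() < 2 {
        return None;
    }
    let n = x.len() as f64;
    let mean_x = x.iter().sum::<f64>() / n;
    let mean_y = y.iter().sum::<f64>() / n;

    // Accumulate the covariance and the two variance terms
    // (unnormalized; the 1/n factors cancel in the ratio).
    let (mut sxy, mut sxx, mut syy) = (0.0, 0.0, 0.0);
    for (a, b) in x.iter().zip(y) {
        let dx = a - mean_x;
        let dy = b - mean_y;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if sxx == 0.0 || syy == 0.0 {
        return None;
    }
    Some(sxy / (sxx * syy).sqrt())
}
```

Like the analysis layer described above, this sketch assumes clean input: it does not filter NaN or missing values, so any NaN in the data propagates into the result.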

§Quick Start

use u_insight::csv_parser::CsvParser;
use u_insight::dataframe::DataType;

let csv = "name,value,active\nAlice,1.5,true\nBob,2.3,false\nCharlie,3.1,true\n";
let df = CsvParser::new().parse_str(csv).unwrap();

assert_eq!(df.row_count(), 3);
assert_eq!(df.column_count(), 3);

// Type inference: name=Text, value=Numeric, active=Boolean
let schema = df.schema();
assert_eq!(schema[1].1, DataType::Numeric);
assert_eq!(schema[2].1, DataType::Boolean);
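The Quick Start relies on automatic type inference. One plausible way such inference could work is sketched below in standalone Rust. This is an assumption for illustration only: `InferredType` and `infer_type` are hypothetical names, and the rules u-insight actually applies (for example for dates, integers, or null markers) are not described on this page.

```rust
/// Hypothetical column type for this sketch (not u-insight's `DataType`).
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum InferredType {
    Boolean,
    Numeric,
    Text,
}

/// Infers a column type from its cells, ignoring empty cells.
/// Tries the narrowest type first and falls back to `Text`.
pub fn infer_type(cells: &[&str]) -> InferredType {
    let non_empty: Vec<&str> = cells
        .iter()
        .map(|c| c.trim())
        .filter(|c| !c.is_empty())
        .collect();

    // A column with no usable cells carries no type evidence.
    if non_empty.is_empty() {
        return InferredType::Text;
    }

    let is_bool =
        |c: &&str| c.eq_ignore_ascii_case("true") || c.eq_ignore_ascii_case("false");

    if non_empty.iter().all(is_bool) {
        InferredType::Boolean
    } else if non_empty.iter().all(|c| c.parse::<f64>().is_ok()) {
        // Note: Rust's f64 parser also accepts "inf" and "NaN".
        InferredType::Numeric
    } else {
        InferredType::Text
    }
}
```

Applied to the three columns of the Quick Start CSV, this sketch yields `Text` for the names, `Numeric` for the values, and `Boolean` for the active flags, matching the types noted in the comment above.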

Modules§

analysis
Analysis module for clean, preprocessed data.
clustering
Clustering algorithms: K-Means, DBSCAN, Hierarchical Agglomerative, and HDBSCAN.
csv_parser
CSV parser with automatic type inference.
dataframe
Column-major DataFrame for tabular data.
distribution
Distribution analysis module.
error
Error types for u-insight.
feature_importance
Feature importance analysis and selection.
ffi
C FFI bindings for u-insight.
isolation_forest
Isolation Forest for multivariate anomaly detection.
json_parser
JSON parser with automatic type inference.
lof
Local Outlier Factor (LOF) for density-based anomaly detection.
mahalanobis
Mahalanobis distance for multivariate outlier detection.
pca
Principal Component Analysis (PCA).
profiling
Column-level and dataset-level data profiling.