§Lake Pulse
A Rust library for analyzing data lake table health across multiple formats and storage providers.
Lake Pulse provides health analysis for data lake tables in the Delta Lake, Apache Iceberg, Apache Hudi, and Lance formats. It supports the major cloud storage providers (AWS S3, Azure Data Lake Storage, Google Cloud Storage) as well as local filesystems.
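The Quick Start notes that `Analyzer::analyze` auto-detects which of these formats a table uses. The sketch below shows one plausible way such detection can work: look for each format's well-known metadata directory in the table root (`_delta_log/` for Delta Lake, `.hoodie/` for Hudi, `_versions/` for Lance, `metadata/` for Iceberg). It is not Lake Pulse's implementation; `TableFormat` and `detect_format` are hypothetical names, and the sketch only handles local paths.

```rust
use std::path::Path;

/// Hypothetical enum for the four table formats named in the crate description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TableFormat {
    Delta,
    Iceberg,
    Hudi,
    Lance,
}

/// Hypothetical helper (not lake_pulse's detection logic): guess a table's
/// format from the metadata directories present in its root.
fn detect_format(table_root: &Path) -> Option<TableFormat> {
    // Checked in order; the first marker directory found wins.
    // `metadata` is the most generic name, so Iceberg is checked last.
    let markers = [
        ("_delta_log", TableFormat::Delta), // Delta Lake transaction log
        (".hoodie", TableFormat::Hudi),     // Hudi timeline and table metadata
        ("_versions", TableFormat::Lance),  // Lance version manifests
        ("metadata", TableFormat::Iceberg), // Iceberg metadata JSON and manifests
    ];
    markers
        .iter()
        .find(|(dir, _)| table_root.join(dir).is_dir())
        .map(|&(_, format)| format)
}
```

Checking directory markers is cheap because it needs only a few existence checks, not a full listing of data files.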
§Features
- Multi-format support: Delta Lake, Apache Iceberg, Apache Hudi, Lance
- Cloud storage: AWS S3, Azure Data Lake Storage, Google Cloud Storage, Local filesystem
- Health metrics: File size distribution, partition analysis, data skew detection
- Advanced analysis: Schema evolution, time travel metrics, deletion vectors, compaction opportunities
- Performance tracking: Built-in timing metrics with Gantt chart visualization
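To make the file-size and skew metrics concrete, the sketch below computes two simple statistics of the kind listed above: a bucketed file-size histogram with a small-file ratio (small files are the usual compaction candidates), and a partition skew score based on the coefficient of variation of per-partition byte totals. The bucket boundaries, the small-file threshold, the `SizeSummary` type, and the use of the coefficient of variation are all assumptions for illustration; Lake Pulse's own metrics and thresholds may differ.

```rust
const MIB: u64 = 1024 * 1024;

/// Hypothetical summary type; not lake_pulse's `HealthMetrics`.
#[derive(Debug, PartialEq)]
struct SizeSummary {
    /// File counts per bucket: [< 16 MiB, 16-128 MiB, 128-512 MiB, >= 512 MiB].
    buckets: [usize; 4],
    /// Fraction of files strictly smaller than the small-file threshold.
    small_file_ratio: f64,
}

/// Bucket file sizes (in bytes) and compute the small-file ratio.
fn summarize_sizes(sizes: &[u64], small_threshold: u64) -> SizeSummary {
    let mut buckets = [0usize; 4];
    for &size in sizes {
        let idx = if size < 16 * MIB {
            0
        } else if size < 128 * MIB {
            1
        } else if size < 512 * MIB {
            2
        } else {
            3
        };
        buckets[idx] += 1;
    }
    let small = sizes.iter().filter(|&&s| s < small_threshold).count();
    let small_file_ratio = if sizes.is_empty() {
        0.0
    } else {
        small as f64 / sizes.len() as f64
    };
    SizeSummary { buckets, small_file_ratio }
}

/// Coefficient of variation (population standard deviation / mean) of
/// per-partition byte totals: 0.0 means perfectly even, larger means more skew.
fn partition_skew(partition_bytes: &[u64]) -> f64 {
    if partition_bytes.is_empty() {
        return 0.0;
    }
    let n = partition_bytes.len() as f64;
    let mean = partition_bytes.iter().map(|&b| b as f64).sum::<f64>() / n;
    if mean == 0.0 {
        return 0.0;
    }
    let variance = partition_bytes
        .iter()
        .map(|&b| {
            let d = b as f64 - mean;
            d * d
        })
        .sum::<f64>()
        / n;
    variance.sqrt() / mean
}
```

A skew score of 0.0 means every partition holds the same number of bytes; for example, two partitions of 0 and 200 bytes give a score of 1.0.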
§Quick Start
§Local Filesystem Example
```rust
use lake_pulse::{Analyzer, StorageConfig};

// Assumes a Tokio runtime (`tokio` with the `macros` and `rt-multi-thread` features).
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Configure storage for the local filesystem
    let config = StorageConfig::local()
        .with_option("path", "./examples/data");

    // Create the analyzer
    let analyzer = Analyzer::builder(config)
        .build()
        .await?;

    // Analyze a table (auto-detects the format: Delta, Iceberg, Hudi, or Lance)
    let report = analyzer.analyze("delta_dataset").await?;

    // Print the health report
    println!("{}", report);
    Ok(())
}
```
§AWS S3 Example
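```rust
use lake_pulse::{Analyzer, StorageConfig};

// Assumes a Tokio runtime (`tokio` with the `macros` and `rt-multi-thread` features).
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder credentials: do not commit real keys to source control
    let config = StorageConfig::aws()
        .with_option("bucket", "my-bucket")
        .with_option("region", "us-east-1")
        .with_option("access_key_id", "ACCESS_KEY")
        .with_option("secret_access_key", "SECRET_KEY");

    let analyzer = Analyzer::builder(config).build().await?;
    let report = analyzer.analyze("my/table/path").await?;
    println!("{}", report);
    Ok(())
}
```
§Azure Data Lake Example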
```rust
use lake_pulse::{Analyzer, StorageConfig};

// Assumes a Tokio runtime (`tokio` with the `macros` and `rt-multi-thread` features).
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Service-principal credentials (placeholders)
    let config = StorageConfig::azure()
        .with_option("container", "my-container")
        .with_option("account_name", "my-account")
        .with_option("tenant_id", "TENANT_ID")
        .with_option("client_id", "CLIENT_ID")
        .with_option("client_secret", "CLIENT_SECRET");

    let analyzer = Analyzer::builder(config).build().await?;
    let report = analyzer.analyze("my/table/path").await?;
    println!("{}", report);
    Ok(())
}
```

For more examples, see the `examples/` directory.
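The cloud examples pass credentials as string literals. In practice these usually come from the environment instead. The sketch below collects the option keys used in the AWS S3 example from the conventional AWS environment variables (`AWS_REGION`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`). `aws_options` and `aws_options_from_env` are hypothetical helpers, not part of Lake Pulse; whether the crate can also pick up credentials on its own is not covered here.

```rust
use std::env;

/// Option keys from the AWS S3 example, paired with the conventional
/// AWS environment variable that supplies each one.
const AWS_ENV_OPTIONS: [(&str, &str); 3] = [
    ("region", "AWS_REGION"),
    ("access_key_id", "AWS_ACCESS_KEY_ID"),
    ("secret_access_key", "AWS_SECRET_ACCESS_KEY"),
];

/// Hypothetical helper: returns (option key, value) pairs, or the name of
/// the first variable that is missing. `lookup` is injectable so the logic
/// can be tested without modifying the real process environment.
fn aws_options<F>(lookup: F) -> Result<Vec<(&'static str, String)>, &'static str>
where
    F: Fn(&str) -> Option<String>,
{
    AWS_ENV_OPTIONS
        .iter()
        .map(|&(option, var)| lookup(var).map(|value| (option, value)).ok_or(var))
        .collect()
}

/// Convenience wrapper that reads the actual process environment.
fn aws_options_from_env() -> Result<Vec<(&'static str, String)>, &'static str> {
    aws_options(|name| env::var(name).ok())
}
```

Each returned pair could then be applied with `StorageConfig::aws().with_option(key, &value)`, assuming `with_option` accepts string slices as in the examples above. Failing fast on the first missing variable gives a clearer error than a later authentication failure.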
§Re-exports
```rust
pub use analyze::metrics::HealthMetrics;
pub use analyze::metrics::HealthReport;
pub use analyze::Analyzer;
pub use storage::StorageConfig;
```