Crate lake_pulse


§Lake Pulse

A Rust library for analyzing data lake table health across multiple formats and storage providers.

Lake Pulse analyzes the health of data lake tables stored as Delta Lake, Apache Iceberg, Apache Hudi, or Lance. It works against cloud object storage (AWS S3, Azure Data Lake Storage, Google Cloud Storage) and local filesystems.

§Features

  • Multi-format support: Delta Lake, Apache Iceberg, Apache Hudi, Lance
  • Storage backends: AWS S3, Azure Data Lake Storage, Google Cloud Storage, local filesystem
  • Health metrics: File size distribution, partition analysis, data skew detection
  • Advanced analysis: Schema evolution, time travel metrics, deletion vectors, compaction opportunities
  • Performance tracking: Built-in timing metrics with Gantt chart visualization
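To make the health metrics above more concrete, here is a minimal sketch of how a file size summary and a simple partition skew ratio could be computed from a list of data files. It is an illustration only and does not reflect how Lake Pulse computes its metrics: the `FileSizeSummary` type, the `summarize` function, the small-file threshold, and the definition of skew (largest partition divided by mean partition size) are all assumptions made for this example.

```rust
use std::collections::BTreeMap;

/// Hypothetical summary type for this sketch; not part of the lake_pulse API.
#[derive(Debug, PartialEq)]
struct FileSizeSummary {
    file_count: usize,
    total_bytes: u64,
    /// Files smaller than the threshold; many of these suggest compaction.
    small_files: usize,
    /// Largest partition size divided by the mean partition size.
    /// A value near 1.0 means the partitions are evenly sized.
    skew_ratio: f64,
}

/// Summarize `(partition, file_size_in_bytes)` pairs.
/// Returns `None` when there are no files to analyze.
fn summarize(files: &[(&str, u64)], small_file_threshold: u64) -> Option<FileSizeSummary> {
    if files.is_empty() {
        return None;
    }

    let mut bytes_per_partition: BTreeMap<&str, u64> = BTreeMap::new();
    let mut total_bytes = 0u64;
    let mut small_files = 0usize;

    for &(partition, size) in files {
        *bytes_per_partition.entry(partition).or_insert(0) += size;
        total_bytes += size;
        if size < small_file_threshold {
            small_files += 1;
        }
    }

    let largest = bytes_per_partition.values().copied().max()?;
    let mean = total_bytes as f64 / bytes_per_partition.len() as f64;
    // Guard against division by zero when every file is empty.
    let skew_ratio = if mean > 0.0 { largest as f64 / mean } else { 1.0 };

    Some(FileSizeSummary {
        file_count: files.len(),
        total_bytes,
        small_files,
        skew_ratio,
    })
}
```

A high `small_files` count relative to `file_count` is the kind of signal that usually points to a compaction opportunity, while a `skew_ratio` well above 1.0 indicates that one partition holds a disproportionate share of the data.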

§Quick Start

§Local Filesystem Example

use lake_pulse::{Analyzer, StorageConfig};

// Configure storage for local filesystem
let config = StorageConfig::local()
    .with_option("path", "./examples/data");

// Create the analyzer (`.await?` requires an async context that returns a `Result`)
let analyzer = Analyzer::builder(config)
    .build()
    .await?;

// Analyze a table (auto-detects format: Delta, Iceberg, Hudi, or Lance)
let report = analyzer.analyze("delta_dataset").await?;

// Print the health report
println!("{}", report);
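The example above relies on format auto-detection. The crate documentation does not describe how detection works, so the following is only a sketch of one plausible approach: checking for the marker directories that each format conventionally places at the table root (`_delta_log` for Delta Lake, `.hoodie` for Hudi, `_versions` for Lance, `metadata` for Iceberg). The `TableFormat` enum, the `MARKERS` table, and the `detect_format` function are hypothetical names for this illustration, not part of Lake Pulse.

```rust
/// Hypothetical enum for this sketch; not part of the lake_pulse API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TableFormat {
    Delta,
    Iceberg,
    Hudi,
    Lance,
}

/// Marker entries conventionally found at the root of each table format.
/// `metadata` is checked last because it is the most generic name, and a
/// table may carry Iceberg-compatible metadata next to another format's log.
const MARKERS: [(&str, TableFormat); 4] = [
    ("_delta_log", TableFormat::Delta),
    (".hoodie", TableFormat::Hudi),
    ("_versions", TableFormat::Lance),
    ("metadata", TableFormat::Iceberg),
];

/// Detect the table format by asking whether a named entry exists at the
/// table root. The lookup is passed in as a closure so the same logic can
/// run against a local directory or an object-store prefix listing.
fn detect_format(has_entry: impl Fn(&str) -> bool) -> Option<TableFormat> {
    for (marker, format) in MARKERS {
        if has_entry(marker) {
            return Some(format);
        }
    }
    None
}
```

For a local table, the closure could be `|name| std::path::Path::new("./examples/data/delta_dataset").join(name).is_dir()`. On object storage there are no real directories, so the closure would instead check whether any object key starts with the marker prefix.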

§AWS S3 Example

use lake_pulse::{Analyzer, StorageConfig};

let config = StorageConfig::aws()
    .with_option("bucket", "my-bucket")
    .with_option("region", "us-east-1")
    .with_option("access_key_id", "ACCESS_KEY")
    .with_option("secret_access_key", "SECRET_KEY");

let analyzer = Analyzer::builder(config).build().await?;
let report = analyzer.analyze("my/table/path").await?;
println!("{}", report);
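The `ACCESS_KEY` and `SECRET_KEY` strings above are placeholders. In practice, credentials are usually read from the environment rather than written into source code. The sketch below shows one way to collect the option values from the standard AWS environment variables (`AWS_REGION`, `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`). The `aws_options` helper is hypothetical and not part of Lake Pulse; only the option keys are taken from the example above.

```rust
/// Hypothetical helper for this sketch; not part of the lake_pulse API.
///
/// Maps the option keys used in the S3 example to the standard AWS
/// environment variable names and resolves each one through `lookup`.
/// Returns an error naming the first variable that is missing.
fn aws_options(
    lookup: impl Fn(&str) -> Option<String>,
) -> Result<Vec<(&'static str, String)>, String> {
    const MAPPING: [(&str, &str); 3] = [
        ("region", "AWS_REGION"),
        ("access_key_id", "AWS_ACCESS_KEY_ID"),
        ("secret_access_key", "AWS_SECRET_ACCESS_KEY"),
    ];

    let mut options = Vec::with_capacity(MAPPING.len());
    for (key, var) in MAPPING {
        match lookup(var) {
            Some(value) => options.push((key, value)),
            None => return Err(format!("missing environment variable {var}")),
        }
    }
    // Each pair could then be passed to `with_option` as in the example above.
    Ok(options)
}
```

Calling `aws_options(|name| std::env::var(name).ok())` reads from the real process environment. Taking the lookup as a closure keeps the helper easy to test without modifying environment variables.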

§Azure Data Lake Example

use lake_pulse::{Analyzer, StorageConfig};

let config = StorageConfig::azure()
    .with_option("container", "my-container")
    .with_option("account_name", "my-account")
    .with_option("tenant_id", "TENANT_ID")
    .with_option("client_id", "CLIENT_ID")
    .with_option("client_secret", "CLIENT_SECRET");

let analyzer = Analyzer::builder(config).build().await?;
let report = analyzer.analyze("my/table/path").await?;
println!("{}", report);

For more examples, see the examples/ directory.

§Modules

  • analyze - Core analysis functionality and table analyzers
  • storage - Cloud storage abstraction layer
  • reader - Table format readers (Delta, Iceberg, Hudi, Lance)
  • util - Utility functions and helpers

Re-exports§

pub use analyze::metrics::HealthMetrics;
pub use analyze::metrics::HealthReport;
pub use analyze::Analyzer;
pub use storage::StorageConfig;

Modules§

  • analyze - Table analysis functionality
  • reader - Table format readers
  • storage - Cloud storage abstraction layer
  • util - Utility functions and helpers