# Lake Pulse <img src=./rustacean-flat-happy.svg style="width: 40px" />
[![CI][ci-badge]][ci-link]
[![codecov][coverage-badge]][coverage-link]
[![Docs][docs-badge]][docs-link]
[![License: MIT or Apache-2.0][license-badge]][license-link]
[![Latest Version][crates-badge]][crates-link]
[![Downloads][downloads-badge]][crates-link]

[ci-badge]: https://github.com/adobe/lake-pulse/actions/workflows/ci.yml/badge.svg
[ci-link]: https://github.com/adobe/lake-pulse/actions/workflows/ci.yml
[crates-badge]: https://img.shields.io/crates/v/lake-pulse.svg
[crates-link]: https://crates.io/crates/lake-pulse
[downloads-badge]: https://img.shields.io/crates/d/lake-pulse.svg
[coverage-badge]: https://codecov.io/gh/adobe/lake-pulse/graph/badge.svg?token=3mH5uUJ6se
[coverage-link]: https://codecov.io/gh/adobe/lake-pulse
[license-badge]: https://img.shields.io/badge/license-MIT_or_Apache--2.0-blue
[license-link]: ./LICENSE-APACHE
[docs-badge]: https://docs.rs/lake-pulse/badge.svg
[docs-link]: https://docs.rs/lake-pulse

A Rust library for analyzing data lake table health — *checking the pulse* — across multiple formats (Delta Lake, Apache Iceberg, Apache Hudi, Lance) and storage providers (AWS S3, Azure Data Lake, GCS, HDFS, Local).
## Supported Formats
[![Delta Lake][delta-badge]][delta-link]
[![Apache Iceberg][iceberg-badge]][iceberg-link]
[![Apache Hudi][hudi-badge]][hudi-link]
[![Lance][lance-badge]][lance-link]

[delta-badge]: https://img.shields.io/badge/Delta_Lake-00acd3?style=for-the-badge&logo=
[delta-link]: https://github.com/delta-io/delta
[iceberg-badge]: https://img.shields.io/badge/Apache_Iceberg-90d4f0?style=for-the-badge&logo=
[iceberg-link]: https://github.com/apache/iceberg
[hudi-badge]: https://img.shields.io/badge/Apache_Hudi-eeeeee?style=for-the-badge&logo=
[hudi-link]: https://github.com/apache/hudi
[lance-badge]: https://img.shields.io/badge/Lance-ff734a?style=for-the-badge&logo=
[lance-link]: https://github.com/lancedb/lance
<!--
The format badges are created with shields.io from a base64-encoded version of each SVG logo.
To recover a source SVG, take the part of the badge URL after `base64,` and decode it with a
base64 decoder; the result is the SVG markup of the logo.
-->
## Overview
Lake Pulse provides comprehensive health metrics for your data lake tables, including:
- File organization and compaction opportunities
- Metadata analysis and schema evolution
- Partition statistics
- Time travel/snapshot metrics
- Storage efficiency insights
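
To make the first bullet concrete, here is a minimal sketch of one common way to spot compaction candidates: count the data files that fall below a target size. It is a hypothetical illustration, not Lake Pulse's implementation. The `small_file_stats` helper, the `SmallFileStats` struct, and the 128 MiB target are assumptions chosen for the example.

```rust
/// Summary of how many data files fall below a target size.
/// Hypothetical type for illustration; not part of the lake-pulse API.
#[derive(Debug, PartialEq)]
struct SmallFileStats {
    total_files: usize,
    small_files: usize,
    /// Fraction of files smaller than the threshold (0.0 when there are no files).
    small_file_ratio: f64,
}

/// Counts files whose size in bytes is strictly below `threshold_bytes`.
fn small_file_stats(file_sizes: &[u64], threshold_bytes: u64) -> SmallFileStats {
    let total_files = file_sizes.len();
    let small_files = file_sizes
        .iter()
        .filter(|&&size| size < threshold_bytes)
        .count();
    let small_file_ratio = if total_files == 0 {
        0.0
    } else {
        small_files as f64 / total_files as f64
    };
    SmallFileStats {
        total_files,
        small_files,
        small_file_ratio,
    }
}

fn main() {
    // Assumed compaction target of 128 MiB; tune it for your engine and workload.
    const TARGET: u64 = 128 * 1024 * 1024;
    let sizes: [u64; 4] = [
        4 * 1024 * 1024,
        200 * 1024 * 1024,
        1024 * 1024,
        130 * 1024 * 1024,
    ];
    let stats = small_file_stats(&sizes, TARGET);
    println!(
        "{} of {} files are below the target ({:.0}%)",
        stats.small_files,
        stats.total_files,
        stats.small_file_ratio * 100.0
    );
}
```

A high ratio of small files is a typical signal that a table would benefit from compaction, since every extra file adds listing and metadata overhead at read time.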
## Quick Start
### Basic Example - Analyzing a Delta Lake table on AWS S3
```rust
use lake_pulse::{Analyzer, StorageConfig};

#[tokio::main]
async fn main() {
    // Credentials are hardcoded here for brevity; prefer environment variables
    // or a secrets manager in real code.
    let storage_config = StorageConfig::aws()
        .with_option("bucket", "my-bucket-1234")
        .with_option("region", "us-east-1")
        .with_option("access_key_id", "the_access_key_id")
        .with_option("secret_access_key", "the_secret_access_key")
        // Optional: only needed for temporary credentials.
        .with_option("session_token", "session_token_if_needed");

    let analyzer = Analyzer::builder(storage_config).build().await.unwrap();

    // Analyze the table at `my/table/path` and generate a report.
    let report = analyzer.analyze("my/table/path").await.unwrap();

    // Print the report in a human-readable format.
    println!("{}", report);
}
```
## Supported Table Formats
- **Delta Lake** - Full support for transaction logs, deletion vectors, and Delta-specific metrics
- **Apache Iceberg** - Metadata analysis, snapshot management, and Iceberg-specific features
- **Apache Hudi** - Basic support for Hudi table structure analysis and metrics *(requires `hudi` feature)*
- **Lance** - Modern columnar format with vector search capabilities *(requires `lance` feature)*
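
Each of these formats leaves a recognizable marker directory in the table root, which is handy when you are not sure what a local path contains. The std-only sketch below guesses the format from those markers. It is hypothetical and is not how Lake Pulse itself detects formats; the `TableFormat` enum and `detect_format` function are illustrative names, and the markers (`_delta_log`, `.hoodie`, `_versions`, `metadata`) reflect common layouts that may differ for tables written by other tools or catalogs.

```rust
use std::path::Path;

/// Table formats listed above. Hypothetical enum; not part of the lake-pulse API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TableFormat {
    Delta,
    Iceberg,
    Hudi,
    Lance,
}

/// Guesses a table's format from marker directories in its root.
/// Returns `None` if no known marker directory exists.
fn detect_format(table_root: &Path) -> Option<TableFormat> {
    // More specific markers come first; `metadata` is a generic directory
    // name, so Iceberg is checked last.
    const MARKERS: [(&str, TableFormat); 4] = [
        ("_delta_log", TableFormat::Delta),
        (".hoodie", TableFormat::Hudi),
        ("_versions", TableFormat::Lance),
        ("metadata", TableFormat::Iceberg),
    ];
    MARKERS
        .iter()
        .find(|(dir, _)| table_root.join(dir).is_dir())
        .map(|&(_, format)| format)
}

fn main() {
    let path = Path::new("/path/to/table");
    match detect_format(path) {
        Some(format) => println!("{} looks like a {:?} table", path.display(), format),
        None => println!("{}: no known table format marker found", path.display()),
    }
}
```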
## Feature Flags
By default, Lake Pulse includes support for **Delta Lake** and **Apache Iceberg**. Additional table formats can be enabled via feature flags:
| Feature | Description |
|---------|-------------|
| `hudi` | Enables Apache Hudi support |
| `lance` | Enables Lance support |
| `all` | Enables all table formats (`hudi` + `lance`) |
### Usage
```toml
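# In your Cargo.toml, under [dependencies], keep exactly ONE `lake-pulse` line
# (TOML does not allow the same key twice in a table).
[dependencies]
# Default (Delta + Iceberg only)
lake-pulse = "0.2"
# With Hudi support
# lake-pulse = { version = "0.2", features = ["hudi"] }
# With Lance support
# lake-pulse = { version = "0.2", features = ["lance"] }
# With all table formats (Delta + Iceberg + Hudi + Lance)
# lake-pulse = { version = "0.2", features = ["all"] }
# The Quick Start example also needs an async runtime for `#[tokio::main]`:
tokio = { version = "1", features = ["macros", "rt-multi-thread"] }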
```
## Storage Configuration
Lake Pulse uses the [`object_store`](https://docs.rs/object_store/) crate for cloud storage access. Configuration options are passed through to the underlying storage provider.
### AWS S3 Configuration Options
Common options for S3 (see [object_store AWS documentation](https://docs.rs/object_store/latest/object_store/aws/enum.AmazonS3ConfigKey.html)):
- `bucket` - S3 bucket name
- `region` - AWS region (e.g., "us-east-1")
- `access_key_id` - AWS access key ID
- `secret_access_key` - AWS secret access key
- `session_token` - Optional session token for temporary credentials
- `endpoint` - Optional custom endpoint URL
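
The Quick Start hardcodes credentials for brevity; in real deployments they usually come from the environment. The std-only sketch below maps the standard AWS environment variables onto the S3 option keys listed above and skips any that are unset or empty, so optional keys such as `session_token` and `endpoint` are only set when present. The `s3_options` helper and the variable-to-key mapping are hypothetical, not part of Lake Pulse. Each resulting pair would then be passed to `.with_option(key, value)`; whether `with_option` accepts an owned `String` or only `&str` is an assumption to check against the API docs.

```rust
use std::env;

/// Standard AWS environment variables mapped to the S3 option keys above.
/// Hypothetical mapping; not part of the lake-pulse API.
const S3_ENV_KEYS: [(&str, &str); 5] = [
    ("AWS_REGION", "region"),
    ("AWS_ACCESS_KEY_ID", "access_key_id"),
    ("AWS_SECRET_ACCESS_KEY", "secret_access_key"),
    ("AWS_SESSION_TOKEN", "session_token"),
    ("AWS_ENDPOINT_URL", "endpoint"),
];

/// Collects (option key, value) pairs, skipping variables that are unset or empty.
/// `lookup` is injected so the function can be tested without touching the real environment.
fn s3_options<F>(lookup: F) -> Vec<(&'static str, String)>
where
    F: Fn(&str) -> Option<String>,
{
    S3_ENV_KEYS
        .iter()
        .filter_map(|&(var, key)| {
            lookup(var)
                .filter(|value| !value.is_empty())
                .map(|value| (key, value))
        })
        .collect()
}

fn main() {
    // Read from the real process environment.
    let options = s3_options(|var| env::var(var).ok());
    for (key, _value) in &options {
        // Print only the keys so secrets never reach the logs.
        println!("would set option `{}`", key);
    }
}
```

The bucket name is not part of the standard AWS variables, so it would still be set explicitly with `.with_option("bucket", ...)`.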
### Azure Configuration Options
Common options for Azure (see [object_store Azure documentation](https://docs.rs/object_store/latest/object_store/azure/enum.AzureConfigKey.html)):
- `container` - Azure container name
- `account_name` - Storage account name
- `tenant_id` - Azure tenant ID
- `client_id` - Service principal client ID
- `client_secret` - Service principal client secret
### GCP Configuration Options
Common options for GCP (see [object_store GCP documentation](https://docs.rs/object_store/latest/object_store/gcp/enum.GoogleConfigKey.html)):
- `bucket` - GCS bucket name
- `service_account_key` - Serialized service account JSON key (to point at a key file on disk instead, use `service_account_path`)
### HDFS Configuration Options
HDFS support is provided via the [`hdfs-native-object-store`](https://docs.rs/hdfs-native-object-store/) crate:
- `url` - HDFS namenode URL (e.g., "hdfs://namenode:8020")
```rust
let storage_config = StorageConfig::hdfs()
.with_option("url", "hdfs://namenode:8020");
let analyzer = Analyzer::builder(storage_config).build().await.unwrap();
let report = analyzer.analyze("/path/to/table").await.unwrap();
```
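The HDFS example passes the namenode address through the `url` option and the table location as a separate path. If you start from a single fully qualified URI such as `hdfs://namenode:8020/path/to/table`, a small std-only helper like the hypothetical `split_hdfs_uri` below can separate the two. It is a sketch that assumes the plain `hdfs://host:port/path` form; it does not handle other schemes, user info, or HA nameservice configuration.

```rust
/// Splits `hdfs://host:port/path/to/table` into ("hdfs://host:port", "/path/to/table").
/// Hypothetical helper; only the plain `hdfs://` scheme is handled.
fn split_hdfs_uri(uri: &str) -> Option<(String, String)> {
    let rest = uri.strip_prefix("hdfs://")?;
    // The authority (host[:port]) ends at the first '/'; everything after it is the path.
    let (authority, path) = match rest.find('/') {
        Some(idx) => (&rest[..idx], &rest[idx..]),
        None => (rest, "/"),
    };
    if authority.is_empty() {
        return None;
    }
    Some((format!("hdfs://{}", authority), path.to_string()))
}

fn main() {
    let uri = "hdfs://namenode:8020/path/to/table";
    if let Some((url, table_path)) = split_hdfs_uri(uri) {
        // `url` would go to `.with_option("url", ...)`, `table_path` to `analyze(...)`.
        println!("url = {}, table path = {}", url, table_path);
    }
}
```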
### Local Filesystem
```rust
let storage_config = StorageConfig::local();
let analyzer = Analyzer::builder(storage_config).build().await.unwrap();
let report = analyzer.analyze("/path/to/table").await.unwrap();
```
## Examples
See the [`examples/`](examples/) directory for more detailed usage examples:
- `s3_store.rs` - AWS S3 example
- `adl_store.rs` - Azure Data Lake example
- `hdfs_store.rs` - HDFS example
- `local_store.rs` - Local filesystem example
- `local_store_iceberg.rs` - Iceberg table example
- `local_store_hudi.rs` - Hudi table example *(requires `hudi` feature)*
- `local_store_lance.rs` - Lance table example *(requires `lance` feature)*

Run examples with:
```bash
cargo run --example s3_store
# For examples requiring feature flags:
cargo run --features hudi --example local_store_hudi
cargo run --features lance --example local_store_lance
```
## Documentation
For detailed information on configuration options, refer to the documentation of the underlying storage crates:
- [AWS S3 Configuration](https://docs.rs/object_store/latest/object_store/aws/enum.AmazonS3ConfigKey.html)
- [Azure Configuration](https://docs.rs/object_store/latest/object_store/azure/enum.AzureConfigKey.html)
- [GCP Configuration](https://docs.rs/object_store/latest/object_store/gcp/enum.GoogleConfigKey.html)
- [HDFS Configuration](https://docs.rs/hdfs-native-object-store/latest/hdfs_native_object_store/)
## Supported Storages
See [LAKE_PULSE_SUPPORTED_STORAGES.md](docs/LAKE_PULSE_SUPPORTED_STORAGES.md) for a
comparison of storage providers supported by Lake Pulse.
## Minimum Supported Rust Version (MSRV)
This crate requires Rust **1.88** or later.
## Contributing
Contributions are welcome! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.
## License
Lake Pulse is dual-licensed under MIT or Apache-2.0. See the LICENSE files for details.