sqlite-vtable-opendal 0.1.0

SQLite virtual tables for querying cloud object store metadata using OpenDAL
Documentation
# sqlite-vtable-opendal

> **Federated SQLite Virtual Tables for Cloud Object Stores using OpenDAL**

Query cloud storage metadata using SQL โ€” without ingestion.

[![Crates.io](https://img.shields.io/crates/v/sqlite-vtable-opendal.svg)](https://crates.io/crates/sqlite-vtable-opendal)
[![Documentation](https://docs.rs/sqlite-vtable-opendal/badge.svg)](https://docs.rs/sqlite-vtable-opendal)
[![License](https://img.shields.io/badge/license-MIT%2FApache--2.0-blue.svg)](LICENSE)

---

## ๐ŸŽฏ Problem Statement

Modern data workflows require querying metadata from cloud object stores (Dropbox, S3, etc.). Today this requires:

- โœ— Custom scripts
- โœ— Full data ingestion
- โœ— Non-composable APIs

**There is no lightweight way to query remote object metadata using SQL without ingestion.**

---

## ๐Ÿ’ก Solution

`sqlite-vtable-opendal` provides SQLite virtual tables that expose cloud storage as queryable tables:

```sql
SELECT path, size
FROM local_files
WHERE size > 10000000
ORDER BY size DESC;
```

**No data ingestion. No materialization. Just pure SQL.**

---

## โœจ Features

- ๐Ÿš€ **Zero Data Ingestion** - Query directly from storage without downloading
- ๐Ÿ“Š **Standard SQL** - Use familiar SQL syntax for cloud storage queries
- โšก **Metadata-Only Queries** - Fetch only what you need (size, dates, etags)
- ๐Ÿ”Œ **Multiple Backends** - Local FS, Dropbox, S3, Google Drive (via OpenDAL)
- ๐ŸŽฏ **Composable** - Combine with SQLite's powerful query engine
- ๐Ÿงช **Well-Tested** - 14 tests covering unit, integration, and doc tests

---

## ๐Ÿ“ฆ Installation

Add to your `Cargo.toml`:

```toml
[dependencies]
sqlite-vtable-opendal = "0.1.0"
rusqlite = { version = "0.32", features = ["bundled-full"] }
```

---

## ๐Ÿš€ Quick Start

### Local Filesystem Example

```rust
use rusqlite::Connection;
use sqlite_vtable_opendal::backends::local_fs;

fn main() -> rusqlite::Result<()> {
    // Open SQLite connection
    let conn = Connection::open_in_memory()?;

    // Register virtual table for /tmp directory
    local_fs::register(&conn, "local_files", "/tmp")?;

    // Query files using SQL!
    let mut stmt = conn.prepare(
        "SELECT name, size FROM local_files
         WHERE size > 1000
         ORDER BY size DESC"
    )?;

    let files = stmt.query_map([], |row| {
        Ok((row.get::<_, String>(0)?, row.get::<_, i64>(1)?))
    })?;

    for file in files {
        let (name, size) = file?;
        println!("{}: {} bytes", name, size);
    }

    Ok(())
}
```

### Available Columns

All virtual tables expose these columns:

| Column | Type | Description |
|--------|------|-------------|
| `path` | TEXT | Full path to file/directory |
| `size` | INTEGER | File size in bytes |
| `last_modified` | TEXT | ISO 8601 timestamp |
| `etag` | TEXT | Content hash (MD5, SHA256, etc.) |
| `is_dir` | INTEGER | 1 if directory, 0 if file |
| `content_type` | TEXT | MIME type or file extension |
| `name` | TEXT | File/directory name (without path) |
| `content` | BLOB | Actual file content (NULL by default) |

---

## ๐Ÿ“š Backend Usage

### Local Filesystem

```rust
use sqlite_vtable_opendal::backends::local_fs;

local_fs::register(&conn, "my_files", "/path/to/directory")?;
```

Query:
```sql
SELECT * FROM my_files WHERE name LIKE '%.txt';
```

### Future Backends

Coming soon:

```rust
// Dropbox (planned)
dropbox::register(&conn, "dropbox_files", access_token, root_path)?;

// S3 (planned)
s3::register(&conn, "s3_files", bucket, region, credentials)?;

// Google Drive (planned)
gdrive::register(&conn, "gdrive_files", credentials)?;
```

---

## ๐Ÿ” SQL Query Examples

### Find Large Files

```sql
SELECT path, size FROM local_files
WHERE size > 100000000
ORDER BY size DESC
LIMIT 10;
```

### Count Files by Extension

```sql
SELECT
    content_type,
    COUNT(*) as count,
    SUM(size) as total_size
FROM local_files
WHERE is_dir = 0
GROUP BY content_type
ORDER BY count DESC;
```

### Find Recently Modified Files

```sql
SELECT path, last_modified
FROM local_files
WHERE last_modified > '2024-01-01'
ORDER BY last_modified DESC;
```

### Calculate Directory Statistics

```sql
SELECT
    COUNT(*) as file_count,
    SUM(size) as total_bytes,
    AVG(size) as avg_size,
    MAX(size) as largest_file
FROM local_files
WHERE is_dir = 0;
```

---

## ๐Ÿ—๏ธ Architecture

### System Flow

```
SQLite Query โ†’ Virtual Table โ†’ OpenDAL โ†’ Storage Backend โ†’ Metadata
```

### Design Principles

1. **Lazy Loading** - Only fetch what's requested
2. **Metadata-First** - Content fetching is opt-in
3. **Async-Ready** - Non-blocking operations via Tokio
4. **Extensible** - Easy to add new storage backends

### Key Components

- **`types`** - Core data structures (`FileMetadata`, `QueryConfig`)
- **`error`** - Comprehensive error handling with `thiserror`
- **`vtab`** - SQLite virtual table infrastructure
- **`backends`** - Storage backend implementations

---

## ๐Ÿงช Testing

Run all tests:

```bash
cargo test
```

Run specific backend tests:

```bash
cargo test local_fs
```

Run with output:

```bash
cargo test -- --nocapture
```

### Test Coverage

- **Unit Tests**: 9 tests covering backend functionality
- **Integration Tests**: 2 tests validating SQLite queries
- **Doc Tests**: 3 tests ensuring examples work

---

## ๐ŸŽฏ Use Cases

### Data Engineering

```sql
-- Discover datasets without downloading
SELECT path FROM s3_files WHERE path LIKE '%/data/2024/%';
```

### Backup Auditing

```sql
-- Find backups older than 30 days
SELECT path, last_modified FROM dropbox_files
WHERE path LIKE '%backup%'
AND last_modified < date('now', '-30 days');
```

### Large File Detection

```sql
-- Identify files consuming most space
SELECT path, size FROM local_files
WHERE size > 1000000000
ORDER BY size DESC;
```

### Compliance Scanning

```sql
-- Find files modified in specific timeframe
SELECT path, last_modified FROM gdrive_files
WHERE last_modified BETWEEN '2024-01-01' AND '2024-12-31';
```

---

## ๐Ÿ› ๏ธ Development

### Project Structure

```
src/
โ”œโ”€โ”€ lib.rs                  # Library entry point
โ”œโ”€โ”€ types.rs                # Core data structures
โ”œโ”€โ”€ error.rs                # Error types
โ”œโ”€โ”€ backends/
โ”‚   โ”œโ”€โ”€ mod.rs              # Backend trait
โ”‚   โ””โ”€โ”€ local_fs.rs         # Local filesystem backend
โ””โ”€โ”€ vtab/
    โ””โ”€โ”€ mod.rs              # SQLite virtual table implementation
```

### Adding a New Backend

1. Create a new file in `src/backends/`
2. Implement the `StorageBackend` trait
3. Implement a `register()` function for SQLite
4. Add comprehensive tests
5. Update documentation

See `src/backends/local_fs.rs` for reference implementation.

---

## ๐Ÿค Contributing

Contributions are welcome! Areas we'd love help with:

- ๐ŸŒ Additional storage backends (Dropbox, S3, Azure, GCS)
- ๐Ÿ“Š Query optimization (predicate pushdown)
- ๐Ÿ’พ Metadata caching layer
- ๐Ÿ“– More usage examples
- ๐Ÿงช Additional tests

### Development Setup

```bash
git clone https://github.com/mukhtaronif/sqlite-vtab-opendal.git
cd sqlite-vtab-opendal
cargo build
cargo test
```

---

## ๐Ÿ”— Related Projects

- [OpenDAL]https://github.com/apache/opendal - Unified data access layer
- [rusqlite]https://github.com/rusqlite/rusqlite - SQLite bindings for Rust
- [Surveilr]https://github.com/surveilr/surveilr - Uses this library for federated queries


---

## ๐Ÿ“„ License

Licensed under either of:

- Apache License, Version 2.0 ([LICENSE-APACHE]LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license ([LICENSE-MIT]LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

--- 

## ๐Ÿ™ Acknowledgments

- Built with [OpenDAL]https://github.com/apache/opendal for storage abstraction
- Inspired by SQLite's virtual table flexibility
- Developed for [Surveilr]https://github.com/surveilr/surveilr federated queries

---

<div align="center">

**[Documentation](https://docs.rs/sqlite-vtable-opendal)** |
**[Crates.io](https://crates.io/crates/sqlite-vtable-opendal)** |
**[Repository](https://github.com/mukhtaronif/sqlite-vtab-opendal)** |
**[Issues](https://github.com/mukhtaronif/sqlite-vtab-opendal/issues)**

Made with โค๏ธ for the Rust community

</div>