sqlite-vtable-opendal 0.1.0

SQLite virtual tables for querying cloud object store metadata using OpenDAL
Documentation

sqlite-vtable-opendal

Federated SQLite Virtual Tables for Cloud Object Stores using OpenDAL

Query cloud storage metadata using SQL โ€” without ingestion.

Crates.io Documentation License


๐ŸŽฏ Problem Statement

Modern data workflows require querying metadata from cloud object stores (Dropbox, S3, etc.). Today this requires:

  • โœ— Custom scripts
  • โœ— Full data ingestion
  • โœ— Non-composable APIs

There is no lightweight way to query remote object metadata using SQL without ingestion.


๐Ÿ’ก Solution

sqlite-vtable-opendal provides SQLite virtual tables that expose cloud storage as queryable tables:

SELECT path, size
FROM local_files
WHERE size > 10000000
ORDER BY size DESC;

No data ingestion. No materialization. Just pure SQL.


โœจ Features

  • ๐Ÿš€ Zero Data Ingestion - Query directly from storage without downloading
  • ๐Ÿ“Š Standard SQL - Use familiar SQL syntax for cloud storage queries
  • โšก Metadata-Only Queries - Fetch only what you need (size, dates, etags)
  • ๐Ÿ”Œ Multiple Backends - Local FS, Dropbox, S3, Google Drive (via OpenDAL)
  • ๐ŸŽฏ Composable - Combine with SQLite's powerful query engine
  • ๐Ÿงช Well-Tested - 14 tests covering unit, integration, and doc tests

๐Ÿ“ฆ Installation

Add to your Cargo.toml:

[dependencies]
sqlite-vtable-opendal = "0.1.0"
rusqlite = { version = "0.32", features = ["bundled-full"] }

๐Ÿš€ Quick Start

Local Filesystem Example

use rusqlite::Connection;
use sqlite_vtable_opendal::backends::local_fs;

fn main() -> rusqlite::Result<()> {
    // Open SQLite connection
    let conn = Connection::open_in_memory()?;

    // Register virtual table for /tmp directory
    local_fs::register(&conn, "local_files", "/tmp")?;

    // Query files using SQL!
    let mut stmt = conn.prepare(
        "SELECT name, size FROM local_files
         WHERE size > 1000
         ORDER BY size DESC"
    )?;

    let files = stmt.query_map([], |row| {
        Ok((row.get::<_, String>(0)?, row.get::<_, i64>(1)?))
    })?;

    for file in files {
        let (name, size) = file?;
        println!("{}: {} bytes", name, size);
    }

    Ok(())
}

Available Columns

All virtual tables expose these columns:

Column Type Description
path TEXT Full path to file/directory
size INTEGER File size in bytes
last_modified TEXT ISO 8601 timestamp
etag TEXT Content hash (MD5, SHA256, etc.)
is_dir INTEGER 1 if directory, 0 if file
content_type TEXT MIME type or file extension
name TEXT File/directory name (without path)
content BLOB Actual file content (NULL by default)

๐Ÿ“š Backend Usage

Local Filesystem

use sqlite_vtable_opendal::backends::local_fs;

local_fs::register(&conn, "my_files", "/path/to/directory")?;

Query:

SELECT * FROM my_files WHERE name LIKE '%.txt';

Future Backends

Coming soon:

// Dropbox (planned)
dropbox::register(&conn, "dropbox_files", access_token, root_path)?;

// S3 (planned)
s3::register(&conn, "s3_files", bucket, region, credentials)?;

// Google Drive (planned)
gdrive::register(&conn, "gdrive_files", credentials)?;

๐Ÿ” SQL Query Examples

Find Large Files

SELECT path, size FROM local_files
WHERE size > 100000000
ORDER BY size DESC
LIMIT 10;

Count Files by Extension

SELECT
    content_type,
    COUNT(*) as count,
    SUM(size) as total_size
FROM local_files
WHERE is_dir = 0
GROUP BY content_type
ORDER BY count DESC;

Find Recently Modified Files

SELECT path, last_modified
FROM local_files
WHERE last_modified > '2024-01-01'
ORDER BY last_modified DESC;

Calculate Directory Statistics

SELECT
    COUNT(*) as file_count,
    SUM(size) as total_bytes,
    AVG(size) as avg_size,
    MAX(size) as largest_file
FROM local_files
WHERE is_dir = 0;

๐Ÿ—๏ธ Architecture

System Flow

SQLite Query โ†’ Virtual Table โ†’ OpenDAL โ†’ Storage Backend โ†’ Metadata

Design Principles

  1. Lazy Loading - Only fetch what's requested
  2. Metadata-First - Content fetching is opt-in
  3. Async-Ready - Non-blocking operations via Tokio
  4. Extensible - Easy to add new storage backends

Key Components

  • types - Core data structures (FileMetadata, QueryConfig)
  • error - Comprehensive error handling with thiserror
  • vtab - SQLite virtual table infrastructure
  • backends - Storage backend implementations

๐Ÿงช Testing

Run all tests:

cargo test

Run specific backend tests:

cargo test local_fs

Run with output:

cargo test -- --nocapture

Test Coverage

  • Unit Tests: 9 tests covering backend functionality
  • Integration Tests: 2 tests validating SQLite queries
  • Doc Tests: 3 tests ensuring examples work

๐ŸŽฏ Use Cases

Data Engineering

-- Discover datasets without downloading
SELECT path FROM s3_files WHERE path LIKE '%/data/2024/%';

Backup Auditing

-- Find backups older than 30 days
SELECT path, last_modified FROM dropbox_files
WHERE path LIKE '%backup%'
AND last_modified < date('now', '-30 days');

Large File Detection

-- Identify files consuming most space
SELECT path, size FROM local_files
WHERE size > 1000000000
ORDER BY size DESC;

Compliance Scanning

-- Find files modified in specific timeframe
SELECT path, last_modified FROM gdrive_files
WHERE last_modified BETWEEN '2024-01-01' AND '2024-12-31';

๐Ÿ› ๏ธ Development

Project Structure

src/
โ”œโ”€โ”€ lib.rs                  # Library entry point
โ”œโ”€โ”€ types.rs                # Core data structures
โ”œโ”€โ”€ error.rs                # Error types
โ”œโ”€โ”€ backends/
โ”‚   โ”œโ”€โ”€ mod.rs              # Backend trait
โ”‚   โ””โ”€โ”€ local_fs.rs         # Local filesystem backend
โ””โ”€โ”€ vtab/
    โ””โ”€โ”€ mod.rs              # SQLite virtual table implementation

Adding a New Backend

  1. Create a new file in src/backends/
  2. Implement the StorageBackend trait
  3. Implement a register() function for SQLite
  4. Add comprehensive tests
  5. Update documentation

See src/backends/local_fs.rs for reference implementation.


๐Ÿค Contributing

Contributions are welcome! Areas we'd love help with:

  • ๐ŸŒ Additional storage backends (Dropbox, S3, Azure, GCS)
  • ๐Ÿ“Š Query optimization (predicate pushdown)
  • ๐Ÿ’พ Metadata caching layer
  • ๐Ÿ“– More usage examples
  • ๐Ÿงช Additional tests

Development Setup

git clone https://github.com/mukhtaronif/sqlite-vtab-opendal.git
cd sqlite-vtab-opendal
cargo build
cargo test

๐Ÿ”— Related Projects

  • OpenDAL - Unified data access layer
  • rusqlite - SQLite bindings for Rust
  • Surveilr - Uses this library for federated queries

๐Ÿ“„ License

Licensed under either of:

at your option.


๐Ÿ™ Acknowledgments

  • Built with OpenDAL for storage abstraction
  • Inspired by SQLite's virtual table flexibility
  • Developed for Surveilr federated queries

Documentation | Crates.io | Repository | Issues

Made with โค๏ธ for the Rust community