sqlite-vtable-opendal 0.1.1

SQLite virtual tables for querying cloud object store metadata using OpenDAL
Documentation

sqlite-vtable-opendal

Federated SQLite Virtual Tables for Cloud Object Stores using OpenDAL

Query cloud storage metadata using SQL โ€” without ingestion.

Crates.io Documentation License


๐ŸŽฏ Problem Statement

Modern data workflows require querying metadata from cloud object stores (Dropbox, S3, etc.). Today this requires:

  • โœ— Custom scripts
  • โœ— Full data ingestion
  • โœ— Non-composable APIs

There is no lightweight way to query remote object metadata using SQL without ingestion.


๐Ÿ’ก Solution

sqlite-vtable-opendal provides SQLite virtual tables that expose cloud storage as queryable tables:

SELECT path, size
FROM local_files
WHERE size > 10000000
ORDER BY size DESC;

No data ingestion. No materialization. Just pure SQL.


โœจ Features

  • ๐Ÿš€ Zero Data Ingestion - Query directly from storage without downloading
  • ๐Ÿ“Š Standard SQL - Use familiar SQL syntax for cloud storage queries
  • โšก Metadata-Only Queries - Fetch only what you need (size, dates, etags)
  • ๐Ÿ”Œ Multiple Backends - Local FS, Dropbox, S3, Google Drive, PostgreSQL, HTTP (via OpenDAL)
  • ๐ŸŽฏ Composable - Combine with SQLite's powerful query engine
  • ๐Ÿงช Well-Tested - 31 tests covering unit, integration, and doc tests

๐Ÿ“ฆ Installation

Add to your Cargo.toml:

[dependencies]
sqlite-vtable-opendal = "0.1.0"
rusqlite = { version = "0.32", features = ["bundled-full"] }

๐Ÿš€ Quick Start

Local Filesystem Example

use rusqlite::Connection;
use sqlite_vtable_opendal::backends::local_fs;

fn main() -> rusqlite::Result<()> {
    // Open SQLite connection
    let conn = Connection::open_in_memory()?;

    // Register virtual table for /tmp directory
    local_fs::register(&conn, "local_files", "/tmp")?;

    // Query files using SQL!
    let mut stmt = conn.prepare(
        "SELECT name, size FROM local_files
         WHERE size > 1000
         ORDER BY size DESC"
    )?;

    let files = stmt.query_map([], |row| {
        Ok((row.get::<_, String>(0)?, row.get::<_, i64>(1)?))
    })?;

    for file in files {
        let (name, size) = file?;
        println!("{}: {} bytes", name, size);
    }

    Ok(())
}

Available Columns

All virtual tables expose these columns:

Column Type Description
path TEXT Full path to file/directory
size INTEGER File size in bytes
last_modified TEXT ISO 8601 timestamp
etag TEXT Content hash (MD5, SHA256, etc.)
is_dir INTEGER 1 if directory, 0 if file
content_type TEXT MIME type or file extension
name TEXT File/directory name (without path)
content BLOB Actual file content (NULL by default)

๐Ÿ“š Backend Usage

Local Filesystem

use sqlite_vtable_opendal::backends::local_fs;

local_fs::register(&conn, "my_files", "/path/to/directory")?;

Query:

SELECT * FROM my_files WHERE name LIKE '%.txt';

AWS S3

use sqlite_vtable_opendal::backends::s3;

s3::register(
    &conn,
    "s3_files",
    "my-bucket",           // bucket name
    "us-east-1",           // region
    "AKIAIOSFODNN7EXAMPLE", // access_key_id
    "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" // secret_access_key
)?;

Query:

SELECT path, size FROM s3_files WHERE path LIKE '%.csv' LIMIT 100;

Dropbox

use sqlite_vtable_opendal::backends::dropbox;

dropbox::register(
    &conn,
    "dropbox_files",
    "your_access_token",  // Get from Dropbox App Console
    "/"                   // root path
)?;

Query:

SELECT name, last_modified FROM dropbox_files ORDER BY last_modified DESC;

Google Drive

use sqlite_vtable_opendal::backends::gdrive;

gdrive::register(
    &conn,
    "gdrive_files",
    "your_access_token",  // OAuth2 access token
    "/"                   // root path
)?;

Query:

SELECT path, size FROM gdrive_files WHERE is_dir = 0;

PostgreSQL

use sqlite_vtable_opendal::backends::postgresql;

postgresql::register(
    &conn,
    "pg_data",
    "postgresql://user:password@localhost/mydb",
    "my_table",     // table name
    "id",           // key field (becomes path)
    "data"          // value field (becomes content)
)?;

Query:

SELECT path, size FROM pg_data;

HTTP

use sqlite_vtable_opendal::backends::http;

http::register(
    &conn,
    "http_data",
    "https://api.example.com/data"  // endpoint URL
)?;

Query:

SELECT path, content_type FROM http_data;

๐Ÿ” SQL Query Examples

Find Large Files

SELECT path, size FROM local_files
WHERE size > 100000000
ORDER BY size DESC
LIMIT 10;

Count Files by Extension

SELECT
    content_type,
    COUNT(*) as count,
    SUM(size) as total_size
FROM local_files
WHERE is_dir = 0
GROUP BY content_type
ORDER BY count DESC;

Find Recently Modified Files

SELECT path, last_modified
FROM local_files
WHERE last_modified > '2024-01-01'
ORDER BY last_modified DESC;

Calculate Directory Statistics

SELECT
    COUNT(*) as file_count,
    SUM(size) as total_bytes,
    AVG(size) as avg_size,
    MAX(size) as largest_file
FROM local_files
WHERE is_dir = 0;

๐Ÿ—๏ธ Architecture

System Flow

SQLite Query โ†’ Virtual Table โ†’ OpenDAL โ†’ Storage Backend โ†’ Metadata

Design Principles

  1. Lazy Loading - Only fetch what's requested
  2. Metadata-First - Content fetching is opt-in
  3. Async-Ready - Non-blocking operations via Tokio
  4. Extensible - Easy to add new storage backends

Key Components

  • types - Core data structures (FileMetadata, QueryConfig)
  • error - Comprehensive error handling with thiserror
  • vtab - SQLite virtual table infrastructure
  • backends - Storage backend implementations

๐Ÿงช Testing

Run all tests:

cargo test

Run specific backend tests:

cargo test local_fs

Run with output:

cargo test -- --nocapture

Test Coverage

  • Unit Tests: 21 tests covering all backend functionality
  • Doc Tests: 10 tests ensuring examples work
  • Integration Tests: Full end-to-end SQLite query validation

๐ŸŽฏ Use Cases

Data Engineering

-- Discover datasets without downloading
SELECT path FROM s3_files WHERE path LIKE '%/data/2024/%';

Backup Auditing

-- Find backups older than 30 days
SELECT path, last_modified FROM dropbox_files
WHERE path LIKE '%backup%'
AND last_modified < date('now', '-30 days');

Large File Detection

-- Identify files consuming most space
SELECT path, size FROM local_files
WHERE size > 1000000000
ORDER BY size DESC;

Compliance Scanning

-- Find files modified in specific timeframe
SELECT path, last_modified FROM gdrive_files
WHERE last_modified BETWEEN '2024-01-01' AND '2024-12-31';

๐Ÿ› ๏ธ Development

Project Structure

src/
โ”œโ”€โ”€ lib.rs                  # Library entry point
โ”œโ”€โ”€ types.rs                # Core data structures
โ”œโ”€โ”€ error.rs                # Error types
โ”œโ”€โ”€ backends/
โ”‚   โ”œโ”€โ”€ mod.rs              # Backend trait
โ”‚   โ”œโ”€โ”€ local_fs.rs         # Local filesystem backend
โ”‚   โ”œโ”€โ”€ s3.rs               # AWS S3 backend
โ”‚   โ”œโ”€โ”€ dropbox.rs          # Dropbox backend
โ”‚   โ”œโ”€โ”€ gdrive.rs           # Google Drive backend
โ”‚   โ”œโ”€โ”€ postgresql.rs       # PostgreSQL backend
โ”‚   โ””โ”€โ”€ http.rs             # HTTP/HTTPS backend
โ””โ”€โ”€ vtab/
    โ””โ”€โ”€ mod.rs              # SQLite virtual table implementation

Adding a New Backend

  1. Create a new file in src/backends/
  2. Implement the StorageBackend trait
  3. Implement a register() function for SQLite
  4. Add comprehensive tests
  5. Update documentation

See src/backends/local_fs.rs for reference implementation.


๐Ÿค Contributing

Contributions are welcome! Areas we'd love help with:

  • ๐ŸŒ Additional storage backends (Azure Blob, Google Cloud Storage, MinIO)
  • ๐Ÿ“Š Query optimization (predicate pushdown, index hints)
  • ๐Ÿ’พ Metadata caching layer
  • ๐Ÿ“– More usage examples and tutorials
  • ๐Ÿงช Additional tests and benchmarks
  • ๐Ÿ› Bug fixes and performance improvements

Development Setup

git clone https://github.com/mukhtaronif/sqlite-vtab-opendal.git
cd sqlite-vtab-opendal
cargo build
cargo test

๐Ÿ”— Related Projects

  • OpenDAL - Unified data access layer
  • rusqlite - SQLite bindings for Rust
  • Surveilr - Uses this library for federated queries

๐Ÿ“„ License

Licensed under either of:

at your option.


๐Ÿ™ Acknowledgments

  • Built with OpenDAL for storage abstraction
  • Inspired by SQLite's virtual table flexibility
  • Developed for Surveilr federated queries

Documentation | Crates.io | Repository | Issues

Made with โค๏ธ for the Rust community