sqlite-vtable-opendal
Federated SQLite Virtual Tables for Cloud Object Stores using OpenDAL
Query cloud storage metadata using SQL โ without ingestion.
๐ฏ Problem Statement
Modern data workflows require querying metadata from cloud object stores (Dropbox, S3, etc.). Today this requires:
- โ Custom scripts
- โ Full data ingestion
- โ Non-composable APIs
There is no lightweight way to query remote object metadata using SQL without ingestion.
๐ก Solution
sqlite-vtable-opendal provides SQLite virtual tables that expose cloud storage as queryable tables:
SELECT path, size
FROM local_files
WHERE size > 10000000
ORDER BY size DESC;
No data ingestion. No materialization. Just pure SQL.
โจ Features
- ๐ Zero Data Ingestion - Query directly from storage without downloading
- ๐ Standard SQL - Use familiar SQL syntax for cloud storage queries
- โก Metadata-Only Queries - Fetch only what you need (size, dates, etags)
- ๐ Multiple Backends - Local FS, Dropbox, S3, Google Drive (via OpenDAL)
- ๐ฏ Composable - Combine with SQLite's powerful query engine
- ๐งช Well-Tested - 14 tests covering unit, integration, and doc tests
๐ฆ Installation
Add to your Cargo.toml:
[]
= "0.1.0"
= { = "0.32", = ["bundled-full"] }
๐ Quick Start
Local Filesystem Example
use Connection;
use local_fs;
Available Columns
All virtual tables expose these columns:
| Column | Type | Description |
|---|---|---|
path |
TEXT | Full path to file/directory |
size |
INTEGER | File size in bytes |
last_modified |
TEXT | ISO 8601 timestamp |
etag |
TEXT | Content hash (MD5, SHA256, etc.) |
is_dir |
INTEGER | 1 if directory, 0 if file |
content_type |
TEXT | MIME type or file extension |
name |
TEXT | File/directory name (without path) |
content |
BLOB | Actual file content (NULL by default) |
๐ Backend Usage
Local Filesystem
use local_fs;
register?;
Query:
SELECT * FROM my_files WHERE name LIKE '%.txt';
Future Backends
Coming soon:
// Dropbox (planned)
register?;
// S3 (planned)
register?;
// Google Drive (planned)
register?;
๐ SQL Query Examples
Find Large Files
SELECT path, size FROM local_files
WHERE size > 100000000
ORDER BY size DESC
LIMIT 10;
Count Files by Extension
SELECT
content_type,
COUNT(*) as count,
SUM(size) as total_size
FROM local_files
WHERE is_dir = 0
GROUP BY content_type
ORDER BY count DESC;
Find Recently Modified Files
SELECT path, last_modified
FROM local_files
WHERE last_modified > '2024-01-01'
ORDER BY last_modified DESC;
Calculate Directory Statistics
SELECT
COUNT(*) as file_count,
SUM(size) as total_bytes,
AVG(size) as avg_size,
MAX(size) as largest_file
FROM local_files
WHERE is_dir = 0;
๐๏ธ Architecture
System Flow
SQLite Query โ Virtual Table โ OpenDAL โ Storage Backend โ Metadata
Design Principles
- Lazy Loading - Only fetch what's requested
- Metadata-First - Content fetching is opt-in
- Async-Ready - Non-blocking operations via Tokio
- Extensible - Easy to add new storage backends
Key Components
types- Core data structures (FileMetadata,QueryConfig)error- Comprehensive error handling withthiserrorvtab- SQLite virtual table infrastructurebackends- Storage backend implementations
๐งช Testing
Run all tests:
Run specific backend tests:
Run with output:
Test Coverage
- Unit Tests: 9 tests covering backend functionality
- Integration Tests: 2 tests validating SQLite queries
- Doc Tests: 3 tests ensuring examples work
๐ฏ Use Cases
Data Engineering
-- Discover datasets without downloading
SELECT path FROM s3_files WHERE path LIKE '%/data/2024/%';
Backup Auditing
-- Find backups older than 30 days
SELECT path, last_modified FROM dropbox_files
WHERE path LIKE '%backup%'
AND last_modified < date('now', '-30 days');
Large File Detection
-- Identify files consuming most space
SELECT path, size FROM local_files
WHERE size > 1000000000
ORDER BY size DESC;
Compliance Scanning
-- Find files modified in specific timeframe
SELECT path, last_modified FROM gdrive_files
WHERE last_modified BETWEEN '2024-01-01' AND '2024-12-31';
๐ ๏ธ Development
Project Structure
src/
โโโ lib.rs # Library entry point
โโโ types.rs # Core data structures
โโโ error.rs # Error types
โโโ backends/
โ โโโ mod.rs # Backend trait
โ โโโ local_fs.rs # Local filesystem backend
โโโ vtab/
โโโ mod.rs # SQLite virtual table implementation
Adding a New Backend
- Create a new file in
src/backends/ - Implement the
StorageBackendtrait - Implement a
register()function for SQLite - Add comprehensive tests
- Update documentation
See src/backends/local_fs.rs for reference implementation.
๐ค Contributing
Contributions are welcome! Areas we'd love help with:
- ๐ Additional storage backends (Dropbox, S3, Azure, GCS)
- ๐ Query optimization (predicate pushdown)
- ๐พ Metadata caching layer
- ๐ More usage examples
- ๐งช Additional tests
Development Setup
๐ Related Projects
- OpenDAL - Unified data access layer
- rusqlite - SQLite bindings for Rust
- Surveilr - Uses this library for federated queries
๐ License
Licensed under either of:
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option.
๐ Acknowledgments
- Built with OpenDAL for storage abstraction
- Inspired by SQLite's virtual table flexibility
- Developed for Surveilr federated queries
Documentation | Crates.io | Repository | Issues
Made with โค๏ธ for the Rust community