sqlite-vtable-opendal 0.1.1

SQLite virtual tables for querying cloud object store metadata using OpenDAL
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
# sqlite-vtable-opendal

> **Federated SQLite Virtual Tables for Cloud Object Stores using OpenDAL**

Query cloud storage metadata using SQL โ€” without ingestion.

[![Crates.io](https://img.shields.io/crates/v/sqlite-vtable-opendal.svg)](https://crates.io/crates/sqlite-vtable-opendal)
[![Documentation](https://docs.rs/sqlite-vtable-opendal/badge.svg)](https://docs.rs/sqlite-vtable-opendal)
[![License](https://img.shields.io/badge/license-MIT%2FApache--2.0-blue.svg)](LICENSE)

---

## ๐ŸŽฏ Problem Statement

Modern data workflows require querying metadata from cloud object stores (Dropbox, S3, etc.). Today this requires:

- โœ— Custom scripts
- โœ— Full data ingestion
- โœ— Non-composable APIs

**There is no lightweight way to query remote object metadata using SQL without ingestion.**

---

## ๐Ÿ’ก Solution

`sqlite-vtable-opendal` provides SQLite virtual tables that expose cloud storage as queryable tables:

```sql
SELECT path, size
FROM local_files
WHERE size > 10000000
ORDER BY size DESC;
```

**No data ingestion. No materialization. Just pure SQL.**

---

## โœจ Features

- ๐Ÿš€ **Zero Data Ingestion** - Query directly from storage without downloading
- ๐Ÿ“Š **Standard SQL** - Use familiar SQL syntax for cloud storage queries
- โšก **Metadata-Only Queries** - Fetch only what you need (size, dates, etags)
- ๐Ÿ”Œ **Multiple Backends** - Local FS, Dropbox, S3, Google Drive, PostgreSQL, HTTP (via OpenDAL)
- ๐ŸŽฏ **Composable** - Combine with SQLite's powerful query engine
- ๐Ÿงช **Well-Tested** - 31 tests covering unit, integration, and doc tests

---

## ๐Ÿ“ฆ Installation

Add to your `Cargo.toml`:

```toml
[dependencies]
sqlite-vtable-opendal = "0.1.0"
rusqlite = { version = "0.32", features = ["bundled-full"] }
```

---

## ๐Ÿš€ Quick Start

### Local Filesystem Example

```rust
use rusqlite::Connection;
use sqlite_vtable_opendal::backends::local_fs;

fn main() -> rusqlite::Result<()> {
    // Open SQLite connection
    let conn = Connection::open_in_memory()?;

    // Register virtual table for /tmp directory
    local_fs::register(&conn, "local_files", "/tmp")?;

    // Query files using SQL!
    let mut stmt = conn.prepare(
        "SELECT name, size FROM local_files
         WHERE size > 1000
         ORDER BY size DESC"
    )?;

    let files = stmt.query_map([], |row| {
        Ok((row.get::<_, String>(0)?, row.get::<_, i64>(1)?))
    })?;

    for file in files {
        let (name, size) = file?;
        println!("{}: {} bytes", name, size);
    }

    Ok(())
}
```

### Available Columns

All virtual tables expose these columns:

| Column | Type | Description |
|--------|------|-------------|
| `path` | TEXT | Full path to file/directory |
| `size` | INTEGER | File size in bytes |
| `last_modified` | TEXT | ISO 8601 timestamp |
| `etag` | TEXT | Content hash (MD5, SHA256, etc.) |
| `is_dir` | INTEGER | 1 if directory, 0 if file |
| `content_type` | TEXT | MIME type or file extension |
| `name` | TEXT | File/directory name (without path) |
| `content` | BLOB | Actual file content (NULL by default) |

---

## ๐Ÿ“š Backend Usage

### Local Filesystem

```rust
use sqlite_vtable_opendal::backends::local_fs;

local_fs::register(&conn, "my_files", "/path/to/directory")?;
```

Query:
```sql
SELECT * FROM my_files WHERE name LIKE '%.txt';
```

### AWS S3

```rust
use sqlite_vtable_opendal::backends::s3;

s3::register(
    &conn,
    "s3_files",
    "my-bucket",           // bucket name
    "us-east-1",           // region
    "AKIAIOSFODNN7EXAMPLE", // access_key_id
    "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" // secret_access_key
)?;
```

Query:
```sql
SELECT path, size FROM s3_files WHERE path LIKE '%.csv' LIMIT 100;
```

### Dropbox

```rust
use sqlite_vtable_opendal::backends::dropbox;

dropbox::register(
    &conn,
    "dropbox_files",
    "your_access_token",  // Get from Dropbox App Console
    "/"                   // root path
)?;
```

Query:
```sql
SELECT name, last_modified FROM dropbox_files ORDER BY last_modified DESC;
```

### Google Drive

```rust
use sqlite_vtable_opendal::backends::gdrive;

gdrive::register(
    &conn,
    "gdrive_files",
    "your_access_token",  // OAuth2 access token
    "/"                   // root path
)?;
```

Query:
```sql
SELECT path, size FROM gdrive_files WHERE is_dir = 0;
```

### PostgreSQL

```rust
use sqlite_vtable_opendal::backends::postgresql;

postgresql::register(
    &conn,
    "pg_data",
    "postgresql://user:password@localhost/mydb",
    "my_table",     // table name
    "id",           // key field (becomes path)
    "data"          // value field (becomes content)
)?;
```

Query:
```sql
SELECT path, size FROM pg_data;
```

### HTTP

```rust
use sqlite_vtable_opendal::backends::http;

http::register(
    &conn,
    "http_data",
    "https://api.example.com/data"  // endpoint URL
)?;
```

Query:
```sql
SELECT path, content_type FROM http_data;
```

---

## ๐Ÿ” SQL Query Examples

### Find Large Files

```sql
SELECT path, size FROM local_files
WHERE size > 100000000
ORDER BY size DESC
LIMIT 10;
```

### Count Files by Extension

```sql
SELECT
    content_type,
    COUNT(*) as count,
    SUM(size) as total_size
FROM local_files
WHERE is_dir = 0
GROUP BY content_type
ORDER BY count DESC;
```

### Find Recently Modified Files

```sql
SELECT path, last_modified
FROM local_files
WHERE last_modified > '2024-01-01'
ORDER BY last_modified DESC;
```

### Calculate Directory Statistics

```sql
SELECT
    COUNT(*) as file_count,
    SUM(size) as total_bytes,
    AVG(size) as avg_size,
    MAX(size) as largest_file
FROM local_files
WHERE is_dir = 0;
```

---

## ๐Ÿ—๏ธ Architecture

### System Flow

```
SQLite Query โ†’ Virtual Table โ†’ OpenDAL โ†’ Storage Backend โ†’ Metadata
```

### Design Principles

1. **Lazy Loading** - Only fetch what's requested
2. **Metadata-First** - Content fetching is opt-in
3. **Async-Ready** - Non-blocking operations via Tokio
4. **Extensible** - Easy to add new storage backends

### Key Components

- **`types`** - Core data structures (`FileMetadata`, `QueryConfig`)
- **`error`** - Comprehensive error handling with `thiserror`
- **`vtab`** - SQLite virtual table infrastructure
- **`backends`** - Storage backend implementations

---

## ๐Ÿงช Testing

Run all tests:

```bash
cargo test
```

Run specific backend tests:

```bash
cargo test local_fs
```

Run with output:

```bash
cargo test -- --nocapture
```

### Test Coverage

- **Unit Tests**: 21 tests covering all backend functionality
- **Doc Tests**: 10 tests ensuring examples work
- **Integration Tests**: Full end-to-end SQLite query validation

---

## ๐ŸŽฏ Use Cases

### Data Engineering

```sql
-- Discover datasets without downloading
SELECT path FROM s3_files WHERE path LIKE '%/data/2024/%';
```

### Backup Auditing

```sql
-- Find backups older than 30 days
SELECT path, last_modified FROM dropbox_files
WHERE path LIKE '%backup%'
AND last_modified < date('now', '-30 days');
```

### Large File Detection

```sql
-- Identify files consuming most space
SELECT path, size FROM local_files
WHERE size > 1000000000
ORDER BY size DESC;
```

### Compliance Scanning

```sql
-- Find files modified in specific timeframe
SELECT path, last_modified FROM gdrive_files
WHERE last_modified BETWEEN '2024-01-01' AND '2024-12-31';
```

---

## ๐Ÿ› ๏ธ Development

### Project Structure

```
src/
โ”œโ”€โ”€ lib.rs                  # Library entry point
โ”œโ”€โ”€ types.rs                # Core data structures
โ”œโ”€โ”€ error.rs                # Error types
โ”œโ”€โ”€ backends/
โ”‚   โ”œโ”€โ”€ mod.rs              # Backend trait
โ”‚   โ”œโ”€โ”€ local_fs.rs         # Local filesystem backend
โ”‚   โ”œโ”€โ”€ s3.rs               # AWS S3 backend
โ”‚   โ”œโ”€โ”€ dropbox.rs          # Dropbox backend
โ”‚   โ”œโ”€โ”€ gdrive.rs           # Google Drive backend
โ”‚   โ”œโ”€โ”€ postgresql.rs       # PostgreSQL backend
โ”‚   โ””โ”€โ”€ http.rs             # HTTP/HTTPS backend
โ””โ”€โ”€ vtab/
    โ””โ”€โ”€ mod.rs              # SQLite virtual table implementation
```

### Adding a New Backend

1. Create a new file in `src/backends/`
2. Implement the `StorageBackend` trait
3. Implement a `register()` function for SQLite
4. Add comprehensive tests
5. Update documentation

See `src/backends/local_fs.rs` for reference implementation.

---

## ๐Ÿค Contributing

Contributions are welcome! Areas we'd love help with:

- ๐ŸŒ Additional storage backends (Azure Blob, Google Cloud Storage, MinIO)
- ๐Ÿ“Š Query optimization (predicate pushdown, index hints)
- ๐Ÿ’พ Metadata caching layer
- ๐Ÿ“– More usage examples and tutorials
- ๐Ÿงช Additional tests and benchmarks
- ๐Ÿ› Bug fixes and performance improvements

### Development Setup

```bash
git clone https://github.com/mukhtaronif/sqlite-vtab-opendal.git
cd sqlite-vtab-opendal
cargo build
cargo test
```

---

## ๐Ÿ”— Related Projects

- [OpenDAL]https://github.com/apache/opendal - Unified data access layer
- [rusqlite]https://github.com/rusqlite/rusqlite - SQLite bindings for Rust
- [Surveilr]https://github.com/surveilr/surveilr - Uses this library for federated queries


---

## ๐Ÿ“„ License

Licensed under either of:

- Apache License, Version 2.0 ([LICENSE-APACHE]LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license ([LICENSE-MIT]LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.

--- 

## ๐Ÿ™ Acknowledgments

- Built with [OpenDAL]https://github.com/apache/opendal for storage abstraction
- Inspired by SQLite's virtual table flexibility
- Developed for [Surveilr]https://github.com/surveilr/surveilr federated queries

---

<div align="center">

**[Documentation](https://docs.rs/sqlite-vtable-opendal)** |
**[Crates.io](https://crates.io/crates/sqlite-vtable-opendal)** |
**[Repository](https://github.com/mukhtaronif/sqlite-vtab-opendal)** |
**[Issues](https://github.com/mukhtaronif/sqlite-vtab-opendal/issues)**

Made with โค๏ธ for the Rust community

</div>