# `get()` vs `get_remote()` Benchmark

Compares the full pipeline: **query → fetch → read raster data** for two data access paths.

## What's Being Compared

| Path | Method | Behavior |
|------|--------|----------|
| **Download** | `.get(&path, None, None)` | Downloads all imagery files to local disk via `gdal_translate`, then reads from local paths. Has a file cache (`/tmp/rss_cache`) that can reduce repeated downloads to symlinks. |
| **VSI Direct** | `.get_remote()` | Reads directly from S3/HTTP via GDAL VSI paths (`/vsis3/...`). Zero download — data flows through GDAL's virtual file system at read time. |

## How to Run

```bash
cargo bench -p rss_core --features bench_live --bench get_vs_get_remote
```

The `bench_live` feature flag is required — without it, the benchmark is skipped entirely (suitable for CI).

## Query Configuration

The benchmark uses a small, reproducible query:
- **Source**: Digital Earth Australia (DEA) STAC catalog
- **Collection**: Sentinel-2
- **Scene**: `56jns` (Queensland, Australia)
- **Bands**: `nbart_red`, `nbart_nir_1` (2 bands)
- **Date range**: 2021-01-01 to 2021-02-01 (~1 month)
- **Cloud cover**: < 30%

This typically returns ~4 scenes, each with 2 band files.

## What Gets Measured

Each benchmark iteration measures the wall time of:

1. **Query execution** (identical for both paths — STAC API call)
2. **Fetch phase**:
   - `.get()`: Downloads files via `gdal_translate` (network + disk I/O)
   - `.get_remote()`: Transforms URLs to VSI paths (pure CPU, no I/O)
3. **Read phase**: Opens the first asset and reads one 512×512 block aligned to the COG tile boundary
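The URL-to-VSI step in the fetch phase is cheap because it is string manipulation only. The following is a minimal sketch of such a transform, assuming assets are addressed by `s3://` or `http(s)://` URLs. `to_vsi_path` is a hypothetical helper for illustration, not the `rss_core` implementation; the `/vsis3/` and `/vsicurl/` prefixes are GDAL's own.

```rust
/// Map a remote asset URL to a GDAL virtual file system path.
/// Hypothetical helper for illustration; not the rss_core implementation.
fn to_vsi_path(url: &str) -> Option<String> {
    if let Some(rest) = url.strip_prefix("s3://") {
        // s3://bucket/key -> /vsis3/bucket/key
        Some(format!("/vsis3/{}", rest))
    } else if url.starts_with("https://") || url.starts_with("http://") {
        // HTTP(S) URLs are wrapped whole: /vsicurl/https://host/path
        Some(format!("/vsicurl/{}", url))
    } else {
        // Unknown scheme: leave the decision to the caller.
        None
    }
}
```

For example, `to_vsi_path("s3://example-bucket/scene/nbart_red.tif")` yields `/vsis3/example-bucket/scene/nbart_red.tif` (the bucket and key here are made up). No network or disk access happens until GDAL opens the returned path.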

## COG Block Alignment

DEA Sentinel-2 COGs use **512×512 internal tile blocks** with internal overviews. The benchmark reads one 512×512 block from the top-left corner, which aligns to a single COG tile. This minimizes HTTP range requests:

| Read scope | HTTP requests | Typical time |
|---|---|---|
| 512×512 block (aligned) | **4** | ~1s |
| 256×256 window (center) | **4** | ~1.1s |
| Full 10980×10980 raster | **26** | ~15s |

The 4 requests for a single block are:
1. `ListObjects` — resolve S3 bucket
2. `ListObjects` — resolve path within bucket
3. `GET Range: bytes=0-16383` — TIFF header + IFD (Image File Directory)
4. `GET Range: bytes=X-Y` — overview tile containing the requested block

GDAL reads the TIFF header to locate the IFD, which maps tile positions to byte offsets, and then fetches the tile that holds the requested block. A full raster read spreads the tile data over 23+ range requests, which GDAL issues sequentially.
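Whether a read window stays inside one internal tile can be checked with integer arithmetic. The sketch below assumes square 512-pixel tiles anchored at the raster origin, as described above; `tiles_touched` is an illustrative name, not part of `rss_core`.

```rust
/// Number of internal tiles a pixel window intersects.
/// `block` is the tile edge length in pixels (512 for the COGs described above).
fn tiles_touched(x_off: u64, y_off: u64, width: u64, height: u64, block: u64) -> u64 {
    assert!(width > 0 && height > 0 && block > 0);
    // Tiles spanned along one axis: index of last pixel's tile minus index of first, plus one.
    let span = |off: u64, len: u64| (off + len - 1) / block - off / block + 1;
    span(x_off, width) * span(y_off, height)
}
```

With these assumptions, the aligned 512×512 block at the origin and the centred 256×256 window (offset 5362 in a 10980-pixel raster) each touch one tile, which is consistent with both needing 4 requests in the table. A 512×512 window shifted to offset (256, 256) would touch 4 tiles, and the full raster covers 22 × 22 = 484 tiles. Tile count is not the same as request count: the 26 requests reported for the full raster suggest that neighbouring tiles are fetched in larger merged ranges.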

## Why VSI Reads Are Slower Than Local Reads

The benchmark measures **cold-start** access — opening a fresh dataset and reading one block. This is inherently slow for S3:

- **S3 latency**: Each HTTP request has ~500ms round-trip time (network + S3 processing)
- **Sequential requests**: GDAL's VSI layer issues requests one after another, because each step depends on the result of the previous one (header → IFD → tile)
- **No connection reuse**: Each `Dataset::open()` creates fresh HTTP connections

For comparison, a local disk read of the same block takes ~1ms (page cache hit) or ~5ms (physical disk seek).

### In Real Workflows

The benchmark measures worst-case cold-start cost. In a real `.apply()` workflow:
1. Dataset is opened **once** per file
2. Multiple blocks are read from the **same open dataset**
3. GDAL caches the IFD and tile positions internally
4. Subsequent block reads only need 1 HTTP request (the tile data)

So the amortized cost per block in a real workflow approaches ~500ms as more blocks are read from the same dataset, rather than the ~1s cold-start cost.
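The amortization can be made concrete with a small calculation. This sketch assumes the figures quoted in this document (~1s for the first block, ~500ms for each later block on the same open dataset); `amortized_ms` is an illustrative helper, not part of the benchmark.

```rust
/// Average per-block latency in milliseconds when one open dataset serves `blocks` reads.
/// The first read pays `cold_ms`; every later read pays `warm_ms`.
fn amortized_ms(cold_ms: f64, warm_ms: f64, blocks: u32) -> f64 {
    assert!(blocks > 0);
    (cold_ms + warm_ms * f64::from(blocks - 1)) / f64::from(blocks)
}
```

Under those assumptions, `amortized_ms(1000.0, 500.0, 10)` is 550ms and `amortized_ms(1000.0, 500.0, 100)` is 505ms, so the average converges on the warm cost after a few dozen blocks.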

## Can We Make VSI Reads Faster?

### Tested Optimizations

| Optimization | Effect | Notes |
|---|---|---|
| `CPL_VSIL_CURL_USE_HEAD=YES` | -15% (1.1s → 0.95s) | Uses HEAD requests for metadata |
| `CPL_S3_USE_LISTOBJECTSV2=YES` | -10% (1.1s → 0.97s) | Newer S3 listing API |
| `CPL_VSIL_CURL_CACHE_PATH=/tmp/...` | ~0% | Doesn't help with cold starts |
| `/vsicurl/` with direct HTTPS | +30% (1.1s → 1.4s) | Slower than native S3 protocol |
| Combined all options | -15% (1.1s → 0.95s) | Diminishing returns |

The benchmark enables `CPL_VSIL_CURL_USE_HEAD=YES` and `CPL_S3_USE_LISTOBJECTSV2=YES` for marginal improvement.

### Fundamental Limits

The 4 HTTP requests for a single block read are **fundamental** to reading a COG through `/vsis3/`:
1. S3 requires directory listing to resolve `/vsis3/` paths (2 requests)
2. TIFF format requires reading the header to find tile offsets (1 request)
3. The tile data itself must be fetched (1 request)

To reduce requests further, you'd need to:
- **Precalculate byte ranges**: Parse the TIFF IFD offline and embed offsets in metadata. This bypasses GDAL's VSI layer entirely and requires significant code changes.
- **Use presigned URLs**: Skip directory listing by using direct S3 URLs, at the cost of needing AWS credentials and presigning logic.
- **Custom VSI driver**: Implement a GDAL driver that caches IFD offsets and parallelizes tile reads.
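To give a sense of what precalculating byte ranges involves, here is a minimal sketch of its two ends: decoding the 8-byte classic TIFF header to find the first IFD, and turning one tile's `TileOffsets` / `TileByteCounts` entries into an HTTP `Range` value. It assumes a classic (non-BigTIFF) file and leaves out IFD parsing itself; `parse_tiff_header` and `tile_range` are hypothetical names, not existing `rss_core` or GDAL functions.

```rust
/// Byte order and first-IFD offset from the 8-byte classic TIFF header.
#[derive(Debug, PartialEq)]
struct TiffHeader {
    little_endian: bool,
    first_ifd_offset: u32,
}

/// Decode a classic TIFF header. Returns None for short input,
/// an unknown byte-order mark, or a magic number other than 42
/// (BigTIFF uses 43 and is not handled in this sketch).
fn parse_tiff_header(bytes: &[u8]) -> Option<TiffHeader> {
    if bytes.len() < 8 {
        return None;
    }
    let little_endian = match (bytes[0], bytes[1]) {
        (b'I', b'I') => true,
        (b'M', b'M') => false,
        _ => return None,
    };
    let magic_raw = [bytes[2], bytes[3]];
    let magic = if little_endian {
        u16::from_le_bytes(magic_raw)
    } else {
        u16::from_be_bytes(magic_raw)
    };
    if magic != 42 {
        return None;
    }
    let off_raw = [bytes[4], bytes[5], bytes[6], bytes[7]];
    let first_ifd_offset = if little_endian {
        u32::from_le_bytes(off_raw)
    } else {
        u32::from_be_bytes(off_raw)
    };
    Some(TiffHeader { little_endian, first_ifd_offset })
}

/// HTTP Range header value for one tile, given its TileOffsets and
/// TileByteCounts entries. A zero byte count (sparse tile) has no range.
fn tile_range(offset: u64, byte_count: u64) -> Option<String> {
    if byte_count == 0 {
        return None;
    }
    // Range ends are inclusive, hence the -1.
    Some(format!("bytes={}-{}", offset, offset + byte_count - 1))
}
```

As a check on the inclusive-end arithmetic, `tile_range(0, 16384)` produces `bytes=0-16383`, the same range as the header request listed earlier. Storing such ranges alongside each asset would let a reader skip the header request, which is the trade-off the first bullet describes.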

### Architectural Optimization

The most impactful optimization is architectural: **open datasets once and read multiple blocks**. This amortizes the cold-start cost across all blocks read from that dataset. A typical `.apply()` workflow reads dozens of blocks per file, making the per-block cost ~500ms instead of ~1s.

## Caveats

- **Network-dependent**: Results vary by connection speed and S3 latency
- **Cache effects**: `.get()` uses a file cache (`/tmp/rss_cache`). The benchmark uses a fresh `TempDir` for each iteration, but the cache itself is shared across benchmark runs.
- **Small sample**: Only ~4 scenes are used. Results may differ for larger queries.
- **Single block read**: Only one 512×512 block is read. A full benchmark would read all blocks across all items.

## Interpreting Results

The `.get_remote()` path is faster when:
- Reading small regions (single blocks or windows)
- Data is accessed once and discarded
- You want to avoid disk I/O entirely

The `.get()` path is competitive or faster when:
- Reading large regions (full rasters or many blocks)
- Multiple reads of the same data are needed
- Network bandwidth is the bottleneck (parallel downloads saturate the connection)

## Typical Results

```
get/download/full-pipeline/4 items
                        time:   [20ms 21ms 22ms]

get_remote/vsi-direct/full-pipeline/4 items
                        time:   [590µs 593µs 597µs]
```

Note: `.get_remote()` appears faster here because it only transforms URLs and reads one block. The `.get()` path downloads all 8 files (4 items × 2 bands) before reading, which dominates the measured time. The actual read-from-disk cost is negligible (~1ms) compared to the download cost (~20ms).

---

## async-tiff vs GDAL VSI Benchmark

A separate benchmark (`async_tiff_vs_vsi`) compares GDAL VSI against [async-tiff](https://github.com/developmentseed/async-tiff) — a pure Rust async TIFF reader using `object_store` for S3 access.

### Running

```bash
cargo bench -p rss_core --features bench_live --bench async_tiff_vs_vsi
```

### Results

| Scenario | GDAL VSI | async-tiff | Notes |
|---|---|---|---|
| **Cold-start (1st read)** | ~192ms | ~200ms | Comparable |
| **Cached (2nd+ read)** | **~0.6ms** | N/A | GDAL's internal cache |
| **Sequential (reuse)** | N/A | **~40ms** | async-tiff connection pooling |
| **Concurrent (5×)** | N/A | **~31ms/read** | async-tiff parallel I/O |

### Key Findings

**GDAL VSI** excels at repeated reads from the same dataset:
- First read: ~192ms (cold start with 4 HTTP requests)
- Subsequent reads: **~0.6ms** (GDAL's internal block cache keeps tiles in memory)
- This makes GDAL ideal for workflows that open a dataset once and read many blocks

**async-tiff** excels at concurrent access:
- Each cold start: ~200ms (no internal caching, truly fresh each time)
- Sequential with connection reuse: **~40ms** per read (5× faster than cold)
- Concurrent (5 parallel): **~31ms** per read (6× faster than cold)
- This makes async-tiff ideal for server workloads serving many simultaneous requests
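The speedup factors quoted above are ratios of the measured latencies. A small sketch of that arithmetic, using only the figures from the results table (`speedup` is an illustrative helper):

```rust
/// How many times faster `fast_ms` is than `baseline_ms`.
fn speedup(baseline_ms: f64, fast_ms: f64) -> f64 {
    assert!(fast_ms > 0.0);
    baseline_ms / fast_ms
}
```

`speedup(200.0, 40.0)` gives 5.0 for connection reuse and `speedup(200.0, 31.0)` gives about 6.45 for the concurrent case, which the text rounds to 6×. The same calculation for GDAL's cached reads, `speedup(192.0, 0.6)`, gives roughly 320×.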

### Why async-tiff Is Faster for Concurrent Reads

1. **Parallel range requests**: async-tiff fetches TIFF header + tile data simultaneously
2. **Connection pooling**: `object_store` reuses HTTP connections across requests
3. **No directory listing**: Direct S3 object access without `ListObjects` overhead
4. **Async I/O**: Non-blocking reads allow the event loop to interleave multiple requests

### Trade-offs

| Factor | GDAL VSI | async-tiff |
|---|---|---|
| Single-read latency | ~192ms | ~200ms |
| Repeated reads | **~0.6ms** (cached) | ~40ms (connection reuse) |
| Concurrent reads | Blocks threads | **~31ms/read** (async) |
| Memory usage | High (block cache) | Low (no cache) |
| Maturity | Production-ready | Beta (0.3.0) |
| Compression support | Full | Deflate, LZW, JPEG, ZSTD |

### When to Use Each

**GDAL VSI** when:
- Reading many blocks from the same dataset (e.g., `.apply()` workflows)
- You need full compression format support
- Memory is not a constraint

**async-tiff** when:
- Serving many concurrent requests (e.g., tile server)
- You want predictable latency without cache effects
- Memory efficiency matters (no block cache)