eorst 1.0.1

Earth Observation and Remote Sensing Toolkit - library for raster processing pipelines
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [1.0.1] - 2026-05-13

### Added

- `OutputFormat` enum and `OutputConfig` struct for controlling output format: plain GeoTIFF, GeoTIFF with overviews, or Cloud Optimized GeoTIFF (COG)
- `translate_to_cog()` in `gdal_utils` — in-process COG translation using the GDAL COG driver, replacing the memory-hungry subprocess approach
- `build_overviews()` on `ParallelGeoTiffWriter` — in-place overview building with multithreading
- `_cog` variants: `apply_cog()`, `apply_with_mask_cog()`, `apply_reduction_with_mask_cog()`, `rasterize_cog()` — produce COG output directly from processing pipelines
- CLI flags `--output-format`, `--compression`, `--overview-resampling`, `--overview-levels` on `rasterize` and `warp` commands
- Tests for overviews, COG translation, and config defaults
- `Extent::snap_to_grid(resolution)` — rounds extent coordinates outward to align with a resolution grid
- `Extent::union(other)` — computes bounding-box union of two extents
- `compute_raster_union_extent(files, target_epsg)` — computes the union extent of multiple raster files, reprojecting each input's corners to the target CRS via GDAL's `CoordTransform` API
- `compute_vector_extent(vector_path, target_epsg)` — computes the extent of a vector layer, reprojecting to the target CRS if needed
- `mosaic()` and `mosaic_keep_inputs()` now accept optional `extent` and `resolution` parameters — when provided, these are passed as `-te` and `-tr` flags to `gdalwarp`
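
The two new `Extent` helpers are plain bounding-box operations. The sketch below assumes a hypothetical `Extent` with `xmin`/`ymin`/`xmax`/`ymax` fields; the crate's actual field names and method signatures may differ. It shows one reading of "rounds extent coordinates outward": minimum edges are floored and maximum edges are ceiled to multiples of the resolution.

```rust
/// Hypothetical stand-in for `eorst::Extent` (field names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Extent {
    xmin: f64,
    ymin: f64,
    xmax: f64,
    ymax: f64,
}

impl Extent {
    /// Round outward so every edge lies on a multiple of `resolution`.
    fn snap_to_grid(&self, resolution: f64) -> Extent {
        Extent {
            xmin: (self.xmin / resolution).floor() * resolution,
            ymin: (self.ymin / resolution).floor() * resolution,
            xmax: (self.xmax / resolution).ceil() * resolution,
            ymax: (self.ymax / resolution).ceil() * resolution,
        }
    }

    /// Smallest axis-aligned box containing both extents.
    fn union(&self, other: &Extent) -> Extent {
        Extent {
            xmin: self.xmin.min(other.xmin),
            ymin: self.ymin.min(other.ymin),
            xmax: self.xmax.max(other.xmax),
            ymax: self.ymax.max(other.ymax),
        }
    }
}

fn main() {
    let a = Extent { xmin: 101.3, ymin: 52.0, xmax: 149.9, ymax: 80.1 };
    let b = Extent { xmin: 120.0, ymin: 40.5, xmax: 160.2, ymax: 70.0 };
    // Union first, then snap, so the combined box starts and ends on pixel edges.
    let snapped = a.union(&b).snap_to_grid(10.0);
    println!("{:?}", snapped);
}
```

Snapping after the union keeps a mosaic over several inputs aligned to the target pixel grid. When the resolution is not exactly representable in binary (for example `0.1`), the division can push an edge that is already on the grid to the next cell, so a real implementation may want a small tolerance.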

### Changed

- `rasterize()` rewritten to use in-memory GDAL MEM driver + parallel writer instead of intermediate files + mosaic step — eliminates temporary file I/O and subprocess overhead
- `mosaic()` and `mosaic_keep_inputs()` signature changed: added `extent: Option<Extent>` and `resolution: Option<f64>` parameters. All internal callers pass `None, None`, which preserves the previous behavior; external callers must add the two new arguments.
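
The mapping from the optional parameters to `gdalwarp` flags can be illustrated with a small argument builder. This is a hypothetical helper, not the crate's code. Because `gdalwarp -tr` expects an x and a y resolution, the sketch assumes the single `resolution: f64` is reused for both axes.

```rust
/// Hypothetical stand-in for `eorst::Extent` (field names are assumptions).
#[derive(Debug, Clone, Copy)]
struct Extent {
    xmin: f64,
    ymin: f64,
    xmax: f64,
    ymax: f64,
}

/// Hypothetical helper: extra `gdalwarp` arguments for an optional target grid.
fn target_grid_args(extent: Option<Extent>, resolution: Option<f64>) -> Vec<String> {
    let mut args = Vec::new();
    if let Some(e) = extent {
        // -te <xmin> <ymin> <xmax> <ymax>, expressed in the target CRS
        args.push("-te".to_string());
        args.extend([e.xmin, e.ymin, e.xmax, e.ymax].iter().map(|v| v.to_string()));
    }
    if let Some(r) = resolution {
        // -tr <xres> <yres>: the single resolution is reused for both axes
        args.extend(["-tr".to_string(), r.to_string(), r.to_string()]);
    }
    args
}

fn main() {
    // `None, None` adds no flags, so gdalwarp behaves as before.
    assert!(target_grid_args(None, None).is_empty());

    let extent = Extent { xmin: 100.0, ymin: 40.0, xmax: 170.0, ymax: 90.0 };
    let args = target_grid_args(Some(extent), Some(10.0));
    println!("gdalwarp {} in_a.tif in_b.tif out.tif", args.join(" "));
}
```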

## [1.0.0] - 2026-05-12

### Added

- New `parallel_writer` module with `ParallelGeoTiffWriter`, `create_output_geotiff()`, and `write_block()` — enables direct parallel windowed writes to GeoTIFF, eliminating intermediate block files and the subprocess-based mosaic/translate phase
- New `trimm_array3_asymmetric()` in `array_ops` for asymmetric overlap trimming on 3D arrays
- New `get_vs_get_remote` benchmark in `rss_core` — compares the full pipeline (query → fetch → read) for `.get()` (download to disk) vs `.get_remote()` (VSI direct read from S3/HTTP). Gated behind the `bench_live` feature flag for CI compatibility.
- New `apply_with_mask` method on `RasterDataset` for dual-dataset processing with `RasterDataBlock` metadata
- New `apply_reduction()` — like `reduce()`, but the worker receives `&RasterDataBlock<R>` with metadata (layer indices, dates, etc.)
- New `apply_reduction_with_mask()` — like `reduce_with_mask()` but workers receive `&RasterDataBlock` pairs
- New `apply_reduction_row_pixel()` — renamed from `reduce_row_pixel()` (same signature)
- New `apply_reduction_row_pixel_with_mask()` — renamed from `reduce_row_pixel_with_mask()` (same signature)
- New `apply_mosaic()` — renamed from `mosaic()`
- Added `Select` trait for name-based layer and time selection on `RasterDataBlock`
  - `select_layers(&[&str]) -> Result<RasterDataBlock<T>, SelectError>` - select multiple layers by name
  - `select_times(&[DateType]) -> Result<RasterDataBlock<T>, SelectError>` - select multiple times by date index
  - Fluent chaining: `.select_layers(...)?.select_times(...)?`
  - `SelectError` enum with `LayerNotFound`, `TimeNotFound`, `EmptySelection`, `ConcatenationError` variants
  - `available_layer_names()` and `available_time_indices()` convenience methods on `RasterDataBlock`
- Added comprehensive documentation to all public types, traits, and functions
  - Documented the `RasterType` trait, `Extent`, `Layer`, `RasterBlock`, `RasterMetadata`, etc.
  - Added docs to utility functions and type aliases
  - Documented the `Filters` trait for image processing
- Extended docstrings with detailed explanations and code examples for commonly used functions
  - Added examples for `iter()`, `process()`, `reduce()`, `from_source()`, `from_stac_query()`, `from_scratch()`
  - Explained key concepts like block-based processing and dimension reduction
- Added individual container builds for all apps (`eo-sample`, `eo-warp`, `eo-rasterize`)
  - Updated `packagesList` in `flake.nix` to include all apps
  - Updated GitLab CI/CD to build separate container artifacts
- Added integration tests for the `eorst` library
  - Tests for RasterDataset creation, iteration, and block reading
  - Located in `libs/eorst/tests/integration.rs`
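
The `Select` API can be pictured with a heavily simplified model. In this sketch `Block` is a hypothetical stand-in for `RasterDataBlock<T>` that holds one pixel vector per named layer, only `select_layers` is modelled, only two of the four `SelectError` variants appear, and the `String` payload of `LayerNotFound` is an assumption. It shows why returning a new block inside a `Result` makes fluent chaining with `?` work.

```rust
use std::error::Error;
use std::fmt;

/// Subset of the `SelectError` variants (the `String` payload is an assumption).
#[derive(Debug, PartialEq)]
enum SelectError {
    LayerNotFound(String),
    EmptySelection,
}

impl fmt::Display for SelectError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SelectError::LayerNotFound(name) => write!(f, "layer not found: {name}"),
            SelectError::EmptySelection => write!(f, "empty selection"),
        }
    }
}

impl Error for SelectError {}

/// Hypothetical stand-in for `RasterDataBlock<T>`: one pixel vector per named layer.
#[derive(Debug, Clone)]
struct Block {
    layer_names: Vec<String>,
    layers: Vec<Vec<f32>>,
}

trait Select: Sized {
    fn select_layers(&self, names: &[&str]) -> Result<Self, SelectError>;
}

impl Select for Block {
    fn select_layers(&self, names: &[&str]) -> Result<Self, SelectError> {
        if names.is_empty() {
            return Err(SelectError::EmptySelection);
        }
        let mut out = Block { layer_names: Vec::new(), layers: Vec::new() };
        for &name in names {
            // Look the layer up by name instead of by band index.
            let idx = self
                .layer_names
                .iter()
                .position(|n| n == name)
                .ok_or_else(|| SelectError::LayerNotFound(name.to_string()))?;
            out.layer_names.push(name.to_string());
            out.layers.push(self.layers[idx].clone());
        }
        Ok(out)
    }
}

fn main() -> Result<(), SelectError> {
    let block = Block {
        layer_names: vec!["red".into(), "green".into(), "nir".into()],
        layers: vec![vec![0.1, 0.2], vec![0.3, 0.4], vec![0.5, 0.6]],
    };
    // Each call returns a new block, so selections chain with `?`.
    let nir = block.select_layers(&["red", "nir"])?.select_layers(&["nir"])?;
    println!("{:?} -> {:?}", nir.layer_names, nir.layers);
    Ok(())
}
```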

### Changed

- **1.0.0 Release**: Major rewrite of processing API. Worker functions now receive `&RasterDataBlock<R>` with metadata instead of raw arrays.
- **Parallel writer rewrite**: `RasterDataset::apply()` now uses the parallel writer directly — pre-creates a single output GeoTIFF and writes blocks in parallel via a mutex-guarded cached GDAL dataset (only opened once). Eliminates 100+ intermediate files and 4× subprocess spawning (gdalwarp/gdalbuildvrt/gdal_translate). Benchmarks show ~2× speedup over previous block-file approach and ~17% faster than Python rioxarray baseline.
- **Zero-copy OpenCV filters**: `Filters` trait now uses `Mat::from_slice` (borrow input) and `Mat::new_rows_cols_with_data_mut` (write output in-place) — eliminates 2 memory copies per filter operation. Old `arrayview2_to_mat` and `mat_to_array2` functions deprecated but retained for backward compatibility.
- Renamed `process_t` to `apply` for clarity
  - Worker signature now returns `anyhow::Result<Array4<U>>` instead of `Array4<U>`
  - `apply` returns `anyhow::Result<()>` and propagates worker errors via `?`
  - Enables semantic layer selection with the `?` operator throughout worker code
- `DataSource` now carries a `layer_names` field, propagated from `DataSourceBuilder`
  - `DataSourceBuilder::bands()` updates `layer_names` to match the selected band count
  - `RasterDatasetBuilder` propagates `layer_names` to `RasterMetadata`
  - Enables `set_names()` to work correctly with `apply` + `Select` API
- Fixed rayon thread pool leak: time-step assembly now runs inside `pool.install()` to respect `n_cpus` in both `apply` and `apply_with_mask`
- Migrated all 12 example files from deprecated `reduce`/`mosaic` methods to `apply_*` API with `&RasterDataBlock<T>` worker signatures
- Updated `test_reduce` unit test to use `apply_reduction` with `RasterDataBlock`
- Optimized container sizes by separating build and runtime dependencies
  - Removed dev tools from runtime (clippy, rustfmt, llvm, cmake)
  - Reduced container size from ~1 GB to ~525 MB (~50% smaller)
  - Added minimal runtime deps (gdal, openssl, lightgbm, libgccjit)
- Fixed clippy warnings across the codebase
  - Removed empty lines after doc comments
  - Removed needless uses of `format!` and `vec!`
  - Replaced `&Vec<Geometry>` parameters with `&[Geometry]` slices
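
The "mutex-guarded cached GDAL dataset (only opened once)" pattern from the parallel writer rewrite can be sketched with the standard library alone. Here a row-major `Vec<f32>` stands in for the GDAL output dataset, and `ParallelWriter` is a hypothetical name, not the crate's `ParallelGeoTiffWriter`. The handle is created lazily on the first write, every block write takes the lock, and block computation happens outside the lock.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Hypothetical writer: a row-major `Vec<f32>` stands in for the GDAL dataset.
struct ParallelWriter {
    size: (usize, usize), // (width, height) in pixels
    handle: Mutex<Option<Vec<f32>>>,
    opens: AtomicUsize,
}

impl ParallelWriter {
    fn new(width: usize, height: usize) -> Self {
        ParallelWriter { size: (width, height), handle: Mutex::new(None), opens: AtomicUsize::new(0) }
    }

    /// Copies a block of whole rows starting at `row_off` into the shared output.
    fn write_block(&self, row_off: usize, data: &[f32]) {
        let (width, height) = self.size;
        let mut guard = self.handle.lock().unwrap();
        // "Open" the output only on the first write; later writes reuse the cached handle.
        let pixels = guard.get_or_insert_with(|| {
            self.opens.fetch_add(1, Ordering::SeqCst);
            vec![0.0; width * height]
        });
        let start = row_off * width;
        pixels[start..start + data.len()].copy_from_slice(data);
    }

    /// Returns how many times the output was opened, plus the written pixels.
    fn finish(self) -> (usize, Vec<f32>) {
        let pixels = self.handle.into_inner().unwrap().unwrap_or_default();
        (self.opens.into_inner(), pixels)
    }
}

fn main() {
    let (width, height, block_rows) = (4, 8, 2);
    let writer = ParallelWriter::new(width, height);
    thread::scope(|s| {
        for row_off in (0..height).step_by(block_rows) {
            let writer = &writer;
            s.spawn(move || {
                // The block is computed outside the lock; only the copy is serialized.
                let data: Vec<f32> = (0..block_rows * width)
                    .map(|i| (row_off * width + i) as f32)
                    .collect();
                writer.write_block(row_off, &data);
            });
        }
    });
    let (opens, pixels) = writer.finish();
    println!("opened {} time(s), wrote {} pixels", opens, pixels.len());
}
```

Serializing only the write keeps the expensive per-block work parallel. GDAL documents that a single dataset handle should not be used from several threads at the same time, which is one reason a lock around one cached handle is a reasonable design.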

### Fixed

- Repaired 6 pre-existing compilation errors in `zonal_stats.rs` (missing imports, type mismatches, histogram API changes)
- Fixed type mismatch in `sampling.rs`: `n_block_cols` helper now correctly converts `RasterDataShape` to `ImageSize`
- Fixed doc test compilation errors in `RasterDataset` and `RasterBlockSlice2`
  - Added explicit type annotations where required
  - Fixed imports for `RasterDataset` and the `Filters` trait
- Fixed examples to run from any directory using `CARGO_MANIFEST_DIR` for path resolution

### Refactored

- Consolidated GDAL command execution, thread pool creation, and mosaic cleanup into reusable helpers in `gdal_utils` — eliminates duplicated `.spawn().expect().wait()` and `ThreadPoolBuilder` patterns across 15+ locations
- Replaced 5 individual `Dataset::open` helpers in `RasterDatasetBuilder` with the shared `read_basic_raster_info()` — removes ~62 lines of duplication in `get_extent()`
- Extracted `read_raster_band<T>()` from 3 duplicated locations in `io.rs` and `processing.rs`
- Extracted `write_time_step_blocks()` from 3 duplicated locations in `processing.rs`
- Extracted `sample_value()` from 2 duplicated `SamplingMethod` match blocks in `sampling.rs`
- Extracted `histograms_to_dataframe!` macro from 2 duplicated blocks in `zonal_stats.rs`
- Consolidated sampling block processing in `sampling.rs`: extracted `validate_buffer_size()`, `make_rectangle()`, `build_block_index_pipeline()`, `collect_points_for_block()`, and generic `assemble_block_results<K>()` — eliminates ~80 lines of duplicated code between `extract_blockwise()` and `extract()`
- Consolidated block construction in `builder.rs`: unified `n_block_cols`, `block_col_row`, `get_block_gt`, and `block_from_id` into single shared helper functions used by both `build()` and `from_scratch()` paths — eliminates ~60 lines of duplicated block attribute construction
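
A consolidated command runner replaces the repeated `.spawn().expect().wait()` chain, which panics if the tool cannot be started and, unless the returned `ExitStatus` is checked, lets a failing GDAL tool go unnoticed. The helper below is a hypothetical sketch built on `std::process::Command::status()`, which spawns and waits in one call; the actual helpers in `gdal_utils` may differ in name and error type.

```rust
use std::process::Command;

/// Hypothetical helper: run an external tool and turn both spawn failures
/// and non-zero exit codes into an `Err` instead of panicking.
fn run_command(program: &str, args: &[&str]) -> Result<(), String> {
    let status = Command::new(program)
        .args(args)
        .status() // spawns the process and waits for it to exit
        .map_err(|e| format!("failed to spawn {program}: {e}"))?;
    if status.success() {
        Ok(())
    } else {
        Err(format!("{program} exited with {status}"))
    }
}

fn main() {
    // A real call site might be: run_command("gdalbuildvrt", &["mosaic.vrt", "a.tif", "b.tif"])
    match run_command("gdalinfo", &["--version"]) {
        Ok(()) => println!("gdalinfo ran successfully"),
        Err(e) => eprintln!("error: {e}"),
    }
}
```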

### Deprecated

- `reduce()` — use `apply_reduction()` instead. The new method's worker receives `&RasterDataBlock<R>` with metadata.
- `reduce_with_mask()` — use `apply_reduction_with_mask()` instead. Workers receive `&RasterDataBlock` pairs.
- `reduce_row_pixel()` — use `apply_reduction_row_pixel()` instead (same signature).
- `reduce_row_pixel_with_mask()` — use `apply_reduction_row_pixel_with_mask()` instead (same signature).
- `mosaic()` — use `apply_mosaic()` instead.

### Removed

- `process()` — use `apply()` instead: `rds.apply(|rdb| Ok(worker(&rdb.data)), n_cpus, out)?`
- `process_with_mask()` — use `apply_with_mask()` instead
- `process_block()` (private helper)
- `assemble_time_step()` (private helper)
- `process_hybrid()` and `process_block_hybrid()` — deprecated since 0.3.2, zero callers, fully removed
- `data/**/*` and `examples/**/*` removed from published crate tarball (still available in git repo)
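
The one-line migration for `process()` works because wrapping an infallible result in `Ok(...)` satisfies the fallible worker signature. The mock below is not the crate's API: `Block` stands in for `RasterDataBlock`, the mock `apply` collects results in memory instead of writing an output raster and takes no `n_cpus`, and `Box<dyn Error>` replaces `anyhow::Error` so the sketch has no dependencies.

```rust
use std::error::Error;

/// Hypothetical stand-in for `RasterDataBlock`: only the `data` field matters here.
struct Block {
    data: Vec<f32>,
}

/// Old-style infallible worker, as previously passed to `process()`.
fn scale(data: &[f32]) -> Vec<f32> {
    data.iter().map(|v| v * 2.0).collect()
}

/// Mock of the new calling convention: the worker returns a `Result`,
/// and the first error aborts the whole run via `?`.
fn apply<F>(blocks: &[Block], worker: F) -> Result<Vec<Vec<f32>>, Box<dyn Error>>
where
    F: Fn(&Block) -> Result<Vec<f32>, Box<dyn Error>>,
{
    let mut out = Vec::with_capacity(blocks.len());
    for block in blocks {
        out.push(worker(block)?);
    }
    Ok(out)
}

fn main() -> Result<(), Box<dyn Error>> {
    let blocks = vec![Block { data: vec![1.0, 2.0] }, Block { data: vec![3.0] }];
    // Wrapping the old worker's output in `Ok(...)` is the whole migration.
    let out = apply(&blocks, |rdb| Ok(scale(&rdb.data)))?;
    println!("{:?}", out);
    Ok(())
}
```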

## [0.3.1]

### Changed

- Updated `ndhistogram` 0.10 → 0.12
  - Migrated from removed `azip!` macro to `ndarray::azip!`
  - Updated `Histogram` trait import path to `ndhistogram::Histogram`
  - Updated `UniformNoFlow` import path to `ndhistogram::axis::UniformNoFlow`
  - Replaced `VecHistogram` iteration with `Histogram::iter()` trait method

- Updated `polars` 0.51 → 0.53
  - Replaced removed `Series::new()` with `Column::new()`
  - Replaced `DataFrame::new(columns)` with `DataFrame::new_infer_height(columns)`
  - Replaced removed `concat()` for DataFrames with `polars::functions::concat_df_diagonal()`
  - Added `diagonal_concat` feature flag for DataFrame concatenation

- Updated `opencv` 0.92 → 0.98
  - Updated `Result` type import from `opencv::result::Result` to `opencv::Result`

- Updated `lgbm` to latest version
  - Migrated imports from submodule paths to crate-root re-exports (`lgbm::Booster`, `lgbm::Dataset`, etc.)

### Removed

- Removed dead code causing compiler warnings:
  - Unused `UtilsDimension` type alias in `builder.rs`
  - Unused `raster_from_size` function in `processing.rs` (duplicate of `gdal_utils::raster_from_size`)
  - Unused `create_temp_file` function in `io.rs` (not called within module)
  - Unused `BlockSize` import in `processing.rs` (no longer needed after `raster_from_size` removal)