n3gb-rs 0.2.2

A Rust implementation of a hierarchical hex-based spatial indexing system based on the OSGB National Grid.
Documentation
# Architecture

THIS WAS CREATED BY CLAUDE AND REVIEWED BY CHRIS CARLON.

I will change this as and when things spring to mind.

A guided on how `n3gb-rs` works internally, and a cookbook for changing and
extending it. 

## The one-paragraph mental model

A coordinate — either British National Grid (BNG / EPSG:27700) easting/northing,
or WGS84 lon/lat that gets **projected to BNG first** — is mapped to an integer
`(row, col)` on an offset hexagon lattice. The lattice spacing comes from per-zoom
constant tables. That `(row, col)` yields a hexagon **center** point, and the center +
zoom level is encoded into a 19-byte, checksummed, URL-safe Base64 **id**. A
`HexCell` is one such cell; a `HexGrid` is a collected, spatially-indexed
`Vec<HexCell>`. Everything else in the crate is either (a) turning geometry into
cells, or (b) turning cells into an output (Arrow / GeoParquet / CSV).

> **Important:** unlike H3, the n3gb index is **not hierarchical**. Each zoom level
> is an independent lattice — there is no built-in parent/child relationship
> between a cell at zoom 10 and one at zoom 11.

## 1. Module map

All modules are declared private in `lib.rs`; the public API is defined entirely
by the `pub use` re-export block (`src/lib.rs:206`). To change what users can see,
you edit that block.

```
lib.rs                  crate root — module declarations + public re-exports
├── cell.rs             HexCell (single hexagon)            → coord, error, geom, index, io
├── grid.rs             HexGrid + HexGridBuilder            → cell, coord, error, index, io
├── coord/              Crs, ConversionMethod, Coordinate trait, BNG transforms
│   ├── mod.rs
│   └── bng_transformations.rs
├── geom/               create_hexagon, parse_geometry      (leaf — no internal deps)
│   ├── mod.rs
│   ├── hexagon.rs
│   └── parse.rs
├── index/              row/col math, Base64 id encode/decode, constant tables
│   ├── mod.rs
│   ├── constants.rs
│   ├── indexing.rs
│   └── identifier.rs
├── dimensions.rs       HexagonDims — pure hexagon math      → error
├── error.rs            N3gbError enum + From conversions     (leaf)
└── io/                 Arrow / GeoParquet / CSV export
    ├── mod.rs
    ├── arrow.rs
    ├── parquet.rs
    └── csv.rs
```

**Dependency rule worth remembering:** `geom/` is a leaf module — it must not
depend on `cell` (that would be circular, since `cell` uses `geom`). Domain logic
that *creates* `HexCell`s lives in `cell.rs`, not in `geom/`.

## 2. The data model

### `HexCell` (`src/cell.rs:41`)

```rust
pub struct HexCell {
    pub id: String,          // URL-safe Base64 unique identifier
    pub center: Point<f64>,  // BNG coordinates of the hexagon center
    pub zoom_level: u8,      // 0–15
    pub row: i64,            // offset-lattice row
    pub col: i64,            // offset-lattice column
}
```

Key methods:

| Method | Where | What it does |
|--------|-------|--------------|
| `from_bng(coord, zoom)` | `cell.rs:173` | One cell from a BNG coordinate |
| `from_wgs84(coord, zoom, method)` | `cell.rs:203` | Projects WGS84 → BNG, then indexes |
| `from_hex_id(id)` | `cell.rs:76` | Reconstructs a cell from its id string |
| `from_geometry(geom, zoom, crs, method)` | `cell.rs:217` | Dispatches any `geo_types::Geometry``Vec<HexCell>` |
| `from_line_string_bng/_wgs84` | `cell.rs:92/148` | All unique cells a line passes through |
| `easting()` / `northing()` | `cell.rs:342/347` | Center X / Y |
| `to_polygon()` | `cell.rs:355` | Hexagon boundary as a `geo_types::Polygon` |
| `grid_distance(&other)` | `cell.rs:328` | Hex-step distance (same zoom required) |

### `HexGrid` (`src/grid.rs:51`)

```rust
pub struct HexGrid {
    cells: Vec<HexCell>,
    index: HashMap<(i64, i64), usize>,  // (row, col) → position in `cells`
    zoom_level: u8,
}
```

The `HashMap` is what makes `get_cell_at(point)` (`grid.rs:375`) an O(1) lookup
instead of a linear scan. Constructors come in BNG and WGS84 pairs:
`from_bng_extent` / `from_wgs84_extent`, `from_rect`, `from_bng_polygon` /
`from_wgs84_polygon`, `from_bng_multipolygon` / `from_wgs84_multipolygon`. Query
helpers: `len`, `is_empty`, `cells`, `iter`, `filter`, `to_polygons`. Both
`&HexGrid` and `HexGrid` implement `IntoIterator`.

### `HexGridBuilder` (`src/grid.rs:456`)

The chainable front door, and the most ergonomic way to build a grid:

```rust
let grid = HexGrid::builder()
    .zoom_level(12)
    .bng_extent(&(300_000.0, 300_000.0), &(350_000.0, 350_000.0))
    .build()?;
```

Setters mirror the constructors (`rect`, `bng_extent`, `wgs84_extent`,
`bng_polygon`, `wgs84_polygon`, `bng_multipolygon`, `wgs84_multipolygon`,
`conversion_method`). `build()` validates and produces the `HexGrid`.

## 3. The spatial pipeline (step by step)

This is the heart of the crate. Follow one coordinate from input to id.

**Step 0 — (optional) WGS84 → BNG.** `coord/bng_transformations.rs::convert_to_bng`
dispatches on `ConversionMethod`:

- `Ostn15` (**default**) — uses the `lonlat_bng` crate. The OSTN15 grid-shift data
  is embedded at compile time, so it's ~1 mm accurate with **no system
  dependencies**. This is why it's the default: a self-contained, reliable path.
- `Proj` — uses the system PROJ library via the `proj` crate. A `Proj` object is
  cached thread-locally (lazy init, once per thread). Accuracy is ~1 mm *with* the
  OSTN15 grid file installed, ~5 m (Helmert fallback) without. PROJ is a heavier,
  riskier build dependency — hence it's opt-in.

**Step 1 — BNG point → (row, col).** `index/indexing.rs::point_to_row_col`:

```text
dx = CELL_WIDTHS[zoom]
dy = 1.5 * CELL_RADIUS[zoom]
qx = (x - GRID_EXTENTS[0]) / dx
ry = (y - GRID_EXTENTS[1]) / dy
row = round(ry)
col = round(qx - (row mod 2))      // odd rows shift half a cell — offset coords
```

**Step 2 — (row, col) → center.** `row_col_to_center` is the exact inverse:

```text
x = GRID_EXTENTS[0] + col*dx + (row mod 2)*(dx/2)
y = GRID_EXTENTS[1] + row*dy
```

**Step 3 — center + zoom → id.** `index/identifier.rs::generate_hex_identifier`
packs a 19-byte buffer, then URL-safe Base64 (no padding):

| Bytes | Field |
|-------|-------|
| 0 | version (`IDENTIFIER_VERSION = 1`) |
| 1–8 | `easting × 1000` as big-endian `u64` |
| 9–16 | `northing × 1000` as big-endian `u64` |
| 17 | zoom level |
| 18 | checksum (wrapping sum of bytes 0–17) |

`SCALE_FACTOR = 1000` preserves 3 decimal places (millimetre precision).

**Step 4 — id → components (round trip).** `decode_hex_identifier` Base64-decodes,
checks length is 19, verifies the checksum, checks the version, then divides the
integers back by 1000.

### Worked example (Manchester-ish, zoom 10)

```text
WGS84 (lon -2.248, lat 53.481)
  → OSTN15 → BNG (easting ≈ 385500.123, northing ≈ 339874.456)
  → dx = CELL_WIDTHS[10] = 130.0, dy = 1.5 * 75.06 = 112.59
  → row = round(339874.456 / 112.59) = 3019
  → col = round(385500.123/130 - (3019 mod 2)) = 2964
  → center ≈ (385385, 340213)
  → id bytes [1, <385500123 be>, <339874456 be>, 10, checksum]
  → Base64 string
```

### Cube coordinates and distance

`offset_to_cube(row, col)` converts to cube coords (`q = col - row/2`, `r = row`,
`s = -q - r`, with `q + r + s = 0`). This is the bridge used by
`grid_distance`, which is Manhattan distance in cube space. Cells must share a
zoom level or you get `N3gbError::ZoomLevelMismatch`.

## 4. The constant tables (the tuning knobs)

`index/constants.rs` *is* the grid definition. Three length-16 arrays, indexed by
zoom:

- `GRID_EXTENTS = [0.0, 0.0, 750000.0, 1350000.0]``[min_x, min_y, max_x, max_y]`,
  the BNG bounds of Great Britain.
- `CELL_RADIUS[16]` — center-to-vertex distance per zoom (zoom 0 ≈ country scale,
  zoom 15 ≈ 0.58 m).
- `CELL_WIDTHS[16]` — hexagon width per zoom (zoom 15 ≈ 1 m).
- `MAX_ZOOM_LEVEL = 15`, `IDENTIFIER_VERSION = 1`.

The relationship the pipeline relies on: `dy = 1.5 * radius`, `dx = width`. Editing
a row in these arrays changes cell sizes at that zoom — and therefore changes
every id produced at that zoom (see the versioning note in §9).

## 5. Geometry helpers (`geom/`)

- `hexagon.rs::create_hexagon(center, size)` — builds a **pointy-top** hexagon:
  vertices at angles `30° + 60°·i`, 7 coordinates total (the last repeats the
  first to close the ring). This backs `HexCell::to_polygon()`.
- `parse.rs::parse_geometry(s)` — auto-detects format: a leading `{` means
  **GeoJSON**, otherwise **WKT**. Both funnel through `geo_types::Geometry`. A
  GeoJSON `FeatureCollection` is rejected (single geometries/features only).

## 6. Dimensions (`dimensions.rs`)

A standalone hexagon-math utility, *not* part of the indexing pipeline.
`HexagonDims` holds side, circumradius, apothem, across-corners, across-flats,
perimeter, area. Construct it from any one measurement: `from_side`,
`from_circumradius`, `from_apothem`, `from_across_flats`, `from_across_corners`,
`from_area`, plus `bounding_box`. All validate positive inputs and return
`N3gbError::InvalidDimension` otherwise.

## 7. IO layer (`io/`)

The export API is built from two **blanket traits** implemented for any
`T: AsRef<[HexCell]>`. That single bound means they work uniformly on `&HexCell`,
`Vec<HexCell>`, and `&[HexCell]` — a lone cell is handled via
`std::slice::from_ref(self)`.

- `HexCellsToArrow` (`io/arrow.rs:20`): `to_arrow_points()`, `to_arrow_polygons()`
  (parallelised with rayon), `to_record_batch()`. The record batch has 7 columns:
  `id`, `zoom_level`, `row`, `col`, `easting`, `northing`, `geometry`. CRS metadata
  is EPSG:27700.
- `HexCellsToGeoParquet` (`io/parquet.rs:47`): `to_geoparquet(path)` builds the
  record batch then calls `write_geoparquet`, which uses WKB geometry encoding and
  appends GeoParquet key-value metadata.

CSV (`io/csv.rs`) is configuration-driven via `CsvHexConfig` (builders
`new` / `from_coords`, plus `.exclude()`, `.crs()`, `.with_hex_geometry()`,
`.conversion_method()`, `.hex_density()`). Input can be a single geometry column
(`CoordinateSource::GeometryColumn`) or separate X/Y columns
(`CoordinateColumns`); optional hex geometry output is WKT or GeoJSON
(`GeometryFormat`). `csv_to_hex_csv` streams input→output and can aggregate to a
per-hex density count.

## 8. Error model (`error.rs`)

`N3gbError` is the single error type (11 variants: invalid id length, bad
checksum, unsupported version, invalid zoom/dimension, base64 decode, projection,
io, csv, geometry parse, zoom mismatch). It implements `Display` + `std::error::
Error`, and has `From` conversions for `std::io::Error`, `csv::Error`,
`arrow_schema::ArrowError`, and `parquet::errors::ParquetError` — which is what
lets `?` propagate cleanly across the IO boundary. Convention: every fallible
public function returns `Result<T, N3gbError>`.

## 9. How to extend it (cookbook)

Each recipe names the file(s) to touch and the existing pattern to copy.

**Add a HexCell query method** (e.g. `neighbors()`)
→ Add an `impl HexCell` method in `cell.rs`. Use the cube-coordinate helpers from
`index` (`offset_to_cube`) for neighbour math. Inherent methods are public
automatically — no re-export needed.

**Add a constructor from a new geometry type**
→ Follow `from_line_string_bng` in `cell.rs`. If it accepts WGS84 input, add the
matching `convert_*_to_bng` helper in `coord/bng_transformations.rs` and provide a
`_wgs84` variant alongside the `_bng` one.

**Add a new grid source**
→ Add a `from_*` constructor on `HexGrid` (`grid.rs`) that produces a
`Vec<HexCell>` and feeds the internal index-building path. Then add the matching
setter on `HexGridBuilder` and wire it into `build()`.

**Add or retune a zoom level**
→ Edit the arrays in `index/constants.rs`. Keep all three length-16 arrays in sync;
bump `MAX_ZOOM_LEVEL` if you add entries. ⚠️ This changes the ids produced at the
affected zooms — coordinate with the versioning note below.

**Add a new output format** (e.g. GeoJSON export)
→ Add a trait in `io/` over `T: AsRef<[HexCell]>`, mirroring `HexCellsToArrow`,
then re-export it from `lib.rs`.

**Add a new error case**
→ Add a variant to `N3gbError`, a matching arm in its `Display` impl, and a `From`
impl if it wraps a foreign error type.

**Change the id binary format**
→ Bump `IDENTIFIER_VERSION` in `constants.rs` and branch on the version byte in
`decode_hex_identifier` so existing ids still decode.

**Every change:**
- Re-export any new public type/function from the `pub use` block in `lib.rs`
  (`src/lib.rs:206`) — that block defines the entire public surface.
- Add tests as `#[cfg(test)] mod tests` **in the same file** (this repo keeps
  tests inline; there is no `tests/` directory).
- If it's a user-facing entry point, add an example under `examples/`.

## 10. Build, test, run

```bash
cargo test                       # inline unit tests
cargo run --example cell_basics  # 5 examples — see below
cargo clippy --all-targets       # lints (Cargo.toml sets clippy all = warn)
cargo fmt                        # formatting
```

PROJ system library is needed **only** for `ConversionMethod::Proj` and the
`check_proj` example (`brew install proj` / `apt-get install libproj-dev`). The
default `Ostn15` path needs nothing extra.

**CI today:** `.github/workflows/ci.yml` runs `cargo audit` + `cargo test`;
`.github/workflows/security.yml` runs cargo-deny (push/PR + daily). There are no
`fmt` / `clippy` / `doc` CI jobs yet — a known gap.

## 11. Suggested reading order

Read each example with its underlying module open alongside:

1. `examples/cell_basics.rs``HexCell` creation, ids, round-trips → `cell.rs`, `index/`
2. `examples/grid_distance.rs` — spatial relationships, `grid_distance``index/indexing.rs`
3. `examples/grid_from_polygon.rs` — the builder + polygon clipping → `grid.rs`
4. `examples/line_coverage.rs` — line/geometry coverage → `cell.rs`
5. `examples/check_proj.rs` — WGS84 conversion backends → `coord/`

## 12. Known future work (from `TODO.md`)

- Grid traversal (path/cells-between) beyond the single-cell `grid_distance`.
- Geometry validation / bounds checking on input.
- Hierarchical parent/child navigation — non-trivial because the index is not
  hierarchical; would mean re-indexing the same center at another zoom.