ndatafusion 0.0.1-reserve.0

Extensions and support for linear algebra in DataFusion
# AGENTS.md

Repository-level instructions for human and LLM contributors.

## Scope

Applies to the entire repository.

## Mission

Build `ndatafusion` into a production-grade DataFusion extension crate that exposes `nabled`
capabilities through explicit Arrow/DataFusion contracts without forking numerical semantics from
`nabled`.

## Mandatory Context Bootstrap

Before making architectural or broad refactor changes, read these in order:

1. `docs/README.md`
2. `docs/DECISIONS.md`
3. `docs/CAPABILITY_MATRIX.md`
4. `docs/EXECUTION_TRACKER.md`
5. `docs/PUBLISH_CHECKLIST.md` when release, docs.rs, or publish posture is relevant
6. `docs/ARCHITECTURE.md`
7. `docs/STATUS.md`

Do not infer status from memory. Use:

1. `docs/STATUS.md` for current project state.
2. `docs/CAPABILITY_MATRIX.md` for scope gaps and v1 sufficiency.
3. `docs/EXECUTION_TRACKER.md` for `Done / Next / Needed` execution state.
4. Resume from the highest-priority open `N-*` item in `docs/EXECUTION_TRACKER.md` unless
   explicitly redirected.
5. If the highest-priority open `N-*` item is an upstream prerequisite in `../ndarrow` or
   `../nabled`, complete that work first before resuming local `ndatafusion` implementation.

## Non-Negotiable Constraints

1. `ndatafusion` is a DataFusion-facing facade over `nabled`; it does not own new numerical
   kernels.
2. `nabled` remains the source of truth for algorithm semantics, provider/backend/kernel routing,
   and Arrow boundary assumptions.
3. Capability parity matters more than 1:1 overload mirroring.
4. Admit only natural Arrow/DataFusion contracts with explicit shape and result semantics.
5. No hidden copy-heavy conversions in hot paths.
6. Prefer zero-copy row extraction and explicit materialization only when the output contract
   requires it.
7. Preserve `nabled` feature forwarding exactly, except `nabled/arrow`, which is part of the base
   `ndatafusion` contract and is enabled unconditionally.
8. Keep execution-axis terminology explicit and consistent:
   - `Provider` = decomposition implementation source
   - `Backend` = primitive-kernel execution target
   - `Kernel` = operation-family backend contract
9. Use the new module layout only. Do not create `mod.rs`.
10. Prefer direct batch-native delegation into stabilized lower-layer Arrow APIs. Treat row/cell
    codecs as fallback-only when the lower layers do not yet admit the needed contract.
11. Cross-repo prerequisite work must not regress correctness or performance in already admitted
    lower-layer public behavior.
12. Do not narrow or remove broader lower-layer public APIs solely because `ndatafusion` chooses a
    stricter SQL-facing contract surface.

## Repository Rules

1. Treat `.justfile` as the canonical task runner.
2. Run `just checks` after every legitimate checkpoint.
3. `just checks` is not optional. It runs:
   - `cargo +nightly fmt --all -- --check --config-path ./rustfmt.toml`
   - nightly/stable clippy over the supported `no-default-features` feature matrix, including
     provider-gated and accelerator-gated combinations
   - all unit and integration tests via `just test`
4. Keep the `e2e` integration test target present, because `.justfile` expects
   `cargo test --test e2e`.
5. Before any commit is pushed, line coverage must be greater than 90%. Use `cargo llvm-cov` via
   the coverage commands in `.justfile` to verify this.
6. Clippy pedantic lints are enabled in `Cargo.toml` and must be addressed.
7. Do not silence pedantic lints with `allow` unless there is no viable alternative.
8. Prefer fixing the underlying code. If a conscious assertion is needed, `expect` is preferable
   to `allow`.
9. Any unavoidable lint exception must be narrowly scoped and justified.
10. The guiding principle for all design and implementation work is:
    - `algebraic, homomorphic, compositional built from denotational semantics`

## Quality Gates

Run and pass before finalizing:

1. `just checks`

Coverage expectation:

1. Keep line coverage greater than 90% as a hard gate for merge/push.
2. If coverage drops below this threshold, add meaningful tests before finalizing.
3. Prefer meaningful coverage over synthetic assertions.

## Test Placement

1. Unit tests in module-local `#[cfg(test)]` blocks.
2. `tests/` for cross-module integration/e2e behavior.

## Documentation Discipline

When architecture or behavior changes, update docs in the same change set:

1. `README.md` for user-visible behavior.
2. `docs/STATUS.md` for migration and implementation truth.
3. `docs/EXECUTION_TRACKER.md` for execution progression (`Done / Next / Needed`).
4. `docs/CAPABILITY_MATRIX.md` for scope and sufficiency updates.
5. `docs/ARCHITECTURE.md` if the target design or sequencing changes.
6. `docs/DECISIONS.md` when assumptions or constraints are locked or changed.
7. `docs/PUBLISH_CHECKLIST.md` when release posture, docs.rs assumptions, or publish blockers
   change.
8. Relevant rustdoc comments for API contract changes.

## Reference Sources

1. Use `../ndarrow` as the primary Arrow/ndarray contract reference.
2. Use `../nabled` as the primary numerical semantics and Arrow façade reference.
3. Use `../../packages/datafusion` to validate DataFusion interfaces and compatibility details.
4. Use `../../packages/datafusion-functions-json` as a reference for standalone DataFusion function
   library structure.
5. When dependency work touches Arrow compatibility, verify `datafusion`, `ndarrow`, and `nabled` Arrow
   versions explicitly before changing versions.