feral-amd
Approximate Minimum Degree (AMD) fill-reducing ordering for sparse symmetric matrices, implemented in pure Rust using the in-place quotient-graph algorithm of Amestoy, Davis and Duff (1996, 2004).
- Status: feature-complete (Slice A + Slice B of the development plan). Byte-for-byte match with the SuiteSparse AMD reference on the pinned oracle fixture suite.
- License: MIT
- Dependencies: none in the runtime library.
- MSRV: stable Rust, edition 2021. No
unsafe.
What AMD does
Given the sparsity pattern of a symmetric matrix A, AMD computes a
permutation P such that the Cholesky factorization P A Pᵀ = L Lᵀ
(or symmetric indefinite P A Pᵀ = L D Lᵀ) introduces far fewer
non-zeros than the natural ordering. "Fill" is the set of zero
entries in A that become non-zero in L; minimising fill is NP-hard
in general, so AMD is a greedy heuristic that in practice rivals
nested dissection on small-to-medium matrices and is much cheaper to
compute.
The canonical reference algorithm is Tim Davis's amd_2.c inside
SuiteSparse (BSD-3-Clause). feral-amd is a clean-room Rust
transliteration, cross-checked line-by-line against the in-tree
faer-rs port of the same
algorithm.
Literature
- Amestoy, P. R., Davis, T. A., and Duff, I. S. (1996). An Approximate Minimum Degree Ordering Algorithm. SIAM Journal on Matrix Analysis and Applications, 17(4), 886–905. Introduces the approximate external degree bound that makes the algorithm O(|A|) per pivot instead of O(|A|·n).
- Amestoy, P. R., Davis, T. A., and Duff, I. S. (2004). Algorithm 837: AMD, an Approximate Minimum Degree Ordering Algorithm. ACM Transactions on Mathematical Software, 30(3), 381–388. Describes the implementation details pinned down in SuiteSparse: dense-row handling, mass elimination, supervariable detection, the in-place quotient graph, and garbage collection.
- George, A. (1973). Nested Dissection of a Regular Finite Element Mesh. SIAM Journal on Numerical Analysis, 10(2), 345–363. The alternative ordering paradigm; AMD is complementary rather than competitive — they target different matrix regimes.
- Davis, T. A., Rajamanickam, S., and Sid-Lakhdar, W. M. (2016). A Survey of Direct Methods for Sparse Linear Systems. Acta Numerica, 25, 383–566. Places AMD in the broader ecosystem of direct methods.
Full BibTeX entries live in ../../dev/references.bib. Curated
organisation-level notes are in ../../.crucible/wiki/concepts/approximate-minimum-degree.org
and ../../.crucible/wiki/summaries/amestoy1996-amd.org / amestoy2004-amd-implementation.org.
Algorithm
AMD operates on the quotient graph G/S of A, where S is the set
of supervariables eliminated so far. At each step it:
- Selects the lowest-degree supervariable
mefrom degree buckets indexed bymindeg(linear scan, LIFO tie-break). - Forms a new element by merging
me's adjacency with every element already inme's list. Supervariables that were in those elements are absorbed into the new element; the old elements are marked dead (standard absorption,amd_2.caround line 355). - Updates the approximate external degree of every surviving
neighbour using the two-pass
w[e]trick from the 1996 paper. Dead elements encountered during the degree computation are folded intomeon the spot (aggressive absorption). - Mass-eliminates neighbours
iwhose only remaining element ismeand whose outside variable list is empty — they pivot concurrently withme. - Detects and merges supervariables that are structurally indistinguishable: variables with the same adjacency hash, same element count and variable count, and literally the same neighbour set. Merged variables share a single pivot block in the final permutation.
- Re-inserts the surviving neighbours into the degree buckets with their updated approximate degree.
Dense-row handling
Variables whose initial degree exceeds max(16, min(n, α·√n)) with
α = 10 by default are classified as "dense" and deferred to the end
of the permutation. This is a structural device — it keeps a few hub
vertices (e.g. an arrow-matrix's central row) from dominating the
degree computation for every other variable. A negative α disables
the heuristic.
Garbage collection
The algorithm maintains its entire quotient-graph state in a single
Vec<i32> of length nzaat + nzaat/5 + n (integer division). When
the workspace cursor pfree catches up with the end of that arena,
an inline compaction sweep slides every live adjacency list down to
close the holes left by absorbed elements. The number of compactions
fired is reported as AmdStats::ncmpa.
Element post-order
After elimination the algorithm performs a postorder traversal of the AMD-internal assembly tree (roots = final pivots without a parent), using a big-child-last heuristic so the final permutation groups siblings by size. Supervariables absorbed during step 5 are expanded into their pivot block at this point, and dense-deferred variables are appended.
Usage
Library
use ;
// Full symmetric CSC pattern (both halves present, rows sorted).
// Example: 4x4 tridiagonal.
let col_ptr = ;
let row_idx = ;
let pattern = new.expect;
// Shortest form: just the permutation.
let perm = amd_order.expect;
assert_eq!;
// Permutation + diagnostic counters (compactions, flops, ...).
let = amd_order_with_stats.expect;
println!;
// Custom options.
let opts = AmdOptions ;
let _ = amd_order_opts;
The input must be the full symmetric pattern — both the upper and
lower triangles. If your matrix is stored as upper-triangular only,
symmetrise with A + Aᵀ - diag(A) before handing it to feral-amd.
CLI
$ cargo run --release --bin feral-amd -- path/to/triplet.txt
n: 5
nnz: 13
ncmpa: 0
n_dense_deferred: 0
ndiv: 4
nms_ldl: 4
nms_lu: 4
perm: 4 3 2 0 1
The triplet format is one row col pair per line, 0-indexed, with
# and % comments allowed. The CLI symmetrises the input.
Benchmark skeleton
$ cargo run --release --bin feral-amd-bench
arrow(200) n= 200 nnz= 598 ncmpa= 0 ndense= 1 ndiv= 199 time=0.005ms
band(100, 5) n= 100 nnz= 1070 ncmpa= 1 ndense= 0 ndiv= 488 time=0.017ms
band(500, 10) n= 500 nnz= 10390 ncmpa= 1 ndense= 0 ndiv= 4955 time=0.120ms
grid(30x30) n= 900 nnz= 4380 ncmpa= 1 ndense= 0 ndiv= 9331 time=0.163ms
grid(60x60) n= 3600 nnz= 17760 ncmpa= 1 ndense= 0 ndiv= 56165 time=0.513ms
Evidence that it works
External-oracle match
tests/data/amd_oracle/ contains seven fixtures whose reference
output was pinned by a throwaway harness running the BSD-licensed
amd 0.2.2 Rust crate (a port of
Timothy Davis's SuiteSparse AMD). For each fixture we store the
permutation and flop counters that the SuiteSparse implementation
produced.
tests/oracle_match.rs rebuilds each pattern from the same
programmatic generator the harness used, runs
amd_order_with_stats, and asserts exact equality — not just on
the permutation, but on ncmpa, ndiv, nms_ldl, nms_lu, and
n_dense_deferred.
| fixture | n | description | perm + stats match? |
|---|---|---|---|
diag_4 |
4 | pure diagonal | ✓ |
tridiag_10 |
10 | tridiagonal | ✓ |
arrow_5 |
5 | hub at 0 + diagonal | ✓ |
arrow_200 |
200 | hub at 0 + diagonal (tests dense) | ✓ |
band_20_3 |
20 | banded, bandwidth 3 (tests GC) | ✓ |
grid_7x7 |
49 | 2D 5-point stencil | ✓ |
amd_demo_24 |
24 | 6×4 grid (canonical demo substitute) | ✓ |
Two additional tests assert n_mass_elim > 0 on patterns where
mass elimination must fire (tridiag_10, band_20_3, grid_7x7), and
two assert n_supervar_merge > 0 on patterns where supervariable
detection must fire (band_20_3, grid_7x7). These guard against a
regression that silently disables either Slice B branch even if the
permutation still happens to line up.
Unit tests
src/ contains 36 #[cfg(test)] unit tests covering:
- Input validation in
CscPattern::new(bad lengths, monotone col_ptr, out-of-range row indices). AmdWorkspace::newfast paths: empty pattern, pure diagonal, dense-deferred hub, zero-degree variable,dense_alpha < 0.clear_flagwrap behaviour at thewbigboundary.flipinvolution.- Individual
run_eliminationfixtures (arrow, grid, band with GC). finalize_permutationbijection on every fixture.
Run everything with:
$ cargo test -p feral-amd
test result: ok. 36 passed; 0 failed; 0 ignored
test result: ok. 12 passed; 0 failed; 0 ignored
Static checks
$ cargo clippy -p feral-amd --all-targets -- -D warnings # clean
$ cargo fmt --check # clean
The library uses #![forbid(unsafe_code)] and #![deny(missing_docs)].
Clean-room isolation
The amd crate appears only in the throwaway fixture-generation
harness at tests/data/amd_oracle/harness/*.txt, which is not built
by Cargo. crates/feral-amd/Cargo.toml has zero runtime dependencies
and no dev-dependencies on amd. A workspace grep guards the
invariant:
$ grep -r 'amd = "' crates/feral-amd/Cargo.toml | grep -v harness
# (no matches)
Architecture
src/
├── lib.rs Public surface: amd_order, amd_order_with_stats,
│ amd_order_opts, AmdOptions.
├── pattern.rs CscPattern — borrowed CSC sparsity pattern with
│ validation.
├── error.rs AmdError (IndexOverflow, NonSymmetric,
│ MalformedInput).
├── stats.rs AmdStats (ncmpa, n_mass_elim, n_supervar_merge,
│ n_dense_deferred, ndiv, nms_ldl, nms_lu).
├── workspace.rs AmdWorkspace::new — owns pe/iw/len/nv/elen/degree
│ scratch arrays and runs initialization (dense-row
│ classification, zero-degree fast path, degree-bucket
│ construction).
├── algo.rs run_elimination + finalize_permutation — the main
│ loop and the post-order expansion.
└── bin/
├── feral-amd.rs Triplet-file CLI.
└── feral-amd-bench.rs Small in-crate bench skeleton.
Every function that ports a block of faer's amd.rs cites the faer
line range in its doc comment so the translation is auditable.
Limitations and scope
- Input type is deliberately minimal.
CscPattern<'a>borrows two&[usize]slices. The crate does not depend on any sparse matrix library; downstream solvers convert at the boundary. - Not yet integrated into
feral. Integration will come viadev/plans/ordering-integration.mdonce the sibling ordering crates (METIS, SCOTCH, KaHIP) also exist. - Matrices must be structurally symmetric. AMD operates on
A + Aᵀ; handing it an unsymmetric pattern produces meaningless orderings. Debug builds assert symmetry; release builds trust the caller for the sake of speed. i32index space. The scratch arrays usei32with a sentinel of-1, matching SuiteSparse. Matrices too large fori32workspaces returnAmdError::IndexOverflowfromAmdWorkspace::newrather than silently wrapping.
Reproducing the oracle fixtures
The throwaway harness is preserved under tests/data/amd_oracle/harness/
as extensionless .txt files so Cargo does not build them:
mkdir -p /tmp/amd_oracle && cd /tmp/amd_oracle
cp .../tests/data/amd_oracle/harness/main.rs.txt src/main.rs
cp .../tests/data/amd_oracle/harness/Cargo.toml.txt Cargo.toml
cargo run --release -- /tmp/amd_oracle_out
diff -ru /tmp/amd_oracle_out .../tests/data/amd_oracle/
Harness SHA-256s are listed in tests/data/amd_oracle/README.md so
re-generated fixtures can be audit-checked.
Plan and history
- Full design and commit-by-commit plan:
../../dev/plans/ordering-amd-upgrade.md - Per-commit journal (decisions, tried-and-rejected, evidence):
../../dev/journal/2026-04-16-01.org - Session checkpoint:
../../dev/sessions/2026-04-16-01.md