Expand description
§Natural Breaks
A Rust implementation of the Jenks natural breaks classification algorithm
for optimal partitioning of one-dimensional data into k classes that minimise
within-class variance (WCSS).
§Features
- Generic over any numeric type that implements
ToPrimitive(f64,f32,i32,u64, etc.) - Returns clustered values or zero-copy index ranges
- Built-in sort variants for unsorted input (with NaN detection)
- Two algorithm implementations:
- O(kn²) with an early-exit pruning optimisation (
k_n2::KNSquared) - O(kn log n) divide-and-conquer DP (
k_nlogn::KNLogN)
- O(kn²) with an early-exit pruning optimisation (
- Optional
low-memoryfeature: reduces the O(kn log n) variant from O(kn) to O(n) memory at the cost of O(k²n log n) time
§Quick start
Add to your Cargo.toml:
[dependencies]
natural_breaks = "0.2"§Classify unsorted data
use natural_breaks::classify_with_sort;
let data = vec![12.0, 1.0, 11.0, 2.0, 10.0, 3.0];
let clusters = classify_with_sort(data, 2).unwrap();
// [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]§Classify pre-sorted data
use natural_breaks::classify;
let data = vec![1, 2, 3, 10, 11, 12];
let clusters = classify(data, 2).unwrap();
// [[1, 2, 3], [10, 11, 12]]§Get index ranges instead of values
use natural_breaks::classify_indices;
let data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0];
let ranges = classify_indices(&data, 2).unwrap();
// [(0, 3), (3, 6)] — half-open ranges [start, end)§API overview
| Function | Input | Returns | Expects sorted? |
|---|---|---|---|
classify | Vec<T> | ClassifiedResult<T> | Yes |
classify_with_sort | Vec<T> | ClassifiedResult<T> | No |
classify_indices | &[T] | IndexRanges | Yes |
classify_indices_with_sort | Vec<T> | IndexRanges | No |
For direct access to a specific algorithm implementation, use the module:
use natural_breaks::k_n2::KNSquared;
let clusters = KNSquared::classify(vec![1.0, 2.0, 3.0, 10.0, 11.0, 12.0], 2).unwrap();use natural_breaks::k_nlogn::KNLogN;
let clusters = KNLogN::classify(vec![1.0, 2.0, 3.0, 10.0, 11.0, 12.0], 2).unwrap();§Low-memory mode
Enable the low-memory feature to reduce the O(kn log n) algorithm’s memory
usage from O(kn) to O(n), at the cost of increased time complexity
(O(k²n log n)):
[dependencies]
natural_breaks = { version = "0.2", features = ["low-memory"] }§Algorithms
§O(kn²) — k_n2::KNSquared
Based on:
Wang & Song, “Optimal Classification of Quantitative Data”, The R Journal, Vol. 3/2, December 2011. https://journal.r-project.org/articles/RJ-2011-015/
This implementation adds an early-exit pruning step that breaks the inner loop when the running within-cluster sum of squares already exceeds the current best, exploiting the monotonicity of WCSS on sorted data.
§O(kn log n) — k_nlogn::KNLogN
Uses the divide-and-conquer DP optimisation exploiting the “no-crossing-paths” (monotonicity) property of optimal split points. Based on:
Hilferink, “Fisher’s Natural Breaks Classification — Complexity Proof”, Object Vision BV. https://geodms.nl/docs/fisher%27s-natural-breaks-classification-complexity-proof.html
§Roadmap
§O(kn)
I plan to explore an O(kn) algorithm based on:
Xiaolin Song et al., “An optimal-time algorithm for the k-filling problem and its application to the one-dimensional Jenks classification”, Bioinformatics, Vol. 36, Issue 20, October 2020. https://academic.oup.com/bioinformatics/article/36/20/5027/5866975
This one still needs investigation and may take some time.
§Learning resources
If you are new to natural breaks / Jenks classification, this is a good starting point:
§License
MIT — see LICENSE for details.
Modules§
Traits§
- ToPrimitive
- Re-exported so downstream users can use the
ToPrimitivebound in their own generic code without addingnum-traitsto theirCargo.toml. A generic trait for converting a value to a number.
Functions§
- classify
- Classifies pre-sorted data into
kclusters using the natural breaks algorithm. - classify_
indices - Classifies pre-sorted data into
kclusters, returningIndexRanges. - classify_
indices_ with_ sort - Sorts the data and classifies into
kclusters, returningIndexRanges. - classify_
with_ sort - Sorts the data and classifies it into
kclusters using the natural breaks algorithm.
Type Aliases§
- Classified
Result - Clustered values returned by
classifyandclassify_with_sort. - Index
Ranges - Half-open index ranges
[start, end)returned byclassify_indicesandclassify_indices_with_sort.