Skip to main content

Crate natural_breaks

Crate natural_breaks 

Source
Expand description

§Natural Breaks

Crates.io docs.rs

A Rust implementation of the Jenks natural breaks classification algorithm for optimal partitioning of one-dimensional data into k classes that minimise within-class variance (WCSS).

§Features

  • Generic over any numeric type that implements ToPrimitive (f64, f32, i32, u64, etc.)
  • Returns clustered values or zero-copy index ranges
  • Built-in sort variants for unsorted input (with NaN detection)
  • Two algorithm implementations:
    • O(kn²) with an early-exit pruning optimisation (k_n2::KNSquared)
    • O(kn log n) divide-and-conquer DP (k_nlogn::KNLogN)
  • Optional low-memory feature: reduces the O(kn log n) variant from O(kn) to O(n) memory at the cost of O(k²n log n) time

§Quick start

Add to your Cargo.toml:

[dependencies]
natural_breaks = "0.2"

§Classify unsorted data

use natural_breaks::classify_with_sort;

let data = vec![12.0, 1.0, 11.0, 2.0, 10.0, 3.0];
let clusters = classify_with_sort(data, 2).unwrap();
// [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]

§Classify pre-sorted data

use natural_breaks::classify;

let data = vec![1, 2, 3, 10, 11, 12];
let clusters = classify(data, 2).unwrap();
// [[1, 2, 3], [10, 11, 12]]

§Get index ranges instead of values

use natural_breaks::classify_indices;

let data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0];
let ranges = classify_indices(&data, 2).unwrap();
// [(0, 3), (3, 6)]  — half-open ranges [start, end)

§API overview

FunctionInputReturnsExpects sorted?
classifyVec<T>ClassifiedResult<T>Yes
classify_with_sortVec<T>ClassifiedResult<T>No
classify_indices&[T]IndexRangesYes
classify_indices_with_sortVec<T>IndexRangesNo

For direct access to a specific algorithm implementation, use the module:

use natural_breaks::k_n2::KNSquared;

let clusters = KNSquared::classify(vec![1.0, 2.0, 3.0, 10.0, 11.0, 12.0], 2).unwrap();
use natural_breaks::k_nlogn::KNLogN;

let clusters = KNLogN::classify(vec![1.0, 2.0, 3.0, 10.0, 11.0, 12.0], 2).unwrap();

§Low-memory mode

Enable the low-memory feature to reduce the O(kn log n) algorithm’s memory usage from O(kn) to O(n), at the cost of increased time complexity (O(k²n log n)):

[dependencies]
natural_breaks = { version = "0.2", features = ["low-memory"] }

§Algorithms

§O(kn²) — k_n2::KNSquared

Based on:

Wang & Song, “Optimal Classification of Quantitative Data”, The R Journal, Vol. 3/2, December 2011. https://journal.r-project.org/articles/RJ-2011-015/

This implementation adds an early-exit pruning step that breaks the inner loop when the running within-cluster sum of squares already exceeds the current best, exploiting the monotonicity of WCSS on sorted data.

§O(kn log n) — k_nlogn::KNLogN

Uses the divide-and-conquer DP optimisation exploiting the “no-crossing-paths” (monotonicity) property of optimal split points. Based on:

Hilferink, “Fisher’s Natural Breaks Classification — Complexity Proof”, Object Vision BV. https://geodms.nl/docs/fisher%27s-natural-breaks-classification-complexity-proof.html

§Roadmap

§O(kn)

I plan to explore an O(kn) algorithm based on:

Xiaolin Song et al., “An optimal-time algorithm for the k-filling problem and its application to the one-dimensional Jenks classification”, Bioinformatics, Vol. 36, Issue 20, October 2020. https://academic.oup.com/bioinformatics/article/36/20/5027/5866975

This one still needs investigation and may take some time.

§Learning resources

If you are new to natural breaks / Jenks classification, this is a good starting point:

EHDP — Jenks Natural Breaks

§License

MIT — see LICENSE for details.

Modules§

error
k_n2
k_nlogn
kn

Traits§

ToPrimitive
Re-exported so downstream users can use the ToPrimitive bound in their own generic code without adding num-traits to their Cargo.toml. A generic trait for converting a value to a number.

Functions§

classify
Classifies pre-sorted data into k clusters using the natural breaks algorithm.
classify_indices
Classifies pre-sorted data into k clusters, returning IndexRanges.
classify_indices_with_sort
Sorts the data and classifies into k clusters, returning IndexRanges.
classify_with_sort
Sorts the data and classifies it into k clusters using the natural breaks algorithm.

Type Aliases§

ClassifiedResult
Clustered values returned by classify and classify_with_sort.
IndexRanges
Half-open index ranges [start, end) returned by classify_indices and classify_indices_with_sort.