gaussian_kde provides multivariate kernel density estimation (KDE) with Gaussian kernels and optionally weighted data points.
Given a dataset $X = \{\bm{x}_1, \ldots, \bm{x}_n\}$ sampled from an arbitrary probability density function (PDF), the underlying PDF is estimated as a weighted sum of kernel functions $K_H$ centered at the points of the original dataset: \[ f_\mathrm{KDE}(\bm{x}) = \frac{1}{\sum_i w_i} \sum_{i=1}^n w_i \, K_H\left(\bm{x} - \bm{x}_i\right). \] Here, $H$ is the bandwidth matrix.
Specifically, this crate implements KDE with multivariate normal kernels and covariance-based bandwidths, \[ K_H(\bm{y}) = \frac{1}{\sqrt{(2\pi)^d \det H}} \exp\left(- \frac{1}{2} \bm{y}^\top H^{-1} \bm{y}\right) \quad \text{and} \quad H = h^2 V,\] where $h$ is the scalar bandwidth factor and $V$ is the dataset’s covariance matrix. Inserting this into the equation above, the density estimate reads \[ f_\mathrm{KDE}(\bm{x}) = \frac{1}{h^d \sqrt{(2\pi)^d \det V} \sum_i w_i} \sum_{i=1}^n w_i \, \exp\left(- \frac{1}{2h^2}(\bm{x} - \bm{x}_i)^\top V^{-1}(\bm{x} - \bm{x}_i)\right). \] For more details on (multivariate) kernel density estimation, see e.g. [1, 2].
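To make the formula concrete, here is a minimal, self-contained one-dimensional sketch of the weighted estimate above, where the covariance matrix $V$ reduces to the weighted variance of the data. The function `kde_1d` is a hypothetical helper for illustration only and is not part of this crate’s API:

```rust
/// Evaluate the weighted Gaussian KDE at point `x` for 1-D `data`,
/// following the formula in the description (illustrative sketch only;
/// the crate's actual API differs).
fn kde_1d(data: &[f64], weights: &[f64], h: f64, x: f64) -> f64 {
    // Weighted mean and (biased) weighted variance of the dataset,
    // playing the role of the covariance matrix V in one dimension.
    let w_sum: f64 = weights.iter().sum();
    let mean: f64 = data.iter().zip(weights).map(|(xi, w)| w * xi).sum::<f64>() / w_sum;
    let var: f64 = data
        .iter()
        .zip(weights)
        .map(|(xi, w)| w * (xi - mean).powi(2))
        .sum::<f64>()
        / w_sum;

    // f(x) = [ h * sqrt(2*pi*V) * sum_i w_i ]^-1
    //        * sum_i w_i * exp( -(x - x_i)^2 / (2 h^2 V) )
    let norm = h * (2.0 * std::f64::consts::PI * var).sqrt() * w_sum;
    let sum: f64 = data
        .iter()
        .zip(weights)
        .map(|(xi, w)| w * (-(x - xi).powi(2) / (2.0 * h * h * var)).exp())
        .sum();
    sum / norm
}

fn main() {
    // Uniform weights recover the classic unweighted KDE.
    let data = [0.0, 1.0, 2.0];
    let weights = [1.0, 1.0, 1.0];
    println!("{:.6}", kde_1d(&data, &weights, 1.0, 1.0));
}
```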
This implementation is largely based on the one in SciPy (scipy.stats.gaussian_kde).
[1] Gramacki, Artur. Nonparametric Kernel Density Estimation and Its Computational Aspects. Vol. 37. Studies in Big Data. Springer, 2018.
[2] Scott, David W. Multivariate Density Estimation: Theory, Practice, and Visualization. Second edition. Wiley, 2014.
Structs§
- GaussianKDE
- Multivariate kernel density estimation with Gaussian kernels and optionally weighted data points.
- KDEError
- General error type for any kind of error appearing during KDE calculation.
- Scott
- Select the scalar bandwidth factor according to Scott’s rule.
- Silverman
- Select the scalar bandwidth factor according to Silverman’s rule of thumb.
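The two rules of thumb above pick $h$ from the sample count $n$ and dimension $d$ alone. As a sketch, assuming the common forms of these rules as used by SciPy ($h = n^{-1/(d+4)}$ for Scott and $h = (n(d+2)/4)^{-1/(d+4)}$ for Silverman; the crate’s actual computation, e.g. its handling of weighted samples, may differ):

```rust
/// Scott's rule: h = n^(-1/(d+4))  (sketch, not the crate's code).
fn scott_factor(n: usize, d: usize) -> f64 {
    (n as f64).powf(-1.0 / (d as f64 + 4.0))
}

/// Silverman's rule of thumb: h = (n * (d + 2) / 4)^(-1/(d+4)).
fn silverman_factor(n: usize, d: usize) -> f64 {
    let (n, d) = (n as f64, d as f64);
    (n * (d + 2.0) / 4.0).powf(-1.0 / (d + 4.0))
}

fn main() {
    // For 100 one-dimensional samples:
    println!("Scott:     {:.4}", scott_factor(100, 1));     // 100^(-1/5)  ≈ 0.3981
    println!("Silverman: {:.4}", silverman_factor(100, 1)); // 75^(-1/5)   ≈ 0.4217
}
```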
Traits§
- Bandwidth
- General trait to customize the selection of the scalar bandwidth $h$.