[−][src]Crate goko
Goko
This is an lock-free efficient implementation of a covertree for data science. The traditional application of this is for KNN, but it can be applied and used in lots of other applications.
Parameter Guide
The structure is controlled by 3 parameters, the most important of which is the scale_base. This should be between 1.2 and 2ish. A higher value will create more outliers. Outliers are not loaded into ram at startup, but a high value slows down creation of a tree significantly. Theoretically, this value doesn't matter to the big O time, but I wouldn't go above 2.
The cutoff value controls how many points a leaf is allowed to cover. A smaller value gives faster bottom level queries, but at the cost of higher memory useage. Do not expect a value of 100 will give 1/100 the memory useage of a value of 1. It'd be closer to 1/10 or 1/5th. This is because the distribution of the number of children of a node. A high cutoff will increase the compute of the true-knn by a little bit.
The resolution is the minimum scale index, this again reduces memory footprint and increases the query time for true KNN. Once a node's resolution dips below this value we stop and covert the remaining coverage into a leaf. This is mainly to stop before floating point errors become an issue. Try to choose it to result in a cutoff of about 2^-9.
See the git readme for a description of the algo.
Re-exports
pub use errors::GokoResult; |
Modules
errors | The errors that can occor when a cover tree is loading, working or saving.
Most errors are floated up from |
layer | The Layers |
node | The Node |
plugins | Plugin System |
query_interface | Interfacees that simplify bulk queries |
query_tools | Tools and data structures for assisting cover tree queries. |
utils | Utility functions for i/o |
Structs
CoverTreeBuilder | A construction object for a covertree. See [ |
CoverTreeParameters | Container for the parameters governing the construction of the covertree |
CoverTreeReader | Cover Tree Reader Head |
CoverTreeWriter |
Enums
PartitionType | When 2 spheres overlap under a node, and there is a point in the overlap we have to decide to which sphere it belongs. As we create the nodes in a particular sequence, we can assign them to the first to be created or we can assign it to the nearest. |
Type Definitions
ClusterAddress | Like with a node address, the clusters are segmented by layer so we also reference by layer. The ClusterID is not meaningful, it's just a uint. |
LayerIter | Helper struct for iterating thru the reader's of the the layers. |
NodeAddress | The data structure explicitly seperates the covertree by layer, and the addressing schema for nodes is a pair for the layer index and the center point index of that node. |