/*
* Licensed to Elasticsearch B.V. under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch B.V. licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/
//! # Goko
//! This is a lock-free, efficient implementation of a cover tree for data science. The traditional
//! application is k-nearest-neighbor (KNN) search, but it can be applied in many other settings.
//!
//! ## Parameter Guide
//! The structure is controlled by three parameters, the most important of which
//! is the `scale_base`. This should be between 1.2 and roughly 2. A higher value creates more outliers.
//! Outliers are not loaded into RAM at startup, but a high value slows down the creation of a tree
//! significantly. Theoretically, this value doesn't affect the big-O time, but I wouldn't go above 2.
//!
//! The cutoff value controls how many points a leaf is allowed to cover. A smaller value gives faster
//! bottom-level queries, but at the cost of higher memory usage. Do not expect a cutoff of 100 to give
//! 1/100th the memory usage of a cutoff of 1; it's closer to 1/10th or 1/5th, because of the distribution
//! of the number of children per node. A high cutoff slightly increases the compute cost of the true KNN.
//!
//! The resolution is the minimum scale index; this again reduces the memory footprint and increases
//! the query time for true KNN.
//! Once a node's scale index dips below this value we stop and convert the remaining coverage into a leaf.
//! This is mainly to stop before floating point errors become an issue. Try to choose it so that the
//! resulting minimum coverage radius is about 2^-9.
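//!
//! A rough sketch (illustration only, not goko's API) of how the resolution bounds the smallest
//! radius a node covers: a node at scale index `i` covers a ball of radius `scale_base^i`, so a
//! minimum scale index targeting a radius of about 2^-9 can be estimated directly from `scale_base`.
//!
//! ```
//! // Hypothetical helper: the scale index at which the per-node radius
//! // scale_base^i first drops to `target_radius` or below.
//! fn min_scale_index(scale_base: f64, target_radius: f64) -> i32 {
//!     (target_radius.ln() / scale_base.ln()).floor() as i32
//! }
//!
//! let target = 2f64.powi(-9);
//! // A slowly shrinking radius (scale_base = 1.3) needs a deep tree:
//! assert_eq!(min_scale_index(1.3, target), -24);
//! // A larger scale_base reaches the same radius in far fewer layers:
//! assert_eq!(min_scale_index(1.5, target), -16);
//! ```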
//!
//! See the README in the git repository for a description of the algorithm.
//!
extern crate smallvec;
extern crate rand;
extern crate assert_approx_eq;

// NOTE: the module paths in this block were elided in this copy of the source;
// the `errors` path is an assumption inferred from the `GokoResult` re-export.
pub mod errors;
pub use errors::GokoResult;
/// The data structure explicitly separates the cover tree by layer, and the addressing schema for nodes
/// is a pair of the layer's scale index and the index of that node's center point.
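/// A sketch of the addressing scheme (illustration only; the concrete tuple types here are
/// assumptions, not necessarily the crate's real definitions):
///
/// ```
/// use std::collections::HashMap;
///
/// // Assumed shape: (scale index of the layer, index of the node's center point).
/// type NodeAddress = (i32, usize);
///
/// let mut nodes: HashMap<NodeAddress, &str> = HashMap::new();
/// nodes.insert((4, 0), "root");
/// nodes.insert((3, 17), "a child on the layer below, centered on point 17");
/// assert_eq!(nodes[&(3, 17)], "a child on the layer below, centered on point 17");
/// ```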
pub type NodeAddress = (i32, usize); // (scale index, center point index); concrete types assumed
/// As with a node address, the clusters are segmented by layer, so we also reference them by layer. The cluster ID is not meaningful; it's just a uint.
pub type ClusterAddress = (i32, usize); // (scale index, cluster id); concrete types assumed