// Copyright 2023 Lance Developers.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

//! Distance metrics
//!
//! This module provides distance metrics for vectors.
//!
//! - `bf16`, `f16`, `f32`, and `f64` element types are supported.
//! - SIMD is used when available on `x86_64` and `aarch64`.
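The module docs say several element types are supported. As a rough illustration only, one way to write a single distance routine over more than one float type is a generic function bounded on lossless widening to `f64`. The name `l2_generic` is hypothetical and not part of this module; the loop is plain scalar code with no SIMD, and `f16`/`bf16` would need an external crate (for example `half`), which is not shown.

```rust
/// Hypothetical helper: squared L2 distance for any element type that
/// widens losslessly to `f64` (here `f32` and `f64`). Scalar, no SIMD.
fn l2_generic<T: Copy + Into<f64>>(from: &[T], to: &[T]) -> f64 {
    assert_eq!(from.len(), to.len(), "vectors must have the same dimension");
    from.iter()
        .zip(to.iter())
        .map(|(&a, &b)| {
            // Widen both operands to f64 before subtracting.
            let (a, b): (f64, f64) = (a.into(), b.into());
            let d = a - b;
            d * d
        })
        .sum()
}
```

Accumulating in `f64` trades speed for precision; a SIMD implementation would typically keep the native width and accumulate in lanes instead.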

use std::sync::Arc;

use arrow_array::{Array, FixedSizeListArray, Float32Array};
use arrow_schema::ArrowError;

pub mod cosine;
pub mod dot;
pub mod l2;
pub mod norm_l2;

pub use cosine::*;
pub use dot::*;
pub use l2::*;
pub use norm_l2::*;

use crate::Result;

/// Distance metrics type.
#[derive(Debug, Copy, Clone, PartialEq)]
pub enum DistanceType {
    L2,
    Cosine,
    Dot, // Dot product
}
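To make the three variants concrete, here is a scalar sketch of what each metric computes on `f32` slices. It assumes, as is common in vector search, that `L2` means the squared Euclidean distance (no square root), `Cosine` means `1 - cosine similarity`, and `Dot` means `1 - dot(a, b)` so that larger dot products rank as closer. The function names are hypothetical; the crate's own `l2`, `cosine_distance`, and `dot_distance` may differ in these details and are SIMD-accelerated.

```rust
/// Squared Euclidean distance: sum of squared differences, no sqrt.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Plain dot product.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// 1 - cosine similarity. Returns NaN if either vector has zero norm;
/// this sketch does not guard against that case.
fn cosine_dist(a: &[f32], b: &[f32]) -> f32 {
    let denom = dot(a, a).sqrt() * dot(b, b).sqrt();
    1.0 - dot(a, b) / denom
}

/// 1 - dot product, so a larger dot product gives a smaller distance.
fn dot_dist(a: &[f32], b: &[f32]) -> f32 {
    1.0 - dot(a, b)
}
```

For unit-length vectors the dot and cosine distances coincide, which is why normalized embeddings are often compared with the cheaper dot variant.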
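
/// For backwards compatibility.
pub type MetricType = DistanceType;

/// Distance between two `f32` vectors.
pub type DistanceFunc = fn(&[f32], &[f32]) -> f32;

/// Batch variant of [`DistanceFunc`] over plain `f32` slices.
pub type BatchDistanceFunc = fn(&[f32], &[f32], usize) -> Arc<Float32Array>;

/// Distance from one vector to every vector in a [`FixedSizeListArray`].
pub type ArrowBatchDistanceFunc =
    fn(&dyn Array, &FixedSizeListArray) -> Result<Arc<Float32Array>>;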
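These aliases are plain function-pointer types, so any non-capturing function with a matching signature can be stored in them. The sketch below assumes that the `usize` argument of `BatchDistanceFunc` is the vector dimension and that the batch is a flat, row-major buffer of vectors stored back to back; neither is confirmed here. It returns a `Vec<f32>` instead of an Arrow `Float32Array` to stay dependency-free, and `DistanceFn`, `l2_sq`, and `batch_distance` are hypothetical names.

```rust
/// Hypothetical stand-in for `DistanceFunc`.
type DistanceFn = fn(&[f32], &[f32]) -> f32;

/// Squared Euclidean distance between two equal-length slices.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Hypothetical batch helper: `batch` holds `batch.len() / dim` vectors of
/// length `dim`, stored back to back. Returns one distance per vector.
fn batch_distance(query: &[f32], batch: &[f32], dim: usize, f: DistanceFn) -> Vec<f32> {
    assert!(dim > 0, "dimension must be positive");
    assert_eq!(query.len(), dim, "query must have `dim` elements");
    assert_eq!(batch.len() % dim, 0, "batch length must be a multiple of `dim`");
    // Each exact chunk of `dim` values is one vector in the batch.
    batch.chunks_exact(dim).map(|row| f(query, row)).collect()
}
```

Passing `l2_sq` where a `DistanceFn` is expected relies on Rust's coercion from a function item to a function pointer, the same mechanism `DistanceType::func` uses when it returns `l2::<f32>`.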

impl DistanceType {
    /// Returns the function that computes the distance from one vector to a
    /// batch of vectors.
    ///
    /// Nulls in the batch are propagated to the output.
    pub fn arrow_batch_func(&self) -> ArrowBatchDistanceFunc {
        match self {
            Self::L2 => l2_distance_arrow_batch,
            Self::Cosine => cosine_distance_arrow_batch,
            Self::Dot => dot_distance_arrow_batch,
        }
    }

    /// Returns the distance function between two vectors.
    pub fn func(&self) -> DistanceFunc {
        match self {
            Self::L2 => l2::<f32>,
            Self::Cosine => cosine_distance,
            Self::Dot => dot_distance,
        }
    }
}
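The two methods above follow the same pattern: match on the variant once, hand back a function pointer, and let the caller invoke it repeatedly. The sketch below mirrors that dispatch on a hypothetical two-variant `Metric` enum and models null propagation with `Option`, so a missing row yields a missing distance rather than an error. Using `Option<Vec<f32>>` in place of an Arrow array with a validity bitmap is a simplification, and all names here are illustrative, not this crate's API.

```rust
/// Hypothetical enum, trimmed to two variants for brevity.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Metric {
    L2,
    Dot,
}

type DistanceFn = fn(&[f32], &[f32]) -> f32;

/// Squared Euclidean distance.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// 1 - dot product.
fn dot_dist(a: &[f32], b: &[f32]) -> f32 {
    1.0 - a.iter().zip(b).map(|(x, y)| x * y).sum::<f32>()
}

impl Metric {
    /// Select the distance function once, like `DistanceType::func`.
    fn func(&self) -> DistanceFn {
        match self {
            Self::L2 => l2_sq,
            Self::Dot => dot_dist,
        }
    }

    /// Null-propagating batch distance: a `None` row yields `None`.
    fn batch(&self, query: &[f32], rows: &[Option<Vec<f32>>]) -> Vec<Option<f32>> {
        let f = self.func();
        rows.iter()
            .map(|row| row.as_deref().map(|v| f(query, v)))
            .collect()
    }
}
```

Resolving the function pointer outside the loop keeps the per-row work to a single indirect call, with no repeated `match`.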

impl std::fmt::Display for DistanceType {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(
            f,
            "{}",
            match self {
                Self::L2 => "l2",
                Self::Cosine => "cosine",
                Self::Dot => "dot",
            }
        )
    }
}

impl TryFrom<&str> for DistanceType {
    type Error = ArrowError;

    fn try_from(s: &str) -> std::result::Result<Self, Self::Error> {
        match s.to_lowercase().as_str() {
            "l2" | "euclidean" => Ok(Self::L2),
            "cosine" => Ok(Self::Cosine),
            "dot" => Ok(Self::Dot),
            _ => Err(ArrowError::InvalidArgumentError(format!(
                "Metric type '{s}' is not supported"
            ))),
        }
    }
}
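`Display` and `TryFrom<&str>` together give a string round trip: parsing is case-insensitive and accepts `"euclidean"` as an alias for `l2`, while formatting always emits the lowercase canonical name. The sketch below reproduces that behavior on a hypothetical `Metric` enum with a `String` error in place of `ArrowError`, so it compiles without the Arrow crates.

```rust
use std::convert::TryFrom;
use std::fmt;

/// Hypothetical mirror of `DistanceType` for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Metric {
    L2,
    Cosine,
    Dot,
}

impl fmt::Display for Metric {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Always print the lowercase canonical name.
        let name = match self {
            Self::L2 => "l2",
            Self::Cosine => "cosine",
            Self::Dot => "dot",
        };
        f.write_str(name)
    }
}

impl TryFrom<&str> for Metric {
    type Error = String;

    fn try_from(s: &str) -> Result<Self, Self::Error> {
        // Lowercase first so "L2", "Cosine", "DOT" are all accepted.
        match s.to_lowercase().as_str() {
            "l2" | "euclidean" => Ok(Self::L2),
            "cosine" => Ok(Self::Cosine),
            "dot" => Ok(Self::Dot),
            _ => Err(format!("Metric type '{s}' is not supported")),
        }
    }
}
```

Because every canonical name parses back to the same variant, `Metric::try_from(m.to_string().as_str())` returns `Ok(m)` for each variant, which makes the string form safe to persist in index metadata.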