Similarity
A comprehensive Rust library for calculating similarity metrics between vectors, collections, and spectral data. It offers both functional and trait-based APIs, with optional parallel processing and FFT optimizations.
Features
- Semantically Correct Trait System: Separate traits for comparing entities, measuring the entropy of a single entity, and transforming data
- Zero-Cost Abstractions: Trait calls compile to direct function calls
- Extensive Metric Coverage: Distance, similarity, correlation, and entropy measures
- Spectral Data Support: Specialized functions for mass spectrometry and signal processing
- Performance Optimizations: Parallel processing and FFT-based algorithms
- Feature Gates: Optional dependencies for parallel and FFT features
Architecture
The library is organized into three main trait categories:
1. Similarity<InputType, OutputType>
For comparing multiple entities and computing similarity or distance metrics:
- Cosine similarity/distance
- Euclidean distance
- Pearson correlation distance
- Jaccard index for sets
- Cross-correlation and time shift detection
- Hit rate and overshoot rate for predictions
- Entropy similarity between spectra
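To make a few of the metrics above concrete, here is a minimal plain-Rust sketch of cosine similarity, Euclidean distance, and the Jaccard index. It uses only the standard library and does not call this crate; the function names (`cosine_similarity_sketch` and so on) and the `Option` return for undefined cases are assumptions made for illustration, not the library's signatures.

```rust
use std::collections::HashSet;

/// Cosine similarity: dot(a, b) / (|a| * |b|).
/// Returns None for mismatched lengths or a zero-norm input, where the value is undefined.
fn cosine_similarity_sketch(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f64 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f64>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f64>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Euclidean distance: square root of the summed squared differences.
fn euclidean_distance_sketch(a: &[f64], b: &[f64]) -> Option<f64> {
    if a.len() != b.len() {
        return None;
    }
    let sum_sq: f64 = a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum();
    Some(sum_sq.sqrt())
}

/// Jaccard index: |A ∩ B| / |A ∪ B|. Two empty sets are treated as identical (1.0),
/// which is a convention chosen for this sketch.
fn jaccard_index_sketch(a: &HashSet<u32>, b: &HashSet<u32>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 1.0;
    }
    a.intersection(b).count() as f64 / union as f64
}
```

The corresponding distances follow directly, for example cosine distance as `1.0 - similarity`.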
2. EntropyMeasure<InputType, OutputType>
For analyzing single entities with information-theoretic measures:
- Shannon entropy
- Tsallis entropy with parameter q
- Both standard and optimized implementations
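As an illustration of the two entropy measures, the sketch below computes Shannon and Tsallis entropy over a list of peak intensities. It is standalone Rust, not this crate's API; the function names, the use of natural logarithms, and the normalization of intensities to a probability distribution are assumptions.

```rust
/// Normalize non-negative intensities so they sum to 1, dropping zero entries.
fn normalize_sketch(intensities: &[f64]) -> Vec<f64> {
    let total: f64 = intensities.iter().sum();
    if total <= 0.0 {
        return Vec::new();
    }
    intensities
        .iter()
        .filter(|&&x| x > 0.0)
        .map(|x| x / total)
        .collect()
}

/// Shannon entropy in nats: H = -sum(p * ln p).
fn shannon_entropy_sketch(intensities: &[f64]) -> f64 {
    normalize_sketch(intensities)
        .iter()
        .map(|p| -p * p.ln())
        .sum()
}

/// Tsallis entropy: S_q = (1 - sum(p^q)) / (q - 1).
/// As q approaches 1 this converges to Shannon entropy, so that case is delegated.
fn tsallis_entropy_sketch(intensities: &[f64], q: f64) -> f64 {
    let probs = normalize_sketch(intensities);
    if probs.is_empty() {
        return 0.0;
    }
    if (q - 1.0).abs() < 1e-12 {
        return shannon_entropy_sketch(intensities);
    }
    let sum_pq: f64 = probs.iter().map(|p| p.powf(q)).sum();
    (1.0 - sum_pq) / (q - 1.0)
}
```

For four equal peaks the Shannon entropy is ln 4, and the Tsallis entropy with q = 2 is 1 - 4 · (1/4)² = 0.75.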
3. DataTransform<InputType, OutputType>
For preprocessing and transforming data:
- Weight factor transformation for spectral data
- Optimized implementations for large datasets
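This README does not spell out the formula behind the weight factor transformation. One widely used form in mass spectral matching reweights each peak as mz^a · intensity^b, and the sketch below assumes that form. The function name and the `mz_power` and `intensity_power` parameters are hypothetical and may not match what the library does.

```rust
/// Assumed weight factor form: weighted[i] = mzs[i]^mz_power * intensities[i]^intensity_power.
/// Returns None when the two slices differ in length.
fn weight_factor_sketch(
    mzs: &[f64],
    intensities: &[f64],
    mz_power: f64,
    intensity_power: f64,
) -> Option<Vec<f64>> {
    if mzs.len() != intensities.len() {
        return None;
    }
    Some(
        mzs.iter()
            .zip(intensities)
            .map(|(mz, intensity)| mz.powf(mz_power) * intensity.powf(intensity_power))
            .collect(),
    )
}
```

With `mz_power = 0.0` the m/z term drops out and only the intensities are rescaled.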
Quick Start
Add to your Cargo.toml:
[dependencies]
= "0.2.0"
Trait-Based API Examples
use *;
use *;
use *;
use *;
use std::collections::HashSet;
// Similarity between vectors
let a = ;
let b = ;
let cosine_sim = similarity;
let euclidean_dist = similarity;
// Set similarity
let mut set1 = new;
set1.extend;
let mut set2 = new;
set2.extend;
let jaccard = similarity;
// Entropy of spectral data
let spectrum = from_peaks;
let shannon_entropy = entropy;
let tsallis_entropy = entropy;
// Data transformation
let mzs = ;
let intensities = ;
let transformed = transform;
Functional API (Still Available)
use *;
// All original functions remain available
let cosine_sim = cosine_similarity;
let euclidean_dist = euclidean_distance;
let entropy = calculate_entropy;
Performance Features
Optional Dependencies
[dependencies]
= { version = "0.2.0", features = ["parallel", "fft"] }
- parallel: Enables Rayon-based parallel processing for large datasets
- fft: Enables FFT-based optimizations for cross-correlation and convolution
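For context on what the `fft` feature speeds up, the following sketch computes cross-correlation directly in O(n·m) time and picks the lag with the highest score. An FFT-based version produces the same lag scores in O(n log n). This is standalone illustrative code; the function names and the lag convention are assumptions, not the crate's API.

```rust
/// Direct cross-correlation: for each integer lag, c[lag] = sum over i of a[i + lag] * b[i].
/// Lags run from -(b.len() - 1) to a.len() - 1; out-of-range terms count as zero.
fn cross_correlation_naive(a: &[f64], b: &[f64]) -> Vec<(isize, f64)> {
    let n = a.len() as isize;
    let m = b.len() as isize;
    let mut out = Vec::new();
    if n == 0 || m == 0 {
        return out;
    }
    for lag in -(m - 1)..=(n - 1) {
        let mut acc = 0.0_f64;
        for i in 0..m {
            let j = i + lag;
            if j >= 0 && j < n {
                acc += a[j as usize] * b[i as usize];
            }
        }
        out.push((lag, acc));
    }
    out
}

/// Lag with the largest correlation. A positive result means `a` looks like
/// `b` delayed by that many samples.
fn best_time_shift_sketch(a: &[f64], b: &[f64]) -> Option<isize> {
    cross_correlation_naive(a, b)
        .into_iter()
        .max_by(|x, y| x.1.total_cmp(&y.1))
        .map(|(lag, _)| lag)
}
```

The nested loop is the part an FFT replaces: both inputs are transformed, multiplied pointwise (one of them conjugated), and transformed back.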
Performance Comparison
The trait-based API has zero performance overhead compared to direct function calls:
// These are equivalent in performance:
let result1 = cosine_similarity;
let result2 = similarity;
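The reasoning behind the zero-overhead claim can be sketched with a small hypothetical trait. `SimilaritySketch`, `DotProduct`, and `score` below are illustrative names, not the crate's definitions. Because the trait method is an associated function on a concrete type, the call is resolved at compile time (static dispatch), and the optimizer can typically inline it down to the free function.

```rust
/// Free function: the "functional API" style.
fn dot_product(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Hypothetical trait shaped like `Similarity<InputType, OutputType>`.
trait SimilaritySketch<Input: ?Sized, Output> {
    fn similarity(a: &Input, b: &Input) -> Output;
}

/// Zero-sized marker type: it carries no data and exists only to select an implementation.
struct DotProduct;

impl SimilaritySketch<[f64], f64> for DotProduct {
    #[inline]
    fn similarity(a: &[f64], b: &[f64]) -> f64 {
        // Thin wrapper: delegates straight to the free function.
        dot_product(a, b)
    }
}

/// Generic caller: monomorphized once per metric type, so no vtable lookup is involved.
fn score<M: SimilaritySketch<[f64], f64>>(a: &[f64], b: &[f64]) -> f64 {
    M::similarity(a, b)
}
```

With slices `a` and `b`, `DotProduct::similarity(a, b)`, `score::<DotProduct>(a, b)`, and `dot_product(a, b)` all return the same value. Whether the generated machine code is identical depends on the optimizer, so benchmark if it matters.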
For large datasets (10K+ elements), use the optimized variants:
let large_a: = .map.collect;
let large_b: = .map.collect;
// 2-3x faster for large vectors
let result = similarity;
// Even faster with parallel processing
let result = similarity;
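To show the idea behind the parallel variants, the sketch below splits a cosine similarity computation into chunks and reduces the partial sums. The crate's `parallel` feature uses Rayon; to stay dependency-free this example swaps in `std::thread::scope` (Rust 1.63+) instead. The function names and the `workers` parameter are assumptions.

```rust
use std::thread;

/// Partial sums for one chunk: (dot product, squared norm of a, squared norm of b).
fn partial_sums(a: &[f64], b: &[f64]) -> (f64, f64, f64) {
    a.iter()
        .zip(b)
        .fold((0.0_f64, 0.0_f64, 0.0_f64), |(d, na, nb), (x, y)| {
            (d + x * y, na + x * x, nb + y * y)
        })
}

/// Cosine similarity computed over `workers` chunks on scoped threads.
/// Returns None for empty input, mismatched lengths, or a zero-norm vector.
fn cosine_similarity_threaded(a: &[f64], b: &[f64], workers: usize) -> Option<f64> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let workers = workers.max(1);
    let chunk = (a.len() + workers - 1) / workers; // ceiling division, at least 1
    let (dot, na, nb): (f64, f64, f64) = thread::scope(|s| {
        // Spawn every chunk first, then join, so the chunks run concurrently.
        let handles: Vec<_> = a
            .chunks(chunk)
            .zip(b.chunks(chunk))
            .map(|(ca, cb)| s.spawn(move || partial_sums(ca, cb)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .fold((0.0_f64, 0.0_f64, 0.0_f64), |acc, p| {
                (acc.0 + p.0, acc.1 + p.1, acc.2 + p.2)
            })
    });
    if na == 0.0 || nb == 0.0 {
        return None;
    }
    Some(dot / (na.sqrt() * nb.sqrt()))
}
```

Thread startup has a fixed cost, so splitting only pays off for large inputs. Results can differ from a sequential sum in the last few bits because floating-point addition is not associative.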
Examples
Run the comprehensive demo:
This demonstrates all available trait implementations with:
- Similarity and distance metrics
- Entropy measures for spectral data
- Data transformations
- Performance comparisons
- FFT-optimized operations
API Documentation
Trait Definitions
// For comparing two entities
// For analyzing single entities
// For transforming data
Available Implementations
Similarity Traits:
- CosineSimilarity, CosineSimilarityOptimized, CosineSimilarityParallel
- CosineDistance, CosineDistanceOptimized, CosineDistanceParallel
- EuclideanDistance, SquaredEuclideanDistance
- PearsonCorrelationDistance, PearsonCorrelationDistanceOptimized, PearsonCorrelationDistanceParallel
- JaccardIndex
- HitRate, OvershootRate
- CrossCorrelationOptimized, CrossCorrelationParallel, CrossCorrelationFFTOptimized
- TimeShiftFinder, TimeShiftFinderFFT
- EntropySimilarity, EntropySimilarityOptimized
Entropy Traits:
- ShannonEntropy, ShannonEntropyOptimized
- TsallisEntropy, TsallisEntropyOptimized
Transform Traits:
- WeightFactorTransformation, WeightFactorTransformationOptimized
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome. Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.