pub struct ComponentIndex { /* private fields */ }
Index for efficient component candidate lookup.
Reduces the O(n·m) pairwise comparison to O(n·k), where k << m is the number of candidates actually compared per component, by:
- Grouping components by ecosystem
- Bucketing by name prefix
- Bucketing by trigrams (3-char substrings) for fuzzy matching
- Pre-normalizing names for fast comparison
Uses Arc<CanonicalId> internally for efficient cloning during index building.
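The bucketing strategy described above can be sketched with a toy index. This is an illustrative sketch only, not the crate's implementation: `Id`, `MiniIndex`, `trigrams`, and the three-character prefix length are hypothetical names and choices made for the example.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::Arc;

/// Hypothetical stand-in for the crate's `CanonicalId`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Id(pub String);

/// Distinct 3-character windows of `name`; names shorter than 3 chars yield none.
pub fn trigrams(name: &str) -> HashSet<String> {
    let chars: Vec<char> = name.chars().collect();
    chars
        .windows(3)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// Toy index: every bucket stores `Arc<Id>`, so each ID is allocated once
/// and only the reference count is bumped when it lands in another bucket.
#[derive(Default)]
pub struct MiniIndex {
    by_ecosystem: HashMap<String, Vec<Arc<Id>>>,
    by_prefix: HashMap<String, Vec<Arc<Id>>>,
    by_trigram: HashMap<String, Vec<Arc<Id>>>,
}

impl MiniIndex {
    /// Register one component under its ecosystem, name prefix, and trigrams.
    pub fn insert(&mut self, id: Id, ecosystem: &str, normalized_name: &str) {
        let id = Arc::new(id);
        self.by_ecosystem
            .entry(ecosystem.to_string())
            .or_default()
            .push(Arc::clone(&id));
        // Assumption: a 3-character prefix; the real prefix length is not documented.
        let prefix: String = normalized_name.chars().take(3).collect();
        self.by_prefix.entry(prefix).or_default().push(Arc::clone(&id));
        for tri in trigrams(normalized_name) {
            self.by_trigram.entry(tri).or_default().push(Arc::clone(&id));
        }
    }

    /// Owned copies of the IDs in one ecosystem bucket.
    pub fn ids_in_ecosystem(&self, ecosystem: &str) -> Vec<Id> {
        Self::cloned(self.by_ecosystem.get(ecosystem))
    }

    /// Owned copies of the IDs whose name contains the given trigram.
    pub fn ids_with_trigram(&self, tri: &str) -> Vec<Id> {
        Self::cloned(self.by_trigram.get(tri))
    }

    fn cloned(bucket: Option<&Vec<Arc<Id>>>) -> Vec<Id> {
        bucket
            .map(|ids| ids.iter().map(|id| Id::clone(id)).collect::<Vec<Id>>())
            .unwrap_or_default()
    }
}
```

A lookup then touches only the buckets that share an ecosystem, prefix, or trigram with the query, which is where the reduction from m to k comes from.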
Implementations
impl ComponentIndex
pub fn build(sbom: &NormalizedSbom) -> Self
Build an index from an SBOM.
Uses Arc<CanonicalId> internally to avoid expensive cloning of IDs
across multiple index structures.
pub fn normalize_component(comp: &Component) -> NormalizedEntry
Normalize a component for indexing.
pub fn normalize_name(name: &str, ecosystem: Option<&str>) -> String
Normalize a component name for comparison.
Applies ecosystem-specific normalization rules:
- PyPI: underscores, hyphens, and dots are all equivalent (converted to hyphen)
- Cargo: hyphens and underscores are equivalent (converted to underscore)
- npm: lowercase only, preserves scope
- Default: lowercase with underscore to hyphen conversion
This is also used by LSH for consistent shingle computation.
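The rules listed above can be sketched as a small standalone function. This is a hedged illustration, not the crate's code: `normalize_name_sketch` is a hypothetical name, the ecosystem identifiers `"pypi"`, `"cargo"`, and `"npm"` are assumed spellings, and lowercasing PyPI and Cargo names is an assumption the documentation does not state.

```rust
/// Sketch of ecosystem-specific name normalization.
/// Assumptions: ecosystem identifiers are matched case-insensitively as
/// "pypi", "cargo", and "npm", and every ecosystem lowercases the name.
pub fn normalize_name_sketch(name: &str, ecosystem: Option<&str>) -> String {
    let lower = name.to_lowercase();
    match ecosystem.map(|e| e.to_ascii_lowercase()).as_deref() {
        // PyPI: '_', '-', and '.' are equivalent; fold them all to '-'.
        Some("pypi") => lower.replace(|c: char| c == '_' || c == '.', "-"),
        // Cargo: '-' and '_' are equivalent; fold to '_'.
        Some("cargo") => lower.replace('-', "_"),
        // npm: lowercase only, so a scope such as "@types/" is left intact.
        Some("npm") => lower,
        // Default: lowercase and convert '_' to '-'.
        _ => lower.replace('_', "-"),
    }
}
```

Under these assumptions, `Zope.Interface` on PyPI and `zope_interface` both normalize to `zope-interface`, so they land in the same buckets.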
pub fn get_entry(&self, id: &CanonicalId) -> Option<&NormalizedEntry>
Get normalized entry for a component.
pub fn get_by_ecosystem(&self, ecosystem: &str) -> Option<Vec<CanonicalId>>
Get components by ecosystem.
Returns cloned CanonicalIds for API stability. The internal storage uses Arc
to avoid expensive cloning during index building.
pub fn find_candidates(
    &self,
    source_id: &CanonicalId,
    source_entry: &NormalizedEntry,
    max_candidates: usize,
    max_length_diff: usize,
) -> Vec<CanonicalId>
Find candidate matches for a component.
Returns a list of component IDs that are likely matches, ordered by likelihood. Uses ecosystem and prefix-based filtering to reduce candidates.
Returns cloned CanonicalIds for API stability. The internal storage uses Arc
to avoid expensive cloning during index building.
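To illustrate how the two limits might interact, here is a minimal sketch of candidate selection over a flat pool. It is not the crate's algorithm: `Entry`, `trigram_set`, and `candidates_sketch` are hypothetical, `max_length_diff` is assumed to bound the difference in name length, and ranking by shared trigram count is an assumed stand-in for "ordered by likelihood".

```rust
use std::collections::HashSet;

/// Hypothetical, simplified entry: an ID plus an already-normalized name.
pub struct Entry {
    pub id: String,
    pub name: String,
}

/// Distinct 3-character windows of a name.
fn trigram_set(name: &str) -> HashSet<String> {
    let chars: Vec<char> = name.chars().collect();
    chars
        .windows(3)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// Return up to `max_candidates` IDs from `pool`, best match first.
pub fn candidates_sketch(
    source: &Entry,
    pool: &[Entry],
    max_candidates: usize,
    max_length_diff: usize,
) -> Vec<String> {
    let src_tris = trigram_set(&source.name);
    let src_len = source.name.chars().count();

    let mut scored: Vec<(usize, &Entry)> = pool
        .iter()
        // Never propose the source component as its own candidate.
        .filter(|e| e.id != source.id)
        // Cheap length filter before any set work (assumed meaning of max_length_diff).
        .filter(|e| e.name.chars().count().abs_diff(src_len) <= max_length_diff)
        // Score by the number of trigrams shared with the source.
        .map(|e| (trigram_set(&e.name).intersection(&src_tris).count(), e))
        .filter(|(shared, _)| *shared > 0)
        .collect();

    // Highest overlap first; ties broken by ID so the order is deterministic.
    scored.sort_by(|a, b| b.0.cmp(&a.0).then_with(|| a.1.id.cmp(&b.1.id)));
    scored
        .into_iter()
        .take(max_candidates)
        .map(|(_, e)| e.id.clone())
        .collect()
}
```

The length filter runs first because it is far cheaper than building trigram sets, and the final `take` keeps the downstream comparison bounded by `max_candidates`.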
pub fn all_ids(&self) -> Vec<CanonicalId>
Get all component IDs (for fallback full scan).
Returns cloned CanonicalIds for API stability.
pub fn find_candidates_parallel<'a>(
    &self,
    sources: &[(&'a CanonicalId, &NormalizedEntry)],
    max_candidates: usize,
    max_length_diff: usize,
) -> Vec<(&'a CanonicalId, Vec<CanonicalId>)>
Find candidates for multiple source components in parallel.
This is significantly faster than calling find_candidates sequentially
for large SBOMs (1000+ components). Uses rayon for parallel iteration.
Returns a vector of (source_id, candidates) pairs in the same order as input.
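The order-preserving parallel pattern can be sketched without external crates. The method above is documented as using rayon; the sketch below swaps in `std::thread::scope` from the standard library so it stays self-contained. `map_parallel_ordered` and its `workers` parameter are hypothetical names for illustration.

```rust
use std::thread;

/// Apply `f` to every input on up to `workers` scoped threads and return
/// the results in the same order as `inputs`.
pub fn map_parallel_ordered<T, R, F>(inputs: &[T], workers: usize, f: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    if inputs.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    // Ceiling division so that at most `workers` chunks are produced.
    let chunk_size = (inputs.len() + workers - 1) / workers;
    let f = &f;

    thread::scope(|scope| {
        // Spawn one thread per contiguous chunk; collect the handles first
        // so every thread is running before any join blocks.
        let handles: Vec<_> = inputs
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || chunk.iter().map(f).collect::<Vec<R>>()))
            .collect();

        // Joining in spawn order concatenates chunk results in input order.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Because each chunk is contiguous and the handles are joined in spawn order, the output lines up index-for-index with the input, which mirrors the "same order as input" guarantee described above.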
pub fn find_all_candidates_from(
    &self,
    other: &Self,
    max_candidates: usize,
    max_length_diff: usize,
) -> Vec<(CanonicalId, Vec<CanonicalId>)>
Find candidates for all components in another index in parallel.
Useful for diffing two SBOMs: build an index from the new SBOM, then find candidates for all components from the old SBOM.
pub fn stats(&self) -> IndexStats
Get statistics about the index.
pub fn trigram_similarity(
    entry_a: &NormalizedEntry,
    entry_b: &NormalizedEntry,
) -> f64
Compute trigram similarity between two entries (Jaccard coefficient).
Returns a value between 0.0 and 1.0 where 1.0 means identical trigram sets.
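The Jaccard coefficient over trigram sets is |A ∩ B| / |A ∪ B|. A minimal sketch over plain strings follows; `trigram_set` and `jaccard` are hypothetical helpers, and returning 0.0 when both sets are empty is an assumption, since the documentation does not say how that case is handled.

```rust
use std::collections::HashSet;

/// Distinct 3-character windows of a name; names shorter than 3 chars yield none.
pub fn trigram_set(name: &str) -> HashSet<String> {
    let chars: Vec<char> = name.chars().collect();
    chars
        .windows(3)
        .map(|w| w.iter().collect::<String>())
        .collect()
}

/// Jaccard coefficient |A ∩ B| / |A ∪ B| of two trigram sets.
pub fn jaccard(a: &HashSet<String>, b: &HashSet<String>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        // Assumption: two empty sets are treated as having no similarity.
        return 0.0;
    }
    a.intersection(b).count() as f64 / union as f64
}
```

For example, `serde` has trigrams {ser, erd, rde} and `serdes` has {ser, erd, rde, des}, so the two share 3 of 4 distinct trigrams and score 0.75.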
Auto Trait Implementations
impl Freeze for ComponentIndex
impl RefUnwindSafe for ComponentIndex
impl Send for ComponentIndex
impl Sync for ComponentIndex
impl Unpin for ComponentIndex
impl UnsafeUnpin for ComponentIndex
impl UnwindSafe for ComponentIndex
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> Instrument for T
fn instrument(self, span: Span) -> Instrumented<Self>
fn in_current_span(self) -> Instrumented<Self>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.