Crate rype


Rype: High-performance genomic sequence classification using minimizer-based k-mer sketching.

This library provides efficient classification of DNA sequences against reference indices using RY-space (purine/pyrimidine) encoding and minimizer sketching.

§Core Concepts

  • RY Encoding: Reduces the 4-base DNA alphabet to a 2-symbol alphabet, one bit per base (purines → 1, pyrimidines → 0)
  • Minimizers: Representative k-mers selected from sliding windows for efficient sketching
  • Inverted Index: Enables O(Q log U) lookups instead of O(B × Q × log M)
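The first two concepts can be illustrated with a minimal sketch. It assumes purines are A/G and pyrimidines are C/T, packs each k-mer into a `u64` at one bit per base, and keeps the numerically smallest k-mer in each window of `w` consecutive k-mers. The names `ry_bit` and `ry_minimizers`, the handling of ambiguous bases, and the collapsing of consecutive repeats are illustrative assumptions, not the crate's actual `base_to_bit` or extraction functions.

```rust
/// Map a nucleotide to its RY-space bit: purines (A, G) -> 1, pyrimidines (C, T) -> 0.
/// Returns None for any other byte (for example N).
/// Hypothetical helper; not the crate's `base_to_bit`.
fn ry_bit(base: u8) -> Option<u64> {
    match base.to_ascii_uppercase() {
        b'A' | b'G' => Some(1),
        b'C' | b'T' => Some(0),
        _ => None,
    }
}

/// Pack every k-mer of `seq` into a u64 (one bit per base), then keep the
/// smallest k-mer value in each window of `w` consecutive k-mers.
/// Consecutive windows that select the same value are collapsed.
/// Returns None if the sequence contains a base outside A/C/G/T (an assumption
/// of this sketch), and an empty vector if the sequence is too short.
fn ry_minimizers(seq: &[u8], k: usize, w: usize) -> Option<Vec<u64>> {
    if k == 0 || k > 64 || w == 0 || seq.len() < k + w - 1 {
        return Some(Vec::new());
    }
    let bits = seq
        .iter()
        .map(|&b| ry_bit(b))
        .collect::<Option<Vec<u64>>>()?;

    // One packed value per k-mer, most significant bit = first base.
    let kmers: Vec<u64> = bits
        .windows(k)
        .map(|win| win.iter().fold(0u64, |acc, &b| (acc << 1) | b))
        .collect();

    let mut out: Vec<u64> = Vec::new();
    for win in kmers.windows(w) {
        // `win` is non-empty because w >= 1.
        let m = *win.iter().min().unwrap();
        if out.last() != Some(&m) {
            out.push(m);
        }
    }
    Some(out)
}
```

For `AGCT` with `k = 2` and `w = 2`, the bits are `1 1 0 0`, the k-mers are `11`, `10`, `00`, and the two windows select `10` and `00`.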

§Main Types

§Classification Functions

Modules§

c_api
C API for rype - FFI bindings for external language integration.
config
memory
Memory utilities for adaptive batch sizing.
parquet_index
Parquet-based format implementation for inverted index.

Structs§

BucketData
Bucket data with minimizers for building inverted index.
BucketFileStats
Per-bucket statistics about total sequence lengths of source files.
BucketMetadata
Bucket metadata for a single bucket.
FirstErrorCapture
Thread-safe error capture that stores only the first error.
HitResult
A single classification hit: query ID, bucket ID, and score.
IndexMetadata
Lightweight metadata-only view of an Index (without minimizer data)
InvertedIndex
CSR-format inverted index for fast minimizer → bucket lookups.
InvertedManifest
Manifest section for inverted index shards.
InvertedShardInfo
Per-shard metadata for inverted index.
LogRatioResult
Result of log-ratio computation for a single query.
MergeOptions
Options for index merging.
MergeStats
Statistics from a merge operation.
MinimizerWorkspace
Workspace for minimizer extraction algorithms.
ParquetManifest
Manifest containing index metadata.
ParquetReadOptions
Configuration options for reading Parquet inverted index files.
ParquetWriteOptions
Configuration options for Parquet file writing.
PartitionResult
Result of partitioning reads by numerator score into fast-path and needs-denominator groups.
QueryInvertedIndex
Query inverted index for merge-join classification. Stores sorted COO (coordinate) entries: (minimizer, packed_read_id).
ShardInfo
Information about a single shard in a sharded inverted index.
ShardManifest
Manifest describing a sharded inverted index.
ShardedInvertedIndex
Handle for a sharded inverted index.
StrandMinimizers
Structure-of-arrays for minimizers on a single strand.
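To make the "CSR-format inverted index" description of `InvertedIndex` concrete, here is a minimal sketch of a compressed-sparse-row layout for minimizer → bucket lookups. The struct `CsrIndex`, its field names, and its methods are hypothetical and only illustrate the general technique; the crate's real type may be laid out differently.

```rust
use std::collections::BTreeMap;

/// Hypothetical CSR-style inverted index (not the crate's `InvertedIndex`).
/// `minimizers` is sorted and unique; the bucket IDs for `minimizers[i]`
/// live in `bucket_ids[offsets[i]..offsets[i + 1]]`.
struct CsrIndex {
    minimizers: Vec<u64>,
    offsets: Vec<usize>,
    bucket_ids: Vec<u32>,
}

impl CsrIndex {
    /// Build the index from (bucket ID, minimizers) pairs.
    fn build(buckets: &[(u32, Vec<u64>)]) -> Self {
        // BTreeMap keeps minimizers sorted, which the lookup relies on.
        let mut map: BTreeMap<u64, Vec<u32>> = BTreeMap::new();
        for (id, mins) in buckets {
            for &m in mins {
                let ids = map.entry(m).or_default();
                if !ids.contains(id) {
                    ids.push(*id);
                }
            }
        }
        let mut idx = CsrIndex {
            minimizers: Vec::new(),
            offsets: vec![0],
            bucket_ids: Vec::new(),
        };
        for (m, ids) in map {
            idx.minimizers.push(m);
            idx.bucket_ids.extend(ids);
            idx.offsets.push(idx.bucket_ids.len());
        }
        idx
    }

    /// Binary-search one minimizer and return the buckets that contain it.
    /// Each lookup is logarithmic in the number of unique minimizers.
    fn buckets_for(&self, minimizer: u64) -> &[u32] {
        match self.minimizers.binary_search(&minimizer) {
            Ok(i) => &self.bucket_ids[self.offsets[i]..self.offsets[i + 1]],
            Err(_) => &[],
        }
    }
}
```

With this layout, a query looks up each of its minimizers once against the whole index, rather than searching every bucket separately.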

Enums§

FastPath
Indicates whether a log-ratio result was determined via a fast path (skipping the denominator classification) or computed exactly.
Orientation
Orientation of a sequence relative to the bucket baseline.
ParquetCompression
Compression codec for Parquet files.
RypeError
Unified error type for the rype library.
Strand
Strand indicator for minimizer origin.

Constants§

BUCKET_SOURCE_DELIM
Delimiter between filename and sequence name in bucket sources. Format: path/to/file.fa::sequence_name
ORIENTATION_FIRST_N
Default number of minimizers to check for orientation. Checking the first N sorted minimizers provides a sample for orientation decisions without iterating the full bucket.
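As an illustration of the bucket source format `path/to/file.fa::sequence_name`, the sketch below splits a source string at the delimiter. The constant `DELIM` is an assumed value inferred from the documented format, and `split_bucket_source` is a hypothetical helper; neither is taken from the crate.

```rust
/// Assumed delimiter, inferred from the documented format
/// `path/to/file.fa::sequence_name`. The crate's `BUCKET_SOURCE_DELIM`
/// is the authoritative value.
const DELIM: &str = "::";

/// Split a bucket source into (file path, optional sequence name).
/// Hypothetical helper: splits at the first occurrence of the delimiter.
fn split_bucket_source(source: &str) -> (&str, Option<&str>) {
    match source.split_once(DELIM) {
        Some((file, seq)) => (file, Some(seq)),
        None => (source, None),
    }
}
```

A source without the delimiter is treated here as a bare file path; how the crate handles that case is not stated in this page.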

Statics§

ENABLE_TIMING
Controls whether timing diagnostics are printed to stderr.

Functions§

base_to_bit
Convert a nucleotide base to its RY-space bit representation.
choose_orientation
Choose orientation with higher overlap. Ties favor Forward.
choose_orientation_sampled
Choose orientation by checking first N minimizers of each strand.
classify_batch_sharded_merge_join
Classify a batch of records against a sharded inverted index using merge-join.
classify_batch_sharded_parallel_rg
Classify using parallel row group processing.
classify_from_extracted_minimizers
Classify from pre-extracted minimizers against a sharded inverted index using merge-join.
classify_from_extracted_minimizers_parallel_rg
Classify from pre-extracted minimizers using parallel row group processing.
classify_from_query_index
Classify using a pre-built QueryInvertedIndex against a sharded inverted index.
classify_from_query_index_parallel_rg
Classify using a pre-built QueryInvertedIndex with parallel row group processing.
classify_log_ratio_batch
Classify a batch of reads using log-ratio (numerator vs denominator).
classify_with_sharded_negative
Classify with memory-efficient negative filtering using sharded index.
compute_log_ratio
Compute log10(numerator / denominator) with special handling for edge cases.
compute_source_hash
Compute a hash from bucket minimizer counts for validation.
count_hits
Count how many minimizers from a query match a bucket (binary search).
create_parquet_inverted_index
Create a Parquet inverted index directly from bucket data.
extract_batch_minimizers
Extract minimizers from a batch of query records in parallel.
extract_dual_strand_into
Extract minimizers from both strands of a sequence.
extract_into
Extract minimizers from a sequence (single strand).
extract_minimizer_set
Extract sorted, deduplicated minimizer sets per strand.
extract_strand_minimizers
Extract ordered minimizers with positions per strand (SoA layout).
filter_best_hits
Filter a list of HitResults to keep only the best hit per query.
get_paired_minimizers_into
Extract minimizers from paired-end reads.
is_parquet_index
Check if a path is a Parquet-based index directory.
kway_merge_dedup
Merge multiple sorted, deduplicated vectors into one sorted, deduplicated vector.
load_all_minimizers
Load all unique minimizers from a sharded index into a HashSet.
log_timing
Print timing info to stderr if ENABLE_TIMING is enabled.
merge_indices_streaming
Memory-bounded merge with streaming per-shard subtraction.
merge_sorted_into
Merge sorted source into sorted target in-place, deduplicating.
partition_by_numerator_score
Partition reads by numerator classification results into fast-path and needs-denominator groups.
validate_compatible_indices
Validate that two indices are compatible for log-ratio computation.
validate_log_ratio_indices
Validate that two sharded indices are compatible for log-ratio classification.
validate_single_bucket_index
Validate that the index has exactly one bucket and return its ID and name.
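Two of the helpers above describe standard techniques that are easy to sketch: merging several sorted, deduplicated vectors into one (as `kway_merge_dedup` is described), and counting query minimizers present in a sorted bucket by binary search (as `count_hits` is described). The functions below are illustrative stand-ins with assumed signatures, not the crate's implementations.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merge sorted, deduplicated vectors into one sorted, deduplicated vector
/// using a min-heap of (value, input index, position) entries.
/// Hypothetical signature; the crate's `kway_merge_dedup` may differ.
fn kway_merge_dedup_sketch(inputs: &[Vec<u64>]) -> Vec<u64> {
    let mut heap = BinaryHeap::new();
    for (i, v) in inputs.iter().enumerate() {
        if let Some(&first) = v.first() {
            // Reverse turns the max-heap into a min-heap.
            heap.push(Reverse((first, i, 0usize)));
        }
    }
    let mut out: Vec<u64> = Vec::new();
    while let Some(Reverse((value, i, pos))) = heap.pop() {
        // Values arrive in non-decreasing order, so comparing with the
        // last emitted value is enough to drop duplicates across inputs.
        if out.last() != Some(&value) {
            out.push(value);
        }
        if let Some(&next) = inputs[i].get(pos + 1) {
            heap.push(Reverse((next, i, pos + 1)));
        }
    }
    out
}

/// Count how many query minimizers occur in a sorted bucket,
/// using one binary search per query minimizer.
/// Hypothetical signature; the crate's `count_hits` may differ.
fn count_hits_sketch(query: &[u64], bucket: &[u64]) -> usize {
    query
        .iter()
        .filter(|&&m| bucket.binary_search(&m).is_ok())
        .count()
}
```

The per-bucket binary search in `count_hits_sketch` is the approach the inverted index is meant to avoid at scale, since it repeats the search for every bucket.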

Type Aliases§

QueryRecord
ID (i64), Sequence Reference, Optional Pair Sequence Reference
RypeResult
Convenience type alias for Results using RypeError.