docs.rs failed to build lt-fm-index-0.5.0-beta.4
LT FM-Index
lt-fm-index is a library for locating and counting patterns in nucleotide and amino acid sequence strings.
lt-fm-index uses a lookup table (LT) in the count table.
CAVEAT! This crate is not stable. Functions can change without notice.
Description
- FM-index is a data structure used for pattern matching.
- LT is a precalculated count table containing all k-mer occurrences.
- With the LT, the first k-mer of a pattern can be looked up in a single step.
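As a rough illustration of the idea, the sketch below builds a cumulative count table over all k-mers of a text, so that the number of occurrences of any k-mer is answered with two table reads rather than k character-by-character steps. It is a toy under stated assumptions, not the crate's internal layout: the names `code`, `kmer_index`, `build_kmer_table` and `kmer_count` are hypothetical, and every byte other than A, C and G is folded into the T class.

```rust
/// Map a byte to one of four classes: A, C, G, and everything else (treated as T).
fn code(c: u8) -> usize {
    match c {
        b'A' => 0,
        b'C' => 1,
        b'G' => 2,
        _ => 3,
    }
}

/// Encode a k-mer as a base-4 integer.
fn kmer_index(kmer: &[u8]) -> usize {
    kmer.iter().fold(0, |acc, &c| acc * 4 + code(c))
}

/// Build a cumulative table where table[i] is the number of k-mers in `text`
/// whose index is below i. The table has 4^k + 1 entries.
fn build_kmer_table(text: &[u8], k: usize) -> Vec<usize> {
    assert!(k > 0, "k must be positive");
    let mut table = vec![0usize; 4usize.pow(k as u32) + 1];
    // Count each k-mer, shifted by one slot so the prefix sum below
    // turns the counts into cumulative counts.
    for window in text.windows(k) {
        table[kmer_index(window) + 1] += 1;
    }
    for i in 1..table.len() {
        table[i] += table[i - 1];
    }
    table
}

/// Occurrences of one k-mer, answered with two table reads.
/// `kmer` must have the same length k that the table was built with.
fn kmer_count(table: &[usize], kmer: &[u8]) -> usize {
    let i = kmer_index(kmer);
    table[i + 1] - table[i]
}
```

For the text CTCCGTACACCTGTTTCGTATCGGA with k = 2, `kmer_count(&table, b"TA")` returns 2. The table grows as 4^k, which is why a lookup table of this kind is only practical for small k.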
Features
- LtFmIndex is generated from Text.
- LtFmIndex has two functions for Pattern.
  - count: Count the number of times the Pattern appears in Text.
  - locate: Locate the start indices at which the Pattern appears in Text.
- Supports four types of text.
  - NucleotideOnly supports a text with only genetic nucleotide sequence (ACGT).
  - NucleotideWithNoise supports a text containing non-nucleotide sequence.
  - AminoacidOnly supports a text with only amino acid sequence.
  - AminoacidWithNoise supports a text containing non-amino acid sequence.
- The last character of each text type is treated as a wildcard.
  - The last characters of the four text types are T, _, Y and _, respectively.
  - The wildcard is assigned to all non-supported characters.
  - For example, in NucleotideOnly, a pattern of ACGTXYZ can be matched with ACGTTTT, because X, Y and Z are not in ACG (nucleotides except T). Likewise, an lt-fm-index generated with a text of ACGTXYZ indexes the text as ACGTTTT.
- BWT is stored with rank count tables at every 64 or 128 positions.
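The last point can be pictured with a sampled rank structure: cumulative symbol counts are stored only at fixed intervals, and a rank query adds the nearest stored count to a short scan inside one block. The sketch below is a minimal illustration under assumptions, not the crate's storage format: `SampledRank`, `code`, `new` and `rank` are hypothetical names, the symbols are kept as plain bytes, and the interval is a free parameter (the crate uses 64 or 128).

```rust
/// Map a byte to one of four classes: A, C, G, and everything else (treated as T).
fn code(c: u8) -> usize {
    match c {
        b'A' => 0,
        b'C' => 1,
        b'G' => 2,
        _ => 3,
    }
}

/// Toy rank structure with checkpoints every `interval` symbols.
struct SampledRank {
    symbols: Vec<u8>,
    interval: usize,
    /// checkpoints[b][c] = occurrences of class c in symbols[..b * interval]
    checkpoints: Vec<[usize; 4]>,
}

impl SampledRank {
    fn new(symbols: &[u8], interval: usize) -> Self {
        assert!(interval > 0, "interval must be positive");
        let mut checkpoints = Vec::with_capacity(symbols.len() / interval + 1);
        let mut running = [0usize; 4];
        for (i, &s) in symbols.iter().enumerate() {
            // Store the counts seen so far at the start of every block.
            if i % interval == 0 {
                checkpoints.push(running);
            }
            running[code(s)] += 1;
        }
        // Extra checkpoint so that a query at the very end still finds its block.
        if symbols.len() % interval == 0 {
            checkpoints.push(running);
        }
        Self { symbols: symbols.to_vec(), interval, checkpoints }
    }

    /// Number of symbols in the same class as `c` within symbols[..pos].
    fn rank(&self, c: u8, pos: usize) -> usize {
        assert!(pos <= self.symbols.len(), "pos out of range");
        let block = pos / self.interval;
        let start = block * self.interval;
        let target = code(c);
        // Scan at most `interval - 1` symbols past the checkpoint.
        let scanned = self.symbols[start..pos]
            .iter()
            .filter(|&&s| code(s) == target)
            .count();
        self.checkpoints[block][target] + scanned
    }
}
```

A larger interval stores fewer checkpoints and so uses less memory, at the cost of a longer scan per query; a smaller interval makes the opposite trade.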
Examples
1. Use LtFmIndex to count and locate a pattern.
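// Assumption: `FmIndex` and `LtFmIndexConfig` are the names exported by this version.
use lt_fm_index::{FmIndex, LtFmIndexConfig};
// (1) Define configuration for lt-fm-index
// Assumption: the k-mer size and sampling ratio of 4 are illustrative values.
let config = LtFmIndexConfig::for_nucleotide()
    .with_noise()
    .change_kmer_size(4).unwrap()
    .change_sampling_ratio(4).unwrap()
    .change_bwt_interval_to_128();
// (2) Generate fm-index with text
let text = b"CTCCGTACACCTGTTTCGTATCGGANNNN".to_vec();
let lt_fm_index = config.generate(text).unwrap(); // text is consumed
// (3) Match with pattern
let pattern = b"TA".to_vec();
// - count
let count = lt_fm_index.count(&pattern);
assert_eq!(count, 2);
// - locate
let locations = lt_fm_index.locate(&pattern);
assert_eq!(locations, vec![5, 18]);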
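The wildcard rule described under Features and the semantics of count and locate can be checked independently of the index with a plain linear scan. The sketch below is only a reference under assumptions, not how the crate works internally: `normalize`, `naive_locate` and `naive_count` are hypothetical helpers, and the supported alphabet and wildcard are passed in explicitly (ACG with wildcard T for NucleotideOnly, ACGT with wildcard _ for NucleotideWithNoise).

```rust
/// Replace every byte that is not in `supported` with `wildcard`.
fn normalize(seq: &[u8], supported: &[u8], wildcard: u8) -> Vec<u8> {
    seq.iter()
        .map(|&c| if supported.contains(&c) { c } else { wildcard })
        .collect()
}

/// Start positions of `pattern` in `text`, found by a linear scan.
/// Positions are returned in ascending order.
fn naive_locate(text: &[u8], pattern: &[u8]) -> Vec<usize> {
    // An empty pattern is treated as having no occurrences.
    if pattern.is_empty() || pattern.len() > text.len() {
        return Vec::new();
    }
    text.windows(pattern.len())
        .enumerate()
        .filter(|(_, window)| *window == pattern)
        .map(|(start, _)| start)
        .collect()
}

/// Number of occurrences of `pattern` in `text`.
fn naive_count(text: &[u8], pattern: &[u8]) -> usize {
    naive_locate(text, pattern).len()
}
```

With these helpers, `normalize(b"ACGTXYZ", b"ACG", b'T')` yields ACGTTTT, and scanning the example text CTCCGTACACCTGTTTCGTATCGGANNNN (normalized with ACGT and _) for TA finds two occurrences, at positions 5 and 18.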
2. Write and read LtFmIndex
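// Assumption: `LtFmIndexConfig`, `LtFmIndexAll` and `IO` are the names exported by this version.
use lt_fm_index::{LtFmIndexConfig, LtFmIndexAll, IO};
// (1) Generate `FmIndex`
let config = LtFmIndexConfig::for_nucleotide();
let text = b"CTCCGTACACCTGTTTCGTATCGGA".to_vec();
let lt_fm_index = config.generate(text).unwrap(); // text is consumed
// (2) Write fm-index to buffer (or file path)
let mut buffer = Vec::new();
lt_fm_index.write_to(&mut buffer).unwrap();
// (3) Read fm-index from buffer (or file path)
let lt_fm_index_buf = LtFmIndexAll::read_from(&buffer[..]).unwrap();
assert_eq!(lt_fm_index, lt_fm_index_buf);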
Repository
https://github.com/baku4/lt-fm-index
Doc
Reference
- Ferragina, P., et al. (2004). An Alphabet-Friendly FM-Index, Springer Berlin Heidelberg: 150-160.
- Anderson, T. and T. J. Wheeler (2021). An optimized FM-index library for nucleotide and amino acid search, Cold Spring Harbor Laboratory.
- Wang, Y., X. Li, D. Zang, G. Tan and N. Sun (2018). Accelerating FM-index Search for Genomic Data Processing, ACM.
- Yuta Mori. libdivsufsort.