The goal here is to compress, save, and search bioinformatics data on single nucleotide polymorphisms (SNPs) using the BitMagic Library.
Tech notes:
http://bitmagic.io/succinct-snp-search.html
SNP RS columnar sparse vector example:
1. Download the human SNP report from the NCBI FTP site
ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/chr_rpts/
2. Take Chr1 for a large experiment
chr_1.txt.gz
ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/chr_rpts/chr_1.txt.gz
3. Decompress the SNP report file (it is gzip-compressed, so use gunzip rather than unzip)
gunzip chr_1.txt.gz
4. Load/parse data into compressed sparse vector
./xsample03 -isnp chr_1.txt -svout rs_chr1.sv
(this step can take a long time on a file of this size)
5. Build the RS (rank-select) compressed sparse vector
./xsample03 -svin rs_chr1.sv -rscout rs_chr1.rssv -t
6. Print diagnostics (memory usage, statistics, etc)
./xsample03 -svin rs_chr1.sv -rscin rs_chr1.rssv -t -d
7. Search performance benchmark
./xsample03 -svin rs_chr1.sv -rscin rs_chr1.rssv -b -t
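The steps above could be chained in one small shell script. This is only a sketch: the file names and xsample03 flags are copied from the steps, while CHR, BASE_URL, XSAMPLE, DRY_RUN and the run() helper are hypothetical names introduced here. It assumes curl (built with FTP support) and gunzip are installed. By default it only prints the commands it would run; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch of steps 1-7 as a single script (not part of the BitMagic samples).
# CHR, BASE_URL, XSAMPLE, DRY_RUN and run() are illustrative names.
set -e

CHR="${CHR:-1}"                     # chromosome report to process
BASE_URL="ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606/chr_rpts"
XSAMPLE="${XSAMPLE:-./xsample03}"   # path to the built sample binary
DRY_RUN="${DRY_RUN:-1}"             # 1 = print only, 0 = execute

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

report="chr_${CHR}.txt"             # decompressed SNP report
sv="rs_chr${CHR}.sv"                # sparse vector file
rssv="rs_chr${CHR}.rssv"            # RS compressed sparse vector file

# Steps 1-3: download and decompress the report.
run curl -O "${BASE_URL}/${report}.gz"
run gunzip "${report}.gz"

# Step 4: load/parse the report into a compressed sparse vector.
run "$XSAMPLE" -isnp "$report" -svout "$sv"

# Step 5: build the RS compressed sparse vector.
run "$XSAMPLE" -svin "$sv" -rscout "$rssv" -t

# Step 6: print diagnostics.
run "$XSAMPLE" -svin "$sv" -rscin "$rssv" -t -d

# Step 7: search performance benchmark.
run "$XSAMPLE" -svin "$sv" -rscin "$rssv" -b -t
```

Saved under a name of your choice (run_snp.sh here is just an example), it can be previewed with "sh run_snp.sh" and executed with "DRY_RUN=0 sh run_snp.sh". Keeping the dry run as the default makes it easy to check the generated file names before starting the long-running load step.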