This is example to build a sorted, bit-transposed dictionary vector of names of
celestial objects from NED objects directory.
Example shows how to do sorted search, runs benchmarks, checks self correctness.
0. How to build
- go to the source directory of BitMagic
>cd ../..
- setup the compile root directory
as
>source ./bmenv.sh
or
>. ./bmenv.sh
- get back to the sample
>cd -
>make
or
>./build_all.sh
1. Download the fragment of catalog from
http://bitmagic.io/data/ned.txt.gz
or use NED ADQL interface:
curl -o ned.txt "https://ned.ipac.caltech.edu/tap/sync?query=SELECT+*+FROM+ned_objdir&LANG=ADQL&REQUEST=doQuery&FORMAT=text"
It will produce a columnar report similar to this:
bhextin | dec | gallat | gallon | prefname |pretype |
-------------------|----------------------|--------------------|-------------------|--------------------------------|--------|
0.0 |61.090161208000005 |55.6336642907128 |129.592988077481 |"2MASX J12201807+6105248 "|"G "|
0.0 |56.1873074943 |60.7497277891121 |128.223612727992 |"2MASX J12325147+5611139 "|"G "|
0.0 |58.25105334925 |58.794082336604 |126.228935296354 |"2MASS J12382714+5815046 "|"G "|
0.0 |57.223830104799994 |59.8791937528706 |124.806822462432 |"2MASS J12442918+5713259 "|"* "|
OVERFLOW (more rows were available but have been truncated by the TAP service)
NOTE: Out of this columnar report we need only "prefname" - names of objects in the NED catalog.
Example will parse out the column and create a sorted STL vector, STL map (R-B tree) and
bit transposed sparse vector (BitMagic) to compare memory footprint and search performance.
2. Run xsampl05 to load the report and build index
./xsample05 -idict ned.txt -svout astra.bm -t
Expected output:
Reading input file: 1000000
Loaded 380132 dictionary names.
Performance:
1. parse input data ; 53.74 sec
2. build sparse vector; 1.276 sec
7. Save sparse vector; 4.726 ms
See also:
./run_all_compr.sh
3. Run search benchmarks using the compressed sparse vector of "prefname"
./xsample05 -svin astra.bm -t -b
Expected output:
Picked 15000 samples. Running benchmarks.
Performance:
3. std::lower_bound() search; 427000 ops/sec; 35.08 ms
4. std::map<> search; 285000 ops/sec; 52.47 ms
5. bm::sparse_vector_scanner<> search; 21000 ops/sec; 685.2 ms
6. bm::sparse_vector_scanner<> binary search; 42000 ops/sec; 351 ms
8. Load sparse vector; 3.951 ms
See also:
./run_all_bench.sh