kbo is an approximate local aligner based on converting k-bounded matching statistics into a character representation of the underlying alignment sequence.
Currently, kbo supports two main operations:
- kbo find matches the k-mers in a query sequence with the reference and reports the local alignment segments found within the reference. Find is useful for problems that can be solved with blast.
- kbo map maps the query sequence against a reference sequence and reports the nucleotide sequence of the alignment relative to the reference. Map solves the same problem as snippy and ska map.
kbo uses the Spectral Burrows-Wheeler Transform data structure that allows efficient k-mer matching between a target and a query sequence and fast retrieval of the k-bounded matching statistic for each k-mer match.
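To make the term concrete, here is a naive Python sketch of k-bounded matching statistics. It assumes the definition in which the value at query position i is the length of the longest suffix of the query prefix ending at i that occurs somewhere in the reference, capped at k. It scans substrings directly and does not use an SBWT, so it illustrates the quantity involved rather than how kbo computes it; the function name is hypothetical.

```python
def k_bounded_matching_statistics(query: str, reference: str, k: int) -> list[int]:
    """Naive illustration of k-bounded matching statistics (no SBWT involved)."""
    stats = []
    for i in range(len(query)):
        best = 0
        # Try the longest allowed suffix of query[:i + 1] first, then shorter ones.
        for d in range(min(k, i + 1), 0, -1):
            if query[i + 1 - d : i + 1] in reference:
                best = d
                break
        stats.append(best)
    return stats


if __name__ == "__main__":
    # The query differs from the reference by one substitution at index 3 (T -> A).
    print(k_bounded_matching_statistics("ACGAACGT", "ACGTACGT", k=4))
    # prints [1, 2, 3, 1, 1, 2, 3, 4]
```

In this toy case the value falls back to 1 at the substituted position and only climbs back to k once a full k-mer matches again.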
§Installing the kbo executable
See installation instructions at GitHub.
§Usage
kbo can be run directly on fasta files without an initial indexing step.
Prebuilt indexes are supported via kbo build but are only relevant in kbo find analyses where the reference k-mers can be concatenated into a single contig.
kbo can read inputs compressed in the DEFLATE format (gzip, zlib, etc.). bzip2 and xz support can be enabled by adding the “bzip2” and “xz” feature flags to needletail in the kbo Cargo.toml.
§kbo find
To set up the example, download the fasta sequence of the Escherichia coli Nissle 1917 genome from the NCBI and the pks island gene sequences from GitHub. Example output was generated with versions ASM71459v1 and rev 43bbd36.
§Find gene sequence locations
In the directory containing the input files, run
kbo find --reference db.fasta GCF_000714595.1_ASM71459v1_genomic.fna
This will produce the output:
query | ref | q.start | q.end | strand | length | mismatches | query.contig | ref.contig |
---|---|---|---|---|---|---|---|---|
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 226708 | 227226 | + | 519 | 0 | db.fasta | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 2289596 | 2290543 | + | 949 | 0 | db.fasta | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3152039 | 3161660 | + | 9623 | 1 | db.fasta | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3161701 | 3164301 | + | 2601 | 0 | db.fasta | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3164311 | 3165180 | + | 870 | 0 | db.fasta | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3165210 | 3165458 | + | 249 | 0 | db.fasta | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3165462 | 3167857 | + | 2397 | 1 | db.fasta | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3167905 | 3172701 | + | 4797 | 2 | db.fasta | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3172751 | 3175783 | + | 3033 | 0 | db.fasta | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3175827 | 3182327 | + | 6501 | 0 | db.fasta | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3182338 | 3190258 | + | 7922 | 1 | db.fasta | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3190320 | 3196124 | + | 5807 | 1 | db.fasta | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3196155 | 3198614 | + | 2460 | 0 | db.fasta | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3198627 | 3200856 | + | 2231 | 0 | db.fasta | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3200891 | 3201403 | + | 513 | 1 | db.fasta | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 4502887 | 4503405 | + | 519 | 0 | db.fasta | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 5145962 | 5147210 | + | 1249 | 0 | db.fasta | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 5147272 | 5149449 | + | 2179 | 0 | db.fasta | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 5351015 | 5351533 | + | 519 | 0 | db.fasta | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 5352280 | 5352503 | + | 224 | 0 | db.fasta | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 5354674 | 5356713 | + | 2040 | 1 | db.fasta | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 5381795 | 5381945 | + | 151 | 0 | db.fasta | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
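When post-processing this output, it can help to load the rows into records. The sketch below is one way to do that in Python, assuming the pipe-delimited layout shown above (the kbo executable itself may use a different delimiter, in which case the split needs adjusting). `parse_find_rows` is a hypothetical helper, not part of kbo, and the two example rows are copied from the table.

```python
COLUMNS = ["query", "ref", "q.start", "q.end", "strand", "length",
           "mismatches", "query.contig", "ref.contig"]
INT_COLUMNS = {"q.start", "q.end", "length", "mismatches"}

EXAMPLE = """\
query | ref | q.start | q.end | strand | length | mismatches | query.contig | ref.contig |
---|---|---|---|---|---|---|---|---|
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 226708 | 227226 | + | 519 | 0 | db.fasta | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3152039 | 3161660 | + | 9623 | 1 | db.fasta | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
"""


def parse_find_rows(text: str) -> list[dict]:
    """Parse rows laid out like the table above into dictionaries."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the header line and the ---|--- separator line.
        if not line or line.startswith(("query |", "---")):
            continue
        # Drop the trailing " |" that closes each row.
        if line.endswith(" |"):
            line = line[:-2]
        # Split on " | " (with spaces) so a bare "|" inside a contig name survives.
        row = dict(zip(COLUMNS, line.split(" | ")))
        for name in INT_COLUMNS:
            row[name] = int(row[name])
        rows.append(row)
    return rows


if __name__ == "__main__":
    rows = parse_find_rows(EXAMPLE)
    print(sum(r["length"] for r in rows), sum(r["mismatches"] for r in rows))
    # prints 10142 1
```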
§Find gene sequence locations with names
If you need to know which gene in db.fasta the matches are for, add the --detailed toggle:
kbo find --detailed --reference db.fasta GCF_000714595.1_ASM71459v1_genomic.fna
This replaces the value in the query.contig column with the name of the matching contig in db.fasta:
query | ref | q.start | q.end | strand | length | mismatches | query.contig | ref.contig |
---|---|---|---|---|---|---|---|---|
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 226708 | 227226 | + | 519 | 0 | clbS-like_4ce09a | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 2289596 | 2289808 | + | 213 | 0 | clbR locus_tag=ECOK1_RS11410 product=“colibactin biosynthesis LuxR family transcriptional regulator ClbR” protein_id=WP_000357141.1 | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 2289809 | 2290543 | + | 735 | 0 | clbA locus_tag=ECOK1_RS11415 product=“colibactin biosynthesis phosphopantetheinyl transferase ClbA” protein_id=WP_001217110.1 | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3152039 | 3161660 | + | 9623 | 1 | clbB locus_tag=ECOK1_RS11405 product=“colibactin hybrid non-ribosomal peptide synthetase/type I polyketide synthase ClbB” protein_id=WP_001518711.1 | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3161701 | 3164301 | + | 2601 | 0 | clbC locus_tag=ECOK1_RS11400 product=“colibactin polyketide synthase ClbC” protein_id=WP_001297908.1 | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3164311 | 3165180 | + | 870 | 0 | clbD locus_tag=ECOK1_RS11395 product=“colibactin biosynthesis dehydrogenase ClbD” protein_id=WP_000982270.1 | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3165210 | 3165458 | + | 249 | 0 | clbE locus_tag=ECOK1_RS11390 product=“colibactin biosynthesis aminomalonyl-acyl carrier protein ClbE” protein_id=WP_001297917.1 | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3165462 | 3166592 | + | 1131 | 0 | clbF locus_tag=ECOK1_RS11385 product=“colibactin biosynthesis dehydrogenase ClbF” protein_id=WP_000337350.1 | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3166589 | 3167857 | + | 1269 | 1 | clbG locus_tag=ECOK1_RS11380 product=“colibactin biosynthesis acyltransferase ClbG” protein_id=WP_000159201.1 | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3167905 | 3172701 | + | 4797 | 2 | clbH locus_tag=ECOK1_RS11375 product=“colibactin non-ribosomal peptide synthetase ClbH” protein_id=WP_001304254.1 | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3172751 | 3175783 | + | 3033 | 0 | clbI locus_tag=ECOK1_RS11370 product=“colibactin polyketide synthase ClbI” protein_id=WP_000829570.1 | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3175827 | 3182327 | + | 6501 | 0 | clbJ locus_tag=ECOK1_RS11365 product=“colibactin non-ribosomal peptide synthetase ClbJ” protein_id=WP_001468003.1 | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3180417 | 3181703 | + | 1287 | 2 | clbK locus_tag=ECOK1_RS11360 product=“colibactin hybrid non-ribosomal peptide synthetase/type I polyketide synthase ClbK” protein_id=WP_000222467.1 | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3182338 | 3188802 | + | 6465 | 2 | clbK locus_tag=ECOK1_RS11360 product=“colibactin hybrid non-ribosomal peptide synthetase/type I polyketide synthase ClbK” protein_id=WP_000222467.1 | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3186070 | 3187356 | + | 1287 | 1 | clbJ locus_tag=ECOK1_RS11365 product=“colibactin non-ribosomal peptide synthetase ClbJ” protein_id=WP_001468003.1 | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3188795 | 3190258 | + | 1464 | 0 | clbL locus_tag=ECOK1_RS11355 product=“colibactin biosynthesis amidase ClbL” protein_id=WP_001297937.1 | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3190320 | 3191759 | + | 1440 | 0 | clbM locus_tag=ECOK1_RS11350 product=“precolibactin export MATE transporter ClbM” protein_id=WP_000217768.1 | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3191756 | 3196124 | + | 4370 | 1 | clbN locus_tag=ECOK1_RS11345 product=“colibactin non-ribosomal peptide synthetase ClbN” protein_id=WP_001327259.1 | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3196155 | 3198614 | + | 2460 | 0 | clbO locus_tag=ECOK1_RS11340 product=“colibactin polyketide synthase ClbO” protein_id=WP_001029878.1 | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3198627 | 3200141 | + | 1515 | 0 | clbP locus_tag=ECOK1_RS11335 product=“precolibactin peptidase ClbP” protein_id=WP_002430641.1 | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3200134 | 3200856 | + | 723 | 0 | clbQ locus_tag=ECOK1_RS11330 product=“colibactin biosynthesis thioesterase ClbQ” protein_id=WP_000065646.1 | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 3200891 | 3201403 | + | 513 | 1 | clbS locus_tag=ECOK1_RS11325 product=“colibactin self-protection protein ClbS” protein_id=WP_000290498.1 | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 4502887 | 4503405 | + | 519 | 0 | clbS-like_4ce09a | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 5145962 | 5147210 | + | 1249 | 0 | clbL locus_tag=ECOK1_RS11355 product=“colibactin biosynthesis amidase ClbL” protein_id=WP_001297937.1 | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 5147272 | 5148479 | + | 1208 | 0 | clbM locus_tag=ECOK1_RS11350 product=“precolibactin export MATE transporter ClbM” protein_id=WP_000217768.1 | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 5148478 | 5149449 | + | 972 | 0 | clbN locus_tag=ECOK1_RS11345 product=“colibactin non-ribosomal peptide synthetase ClbN” protein_id=WP_001327259.1 | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 5351015 | 5351533 | + | 519 | 0 | clbS-like_4ce09a | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 5352280 | 5352503 | + | 224 | 0 | clbS-like_4ce09a | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 5354674 | 5356713 | + | 2040 | 1 | clbN locus_tag=ECOK1_RS11345 product=“colibactin non-ribosomal peptide synthetase ClbN” protein_id=WP_001327259.1 | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
GCF_000714595.1_ASM71459v1_genomic.fna | db.fasta | 5381795 | 5381945 | + | 151 | 0 | clbS-like_4ce09a | NZ_CP007799.1 Escherichia coli Nissle 1917 chromosome, complete genome |
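A common follow-up to the --detailed output is to tally hits per gene. The following is a minimal Python sketch, assuming the gene name is the first whitespace-delimited token of the query.contig value, as in the rows above. The tuples are copied from that table with the contig names abbreviated, and `hits_per_gene` is a hypothetical helper.

```python
from collections import defaultdict

# (q.start, q.end, length, mismatches, query.contig), abbreviated from the table.
HITS = [
    (3175827, 3182327, 6501, 0, "clbJ locus_tag=ECOK1_RS11365"),
    (3180417, 3181703, 1287, 2, "clbK locus_tag=ECOK1_RS11360"),
    (3182338, 3188802, 6465, 2, "clbK locus_tag=ECOK1_RS11360"),
    (3186070, 3187356, 1287, 1, "clbJ locus_tag=ECOK1_RS11365"),
    (4502887, 4503405, 519, 0, "clbS-like_4ce09a"),
]


def hits_per_gene(hits):
    """Map each gene name to its list of (q.start, q.end) intervals, in input order."""
    grouped = defaultdict(list)
    for start, end, _length, _mismatches, contig in hits:
        gene = contig.split()[0]  # first token of the contig name
        grouped[gene].append((start, end))
    return dict(grouped)


if __name__ == "__main__":
    for gene, intervals in hits_per_gene(HITS).items():
        if len(intervals) > 1:
            print(gene, intervals)
```

In these rows clbJ and clbK each appear twice: the shorter 1287-base hit for each gene falls inside the interval of the other gene's longer hit.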
Note that the current implementation of --detailed significantly slows down the algorithm. Future versions of kbo may address this by incorporating colors in the index structure.
§Find containment of gene sequences in assembly
Alternatively, if you are only interested in whether the contigs in db.fasta are present in the assembly, run
kbo find --reference GCF_000714595.1_ASM71459v1_genomic.fna db.fasta
which will return:
query | ref | q.start | q.end | strand | length | mismatches | query.contig | ref.contig |
---|---|---|---|---|---|---|---|---|
db.fasta | GCF_000714595.1_ASM71459v1_genomic.fna | 1 | 513 | + | 513 | 1 | GCF_000714595.1_ASM71459v1_genomic.fna | clbS|locus_tag=ECOK1_RS11325|product=“colibactin self-protection protein ClbS”|protein_id=WP_000290498.1 |
db.fasta | GCF_000714595.1_ASM71459v1_genomic.fna | 1 | 723 | + | 723 | 0 | GCF_000714595.1_ASM71459v1_genomic.fna | clbQ|locus_tag=ECOK1_RS11330|product=“colibactin biosynthesis thioesterase ClbQ”|protein_id=WP_000065646.1 |
db.fasta | GCF_000714595.1_ASM71459v1_genomic.fna | 1 | 1515 | + | 1515 | 0 | GCF_000714595.1_ASM71459v1_genomic.fna | clbP|locus_tag=ECOK1_RS11335|product=“precolibactin peptidase ClbP”|protein_id=WP_002430641.1 |
db.fasta | GCF_000714595.1_ASM71459v1_genomic.fna | 1 | 2460 | + | 2460 | 0 | GCF_000714595.1_ASM71459v1_genomic.fna | clbO|locus_tag=ECOK1_RS11340|product=“colibactin polyketide synthase ClbO”|protein_id=WP_001029878.1 |
db.fasta | GCF_000714595.1_ASM71459v1_genomic.fna | 1 | 4368 | + | 4369 | 1 | GCF_000714595.1_ASM71459v1_genomic.fna | clbN|locus_tag=ECOK1_RS11345|product=“colibactin non-ribosomal peptide synthetase ClbN”|protein_id=WP_001327259.1 |
db.fasta | GCF_000714595.1_ASM71459v1_genomic.fna | 1 | 1208 | + | 1208 | 0 | GCF_000714595.1_ASM71459v1_genomic.fna | clbM|locus_tag=ECOK1_RS11350|product=“precolibactin export MATE transporter ClbM”|protein_id=WP_000217768.1 |
db.fasta | GCF_000714595.1_ASM71459v1_genomic.fna | 1 | 1440 | + | 1440 | 0 | GCF_000714595.1_ASM71459v1_genomic.fna | clbM|locus_tag=ECOK1_RS11350|product=“precolibactin export MATE transporter ClbM”|protein_id=WP_000217768.1 |
db.fasta | GCF_000714595.1_ASM71459v1_genomic.fna | 1 | 1464 | + | 1464 | 0 | GCF_000714595.1_ASM71459v1_genomic.fna | clbL|locus_tag=ECOK1_RS11355|product=“colibactin biosynthesis amidase ClbL”|protein_id=WP_001297937.1 |
db.fasta | GCF_000714595.1_ASM71459v1_genomic.fna | 1 | 6465 | + | 6465 | 2 | GCF_000714595.1_ASM71459v1_genomic.fna | clbK|locus_tag=ECOK1_RS11360|product=“colibactin hybrid non-ribosomal peptide synthetase/type I polyketide synthase ClbK”|protein_id=WP_000222467.1 |
db.fasta | GCF_000714595.1_ASM71459v1_genomic.fna | 1 | 6501 | + | 6501 | 0 | GCF_000714595.1_ASM71459v1_genomic.fna | clbJ|locus_tag=ECOK1_RS11365|product=“colibactin non-ribosomal peptide synthetase ClbJ”|protein_id=WP_001468003.1 |
db.fasta | GCF_000714595.1_ASM71459v1_genomic.fna | 1 | 3033 | + | 3033 | 0 | GCF_000714595.1_ASM71459v1_genomic.fna | clbI|locus_tag=ECOK1_RS11370|product=“colibactin polyketide synthase ClbI”|protein_id=WP_000829570.1 |
db.fasta | GCF_000714595.1_ASM71459v1_genomic.fna | 1 | 4797 | + | 4797 | 2 | GCF_000714595.1_ASM71459v1_genomic.fna | clbH|locus_tag=ECOK1_RS11375|product=“colibactin non-ribosomal peptide synthetase ClbH”|protein_id=WP_001304254.1 |
db.fasta | GCF_000714595.1_ASM71459v1_genomic.fna | 1 | 1269 | + | 1269 | 1 | GCF_000714595.1_ASM71459v1_genomic.fna | clbG|locus_tag=ECOK1_RS11380|product=“colibactin biosynthesis acyltransferase ClbG”|protein_id=WP_000159201.1 |
db.fasta | GCF_000714595.1_ASM71459v1_genomic.fna | 1 | 1131 | + | 1131 | 0 | GCF_000714595.1_ASM71459v1_genomic.fna | clbF|locus_tag=ECOK1_RS11385|product=“colibactin biosynthesis dehydrogenase ClbF”|protein_id=WP_000337350.1 |
db.fasta | GCF_000714595.1_ASM71459v1_genomic.fna | 1 | 249 | + | 249 | 0 | GCF_000714595.1_ASM71459v1_genomic.fna | clbE|locus_tag=ECOK1_RS11390|product=“colibactin biosynthesis aminomalonyl-acyl carrier protein ClbE”|protein_id=WP_001297917.1 |
db.fasta | GCF_000714595.1_ASM71459v1_genomic.fna | 1 | 870 | + | 870 | 0 | GCF_000714595.1_ASM71459v1_genomic.fna | clbD|locus_tag=ECOK1_RS11395|product=“colibactin biosynthesis dehydrogenase ClbD”|protein_id=WP_000982270.1 |
db.fasta | GCF_000714595.1_ASM71459v1_genomic.fna | 1 | 2601 | + | 2601 | 0 | GCF_000714595.1_ASM71459v1_genomic.fna | clbC|locus_tag=ECOK1_RS11400|product=“colibactin polyketide synthase ClbC”|protein_id=WP_001297908.1 |
db.fasta | GCF_000714595.1_ASM71459v1_genomic.fna | 1 | 9621 | + | 9622 | 1 | GCF_000714595.1_ASM71459v1_genomic.fna | clbB|locus_tag=ECOK1_RS11405|product=“colibactin hybrid non-ribosomal peptide synthetase/type I polyketide synthase ClbB”|protein_id=WP_001518711.1 |
db.fasta | GCF_000714595.1_ASM71459v1_genomic.fna | 1 | 213 | + | 213 | 0 | GCF_000714595.1_ASM71459v1_genomic.fna | clbR|locus_tag=ECOK1_RS11410|product=“colibactin biosynthesis LuxR family transcriptional regulator ClbR”|protein_id=WP_000357141.1 |
db.fasta | GCF_000714595.1_ASM71459v1_genomic.fna | 1 | 735 | + | 735 | 0 | GCF_000714595.1_ASM71459v1_genomic.fna | clbA|locus_tag=ECOK1_RS11415|product=“colibactin biosynthesis phosphopantetheinyl transferase ClbA”|protein_id=WP_001217110.1 |
db.fasta | GCF_000714595.1_ASM71459v1_genomic.fna | 1 | 519 | + | 519 | 0 | GCF_000714595.1_ASM71459v1_genomic.fna | clbS-like_4ce09a |
db.fasta | GCF_000714595.1_ASM71459v1_genomic.fna | 1 | 519 | + | 519 | 0 | GCF_000714595.1_ASM71459v1_genomic.fna | clbS-like_4ce09a |
db.fasta | GCF_000714595.1_ASM71459v1_genomic.fna | 216 | 1464 | + | 1249 | 0 | GCF_000714595.1_ASM71459v1_genomic.fna | clbL|locus_tag=ECOK1_RS11355|product=“colibactin biosynthesis amidase ClbL”|protein_id=WP_001297937.1 |
db.fasta | GCF_000714595.1_ASM71459v1_genomic.fna | 1156 | 4167 | + | 3013 | 1 | GCF_000714595.1_ASM71459v1_genomic.fna | clbN|locus_tag=ECOK1_RS11345|product=“colibactin non-ribosomal peptide synthetase ClbN”|protein_id=WP_001327259.1 |
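To reduce this table to a presence list, one option is to take the gene name from ref.contig and keep the longest hit per gene. The Python sketch below assumes the `name|key=value|...` contig naming seen above and splits the table fields on ` | ` (with spaces) so that the bare `|` inside contig names is left alone. `longest_hit_per_gene` is a hypothetical helper, and the rows are copied from the table with the contig names truncated after locus_tag for brevity.

```python
ROWS = """\
db.fasta | GCF_000714595.1_ASM71459v1_genomic.fna | 1 | 4368 | + | 4369 | 1 | GCF_000714595.1_ASM71459v1_genomic.fna | clbN|locus_tag=ECOK1_RS11345 |
db.fasta | GCF_000714595.1_ASM71459v1_genomic.fna | 1156 | 4167 | + | 3013 | 1 | GCF_000714595.1_ASM71459v1_genomic.fna | clbN|locus_tag=ECOK1_RS11345 |
db.fasta | GCF_000714595.1_ASM71459v1_genomic.fna | 1 | 519 | + | 519 | 0 | GCF_000714595.1_ASM71459v1_genomic.fna | clbS-like_4ce09a |
"""


def longest_hit_per_gene(text: str) -> dict[str, int]:
    """Return {gene name: longest reported hit length} for rows like those above."""
    best = {}
    for line in text.splitlines():
        line = line.strip()
        if line.endswith(" |"):
            line = line[:-2]  # drop the trailing " |" that closes each row
        fields = line.split(" | ")
        # Ignore anything that is not a 9-column data row (header, separator, blanks).
        if len(fields) != 9 or not fields[5].isdigit():
            continue
        gene = fields[8].split("|")[0]  # text before the first "|" in ref.contig
        best[gene] = max(best.get(gene, 0), int(fields[5]))
    return best


if __name__ == "__main__":
    print(longest_hit_per_gene(ROWS))
    # prints {'clbN': 4369, 'clbS-like_4ce09a': 519}
```

Turning these lengths into a containment fraction would also need each gene's full length, which is not part of this output.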
§kbo map
kbo map can be used to align a query sequence against a reference sequence. This is useful, for example, for generating a reference-based alignment of multiple related genomes against a good reference assembly.
To run this example, download the genome sequence of the E. coli UTI89 strain from the NCBI (ASM1326v1).
§Reference-based alignment
Run
kbo map --reference GCF_000714595.1_ASM71459v1_genomic.fna GCF_000013265.1_ASM1326v1_genomic.fna > result.aln
which will write the alignment sequence to result.aln. Note that kbo map always writes to stdout.
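As a quick sanity check on the result, one might compute how much of the reference received an aligned base. The Python sketch below rests on two assumptions that should be checked against your kbo version: that result.aln is FASTA-like (a `>` header line followed by sequence lines) and that `-` marks reference positions with no aligned query base. `aligned_fraction` is a hypothetical helper and the example record is made up.

```python
def aligned_fraction(lines) -> dict[str, float]:
    """Return {record name: fraction of characters that are not '-'}."""
    totals = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            totals[name] = [0, 0]  # [non-gap characters, all characters]
        elif name is not None:
            totals[name][0] += len(line) - line.count("-")
            totals[name][1] += len(line)
    return {n: (a / t if t else 0.0) for n, (a, t) in totals.items()}


if __name__ == "__main__":
    # Toy record; for real data pass an open file handle instead, e.g.
    #   with open("result.aln") as handle: print(aligned_fraction(handle))
    print(aligned_fraction([">query.fna", "ACGT--AC", "GT"]))
    # prints {'query.fna': 0.8}
```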
If you have multiple sequences you need to align, either supply them as arguments to kbo map or process them using GNU parallel:
parallel 'kbo map --reference GCF_000714595.1_ASM71459v1_genomic.fna {}' < query_paths.txt > result.aln
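If GNU parallel is not available, a similar fan-out can be sketched in Python with the standard library. This assumes `kbo` is on PATH and uses only the `map` subcommand and `--reference` option shown above. `map_command` and `run_all` are hypothetical helper names, and the outputs are concatenated in the order the queries are given.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def map_command(reference: str, query: str) -> list[str]:
    """Build the argument list for one `kbo map` invocation."""
    return ["kbo", "map", "--reference", reference, query]


def run_all(reference: str, queries: list[str], jobs: int = 4) -> str:
    """Run one kbo process per query and join their stdout in input order."""
    def run_one(query: str) -> str:
        result = subprocess.run(map_command(reference, query),
                                check=True, capture_output=True, text=True)
        return result.stdout

    # Threads are enough here because the work happens in the child processes.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return "".join(pool.map(run_one, queries))


if __name__ == "__main__":
    # Only prints the command; calling run_all requires kbo to be installed, e.g.
    #   queries = open("query_paths.txt").read().split()
    #   open("result.aln", "w").write(run_all(reference, queries))
    print(map_command("GCF_000714595.1_ASM71459v1_genomic.fna",
                      "GCF_000013265.1_ASM1326v1_genomic.fna"))
```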
kbo map also accepts the --threads argument to parallelise either the index construction (in the case of a single query) or the processing of the input files (multiple queries).
Modules§
- Derandomizing noisy k-bounded matching statistics.
- Converting alignment representations into various output formats.
- Wrapper for using the sbwt API to build and query SBWT indexes.
- Translating deterministic k-bounded matching statistics into alignments.
Structs§
Functions§
- Builds an SBWT index from some fasta or fastq files.
- Finds the k-mers from an SBWT index in a query fasta or fastq file.
- Maps a query sequence against a reference sequence.
- Matches a query fasta or fastq file against an SBWT index.