rustybam 0.1.5

Mitchell Vollger's utilities for alignments
Documentation

rustybam

Actions Status Actions Status Actions Status

Install

It is easy just make sure you have rust and cargo installed and then:

cargo install rustybam

Alternatively to build from the latest source code:

git clone https://github.com/mrvollger/rustybam.git
cd rustybam
cargo build --release

and the executable will be built here:

target/release/rustybam

Examples

I have a PAF and I want to subset it for just a particular region in the reference.

With rustybam its easy:

./rustybam liftover \
    --bed <(printf "chr1\t0\t250000000\n") \
    input.paf > trimmed.paf

But I also want the alignment statistics for the region.

No problem, rustybam liftover does not just trim the coordinates but also the CIGAR so it is ready for rustybam stats:

./rustybam liftover \
    --bed <(printf "chr1\t0\t250000000\n") \
    input.paf \
    | ./rustybam stats --paf \
    > trimmed.stats.bed

Okay, but Evan asked for an "align slider" so I need to realign in chunks.

No need, just make your bed query to rustybam liftoff a set of sliding windows and it will do the rest.

./rustybam liftover \
    --bed <(bedtools makewindows -w 100000 \
                <(printf "chr1\t0\t250000000\n")
            ) \
    input.paf \
    | ./rustybam stats --paf \
    > trimmed.stats.bed

You can also use rustybam breakpaf to break up the paf records of indels above a certain size to get more "miropeats" like intervals.

./rustybam breakpaf --max-size 1000 input.paf \
    | ./rustybam liftover \
        --bed <(printf "chr1\t0\t250000000\n") \
    | ./rustybam stats --paf \
    > trimmed.stats.bed

Yeah but how do I visualize the data?

Try out SafFire!

General usage

./rustybam 0.1.1
Mitchell R. Vollger's alignment utilities

USAGE:
    rustybam [OPTIONS] <SUBCOMMAND>

FLAGS:
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
    -t, --threads <threads>    Number of threads to use for decompressing

SUBCOMMANDS:
    help        Prints this message or the help of the given subcommand(s)
    liftover    liftover target sequence coordinates onto query sequence using a PAF
    nucfreq     Get the frequencies of each bp at each position.
    repeat      Report the longest repeat length at every position in a fasta.
    stats       Get percent identity stats from a sam/bam/cram or PAF (add --paf)
    suns        Extract the intervals in a genome (fasta) that are made up of SUNs

More details on liftover

This is a function for lifting over coordinates from a reference (-bed) to a query using a PAF file from minimap2 or unimap. minimap2 (or unimap) must be run with --cs or -c --eqx and the output format must be PAF or else the liftover is not possible.

The returned file is a PAF file that is trimmed to the regions in the bed file. Even the cigar in the returned PAF file is trimmed so it can be used downstream! Additionally, a tag with the format id:Z:<> is added to the PAF where <> is either the 4th column of the input bed file or if not present chr_start_end.

Want to liftover from the query to the reference? No problem, just pass the -q flag. Note, that this will make the query in the input PAF the target in the output PAF.

rustybam-liftover
liftover target sequence coordinates onto query sequence using a PAF

USAGE:
    rustybam liftover [FLAGS] [OPTIONS] --bed <bed> [paf]

ARGS:
    <paf>    PAF file from minimap2 or unimap. Must have the cg tag, and n matches will be zero
             unless the cigar uses =X.

FLAGS:
    -h, --help       Prints help information
    -q, --qbed       The bed contains query coordinates to be lifted (note the query in the original
                     PAF will become the target in the output)
    -V, --version    Prints version information

OPTIONS:
    -b, --bed <bed>            Bed file of regions to liftover
    -t, --threads <threads>    Number of threads to use for decompressing

Alignment workflow

This repository also includes a snakemake workflow for aligning genome assemblies to a reference. The config file config/config.yaml should be configured to your use case and the file config/table.asm.tbl should be configured for your input assemblies. The Snakefile can be found under the workflow directory.

Once you have modified table.asm.tbl to have your assemblies and config.yaml with your references you can run the snakemake with:

snakemake all --cores 160 --use-conda  {any extra snakemake options}