drprg 0.1.1

Drug resistance prediction with reference graphs
# Predict

The `predict` subcommand is used to predict resistance for a sample from an index.

At its simplest:

```
drprg predict -i reads.fq -x mtb -o outdir
```

`drprg` is a bit "new-age" in that it assumes the reads are Nanopore. If they're
Illumina, use the `-I/--illumina` option.
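
If you process reads from both platforms, the only difference in the basic command is the `-I` flag. The sketch below prints the command for a given platform rather than running it. The `predict_cmd` function and its `platform` argument are hypothetical names used for illustration; only `-i`, `-x`, `-o` and `-I` come from this page.

```shell
# Hypothetical helper: print the predict command for a given platform.
# Only the flags -i, -x, -o and -I are taken from this documentation.
predict_cmd() {
    platform=$1 reads=$2 index=$3 outdir=$4
    case "$platform" in
        illumina) flag=" -I" ;;   # Illumina reads need the -I/--illumina flag
        nanopore) flag="" ;;      # Nanopore is the assumed default
        *) echo "unknown platform: $platform" >&2; return 1 ;;
    esac
    echo "drprg predict${flag} -i ${reads} -x ${index} -o ${outdir}"
}

predict_cmd illumina reads.fq mtb outdir
```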

See [Prediction Output](./predict-output.md) documentation for a detailed description of
what results/output files and formats to expect.

## Required

### Index

The index is provided via the `-x/--index` option. It can be either a path to an index
or the name of a [downloaded index](./download.md). As with the [`index`](./download.md)
subcommand, you can specify a version if you don't want to use the latest.

### Input reads

A fastq (or fasta) file of the reads you want to predict resistance from, provided via
the `-i/--input` option. If you have paired reads in two files, simply combine them and
pass the combined file; the order of the reads in the combined file doesn't matter. For example:

```
cat r1.fq r2.fq > combined.fq
drprg predict -i combined.fq ...
```

`gzip`-compressed files are also accepted.
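
Paired files that are already `gzip`-compressed can be combined the same way, because concatenated gzip members form a valid gzip stream. The sketch below builds two tiny illustrative files and checks the result with `gzip` itself. It assumes, without confirmation from this page, that `drprg` reads multi-member gzip files; if in doubt, decompress and recompress the combined file.

```shell
# Build two tiny gzip-compressed FASTQ files purely for illustration.
tmp=$(mktemp -d)
printf '@r1/1\nACGT\n+\nIIII\n' | gzip -c > "$tmp/r1.fq.gz"
printf '@r1/2\nTGCA\n+\nIIII\n' | gzip -c > "$tmp/r2.fq.gz"

# Concatenate the compressed files directly; no decompression is needed.
cat "$tmp/r1.fq.gz" "$tmp/r2.fq.gz" > "$tmp/combined.fq.gz"

# Sanity check: the combined file should hold both records (4 lines each).
n_lines=$(gzip -dc "$tmp/combined.fq.gz" | wc -l)
echo "combined records: $((n_lines / 4))"

# The combined file would then be passed as:
#   drprg predict -i "$tmp/combined.fq.gz" ...
```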

## Optional

### Sample name

Identifier to use for your output files. By default, it will be set to the file name
prefix (e.g. `name` for a fastq named `name.fq.gz`). Provided via the `-s/--sample`
option.
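
To predict what the default will be before a run, the sketch below mimics it in shell. It assumes the prefix is the file name up to the first dot, which matches the `name.fq.gz` example above but is not otherwise confirmed here; `default_sample_name` is a hypothetical helper, not part of `drprg`.

```shell
# Hypothetical helper mimicking the default sample name: strip the
# directory, then everything from the first dot onwards (an assumption).
default_sample_name() {
    file=$(basename "$1")
    echo "${file%%.*}"
}

default_sample_name path/to/name.fq.gz
```

If your file names contain extra dots (for example a run identifier), passing `-s` explicitly avoids relying on this behaviour.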

### Minimum allele frequency

Provided via the `-f/--maf` option. If an alternate allele has at least this fraction of
the depth, a minor resistance ("r") prediction is made. By default, this is set to `1.0`
for Nanopore data (i.e. minor allele detection is off) and `0.1` when using
the `--illumina` option. For example, if a variant is called as the reference allele for
Illumina reads, but an alternate allele has at least 10% of the depth at that position,
a minor resistance call is made for the alternate allele.
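
The threshold check can be worked through with a little arithmetic. The sketch below is only an illustration of the rule as described above; `reaches_maf` is a hypothetical helper, the depths are made up, and how `drprg` counts depth internally is not covered on this page.

```shell
# Hypothetical helper: does the alternate allele reach the MAF threshold?
# Arguments: alternate-allele depth, total depth at the position, threshold.
reaches_maf() {
    awk -v alt="$1" -v total="$2" -v maf="$3" \
        'BEGIN { exit !(total > 0 && alt / total >= maf) }'
}

# 15 of 100 reads support the alternate allele; Illumina default of 0.1.
if reaches_maf 15 100 0.1; then echo "minor call"; else echo "no minor call"; fi

# Same depths with the Nanopore default of 1.0.
if reaches_maf 15 100 1.0; then echo "minor call"; else echo "no minor call"; fi
```

With these numbers the alternate allele holds 15% of the depth, so it passes the `0.1` threshold but not the `1.0` one.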

### Ignore synonymous

Using the `-S/--ignore-synonymous` option will prevent synonymous mutations from
appearing as unknown resistance calls. However, any synonymous mutations in the
catalogue will still be considered.

## Quick usage

```
$ drprg predict -h
Predict drug resistance

Usage: drprg predict [OPTIONS] --index <DIR> --input <FILE>

Options:
  -v, --verbose        Use verbose output
  -t, --threads <INT>  Maximum number of threads to use [default: 1]
  -h, --help           Print help (see more with '--help')

Input/Output:
  -x, --index <DIR>      Name of a downloaded index or path to an index
  -i, --input <FILE>     Reads to predict resistance from
  -o, --outdir <DIR>     Directory to place output [default: .]
  -s, --sample <SAMPLE>  Identifier to use for the sample
  -I, --illumina         Sample reads are from Illumina sequencing

Filter:
  -S, --ignore-synonymous     Ignore unknown (off-catalogue) variants that cause a synonymous substitution
  -f, --maf <FLOAT[0.0-1.0]>  Minimum allele frequency to call variants [default: 1]
```

## Full usage

```
$ drprg predict --help
Predict drug resistance

Usage: drprg predict [OPTIONS] --index <DIR> --input <FILE>

Options:
  -p, --pandora <FILE>
          Path to pandora executable. Will try in src/ext or $PATH if not given

  -v, --verbose
          Use verbose output

  -m, --makeprg <FILE>
          Path to make_prg executable. Will try in src/ext or $PATH if not given

  -t, --threads <INT>
          Maximum number of threads to use

          Use 0 to select the number automatically

          [default: 1]

  -M, --mafft <FILE>
          Path to MAFFT executable. Will try in src/ext or $PATH if not given

  -h, --help
          Print help (see a summary with '-h')

Input/Output:
  -x, --index <DIR>
          Name of a downloaded index or path to an index

  -i, --input <FILE>
          Reads to predict resistance from

          Both fasta and fastq are accepted, along with compressed or uncompressed.

  -o, --outdir <DIR>
          Directory to place output

          [default: .]

  -s, --sample <SAMPLE>
          Identifier to use for the sample

          If not provided, this will be set to the input reads file path prefix

  -I, --illumina
          Sample reads are from Illumina sequencing

Filter:
  -S, --ignore-synonymous
          Ignore unknown (off-catalogue) variants that cause a synonymous substitution

  -f, --maf <FLOAT[0.0-1.0]>
          Minimum allele frequency to call variants

          If an alternate allele has at least this fraction of the depth, a minor resistance ("r") prediction is made. Set to 1 to disable. If --illumina is passed, the default is 0.1

          [default: 1]

      --debug
          Output debugging files. Mostly for development purposes

  -d, --min-covg <INT>
          Minimum depth of coverage allowed on variants

          [default: 3]

  -D, --max-covg <INT>
          Maximum depth of coverage allowed on variants

          [default: 2147483647]

  -b, --min-strand-bias <FLOAT>
          Minimum strand bias ratio allowed on variants

          For example, setting to 0.25 requires >=25% of total (allele) coverage on both strands for an allele.

          [default: 0.01]

  -g, --min-gt-conf <FLOAT>
          Minimum genotype confidence (GT_CONF) score allow on variants

          [default: 0]

  -L, --max-indel <INT>
          Maximum (absolute) length of insertions/deletions allowed

  -K, --min-frs <FLOAT>
          Minimum fraction of read support

          For example, setting to 0.9 requires >=90% of coverage for the variant to be on the called allele

          [default: 0]
```
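
The `--min-strand-bias` and `--min-frs` descriptions above are both ratio checks, which the following sketch works through with made-up coverage values. The helper names are hypothetical, and treating zero coverage as a failure is an assumption; the exact computation inside `drprg` may differ.

```shell
# Hypothetical helpers illustrating the two ratio filters described above.

# Strand bias: the less-covered strand must hold at least `min` of the
# allele's total coverage. Arguments: forward, reverse, min ratio.
passes_strand_bias() {
    awk -v f="$1" -v r="$2" -v min="$3" 'BEGIN {
        total = f + r
        low = (f < r) ? f : r
        exit !(total > 0 && low / total >= min)
    }'
}

# FRS: the called allele must hold at least `min` of the coverage.
# Arguments: called-allele coverage, total coverage, min fraction.
passes_frs() {
    awk -v called="$1" -v total="$2" -v min="$3" \
        'BEGIN { exit !(total > 0 && called / total >= min) }'
}

# 10 of 40 reads are on the weaker strand: 10/40 = 0.25, passes at 0.25.
passes_strand_bias 30 10 0.25 && echo "strand bias ok"

# 85 of 100 reads support the called allele: 0.85 < 0.9, so it fails.
passes_frs 85 100 0.9 || echo "FRS too low"
```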