scattr 0.2.0

A tool for estimating the copy number of large tandem repeats
Documentation
# Example of using ScatTR

## Running the example

In this example, we are estimating the copy number of a single TR in a simulated WGS sample. To run the example, make sure you have `scattr` or `cargo` installed in your `PATH`:

```
./run.sh
```

## Input files

In the `input/` directory, there are example input files to run ScatTR. In general, running ScatTR requires the following input files:

- A tab-delimeted **catalog** containing TR loci of interest: `catalog.tsv`
    - The file is expected to have columns: `id`, `contig`, `start`, `end`, and `motif`
- Aligned reads of the **sample** and their index: `input/sample.bam` and `input/sample.bam.bai`
- A **reference genome** and its index: `input/reference.fa` and `input/reference.fa.fai`

## Output files

In the `output/` directory, there are output files generated by ScatTR based on the example input files. ScatTR produces the following output files:

- The extracted bag of reads: `sample.bag.bam`
- The extracted depth and insert distributions: `sample.stats.json`
    - By default, ScatTR produces plots of these distributions (`sample.depth_distr.png` and `sample.insert_distr.png`). The `--no-plot` option disables this behavior
- The optimization problem definitions: `sample.defs.json`
    - This file information about all the reads associated with each TR and their relative positions with the decoy references (refer to manuscript methods section for more details)
- The estimated copy numbers: `sample.genotypes.json`
    - This file contains estimates for each TR locus in addition to a 95% confidence interval

## Generating input files

Generating the input files requires having `python`, `art_illumina`, `bwa`, `samtools` in your `PATH`. The input files are generated by running:

```
./scripts/make_inputs.sh
```

The script creates a random reference genome (~200 kbp in length) that is used to generate reads for a heteroyzygous TR expansion with normal allele copy number being 3 and with expanded copy number being 100. The motif of the TR is `CAGATA`.