rumi 0.2.1

PCR Deduplication via directional adjacency
Documentation
8/25/19
- Fixed stats and added stats to groups. Still a little different than how umi_tools
  does things, but I think it's reasonable
- To do: benchmark on some large datasets
- Clean up the code a bit
- Set it free

8/23/19
- First pass at adding stats, very much broken on the reads_unmapped

8/22/19
- Added first pass at support for paired end reads.
- Next, add tests for paired end reads? maybe not worth it
- Investigate diffs with umi_tools, especially around default settings for skipping / tlen
- Collect some stats similar to umi_tools 

8/21/19
- Added test to make sure that the determine_umi step couldn't assign a umi to multiple 
  masters, which results in the read showing up twice in the group_only option.
- Restored rayon for run_dedup and run_group
- Check that the groups are correct still in their numbering
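
A minimal sketch of the kind of check the determine_umi test above describes. The
function name `doubly_assigned` and the master -> members map layout are hypothetical,
chosen for illustration only; rumi's actual types may differ.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Returns every UMI that appears more than once across all groups.
/// `groups` maps a master UMI to the member UMIs assigned to it
/// (hypothetical layout, not taken from rumi's source).
fn doubly_assigned(groups: &BTreeMap<String, Vec<String>>) -> BTreeSet<String> {
    let mut seen: BTreeSet<&str> = BTreeSet::new();
    let mut dupes = BTreeSet::new();
    for members in groups.values() {
        for umi in members {
            // `insert` returns false when the UMI was already seen,
            // i.e. it has been assigned a second time.
            if !seen.insert(umi.as_str()) {
                dupes.insert(umi.clone());
            }
        }
    }
    dupes
}
```

An empty result means no read can show up twice in the group_only output because of
a doubly assigned umi.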

8/20/19
- Working on finding why rumi gets more reads on the example.bam file than umi_tools does.
  It seems like umi_tools is in some cases pulling reads that are dist 3 away into groups
  where they may not belong. The following is the output of compare_reads.pl
  ```bash
$ perl ./scripts/compare_reads.pl (samtools view /mnt/d/dev/UMI-tools/tests_out/example_umitools.bam | psub) (samtools view /mnt/d/dev/UMI-tools/tests_out/example_rumi_deduped.bam  | psub) (samtools view /mnt/d/dev/UMI-tools/tests_out/example_group.bam|psub) (samtools view /mnt/d/dev/UMI-tools/tests_out/example_rumi.bam  | psub)
FOUND:	 SRR2057595.3345647_TTTGGTTTA	16	chr8	82003435	255	21M	*	0	0	*	*	XA:i:2	MD:Z:1G2T16	NM:i:2	BX:Z:TTTGGTTTA	UG:i:15127
EXPECTED:	 SRR2057595.3345647_TTTGGTTTA	16	chr8	82003435	255	21M	*	0	0	*	*	XA:i:2	MD:Z:1G2T16	NM:i:2	UG:i:15059	BX:Z:TGAGGTTGA
FOUND:	 SRR2057595.7255940_TGTGGTTAC	16	chr8	82003436	255	20M	*	0	0	*	*	XA:i:2	MD:Z:0G2T16	NM:i:2	BX:Z:TGTGGTTAC	UG:i:15122
EXPECTED:	 SRR2057595.7255940_TGTGGTTAC	16	chr8	82003436	255	20M	*	0	0	*	*	XA:i:2	MD:Z:0G2T16	NM:i:2	UG:i:15059	BX:Z:TGAGGTTGA
FOUND:	 SRR2057595.7609280_GCCGGTTTT	16	chr6	128748879	255	48M	*	0	0	*	*	XA:i:1	MD:Z:0A47	NM:i:1	BX:Z:GCCGGTTTT	UG:i:13709
EXPECTED:	 SRR2057595.7609280_GCCGGTTTT	16	chr6	128748879	255	48M	*	0	0	*	*	XA:i:1	MD:Z:0A47	NM:i:1	UG:i:13685	BX:Z:GTAGGTTTC
FOUND:	 SRR2057595.5016607_GCAGGTTTA	16	chr8	82003436	255	20M	*	0	0	*	*	XA:i:2	MD:Z:0G2T16	NM:i:2	BX:Z:GCAGGTTTA	UG:i:15129
EXPECTED:	 SRR2057595.5016607_GCAGGTTTA	16	chr8	82003436	255	20M	*	0	0	*	*	XA:i:2	MD:Z:0G2T16	NM:i:2	UG:i:15059	BX:Z:TGAGGTTGA
FOUND:	 SRR2057595.1514218_AAGGGTTAT	16	chr8	82003436	255	20M	*	0	0	*	*	XA:i:2	MD:Z:0G2T16	NM:i:2	BX:Z:AAGGGTTAT	UG:i:15125
EXPECTED:	 SRR2057595.1514218_AAGGGTTAT	16	chr8	82003436	255	20M	*	0	0	*	*	XA:i:2	MD:Z:0G2T16	NM:i:2	UG:i:15060	BX:Z:ATGGGTTGA
FOUND:	 SRR2057595.897659_ATAGGTTTC	16	chr8	82003436	255	20M	*	0	0	*	*	XA:i:2	MD:Z:0G2T16	NM:i:2	BX:Z:ATAGGTTTC	UG:i:15128
EXPECTED:	 SRR2057595.897659_ATAGGTTTC	16	chr8	82003436	255	20M	*	0	0	*	*	XA:i:2	MD:Z:0G2T16	NM:i:2	UG:i:15060	BX:Z:ATGGGTTGA
FOUND:	 SRR2057595.3245577_GGAGGTTCT	16	chr8	82003436	255	20M	*	0	0	*	*	XA:i:2	MD:Z:0G2T16	NM:i:2	BX:Z:GGAGGTTCT	UG:i:15130
EXPECTED:	 SRR2057595.3245577_GGAGGTTCT	16	chr8	82003436	255	20M	*	0	0	*	*	XA:i:2	MD:Z:0G2T16	NM:i:2	UG:i:15059	BX:Z:TGAGGTTGA
FOUND:	 SRR2057595.13317470_TTGGGTTAA	16	chr3	96263947	255	38M	*	0	0	*	*	XA:i:1	MD:Z:11G26	NM:i:1	BX:Z:TTGGGTTAA	UG:i:10683
EXPECTED:	 SRR2057595.13317470_TTGGGTTAA	16	chr3	96263947	255	38M	*	0	0	*	*	XA:i:1	MD:Z:11G26	NM:i:1	UG:i:10632	BX:Z:TCAGGTTCA
FOUND:	 SRR2057595.8903949_GCTGGTTCT	16	chr11	101507078	255	67M	*	0	0	*	*	XA:i:2	MD:Z:56T1C8	NM:i:2	BX:Z:GCTGGTTAT	UG:i:3777
EXPECTED:	 SRR2057595.8903949_GCTGGTTCT	16	chr11	101507078	255	67M	*	0	0	*	*	XA:i:2	MD:Z:56T1C8	NM:i:2	UG:i:3735	BX:Z:ATGGGTTAT
FOUND:	 SRR2057595.6107476_TCGGGTTAC	0	chr11	83085100	255	67M	*	0	0	*	*	XA:i:1	MD:Z:58T8	NM:i:1	BX:Z:TCGGGTTAC	UG:i:2555
EXPECTED:	 SRR2057595.6107476_TCGGGTTAC	0	chr11	83085100	255	67M	*	0	0	*	*	XA:i:1	MD:Z:58T8	NM:i:1	UG:i:2527	BX:Z:TTCGGTTGC
FOUND:	 SRR2057595.5405752_AACGGTTGG	0	chr1	72283620	255	67M	*	0	0	*	*	XA:i:1	MD:Z:56G10	NM:i:1	BX:Z:AACGGTTGG	UG:i:412
EXPECTED:	 SRR2057595.5405752_AACGGTTGG	0	chr1	72283620	255	67M	*	0	0	*	*	XA:i:1	MD:Z:56G10	NM:i:1	UG:i:376	BX:Z:ATTGGTTCG
FOUND:	 SRR2057595.2806735_AAAGGTTCC	0	chr11	87275932	255	67M	*	0	0	*	*	XA:i:2	MD:Z:28C5T32	NM:i:2	BX:Z:AAAGGTTCC	UG:i:3102
EXPECTED:	 SRR2057595.2806735_AAAGGTTCC	0	chr11	87275932	255	67M	*	0	0	*	*	XA:i:2	MD:Z:28C5T32	NM:i:2	UG:i:3097	BX:Z:GTAGGTTAC
FOUND:	 SRR2057595.8391205_GGGGGTTGT	16	chr3	96263946	255	39M	*	0	0	*	*	XA:i:2	MD:Z:0T8C29	NM:i:2	BX:Z:GGGGGTTGT	UG:i:10686
EXPECTED:	 SRR2057595.8391205_GGGGGTTGT	16	chr3	96263946	255	39M	*	0	0	*	*	XA:i:2	MD:Z:0T8C29	NM:i:2	UG:i:10625	BX:Z:CTGGGTTGA
FOUND:	 SRR2057595.482451_TCGGGTTGG	16	chr8	82003436	255	20M	*	0	0	*	*	XA:i:2	MD:Z:0G2T16	NM:i:2	BX:Z:TCGGGTTGG	UG:i:15110
EXPECTED:	 SRR2057595.482451_TCGGGTTGG	16	chr8	82003436	255	20M	*	0	0	*	*	XA:i:2	MD:Z:0G2T16	NM:i:2	UG:i:15059	BX:Z:TGAGGTTGA
FOUND:	 SRR2057595.2938337_AATGGTTAC	16	chr9	65044379	255	27M	*	0	0	*	*	XA:i:0	MD:Z:27	NM:i:0	BX:Z:AATGGTTAC	UG:i:15687
EXPECTED:	 SRR2057595.2938337_AATGGTTAC	16	chr9	65044379	255	27M	*	0	0	*	*	XA:i:0	MD:Z:27	NM:i:0	UG:i:15646	BX:Z:TCTGGTTTC
FOUND:	 SRR2057595.12752032_TTTGGTTGA	16	chr11	101507078	255	67M	*	0	0	*	*	XA:i:2	MD:Z:48C9C8	NM:i:2	BX:Z:TTTGGTTGA	UG:i:3776
EXPECTED:	 SRR2057595.12752032_TTTGGTTGA	16	chr11	101507078	255	67M	*	0	0	*	*	XA:i:2	MD:Z:48C9C8	NM:i:2	UG:i:3749	BX:Z:ATTGGTTCG
FOUND:	 SRR2057595.502927_CAAGGTTAA	16	chr8	82003436	255	20M	*	0	0	*	*	XA:i:2	MD:Z:0G2T16	NM:i:2	BX:Z:CAAGGTTAA	UG:i:15120
EXPECTED:	 SRR2057595.502927_CAAGGTTAA	16	chr8	82003436	255	20M	*	0	0	*	*	XA:i:2	MD:Z:0G2T16	NM:i:2	UG:i:15059	BX:Z:TGAGGTTGA
FOUND:	 SRR2057595.9402462_TATGGTTGG	16	chr3	96263947	255	38M	*	0	0	*	*	XA:i:1	MD:Z:11G26	NM:i:1	BX:Z:TATGGTTGG	UG:i:10684
EXPECTED:	 SRR2057595.9402462_TATGGTTGG	16	chr3	96263947	255	38M	*	0	0	*	*	XA:i:1	MD:Z:11G26	NM:i:1	UG:i:10631	BX:Z:CATGGTTCT
FOUND:	 SRR2057595.4041993_CAAGGTTGA	0	chr3	96132476	255	38M	*	0	0	*	*	XA:i:0	MD:Z:38	NM:i:0	BX:Z:CAAGGTTGA	UG:i:10268
EXPECTED:	 SRR2057595.4041993_CAAGGTTGA	0	chr3	96132476	255	38M	*	0	0	*	*	XA:i:0	MD:Z:38	NM:i:0	UG:i:10172	BX:Z:GGAGGTTAA
FOUND:	 SRR2057595.4828384_TCCGGTTCA	0	chr1	72290887	255	67M	*	0	0	*	*	XA:i:1	MD:Z:37C29	NM:i:1	BX:Z:TCCGGTTCA	UG:i:558
EXPECTED:	 SRR2057595.4828384_TCCGGTTCA	0	chr1	72290887	255	67M	*	0	0	*	*	XA:i:1	MD:Z:37C29	NM:i:1	UG:i:517	BX:Z:CACGGTTTA
FOUND:	 SRR2057595.2554282_AGTGGTTCT	16	chr8	82003436	255	20M	*	0	0	*	*	XA:i:2	MD:Z:0G2T16	NM:i:2	BX:Z:AGTGGTTCT	UG:i:15134
EXPECTED:	 SRR2057595.2554282_AGTGGTTCT	16	chr8	82003436	255	20M	*	0	0	*	*	XA:i:2	MD:Z:0G2T16	NM:i:2	UG:i:15059	BX:Z:TGAGGTTGA
FOUND:	 SRR2057595.4109672_TTGGGTTAC	16	chr8	82003436	255	20M	*	0	0	*	*	XA:i:2	MD:Z:0G2T16	NM:i:2	BX:Z:TTGGGTTAC	UG:i:15137
EXPECTED:	 SRR2057595.4109672_TTGGGTTAC	16	chr8	82003436	255	20M	*	0	0	*	*	XA:i:2	MD:Z:0G2T16	NM:i:2	UG:i:15059	BX:Z:TGAGGTTGA
FOUND:	 SRR2057595.11515838_ATGGTTCTT	0	chr3	96132477	255	37M	*	0	0	*	*	XA:i:0	MD:Z:37	NM:i:0	BX:Z:ATGGTTCTT	UG:i:10287
EXPECTED:	 SRR2057595.11515838_ATGGTTCTT	0	chr3	96132477	255	37M	*	0	0	*	*	XA:i:0	MD:Z:37	NM:i:0	UG:i:10267	BX:Z:ACGGTTACT
  ```
  This accounts for 23 of the 26 extra reads from rumi. I don't know why umi_tools does this. I'm betting the other missing reads are in a similar vein.
- SRR2057595.3354975_CGGGTTGGT: rumi correctly uses the umi starting
  with C since there are two reads with that umi. umi_tools uses the umi
  with only a freq of 1.
- SRR2057595.4915638_TTGGTTAAA: rumi correctly chooses the read with the
  decided upon umi as the best read.
- SRR2057595.5405752_AACGGTTGG: rumi correctly leaves it as its own group.
  umi_tools corrects it dist 3 away to ATTGGTTCG. I expect this to be
  the end source of the 30 extra reads in rumi's output. What causes
  this in umi_tools?
- Next up is to add a test for making sure that reads aren't doubled up on in the
  determine_umi step. Then make that step better and faster.
- At this point I feel pretty confident in the calls that rumi makes. 
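
The dist-3 claim above can be checked by hand. The sketch below assumes plain Hamming
distance and the published umi_tools directional rule (a parent absorbs a child when
they are within the distance threshold and count(parent) >= 2 * count(child) - 1).
The names `hamming` and `absorbs` are hypothetical, and whether rumi applies exactly
this rule is an assumption.

```rust
/// Hamming distance between two equal-length UMIs; `None` if the lengths differ.
fn hamming(a: &[u8], b: &[u8]) -> Option<usize> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b.iter()).filter(|(x, y)| x != y).count())
}

/// Directional adjacency test for a single edge (assumed rule, see above):
/// within `max_dist` edits and parent_count >= 2 * child_count - 1.
fn absorbs(
    parent: &[u8],
    parent_count: u64,
    child: &[u8],
    child_count: u64,
    max_dist: usize,
) -> bool {
    match hamming(parent, child) {
        // Written as `+ 1 >= 2 *` so a child count of 0 cannot underflow.
        Some(d) => d <= max_dist && parent_count + 1 >= 2 * child_count,
        None => false,
    }
}
```

`hamming(b"AACGGTTGG", b"ATTGGTTCG")` gives `Some(3)`, so with a threshold of 1 no
single edge joins these two umis, whatever their counts. One possible explanation,
not confirmed here, is a chain of dist-1 edges through intermediate umis at the same
position.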

8/18/19
- Updated tests to work with new BTreeMap structure and read_groups types
- Took second pass at a group_only option. Currently all reads are being collapsed
  in the group_reads function. Need to figure out how to keep them around, ideally
  without compromising the performance of the dedup procedure itself. group_only 
  can be slow. dedup must be fast
Later the same day
- I have an extra 3000 reads for no reason I can figure out. Also chrY is being ordered weirdly
...
- The determine_umi step was double adding some reads. That has been fixed, but it's an inefficient fix.
- group_only now outputs the right number of reads. The diffs between the two should
  help figure out the remaining diffs in the dedup only.

8/17/19
- Update tests to work with new read_groups types and BTreeMap
- Then I need to add the UG and BX tags to them for the group_only option
- I think that my missing reads stem from positional grouping, not from
  the directional adjacency. Find a way to test this?