bamnado 0.4.4

Tools and utilities for manipulating BAM files in unusual use cases, e.g. single cell and MCC.
# BamNado

High-performance tools and utilities for manipulation of BAM files for specialized use cases, including single cell and MCC (Multi-modal cellular characterization) workflows.

## Overview

BamNado is a Rust-based toolkit designed to handle complex BAM file operations that are common in modern genomics workflows, particularly in single-cell and multi-modal cellular characterization experiments. It provides efficient, cross-platform tools for coverage calculation, read filtering, file splitting, and various BAM file transformations.

## Python Interface

BamNado also provides a Python interface for direct access to its high-performance BAM processing capabilities.

### Installation

From the root of a cloned copy of the repository, you can install the Python package with `pip` or `uv`:

```bash
pip install .
# or
uv pip install .
```

### Usage

```python
import bamnado
import numpy as np

# Get coverage signal for a chromosome
signal = bamnado.get_signal_for_chromosome(
    bam_path="path/to/file.bam",
    chromosome_name="chr1",
    bin_size=50,
    scale_factor=1.0,
    use_fragment=False,
    ignore_scaffold_chromosomes=True
)

# signal is a numpy array of floats
print(f"Mean coverage: {np.mean(signal)}")
```
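The returned array can be turned into bedGraph-style intervals for inspection. The following is a minimal sketch, not part of the `bamnado` API: `signal_to_intervals` is a hypothetical helper, and it assumes that element `i` of the signal covers positions `[i * bin_size, (i + 1) * bin_size)`. It runs on a toy list so that no BAM file is needed.

```python
from typing import Iterable, Iterator, Tuple


def signal_to_intervals(
    signal: Iterable[float], bin_size: int
) -> Iterator[Tuple[int, int, float]]:
    """Yield (start, end, value) runs, merging adjacent bins with equal values.

    Assumption: bin i spans [i * bin_size, (i + 1) * bin_size).
    """
    run_start = 0
    run_value = None
    n_bins = 0
    for i, value in enumerate(signal):
        value = float(value)
        if run_value is None:
            # First bin opens the first run.
            run_start, run_value = 0, value
        elif value != run_value:
            # Value changed: close the current run and open a new one.
            yield run_start, i * bin_size, run_value
            run_start, run_value = i * bin_size, value
        n_bins = i + 1
    if run_value is not None:
        yield run_start, n_bins * bin_size, run_value


# Toy values standing in for the array returned by get_signal_for_chromosome
toy_signal = [0.0, 0.0, 2.0, 2.0, 2.0, 1.0]
for start, end, value in signal_to_intervals(toy_signal, bin_size=50):
    print(f"chr1\t{start}\t{end}\t{value:g}")
```

With a real chromosome the last interval may extend past the chromosome end, so clamp it to the chromosome length if exact coordinates matter.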

## Installation

BamNado can be installed in several ways. Choose the method that best fits your needs:

### Method 0: Docker Container (Easiest for Linux/macOS)

If you have Docker installed, you can run BamNado directly from a container:

```bash
# Pull the latest image
docker pull ghcr.io/alsmith151/bamnado:latest

# Run any bamnado command
docker run --rm -v /path/to/data:/data ghcr.io/alsmith151/bamnado:latest bam-coverage --help
```

**Multi-platform support**: Container images are available for both `linux/amd64` and `linux/arm64`. macOS users with Apple Silicon can run the ARM64 image natively via Docker Desktop.

**Example: Calculate coverage from a BAM file**

```bash
# Mount the host data directory at /data and write a BigWig next to the input
docker run --rm -v /path/to/data:/data ghcr.io/alsmith151/bamnado:latest \
  bam-coverage \
  --bam /data/input.bam \
  --output /data/output.bw
```

**Using specific version tags**

```bash
# Use a specific release version
docker pull ghcr.io/alsmith151/bamnado:v0.4.0

# Run with version tag
docker run --rm -v /path/to/data:/data ghcr.io/alsmith151/bamnado:v0.4.0 bam-coverage --help
```

### Method 1: Pre-built Binaries (Recommended)

The easiest way to get started is to download a pre-compiled binary from our [releases page](https://github.com/alsmith151/BamNado/releases).

#### Available Platforms

| Platform | Architecture | File Name |
|----------|-------------|-----------|
| Linux | x86_64 | `bamnado-x86_64-unknown-linux-gnu.tar.gz` |
| macOS | Intel (x86_64) | `bamnado-x86_64-apple-darwin.tar.gz` |
| macOS | Apple Silicon (ARM64) | `bamnado-aarch64-apple-darwin.tar.gz` |
| Windows | x86_64 | `bamnado-x86_64-pc-windows-msvc.zip` |

#### Installation Steps

1. **Download the binary**

   Go to the [releases page](https://github.com/alsmith151/BamNado/releases) and download the appropriate file for your system.

2. **Extract the archive**

   **Linux/macOS:**

   ```bash
   tar -xzf bamnado-*.tar.gz
   ```

   **Windows:**
   - Right-click the zip file and select "Extract All"
   - Or use your preferred extraction tool (7-Zip, WinRAR, etc.)

3. **Make executable** (Linux/macOS only)

   ```bash
   chmod +x bamnado
   ```

4. **Test the installation**

   ```bash
   ./bamnado --version
   ```

   You should see output like: `bamnado 0.4.0`

5. **Install system-wide** (optional but recommended)

   **Option A: System-wide installation (requires admin privileges)**

   ```bash
   # Linux/macOS
   sudo cp bamnado /usr/local/bin/

   # Windows (as Administrator)
   # Copy bamnado.exe to C:\Windows\System32\ or add to PATH
   ```

   **Option B: User-local installation (no admin required)**

   ```bash
   # Linux/macOS
   mkdir -p ~/.local/bin
   cp bamnado ~/.local/bin/

   # Add to your shell profile if not already in PATH
   echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
   # or for zsh users:
   echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.zshrc

   # Reload your shell or run:
   source ~/.bashrc  # or ~/.zshrc
   ```

6. **Verify system installation**

   Open a new terminal and run:

   ```bash
   bamnado --version
   ```

#### Troubleshooting Pre-built Binaries

##### Linux: "No such file or directory" error

- Your system might be missing required libraries. Try:

  ```bash
  ldd bamnado  # Check dependencies
  ```

- For older Linux distributions, you may need to build from source.

##### macOS: "Cannot be opened because the developer cannot be verified"

- Run: `xattr -d com.apple.quarantine bamnado`
- Or go to System Preferences → Security & Privacy and allow the app

##### Windows: "Windows protected your PC"

- Click "More info" → "Run anyway"
- Or add an exception in Windows Defender

### Method 2: Install via Cargo

If you have Rust and Cargo installed, you can install BamNado directly from crates.io:

```bash
cargo install bamnado
```

**Prerequisites:**

- Rust 1.70+ (install from [rustup.rs](https://rustup.rs/))
- Cargo (comes with Rust)

**Advantages:**

- Always gets the latest published version
- Automatically handles dependencies
- Works on any platform supported by Rust

### Method 3: Build from Source

For the latest development version or if pre-built binaries don't work on your system:

#### Prerequisites

- A Rust toolchain that supports the 2024 edition (Rust 1.85 or later)
- Git
- C compiler (for some dependencies)

**Install Rust if you haven't already:**

```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env
```

#### Build Steps

1. **Clone the repository**

   ```bash
   git clone https://github.com/alsmith151/BamNado.git
   cd BamNado
   ```

2. **Build the project**

   ```bash
   # Debug build (faster compilation, slower execution)
   cargo build

   # Release build (slower compilation, faster execution - recommended)
   cargo build --release
   ```

3. **Test the build**

   ```bash
   # For debug build
   ./target/debug/bamnado --version

   # For release build
   ./target/release/bamnado --version
   ```

4. **Install system-wide** (optional)

   ```bash
   # Install from source
   cargo install --path .

   # Or manually copy the binary
   sudo cp target/release/bamnado /usr/local/bin/
   ```

#### Build Troubleshooting

##### Common Issues

##### Error: "linker 'cc' not found"

- **Ubuntu/Debian:** `sudo apt install build-essential`
- **CentOS/RHEL:** `sudo yum groupinstall "Development Tools"`
- **macOS:** Install Xcode Command Line Tools: `xcode-select --install`
- **Windows:** Install Visual Studio Build Tools or use WSL

##### Error: "failed to run custom build command for 'openssl-sys'"

- **Ubuntu/Debian:** `sudo apt install libssl-dev pkg-config`
- **CentOS/RHEL:** `sudo yum install openssl-devel pkgconf-pkg-config`
- **macOS:** Usually works out of the box with Homebrew
- **Windows:** Consider using the pre-built binaries instead

### Quick Start Verification

After installation, verify everything works:

```bash
# Check version
bamnado --version

# See available commands
bamnado --help

# Test with a simple command (replace with your BAM file)
bamnado bam-coverage --bam /path/to/your/file.bam --output test.bedgraph
```

## Usage

### Available Commands

BamNado provides several commands for different BAM file operations:

- `bam-coverage` - Calculate coverage from a BAM file and write to a bedGraph or BigWig file
- `multi-bam-coverage` - Calculate coverage from multiple BAM files and write to a bedGraph or BigWig file
- `split-exogenous` - Split a BAM file into endogenous and exogenous reads
- `split` - Split a BAM file based on a set of defined filters
- `modify` - Modify BAM files with various transformations
- `bigwig-compare` - Compare two BigWig files and write the result to a new BigWig file
- `bigwig-aggregate` - Aggregate multiple BigWig files into one using sum, mean, median, max, or min

For detailed help on any command, use:

```bash
bamnado <command> --help
```

### Example: Calculating Coverage from a BAM File

#### Command

```bash
bamnado bam-coverage \
  --bam input.bam \
  --output output.bedgraph \
  --bin-size 100 \
  --norm-method rpkm \
  --scale-factor 1.5 \
  --use-fragment \
  --proper-pair \
  --min-mapq 30 \
  --min-length 50 \
  --max-length 500 \
  --blacklisted-locations blacklist.bed \
  --whitelisted-barcodes barcodes.txt
```

#### Explanation of Options

- `--bam`: Path to the input BAM file.
- `--output`: Path to the output file (e.g., `bedGraph` or `BigWig`).
- `--bin-size`: Size of genomic bins for coverage calculation.
- `--norm-method`: Normalization method (`raw`, `rpkm`, or `cpm`).
- `--scale-factor`: Scaling factor for normalization.
- `--use-fragment`: Use fragments instead of individual reads for counting.
- `--proper-pair`: Include only properly paired reads.
- `--min-mapq`: Minimum mapping quality for reads to be included (default: 20).
- `--min-length`: Minimum read length (default: 20).
- `--max-length`: Maximum read length (default: 1000).
- `--blacklisted-locations`: Path to a BED file specifying regions to exclude.
- `--whitelisted-barcodes`: Path to a file with barcodes to include.
- `--strand`: Filter reads based on strand (both, forward, reverse).
- `--shift`: Shift options for the pileup (default: 0,0,0,0).
- `--truncate`: Truncate options for the pileup.
- `--ignore-scaffold`: Ignore scaffold chromosomes.
- `--read-group`: Selected read group.
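The filtering options above can be read as a single predicate applied to each read. The sketch below illustrates that idea in Python. It is not BamNado's implementation: the `Read` and `ReadFilter` types are hypothetical, the length bounds are assumed to be inclusive, and all filters are assumed to be combined with a logical AND. The defaults mirror the documented ones (`--min-mapq 20`, `--min-length 20`, `--max-length 1000`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical minimal read record (not a bamnado type)."""
    mapq: int
    length: int
    is_proper_pair: bool
    is_reverse: bool


@dataclass(frozen=True)
class ReadFilter:
    """Hypothetical filter mirroring a subset of the CLI options."""
    min_mapq: int = 20
    min_length: int = 20
    max_length: int = 1000
    proper_pair: bool = False
    strand: str = "both"  # "both", "forward" or "reverse"

    def keep(self, read: Read) -> bool:
        # Assumption: every enabled filter must pass (logical AND).
        if read.mapq < self.min_mapq:
            return False
        # Assumption: length bounds are inclusive.
        if not (self.min_length <= read.length <= self.max_length):
            return False
        if self.proper_pair and not read.is_proper_pair:
            return False
        if self.strand == "forward" and read.is_reverse:
            return False
        if self.strand == "reverse" and not read.is_reverse:
            return False
        return True


# Same thresholds as the example command above
example_filter = ReadFilter(min_mapq=30, min_length=50, max_length=500, proper_pair=True)
print(example_filter.keep(Read(mapq=60, length=150, is_proper_pair=True, is_reverse=False)))
```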

#### Output

The output file (`output.bedgraph`) will contain the normalized coverage data for the BAM file, filtered based on the specified criteria. BigWig files can also be generated by specifying the `--output` option with a `.bw` extension.
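For orientation, the normalization methods can be sketched with their conventional definitions: CPM scales counts to one million reads, and RPKM additionally divides by the bin length in kilobases. Whether BamNado computes them in exactly this way, and whether `--scale-factor` is applied as a final multiplication, are assumptions here. The `normalise` function is a hypothetical helper, not part of BamNado.

```python
from typing import List, Sequence


def normalise(
    counts: Sequence[float],
    total_reads: int,
    bin_size: int,
    method: str = "raw",
    scale_factor: float = 1.0,
) -> List[float]:
    """Normalise per-bin read counts using conventional CPM/RPKM definitions."""
    if method == "raw":
        factor = 1.0
    elif method == "cpm":
        # Counts per million mapped reads
        factor = 1e6 / total_reads
    elif method == "rpkm":
        # Reads per kilobase (of bin) per million mapped reads
        factor = 1e9 / (total_reads * bin_size)
    else:
        raise ValueError(f"unknown normalisation method: {method!r}")
    # Assumption: the scale factor multiplies the normalised value.
    return [c * factor * scale_factor for c in counts]


bin_counts = [10, 0, 5]
print(normalise(bin_counts, total_reads=1_000_000, bin_size=100, method="rpkm"))
```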

### Additional Commands

#### Multi-BAM Coverage

To calculate coverage from multiple BAM files:

```bash
bamnado multi-bam-coverage \
  --bams file1.bam file2.bam \
  --output output.bedgraph \
  --bin-size 100 \
  --norm-method rpkm \
  --scale-factor 1.5 \
  --use-fragment \
  --proper-pair \
  --min-mapq 30 \
  --min-length 50 \
  --max-length 500
```

#### Split BAM File into Endogenous and Exogenous Reads

To split a BAM file into endogenous and exogenous reads:

```bash
bamnado split-exogenous \
  --input input.bam \
  --output output_prefix \
  --exogenous-prefix "exo_" \
  --stats stats.json \
  --allow-unknown-mapq \
  --proper-pair \
  --min-mapq 30 \
  --min-length 50 \
  --max-length 500
```
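The idea behind the split can be illustrated with reference names alone. This sketch assumes that exogenous reads are those aligned to references whose names start with the value given to `--exogenous-prefix`. That interpretation is inferred from the option name rather than confirmed, and `partition_by_prefix` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple


def partition_by_prefix(
    reference_names: Iterable[str], exogenous_prefix: str
) -> Tuple[List[str], List[str]]:
    """Split reference names into (endogenous, exogenous) by name prefix."""
    endogenous: List[str] = []
    exogenous: List[str] = []
    for name in reference_names:
        # Assumption: a matching prefix marks the reference as exogenous.
        if name.startswith(exogenous_prefix):
            exogenous.append(name)
        else:
            endogenous.append(name)
    return endogenous, exogenous


endo, exo = partition_by_prefix(["chr1", "exo_chr1", "chr2", "exo_chrX"], "exo_")
print("endogenous:", endo)
print("exogenous:", exo)
```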

#### Split BAM File by Cell Barcodes

To split a BAM file based on cell barcodes:

```bash
bamnado split \
  --input input.bam \
  --output output_prefix \
  --whitelisted-barcodes barcodes.txt \
  --proper-pair \
  --min-mapq 30 \
  --min-length 50 \
  --max-length 500
```
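Barcode whitelisting amounts to a set-membership test per read. The sketch below assumes that `barcodes.txt` holds one barcode per line and that each read carries at most one barcode (for example from a `CB` tag, as in common single-cell pipelines). Both are assumptions, not documented BamNado behaviour, and the helper names are hypothetical.

```python
import io
from typing import Iterable, Iterator, Optional, Set, TextIO, Tuple


def load_whitelist(handle: TextIO) -> Set[str]:
    """Read one barcode per line, ignoring blank lines (assumed file layout)."""
    return {line.strip() for line in handle if line.strip()}


def filter_by_barcode(
    reads: Iterable[Tuple[str, Optional[str]]], whitelist: Set[str]
) -> Iterator[str]:
    """Yield names of reads whose barcode is present in the whitelist."""
    for name, barcode in reads:
        # Reads without a barcode are dropped in this sketch.
        if barcode is not None and barcode in whitelist:
            yield name


# In-memory stand-in for barcodes.txt
whitelist = load_whitelist(io.StringIO("AAACCTGA-1\nAAACGGGT-1\n\n"))
toy_reads = [("read1", "AAACCTGA-1"), ("read2", "TTTGTCAT-1"), ("read3", None)]
print(list(filter_by_barcode(toy_reads, whitelist)))
```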

#### Modify BAM Files

To modify BAM files with various transformations:

```bash
bamnado modify \
  --input input.bam \
  --output output_prefix \
  --proper-pair \
  --min-mapq 30 \
  --min-length 50 \
  --max-length 500 \
  --tn5-shift
```

The `modify` command supports various filtering options and transformations like Tn5 shifting for ATAC-seq data processing.
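As background on Tn5 shifting, the convention commonly used for ATAC-seq moves the 5' end of forward-strand reads by +4 bp and the 5' end of reverse-strand reads by -5 bp, so that read ends sit at the centre of the transposase insertion site. The sketch below applies that convention to 0-based, half-open coordinates. The offsets BamNado uses and how it adjusts each read are not stated here, so treat `tn5_shift` as a hypothetical illustration.

```python
from typing import Tuple

TN5_FORWARD_SHIFT = 4  # bp added to the start of forward-strand reads
TN5_REVERSE_SHIFT = 5  # bp removed from the end of reverse-strand reads


def tn5_shift(start: int, end: int, is_reverse: bool) -> Tuple[int, int]:
    """Move the 5' end of a read (0-based, half-open) by the conventional offset."""
    if is_reverse:
        # The 5' end of a reverse-strand read is its rightmost coordinate.
        return start, max(start, end - TN5_REVERSE_SHIFT)
    # The 5' end of a forward-strand read is its leftmost coordinate.
    return min(end, start + TN5_FORWARD_SHIFT), end


print(tn5_shift(100, 150, is_reverse=False))
print(tn5_shift(100, 150, is_reverse=True))
```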

#### Compare BigWig Files

To compare two BigWig files and write the result to a new BigWig file:

```bash
bamnado bigwig-compare \
   --bw1 sample1.bw \
   --bw2 sample2.bw \
   --comparison subtraction \
   -s 50 \
   -o output.bw
```

Supported comparison methods:

- `subtraction`: $bw1 - bw2$
- `ratio`: $bw1 / (bw2 + pseudocount)$
- `log-ratio`: $\ln\left((bw1 + pseudocount) / (bw2 + pseudocount)\right)$

Common options:

- `-s, --bin-size`: Bin size in base pairs used to compute the mean score per bin (default: 50)
- `--chunk-size`: Chunk size in base pairs for streaming reads from BigWigs (tune for IO/memory)
- `--pseudocount`: Pseudocount used for `ratio` / `log-ratio` to avoid division by zero
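The three comparison formulas can be checked by hand on a few bins. This is a small sketch of the per-bin arithmetic only, following the formulas listed above. `compare_bins` is a hypothetical helper, and the pseudocount value used in the example is arbitrary since the tool's default is not stated here.

```python
import math
from typing import List, Sequence


def compare_bins(
    bw1: Sequence[float], bw2: Sequence[float], method: str, pseudocount: float
) -> List[float]:
    """Apply a per-bin comparison to two equally long lists of bin scores."""
    if len(bw1) != len(bw2):
        raise ValueError("both tracks must have the same number of bins")
    if method == "subtraction":
        return [a - b for a, b in zip(bw1, bw2)]
    if method == "ratio":
        # The pseudocount is added to the denominator only, as in the formula above.
        return [a / (b + pseudocount) for a, b in zip(bw1, bw2)]
    if method == "log-ratio":
        return [
            math.log((a + pseudocount) / (b + pseudocount))
            for a, b in zip(bw1, bw2)
        ]
    raise ValueError(f"unknown comparison method: {method!r}")


sample1 = [3.0, 0.0]
sample2 = [1.0, 2.0]
for method in ("subtraction", "ratio", "log-ratio"):
    print(method, compare_bins(sample1, sample2, method, pseudocount=1.0))
```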

#### Aggregate BigWig Files

To aggregate multiple BigWig files into a single output file:

```bash
bamnado bigwig-aggregate \
   --bigwigs sample1.bw sample2.bw sample3.bw \
   --method mean \
   -s 50 \
   -o aggregated.bw
```

Supported aggregation methods:

- `sum`: Sum of all values across all BigWigs at each position
- `mean`: Mean of all values across all BigWigs at each position
- `median`: Median of all values across all BigWigs at each position (computed post-binning)
- `max`: Maximum value across all BigWigs at each position
- `min`: Minimum value across all BigWigs at each position

Common options:

- `--bigwigs`: Space-separated list of BigWig files to aggregate (at least one required)
- `-s, --bin-size`: Bin size in base pairs used to compute aggregated score per bin (default: 50)
- `--chunk-size`: Chunk size in base pairs for streaming reads from BigWigs (tune for IO/memory)
- `--pseudocount`: Pseudocount value to add to all values before aggregation (useful for sum/mean/median to avoid zeros)

Examples:

```bash
# Sum coverage across 3 samples
bamnado bigwig-aggregate \
   --bigwigs sample1.bw sample2.bw sample3.bw \
   --method sum \
   -o total_coverage.bw

# Calculate mean coverage with pseudocount
bamnado bigwig-aggregate \
   --bigwigs replicate1.bw replicate2.bw replicate3.bw \
   --method mean \
   --pseudocount 1e-3 \
   -o mean_coverage.bw

# Calculate median coverage across many samples
bamnado bigwig-aggregate \
   --bigwigs *.bw \
   --method median \
   -s 100 \
   -o median_coverage.bw
```
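The per-bin meaning of the aggregation methods can be sketched on plain lists. The example assumes that all tracks have already been binned to the same length and, as described for `--pseudocount`, adds the pseudocount to every value before aggregating. `aggregate_bins` is a hypothetical helper, not BamNado code.

```python
import statistics
from typing import Callable, Dict, List, Sequence

# Map method names to functions that reduce one column of per-sample values.
AGGREGATORS: Dict[str, Callable[[List[float]], float]] = {
    "sum": sum,
    "mean": statistics.fmean,
    "median": statistics.median,
    "max": max,
    "min": min,
}


def aggregate_bins(
    tracks: Sequence[Sequence[float]], method: str, pseudocount: float = 0.0
) -> List[float]:
    """Aggregate equally binned tracks bin by bin."""
    if not tracks:
        raise ValueError("at least one track is required")
    if len({len(track) for track in tracks}) != 1:
        raise ValueError("all tracks must have the same number of bins")
    if method not in AGGREGATORS:
        raise ValueError(f"unknown aggregation method: {method!r}")
    aggregate = AGGREGATORS[method]
    # zip(*tracks) yields, for each bin, the values from every track.
    return [aggregate([v + pseudocount for v in column]) for column in zip(*tracks)]


toy_tracks = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [5.0, 5.0, 5.0]]
for method in AGGREGATORS:
    print(method, aggregate_bins(toy_tracks, method))
```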

## Help

For more details on available commands and options, run:

```bash
bamnado --help
```

Or for specific command help:

```bash
bamnado <command> --help
```

## Features

- **High Performance**: Built in Rust for maximum speed and memory efficiency
- **Cross-platform**: Available for Linux, macOS, and Windows
- **Multiple Output Formats**: Support for bedGraph and BigWig output formats
- **Flexible Filtering**: Comprehensive read filtering options including mapping quality, read length, proper pairs, and more
- **Single Cell Support**: Built-in support for cell barcode-based operations
- **MCC Workflows**: Specialized tools for Multi-modal Cellular Characterization
- **Strand-specific Analysis**: Support for strand-specific coverage calculations
- **Blacklist/Whitelist Support**: Region and barcode filtering capabilities

## Development

### Requirements

- A Rust toolchain that supports the 2024 edition (Rust 1.85 or later)
- Cargo package manager

### Building from Source

```bash
git clone https://github.com/alsmith151/BamNado.git
cd BamNado
cargo build --release
```

### Running Tests

```bash
cargo test
```

### Pre-commit Hooks

This project uses pre-commit hooks to ensure code quality and consistency. The hooks run the same checks as the CI workflow:

- Code formatting (`cargo fmt`)
- Linting (`cargo clippy`)
- Basic checks (`cargo check`)
- Tests (`cargo test` on push)

#### Quick Setup

Run the setup script to install and configure pre-commit hooks:

```bash
./setup-precommit.sh
```

#### Manual Setup

If you prefer to set up pre-commit manually:

```bash
# Install pre-commit (choose one method)
pip install pre-commit
# or: brew install pre-commit
# or: conda install -c conda-forge pre-commit

# Install the hooks
pre-commit install
pre-commit install --hook-type pre-push

# Test the setup
pre-commit run --all-files
```

#### Configuration Options

Two pre-commit configurations are available:

- `.pre-commit-config.yaml` - Full checks including `cargo check` on every commit
- `.pre-commit-config-fast.yaml` - Faster setup with formatting/linting only, tests on push

To use the fast configuration:

```bash
mv .pre-commit-config.yaml .pre-commit-config-full.yaml
mv .pre-commit-config-fast.yaml .pre-commit-config.yaml
pre-commit install
```

#### Useful Commands

```bash
pre-commit run --all-files       # Run all hooks on all files
pre-commit run cargo-fmt         # Run specific hook
pre-commit autoupdate            # Update hook versions
pre-commit uninstall             # Remove hooks
```

## Release Information

### Version 0.4.0

- High-performance BAM coverage and manipulation tools
- Python bindings (via `maturin`) for selected functionality
- BigWig comparison via `bigwig-compare` (subtraction/ratio/log-ratio)
- BigWig aggregation via `bigwig-aggregate` (sum/mean/median/max/min)

For detailed changelog information, see [CHANGELOG.md](CHANGELOG.md).

## License

This project is licensed under either of:

- Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or <http://www.apache.org/licenses/LICENSE-2.0>)
- MIT license ([LICENSE-MIT](LICENSE-MIT) or <http://opensource.org/licenses/MIT>)

at your option.

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.