barbell 0.3.0

Extremely fast and accurate Nanopore demultiplexing
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
# πŸ¦€ Barbell β€” Pattern aware demux


## Why Barbell?

- **>1000Γ— fewer trimming errors** compared to Dorado.
- Equivalent or **better assemblies**.
- **Contamination-free** assemblies by removing artefact reads.
- Easily applicable to **custom experiments**.
- Still **very fast**.

If you have any issues or if something is unclear, just create an [issue](https://github.com/rickbeeloo/barbell/issues).

---

## Quick links

- [Installing barbell]#installing-barbell
- Workflow examples
  - [Quickstart using Nanopore kit]#quickstart
  - [In-depth workflow for Nanopore kits]#in-depth-inspection-of-nanopore-kit-results
  - [Custom experiment (simple)]#custom-experiment
  - [Custom experiment (multi protocol)]#custom-experiment-with-mixed-sequences
- [Understanding barbell]#understanding-barbell
  - [What are barbell patterns?]#patterns
  - [How to cut concat reads]#how-to-handle-concat-reads
  - [Output column descriptions (annotate & filter)]#output-columns-annotate--filter
- [Paper evals]#paper-evals
- [Notes & tips]#notes--tips

---

## Paper 
See the paper for more details on the Barbell scoring and comparisons with other tools:
> ### [Barbell Resolves Demultiplexing and Trimming Issues in Nanopore Data]https://doi.org/10.1101/2025.10.22.683865
>
> **Rick Beeloo**, **Ragnar Groot Koerkamp**, **Xiu Jia**, **Marian J. Broekhuizen-Stins**, **Lieke van Ijken**, **Bas E. Dutilh**, **Aldert L. Zomer**  
> *bioRxiv*, 2025  
> [https://doi.org/10.1101/2025.10.22.683865]https://doi.org/10.1101/2025.10.22.683865


---

## Installing Barbell
Barbell is written in Rust.

### Executables
Download the latest release from
[releases](https://github.com/rickbeeloo/barbell/releases). These are also
available via

``` bash
cargo binstall --git https://github.com/rickbeeloo/barbell barbell
```
or via conda/mamba/pixi:

``` sh
conda install -c bioconda barbell
```

### From source (recommended)
Check whether Rust is installed:

```bash
rustc --version
```

If not install it via [rustup](https://rustup.rs/), more info in their
[docs](https://rust-lang.github.io/rustup/installation/index.html). Use `rustup
update` to get the latest stable version.

Then we can install Barbell:

```bash
RUSTFLAGS="-C target-cpu=native" cargo install --git https://github.com/rickbeeloo/barbell barbell
```

See [here](https://github.com/ragnargrootkoerkamp/ensure_simd) for
details on `target-cpu=native`.

---

## πŸ“Έ Preview
<img src="data/barbell_example.gif" alt="Demo GIF" width="700">



---

## Quickstart

Barbell includes built-in kit *presets* for many Nanopore kits (most DNA kits; RNA and Twist kits are an exception). Presets let you run analyses quickly, but we recommend reading **Understanding barbell** to interpret the results correctly.

Basic command:

```bash
barbell kit -k <kit-name> -i <reads.fastq> -o <output_folder> --maximize
```

The `--maximize` option is recommended (e.g., for assembly) unless you need an ultra-strict barcode configuration.

### Native barcoding example (SQK-NBD114-96)

```bash
barbell kit -k SQK-NBD114-96 -i reads.fastq -o output_folder --maximize
```

This uses a conservative flank-based edit-distance cutoff. If many reads are missed during `annotate`, you can relax the flank error threshold, for example:

```bash
--flank-max-errors 5
```

β€”but always inspect the results after changing thresholds to avoid random matches (which show up as `Fflank` matches).

### Rapid barcoding example (SQK-RBK114-96)

```bash
barbell kit -k SQK-RBK114-96 -i reads.fastq -o output_folder --maximize
```

For a list of supported kits (see `data/supported_kits.txt`). Note that we thoroughly tested the rapid and native kits but not others, if you experience any issues please report them.
**Note**, there is an option `--use-extended` which enables searches for fusion points, and other artefacts. This is around 3 times slower, 
but worth it if quality is essential (i.e. generating consensus sequences).


### General Pointers

- **Too few annotated reads (many missed):**  
  Slightly increase `--flank-max-errors` (the automatically derived cutoff is reported).  

- **Too many `Fflank` matches:**  
  1. Check `annotation.tsv` to see if matches occur at unexpected locations (not near read ends).  
     - **Random locations:** Lower `--flank-max-errors`.  
     - **Non-random locations:** Adjust `--min-score-diff`.  
       > This is usually unnecessary as defaults are lenient; report an issue if it occurs.

- **Using a kit other than native or rapid** and observing unexpected annotations: Please report an issue.


---

## In-depth inspection of Nanopore kit results

Barbell's typical manual workflow: **annotate β†’ inspect β†’ filter β†’ trim**.

### Annotate

Run `annotate` to find matches in reads and output an annotation table (TSV):

```bash
barbell annotate --kit SQK-RBK114-96 -i pass_sample.fastq -t 10 -o anno.tsv
```

(Using 10 threads.)

Example `anno.tsv` rows:

```
read_id read_len        rel_dist_to_end read_start_bar  read_end_bar    read_start_flank        read_end_flank   bar_start       bar_end match_type  flank_cost      barcode_cost    label   strand  cuts
c5f925b2-fc0b-4053-b615-d70950d41436    19783   14      14      104     14      104     0       0       Fflank   14      14      flank   Fwd
dbca7fb9-d6c8-4417-8ae7-bc32ebce9b27    2972    29      48      70      29      111     0       23      Ftag     13      7       BC29    Fwd
6c089f0a-50cd-4215-94f0-c7babb87f5fe    7599    27      51      74      27      121     0       23      Ftag     11      5       BC45    Fwd
..etc
```

For column descriptions see [Output columns (annotate & filter)](#output-columns-annotate--filter) below.

This file shows, per read, which barcodes/flanks were matched and with what costs. Use `inspect` next to summarize patterns.

### Inspect

Summarize patterns across the annotation file:

```bash
barbell inspect -i anno.tsv
```

By default `inspect` shows the top 10 pattern groups; use `-n <amount>` to increase.

Example summary:

```
Found 64 unique patterns
  Pattern 1: 82421 occurrences
    Ftag[fw, *, @left(0..250)]
  Pattern 2: 5003 occurrences
    Ftag[fw, *, @left(0..250)]__Ftag[fw, *, @right(0..250)]
  Pattern 3: 3545 occurrences
    Fflank[fw, *, @left(0..250)]
  ...
Showed 10 / 64 patterns
Inspection complete!
```

Some observations:
- `Ftag` on the left (`@left`) is the expected pattern for the rapid barcoding kit - that is good news.
- A contamination pattern can be a barcode on the left *and* another barcode on the right (`@right`). We can decide to just trim of the right side (see filtering later)
- `Fflank` indicates that flanks matched but no confident barcode was found.
- `@prev_left` indicates additional tags close to a previous element (e.g., double-barcode ligation).

#### Per-read patterns

To output the selected pattern per read:

```bash
barbell inspect -i anno.tsv -o pattern_per_read.tsv
```

Example `pattern_per_read.tsv` contents:

```
85ef... \t Ftag[fw, *, @left(0..250)]
2f67... \t Ftag[fw, *, @left(0..250)]__Ftag[fw, *, @prev_left(0..250)]
...
```

This is useful when you want to inspect a single "weird" read in detail.

### Filter

Create a `filters.txt` file listing the patterns you want to keep, one per line.
For example:

```
Ftag[fw, *, @left(0..250)]
Ftag[fw, *, @left(0..250)]__Ftag[fw, *, @right(0..250)]
Ftag[fw, *, @left(0..250)]__Ftag[fw, *, @prev_left(0..250)]
```

Then run:

```bash
barbell filter -i anno.tsv -f filters.txt -o filtered.tsv
```

The resulting `filtered.tsv` contains only reads that match the specified patterns.

#### Cutting / trimming metadata

Barbell needs to know *where* to cut reads for trimming. You mark cut positions by adding `>>` (cut **after** this element) or `<<` (cut **before** this element) inside the tag's bracket list. Where within the brackets does not matter.

Examples (note the comma-separated fields inside the brackets):

```text
Ftag[fw, *, @left(0..250), >>]
Ftag[fw, *, @left(0..250), >>]__Ftag[<<. fw, *, @right(0..250)]
Ftag[fw, *, @left(0..250)]__Ftag[fw, *, @prev_left(0..250), >>]
```

In the middle pattern we retain the read sequence between the left tag (cut after it) and the right tag (cut before it).

Run `filter` again (same command as above) to populate the `cuts` column in `filtered.tsv`. This is required before trimming.

---

## Trim

Trim reads using the `cuts` metadata produced by `filter`:

```bash
barbell trim -i filtered.tsv -r reads.fastq -o trimmed
```

Output files are organized by pattern-based folder/filenames, for example:

```
BC14_fw__BC14_fw.trimmed.fastq   BC31_fw__BC04_fw.trimmed.fastq  ...
```

If you prefer different filename conventions, use these flags:

```
  --no-label               Disable label in output filenames
  --no-orientation         Disable orientation in output filenames
  --no-flanks              Disable flanks in output filenames
  --sort-labels            Sort barcode labels in output filenames
  --only-side <left|right> Only keep left or right label in output filenames
```

Example to remove orientation and keep only the left label:

```bash
barbell trim -i filtered.tsv -r reads.fastq -o trimmed --no-orientation --only-side left
```

Gives:
```
BC01.trimmed.fastq  BC11.trimmed.fastq ...
```


## Custom experiment
We first create a Fasta, or multiple Fastas containing your queries depending on whether 
you have a single-end or dual-end experiment.


### Creating a query Fasta
If you have your own barcodes/primers/adapters, create FASTA files containing the full expected sequences. 
Note that each FASTA contains all sequences that <u>share the same prefix/suffix</u>, and are **unique**. 

The Fasta format should be as follows:
```text
>NB01
<left_flank_sequence><bar1_sequence><right_flank_sequence>
>NB02
<left_flank_sequence><bar2_sequence><right_flank_sequence>
...
```

Example (adapter + barcode + primer):

```
>NB01
TCGTTCAGTTACGTATTGCTCACAAAGACACCGACAACTTTCTTAGRGTTYGATYATGGCTCAG
>NB02
TCGTTCAGTTACGTATTGCTACAGACGACTACAAACGGAATCGAAGRGTTYGATYATGGCTCAG
```

Here:
```
TCGTTCAGTTACGTATTGCT CACAAAGACACCGACAACTTTCTT AGRGTTYGATYATGGCTCAG
    adapter               barcode                 primer
```

Barbell extracts the shared prefix (adapter) and shared suffix (primer) as flanks β€” only the barcode region should differ between FASTA entries.


### Single end
See above to create a fasta file, say `left.fasta` for single-end then we can run `annotate`:

```bash
barbell annotate -q left.fasta -b Ftag -i reads.fastq -o anno.tsv -t 10
```
After that you can follow the same `inspect β†’ filter β†’ trim` [steps described above]#inspect.


### Dual end
For dual-end, we create two FASTAs, `left.fasta` and `right.fasta` for example, then we run:

```bash
barbell annotate -q left.fasta,right.fasta -b Ftag,Rtag -i reads.fastq -o anno.tsv -t 10
```
Note: there must be **no spaces** between the comma-separated file list and the tag list: `-q left.fasta,right.fasta -b Ftag,Rtag`.


In case you have concatenated reads in your `inspect` also see [concat reads](#how-to-handle-concat-reads)

---



## Custom experiment with mixed sequences
Often we combine multiple samples together which share the same flanks, for example all rapid barcoding. 
But we could also combine completely different experiments, with different primers for example.
How do we demux that?

First go through "[custom experiment](#custom-experiment)" setup, then continue reading here how we adjust the query files.

That's quite simple. We create fasta files for all possible groups. 
Lets say we have:

**group1**
- group1_left.fasta: adapter1-barcode-primer1
- group1_right.fasta: primer1-barcode-adapter1

**group2**:
- group2_left.fasta: adapter2-barcode-primer2
- group2_right.fasta: primer2-barcode-adapter2

Then we <u>make sure our labels in the fasta have some unique substring</u>, like `group1` and `group2`:
for example:

`group1_left.fasta`:
```
>group1_bar1
AACGACA...
```
`group2_left.fasta`:
```
>group2_bar1
AGGGCAC...
```

We then run annotate like:

```
barbell annotate -q group1_left.fasta,group1_right.fasta,group2_left.fasta,group2_right.fasta -b Ftag,Rtag,Ftag,Rtag -i reads.fastq -o anno.tsv -t 10
```

**Note**: While you could run annotate separately for each query file, it’s generally better to combine all into a single annotate run (as mentioned here). This way, they are competing with one another.


Then in the filtering step we can create filtered files for each group:
(just check `inspect` first to see your patterns)


`group1_filters.txt`:
```
Ftag[fw, ~group1, @left(0..250), >>]__Rtag[<<, rc, ~group1, @right(0..250)]
```

`group2_filters.txt`:
```
Ftag[fw, ~group2, @left(0..250), >>]__Rtag[<<, rc, ~group2, @right(0..250)]
```

And pull them out!
```
barbell filter -i anno.tsv -f group1_filters.txt -o group1_reads.tsv
barbell filter -i anno.tsv -f group2_filters.txt -o group2_reads.tsv
```
and then we can just trim them to separate files:

```
barbell trim -i group1_reads.tsv -r reads.fastq -o group1_trimmed
barbell trim -i group2_reads.tsv -r reads.fastq -o group2_trimmed
```



---

## Output columns (annotate & filter)

- `read_id`: read identifier as in the provided FASTQ
- `read_len`: length of read in bp
- `rel_dist_to_end`: relative distance to the read end. `>= 0` means X bases from the **left** end; a negative value means X bases from the **right** end (e.g. `-10` is 10 bp from the right).
- `read_start_bar`: start position of the barcode match in the read
- `read_end_bar`: end position of the barcode match in the read
- `read_start_flank`: start position of the flank match in the read
- `read_end_flank`: end position of the flank match in the read
- `bar_start`, `bar_end`: coordinates where the barcode was aligned (previous partial-barcode matches are disabled; full barcode length is expected)
- `match_type`:
  - `Ftag`: forward barcode + flank matched
  - `Fflank`: forward flank matched (barcode undetectable)
  - `Rtag`: rear barcode + flank matched
  - `Rflank`: rear flank matched (barcode undetectable)

  *Note*: for some kits (e.g., rapid barcoding) there's effectively a single barcode; we still call this `Ftag`. For native dual barcoding both ends may use the same barcode set; orientation (forward/reverse complement) is available in the `strand` column.

- `flank_cost`: number of edits in the flank sequence (excluding the barcode)
- `barcode_cost`: number of edits in the barcode region
- `label`: label from your FASTA (or preset kit) β€” e.g., `BC14`, `RBK60`, etc.
- `strand`: orientation of the match (`fw` or `rc`)
- `cuts`: empty after `annotate`; populated after `filter` to inform `trim` where to cut

---



## Patterns

Patterns describe how elements are combined in a read. Single elements have the form:

```
<tag>[<orientation>, <label>, <relative position>, <optional cut specifier>]
```

Multiple elements are combined with `__` (double underscore):

```
<tag>[...]__<tag>[...]
```

Fields:
- `tag`: `Ftag`, `Fflank`, `Rtag`, `Rflank`, whether a sequence is `Ftag` or `Rtag` is user specified (or within the kit), see [custom experiment](#custom-experiment), the `Fflank,Rflank` are the "incomplete" forms where the barcode was undetectable.
- `orientation`: `fw` or `rc`.
- `label`: exact label (e.g. `NB01`), `*` for any label, or `~substring` to match headers containing `substring` (for an example see [custom exp. mixing](#custom-experiment-with-mixed-sequences)).
- `relative position`: e.g. `@left(0..250)`: to left side of read, `@right(0..250)`: to right side of read, `@prev_left(0..250)`: relative to the <u>previous</u> element.
- `cut specifier` (optional): `>>` (cut after this element) or `<<` (cut before this element).

Examples:

```
Ftag[fw, *, @left(0..250), >>]
Ftag[fw, *, @left(0..250), >>]__Ftag[<<, rc, *, @right(0..250)]
Ftag[fw, *, @left(0..100), >>]__Rtag[<<, fw, *, @prev_left(1500..1700)]
```
(for examples of concat reads see [concat reads](#how-to-handle-concat-reads) section)

These let you express typical cases such as single-barcode-left, left-and-right barcodes, or expected amplicon sizes via `@prev_left`.

---


### How to handle concat reads 
It is possible that you have concat reads like:

```
Ftag[fw, *, @left(0..250), >>]__Ftag[<<, rc, *, @pev_left(1500..1750)]__Ftag[fw, *, @prev_left(0..250), >>]__Ftag[<<, rc, *, @right(0..250)]
```
We always read the pattern from <u> left to right </u>, then we can deduce the read matches looked like this:
```
[Ftag,fw]---------[Ftag,rc][Ftag, fw]---------[Ftag, rc]
```
If we want to cut more than once we have to use **cut group identifiers**, we do this by placing a number <u> after </u> the `<<` or `>>`:

```
Ftag[fw, *, @left(0..250), >>1]__Ftag[<<1, rc, *, @pev_left(1500..1750)]__Ftag[fw, *, @prev_left(0..250), >>2]__Ftag[<<2, rc, *, @right(0..250)]
```
Note the `>>1, <<1, >>2, <<2`, now barbell knows exactly where you want the reads to be cut.

If the labels were: `NB01---NB01-NB02----NB02`, barbell will write the first read to the `NB01` folder and the second to `NB02`. 

#### Getting *all* concats

If concats are a big part of your read set (ideally they are not) their patterns will show up high in `inspect` anyway, and copying just those will 
probably be enough. If you really want *all* the concat reads out you could use the `-o` in `inspect` to write the patterns
per read. Then, get all unique patterns from there and use a regex or a python script to insert the `>>1` and `<<1`, you can just dump all of them 
to `filters.txt` and use that in `barbell filter`. 


---

## Paper evals
Since this involves substantial amount of extra code we moved these to the [paper-evals](https://github.com/rickbeeloo/barbell-evals) repo. All information on how to reproduce results and set up environments to do so can be found there. 

---

## Notes & tips

- Keep an eye on `Fflank` matches β€” these are often lower-confidence and may indicate reads with poor barcode sequence quality.
- Start with conservative filters to see how many reads match expected patterns, then relax thresholds if necessary. 
- When experimenting with custom FASTAs, keep queries clean (shared flanks, differing barcode region only).

---