granges 0.2.2

A Rust library and command line tool for genomic range operations.
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
// Copyright (2024) Vince Buffalo
#![crate_name = "granges"]
//#![doc(html_root_url = "https://docs.rs/granges/")]

//! # GRanges: Generic Genomic Range and Data Containers
//!
//! GRanges is a Rust library for working with genomic ranges and their associated data. GRanges
//! also implements a [bedtools](https://bedtools.readthedocs.io/en/latest/)-like command line tool
//! as a test and benchmark of the library. In benchmarks, this command line tool is reliably
//! 30-40% faster than bedtools. Internally, GRanges uses the very fast
//! [coitrees](https://github.com/dcjones/coitrees) library written by Daniel C. Jones for overlap
//! operations.
//!
//! The GRanges library aims to simplify the creation of powerful, performant genomics tools in
//! Rust. GRanges is inspired by the design and ease of use of Bioconductor's
//! [GenomicRanges](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003118)
//! and [plyranges](https://www.bioconductor.org/packages/release/bioc/html/plyranges.html)
//! packages, but tools using GRanges are optimized at compile time and thus are typically much
//! faster.
//!
//! ## Table of Contents
//!
//! - [GRanges: Generic Genomic Range and Data Containers](#granges-generic-genomic-range-and-data-containers)
//!   - [GRanges Design](#granges-design)
//!   - [The `GRanges` container](#the-granges-container)
//! - [The Overlaps--Map-Combine Pattern](#the-overlapsmapcombine-pattern)
//!   - [Overlap Joins and the Left Grouped Join](#overlap-joins-and-the-left-grouped-join)
//!   - [Map-Combine over Joins](#map-combine-over-joins)
//! - [Example GRanges Workflow](#example-granges-workflow)
//! - [Loading Data into GRanges](#loading-data-into-granges)
//! - [Sequence types](#sequence-types)
//! - [Manipulating GRanges objects](#manipulating-granges-objects)
//! - [Future Development](#future-development)
//!   - [Current Limitations](#current-limitations)
//!   - [Contributing](#contributing)
//! - [Documentation Guide](#documentation-guide)
//!
//!
//! ## GRanges Design
//!
//! Nearly all common data processing and analytics tasks in science can be thought of as taking
//! some raw input data and refining it down some pipeline, until eventually outputting some sort
//! of statistical result. In bioinformatics, genomics data processing workflows are often composed
//! from command line tools and Unix pipes. Analogously, tidyverse unifies data tidying,
//! summarizing, and modeling by passing and manipulating a datafame down a similar pipeline. Unix
//! pipes and tidyverse are powerful because they allow the scientist to build highly specialized
//! analytic tools just by remixing the same core set of data operations.
//!
//! GRanges implements many of the same core data joining and manipulation operations as
//! the [plyranges](https://www.bioconductor.org/packages/release/bioc/html/plyranges.html) library and
//! R's [tidyverse](https://www.tidyverse.org).
//!
//! ## The [`GRanges`] container
//!
//! Central to the GRanges library is the generic [`GRanges<R, T>`] data container. This is generic
//! over the *range container* type `R`, because range data structures have inherent performance
//! tradeoffs for different tasks. For example, when parsing input data into memory, the total size
//! is often not known in advanced. Consequently, the most efficient range container data structure
//! is a dynamically-growing [`Vec<U>`]. However, when overlaps need to be computed between ranges
//! across two containers, the *interval tree* data structures are needed. [`GRanges`] supports
//! conversion between these range container types.
//!
//! [`GRanges<R, T>`] is also generic over its *data container*, `T`. This allows it to hold *any*
//! type of data, in any data structure: a [`Vec<f64>`] of numeric stores, an
//! [`ndarray::Array2`](https://docs.rs/ndarray/latest/ndarray/index.html) matrix, a
//! [polars](https://pola.rs) dataframe, custom `struct`s, etc. Each range in the ranges container
//! indexes an element in the data container. There is also a special [`GRangesEmpty`] type for
//! data-less [`GRanges`] objects.
//!
//! The [`GRanges`] type implements a small but powerful set of methods that can be used to process
//! and analyze genomic data. GRanges relies heavily on *method-chaining* operations that pass
//! genomic range data down a pipeline, similar to tidyverse and Unix pipes.
//!
//! ## The Overlaps--Map-Combine Pattern
//!
//! A very common data analysis strategy is **Split-Apply-Combine** pattern ([Wickham,
//! 2011](https://vita.had.co.nz/papers/plyr.html)). When working with genomic overlaps
//! and their data, GRanges library relies on its own data analysis pattern known
//! as **Overlaps--Map-Combine** pattern, which is better suited for common genomic operations and analyses.
//!
//! ### Overlap Joins and the Left Grouped Join
//!
//! In this pattern, the *left* genomic ranges are "joined" by their *overlaps* with another
//! *right* genomic ranges. In other words, the *right ranges* that overlap a single *left range* can
//! be thought of "joined" to this left range.
//!
//! With genomic data, downstream processing is greatly simplified if the results of this join
//! are *grouped* by what left range they overlap (see illustration below). This is a special
//! kind of join known as a **left grouped join**.
//!
//!
//! ```text
//!     Left range:   ███████████████████████████████████
//!                                                      
//!   Right ranges:     ██    ██████        ██████    ██████
//! ```
//!
//! In GRanges, a left grouped join ([`GRanges::left_overlaps()`]) returns a [`GRanges`] object
//! containing the exact same ranges as the input *left ranges*. In other words, *left grouped
//! joins* are *endomorphic in the range container*. Computationally, this is also extremely
//! efficient, because the left ranges can be passed through to the results directly (a
//! zero-overheard operation).
//!
//!
//!
//! The [`GRanges`] object returned by [`GRanges::left_overlaps()`] contains a [`JoinData`] (or related)
//! type as its data container. The [`JoinData`] contains information about each left range's overlaps (e.g. number of
//! overlapping bases, fraction of overlaps, etc) in a [`Vec<LeftGroupedJoin>`]. This information about
//! overlapping ranges may then be used by downstream calculations.
//! Additionally [`JoinData`] stores the left ranges' data container, and has a *reference* to
//! the right ranges' data container. In both cases, the data is *unchanged* by the join.
//! Downstream processing can also easily access the left ranges's data and all overlapping right ranges'
//! data for calculations.
//!
//! ### Map-Combine over Joins
//!
//! After a left grouped join, each left range can have zero or more overlapping right ranges.
//! A *map-combine* operation (optionally) maps a function over all right ranges' data and
//! overlaps information, and then combines that into a single data entry per left range. These
//! combined data entries make up a new [`GRanges<R, Vec<V>>`] data container, returned by [`GRanges::map_joins()`].
//!
//! Note that like [`GRanges::left_overlaps()`], the [`GRanges::map_joins()`] is endomorphic
//! over its range container. This means it can be passed through without modification, which
//! is computationally efficient. This results from a Map-Combine operation then can be overlap joined with
//! other genomic ranges, filtered, have its data arbitrarily manipulated by [`GRanges::map_data()`], etc.
//!
//! ## Example GRanges Workflow
//!
//! To illustrate these ideas, lets look at an example of how we might use GRanges to do a
//! a commonly-encountered genomic calculation. Suppose you had a set of exon genomic ranges (the *left
//! ranges*) and a multicolumn TSV (*right ranges*) of various genomic features (e.g. assay
//! results, recombination hotspots, variants, etc). Imagine you wanted to form some statistic per
//! each exon, based on data in the overlapping right ranges. This operation is very
//! similar to `bedtools map`, except that suppose the statistic you want to calculate is not one of the
//! few offered by bedtools. GRanges makes it simple to compose your own, very fast genomic tools to answer these
//! questions.
//!
//! To see how GRanges would make it simple to compose a specialized fast tool to solve this
//! problem, let's fist see how few lines of code it would take to implement `bedtools map`,
//! since our problem is akin to using `bedtools map` with our own function to calculate our statistic.
//! Let's start by getting the mean score by seeing how to get the mean BED5 score across all
//! overlapping right ranges for each left range (i.e. `bedtools map -a <left> -b <right> -c 5 mean`).
//! Here is the Rust code to do this using GRanges:
//!
//! ```
//! # use granges::prelude::*;
//! # fn try_main() -> Result<(), granges::error::GRangesError> {
//!
//! // Mock sequence lengths (e.g. "genome" file)
//! let genome = seqlens!("chr1" => 100, "chr2" => 100);
//!
//! // Create parsing iterators to the left and right BED files.
//! let left_iter = Bed3Iterator::new("tests_data/bedtools/map_a.txt")?;
//! let right_iter = Bed5Iterator::new("tests_data/bedtools/map_b.txt")?;
//!
//! // Filter out any ranges from chromosomes not in our genome file.
//! let left_gr = GRangesEmpty::from_iter(left_iter, &genome)?;
//! let right_gr = GRanges::from_iter(right_iter, &genome)?;
//!
//! // Create the "right" GRanges object, convert the ranges to an
//! // interval trees, and tidy it by selecting out a f64 score.
//! let right_gr = right_gr
//!         // Convert to interval trees.
//!         .into_coitrees()?
//!         // Extract out just the score from the additional BED5 columns.
//!         .map_data(|bed5_cols| {
//!             bed5_cols.score
//!         })?;
//!
//! // Find the overlaps by doing a *left grouped join*.
//! let left_join_gr = left_gr.left_overlaps(&right_gr)?;
//!
//! // Process all the overlaps.
//! let results_gr = left_join_gr.map_joins(|join_data| {
//!     // Get the "right data" -- the BED5 scores.
//!     let overlap_scores: Vec<f64> = join_data.right_data.into_iter()
//!            // filter out missing values ('.' in BED)
//!            .filter_map(|x| x).collect();
//!
//!     // calculate the mean
//!     let score_sum: f64 = overlap_scores.iter().sum();
//!     score_sum / (overlap_scores.len() as f64)
//! })?;
//!
//! // Write to a TSV file, using the BED TSV format standards
//! // for missing values, etc.
//! let tempfile = tempfile::NamedTempFile::new().unwrap();
//! results_gr.write_to_tsv(Some(tempfile.path()), &BED_TSV)?;
//! # Ok(())
//! # }
//! # fn main() { try_main().unwrap(); }
//! ```
//!
//! In relatively few lines of code, we can implement core functionality of bedtools.
//! As mentioned earlier, GRanges code typically runs 30-40% faster than bedtools.
//!
//! ## Loading Data into GRanges
//!
//! Nearly all genomic range data enters GRanges through **parsing iterators** (see the [parsers]
//! module). Parsing iterators read input data from some bioinformatics file format (e.g. `.bed`,
//! `.gtf.gz`, etc) and it then goes down one of two paths:
//!
//!  1. Streaming mode: the entire data set is never read into memory all at once, but rather
//!     is processed entry by entry.
//!
//!  2. In-memory mode: the entire data set *is* read into memory all at once, usually as
//!     a [`GRanges<R, T>`] object. The [`GRanges<R, T>`] object is composed of two parts:
//!     a *ranges container* ([ranges]) and a *data container* ([data]).
//!     [`GRanges`] objects then provide high-level in-memory methods for working with this
//!     genomic data.
//!
//! In the example above, we loaded the items in the parsing iterators directly into
//! [`GRanges`] objects, since we had to do overlap operations (in the future, GRanges will
//! support streaming overlap joins for position-sorted input data).
//!
//! Both processing modes eventually output something, e.g. a TSV of some statistic calculated
//! on the genomic data, the output of 10 million block bootstraps, or the coefficients of
//! linear regressions run every kilobase on sequence data.
//!
//! The GRanges workflow idea is illustrated in this (super cool) ASCII diagram:
//!
//!
//! ```text
//!   +-------+           +-------------------+           +----------------------+          
//!   | Input |  ----->   |  Parsing Iterator |  ----->   |    GRanges object    |
//!   +------ +           |  (streaming mode) |           |   (in-memory mode)   |          
//!                       +-------------------+           +----------------------+          
//!                                 |                                |            
//!                                 |                                |            
//!                                 v                                v            
//!                        +-----------------+               +-----------------+       
//!                        | Data Processing |               | Data Processing |       
//!                        +-----------------+               +-----------------+       
//!                                 |                                |            
//!                                 v                                v            
//!                             +--------+                       +--------+       
//!                             | Output |                       | Output |       
//!                             +--------+                       +--------+       
//!
//! ```
//!
//! ## Sequence types
//!
//! Genomic processing also involves working with *sequence* data types, which abstractly are
//! per-basepair data that cover the entire genome (and in the case of the reference sequence,
//! *define* the genome coordinates). Nucleotide sequences are just a special case of this,
//! where the per-basepair data are nucleotides (which under the hood are just an unsigned
//! byte-length integer). There are many other per-basepair data types, e.g. conservation scores.
//!
//! Sequence data types are like a [`GRanges`] where the range container
//! is implied as per-basepair, and the data container contains the per-basepair data.
//! GRanges allows arbitrary data to be stored and manipulated (see the [sequences]
//! module) in sequence types: numeric data, nucleotides, and arrays per basepair.
//! Since loading an entire genome's per-basepair data into memory would be inefficient,
//! sequence types implement lazy-loading.
//!
//! ## Manipulating GRanges objects
//!
//! 1. *Creation*: [`GRanges::new_vec()`], [`GRanges::from_iter()`], [`GRangesEmpty::from_windows()`].
//!
//! 2. *Range-modifying* functions: [`GRanges::into_coitrees()`], [`GRanges::adjust_ranges()`], [`GRanges::sort()`],
//!        [`GRanges::flanking_ranges()`], [`GRanges::filter_overlaps()`].
//!
//! 3. *Data-modifying* functions: [`GRanges::left_overlaps()`], [`GRanges::map_joins()`], [`GRanges::map_data()`].
//!
//! 4. *Range and Data modifying* functions: [`GRanges::push_range()`].
//!
//! Note that not all range and data container types support these operations, e.g.
//! [`GRanges::push_range()`] is not available for ranges stored in an interval tree.
//! However, by design, this is known *at compile time*, due Rust's typing system. Thus
//! there is lower risk of runtime panics, since more potential issues are caught at
//! compile time.
//!
//! ## Future Development
//!
//! When I was first learning Rust as a developer, one thing that struck me about the language
//! is that it feels *minimal but never constraining*. This is quite a contrast compared to
//! languages like C++. As Rust developers know, the minimalness was by design; the
//! cultural norm of slow growth produced a better final product in the end.
//!
//! The GRanges library attempts to follow this design too. Currently, the alpha release is
//! about implementating a subset of essential core functionality to benchmark against
//! alternative software. Since the design and ergonomics of the API are under active development,
//! please, *please*, create a [GitHub issue](http://github.com/vsbuffalo/granges/issues) if:
//!
//!  1. You want a particular feature.
//!
//!  2. You don't know how you'd implement a feature yourself with GRanges.
//!
//!  3. The ["ergonomics"](https://blog.rust-lang.org/2017/03/02/lang-ergonomics.html) don't
//!     feel right.
//!
//! ### Current Limitations
//!  
//! The biggest current limitations are:
//!  
//!  1. *No native support for GTF/GFF/VCF*. It is relatively easy to write custom parsers
//!      for these types, or use the [noodles](https://crates.io/crates/noodles) libary's
//!      parsers.
//!
//!  2. *No out-of-memory overlap operations*. This is relatively easy to add, and will be
//!      added in future.
//!  
//!  3. *No BAM (BCF, or other binary) file support*. Again, these formats can be easily loaded
//!      by wrapping the parsers in the [noodles](https://crates.io/crates/noodles) libary.
//!      Future version of GRanges will likely have a `--features=nooodles` option,
//!      which will provide convenience for this.
//!
//!  4. *Parallelized operations are not in the `main` branch yet*. I have written parallel
//!      iterators that can be used for operations, but these need to be ported over to the
//!      latest design.
//!
//!  5. Lack of TSV column type inference methods. This would make loading and working with arbitrary
//!     TSV files with heterogeneous column types easier.
//!
//!  ⚠️:Note that the GRanges API, as well as type names, may change.
//!
//! ### Contributing
//!
//! If you are interested in contributing to GRanges, please contact Vince Buffalo (@vsbuffalo on
//! Twitter and GitHub), starting [a new
//! discussion](https://github.com/vsbuffalo/granges/discussions), or [creating a GitHub
//! issue](https://github.com/vsbuffalo/granges/discussions).
//!
//! ## Documentation Guide
//!
//!  - ⚙️ : technical details that might only be useful to developers handling tricky cases.
//!  - ⚠️: indicates a method or type that is unstable (it may change in future implementations),
//!     or is "sharp" (developer error could cause a panic).
//!  - 🚀: optimization tip.
//!  - 🐌: indicates slow code in need of future optimization.
//!
//! [`JoinData`]: crate::join::JoinData
//! [`GRanges`]: crate::granges::GRanges
//! [`GRangesEmpty`]: crate::granges::GRangesEmpty
//! [`GRanges::filter_overlaps()`]: granges::GRanges::filter_overlaps
//! [`GRanges<R, T>`]: crate::granges::GRanges
//! [parsers]: crate::io::parsers
//! [sequences]: crate::sequences
//! [`ndarray::Array2`]: ndarray::Array2
//! [`GRanges::left_overlaps()`]: crate::traits::LeftOverlaps::left_overlaps
//! [`GRanges<R, Vec<V>>`]: crate::granges::GRanges
//! [`GRanges::map_joins()`]: crate::granges::GRanges::map_joins
//! [`GRanges::map_data()`]: crate::granges::GRanges::map_data
//! [`GRanges::new_vec()`]: crate::granges::GRanges::new_vec
//! [`GRanges::into_coitrees()`]: crate::granges::GRanges::into_coitrees
//! [`GRanges::adjust_ranges()`]: crate::granges::GRanges::adjust_ranges
//! [`GRanges::push_range()`]: crate::granges::GRanges::push_range
//! [`GRanges::flanking_ranges()`]: crate::granges::GRanges::flanking_ranges
//! [`GRanges::sort()`]: crate::granges::GRanges::sort
//! [`GRanges::from_iter()`]: crate::granges::GRanges::from_iter
//! [`GRangesEmpty::from_windows()`]: crate::granges::GRangesEmpty::from_windows

pub use indexmap;

pub mod data;
pub mod error;
pub mod granges;
pub mod io;
pub mod iterators;
pub mod join;
pub mod merging_iterators;
pub mod ranges;
pub mod sequences;
pub mod test_utilities;
pub mod traits;
pub mod unique_id;

// bringing these CLI modules into lib.rs rather than main/ allows for
// use in integration tests and other Rust-side command line work
pub mod commands;
pub mod reporting;

pub use crate::error::GRangesError;

/// The main position type in GRanges.
///
/// This type is currently an unwrapped [`u32`]. This should handle
/// chromosome lengths for nearly all species. In fact, the only exception
/// known so far is lungfush (*Neoceratodus forsteri*), which has a chromosomes
/// that reaches 5.4Gb (<https://www.nature.com/articles/s41586-021-03198-8l>).
/// The [`u32::MAX`] is 4,294,967,295, i.e. 4.29 Gigabases, which means [`u32`] is
/// just barely suitable for even the largest known chromosome. There is a
/// performance and memory-efficiency tradeoff when using [`u64`] over [`u32`],
/// so [`u32`] is used by default since it handles nearly all cases.
///
/// # Feature support for large chromosomes
///
/// If you are working with data from a species with unusually large chromosomes,
/// you can compile GRanges using the `--features=big-position` option, which will set
/// the [`Position`] and [`PositionOffset`] to [`u64`] and [`i64`], respectively.
///
/// [`u32::MAX`]: std::u32::MAX
#[cfg(not(feature = "big-position"))]
pub type Position = u32;
#[cfg(feature = "big-position")]
pub type Position = u64;

/// The main *signed* position type in GRanges, to represent offsets (e.g.
/// for adjust range coordinates, etc).
#[cfg(not(feature = "big-position"))]
pub type PositionOffset = i32;
#[cfg(feature = "big-position")]
pub type PositionOffset = i64;

/// The main exports of the GRanges library.
pub mod prelude {
    pub use crate::error::GRangesError;
    pub use crate::granges::{GRanges, GRangesEmpty};
    pub use crate::io::file::read_seqlens;
    pub use crate::io::tsv::BED_TSV;
    pub use crate::io::{
        Bed3Iterator, Bed4Iterator, Bed5Iterator, BedlikeIterator, GenomicRangesFile,
        GenomicRangesParser, TsvRecordIterator,
    };
    pub use crate::join::{
        CombinedJoinData, CombinedJoinDataBothEmpty, CombinedJoinDataLeftEmpty,
        CombinedJoinDataRightEmpty,
    };

    pub use crate::data::DatumType;
    pub use crate::ranges::{
        coitrees::{COITreesEmpty, COITreesIndexed},
        try_range,
        vec::{VecRangesEmpty, VecRangesIndexed},
    };

    pub use crate::merging_iterators::{
        ConditionalMergingIterator, ConditionalMergingResultIterator, MergingEmptyIterator,
        MergingEmptyResultIterator, MergingResultIterator,
    };
    pub use crate::traits::{
        AsGRangesRef, GeneralRangeRecordIterator, GenericRange, GenericRangeOperations,
        GenomicRangeRecordUnwrappable, GenomicRangesTsvSerialize, IndexedDataContainer,
        IntoDatumType, IntoIterableRangesContainer, IterableRangeContainer, JoinDataOperations,
        LeftOverlaps,
    };

    pub use crate::seqlens;
}

pub const INTERNAL_ERROR_MESSAGE: &str = r#"
An internal error has occurred. Please file a GitHub issue at 
https://github.com/vsbuffalo/granges/issues
"#;

/// Create an [`IndexMap`] of sequence names and their lengths.
///
/// [`IndexMap`]: indexmap::IndexMap
#[macro_export]
macro_rules! seqlens {
    ($($key:expr => $value:expr),* $(,)?) => {
        $crate::indexmap::indexmap!($($key.to_string() => $value),*)
    };
}

/// Create a new [`GRanges<R, T>`] with sequence length information (used primarily for small examples)
///
/// [`GRanges<R, T>`]: crate::granges::GRanges
///
#[macro_export]
macro_rules! create_granges_with_seqlens {
    ($range_type:ty, $data_type:ty, { $($chr:expr => [$(($start:expr, $end:expr, $data:expr)),*]),* }, seqlens: { $($chr_len:expr => $len:expr),* }) => {
        {
            let mut seqlens = ::indexmap::IndexMap::new();
            $(seqlens.insert($chr_len.to_string(), $len);)*

            let mut gr: GRanges<$range_type, $data_type> = GRanges::new_vec(&seqlens);

            $(
                $(
                    gr.push_range(&$chr.to_string(), $start, $end, $data).expect("Failed to push range");
                )*
            )*

            gr
        }
    };
}

/// a version of [assert_eq!] that is more user-friendly.
#[macro_export]
macro_rules! ensure_eq {
    ($left:expr, $right:expr $(,)?) => ({
        match (&$left, &$right) {
            (left_val, right_val) => {
                if !(*left_val == *right_val) {
                    panic!("{}\nExpected `{}` but found `{}`.", $crate::INTERNAL_ERROR_MESSAGE, stringify!($left), stringify!($right));

                }
            }
        }
    });
    ($left:expr, $right:expr, $($arg:tt)+) => ({
        match (&$left, &$right) {
            (left_val, right_val) => {
                if !(*left_val == *right_val) {
                    panic!("{}\n{}\nExpected `{}` but found `{}`.", $crate::INTERNAL_ERROR_MESSAGE, format_args!($($arg)+), stringify!($left), stringify!($right));
                }
            }
        }
    });
}

#[cfg(test)]
mod tests {
    #[test]
    #[should_panic]
    fn test_ensure_eq() {
        ensure_eq!(1, 2);
    }

    #[test]
    #[should_panic]
    fn test_ensure_eq_msg() {
        ensure_eq!(1, 2, "Some description of the problem.");
    }

    #[test]
    fn test_ensure_eq_pass() {
        ensure_eq!(1, 1);
    }
}