Crate nanalogue_core

§Nanalogue Core

§Introduction

Nanalogue = Nucleic Acid Analogue. Nanalogue is a tool to parse or analyse BAM/Mod BAM files with a single-molecule focus.

License: MIT. Code test coverage: > 92%. Available on crates.io.

A common pain point in genomics analyses is that BAM files are information-dense, which makes it difficult to gain insight from them. Nanalogue aims to make it easy to extract and process this information, with a particular focus on single-molecule aspects and DNA/RNA modifications. Despite this focus, some of Nanalogue's commands and functions are quite general and can be applied to almost any BAM file.

We process and calculate data associated with DNA/RNA molecules, their alignments to reference genomes, the modification information on them, and other miscellaneous information. We can process any type of DNA/RNA modification occurring in any pattern (single or multiple mods, spatially isolated or not, etc.). All we require is that the data are stored in a BAM file in the mod BAM format, i.e. using the MM/ML tags as laid down in the SAM tags specification.
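
To make the MM/ML encoding concrete, the sketch below decodes a single MM entry such as C+m?,0,2 into the 0-based read positions it refers to, and maps an ML byte to a probability. It is a minimal illustration of the SAM tags specification written for this page, not Nanalogue's own parser (which, per the function list below, extracts mod information into fibertools-rs structures); the names MmEntry, parse_mm_entry, deltas_to_positions and ml_to_prob are hypothetical. It assumes an ASCII read sequence stored in its original sequencing orientation, does not handle the N wildcard base, and keeps combined mod codes such as mh as a single string.

```rust
/// One parsed entry of an MM tag, e.g. "C+m?,0,2" (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct MmEntry {
    base: char,         // canonical base the mod applies to, e.g. 'C'
    strand: char,       // '+' or '-'
    mod_code: String,   // e.g. "m" (5mC); combined codes like "mh" kept as-is
    deltas: Vec<usize>, // delta-encoded skip counts
}

/// Parses one MM entry (the text between two ';', without the ';').
fn parse_mm_entry(entry: &str) -> Option<MmEntry> {
    let mut parts = entry.split(',');
    let mut header = parts.next()?.chars();
    let base = header.next()?;
    let strand = header.next()?;
    if strand != '+' && strand != '-' {
        return None;
    }
    // The rest of the header is the mod code, optionally followed by '.' or '?'.
    let rest: String = header.collect();
    let mod_code = rest.trim_end_matches(|c: char| c == '.' || c == '?').to_string();
    if mod_code.is_empty() {
        return None;
    }
    // Every remaining comma-separated field must be a non-negative integer.
    let deltas = parts
        .map(|d| d.parse::<usize>().ok())
        .collect::<Option<Vec<_>>>()?;
    Some(MmEntry { base, strand, mod_code, deltas })
}

/// Turns skip counts into 0-based positions on `seq`.
/// Assumes an ASCII `seq` in original orientation and a concrete `base` (no 'N').
fn deltas_to_positions(seq: &str, base: char, deltas: &[usize]) -> Option<Vec<usize>> {
    // 0-based indices of every occurrence of `base` in the read
    let candidates: Vec<usize> = seq
        .char_indices()
        .filter(|&(_, c)| c == base)
        .map(|(i, _)| i)
        .collect();
    let mut positions = Vec::with_capacity(deltas.len());
    let mut k = 0; // index into `candidates`
    for &skip in deltas {
        k += skip; // skip this many occurrences of `base`...
        positions.push(*candidates.get(k)?); // ...then take the next one
        k += 1;
    }
    Some(positions)
}

/// Maps an ML byte N to the midpoint of its probability bin [N/256, (N+1)/256).
fn ml_to_prob(n: u8) -> f32 {
    (f32::from(n) + 0.5) / 256.0
}
```

For a full tag like C+m?,0,2;A+a.,3; one would split on ';', drop empty pieces, and parse each entry; the ML values then follow the called positions in the same order across all entries.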

Nanalogue is both an executable that can be run from the command line and a library whose functionality can be used by others writing Rust code. The library's functions are documented here. The executable exposes the functionality of the subcommands module (subcommands::*), and a separate executable, nanalogue_sim_bam, exposes the simulate_mod_bam functionality (see below).

For developers: if you want a custom BAM file containing synthetic, simulated DNA/RNA modification data to develop or test your tool, you may be interested in nanalogue_sim_bam. This executable ships with nanalogue and creates a BAM file according to your specifications; run nanalogue_sim_bam --help for details. If you are a Rust developer looking to use this functionality in your library, please see the documentation of the module crate::simulate_mod_bam.

This documentation is supplemented by a companion cookbook.

§Sample code

The InputBam and InputMods structs let us set input options for BAM/modBAM calculations. In the example below, the crate::read_info::run function is called to process data from a BAM file with some input options.

use nanalogue_core::{BamRcRecords, BamPreFilt, Error, InputBamBuilder, InputModsBuilder,
    OptionalTag, PathOrURLOrStdin, ThresholdState, nanalogue_bam_reader, read_info};

// Input options: which BAM file to read and which region to restrict to.
let mut bam = InputBamBuilder::default()
    .bam_path(PathOrURLOrStdin::Path("./examples/example_1.bam".into()))
    .region("dummyI".into())
    .build()?;
// Modification options: keep calls whose probability passes a >= 0 threshold.
let mut mods = InputModsBuilder::<OptionalTag>::default()
    .mod_prob_filter(ThresholdState::GtEq(0))
    .build()?;

// Open the BAM file and set up the record iterator using the options above.
let mut buffer = Vec::new();
let mut reader = nanalogue_bam_reader(&bam.bam_path.to_string())?;
let bam_rc_records = BamRcRecords::new(&mut reader, &mut bam, &mut mods)?;
// Write per-read information into `buffer`, skipping records that fail the pre-filter.
read_info::run(
    &mut buffer,
    bam_rc_records.rc_records
        .filter(|r| r.as_ref().map_or(true, |v| v.pre_filt(&bam))),
    mods,
    None,
)?;
// The output should mention a read ID present in the example file.
assert!(std::str::from_utf8(buffer.as_slice())?
    .contains("5d10eb9a-aae1-4db8-8ec6-7ebb34d32575"));
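
A detail of the .filter(...) call in the example above: r.as_ref().map_or(true, ...) returns true for an Err, so a record that fails to load is kept in the iterator and handed to read_info::run instead of being silently dropped, while successfully loaded records are kept only if they pass pre_filt. The same idiom, reduced to a generic helper, is sketched below; keep_errors_and_matches is a hypothetical name used for illustration and is not part of nanalogue_core.

```rust
/// Keeps `Ok` items that satisfy `keep`, and passes every `Err` through unchanged
/// so that a downstream consumer still sees it (hypothetical helper).
fn keep_errors_and_matches<T, E, I, F>(items: I, keep: F) -> impl Iterator<Item = Result<T, E>>
where
    I: IntoIterator<Item = Result<T, E>>,
    F: Fn(&T) -> bool,
{
    // `map_or(true, ...)`: errors are always kept; values are kept only if `keep` says so.
    items.into_iter().filter(move |r| r.as_ref().map_or(true, |v| keep(v)))
}
```

The design choice is to defer error handling to whoever consumes the iterator, rather than having the filter decide whether an unreadable record matters.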

If you want to write custom functionality yourself, please familiarize yourself with the crate::read_utils::CurrRead struct. This struct is the centerpiece of our library: it receives BAM record data, processes the DNA/RNA modification information along with other read information, and exposes the results for downstream use.

Re-exports§

pub use cli::InputBam;
pub use cli::InputBamBuilder;
pub use cli::InputModOptions;
pub use cli::InputMods;
pub use cli::InputModsBuilder;
pub use cli::InputRegionOptions;
pub use cli::InputWindowing;
pub use cli::InputWindowingBuilder;
pub use cli::OptionalTag;
pub use cli::RequiredTag;
pub use cli::SeqDisplayOptions;
pub use error::Error;
pub use file_utils::nanalogue_bam_reader;
pub use file_utils::nanalogue_bam_reader_from_stdin;
pub use file_utils::nanalogue_bam_reader_from_url;
pub use file_utils::nanalogue_indexed_bam_reader;
pub use file_utils::nanalogue_indexed_bam_reader_from_url;
pub use file_utils::write_bam_denovo;
pub use file_utils::write_fasta;
pub use read_utils::AlignmentInfoBuilder;
pub use read_utils::CurrRead;
pub use read_utils::CurrReadBuilder;
pub use read_utils::ModTableEntryBuilder;
pub use read_utils::curr_reads_to_dataframe;
pub use simulate_mod_bam::SimulationConfig;
pub use subcommands::find_modified_reads;
pub use subcommands::peek;
pub use subcommands::read_info;
pub use subcommands::read_stats;
pub use subcommands::reads_table;
pub use subcommands::window_reads;
pub use utils::AllowedAGCTN;
pub use utils::Contains;
pub use utils::DNARestrictive;
pub use utils::F32AbsValAtMost1;
pub use utils::F32Bw0and1;
pub use utils::FilterByRefCoords;
pub use utils::GenomicRegion;
pub use utils::GetDNARestrictive;
pub use utils::Intersects;
pub use utils::ModChar;
pub use utils::OrdPair;
pub use utils::PathOrURLOrStdin;
pub use utils::ReadState;
pub use utils::ReadStates;
pub use utils::RestrictModCalledStrand;
pub use utils::SeqCoordCalls;
pub use utils::ThresholdState;

Modules§

analysis
Analysis functions for modification data processing.
cli
Command line interface (CLI) options, including input processing options
commands
Commands run in main.rs
error
Error types used by the crate
file_utils
Utility functions for file I/O operations with BAM and FASTA files.
read_utils
Implements the CurrRead struct, which processes read information, including the mod information in the BAM file, using a parser implemented in another module.
simulate_mod_bam
Writes simulated mod BAM files (the functionality behind the nanalogue_sim_bam executable)
subcommands
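Subcommands exposed by the nanalogue executable (find_modified_reads, peek, read_info, read_stats, reads_table, window_reads)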
utils
Utilities module providing shared datatypes for nanalogue. Includes genomic coordinates, constrained numerics, and BAM-related types.

Structs§

BamRcRecords
A global struct that contains BAM records for further use. NOTE: we don't derive many traits here because the RcRecords object implements few traits.

Traits§

BamPreFilt
Trait that performs filtering (e.g. the pre_filt check applied to BAM records)

Functions§

init_ssl_certificates
Initialize SSL certificate paths for HTTPS support.
nanalogue_mm_ml_parser
Extracts mod information from a BAM record into the fibertools-rs BaseMods struct.