§Nanalogue Core
§Introduction
Nanalogue = Nucleic Acid Analogue. Nanalogue is a tool to parse or analyse BAM/Mod BAM files with a single-molecule focus.
A common pain point in genomics analyses is that BAM files are information-dense, which makes it difficult to gain insight from them. Nanalogue aims to make it easy to extract and process this information, with a particular focus on single-molecule aspects and DNA/RNA modifications. Despite this focus, some of nanalogue’s commands and functions are quite general and can be applied to almost any BAM file.
We process and calculate data associated with DNA/RNA molecules, their alignments to
reference genomes, modification information on them, and other miscellaneous
information. We can process any type of DNA/RNA modifications occurring in any pattern
(single/multiple mods, spatially-isolated/non-isolated etc.). All we require is that
the data is stored in a BAM file in the mod BAM format (i.e. using MM/ML tags as
laid down in the specifications).
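To make the MM/ML requirement concrete, here is a minimal, standalone sketch of how one MM entry and its ML values can be read, based on the SAM optional-fields specification rather than on nanalogue's own parser. The function names, the example sequence and the tag values are all hypothetical. The sketch also assumes the sequence is already in the original sequenced orientation (reads aligned to the reverse strand would need reverse-complementing first).

```rust
/// Decode the delta-encoded skip counts of one MM entry (e.g. the `1,0,2`
/// in `C+m,1,0,2;`) into 0-based indices counted among occurrences of the
/// canonical base. Each number says how many occurrences to skip before
/// the next one that carries a modification call.
fn mm_deltas_to_base_indices(deltas: &[usize]) -> Vec<usize> {
    let mut indices = Vec::with_capacity(deltas.len());
    let mut next = 0usize; // index of the next unvisited occurrence
    for &skip in deltas {
        let idx = next + skip;
        indices.push(idx);
        next = idx + 1;
    }
    indices
}

/// Translate indices among base occurrences into positions in the read.
/// Indices that run past the last occurrence are silently dropped here;
/// a real parser would more likely report an error.
fn base_indices_to_seq_positions(seq: &str, base: char, indices: &[usize]) -> Vec<usize> {
    let occurrences: Vec<usize> = seq
        .char_indices()
        .filter(|&(_, c)| c == base)
        .map(|(i, _)| i)
        .collect();
    indices
        .iter()
        .filter_map(|&i| occurrences.get(i).copied())
        .collect()
}

/// An ML byte N encodes a probability in the interval [N/256, (N+1)/256).
/// This helper returns the midpoint of that interval.
fn ml_to_probability(ml: u8) -> f64 {
    (f64::from(ml) + 0.5) / 256.0
}

fn main() {
    // Hypothetical read with tags MM:Z:C+m,1,0,2; and ML:B:C,255,128,0
    let seq = "ACGCCTCGACCA";
    let deltas = [1, 0, 2];
    let ml = [255u8, 128, 0];

    let indices = mm_deltas_to_base_indices(&deltas);
    let positions = base_indices_to_seq_positions(seq, 'C', &indices);

    for (pos, byte) in positions.iter().zip(ml.iter()) {
        println!("position {pos}: probability ~{:.3}", ml_to_probability(*byte));
    }
}
```

The MM entry only stores skip counts, so the positions cannot be recovered without the read sequence itself. The ML values pair up with the calls in the same order.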
Nanalogue is both an executable that can be run from the command line and a library
whose functionality can be used by others writing Rust code. The library’s functions
are presented here. The executable exposes the modules subcommands::*, and a
separate executable nanalogue_sim_bam exposes the simulate_mod_bam functionality
(see below).
For developers: if you want to make a custom BAM file containing synthetic, simulated
DNA/RNA modification data to develop or test your tool, you may be interested in nanalogue_sim_bam.
This is an executable that ships with nanalogue and creates a BAM file according to your
specifications. Please run nanalogue_sim_bam --help. If you are a Rust developer looking
to use this functionality in your library, please look at the documentation of the module
crate::simulate_mod_bam.
This documentation is supplemented by a companion cookbook.
§Sample code
The InputBam and InputMods structs allow us to set input options
for BAM/modBAM calculations. In the example below, the crate::read_info::run
function is called to process data from a BAM file with some input options.
    use nanalogue_core::{BamRcRecords, BamPreFilt, Error, InputBamBuilder, InputModsBuilder,
        OptionalTag, PathOrURLOrStdin, ThresholdState, nanalogue_bam_reader, read_info};

    let mut bam = InputBamBuilder::default()
        .bam_path(PathOrURLOrStdin::Path("./examples/example_1.bam".into()))
        .region("dummyI".into())
        .build()?;
    let mut mods = InputModsBuilder::<OptionalTag>::default()
        .mod_prob_filter(ThresholdState::GtEq(0))
        .build()?;

    let mut buffer = Vec::new();
    let mut reader = nanalogue_bam_reader(&bam.bam_path.to_string())?;
    let bam_rc_records = BamRcRecords::new(&mut reader, &mut bam, &mut mods)?;
    read_info::run(
        &mut buffer,
        bam_rc_records.rc_records
            .filter(|r| r.as_ref().map_or(true, |v| v.pre_filt(&bam))),
        mods,
        None,
    )?;
    assert!(str::from_utf8(buffer.as_slice())?
        .contains("5d10eb9a-aae1-4db8-8ec6-7ebb34d32575"));

If you want to write custom functionality yourself, please familiarize yourself with the
crate::read_utils::CurrRead struct. This is the centerpiece of our library, which receives
BAM record data, processes the DNA/RNA modification information amongst other pieces of information,
and exposes them for downstream usage.
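The filter closure in the sample code, `r.as_ref().map_or(true, ...)`, keeps every `Err` item in the stream and applies the predicate only to `Ok` items, so read errors are not silently discarded by the filter. The standalone sketch below illustrates that idiom with the standard library only. The helper name `filter_ok_keep_err` is hypothetical, and plain `u32` values stand in for BAM records with a length threshold standing in for `pre_filt`.

```rust
/// Keep every `Err` item, and only those `Ok` items that satisfy `keep`.
/// Errors pass through untouched so a later stage can report them.
fn filter_ok_keep_err<T, E, I, F>(items: I, keep: F) -> impl Iterator<Item = Result<T, E>>
where
    I: Iterator<Item = Result<T, E>>,
    F: Fn(&T) -> bool,
{
    items.filter(move |r| r.as_ref().map_or(true, |v| keep(v)))
}

fn main() {
    // Stand-in for a stream of records: three readable, one failed read.
    let records: Vec<Result<u32, String>> = vec![
        Ok(10),
        Ok(3),
        Err("truncated record".to_string()),
        Ok(25),
    ];

    // Ok(3) is dropped by the predicate; the Err item is retained.
    let kept: Vec<Result<u32, String>> =
        filter_ok_keep_err(records.clone().into_iter(), |len| *len >= 10).collect();
    assert_eq!(kept.len(), 3);

    // Collecting into a Result surfaces the retained error downstream.
    let collected: Result<Vec<u32>, String> =
        filter_ok_keep_err(records.into_iter(), |len| *len >= 10).collect();
    assert!(collected.is_err());
}
```

Filtering with a pattern that matches only `Ok` items would instead hide failed reads, which is usually not what an analysis pipeline wants.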
Re-exports§
pub use cli::InputBam;
pub use cli::InputBamBuilder;
pub use cli::InputModOptions;
pub use cli::InputMods;
pub use cli::InputModsBuilder;
pub use cli::InputRegionOptions;
pub use cli::InputWindowing;
pub use cli::InputWindowingBuilder;
pub use cli::OptionalTag;
pub use cli::RequiredTag;
pub use cli::SeqDisplayOptions;
pub use error::Error;
pub use file_utils::nanalogue_bam_reader;
pub use file_utils::nanalogue_bam_reader_from_stdin;
pub use file_utils::nanalogue_bam_reader_from_url;
pub use file_utils::nanalogue_indexed_bam_reader;
pub use file_utils::nanalogue_indexed_bam_reader_from_url;
pub use file_utils::write_bam_denovo;
pub use file_utils::write_fasta;
pub use read_utils::AlignmentInfoBuilder;
pub use read_utils::CurrRead;
pub use read_utils::CurrReadBuilder;
pub use read_utils::ModTableEntryBuilder;
pub use read_utils::curr_reads_to_dataframe;
pub use simulate_mod_bam::SimulationConfig;
pub use subcommands::find_modified_reads;
pub use subcommands::peek;
pub use subcommands::read_info;
pub use subcommands::read_stats;
pub use subcommands::reads_table;
pub use subcommands::window_reads;
pub use utils::AllowedAGCTN;
pub use utils::Contains;
pub use utils::DNARestrictive;
pub use utils::F32AbsValAtMost1;
pub use utils::F32Bw0and1;
pub use utils::FilterByRefCoords;
pub use utils::GenomicRegion;
pub use utils::GetDNARestrictive;
pub use utils::Intersects;
pub use utils::ModChar;
pub use utils::OrdPair;
pub use utils::PathOrURLOrStdin;
pub use utils::ReadState;
pub use utils::ReadStates;
pub use utils::RestrictModCalledStrand;
pub use utils::SeqCoordCalls;
pub use utils::ThresholdState;
Modules§
- analysis: Analysis functions for modification data processing.
- cli: Command line interface (CLI) options, including input processing options.
- commands: Commands run in main.rs.
- error: Error
- file_utils: Utility functions for file I/O operations with BAM and FASTA files.
- read_utils: Implements CurrRead Struct for processing information and the mod information in the BAM file using a parser implemented in another module.
- simulate_mod_bam: Write Simulated Mod BAM
- subcommands: Mod
- utils: Utils module providing shared datatypes for nanalogue. Includes genomic coordinates, constrained numerics, and BAM-related types.
Structs§
- BamRcRecords: A global struct which contains BAM records for further usage. NOTE: we don’t derive many traits here as the RcRecords object does not have many traits.
Traits§
- BamPreFilt: Trait that performs filtration
Functions§
- init_ssl_certificates: Initialize SSL certificate paths for HTTPS support.
- nanalogue_mm_ml_parser: Extracts mod information from BAM record to the fibertools-rs BaseMods Struct.