#[non_exhaustive]
pub struct InputBam {
    pub bam_path: PathOrURLOrStdin,
    pub min_seq_len: u64,
    pub min_align_len: Option<i64>,
    pub read_id: Option<String>,
    pub read_id_list: Option<String>,
    pub read_id_set: Option<HashSet<String>>,
    pub threads: NonZeroU32,
    pub include_zero_len: bool,
    pub read_filter: Option<ReadStates>,
    pub sample_fraction: F32Bw0and1,
    pub mapq_filter: u8,
    pub exclude_mapq_unavail: bool,
    pub region: Option<GenomicRegion>,
    pub region_bed3: Option<Bed3<i32, u64>>,
    pub full_region: bool,
}
Options for parsing the input BAM file and the filters to apply to it.
This struct is parsed to create command line arguments and is then passed to many functions.
We have copied and edited a similar struct from the fibertools-rs repository.
You can build this through InputBamBuilder.
In CLI mode, clap populates this struct.
This and the InputMods struct are used to set almost all input options
to many of our functions that process BAM/modBAM files.
§Examples
We begin with an example that builds the struct.
The next example shows how this struct and InputMods can be used
to construct inputs to one of our BAM processing functions.

A sample way to build the struct. Some of the parameters are optional
and can be left unset, in which case they take their default values.
We do not check whether the specified BAM path or URL exists, as there are
use cases where files are generated before the InputBam object is used.
Not all options are listed here; for a full list, please see the
methods of the builder.

use nanalogue_core::{Error, F32Bw0and1, InputBamBuilder, PathOrURLOrStdin};

let bam = InputBamBuilder::default()
    .bam_path(PathOrURLOrStdin::Path("/some/path/to/bam.bam".into()))
    .min_seq_len(30000u64)
    .min_align_len(20000i64)
    .read_id("some-id")
    .read_filter("primary_forward,secondary_forward".into())
    .sample_fraction(F32Bw0and1::new(1.0).expect("no error"))
    .mapq_filter(20)
    .exclude_mapq_unavail(true)
    .region("chr4:1000-2000".into())
    .full_region(true)
    .build()?;

This struct and the InputMods struct allow us to set input options
for BAM/modBAM calculations. An example is shown below where the crate::read_info::run
command is called to process data from a BAM file with some input options.

use nanalogue_core::{BamRcRecords, BamPreFilt, Error, InputBamBuilder, InputModsBuilder,
    OptionalTag, PathOrURLOrStdin, ThresholdState, nanalogue_bam_reader, read_info};

let mut bam = InputBamBuilder::default()
    .bam_path(PathOrURLOrStdin::Path("./examples/example_1.bam".into()))
    .region("dummyI".into())
    .build()?;
let mut mods = InputModsBuilder::<OptionalTag>::default()
    .mod_prob_filter(ThresholdState::GtEq(0))
    .build()?;

let mut buffer = Vec::new();
let mut reader = nanalogue_bam_reader(&bam.bam_path.to_string())?;
let bam_rc_records = BamRcRecords::new(&mut reader, &mut bam, &mut mods)?;
read_info::run(
    &mut buffer,
    bam_rc_records.rc_records
        .filter(|r| r.as_ref().map_or(true, |v| v.pre_filt(&bam))),
    mods,
    None,
)?;
assert!(str::from_utf8(buffer.as_slice())?
    .contains("5d10eb9a-aae1-4db8-8ec6-7ebb34d32575"));

§Examples resulting in errors
Full region without actually setting a region

use nanalogue_core::{Error, InputBamBuilder, PathOrURLOrStdin};

let bam = InputBamBuilder::default()
    .bam_path(PathOrURLOrStdin::Path("/some/path/to/bam.bam".into()))
    .read_id("some-id")
    .full_region(true)
    .build()?;

Setting both region and region_bed3. region can be converted to
region_bed3 using GenomicRegion::try_to_bed3 and a BAM header.

use bedrs::prelude::Bed3;
use nanalogue_core::{Error, InputBamBuilder, PathOrURLOrStdin};

let bam = InputBamBuilder::default()
    .bam_path(PathOrURLOrStdin::Path("/some/path/to/bam.bam".into()))
    .read_id("some-id")
    .region("chr4:1000-2000".into())
    .region_bed3(Bed3::<i32, u64>::new(3, 1000, 2000))
    .build()?;

Setting more than one of read_id, read_id_list and read_id_set.
- read_id means filter to retain only this read.
- read_id_list is a path to a file with a list of read ids.
- read_id_set is a set of read ids supplied directly.

use nanalogue_core::{Error, InputBamBuilder, PathOrURLOrStdin};
use std::collections::HashSet;

// read_id together with read_id_list: the build fails.
let _ = InputBamBuilder::default()
    .bam_path(PathOrURLOrStdin::Path("/some/path/to/bam.bam".into()))
    .read_id("some-id")
    .read_id_list("/some/file.txt")
    .build().unwrap_err();

let mut read_id_set = HashSet::<String>::new();
read_id_set.insert("some-read-a".to_owned());
read_id_set.insert("some-read-b".to_owned());

// read_id_list together with read_id_set: the build fails.
let _ = InputBamBuilder::default()
    .bam_path(PathOrURLOrStdin::Path("/some/path/to/bam.bam".into()))
    .read_id_list("/some/file.txt")
    .read_id_set(read_id_set.clone())
    .build().unwrap_err();

// read_id together with read_id_set: the build fails.
let _ = InputBamBuilder::default()
    .bam_path(PathOrURLOrStdin::Path("/some/path/to/bam.bam".into()))
    .read_id("some-id")
    .read_id_set(read_id_set)
    .build().unwrap_err();

Fields (Non-exhaustive)§
This struct is marked as non-exhaustive
Non-exhaustive structs could have additional fields added in future. Therefore, non-exhaustive structs cannot be constructed in external crates using the traditional Struct { .. } syntax; cannot be matched against without a wildcard ..; and struct update syntax will not work.

bam_path: PathOrURLOrStdin
Input BAM file. Set to a local file path, or set to - to read from stdin,
or set to a URL to read from a remote file. If using stdin and piping in
from samtools view, always include the header with the -h option.

min_seq_len: u64
Exclude reads whose sequence length in the BAM file is below this value. Defaults to 0.

min_align_len: Option<i64>
Exclude reads whose alignment length in the BAM file is below this value. Defaults to unused.

read_id: Option<String>
Only include this read id. Defaults to unused, i.e. all reads are used. NOTE: if there are multiple alignments corresponding to this read id, all of them are used.

read_id_list: Option<String>
Path to a file containing a list of read IDs (one per line). Lines starting with '#' are treated as comments and ignored. Cannot be used together with --read-id.
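
As an illustration of the file format described for read_id_list, here is a minimal sketch of reading such a list into a set. The helper name parse_read_id_list is hypothetical and is not part of nanalogue_core; trimming whitespace and skipping blank lines are assumptions, since the description above only specifies one ID per line and '#' comments.

```rust
use std::collections::HashSet;
use std::io::{self, BufRead};

/// Hypothetical helper (not the crate's API): collects read IDs from any
/// buffered reader, one ID per line, ignoring lines that start with '#'.
fn parse_read_id_list<R: BufRead>(reader: R) -> io::Result<HashSet<String>> {
    let mut ids = HashSet::new();
    for line in reader.lines() {
        let line = line?;
        let trimmed = line.trim();
        // Assumption: blank lines are skipped as well as '#' comment lines.
        if trimmed.is_empty() || trimmed.starts_with('#') {
            continue;
        }
        // Duplicate IDs collapse naturally because the container is a set.
        ids.insert(trimmed.to_owned());
    }
    Ok(ids)
}
```

Since &[u8] implements BufRead, the helper can be tried on an in-memory string with parse_read_id_list(text.as_bytes()); for a file on disk, wrap std::fs::File in std::io::BufReader.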

read_id_set: Option<HashSet<String>>
Internal HashSet of read IDs loaded from read_id_list file.
This is populated automatically and not exposed to users.

threads: NonZeroU32
Number of threads used during some aspects of program execution.

include_zero_len: bool
Include "zero-length" sequences, e.g. sequences with "*" in the sequence field. By default, these sequences are excluded to avoid processing errors. If this flag is set, these reads are included irrespective of any minimum sequence or align length criteria the user may have set. WARNINGS: (1) Some functions of the codebase may break or produce incorrect results if you use this flag. (2) Due to a technical reason, we need a DNA sequence in the sequence field and cannot infer sequence length from other sources, e.g. CIGAR strings.

read_filter: Option<ReadStates>
Only retain reads of this type. Allowed types are primary_forward,
primary_reverse, secondary_forward, secondary_reverse, supplementary_forward,
supplementary_reverse and unmapped. Specify more than one type if needed,
separated by commas, in which case reads of any type in the list are retained.
Defaults to retain reads of all types.
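
The comma-separated read_filter syntax can be sketched as follows. This is a minimal illustration, not the crate's implementation: ReadState, parse_read_states and retain are hypothetical names standing in for whatever ReadStates does internally, and the only behaviour taken from the description above is the list of allowed names, the comma separator, and the default of retaining all types.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the crate's read-state type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum ReadState {
    PrimaryForward,
    PrimaryReverse,
    SecondaryForward,
    SecondaryReverse,
    SupplementaryForward,
    SupplementaryReverse,
    Unmapped,
}

/// Parses a comma-separated list such as "primary_forward,secondary_forward".
/// Any name outside the allowed list is reported as an error.
fn parse_read_states(s: &str) -> Result<HashSet<ReadState>, String> {
    s.split(',')
        .map(|tok| match tok.trim() {
            "primary_forward" => Ok(ReadState::PrimaryForward),
            "primary_reverse" => Ok(ReadState::PrimaryReverse),
            "secondary_forward" => Ok(ReadState::SecondaryForward),
            "secondary_reverse" => Ok(ReadState::SecondaryReverse),
            "supplementary_forward" => Ok(ReadState::SupplementaryForward),
            "supplementary_reverse" => Ok(ReadState::SupplementaryReverse),
            "unmapped" => Ok(ReadState::Unmapped),
            other => Err(format!("unknown read state: {}", other)),
        })
        .collect()
}

/// With no filter set, every read is retained; otherwise a read is retained
/// if its state is any one of the listed states.
fn retain(filter: Option<&HashSet<ReadState>>, state: ReadState) -> bool {
    filter.map_or(true, |set| set.contains(&state))
}
```

Using a set makes the "any type in the list" rule a single membership test, and an unset filter (None) naturally expresses the default.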

sample_fraction: F32Bw0and1
Subsample BAM to retain only this fraction of total number of reads,
defaults to 1.0. The sampling algorithm considers every read according
to the specified probability, so you may not always get
the same number of reads, e.g. if you set -s 0.05 in a file with 1000 reads,
you will get 50 +- sqrt(50) reads.
NOTE: a new subsample is drawn every time as the seed is not fixed.
If you want reproducibility, consider piping the output of samtools view -s
to our program.
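
The spread quoted above can be made concrete. The following is a minimal sketch, assuming each read is kept independently with probability equal to the sample fraction, as the description states. Under that assumption the retained count is binomially distributed, with mean n*p and standard deviation sqrt(n*p*(1-p)); for small p this is close to the sqrt(n*p) figure quoted above. The function name expected_sample_size is hypothetical.

```rust
/// Expected number of retained reads and its standard deviation when each of
/// `n_reads` reads is kept independently with probability `fraction`.
/// Hypothetical helper for illustration; not part of the crate.
fn expected_sample_size(n_reads: u64, fraction: f64) -> (f64, f64) {
    let n = n_reads as f64;
    // Mean of a binomial distribution: n * p.
    let mean = n * fraction;
    // Standard deviation of a binomial distribution: sqrt(n * p * (1 - p)).
    let std_dev = (n * fraction * (1.0 - fraction)).sqrt();
    (mean, std_dev)
}
```

For 1000 reads and a fraction of 0.05 this gives a mean of 50 and a standard deviation of sqrt(47.5), slightly below sqrt(50).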

mapq_filter: u8
Exclude reads whose MAPQ (Mapping quality of position) is below this value. Defaults to zero i.e. do not exclude any read.

exclude_mapq_unavail: bool
Exclude sequences with MAPQ unavailable. In the BAM format, a value of 255 in this column means MAPQ is unavailable. These reads are allowed by default; set this flag to exclude them.
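
How mapq_filter and exclude_mapq_unavail combine can be sketched as a single predicate. This is an illustration under the rules stated above, not the crate's code; passes_mapq and MAPQ_UNAVAILABLE are hypothetical names.

```rust
/// In the BAM format, 255 in the MAPQ column means MAPQ is unavailable.
const MAPQ_UNAVAILABLE: u8 = 255;

/// Returns true if a read with this MAPQ survives both filters.
/// Hypothetical helper for illustration only.
fn passes_mapq(mapq: u8, mapq_filter: u8, exclude_mapq_unavail: bool) -> bool {
    if mapq == MAPQ_UNAVAILABLE {
        // 255 is the largest u8, so it can never fall below the threshold;
        // only the exclusion flag decides whether such reads are kept.
        return !exclude_mapq_unavail;
    }
    // Reads strictly below the threshold are excluded.
    mapq >= mapq_filter
}
```

With the defaults (threshold 0, flag unset) the predicate is true for every read, matching the documented default of not excluding anything.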

region: Option<GenomicRegion>
Only keep reads passing through this region. If a BAM index is available with the same name as the BAM file but with the .bai suffix, the operation of selecting such reads will be faster. If you are using standard input as your input, e.g. you are piping in the output from samtools, then you cannot use an index as a BAM filename is not available.

region_bed3: Option<Bed3<i32, u64>>
Only keep read data from this region. This is an internal option not exposed to the user; we will set it based on the other options that the user sets.

full_region: bool
Only keep reads if they pass through the specified region in full.
Related to the input --region; has no effect if that is not set.
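
The difference between the default region filter and full_region can be sketched with plain intervals. This sketch rests on two assumptions that the description above does not confirm: that "in full" means the read's alignment spans the entire region, and that coordinates are half-open (start inclusive, end exclusive). The function keep_read is a hypothetical name and not the crate's API.

```rust
/// Decides whether a read aligned to [read_start, read_end) is kept for a
/// region [region_start, region_end). Hypothetical helper for illustration.
fn keep_read(
    read_start: u64,
    read_end: u64,
    region_start: u64,
    region_end: u64,
    full_region: bool,
) -> bool {
    if full_region {
        // Assumed meaning of "in full": the read covers the whole region.
        read_start <= region_start && read_end >= region_end
    } else {
        // Default: any overlap between the read and the region suffices.
        read_start < region_end && read_end > region_start
    }
}
```

Under these assumptions, a read spanning 1500-2500 is kept for the region 1000-2000 by default but dropped when full_region is set, whereas a read spanning 900-2100 is kept in both cases.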
Trait Implementations§
impl Args for InputBam
    fn augment_args<'b>(__clap_app: Command) -> Command
    fn augment_args_for_update<'b>(__clap_app: Command) -> Command
    Append to Command so it can instantiate self via
    FromArgMatches::update_from_arg_matches_mut.

impl<'de> Deserialize<'de> for InputBam
    fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
    where
        __D: Deserializer<'de>,

impl From<PathOrURLOrStdin> for InputBam
    fn from(val: PathOrURLOrStdin) -> Self
    Converts a PathOrURLOrStdin into an InputBam with default settings.
    This creates an InputBam with the given BAM path and all other fields set to their defaults.

impl FromArgMatches for InputBam
    fn from_arg_matches(__clap_arg_matches: &ArgMatches) -> Result<Self, Error>
    fn from_arg_matches_mut(__clap_arg_matches: &mut ArgMatches) -> Result<Self, Error>
    fn update_from_arg_matches(&mut self, __clap_arg_matches: &ArgMatches) -> Result<(), Error>
    Assign values from ArgMatches to self.
    fn update_from_arg_matches_mut(&mut self, __clap_arg_matches: &mut ArgMatches) -> Result<(), Error>
    Assign values from ArgMatches to self.

impl InputRegionOptions for InputBam
    fn region_filter_genomic_string(&self) -> Option<GenomicRegion>
    fn is_full_overlap(&self) -> bool
    fn convert_region_to_bed3(&mut self, header: HeaderView) -> Result<(), Error>

Auto Trait Implementations§
impl Freeze for InputBam
impl RefUnwindSafe for InputBam
impl Send for InputBam
impl Sync for InputBam
impl Unpin for InputBam
impl UnwindSafe for InputBam
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
    fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for T
where
    T: Clone,

impl<T> Instrument for T
    fn instrument(self, span: Span) -> Instrumented<Self>
    fn in_current_span(self) -> Instrumented<Self>

impl<T> IntoEither for T
    fn into_either(self, into_left: bool) -> Either<Self, Self>
    Converts self into a Left variant of Either<Self, Self>
    if into_left is true.
    Converts self into a Right variant of Either<Self, Self>
    otherwise.
    fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
    Converts self into a Left variant of Either<Self, Self>
    if into_left(&self) returns true.
    Converts self into a Right variant of Either<Self, Self>
    otherwise.

impl<T> Key for T
where
    T: Clone,

impl<T> Pointable for T

impl<T> PolicyExt for T
where
    T: ?Sized,

impl<SS, SP> SupersetOf<SS> for SP
where
    SS: SubsetOf<SP>,
    fn to_subset(&self) -> Option<SS>
    The inverse inclusion map: attempts to construct self from the equivalent element of its
    superset.
    fn is_in_subset(&self) -> bool
    Checks if self is actually part of its subset T (and can be converted to it).
    fn to_subset_unchecked(&self) -> SS
    Use with care! Same as self.to_subset but without any property checks. Always succeeds.
    fn from_subset(element: &SS) -> SP
    The inclusion map: converts self to the equivalent element of its superset.