Struct InputBam 

#[non_exhaustive]
pub struct InputBam {
    pub bam_path: PathOrURLOrStdin,
    pub min_seq_len: u64,
    pub min_align_len: Option<i64>,
    pub read_id: Option<String>,
    pub read_id_list: Option<String>,
    pub read_id_set: Option<HashSet<String>>,
    pub threads: NonZeroU32,
    pub include_zero_len: bool,
    pub read_filter: Option<ReadStates>,
    pub sample_fraction: F32Bw0and1,
    pub mapq_filter: u8,
    pub exclude_mapq_unavail: bool,
    pub region: Option<GenomicRegion>,
    pub region_bed3: Option<Bed3<i32, u64>>,
    pub full_region: bool,
}

Options for parsing the input BAM file, and the filters to apply to it.

clap uses this struct to define command-line arguments, and the populated struct is then passed to many functions. It is adapted from a similar struct in the fibertools-rs repository. In library code, build it with InputBamBuilder; in CLI mode, clap populates it. Together with the InputMods struct, it sets almost all input options for our functions that process BAM/modBAM files.

Examples

We begin with an example that builds the struct. The next example shows how this struct and InputMods can be used to construct inputs to one of our BAM processing functions.

A sample way to build the struct. Some parameters are optional; any left unset take their default values. The builder does not check whether the specified BAM path or URL exists, because in some use cases the file is generated after the InputBam is built but before it is used. Not all options are shown here; for a full list, see the methods of InputBamBuilder.

use nanalogue_core::{Error, F32Bw0and1, InputBamBuilder, PathOrURLOrStdin};

let bam = InputBamBuilder::default()
    .bam_path(PathOrURLOrStdin::Path("/some/path/to/bam.bam".into()))
    .min_seq_len(30000u64)
    .min_align_len(20000i64)
    .read_id("some-id")
    .read_filter("primary_forward,secondary_forward".into())
    .sample_fraction(F32Bw0and1::new(1.0).expect("no error"))
    .mapq_filter(20)
    .exclude_mapq_unavail(true)
    .region("chr4:1000-2000".into())
    .full_region(true)
    .build()?;
// Return Ok so that the `?` operator above can be used in this example.
Ok::<(), Error>(())

This struct and the InputMods struct set the input options for BAM/modBAM calculations. The example below calls crate::read_info::run to process data from a BAM file with some input options.

use nanalogue_core::{BamRcRecords, BamPreFilt, Error, InputBamBuilder, InputModsBuilder,
    OptionalTag, PathOrURLOrStdin, ThresholdState, nanalogue_bam_reader, read_info};

let mut bam = InputBamBuilder::default()
    .bam_path(PathOrURLOrStdin::Path("./examples/example_1.bam".into()))
    .region("dummyI".into())
    .build()?;
let mut mods = InputModsBuilder::<OptionalTag>::default()
    .mod_prob_filter(ThresholdState::GtEq(0))
    .build()?;

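// Collect the read_info output in memory instead of writing it to stdout.
let mut buffer = Vec::new();
let mut reader = nanalogue_bam_reader(&bam.bam_path.to_string())?;
let bam_rc_records = BamRcRecords::new(&mut reader, &mut bam, &mut mods)?;
read_info::run(
    &mut buffer,
    // Apply the InputBam filters; records that failed to parse are passed on as errors.
    bam_rc_records.rc_records
        .filter(|r| r.as_ref().map_or(true, |v| v.pre_filt(&bam))),
    mods,
    None,
)?;
// The output should contain this read ID from example_1.bam.
assert!(str::from_utf8(buffer.as_slice())?
    .contains("5d10eb9a-aae1-4db8-8ec6-7ebb34d32575"));
Ok::<(), Error>(())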

Examples resulting in errors

Setting full_region without setting a region.

use nanalogue_core::{InputBamBuilder, PathOrURLOrStdin};

// build() returns an error because full_region is set but no region is given.
let _ = InputBamBuilder::default()
    .bam_path(PathOrURLOrStdin::Path("/some/path/to/bam.bam".into()))
    .read_id("some-id")
    .full_region(true)
    .build().unwrap_err();

Setting both region and region_bed3. region can be converted to region_bed3 using GenomicRegion::try_to_bed3 and a BAM header.

use bedrs::prelude::Bed3;
use nanalogue_core::{InputBamBuilder, PathOrURLOrStdin};

// build() returns an error because region and region_bed3 are both set.
let _ = InputBamBuilder::default()
    .bam_path(PathOrURLOrStdin::Path("/some/path/to/bam.bam".into()))
    .read_id("some-id")
    .region("chr4:1000-2000".into())
    .region_bed3(Bed3::<i32, u64>::new(3, 1000, 2000))
    .build().unwrap_err();

Setting more than one of read_id, read_id_list and read_id_set.

  • read_id retains only reads with this read ID.
  • read_id_list is a path to a file with a list of read IDs.
  • read_id_set is a set of read IDs supplied directly.

use nanalogue_core::{InputBamBuilder, PathOrURLOrStdin};
use std::collections::HashSet;

let _ = InputBamBuilder::default()
    .bam_path(PathOrURLOrStdin::Path("/some/path/to/bam.bam".into()))
    .read_id("some-id")
    .read_id_list("/some/file.txt")
    .build().unwrap_err();

let mut read_id_set = HashSet::<String>::new();
read_id_set.insert("some-read-a".to_owned());
read_id_set.insert("some-read-b".to_owned());

let _ = InputBamBuilder::default()
    .bam_path(PathOrURLOrStdin::Path("/some/path/to/bam.bam".into()))
    .read_id_list("/some/file.txt")
    .read_id_set(read_id_set.clone())
    .build().unwrap_err();

let _ = InputBamBuilder::default()
    .bam_path(PathOrURLOrStdin::Path("/some/path/to/bam.bam".into()))
    .read_id("some-id")
    .read_id_set(read_id_set)
    .build().unwrap_err();
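The three error cases above amount to a few consistency checks on the options. The sketch below models them on a hypothetical stand-in struct, `BamOptions`, with a hypothetical `validate` function; neither is part of nanalogue_core, and treating region_bed3 as satisfying full_region is an assumption.

```rust
// Hypothetical stand-in for the builder's inputs; this is NOT the nanalogue_core
// API, only a model of the three documented error cases.
use std::collections::HashSet;

#[derive(Default)]
struct BamOptions {
    read_id: Option<String>,
    read_id_list: Option<String>,
    read_id_set: Option<HashSet<String>>,
    region: Option<String>,
    // Stand-in for Bed3<i32, u64>: (contig index, start, end).
    region_bed3: Option<(i32, u64, u64)>,
    full_region: bool,
}

/// Returns an error message if the options conflict.
fn validate(opts: &BamOptions) -> Result<(), String> {
    // At most one way of selecting read IDs may be used.
    let id_sources = [
        opts.read_id.is_some(),
        opts.read_id_list.is_some(),
        opts.read_id_set.is_some(),
    ]
    .iter()
    .filter(|&&is_set| is_set)
    .count();
    if id_sources > 1 {
        return Err("set at most one of read_id, read_id_list and read_id_set".to_owned());
    }
    // A region may be given as a string or as Bed3, but not both.
    if opts.region.is_some() && opts.region_bed3.is_some() {
        return Err("set at most one of region and region_bed3".to_owned());
    }
    // Assumption: full_region is accepted if either region form is set.
    if opts.full_region && opts.region.is_none() && opts.region_bed3.is_none() {
        return Err("full_region requires a region".to_owned());
    }
    Ok(())
}

fn main() {
    let ok = BamOptions {
        read_id: Some("some-id".to_owned()),
        region: Some("chr4:1000-2000".to_owned()),
        full_region: true,
        ..Default::default()
    };
    println!("{:?}", validate(&ok));

    let conflicting = BamOptions {
        read_id: Some("some-id".to_owned()),
        read_id_list: Some("/some/file.txt".to_owned()),
        ..Default::default()
    };
    println!("{:?}", validate(&conflicting));
}
```

Rejecting conflicting options at build time, rather than silently picking one, means a typo in a script surfaces as an error instead of as an unexpected set of reads.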

Fields (Non-exhaustive)

This struct is marked as non-exhaustive.
Non-exhaustive structs could have additional fields added in future. Therefore, non-exhaustive structs cannot be constructed in external crates using the traditional Struct { .. } syntax; cannot be matched against without a wildcard ..; and struct update syntax will not work.
bam_path: PathOrURLOrStdin

Input BAM file. Set it to a local file path, to - to read from standard input, or to a URL to read a remote file. If you pipe in the output of samtools view through standard input, always include the header with samtools' -h option.
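A minimal sketch of the three kinds of input, assuming a value is classified as stdin when it is exactly "-", as a URL when it contains "://", and as a path otherwise. The `InputSource` enum and `classify` function are hypothetical; the real PathOrURLOrStdin type and its parsing rules may differ.

```rust
use std::path::PathBuf;

/// Hypothetical mirror of the three kinds of input; the real PathOrURLOrStdin
/// type and its parsing rules may differ.
#[derive(Debug, PartialEq)]
enum InputSource {
    Stdin,
    Url(String),
    Path(PathBuf),
}

/// Classifies a command-line value such as "-", a URL or a file path.
fn classify(arg: &str) -> InputSource {
    if arg == "-" {
        // A lone dash conventionally means standard input.
        InputSource::Stdin
    } else if arg.contains("://") {
        // Assumption: any value with a URL scheme separator is treated as remote.
        InputSource::Url(arg.to_owned())
    } else {
        // Everything else is a local path; existence is deliberately not checked,
        // matching the builder behaviour described above.
        InputSource::Path(PathBuf::from(arg))
    }
}

fn main() {
    for arg in ["-", "https://example.com/reads.bam", "/some/path/to/bam.bam"] {
        println!("{arg} -> {:?}", classify(arg));
    }
}
```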

min_seq_len: u64

Exclude reads whose sequence length in the BAM file is below this value. Defaults to 0.

min_align_len: Option<i64>

Exclude reads whose alignment length in the BAM file is below this value. Defaults to unset, i.e. no alignment-length filter is applied.

read_id: Option<String>

Only include reads with this read ID. Defaults to unset, i.e. all reads are used. NOTE: if there are multiple alignments for this read ID, all of them are used.

read_id_list: Option<String>

Path to a file containing a list of read IDs (one per line). Lines starting with '#' are treated as comments and ignored. Cannot be used together with --read-id or with read_id_set.
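A minimal sketch of reading such a file's contents into a set of read IDs. The `parse_read_id_list` function is hypothetical, and trimming whitespace and skipping blank lines are assumptions beyond the documented "one per line, '#' for comments" format; no file I/O is done here.

```rust
use std::collections::HashSet;

/// Parses the contents of a read ID list file into a set of IDs.
/// Lines starting with '#' are comments (as documented); trimming whitespace
/// and skipping blank lines are assumptions made for this sketch.
fn parse_read_id_list(contents: &str) -> HashSet<String> {
    contents
        .lines()
        .filter(|line| !line.starts_with('#'))
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(str::to_owned)
        .collect()
}

fn main() {
    let file = "# reads of interest\nread-a\nread-b\n\nread-a\n";
    let ids = parse_read_id_list(file);
    // Duplicates collapse, so two unique IDs remain.
    println!("{} unique read IDs", ids.len());
    // A read would be retained only if its ID is in the set.
    println!("keep read-a? {}", ids.contains("read-a"));
}
```

A HashSet makes the per-read membership check a constant-time lookup, which matters when the list is long and the BAM file holds many reads.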

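read_id_set: Option<HashSet<String>>

Set of read IDs to retain. It is not exposed as a command-line option; internally it holds the read IDs loaded from the read_id_list file. When building through InputBamBuilder it can also be supplied directly, but not together with read_id or read_id_list.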

threads: NonZeroU32

Number of threads used in some parts of program execution.

include_zero_len: bool

Include “zero-length” sequences, i.e. sequences with “*” in the sequence field. By default, these sequences are excluded to avoid processing errors. If this flag is set, these reads are included irrespective of any minimum sequence or alignment length criteria the user may have set. WARNINGS: (1) Some functions in the codebase may break or produce incorrect results if you use this flag. (2) For technical reasons, we need a DNA sequence in the sequence field and cannot infer the sequence length from other sources, e.g. CIGAR strings.
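The interplay between include_zero_len and the two minimum-length options can be sketched as follows. The `passes_length_filters` function is hypothetical, not the crate's implementation; it assumes a sequence field of “*” is reported as length 0.

```rust
/// Hypothetical sketch of the length filters; not the crate's implementation.
/// `seq_len` is the length of the sequence field, which is 0 when it is "*".
fn passes_length_filters(
    seq_len: u64,
    align_len: i64,
    min_seq_len: u64,
    min_align_len: Option<i64>,
    include_zero_len: bool,
) -> bool {
    if seq_len == 0 {
        // Zero-length reads ignore both minimum-length criteria and are kept
        // only when include_zero_len is set.
        return include_zero_len;
    }
    // An unset min_align_len means no alignment-length filter.
    let align_ok = min_align_len.map_or(true, |min| align_len >= min);
    seq_len >= min_seq_len && align_ok
}

fn main() {
    // Same thresholds as the builder example above: 30000 and 20000.
    println!("{}", passes_length_filters(35_000, 25_000, 30_000, Some(20_000), false));
    println!("{}", passes_length_filters(0, 0, 30_000, Some(20_000), true));
}
```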

read_filter: Option<ReadStates>

Only retain reads of these types. Allowed types are primary_forward, primary_reverse, secondary_forward, secondary_reverse, supplementary_forward, supplementary_reverse and unmapped. To specify more than one type, separate them with commas; reads of any listed type are then retained. Defaults to retaining reads of all types.
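A minimal sketch of turning a comma-separated read_filter value into a set of allowed types and checking a read against it. The `ReadType` enum and `parse_read_filter` function are hypothetical stand-ins; the real ReadStates type may store and parse these values differently.

```rust
use std::collections::HashSet;

/// Hypothetical mirror of the seven documented read types; the real
/// ReadStates type may be represented differently.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum ReadType {
    PrimaryForward,
    PrimaryReverse,
    SecondaryForward,
    SecondaryReverse,
    SupplementaryForward,
    SupplementaryReverse,
    Unmapped,
}

/// Parses a comma-separated list such as "primary_forward,secondary_forward".
fn parse_read_filter(spec: &str) -> Result<HashSet<ReadType>, String> {
    spec.split(',')
        .map(|name| match name.trim() {
            "primary_forward" => Ok(ReadType::PrimaryForward),
            "primary_reverse" => Ok(ReadType::PrimaryReverse),
            "secondary_forward" => Ok(ReadType::SecondaryForward),
            "secondary_reverse" => Ok(ReadType::SecondaryReverse),
            "supplementary_forward" => Ok(ReadType::SupplementaryForward),
            "supplementary_reverse" => Ok(ReadType::SupplementaryReverse),
            "unmapped" => Ok(ReadType::Unmapped),
            other => Err(format!("unknown read type: {other}")),
        })
        // Stops at the first unknown name and returns its error.
        .collect()
}

fn main() {
    let keep = parse_read_filter("primary_forward,secondary_forward").unwrap();
    // A read is retained if its type is any of the listed types.
    println!("keep secondary_forward? {}", keep.contains(&ReadType::SecondaryForward));
    println!("keep unmapped? {}", keep.contains(&ReadType::Unmapped));
}
```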

sample_fraction: F32Bw0and1

Subsample the BAM file to retain only this fraction of the total number of reads. Defaults to 1.0. The sampling algorithm keeps each read with the specified probability, so the number of retained reads varies: e.g. if you set -s 0.05 on a file with 1000 reads, you will get about 50 ± sqrt(50) reads. NOTE: a new subsample is drawn every time, as the seed is not fixed. For reproducible subsampling, consider piping the output of samtools view -s into our program.
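If each of n reads is kept independently with probability p, the number kept is binomially distributed, with mean n·p and standard deviation sqrt(n·p·(1 − p)). The sketch below computes both; `subsample_stats` is a hypothetical helper for illustration only. For 1000 reads at p = 0.05 it gives a mean of 50 and a standard deviation of about 6.9, close to the sqrt(50) ≈ 7.1 quoted above (which is the Poisson approximation).

```rust
/// Expected number of retained reads and its standard deviation when each of
/// `n` reads is kept independently with probability `p` (a binomial model).
fn subsample_stats(n: u64, p: f64) -> (f64, f64) {
    let n = n as f64;
    let mean = n * p;
    let sd = (n * p * (1.0 - p)).sqrt();
    (mean, sd)
}

fn main() {
    let (mean, sd) = subsample_stats(1000, 0.05);
    println!("expected {mean:.1} +- {sd:.1} reads");
}
```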

mapq_filter: u8

Exclude reads whose MAPQ (mapping quality) is below this value. Defaults to zero, i.e. no read is excluded.

exclude_mapq_unavail: bool

Exclude reads whose MAPQ is unavailable. In the BAM format, a MAPQ of 255 means the mapping quality is unavailable. Such reads are retained by default; set this flag to exclude them.
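One way mapq_filter and exclude_mapq_unavail could combine is sketched below. The `passes_mapq` function is hypothetical, and the choice that a MAPQ of 255 is judged only by the flag (not by the threshold) is an assumption, not necessarily what the crate does.

```rust
/// MAPQ value that the SAM/BAM format reserves for "mapping quality unavailable".
const MAPQ_UNAVAILABLE: u8 = 255;

/// Hypothetical sketch of how mapq_filter and exclude_mapq_unavail could combine.
fn passes_mapq(mapq: u8, mapq_filter: u8, exclude_mapq_unavail: bool) -> bool {
    if mapq == MAPQ_UNAVAILABLE {
        // Assumption: an unavailable MAPQ is judged only by the flag.
        return !exclude_mapq_unavail;
    }
    mapq >= mapq_filter
}

fn main() {
    // Threshold of 20, as in the builder example above.
    for mapq in [10u8, 20, 60, 255] {
        println!("MAPQ {mapq}: kept = {}", passes_mapq(mapq, 20, true));
    }
}
```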

region: Option<GenomicRegion>

Only keep reads passing through this region. If a BAM index is available, named like the BAM file but with a .bai suffix, selecting these reads is faster. If you read from standard input (e.g. you pipe in the output of samtools), an index cannot be used, because no BAM filename is available.

region_bed3: Option<Bed3<i32, u64>>

Only keep read data from this region. This is an internal option that is not exposed to the user; we set it based on the other options the user sets.

full_region: bool

Only keep reads that pass through the specified region in full. Used together with --region. When building through InputBamBuilder, setting full_region without a region returns an error (see the examples above).
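The difference between partial and full overlap can be sketched on a single contig. The `read_passes_region` function is hypothetical; it assumes half-open [start, end) coordinates and reads "pass through the region in full" as "the read spans the whole region". The crate's actual coordinate convention and overlap rule may differ.

```rust
/// Hypothetical sketch of partial versus full overlap on a single contig, using
/// half-open [start, end) coordinates (an assumption for this example).
fn read_passes_region(read: (u64, u64), region: (u64, u64), full_region: bool) -> bool {
    let (read_start, read_end) = read;
    let (reg_start, reg_end) = region;
    if full_region {
        // The read must span the whole region.
        read_start <= reg_start && read_end >= reg_end
    } else {
        // Sharing at least one base with the region is enough.
        read_start < reg_end && reg_start < read_end
    }
}

fn main() {
    let region = (1000, 2000);
    for read in [(500, 2500), (1500, 2500), (2000, 3000)] {
        println!(
            "{read:?}: partial={} full={}",
            read_passes_region(read, region, false),
            read_passes_region(read, region, true)
        );
    }
}
```

With half-open coordinates, a read ending exactly where the region starts (or starting where it ends) shares no bases with it, so it fails even the partial test.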

Trait Implementations

impl Args for InputBam

fn group_id() -> Option<Id>

Report the ArgGroup::id for this set of arguments

fn augment_args<'b>(__clap_app: Command) -> Command

Append to Command so it can instantiate Self via FromArgMatches::from_arg_matches_mut

fn augment_args_for_update<'b>(__clap_app: Command) -> Command

Append to Command so it can instantiate self via FromArgMatches::update_from_arg_matches_mut

impl Clone for InputBam

fn clone(&self) -> InputBam

Returns a duplicate of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for InputBam

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for InputBam

Implements Default for InputBam.

fn default() -> Self

Returns the “default value” for a type.

impl<'de> Deserialize<'de> for InputBam

fn deserialize<__D>(__deserializer: __D) -> Result<Self, __D::Error>
where __D: Deserializer<'de>,

Deserialize this value from the given Serde deserializer.

impl From<PathOrURLOrStdin> for InputBam

fn from(val: PathOrURLOrStdin) -> Self

Converts a PathOrURLOrStdin into an InputBam with default settings.

This creates an InputBam with the given BAM path and all other fields set to their defaults.

impl FromArgMatches for InputBam

fn from_arg_matches(__clap_arg_matches: &ArgMatches) -> Result<Self, Error>

Instantiate Self from ArgMatches, parsing the arguments as needed.

fn from_arg_matches_mut( __clap_arg_matches: &mut ArgMatches, ) -> Result<Self, Error>

Instantiate Self from ArgMatches, parsing the arguments as needed.

fn update_from_arg_matches( &mut self, __clap_arg_matches: &ArgMatches, ) -> Result<(), Error>

Assign values from ArgMatches to self.

fn update_from_arg_matches_mut( &mut self, __clap_arg_matches: &mut ArgMatches, ) -> Result<(), Error>

Assign values from ArgMatches to self.

impl InputRegionOptions for InputBam

fn region_filter_genomic_string(&self) -> Option<GenomicRegion>

Returns the requested region in genomic-string format.

fn region_filter(&self) -> &Option<Bed3<i32, u64>>

Returns the requested region in Bed3 form.

fn set_region_filter(&mut self, value: Option<Bed3<i32, u64>>)

Sets the requested region.

fn is_full_overlap(&self) -> bool

Returns true if a full overlap with the region is requested, as opposed to only a partial overlap. Defaults to false.

fn convert_region_to_bed3(&mut self, header: HeaderView) -> Result<(), Error>

Converts the region from its genomic-string representation to a Bed3 representation.

impl Serialize for InputBam

fn serialize<__S>(&self, __serializer: __S) -> Result<__S::Ok, __S::Error>
where __S: Serializer,

Serialize this value into the given Serde serializer.

Auto Trait Implementations

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

This is a nightly-only experimental API (clone_to_uninit).
Performs copy-assignment from self to dest.

impl<T> DynClone for T
where T: Clone,

fn __clone_box(&self, _: Private) -> *mut ()

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper.

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
where F: FnOnce(&Self) -> bool,

Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.

impl<T> Key for T
where T: Clone,

fn align() -> usize

The alignment necessary for the key. Must return a power of two.

fn size(&self) -> usize

The size of the key in bytes.

unsafe fn init(&self, ptr: *mut u8)

Initialize the key in the given memory location.

unsafe fn get<'a>(ptr: *const u8) -> &'a T

Get a reference to the key from the given memory location.

unsafe fn drop_in_place(ptr: *mut u8)

Drop the key in place.

impl<T> Pointable for T

const ALIGN: usize

The alignment of pointer.

type Init = T

The type for initializers.

unsafe fn init(init: <T as Pointable>::Init) -> usize

Initializes a with the given initializer.

unsafe fn deref<'a>(ptr: usize) -> &'a T

Dereferences the given pointer.

unsafe fn deref_mut<'a>(ptr: usize) -> &'a mut T

Mutably dereferences the given pointer.

unsafe fn drop(ptr: usize)

Drops the object pointed to by the given pointer.

impl<T> PolicyExt for T
where T: ?Sized,

fn and<P, B, E>(self, other: P) -> And<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow only if self and other return Action::Follow.

fn or<P, B, E>(self, other: P) -> Or<T, P>
where T: Policy<B, E>, P: Policy<B, E>,

Create a new Policy that returns Action::Follow if either self or other returns Action::Follow.

impl<T> Same for T

type Output = T

Should always be Self

impl<SS, SP> SupersetOf<SS> for SP
where SS: SubsetOf<SP>,

fn to_subset(&self) -> Option<SS>

The inverse inclusion map: attempts to construct self from the equivalent element of its superset.

fn is_in_subset(&self) -> bool

Checks if self is actually part of its subset T (and can be converted to it).

fn to_subset_unchecked(&self) -> SS

Use with care! Same as self.to_subset but without any property checks. Always succeeds.

fn from_subset(element: &SS) -> SP

The inclusion map: converts self to the equivalent element of its superset.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<V, T> VZip<V> for T
where V: MultiLane<T>,

fn vzip(self) -> V

impl<T> WithSubscriber for T

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.

impl<T> DeserializeOwned for T
where T: for<'de> Deserialize<'de>,

impl<T> ErasedDestructor for T
where T: 'static,

impl<T> MetaBounds for T
where T: Clone + Default + Debug + Send + Sync,

impl<T> PlanCallbackArgs for T

impl<T> PlanCallbackOut for T