Crate liblrge

§liblrge

liblrge is a Rust library that provides utilities for estimating genome size from a given set of reads.

You can find a command-line interface (CLI) tool that uses this library in the lrge crate.

§Usage

The library provides two strategies for estimating genome size:

§TwoSetStrategy

The two-set strategy uses two (random) sets of reads to estimate the genome size. The query set, which is generally smaller, is overlapped against a target set of reads. A genome size estimate is generated for each read in the query set, based on the number of overlaps and the average read length. The median of these estimates is taken as the final genome size estimate.

use liblrge::{Estimate, TwoSetStrategy};
use liblrge::twoset::{Builder, DEFAULT_TARGET_NUM_READS, DEFAULT_QUERY_NUM_READS};

let input = "path/to/reads.fastq";
let mut strategy = Builder::new()
    .target_num_reads(DEFAULT_TARGET_NUM_READS)
    .query_num_reads(DEFAULT_QUERY_NUM_READS)
    .threads(4)
    .build(input);

let est_result = strategy.estimate(false, None, None).expect("Failed to generate estimate");
let estimate = est_result.estimate;
// do something with the estimate
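
The last step of the description above, taking the median of the per-read estimates, can be sketched on its own. This is a minimal illustration and not liblrge's implementation: the `median` helper is hypothetical, and the per-read estimates are assumed to be available as plain `f64` values.

```rust
/// Median of a set of per-read genome size estimates.
/// Hypothetical helper for illustration; not part of the liblrge API.
/// Returns `None` for an empty slice or if any value is NaN.
fn median(values: &[f64]) -> Option<f64> {
    if values.is_empty() || values.iter().any(|v| v.is_nan()) {
        return None;
    }
    let mut sorted = values.to_vec();
    // `total_cmp` gives a total order on f64, so sorting cannot panic.
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 1 {
        // Odd count: the middle element is the median.
        Some(sorted[mid])
    } else {
        // Even count: average the two middle elements.
        Some((sorted[mid - 1] + sorted[mid]) / 2.0)
    }
}
```

Calling `median(&[3.0, 1.0, 2.0])` returns `Some(2.0)`. Using the median rather than the mean keeps a few extreme per-read estimates from dominating the final value.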

§AvaStrategy

The all-vs-all (ava) strategy takes a (random) set of reads and overlaps it against itself to estimate the genome size. A genome size estimate is generated for each read in the set, based on the number of overlaps and the average read length, where the average excludes the read being assessed. The median of these estimates is taken as the final genome size estimate.

use liblrge::{Estimate, AvaStrategy};
use liblrge::ava::{Builder, DEFAULT_AVA_NUM_READS};

let input = "path/to/reads.fastq";
let mut strategy = Builder::new()
    .num_reads(DEFAULT_AVA_NUM_READS)
    .threads(4)
    .build(input);

let est_result = strategy.estimate(false, None, None).expect("Failed to generate estimate");
let estimate = est_result.estimate;
// do something with the estimate
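
To make the "average read length minus the read being assessed" idea concrete, here is a small sketch of a leave-one-out mean. It reflects one reading of the description above and is not taken from liblrge's source; `mean_len_excluding` is a hypothetical helper, and read lengths are assumed to be given as `usize` values.

```rust
/// Mean read length over all reads except the one at `index`.
/// Hypothetical helper for illustration; not part of the liblrge API.
/// Returns `None` if fewer than two reads are given or `index` is out of range.
fn mean_len_excluding(lengths: &[usize], index: usize) -> Option<f64> {
    if lengths.len() < 2 || index >= lengths.len() {
        return None;
    }
    // Sum all lengths once, then subtract the read being assessed.
    let total: usize = lengths.iter().sum();
    let remaining = total - lengths[index];
    // Divide by the number of other reads (n - 1).
    Some(remaining as f64 / (lengths.len() - 1) as f64)
}
```

For read lengths `[1000, 2000, 3000]`, assessing the first read gives a mean of 2500 over the other two reads.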

§Features

This library includes optional support for compressed file formats, controlled by feature flags. By default, the compression feature is enabled, which activates support for all included compression formats.

§Available Features

  • compression (default): Enables all available compression formats (gzip, zstd, bzip2, xz).
  • gzip: Enables support for gzip-compressed files (.gz) using the flate2 crate.
  • zstd: Enables support for zstd-compressed files (.zst) using the zstd crate.
  • bzip2: Enables support for bzip2-compressed files (.bz2) using the bzip2 crate.
  • xz: Enables support for xz-compressed files (.xz) using the liblzma crate.

§Enabling and Disabling Features

You can selectively enable or disable the compression features in your Cargo.toml to reduce dependencies or to target specific compression formats.

To disable all compression features:

liblrge = { version = "0.1.1", default-features = false }

To enable only specific compression formats, list the desired features in Cargo.toml:

liblrge = { version = "0.1.1", default-features = false, features = ["gzip", "zstd"] }

In this example, only gzip (flate2) and zstd are enabled, so liblrge will support .gz and .zst files.

§Compression Detection

The library uses magic bytes at the start of the file to detect its compression format before deciding how to read it. Supported formats include gzip, zstd, bzip2, and xz, with automatic decompression if the appropriate feature is enabled.
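
As a rough illustration of magic-byte detection, the sketch below checks the first bytes of a file against the well-known signatures for the four formats. It is not liblrge's actual code: the `Compression` enum and `detect_compression` function are hypothetical names, and the caller is assumed to have already read the first few bytes of the file into a slice.

```rust
/// Compression formats distinguished by this sketch.
/// Hypothetical type for illustration; not part of the liblrge API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Compression {
    Gzip,
    Zstd,
    Bzip2,
    Xz,
    Uncompressed,
}

/// Guess the compression format from the leading bytes of a file.
fn detect_compression(header: &[u8]) -> Compression {
    // Standard signatures found at the start of each format.
    const GZIP: &[u8] = &[0x1f, 0x8b];
    const ZSTD: &[u8] = &[0x28, 0xb5, 0x2f, 0xfd];
    const BZIP2: &[u8] = b"BZh";
    const XZ: &[u8] = &[0xfd, 0x37, 0x7a, 0x58, 0x5a, 0x00];

    if header.starts_with(GZIP) {
        Compression::Gzip
    } else if header.starts_with(ZSTD) {
        Compression::Zstd
    } else if header.starts_with(BZIP2) {
        Compression::Bzip2
    } else if header.starts_with(XZ) {
        Compression::Xz
    } else {
        // No known signature: treat the input as plain text.
        Compression::Uncompressed
    }
}
```

Because detection looks at content rather than the file extension, a gzip file named `reads.fastq` would still be recognised. A plain FASTQ file starts with `@`, which matches none of the signatures.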

§Disabling logging

liblrge outputs some logging information via the log crate. If you wish to suppress this logging, you can configure the logging level in your application. For example, using the env_logger crate, you can do the following:

use log::LevelFilter;

let mut log_builder = env_logger::Builder::new();
log_builder
    .filter(None, LevelFilter::Info)
    .filter_module("liblrge", LevelFilter::Off);
log_builder.init();

// Your application code here

This will set the global logging level to Info and disable all logging from the liblrge library.

Re-exports§

pub use self::ava::AvaStrategy;
pub use self::estimate::Estimate;
pub use self::twoset::TwoSetStrategy;

Modules§

ava
A strategy that compares overlaps between the same set of reads - i.e., all-vs-all.
error
Error handling for liblrge.
estimate
A trait for generating genome size estimates, and calculating the median of those estimates.
twoset
A strategy that compares overlaps between two different sets of reads.

Enums§

Platform
The sequencing platform used to generate the reads.

Type Aliases§

Result
A type alias for Result with LrgeError as the error type.