§liblrge
liblrge is a Rust library that provides utilities for estimating genome size for a given set of reads.
You can find a command-line interface (CLI) tool that uses this library in the lrge crate.
§Usage
The library provides two strategies for estimating genome size:
§TwoSetStrategy
The two-set strategy uses two (random) sets of reads to estimate the genome size. The query set, which is generally smaller, is overlapped against a target set of reads. A genome size estimate is generated for each read in the query set, based on the number of overlaps and the average read length. The median of these estimates is taken as the final genome size estimate.
use liblrge::{Estimate, TwoSetStrategy};
use liblrge::twoset::{Builder, DEFAULT_TARGET_NUM_READS, DEFAULT_QUERY_NUM_READS};
let input = "path/to/reads.fastq";
let mut strategy = Builder::new()
    .target_num_reads(DEFAULT_TARGET_NUM_READS)
    .query_num_reads(DEFAULT_QUERY_NUM_READS)
    .threads(4)
    .build(input);
let est_result = strategy.estimate(false, None, None).expect("Failed to generate estimate");
let estimate = est_result.estimate;
// do something with the estimate
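To make the description above more concrete, here is a minimal sketch of how per-read estimates could be combined into a median. It is not liblrge's implementation. The function names per_read_estimate and median are hypothetical, and the per-read formula is a simplifying assumption: a query read of length l is expected to overlap roughly N * (l + L) / G of N target reads with average length L on a genome of size G, ignoring any minimum overlap length. Returning None for reads without overlaps is also a choice made for this sketch.

```rust
/// Simplified per-read genome size estimate (an assumption for illustration,
/// not necessarily the formula liblrge uses).
/// Solves n_overlaps = n_target * (query_len + avg_target_len) / genome_size
/// for genome_size. Returns `None` when the read has no overlaps.
fn per_read_estimate(
    query_len: f64,
    avg_target_len: f64,
    n_target: usize,
    n_overlaps: usize,
) -> Option<f64> {
    if n_overlaps == 0 {
        return None;
    }
    Some(n_target as f64 * (query_len + avg_target_len) / n_overlaps as f64)
}

/// Median of a set of estimates; `None` if the set is empty.
fn median(mut values: Vec<f64>) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    // total_cmp gives a total order over f64, so sorting cannot panic.
    values.sort_by(|a, b| a.total_cmp(b));
    let mid = values.len() / 2;
    if values.len() % 2 == 0 {
        // Even count: average the two middle values.
        Some((values[mid - 1] + values[mid]) / 2.0)
    } else {
        Some(values[mid])
    }
}
```

In this sketch, the estimates for all query reads would be collected (skipping the None values) and passed to median to obtain a single figure.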
§AvaStrategy
The all-vs-all (ava) strategy takes a (random) set of reads and overlaps it against itself to estimate the genome size. A genome size estimate is generated for each read in the set, based on the number of overlaps and the average read length of the set, excluding the read being assessed. The median of these estimates is taken as the final genome size estimate.
use liblrge::{Estimate, AvaStrategy};
use liblrge::ava::{Builder, DEFAULT_AVA_NUM_READS};
let input = "path/to/reads.fastq";
let mut strategy = Builder::new()
    .num_reads(DEFAULT_AVA_NUM_READS)
    .threads(4)
    .build(input);
let est_result = strategy.estimate(false, None, None).expect("Failed to generate estimate");
let estimate = est_result.estimate;
// do something with the estimate
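The "minus the read being assessed" detail in the description above can be illustrated with a leave-one-out average. This is a sketch under assumptions, not liblrge's code: the function name mean_len_excluding is hypothetical, and read lengths are assumed to be available as a slice of integers.

```rust
/// Average length of all reads except the one at `index`.
/// Returns `None` if `index` is out of bounds or there is no other read.
fn mean_len_excluding(read_lens: &[u64], index: usize) -> Option<f64> {
    // Length of the read being assessed; bails out if the index is invalid.
    let excluded = *read_lens.get(index)?;
    if read_lens.len() < 2 {
        return None;
    }
    let total: u64 = read_lens.iter().sum();
    // Remove the assessed read from both the sum and the count.
    Some((total - excluded) as f64 / (read_lens.len() - 1) as f64)
}
```

For read lengths of 100, 200 and 300, assessing the first read would use an average of 250 rather than 200.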
§Features
This library includes optional support for compressed file formats, controlled by feature flags.
By default, the compression feature is enabled, which activates support for all included compression formats.
§Available Features
- compression (default): Enables all available compression formats (gzip, zstd, bzip2, xz).
- gzip: Enables support for gzip-compressed files (.gz) using the flate2 crate.
- zstd: Enables support for zstd-compressed files (.zst) using the zstd crate.
- bzip2: Enables support for bzip2-compressed files (.bz2) using the bzip2 crate.
- xz: Enables support for xz-compressed files (.xz) using the liblzma crate.
§Enabling and Disabling Features
By default, all compression features are enabled. However, you can selectively enable or disable them in your Cargo.toml to reduce dependencies or target specific compression formats.
To disable all compression features:
liblrge = { version = "0.1.1", default-features = false }
To enable only specific compression formats, list the desired features in Cargo.toml:
liblrge = { version = "0.1.1", default-features = false, features = ["gzip", "zstd"] }
In this example, only gzip (flate2) and zstd are enabled, so liblrge will support .gz and .zst files.
§Compression Detection
The library uses magic bytes at the start of the file to detect its compression format before deciding how to read it. Supported formats include gzip, zstd, bzip2, and xz, with automatic decompression if the appropriate feature is enabled.
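As an illustration of magic-byte detection in general, the following sketch checks the leading bytes of a buffer against the published signatures of the four formats. It is not liblrge's internal code: the Compression enum and the detect_compression function are hypothetical names introduced here, and treating anything unrecognised as uncompressed is an assumption of the sketch.

```rust
/// Compression formats recognised by this sketch.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum Compression {
    Gzip,
    Zstd,
    Bzip2,
    Xz,
    /// No known signature found; the data is treated as plain text.
    Uncompressed,
}

/// Inspect the first bytes of a file and report which format they match.
fn detect_compression(header: &[u8]) -> Compression {
    // Standard file signatures for each format.
    const GZIP: &[u8] = &[0x1f, 0x8b];
    const ZSTD: &[u8] = &[0x28, 0xb5, 0x2f, 0xfd];
    const BZIP2: &[u8] = b"BZh";
    const XZ: &[u8] = &[0xfd, 0x37, 0x7a, 0x58, 0x5a, 0x00];

    if header.starts_with(GZIP) {
        Compression::Gzip
    } else if header.starts_with(ZSTD) {
        Compression::Zstd
    } else if header.starts_with(BZIP2) {
        Compression::Bzip2
    } else if header.starts_with(XZ) {
        Compression::Xz
    } else {
        Compression::Uncompressed
    }
}
```

A caller would typically read the first six bytes of the input (the length of the longest signature, xz) and pass them to detect_compression before choosing a decoder.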
§Disabling logging
liblrge will output some logging information via the log crate. If you wish to suppress this logging you can configure the logging level in your application. For example, using the env_logger crate you can do the following:
use log::LevelFilter;
let mut log_builder = env_logger::Builder::new();
log_builder
    .filter(None, LevelFilter::Info)
    .filter_module("liblrge", LevelFilter::Off);
log_builder.init();
// Your application code here
This will set the global logging level to Info and disable all logging from the liblrge library.
Re-exports§
pub use self::ava::AvaStrategy;
pub use self::estimate::Estimate;
pub use self::twoset::TwoSetStrategy;
Modules§
- ava: A strategy that compares overlaps between the same set of reads, i.e., all-vs-all.
- error: Error handling for liblrge.
- estimate: A trait for generating genome size estimates, and calculating the median of those estimates.
- twoset: A strategy that compares overlaps between two different sets of reads.
Enums§
- Platform: The sequencing platform used to generate the reads.