Rustynetics
Rustynetics is a high-performance Rust library designed for bioinformatics applications, offering efficient and scalable handling of common genomic file formats. It supports reading and writing of widely used formats such as BAM, FASTQ, FASTA, bigWig, bedGraph, BED, and GFF, making it an essential tool for genomic data processing pipelines.
The library computes coverage tracks that summarize sequence alignments or read counts, so users can generate coverage profiles over specified regions or across the whole genome. It also offers statistical features such as cross-correlation, which can be used to assess relationships between genomic datasets, for example in ChIP-seq or RNA-seq analysis.
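To make the cross-correlation idea concrete, the following is a minimal sketch in plain Rust that correlates two binned tracks at a range of lags. The function name `cross_correlation` and its signature are hypothetical and are not taken from the Rustynetics API; the normalization (mean-centered, divided by the product of the full-series deviations) is one common choice among several.

```rust
/// Normalized cross-correlation between two binned tracks for lags 0..=max_lag.
/// Entry `lag` correlates x[i] with y[i + lag].
/// Hypothetical helper for illustration; not part of the Rustynetics API.
pub fn cross_correlation(x: &[f64], y: &[f64], max_lag: usize) -> Vec<f64> {
    let n = x.len().min(y.len());
    if n == 0 {
        return Vec::new();
    }
    let mean = |v: &[f64]| v.iter().sum::<f64>() / v.len() as f64;
    let (mx, my) = (mean(&x[..n]), mean(&y[..n]));
    // Root of the summed squared deviations of each track.
    let sx = x[..n].iter().map(|a| (a - mx).powi(2)).sum::<f64>().sqrt();
    let sy = y[..n].iter().map(|b| (b - my).powi(2)).sum::<f64>().sqrt();
    let norm = sx * sy;
    (0..=max_lag.min(n - 1))
        .map(|lag| {
            if norm == 0.0 {
                // A constant track has no defined correlation; report 0.
                return 0.0;
            }
            let cov: f64 = x[..n - lag]
                .iter()
                .zip(&y[lag..n])
                .map(|(a, b)| (a - mx) * (b - my))
                .sum();
            cov / norm
        })
        .collect()
}
```

The lag with the largest value indicates the offset at which the two tracks agree best; correlating a track with itself yields 1 at lag 0.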
One of the library's core strengths is its efficient handling of genomic ranges. It offers a highly optimized data structure for manipulating large genomic intervals, ensuring that operations like querying, merging, or intersecting genomic regions are performed with minimal overhead. Moreover, the library provides sequence containers for FASTA data, motif/PWM utilities, k-mer counting, segmentation import/export, and pretty printing for displaying genomic ranges in human-readable formats.
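As an illustration of the kind of range operation described above, here is a small stand-alone sketch of merging overlapping intervals. The `Interval` type and `merge_intervals` function are hypothetical and do not reflect how GRanges is implemented; intervals are assumed to be half-open, and touching intervals are merged.

```rust
/// A half-open genomic interval [from, to) on a named sequence.
/// Hypothetical type for illustration; not the library's GRanges.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Interval {
    pub seqname: String,
    pub from: usize,
    pub to: usize,
}

/// Merge overlapping or touching intervals on the same sequence.
pub fn merge_intervals(mut intervals: Vec<Interval>) -> Vec<Interval> {
    // Sort by sequence name, then start, then end.
    intervals.sort_by(|a, b| {
        a.seqname
            .cmp(&b.seqname)
            .then(a.from.cmp(&b.from))
            .then(a.to.cmp(&b.to))
    });
    let mut merged: Vec<Interval> = Vec::with_capacity(intervals.len());
    for r in intervals {
        if let Some(last) = merged.last_mut() {
            if last.seqname == r.seqname && r.from <= last.to {
                // Extend the previous interval instead of adding a new one.
                last.to = last.to.max(r.to);
                continue;
            }
        }
        merged.push(r);
    }
    merged
}
```

Sorting first means a single linear pass suffices, so the whole operation costs O(n log n) for n intervals.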
Designed with performance and usability in mind, this library is ideal for large-scale genomics projects requiring both speed and precision, whether for research in genomics, epigenetics, or other related fields.
Documentation
Please find the API documentation here.
Tools
The package contains the following command line tools:
| Tool | Description |
|---|---|
| bam-check-fastq | check whether all BAM read names are present in a FASTQ file |
| bam-check-bin | check bin records of a BAM file |
| bam-genome | print the genome (sequence table) of a BAM file |
| bam-to-fastq | reconstruct FASTQ records from a BAM file |
| bam-to-bigwig | convert BAM to bigWig (estimate fragment length if required) |
| bam-view | print contents of a BAM file |
| bed-remove-overlaps | remove BED or table rows that overlap inadmissible regions |
| bigwig-counts-to-quantiles | convert bigWig counts to empirical quantiles |
| bigwig-edit-chrom-names | rewrite a bigWig with chromosome names transformed by a regex |
| bigwig-extract | extract bigWig data for BED regions as a table or bigWig |
| bigwig-extract-chroms | write a new bigWig containing only selected chromosomes |
| bigwig-genome | print the genome (sequence table) of a bigWig file |
| bigwig-histogram | compute a histogram or cumulative histogram over track values |
| bigwig-info | print information about a bigWig file |
| bigwig-map | apply a shared-library mapping function across one or more bigWig tracks |
| bigwig-nil | re-encode a bigWig track through the Rust implementation |
| bigwig-positive | call joint positive regions across one or more bigWig tracks |
| bigwig-quantile-normalize | quantile-normalize one bigWig track against a reference |
| bigwig-query | retrieve data from a bigWig file |
| bigwig-query-sequence | retrieve sequences from a bigWig file |
| bigwig-statistics | print summary statistics for a bigWig track |
| chromhmm-tables-to-bigwig | convert ChromHMM per-chromosome tables to bigWig |
| count-kmers | count or identify k-mers in FASTA sequences or BED regions |
| draw-genomic-regions | draw random genomic regions from a genome |
| fasta-extract | extract FASTA subsequences for BED regions |
| fasta-unresolved-regions | report unresolved (N) FASTA intervals as BED |
| gtf-to-bed | convert GTF records to BED6 |
| meme-extract | extract PWM or PPM motif matrices from MEME or DREME XML |
| observed-over-expected-cpg | compute observed/expected CpG scores for regions or whole sequences |
| pwm-scan-regions | score genomic regions with one or more PWMs |
| pwm-scan-sequences | scan FASTA sequences with a PWM and export a bigWig track |
| segmentation-differential | merge and score differential chromatin states across segmentations |
| sequence-similarity | compute sliding-window k-mer similarity to a reference sequence |
Examples
Import genes from UCSC
use crateGenes;
// Import from local file
if let Ok = import_genes
// Retrieve from UCSC server
if let Ok = import_genes_from_ucsc
Printing the result shows the imported genes as a table of genomic ranges with their annotation columns.
Read GTF files
use crateGRanges;
let granges = import_gtf.unwrap;
Printing the result shows the imported GTF records as a table of genomic ranges with their annotation columns.
Read a BAM file into a GRanges object
use crateBamReaderOptions;
use crateGRanges;
let mut options = default;
options.read_cigar = true;
options.read_qual = true;
if let Ok = import_bam_single_end
Printing the result shows the imported reads as a table of genomic ranges with their alignment columns.
Read and write FASTQ records
use ;
use File;
use ;
let input = open.unwrap;
let mut reader = new;
let output = create.unwrap;
let mut writer = new;
while let Some = reader.read_record.unwrap
let record = new.unwrap;
writer.write_record.unwrap;
writer.flush.unwrap;
Reconstruct FASTQ from BAM
bam-to-fastq reconstructs FASTQ records from BAM alignments. Reverse-strand reads are emitted in original FASTQ orientation by reverse-complementing the sequence and reversing the qualities.
# Write a single FASTQ stream
# Split paired-end output into separate files and keep singles
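The reverse-strand handling described above can be sketched in a few lines of plain Rust. The function names below are hypothetical and this is not the tool's actual code; it only illustrates reverse-complementing the sequence and reversing the qualities so that a read returns to its original FASTQ orientation. Mapping unknown bases to `N` is an assumption of the sketch.

```rust
/// Reverse-complement a DNA sequence given as ASCII bytes.
/// Bases other than A, C, G, T (either case) are mapped to `N` (an assumption).
pub fn reverse_complement(seq: &[u8]) -> Vec<u8> {
    seq.iter()
        .rev()
        .map(|&b| match b {
            b'A' => b'T',
            b'C' => b'G',
            b'G' => b'C',
            b'T' => b'A',
            b'a' => b't',
            b'c' => b'g',
            b'g' => b'c',
            b't' => b'a',
            _ => b'N',
        })
        .collect()
}

/// Return sequence and qualities in original FASTQ orientation.
/// `is_reverse` corresponds to the reverse-strand flag of the alignment.
pub fn restore_fastq_orientation(
    seq: &[u8],
    qual: &[u8],
    is_reverse: bool,
) -> (Vec<u8>, Vec<u8>) {
    if is_reverse {
        let mut reversed_qual = qual.to_vec();
        reversed_qual.reverse();
        (reverse_complement(seq), reversed_qual)
    } else {
        (seq.to_vec(), qual.to_vec())
    }
}
```

For a reverse-strand read stored as `AACG` with qualities `IIH#`, this yields `CGTT` and `#HII`; forward-strand reads pass through unchanged.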
If a BAM file does not contain quality scores, use --fill-missing-quality to emit synthetic qualities:
Sequence, PWM, and k-mer tools
The package also contains utilities for FASTA extraction, motif scanning, and k-mer counting:
# Extract sequences for BED intervals
# Report unresolved regions
# Count k-mers in FASTA sequences
# Score genomic regions with one or more PWMs
# Scan complete sequences and export the scores as a bigWig track
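For readers unfamiliar with k-mer counting, the following stand-alone sketch shows the basic idea with a sliding window and a hash map. It is not the implementation behind count-kmers: the function name is hypothetical, windows containing ambiguous bases are skipped, and reverse complements are counted separately, all of which are assumptions of the sketch.

```rust
use std::collections::HashMap;

/// Count all k-mers of length `k` in a DNA sequence (case-insensitive).
/// Windows containing characters other than A, C, G, T are skipped.
/// Hypothetical helper for illustration only.
pub fn count_kmers(seq: &str, k: usize) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    if k == 0 {
        // `windows(0)` would panic, so an empty result is returned instead.
        return counts;
    }
    let bytes = seq.to_ascii_uppercase().into_bytes();
    for window in bytes.windows(k) {
        if window.iter().all(|&b| matches!(b, b'A' | b'C' | b'G' | b'T')) {
            let kmer = String::from_utf8_lossy(window).into_owned();
            *counts.entry(kmer).or_insert(0) += 1;
        }
    }
    counts
}
```

Counting 2-mers in `ACGTACGN` gives AC and CG twice each and GT and TA once each; the final window `GN` is ignored.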
Reading BigWig files
BigWig files store data in a binary format optimized for fast random access. In addition to the raw data, bigWig files typically contain several zoom levels at which the data has been summarized. The BigWigReader type lets you query data and automatically selects an appropriate zoom level for the given bin size:
let seqname = "chrY"; // (can be a regular expression)
let from = 1838100;
let to = 1838600;
let binsize = 100;
// The reader accepts either a local file or a file
// hosted on an HTTP server
if let Ok = new_reader
The result is:
(data=(chrom_id=chrY, from=1838100, to=1838200, statistics=(valid=1, min=1.0000, max=1.0000, sum=1.0000, sum_squares=1.0000)))
(data=(chrom_id=chrY, from=1838200, to=1838300, statistics=(valid=1, min=1.0000, max=1.0000, sum=1.0000, sum_squares=1.0000)))
(data=(chrom_id=chrY, from=1838300, to=1838400, statistics=(valid=1, min=0.0000, max=0.0000, sum=0.0000, sum_squares=0.0000)))
(data=(chrom_id=chrY, from=1838400, to=1838500, statistics=(valid=1, min=0.0000, max=0.0000, sum=0.0000, sum_squares=0.0000)))
(data=(chrom_id=chrY, from=1838500, to=1838600, statistics=(valid=1, min=0.0000, max=0.0000, sum=0.0000, sum_squares=0.0000)))
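Each record in the output above carries summary statistics (valid, min, max, sum, sum_squares). The sketch below shows how a mean and variance can be derived from such a summary, and how two summaries combine into a coarser bin. The `BinSummary` struct is hypothetical and only mirrors the field names printed above; it is not the type returned by the reader.

```rust
/// Summary statistics of one bin, mirroring the fields printed above.
/// Hypothetical struct for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct BinSummary {
    pub valid: u64,
    pub min: f64,
    pub max: f64,
    pub sum: f64,
    pub sum_squares: f64,
}

impl BinSummary {
    /// Mean over the valid positions, or None if the bin is empty.
    pub fn mean(&self) -> Option<f64> {
        if self.valid == 0 {
            None
        } else {
            Some(self.sum / self.valid as f64)
        }
    }

    /// Population variance: E[x^2] - E[x]^2, clamped at zero against rounding.
    pub fn variance(&self) -> Option<f64> {
        let mean = self.mean()?;
        Some((self.sum_squares / self.valid as f64 - mean * mean).max(0.0))
    }

    /// Combine two summaries into one covering both bins.
    pub fn merge(&self, other: &BinSummary) -> BinSummary {
        BinSummary {
            valid: self.valid + other.valid,
            min: self.min.min(other.min),
            max: self.max.max(other.max),
            sum: self.sum + other.sum,
            sum_squares: self.sum_squares + other.sum_squares,
        }
    }
}
```

Because every field is additive or a min/max, summaries can be merged without revisiting the raw data, which is what makes precomputed zoom levels cheap to query.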
BigWig utilities
# Apply a custom shared-library function across tracks
# For Rust plugins, export `rustynetics_bigwig_map` and select the Rust ABI
# Extract selected regions from a bigWig file as a table
# Keep only selected chromosomes
# Quantile-normalize one bigWig track against another
# Compute global statistics or a histogram
# Find regions where multiple tracks are jointly positive
# Preview chromosome renaming without modifying the file
bigwig-map loads a shared library and calls a mapping function for each bin.
The mapper receives the sequence name, the genomic position, and the values from
all input tracks for the current bin, and must return the output value for that bin.
For Rust, the recommended interface is the Rust ABI plugin mode:
- build the mapper as a cdylib
- export the symbol rustynetics_bigwig_map
- accept a pointer to rustynetics::bigwig_map_plugin::BigWigMapInput
bigwig-map defaults to --abi auto, which first looks for the Rust symbol
rustynetics_bigwig_map and then falls back to the legacy shared-library ABI
(F or bigwig_map). You can force one mode explicitly with --abi rust or
--abi legacy. --symbol overrides the default symbol name.
A minimal Rust mapper looks like this:
use rustynetics::bigwig_map_plugin::BigWigMapInput;
pub unsafe extern "C"
The BigWigMapInput helper provides:
- input.seqname() for the chromosome name
- input.position for the current genomic position
- input.values() for the per-track input values at that position
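The per-bin logic of a mapper can be developed and tested as an ordinary Rust function before it is wrapped in the plugin entry point. The sketch below is such a function; it deliberately does not use the plugin ABI. The name `log2_enrichment`, the pseudocount, the `chrM` filter, and the use of NaN for "no value" are all assumptions made for illustration.

```rust
/// Per-bin mapping logic: log2 ratio of the first track (treatment)
/// over the second track (control).
/// Hypothetical stand-alone function; it is not the plugin entry point.
pub fn log2_enrichment(seqname: &str, _position: usize, values: &[f64]) -> f64 {
    // A pseudocount avoids division by zero and log of zero (an assumption).
    const PSEUDOCOUNT: f64 = 1.0;
    // Example of using the sequence name: skip the mitochondrial chromosome.
    if seqname == "chrM" {
        return f64::NAN;
    }
    match values {
        [treatment, control, ..] => {
            ((treatment + PSEUDOCOUNT) / (control + PSEUDOCOUNT)).log2()
        }
        // Fewer than two input tracks: no ratio can be computed.
        _ => f64::NAN,
    }
}
```

Inside a real plugin, the three arguments would come from input.seqname(), input.position, and input.values() respectively.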
The plugin crate needs to be compiled as a shared library:
[lib]
crate-type = ["cdylib"]
Then build and run it like this:
On macOS the shared library is usually libmy_mapper.dylib, and on Windows
my_mapper.dll.
Motif XML extraction
# Extract MEME motifs as log-odds matrices
# Extract DREME motifs as PPMs in JASPAR format
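The difference between a PPM and a log-odds matrix can be sketched as follows. This is a generic illustration, not the conversion used by meme-extract: the function names, the column order A, C, G, T, the pseudocount handling, and the base-2 logarithm are assumptions.

```rust
/// Convert a position probability matrix (one row per motif position,
/// columns in the order A, C, G, T) into a log-odds matrix:
/// log2(p / background), with an optional pseudocount added to each entry.
/// Hypothetical helper for illustration only.
pub fn ppm_to_log_odds(ppm: &[[f64; 4]], background: [f64; 4], pseudo: f64) -> Vec<[f64; 4]> {
    ppm.iter()
        .map(|row| {
            // Renormalize so each row still sums to one after the pseudocount.
            let total: f64 = row.iter().sum::<f64>() + 4.0 * pseudo;
            let mut out = [0.0; 4];
            for i in 0..4 {
                out[i] = ((row[i] + pseudo) / total / background[i]).log2();
            }
            out
        })
        .collect()
}

/// Score one window by summing the log-odds entry of each base.
/// Returns None if the lengths differ or the window has a non-ACGT base.
pub fn score_window(pwm: &[[f64; 4]], window: &[u8]) -> Option<f64> {
    if window.len() != pwm.len() {
        return None;
    }
    let mut score = 0.0;
    for (row, &base) in pwm.iter().zip(window) {
        let idx = match base.to_ascii_uppercase() {
            b'A' => 0,
            b'C' => 1,
            b'G' => 2,
            b'T' => 3,
            _ => return None,
        };
        score += row[idx];
    }
    Some(score)
}
```

With a pseudocount of zero, probabilities of exactly 0 map to negative infinity, which is why a small positive pseudocount is commonly used in practice.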
Compute coverage tracks from BAM files
We download a BAM file from a ChIP-seq experiment in Homo sapiens A549 with FOXS1 as target (ENCFF504WRM) in addition to the control data (ENCFF739ECZ):
use bam_coverage;
let tracks_treatment = vec!;
let tracks_control = vec!;
// Set fragment length to 0, which means that fragments will not be extended.
// Setting this to None will trigger automatic fragment length estimation
let fraglen_treatment = vec!;
let fraglen_control = vec!;
let = bam_coverage.unwrap;
if let Err = track.export_bigwig
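To illustrate what fragment extension means for a coverage track, here is a minimal stand-alone sketch. It does not reproduce bam_coverage: the `Read` and `Strand` types and both functions are hypothetical, coordinates are assumed half-open, reads already longer than the fragment length are left unchanged, and each bin simply counts the fragments overlapping it.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Strand {
    Forward,
    Reverse,
}

/// An aligned read with half-open coordinates [start, end).
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy)]
pub struct Read {
    pub start: usize,
    pub end: usize,
    pub strand: Strand,
}

/// Extend a read to `fraglen` in its direction of sequencing.
/// A fragment length of 0 leaves the read unchanged.
pub fn extend_read(read: Read, fraglen: usize, chrom_len: usize) -> (usize, usize) {
    let read_len = read.end.saturating_sub(read.start);
    if fraglen <= read_len {
        return (read.start, read.end.min(chrom_len));
    }
    match read.strand {
        Strand::Forward => (read.start, (read.start + fraglen).min(chrom_len)),
        Strand::Reverse => (read.end.saturating_sub(fraglen), read.end.min(chrom_len)),
    }
}

/// Count, for every bin, the number of (extended) reads overlapping it.
pub fn binned_coverage(
    reads: &[Read],
    fraglen: usize,
    chrom_len: usize,
    bin_size: usize,
) -> Vec<u32> {
    assert!(bin_size > 0, "bin_size must be positive");
    let n_bins = (chrom_len + bin_size - 1) / bin_size;
    let mut track = vec![0u32; n_bins];
    for read in reads {
        let (start, end) = extend_read(*read, fraglen, chrom_len);
        if start >= end {
            continue; // Empty or out-of-range fragment.
        }
        for bin in (start / bin_size)..=((end - 1) / bin_size) {
            track[bin] += 1;
        }
    }
    track
}
```

With a forward read at [0, 50) and a reverse read at [250, 300) on a 300 bp sequence and 100 bp bins, a fragment length of 0 gives the track [1, 0, 1], while a fragment length of 150 extends both reads toward the middle and gives [1, 2, 1].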