pub fn split_fa(args: &Args) -> Result<()>
Splits a non-gzipped FASTA file into multiple smaller FASTA files.
This function reads a FASTA file, identifies the start positions of all records
using memchr_iter to find FA_NEEDLE (typically >). It then divides the
file’s content (memory-mapped for efficiency) into chunks based on the
specified SplitMode (either ChunkSize or NumFiles). Each chunk is then
written to a new output file within the designated output directory, utilizing
a Rayon thread pool for parallel processing.
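The scan-then-chunk step described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the crate's implementation: the helper names record_starts and chunk_ranges are hypothetical, a plain iterator scan stands in for memchr_iter, and a > is only counted when it begins a line (a stricter rule than a bare byte search, chosen so that a > inside a header is not mistaken for a new record).

```rust
use std::ops::Range;

/// Byte offsets of every record start: a `>` at offset 0 or right after a newline.
/// (Std-only stand-in for a memchr-based search; hypothetical helper.)
fn record_starts(data: &[u8]) -> Vec<usize> {
    data.iter()
        .enumerate()
        .filter(|&(i, &b)| b == b'>' && (i == 0 || data[i - 1] == b'\n'))
        .map(|(i, _)| i)
        .collect()
}

/// Group consecutive record starts into byte ranges holding `per_chunk` records each.
/// `len` is the total length of the input; the last range runs to the end of the data.
fn chunk_ranges(starts: &[usize], len: usize, per_chunk: usize) -> Vec<Range<usize>> {
    // `chunks(0)` would panic, so clamp to at least one record per chunk.
    let per_chunk = per_chunk.max(1);
    starts
        .chunks(per_chunk)
        .enumerate()
        .map(|(i, group)| {
            // A chunk ends where the first record of the next chunk begins.
            let end = starts.get((i + 1) * per_chunk).copied().unwrap_or(len);
            group[0]..end
        })
        .collect()
}
```

Because each range is a slice of the same underlying buffer, the chunks can be handed to writers without copying any sequence data. For a fixed number of output files, the records-per-chunk value would presumably be derived by ceiling division of the record count.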
§Arguments
args - A reference to an Args struct containing the input file path, output directory, number of threads, splitting mode, and an optional suffix.
§Returns
Result<()> - Ok(()) on successful completion, or an anyhow::Error if any operation (file opening, memory mapping, directory creation, writing to files, or thread pool building) fails.
§Errors
- Returns an error if the input FASTA file cannot be opened or memory-mapped.
- Returns an error if no FASTA records are found in the input file.
- Returns an error if the output directory cannot be created.
- Returns an error if SplitMode::NumFiles is 0.
- Returns any std::io::Error during file writing.
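The input-validation cases in the list above might look roughly like the following. This is a sketch only: the SplitMode enum here is a hypothetical stand-in for the crate's type, the function name records_per_chunk is invented for illustration, and String is used as the error type in place of anyhow::Error to keep the snippet dependency-free. Rejecting a chunk size of 0 is an extra assumption; the list above does not mention that case.

```rust
/// Hypothetical stand-in for the crate's `SplitMode`.
#[derive(Debug, Clone, Copy)]
enum SplitMode {
    /// Number of records per output file.
    ChunkSize(usize),
    /// Number of output files to produce.
    NumFiles(usize),
}

/// Validate the inputs and return how many records each output file should hold.
fn records_per_chunk(total_records: usize, mode: SplitMode) -> Result<usize, String> {
    if total_records == 0 {
        return Err("no FASTA records found in input".to_string());
    }
    match mode {
        SplitMode::NumFiles(0) => Err("number of files must be greater than 0".to_string()),
        // Ceiling division so that no more than `n` files are produced.
        SplitMode::NumFiles(n) => Ok((total_records + n - 1) / n),
        // Assumption: a zero chunk size is treated as an error in this sketch.
        SplitMode::ChunkSize(0) => Err("chunk size must be greater than 0".to_string()),
        SplitMode::ChunkSize(n) => Ok(n),
    }
}
```

Checking these conditions before any output is created means a bad argument fails fast, without leaving a partially filled output directory behind.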
§Parallelism
This function uses rayon for parallel processing of chunks, improving
performance for large files. The number of threads is configured via args.threads.
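To illustrate the idea of writing chunks concurrently, here is a minimal sketch that uses std::thread::scope in place of a Rayon thread pool, so that it compiles without external crates. It is not the crate's code: the function name write_chunks, the round-robin work distribution, and the chunk_<index>.fa file naming are all assumptions made for this example.

```rust
use std::fs;
use std::io;
use std::ops::Range;
use std::path::{Path, PathBuf};
use std::thread;

/// Write each byte range of `data` to its own file using up to `threads` scoped workers.
/// Returns the output paths in chunk order.
fn write_chunks(
    data: &[u8],
    ranges: &[Range<usize>],
    outdir: &Path,
    threads: usize,
) -> io::Result<Vec<PathBuf>> {
    fs::create_dir_all(outdir)?;
    let workers = threads.max(1);
    // Hypothetical naming scheme: chunk_<index>.fa
    let paths: Vec<PathBuf> = (0..ranges.len())
        .map(|i| outdir.join(format!("chunk_{i}.fa")))
        .collect();

    thread::scope(|scope| {
        let handles: Vec<_> = (0..workers)
            .map(|w| {
                let paths = &paths;
                scope.spawn(move || -> io::Result<()> {
                    // Worker `w` handles chunks w, w + workers, w + 2 * workers, ...
                    for i in (w..ranges.len()).step_by(workers) {
                        fs::write(&paths[i], &data[ranges[i].clone()])?;
                    }
                    Ok(())
                })
            })
            .collect();
        // Surface the first I/O error reported by any worker.
        for handle in handles {
            handle.join().expect("worker thread panicked")?;
        }
        Ok::<(), io::Error>(())
    })?;
    Ok(paths)
}
```

Scoped threads can borrow the input buffer directly, which mirrors how a memory-mapped file can be shared read-only across workers. A work-stealing pool such as Rayon's would balance uneven chunk sizes better than the fixed round-robin split used here.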
§Example
use anyhow::Result;
use std::path::PathBuf;

// Assuming Args and SplitMode are defined as in lib_iso_split example
fn main() -> Result<()> {
    let args = cli::Args {
        file: PathBuf::from("input.fa"),
        outdir: PathBuf::from("fa_chunks"),
        threads: 4,
        suffix: Some("part".to_string()),
        mode_chunk_size: Some(100), // Split into chunks of 100 records
        mode_num_files: None,
    };
    // split_fa(&args)?;
    println!("Successfully split FASTA file.");
    Ok(())
}