Crate entab

entab is a library to parse different “record-formatted” file formats into tabular form.

entab provides two ways to parse each file format it supports. If you know the type of the file you'll be reading, you generally want the parser specific to that file type, which returns records of a concrete type. For example, to parse the IDs out of a FASTA file you might do the following:

use std::fs::File;
use entab::parsers::fasta::{FastaReader, FastaRecord};

// Open the FASTA file and wrap it in the format-specific reader.
let file = File::open("./tests/data/sequence.fasta")?;
let mut reader = FastaReader::new(file, None)?;
// Destructure each record to pull out only its ID.
while let Some(FastaRecord { id, .. }) = reader.next()? {
    println!("{}", id);
}

Alternatively, you may not know the file type when writing your code and want to abstract over as many types as possible. This is the use case for the slower, generic parser framework (used, for example, in the bindings for different languages). The framework can optionally take a parser_name to force it to use that specific parser, and optional params to control parser options.

use std::fs::File;
use entab::readers::get_reader;

let file = File::open("./tests/data/sequence.fasta")?;
// No parser name or params are passed, so the framework chooses the parser itself.
let (mut reader, _) = get_reader(file, None, None)?;
// Generic records are indexed by position; print the first field of each.
while let Some(record) = reader.next_record()? {
    println!("{:?}", record[0]);
}
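
To make the role of the optional parser_name concrete, here is a minimal, self-contained sketch of the "use the forced parser if given, otherwise infer" decision. It is not entab's implementation: the `Format` enum, `sniff`, and `pick_format` are hypothetical names, and the only format knowledge assumed is that FASTA records start with `>` and FASTQ records start with `@`.

```rust
/// Hypothetical set of formats, for illustration only.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Format {
    Fasta,
    Fastq,
}

/// Guess a format from the first byte of the data.
/// FASTA records start with '>' and FASTQ records start with '@'.
fn sniff(data: &[u8]) -> Option<Format> {
    match data.first().copied() {
        Some(b'>') => Some(Format::Fasta),
        Some(b'@') => Some(Format::Fastq),
        _ => None,
    }
}

/// Use the caller's parser name when one is given; otherwise fall back to sniffing.
/// The accepted names here are assumptions made for this sketch.
fn pick_format(data: &[u8], parser_name: Option<&str>) -> Result<Format, String> {
    match parser_name {
        Some("fasta") => Ok(Format::Fasta),
        Some("fastq") => Ok(Format::Fastq),
        Some(other) => Err(format!("unknown parser: {}", other)),
        None => sniff(data).ok_or_else(|| "could not infer format".to_string()),
    }
}
```

Calling `pick_format(data, None)` infers the format from the data, while `pick_format(data, Some("fasta"))` skips inference entirely. Forcing a parser this way is useful when the content is ambiguous or too short to sniff reliably.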

Re-exports

pub use error::EtError;

Modules

The buffer interface that underlies the file readers

Generic file decompression

Miscellaneous utility functions and error handling

File format inference

Lightweight parsers to read records out of buffers

Parsers for specific file formats

Record and abstract record reading

Macros

Generates a ...Reader struct for the associated state-based file parsers along with the matching RecordReader for that struct.

Autogenerates the conversion from a struct into the matching Vec of headers and the corresponding Vec of Values to allow decomposing these raw structs into a common Record system that allows abstracting over different file formats.
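
The idea behind this conversion can be sketched with a small declarative macro. This is an illustration only, not the crate's macro: `Value`, `ToRow`, `impl_to_row!`, and `SeqRecord` are hypothetical names invented for the example, and the real types and generated code may differ.

```rust
/// Hypothetical dynamically typed cell shared by all file formats.
#[derive(Debug, PartialEq)]
enum Value {
    Text(String),
    Integer(i64),
}

impl From<String> for Value {
    fn from(s: String) -> Self {
        Value::Text(s)
    }
}

impl From<i64> for Value {
    fn from(n: i64) -> Self {
        Value::Integer(n)
    }
}

/// Hypothetical trait that splits a typed record into headers and values.
trait ToRow {
    fn headers() -> Vec<&'static str>;
    fn values(self) -> Vec<Value>;
}

/// Generate a `ToRow` impl from a struct name and its field names.
/// Each listed field must have a type that converts into `Value`.
macro_rules! impl_to_row {
    ($name:ident: $($field:ident),+) => {
        impl ToRow for $name {
            fn headers() -> Vec<&'static str> {
                vec![$(stringify!($field)),+]
            }
            fn values(self) -> Vec<Value> {
                vec![$(self.$field.into()),+]
            }
        }
    };
}

/// Hypothetical typed record, as a format-specific parser might return.
struct SeqRecord {
    id: String,
    length: i64,
}

impl_to_row!(SeqRecord: id, length);
```

With this in place, `SeqRecord::headers()` yields the field names and `record.values()` yields the matching cells in the same order, so code that only understands headers and values can handle any record type that has such an impl.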