# 📦🧬 `nafcodec` [Stars](https://github.com/althonos/nafcodec/stargazers)
*Rust coder/decoder for [Nucleotide Archive Format (NAF)](https://github.com/KirillKryukov/naf) files*.
[Actions](https://github.com/althonos/nafcodec/actions)
[Coverage](https://app.codecov.io/gh/althonos/nafcodec)
[License](https://choosealicense.com/licenses/mit/)
[Docs](https://nafcodec.readthedocs.io)
[Crate](https://crates.io/crates/nafcodec-py)
[PyPI](https://pypi.org/project/nafcodec)
[PyPI files](https://pypi.org/project/nafcodec/#files)
[Bioconda](https://anaconda.org/bioconda/nafcodec)
[Source](https://github.com/althonos/nafcodec/tree/main/nafcodec-py)
[Mirror](https://git.embl.de/larralde/nafcodec/)
[Issues](https://github.com/althonos/nafcodec/issues)
[Changelog](https://github.com/althonos/nafcodec/blob/master/CHANGELOG.md)
[Downloads](https://pepy.tech/project/nafcodec)
## 🗺️ Overview
[Nucleotide Archive Format](https://github.com/KirillKryukov/naf) is a file
format proposed by Kryukov *et al.*[\[1\]](#ref1) in 2019 for storing
compressed nucleotide or protein sequences, combining 4-bit encoding with
[Zstandard](https://github.com/facebook/zstd) compression. NAF files can
be compressed and decompressed using the
[original C implementation](https://kirill-kryukov.com/study/naf).
This library provides [PyO3](https://pyo3.rs) bindings to the `nafcodec` crate,
a Rust implementation of a NAF decoder using [`nom`](https://crates.io/crates/nom)
for parsing the binary format, and [`zstd`](https://crates.io/crates/zstd) for
handling Zstandard decompression. It provides a complete API that allows
iterating over the contents of a NAF file.
*This is the Python version; a [Rust crate](https://crates.io/crates/nafcodec) is available as well.*
### Features
- **streaming decoder**: The decoder is implemented using different readers,
each accessing a region of the compressed file, which allows streaming records
without having to decode full blocks.
- **file-like decoding**: The decoder can read from a file-like object
instead of expecting a path.
The following features are planned:
- **optional decoding**: Allow the decoder to skip the decoding of certain
fields, such as ignoring quality strings when they are not needed.
- **encoder**: Implement an encoder as well, using either in-memory buffers
or temporary files to grow the archive.
### Usage
Use a `nafcodec.Decoder` to iterate over the contents of a Nucleotide Archive
Format file, reading from the given [path-like](https://docs.python.org/3/glossary.html#term-path-like-object)
or [file-like](https://docs.python.org/3/glossary.html#term-file-object) object:
```python
import nafcodec
decoder = nafcodec.Decoder("../data/LuxC.naf")
for record in decoder:
    print(record.id)
```
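Reading from a file-like object follows the same pattern. The sketch below is illustrative rather than part of the library: `record_ids` and its `decoder_factory` parameter are hypothetical names, and it assumes the file-like object is opened in binary mode.

```python
def record_ids(fileobj, decoder_factory=None):
    """Return the identifiers of all records readable from `fileobj`.

    `fileobj` is assumed to be a binary file-like object, such as a file
    opened with mode "rb" or an `io.BytesIO` holding a NAF archive.
    """
    if decoder_factory is None:
        # Imported lazily so the helper can be tried with a stand-in decoder.
        import nafcodec
        decoder_factory = nafcodec.Decoder
    return [record.id for record in decoder_factory(fileobj)]


# Typical use with the real decoder (path taken from the example above):
# with open("../data/LuxC.naf", "rb") as handle:
#     print(record_ids(handle))
```

The `decoder_factory` argument exists only so the helper can be exercised without an archive at hand; in normal use it is left unset and `nafcodec.Decoder` is called on the file-like object directly.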
All fields of the obtained `Record` are optional: which ones are set depends
on the kind of data that was compressed.
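Because any field may be missing, code that consumes records should check for `None` before using a value. A minimal sketch, assuming the fields are exposed as attributes named `id`, `comment`, `sequence` and `quality` (only `id` appears in the example above; the other names are assumptions to check against the API reference), and using a stand-in object rather than a real decoded record:

```python
from types import SimpleNamespace

# Attribute names assumed for illustration; only `id` is confirmed by
# the usage example above.
OPTIONAL_FIELDS = ("id", "comment", "sequence", "quality")


def present_fields(record):
    """Return a dict of the optional fields that are set on `record`."""
    found = {}
    for name in OPTIONAL_FIELDS:
        # getattr with a default tolerates both a missing attribute
        # and an attribute explicitly set to None.
        value = getattr(record, name, None)
        if value is not None:
            found[name] = value
    return found


# Stand-in for a record from an archive that stores only identifiers
# and sequences (no comments, no quality strings).
example = SimpleNamespace(id="seq1", comment=None, sequence="ATGC", quality=None)
print(present_fields(example))  # {'id': 'seq1', 'sequence': 'ATGC'}
```

The same helper could be applied to each record yielded by a `nafcodec.Decoder` to see which fields a given archive actually carries.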
## Feedback
### ⚠️ Issue Tracker
Found a bug? Have an enhancement request? Head over to the [GitHub issue
tracker](https://github.com/althonos/nafcodec/issues) if you need to report
or ask something. If you are filing a bug report, please include as much
information as you can about the issue, and try to recreate the same bug
in a simple, easily reproducible situation.
## Changelog
This project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.html)
and provides a [changelog](https://github.com/althonos/nafcodec/blob/master/CHANGELOG.md)
in the [Keep a Changelog](http://keepachangelog.com/en/1.0.0/) format.
## License
This library is provided under the open-source
[MIT license](https://choosealicense.com/licenses/mit/). The
[NAF specification](https://github.com/KirillKryukov/naf/blob/master/NAFv2.pdf)
is in the public domain.
*This project is in no way affiliated with, sponsored, or otherwise endorsed
by the [original NAF authors](https://github.com/KirillKryukov). It was
developed by [Martin Larralde](https://github.com/althonos/) during his PhD
project at the [European Molecular Biology Laboratory](https://www.embl.de/)
in the [Zeller team](https://github.com/zellerlab).*
## References
- <a id="ref1">\[1\]</a> Kirill Kryukov, Mahoko Takahashi Ueda, So Nakagawa, Tadashi Imanishi. "Nucleotide Archival Format (NAF) enables efficient lossless reference-free compression of DNA sequences". Bioinformatics, Volume 35, Issue 19, October 2019, Pages 3826–3828. [doi:10.1093/bioinformatics/btz144](https://doi.org/10.1093/bioinformatics/btz144)