Bio Files: Read and write common biology file formats
This library loads and saves data in common biology file formats. It operates on data structures that are specific to each file format; you will need to convert to and from the structures used by your application. The API docs and the examples below should be sufficient to get started.
Currently supported formats:
- Mol2 (Small molecules, e.g. ligands)
- SDF (Small molecules, e.g. ligands)
- Map (Electron density, e.g. from crystallography or cryo-EM)
- AB1 (Sequence tracing)
- DAT (Amber force field data for small molecules)
- FRCMOD (Amber force field patch data for small molecules)
Planned:
- PDBQT (Exists in Daedalus; needs to be decoupled)
- MTZ (Exists in Daedalus; needs to be decoupled)
- DNA (Exists in PlasCAD; needs to be decoupled)
- CIF structure formats (2fo-fc etc) (Exists in Daedalus; needs to be decoupled)
For GenBank, we recommend gb-io. For atom-coordinate mmCIF and PDB, we recommend pdbtbx. We do not plan to support these formats, because these high-quality libraries already exist.
Each module represents a file format, and most have a struct dedicated to operating on that format.
These structs have public fields, which you can explore
using the API docs or your IDE. They generally include three methods: new(),
save() and load(). new() accepts a &str for text files, and an R: Read + Seek for binary files. save() and load() accept a &Path.
The force field formats use format-specific names like load_dat and save_frcmod instead, as both formats share the same structs.
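As a rough illustration of the new() / save() / load() pattern described above, here is a minimal sketch. ExampleText and ExampleBinary are hypothetical stand-ins, not types from this library, and the toy parsing logic and the io::Result return types are assumptions. Consult the API docs for the real signatures.

```rust
use std::fs;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Hypothetical stand-in for a text-format struct (not part of this library).
#[derive(Debug, Clone, PartialEq)]
pub struct ExampleText {
    pub ident: String,
    pub lines: Vec<String>,
}

impl ExampleText {
    /// Parse from text already in memory. In this toy format, the first
    /// line is an identifier and the remaining lines are kept verbatim.
    pub fn new(text: &str) -> io::Result<Self> {
        let mut lines = text.lines().map(str::to_owned);
        let ident = lines
            .next()
            .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidData, "empty input"))?;
        Ok(Self { ident, lines: lines.collect() })
    }

    /// Read the file at `path` and parse it with `new()`.
    pub fn load(path: &Path) -> io::Result<Self> {
        Self::new(&fs::read_to_string(path)?)
    }

    /// Write the struct back out in the same toy text layout.
    pub fn save(&self, path: &Path) -> io::Result<()> {
        let mut out = self.ident.clone();
        for line in &self.lines {
            out.push('\n');
            out.push_str(line);
        }
        fs::write(path, out)
    }
}

/// Hypothetical stand-in for a binary-format struct (not part of this library).
#[derive(Debug, Clone, PartialEq)]
pub struct ExampleBinary {
    pub bytes: Vec<u8>,
}

impl ExampleBinary {
    /// Accept any seekable reader, e.g. a `File` or an in-memory `Cursor`.
    pub fn new<R: Read + Seek>(mut reader: R) -> io::Result<Self> {
        reader.seek(SeekFrom::Start(0))?;
        let mut bytes = Vec::new();
        reader.read_to_end(&mut bytes)?;
        Ok(Self { bytes })
    }
}
```

Taking a reader rather than a path in the binary new() means the same code can parse a file on disk or a buffer already held in memory.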
Example use:
/// A single endpoint to save a number of file types
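One way an application might build a single save endpoint is to choose the format from the file extension. The sketch below covers only that dispatch step. SaveFormat and format_from_path are hypothetical names, not part of this library, and the extension strings are assumptions based on common usage.

```rust
use std::path::Path;

/// Hypothetical tag for the formats an application might save.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SaveFormat {
    Mol2,
    Sdf,
    Map,
    Ab1,
}

/// Pick a format from the file extension, ignoring ASCII case.
/// Returns `None` for a missing or unrecognised extension.
pub fn format_from_path(path: &Path) -> Option<SaveFormat> {
    let ext = path.extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "mol2" => Some(SaveFormat::Mol2),
        "sdf" => Some(SaveFormat::Sdf),
        "map" => Some(SaveFormat::Map),
        "ab1" => Some(SaveFormat::Ab1),
        _ => None,
    }
}
```

The endpoint would then match on the returned variant, convert the application's own data into the corresponding format struct, and call that struct's save() with the same path.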
The Amber force field parameter struct has fields that each contain a Vec of one type of data (linear bond parameters,
angles between 3 atoms, dihedral angles, etc.). You may wish to parse these into a structure that has faster lookups for your application.
For example, you could use a HashMap keyed by the atom_names fields.
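A minimal sketch of such a lookup table follows. BondParam is a hypothetical stand-in for one bond record: only the atom_names field name comes from the text above, while its tuple type and the k and r_0 fields are assumptions made for illustration. The key is sorted so a lookup works regardless of atom order.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one linear bond parameter record.
#[derive(Debug, Clone, PartialEq)]
pub struct BondParam {
    pub atom_names: (String, String),
    pub k: f32,   // force constant (assumed field)
    pub r_0: f32, // equilibrium bond length (assumed field)
}

/// Build an order-independent key: the two names in sorted order.
fn bond_key(a: &str, b: &str) -> (String, String) {
    if a <= b {
        (a.to_owned(), b.to_owned())
    } else {
        (b.to_owned(), a.to_owned())
    }
}

/// Convert the Vec-style storage into a HashMap for constant-time lookups.
pub fn index_bonds(params: &[BondParam]) -> HashMap<(String, String), BondParam> {
    params
        .iter()
        .map(|p| (bond_key(&p.atom_names.0, &p.atom_names.1), p.clone()))
        .collect()
}

/// Look up the parameters for a bond between atom types `a` and `b`.
pub fn lookup_bond<'a>(
    index: &'a HashMap<(String, String), BondParam>,
    a: &str,
    b: &str,
) -> Option<&'a BondParam> {
    index.get(&bond_key(a, b))
}
```

Build the index once after loading, then call lookup_bond for each bond in your molecule. The same approach extends to angles and dihedrals with 3- and 4-name keys, though dihedral files may contain wildcard entries that need separate handling.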
Note that the above examples expect your application to have a struct representing the molecule that implements
From<Mol2> and provides a to_mol2(&self) method (and the equivalents for other formats). The details of these depend on the application. For example:
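The sketch below shows the shape of such conversions. Both structs are hypothetical: the Mol2 defined here is a simplified stand-in so the snippet compiles on its own, and the real library struct has different fields. Molecule represents an application-side type.

```rust
/// Simplified stand-in for the library's Mol2 struct (real fields differ).
#[derive(Debug, Clone, PartialEq)]
pub struct Mol2 {
    pub ident: String,
    /// (element symbol, position) per atom.
    pub atoms: Vec<(String, [f64; 3])>,
}

/// Hypothetical application-side molecule, stored as parallel arrays.
#[derive(Debug, Clone, PartialEq)]
pub struct Molecule {
    pub name: String,
    pub elements: Vec<String>,
    pub positions: Vec<[f64; 3]>,
}

impl From<Mol2> for Molecule {
    /// Consume the file-format struct and split atoms into parallel arrays.
    fn from(m: Mol2) -> Self {
        let (elements, positions): (Vec<String>, Vec<[f64; 3]>) =
            m.atoms.into_iter().unzip();
        Self { name: m.ident, elements, positions }
    }
}

impl Molecule {
    /// Build the file-format struct from the application's representation.
    pub fn to_mol2(&self) -> Mol2 {
        Mol2 {
            ident: self.name.clone(),
            atoms: self
                .elements
                .iter()
                .cloned()
                .zip(self.positions.iter().copied())
                .collect(),
        }
    }
}
```

With conversions like these in place, loading becomes a load() followed by into(), and saving becomes to_mol2() followed by save().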