# TACO (Trajectory and Compressed Observables) Format
TACO is a high-performance binary format for molecular dynamics (MD) trajectory data, designed for efficient storage and processing of large simulation trajectories.
## Features
- **Delta Encoding**: Stores differences between consecutive frames to leverage temporal coherence
- **Hybrid Compression**: Configurable lossy (half-precision) or lossless compression for positions, velocities, and forces
- **Random Access**: Fast direct access to arbitrary frames without scanning the entire file
- **Metadata Support**: Rich metadata for both simulation parameters and atom properties
- **Efficient Processing**: Optimized algorithms for reading and writing frames
- **Multi-Language Support**: Native Rust API plus interfaces for Python, C, C++, and Fortran
- **Cross-Platform**: Works on Linux, macOS, and Windows
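
To make the delta-encoding and random-access features more concrete, here is a minimal NumPy sketch of the general idea: store a full frame every `full_frame_interval` frames and only differences in between, then rebuild a frame from the nearest preceding full frame. The `encode` and `decode_frame` helpers are hypothetical and written for illustration; they are not part of the `taco_format` API and do not reflect TACO's actual on-disk encoding.

```python
import numpy as np

def encode(frames, full_frame_interval=50):
    """Return a list of ("full" | "delta", array) records (illustrative only)."""
    records = []
    for i, frame in enumerate(frames):
        if i % full_frame_interval == 0:
            # Periodic full frame: a restart point for decoding
            records.append(("full", frame.copy()))
        else:
            # Delta frame: difference to the previous frame
            records.append(("delta", frame - frames[i - 1]))
    return records

def decode_frame(records, index, full_frame_interval=50):
    """Rebuild one frame, starting at the nearest preceding full frame."""
    start = (index // full_frame_interval) * full_frame_interval
    _, data = records[start]
    frame = data.copy()
    for i in range(start + 1, index + 1):
        frame += records[i][1]  # apply deltas in order
    return frame

rng = np.random.default_rng(0)
# 120 frames of 5 atoms performing a small random walk
steps = rng.normal(scale=0.01, size=(120, 5, 3))
frames = np.cumsum(steps, axis=0).astype(np.float32)

records = encode(frames, full_frame_interval=50)
restored = decode_frame(records, 73, full_frame_interval=50)
```

In this sketch, decoding frame 73 touches only records 50 through 73, which is why a smaller full-frame interval trades file size for faster random access.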
## Performance
TACO is designed to reduce file size and I/O cost compared to traditional trajectory formats:
- **Storage Efficiency**: Typically 3-5x smaller than ASE trajectory files
- **Fast Reading**: Efficient batch loading of frames for analysis tasks
- **Fast Writing**: Streamlined frame processing and compression
- **Memory Efficient**: Processes frames in batches to minimize memory usage
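
The storage figure above depends on the system and the compression settings, so it can be worth measuring it on your own data. The sketch below uses only the Python standard library; `size_ratio` is a hypothetical helper, not part of `taco_format`, and the file names in the usage comment are placeholders.

```python
import os

def size_ratio(reference_path, taco_path):
    """Return how many times larger the reference file is than the TACO file."""
    reference_size = os.path.getsize(reference_path)
    taco_size = os.path.getsize(taco_path)
    if taco_size == 0:
        raise ValueError("TACO file is empty")
    return reference_size / taco_size

# Example call (placeholder file names):
# print(size_ratio("traj.traj", "traj.taco"))
```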
## Python Interface
TACO provides a Python interface for easy integration with analysis tools.
### Installation
```bash
pip install taco-format
```
### Basic Usage
```python
import taco_format
import numpy as np
from ase import Atoms
# Create some example Atoms objects
atoms_list = [Atoms('H2O') for _ in range(100)]
# Writing
taco_format.write('traj.taco', atoms_list)
# Reading
atoms_list = taco_format.read('traj.taco')
```
### Appending to Existing Trajectories
TACO supports efficiently appending frames to existing trajectory files without rewriting the entire file:
```python
import taco_format
from ase import Atoms
# Create initial trajectory
initial_frames = [Atoms('H2O') for _ in range(100)]
taco_format.write('traj.taco', initial_frames)
# Later, append more frames to the same file
additional_frames = [Atoms('H2O') for _ in range(50)]
taco_format.append('traj.taco', additional_frames)
# The file now contains 150 frames total
all_frames = taco_format.read('traj.taco')
print(len(all_frames)) # Output: 150
```
**Benefits of Append:**
- **Efficient**: Only writes new frame data, doesn't rewrite existing frames
- **Maintains Compression**: Delta encoding chain is preserved across appends
- **Preserves Metadata**: All simulation and atom metadata remains intact
- **Multiple Appends**: Can append to the same file multiple times
- **Random Access**: Full random access to all frames after appending
**Rust API:**
```rust
use taco_format::{Writer, Frame, FrameData};
use ndarray::Array2;
// Create initial trajectory
// (num_atoms, time_step, the metadata values, and compression_settings are defined by the caller)
let mut writer = Writer::create(
"trajectory.taco",
num_atoms,
time_step,
sim_metadata,
atom_metadata,
compression_settings,
)?;
// Write initial frames
for i in 0..100 {
let frame = create_frame(i); // Your frame creation logic
writer.write_frame(frame)?;
}
writer.finish()?;
// Later, append more frames
let mut writer = Writer::append("trajectory.taco")?;
for i in 100..150 {
let frame = create_frame(i);
writer.write_frame(frame)?;
}
writer.finish()?;
```
### Advanced Usage
```python
import taco_format
# `atoms_list` is a list of ase.Atoms objects, as in Basic Usage
# Write with custom settings
taco_format.write('traj.taco', atoms_list,
time_step=0.002, # in picoseconds
full_frame_interval=50, # store full frame every 50 frames
compression_level=5, # zstd compression level (1-22)
lossless=False) # use lossy compression
# Read specific frames
frames = taco_format.read('traj.taco',
frame_indices=[0, 10, 20, 30, 40])
# Read a range of frames
frames = taco_format.read('traj.taco',
start_frame=100,
end_frame=200) # Reads frames 100-199
# Efficient writing for large trajectories
# (`big_atoms_list` stands for any large list of ase.Atoms objects)
taco_format.write('big_traj.taco', big_atoms_list,
compression_level=3) # Use moderate compression
```
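
Range reads make it possible to analyse a long trajectory in batches instead of loading every frame at once. The sketch below shows one way to generate the batch boundaries; `batch_ranges` is a hypothetical helper, and it assumes the half-open `start_frame`/`end_frame` convention shown above (frames 100-199 for `start_frame=100, end_frame=200`).

```python
def batch_ranges(num_frames, batch_size):
    """Yield half-open (start_frame, end_frame) pairs covering all frames."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, num_frames, batch_size):
        yield start, min(start + batch_size, num_frames)

# Usage (commented out because it needs an existing TACO file):
# import taco_format
# for start, end in batch_ranges(150, 64):
#     frames = taco_format.read('traj.taco', start_frame=start, end_frame=end)
#     ...  # analyse this batch before loading the next one
```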
### Tensor Operations
TACO provides built-in tensor operations for common trajectory analyses:
```python
import taco_format
import numpy as np
# Calculate center of mass
positions = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], dtype=np.float32)
masses = np.array([[1.0], [12.0], [16.0]], dtype=np.float32)
com = taco_format.center_of_mass(positions, masses)
# Extract subset of atoms
indices = [0, 2] # Atoms to extract (positions holds only 3 atoms)
subset = taco_format.extract_subset(positions, indices)
# Calculate RMSD between two coordinate sets
coords1 = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], dtype=np.float32)
coords2 = np.array([[0.1, 0.1, 0.1], [1.1, 0.1, 0.1]], dtype=np.float32)
rmsd = taco_format.calc_rmsd(coords1, coords2)
```
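
For cross-checking results, the conventional definitions of these quantities can be written in plain NumPy. This is a reference sketch under assumptions: the `_ref` functions are hypothetical names, the RMSD here is computed without any structural alignment, and whether `taco_format.calc_rmsd` aligns structures first is not stated in this document.

```python
import numpy as np

def center_of_mass_ref(positions, masses):
    """Mass-weighted mean position; masses may have shape (N,) or (N, 1)."""
    positions = np.asarray(positions, dtype=np.float64)
    masses = np.asarray(masses, dtype=np.float64).reshape(-1)
    return (masses[:, None] * positions).sum(axis=0) / masses.sum()

def rmsd_ref(coords1, coords2):
    """Root-mean-square deviation over atoms, without alignment."""
    diff = np.asarray(coords1, dtype=np.float64) - np.asarray(coords2, dtype=np.float64)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))

positions = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], dtype=np.float32)
masses = np.array([[1.0], [12.0], [16.0]], dtype=np.float32)
com = center_of_mass_ref(positions, masses)  # approximately (12/29, 16/29, 0)

coords1 = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]], dtype=np.float32)
coords2 = np.array([[0.1, 0.1, 0.1], [1.1, 0.1, 0.1]], dtype=np.float32)
rmsd = rmsd_ref(coords1, coords2)  # approximately sqrt(0.03)
```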
### Utility Functions
The Python interface also includes utility functions for working with TACO files:
```python
import taco_format
# Check if file is a TACO file
is_taco = taco_format.is_taco_file("trajectory.taco")
# Get file information
info = taco_format.get_file_info("trajectory.taco")
print(info)
# Copy frames from one file to another
taco_format.copy_frames("source.taco", "subset.taco",
start_frame=10, num_frames=50)
# Extract specific atoms (written to a separate file so the copy above is not overwritten)
atom_indices = [0, 1, 2, 10, 15, 20] # Atoms to extract
taco_format.extract_atoms("full.taco", "atoms_subset.taco", atom_indices)
```
## C, C++, and Fortran Interfaces
TACO provides native interfaces for C, C++, and Fortran, enabling integration with existing molecular dynamics codes and high-performance computing applications.
All interfaces are located in the `c_api/` directory with a single, canonical implementation.
### C API
The C API provides a low-level interface suitable for integration with C programs and as a foundation for other language bindings.
```c
#include "taco_format_c.h"
// Setup metadata
const char* names[] = {"O", "H", "H"};
const char* types[] = {"O", "H", "H"};
float masses[] = {15.999, 1.008, 1.008};
taco_atom_metadata_t atom_metadata = {
.masses = masses, .names = names, .types = types, .num_atoms = 3
};
taco_simulation_metadata_t sim_metadata = {
.name = NULL, .description = NULL, .ensemble = NULL,
.temperature = 300.0, .pressure = 1.0, .software = NULL, .timestep_fs = 1.0
};
taco_compression_settings_t compression = {.precision = 0, .zstd_level = 3};
// Create writer
CTacoWriter* writer = taco_writer_create("output.taco", 3, 0.001,
&sim_metadata, &atom_metadata,
compression);
// Write frame
float positions[] = {0.0, 0.0, 0.0, 0.1, 0.08, 0.0, 0.1, -0.08, 0.0};
taco_frame_t frame = {
.frame_number = 0, .time = 0.0, .positions = positions,
.num_atoms = 3, /* other fields... */
};
taco_writer_write_frame(writer, &frame);
taco_writer_finish(writer);
```
### C++ API
The C++ API provides a modern interface with RAII, STL containers, and exception handling.
```cpp
#include <string>
#include <vector>
#include "taco_format.hpp"
// Setup metadata
std::vector<float> masses = {15.999f, 1.008f, 1.008f};
std::vector<std::string> names = {"O", "H", "H"};
taco::AtomMetadata atom_metadata(masses, names);
taco::SimulationMetadata sim_metadata;
sim_metadata.ensemble = "NVT";
sim_metadata.temperature = 300.0;
// Create writer in its own scope (RAII: the file is closed when `writer` is destroyed)
{
    taco::Writer writer("output.taco", 3, 0.001, sim_metadata, atom_metadata);
    // Create and write frame
    taco::Frame frame;
    frame.positions = {{0.0f, 0.0f, 0.0f}, {0.1f, 0.08f, 0.0f}, {0.1f, -0.08f, 0.0f}};
    writer.write_frame(frame);
}
// Read all frames once the writer has gone out of scope
taco::Reader reader("output.taco");
auto all_frames = reader.read_all_frames();
```
### Fortran API
The Fortran API provides a modern Fortran 2008 interface with ISO C binding.
```fortran
program taco_example
use iso_c_binding
use iso_fortran_env, only: real32, real64, int64
use taco_format
implicit none
type(c_ptr) :: writer, reader
type(taco_frame_t) :: frame
type(taco_compression_settings_t) :: compression
type(taco_atom_metadata_t) :: atom_meta
type(taco_simulation_metadata_t) :: sim_meta
real(real32), target :: masses(3) = [15.999, 1.008, 1.008]
real(real32), target :: positions(9) = [0.0, 0.0, 0.0, 0.1, 0.08, 0.0, 0.1, -0.08, 0.0]
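! C expects NUL-terminated strings, so terminate each name explicitly
character(len=8), target :: names(3) = ['O'//c_null_char, 'H'//c_null_char, 'H'//c_null_char]
character(len=8), target :: types(3) = ['O'//c_null_char, 'H'//c_null_char, 'H'//c_null_char]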
type(c_ptr), target :: name_ptrs(3), type_ptrs(3)
integer :: error_code, i
! Setup pointers for strings
do i = 1, 3
name_ptrs(i) = c_loc(names(i))
type_ptrs(i) = c_loc(types(i))
end do
! Setup metadata
sim_meta%temperature = 300.0_c_double
sim_meta%pressure = 1.0_c_double
sim_meta%timestep_fs = 1.0_c_double
! Set other fields to null
atom_meta%masses = c_loc(masses)
atom_meta%names = c_loc(name_ptrs)
atom_meta%types = c_loc(type_ptrs)
atom_meta%num_atoms = 3
compression%precision = 0 ! lossless
compression%zstd_level = 3
! Create writer
writer = taco_writer_create('output.taco', 3, 0.001_real64, &
sim_meta, atom_meta, compression)
! Setup frame
frame%frame_number = 0
frame%time = 0.0_c_double
frame%positions = c_loc(positions)
frame%num_atoms = 3
! Set other fields...
! Write frame
error_code = taco_writer_write_frame(writer, frame)
error_code = taco_writer_finish(writer)
end program
```
### Building with C/C++/Fortran Support
```bash
# Build the C API library
cd c_api
cargo build --release
# Build and test C examples
make test_c_api_static
./test_c_api_static
# Build and run Fortran examples and tests
cd fortran
make all # Build interface, examples, and tests
make run-examples # Run all examples
make run-tests # Run all tests
```
The C API is located in `c_api/` with:
- `taco_format_c.h` - C header file
- `src/lib.rs` - Rust implementation with C FFI
- `test_c_api.c` - C example/test
The Fortran API is located in `c_api/fortran/` with:
- `taco_format.f90` - Fortran interface module
- `examples/` - Comprehensive Fortran examples
- `tests/` - Fortran unit and integration tests
See [C/C++/Fortran API Documentation](docs/c_cpp_fortran_api.md) for complete details.
## Usage in Rust
### Writing Trajectories
```rust
use taco_format::{Writer, Frame, FrameData, SimulationMetadata, AtomMetadata, CompressionSettings};
use ndarray::Array2;
// Create metadata
let sim_metadata = SimulationMetadata::default();
let atom_metadata = AtomMetadata::default();
// Create a writer
let mut writer = Writer::create(
"trajectory.taco", // File path
1000, // Number of atoms
0.001, // Time step (ps)
sim_metadata,
atom_metadata,
CompressionSettings::default(),
)?;
// Write frames
let positions = Array2::<f32>::zeros((1000, 3));
let frame_data = FrameData::new(positions);
let frame = Frame::new(0, 0.0, frame_data);
writer.write_frame(frame)?;
// Write multiple frames sequentially
let frames = vec![frame1, frame2, frame3]; // each built like `frame` above
writer.write_frames(frames)?;
// Finish writing
writer.finish()?;
```
### Reading Trajectories
```rust
use taco_format::Reader;
// Open a reader
let mut reader = Reader::open("trajectory.taco")?;
// Get header information
println!("Num atoms: {}", reader.num_atoms());
println!("Num frames: {}", reader.num_frames());
// Read a specific frame
let frame = reader.read_frame(42)?;
let positions = frame.data.positions.unwrap();
// Read a range of frames
let frames = reader.read_frame_range(100, 200)?; // Frames 100-199
// Read specific frames
let frame_indices = vec![10, 20, 30, 40, 50];
let selected_frames = reader.read_frames(&frame_indices)?;
// Iterate through all frames
for frame_result in reader.iter_frames() {
let frame = frame_result?;
// Process frame...
}
```
### Tensor Operations
```rust
use taco_format::tensor;
use ndarray::{Array1, Array2};
// Calculate center of mass
let positions = Array2::<f32>::zeros((100, 3));
let masses = Array1::<f32>::ones(100);
let com = tensor::center_of_mass(&positions, &Some(masses));
// Extract subset of atoms
let atom_indices = vec![0, 1, 5, 10];
let subset = tensor::extract_subset(&positions, &atom_indices);
// Calculate RMSD between two coordinate sets
let coords1 = Array2::<f32>::zeros((100, 3));
let coords2 = Array2::<f32>::zeros((100, 3));
let rmsd = tensor::calc_rmsd(&coords1, &coords2)?;
```
## File Structure
```
[Header]
- Format version
- Simulation parameters (time step, temperature, etc.)
- Atom metadata (masses, names, etc.)
- Compression settings
[Frame Index Table]
- Byte offsets to each frame for random access
[Data Blocks]
- Full and delta frames:
- Position tensors (Nx3)
- Velocity tensors (Nx3)
- Force tensors (Nx3)
- Box dimensions & energies
```
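
The role of the frame index table can be illustrated with a small standard-library sketch: each frame's byte offset is recorded while writing, so a reader can seek directly to any frame. The length-prefixed layout and the `write_frames`/`read_frame` helpers are assumptions made for illustration, not TACO's actual binary layout.

```python
import io
import struct

def write_frames(stream, payloads):
    """Write payloads back to back and return their byte offsets."""
    offsets = []
    for payload in payloads:
        offsets.append(stream.tell())
        stream.write(struct.pack("<I", len(payload)))  # 4-byte length prefix
        stream.write(payload)
    return offsets

def read_frame(stream, offsets, index):
    """Seek straight to one frame using the offset table."""
    stream.seek(offsets[index])
    (length,) = struct.unpack("<I", stream.read(4))
    return stream.read(length)

stream = io.BytesIO()
offsets = write_frames(stream, [b"frame-0", b"frame-one", b"f2"])
third = read_frame(stream, offsets, 2)  # no need to scan frames 0 and 1
```

Because the lookup is a single seek plus one read, access time in this sketch does not grow with the number of frames that precede the requested one.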
## Building from Source
```bash
git clone https://github.com/username/taco-format.git
cd taco-format
cargo build --release
```
For Python bindings:
```bash
pip install maturin
maturin develop --release
```
## License
MIT