TACO (Trajectory and Compressed Observables) Format
TACO is a high-performance binary format for molecular dynamics (MD) trajectory data, designed for efficient storage and processing of large simulation trajectories.
Features
- Delta Encoding: Stores differences between consecutive frames to leverage temporal coherence
- Hybrid Compression: Configurable lossy (half-precision) or lossless compression for positions, velocities, and forces
- Random Access: Fast direct access to arbitrary frames without scanning the entire file
- Metadata Support: Rich metadata for both simulation parameters and atom properties
- Efficient Processing: Optimized algorithms for reading and writing frames
- Multi-Language Support: Native APIs for Python, C, C++, and Fortran
- Cross-Platform: Works on Linux, macOS, and Windows
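To illustrate the delta-encoding idea from the feature list, here is a minimal sketch in plain Python. It is not TACO's encoder: the function names `delta_encode` and `delta_decode` are hypothetical, and the integer picometre units are an assumption chosen so the round trip is exact. It only shows why small frame-to-frame changes produce small numbers that compress well.

```python
def delta_encode(frames):
    """Keep the first frame in full, then store per-coordinate differences."""
    if not frames:
        return []
    encoded = [list(frames[0])]
    for prev, curr in zip(frames, frames[1:]):
        encoded.append([c - p for p, c in zip(prev, curr)])
    return encoded


def delta_decode(encoded):
    """Rebuild the original frames by accumulating the stored differences."""
    frames = []
    for i, block in enumerate(encoded):
        if i == 0:
            frames.append(list(block))
        else:
            frames.append([p + d for p, d in zip(frames[-1], block)])
    return frames


# Flattened x, y, z coordinates of one atom over three frames, in integer
# picometres (hypothetical units, not something TACO prescribes).
frames = [
    [1000, 2000, 3000],
    [1002, 1999, 3001],
    [1005, 1997, 3001],
]
encoded = delta_encode(frames)
decoded = delta_decode(encoded)
```

After the first full frame, the encoded blocks hold only small integers such as 2, -1, and 1, which a general-purpose compressor can pack far more tightly than the absolute coordinates.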
Performance
TACO is designed to reduce both storage and I/O cost compared to traditional formats:
- Storage Efficiency: Typically 3-5x smaller than ASE trajectory files
- Fast Reading: Efficient batch loading of frames for analysis tasks
- Fast Writing: Streamlined frame processing and compression
- Memory Efficient: Processes frames in batches to minimize memory usage
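Batch processing, as described in the list above, can be sketched as follows. The helper `iter_batches` is hypothetical and not part of the TACO API; it only shows how a trajectory can be walked in fixed-size index ranges so that at most one batch of frames needs to be in memory at a time.

```python
def iter_batches(num_frames, batch_size):
    """Yield (start, stop) half-open index ranges that cover all frames."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, num_frames, batch_size):
        yield start, min(start + batch_size, num_frames)


# A 250-frame trajectory processed 100 frames at a time.
batches = list(iter_batches(250, 100))
```

Each `(start, stop)` pair could then be passed to whatever range-reading call the chosen interface provides, with the per-batch results reduced before the next batch is loaded.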
Python Interface
TACO provides a Python interface for easy integration with analysis tools.
Installation
Basic Usage
# Create some example Atoms objects
=
# Writing
# Reading
=
Appending to Existing Trajectories
TACO supports efficiently appending frames to existing trajectory files without rewriting the entire file:
# Create initial trajectory
=
# Later, append more frames to the same file
=
# The file now contains 150 frames total
=
# Output: 150
Benefits of Append:
- Efficient: Only writes new frame data; existing frames are not rewritten
- Maintains Compression: Delta encoding chain is preserved across appends
- Preserves Metadata: All simulation and atom metadata remains intact
- Multiple Appends: Can append to the same file multiple times
- Random Access: Full random access to all frames after appending
Rust API:
use ;
use Array2;
// Create initial trajectory
let mut writer = create?;
// Write initial frames
for i in 0..100
writer.finish?;
// Later, append more frames
let mut writer = append?;
for i in 100..150
writer.finish?;
Advanced Usage
# Write with custom settings
# use lossy compression
# Read specific frames
=
# Read a range of frames
= # Reads frames 100-199
# Efficient writing for large trajectories
# Use moderate compression
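The feature list describes the lossy setting as half-precision storage. The sketch below uses only Python's standard `struct` module (not the TACO API) to show what that rounding costs for a single value. Whether TACO applies half precision to absolute coordinates or to deltas is not stated here, so treat this purely as an illustration of the number format.

```python
import struct


def to_half_and_back(value):
    """Round-trip a float through IEEE 754 binary16 (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


half_size = struct.calcsize("<e")    # bytes per half-precision value
single_size = struct.calcsize("<f")  # bytes per single-precision value

exact = to_half_and_back(0.5)        # small powers of two survive unchanged
approx = to_half_and_back(12.3456)   # most values are rounded
error = abs(approx - 12.3456)
```

Half precision halves the storage per value relative to single precision, at the price of roughly three significant decimal digits. That trade-off is usually more acceptable for small deltas than for large absolute coordinates.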
Tensor Operations
TACO provides built-in tensor operations for common trajectory analyses:
# Calculate center of mass
=
=
=
# Extract subset of atoms
= # Atoms to extract
=
# Calculate RMSD between two coordinate sets
=
=
=
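For reference, the quantities named above can be written in a few lines of plain Python. This is a sketch of the standard definitions, not TACO's implementation: the function names are hypothetical, and the RMSD here is computed without any prior alignment or superposition, which may differ from what the library does.

```python
import math


def center_of_mass(positions, masses):
    """Mass-weighted mean position: sum(m_i * r_i) / sum(m_i)."""
    total = sum(masses)
    return tuple(
        sum(m * p[axis] for m, p in zip(masses, positions)) / total
        for axis in range(3)
    )


def extract_subset(positions, atom_indices):
    """Pick the rows of an N x 3 coordinate list for the given atoms."""
    return [positions[i] for i in atom_indices]


def rmsd(coords1, coords2):
    """Root-mean-square deviation over atoms, with no alignment step."""
    if len(coords1) != len(coords2):
        raise ValueError("coordinate sets must have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(coords1, coords2)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(coords1))


positions = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
masses = [1.0, 3.0]
com = center_of_mass(positions, masses)
subset = extract_subset(positions, [1])
shifted = [(x + 1.0, y, z) for x, y, z in positions]
value = rmsd(positions, shifted)
```

Shifting every atom by one unit along x gives an RMSD of exactly one unit, which makes a convenient sanity check for any implementation.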
Utility Functions
The Python interface also includes utility functions for working with TACO files:
# Check if file is a TACO file
=
# Get file information
=
# Copy frames from one file to another
# Extract specific atoms
= # Atoms to extract
C, C++, and Fortran Interfaces
TACO provides native interfaces for C, C++, and Fortran, enabling integration with existing molecular dynamics codes and high-performance computing applications.
All interfaces live in the c_api/ directory and share a single, canonical implementation.
C API
The C API provides a low-level interface suitable for integration with C programs and as a foundation for other language bindings.
// Setup metadata
const char* names = ;
const char* types = ;
float masses = ;
taco_atom_metadata_t atom_metadata = ;
taco_simulation_metadata_t sim_metadata = ;
taco_compression_settings_t compression = ;
// Create writer
CTacoWriter* writer = ;
// Write frame
float positions = ;
taco_frame_t frame = ;
;
;
C++ API
The C++ API provides a modern interface with RAII, STL containers, and exception handling.
// Setup metadata
std::vector<float> masses = ;
std::vector<std::string> names = ;
taco::AtomMetadata ;
taco::SimulationMetadata sim_metadata;
sim_metadata.ensemble = "NVT";
sim_metadata.temperature = 300.0;
// Create writer (RAII - automatically closes)
taco::Writer ;
// Create and write frame
taco::Frame frame;
frame.positions = ;
writer.;
// Read all frames
taco::Reader ;
auto all_frames = reader.;
Fortran API
The Fortran API provides a modern Fortran 2008 interface with ISO C binding.
program taco_example
use iso_c_binding
use iso_fortran_env, only: real32, real64, int64
use taco_format
implicit none
type(c_ptr) :: writer, reader
type(taco_frame_t) :: frame
type(taco_compression_settings_t) :: compression
type(taco_atom_metadata_t) :: atom_meta
type(taco_simulation_metadata_t) :: sim_meta
real(real32), target :: masses(3) = [15.999, 1.008, 1.008]
real(real32), target :: positions(9) = [0.0, 0.0, 0.0, 0.1, 0.08, 0.0, 0.1, -0.08, 0.0]
character(len=8), target :: names(3) = ['O ', 'H ', 'H ']
character(len=8), target :: types(3) = ['O ', 'H ', 'H ']
type(c_ptr), target :: name_ptrs(3), type_ptrs(3)
integer :: error_code, i
! Setup pointers for strings
do i = 1, 3
name_ptrs(i) = c_loc(names(i))
type_ptrs(i) = c_loc(types(i))
end do
! Setup metadata
sim_meta%temperature = 300.0_c_double
sim_meta%pressure = 1.0_c_double
sim_meta%timestep_fs = 1.0_c_double
! Set other fields to null
atom_meta%masses = c_loc(masses)
atom_meta%names = c_loc(name_ptrs)
atom_meta%types = c_loc(type_ptrs)
atom_meta%num_atoms = 3
compression%precision = 0 ! lossless
compression%zstd_level = 3
! Create writer
writer = taco_writer_create('output.taco', 3, 0.001_real64, &
sim_meta, atom_meta, compression)
! Setup frame
frame%frame_number = 0
frame%time = 0.0_c_double
frame%positions = c_loc(positions)
frame%num_atoms = 3
! Set other fields...
! Write frame
error_code = taco_writer_write_frame(writer, frame)
error_code = taco_writer_finish(writer)
end program
Building with C/C++/Fortran Support
# Build the C API library
# Build and test C examples
# Build and run Fortran examples and tests
The C API is located in c_api/ with:
- taco_format_c.h - C header file
- src/lib.rs - Rust implementation with C FFI
- test_c_api.c - C example/test
The Fortran API is located in c_api/fortran/ with:
- taco_format.f90 - Fortran interface module
- examples/ - Comprehensive Fortran examples
- tests/ - Fortran unit and integration tests
See C/C++/Fortran API Documentation for complete details.
Usage in Rust
Writing Trajectories
use ;
use Array2;
// Create metadata
let sim_metadata = default;
let atom_metadata = default;
// Create a writer
let mut writer = create?;
// Write frames
let positions = zeros;
let frame_data = new;
let frame = new;
writer.write_frame?;
// Write multiple frames sequentially
let frames = vec!;
writer.write_frames?;
// Finish writing
writer.finish?;
Reading Trajectories
use Reader;
// Open a reader
let mut reader = open?;
// Get header information
println!;
println!;
// Read a specific frame
let frame = reader.read_frame?;
let positions = frame.data.positions.unwrap;
// Read a range of frames
let frames = reader.read_frame_range?; // Frames 100-199
// Read specific frames
let frame_indices = vec!;
let selected_frames = reader.read_frames?;
// Iterate through all frames
for frame_result in reader.iter_frames
Tensor Operations
use tensor;
use ;
// Calculate center of mass
let positions = zeros;
let masses = ones;
let com = center_of_mass;
// Extract subset of atoms
let atom_indices = vec!;
let subset = extract_subset;
// Calculate RMSD between two coordinate sets
let coords1 = zeros;
let coords2 = zeros;
let rmsd = calc_rmsd?;
File Structure
[Header]
- Format version
- Simulation parameters (time step, temperature, etc.)
- Atom metadata (masses, names, etc.)
- Compression settings
[Frame Index Table]
- Byte offsets to each frame for random access
[Data Blocks]
- Full and delta frames:
- Position tensors (Nx3)
- Velocity tensors (Nx3)
- Force tensors (Nx3)
- Box dimensions & energies
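The role of the frame index table can be illustrated with a small self-contained sketch. The layout below (a 4-byte little-endian length prefix per frame, with offsets kept in a Python list) is an assumption made for illustration only; the actual on-disk field widths and the location of the index in a TACO file are not specified in this section.

```python
import io
import struct


def write_frames(stream, frames):
    """Write variable-length frame blobs and return their byte offsets."""
    offsets = []
    for blob in frames:
        offsets.append(stream.tell())
        stream.write(struct.pack("<I", len(blob)))  # 4-byte length prefix
        stream.write(blob)
    return offsets


def read_frame(stream, offsets, index):
    """Seek straight to one frame using the offset table, then read it."""
    stream.seek(offsets[index])
    (length,) = struct.unpack("<I", stream.read(4))
    return stream.read(length)


stream = io.BytesIO()
frames = [b"frame-0", b"frame-one", b"f2"]  # stand-ins for compressed frames
offsets = write_frames(stream, frames)
second = read_frame(stream, offsets, 1)
last = read_frame(stream, offsets, 2)
```

Because compressed frames have different sizes, a reader cannot compute a frame's position from its index alone. The offset table is what turns random access into a single seek. With delta encoding, a reader would additionally need to start from the nearest full frame and apply the deltas that follow it.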
Building from Source
For Python bindings:
License
MIT