molar 1.4.0

Molar is a rust library for analysis of MD trajectories and molecular modeling
Documentation
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
# **MolAR** - a **Mol**ecular **A**nalysis and modeling library for **R**ust.
 
[![Crates.io](https://img.shields.io/crates/v/molar.svg)](https://crates.io/crates/molar)
[![Documentation](https://docs.rs/molar/badge.svg)](https://docs.rs/molar)
[![Python API Docs](https://img.shields.io/badge/python--api-docs-blue)](https://yesint.github.io/molar/)
[![Rust Version](https://img.shields.io/badge/rust-1.83+-blue.svg)](https://www.rust-lang.org/)
[![Crates.io Downloads](https://img.shields.io/crates/d/molar.svg)](https://crates.io/crates/molar)

# Table of contents
- [What is MolAR?](#what-is-molar)
- [Features](#features)
- [Current status](#current-status)
- [Design and performance](#design-and-performance)
- [Installation](#installation)
- [Selection syntax](#selection-syntax)
- [Tutorial](#tutorial)
- [Doing things in parallel](#parallel-splits)
- [Analysis tasks](#analysis-tasks)
- [Python bindings](#python-bindings)

# What is molar?
MolAR is a library for molecular modeling and analysis written in Rust with an emphasis on memory safety and performance. Python bindings are also provided.

Molar is designed to simplify the analysis of molecular dynamics trajectories and to implement new analysis algorithms. Molar is intended to provide facilities, which are routinely used in all molecular analysis programs, namely input/output of popular file formats, powerful and flexible atom selections, geometry transformations, RMSD fitting and alignment, etc.

MolAR is a logical successor of [Pteros](https://github.com/yesint/pteros) molecular modeling library, which is written in C++ and become hard to develop and maintain due to all C++ idiosyncrasies.

# Features
* Reading and writing PDB, GRO, XYZ, XTC, TRR, TPR, CPT, and AMBER NetCDF (.nc) files
    * Reading and writing trajectories with random access.
    * Reading Gromacs TPR and CPT (checkpoint) files (if Gromacs is installed).
* Selections using the syntax similar to VMD and Pteros
    * Memory-safe selections for serial and parallel analysis tasks.
    * Powerful subselections and selection splitting.
* SASA calculations with the fastest PowerSasa method
* RMSD fitting and alignment
* Fast distance search
* Basic algorithm (center of mass, center of geometry, etc.)
* Seamless PBC treatment
* Trajectory processing with easy to write analysis tasks
* Python bindings

# Design and Performance
Initial design is described in the [MolAR paper](https://onlinelibrary.wiley.com/doi/10.1002/jcc.27536). However, this concept appeared to be too complex in practice and didn't cover all the real world scenarios properly. Starting from version 1.0 it was changed completely. Now selections are just indices of selected atoms, which have to be "bound" to [System] (that is an actual container for atoms and coordinates) to do useful work.
The API becomes more noisy but you get proper compile-time borrow checking, soundness and all Rust memory safety guarantees. The "binding" of selections is very cheap (one integer comparison), so it should not affect the performance reported in the [paper](https://onlinelibrary.wiley.com/doi/10.1002/jcc.27536).

# Current status
Molar is usable in real-life projects and is close to be feature complete. Documentation is still work in progress.

# Installation
Molar requires Rust 1.83 or above and a C/C++ compiler for compiling third-party libraries. Any sufficiently modern gcc or clang compiler should work.

To add MolAR to your Rust project just use `cargo add molar`.

For installation of the Python bindings [look here](#python-bindings).

## Gromacs TPR and CPT support

TPR and CPT (checkpoint) reading is implemented as a **runtime plugin** (`libmolar_gromacs_plugin`). MolAR itself has no compile-time Gromacs dependency — the plugin is loaded dynamically when a `.tpr` or `.cpt` file is opened. If the plugin is not found, attempts to open these files return an error.

Opening a `.cpt` file returns a `State` containing the coordinates and periodic box at the checkpointed step. It can be used to, e.g., continue analysis from a specific checkpoint or to patch coordinates from a restart into an existing system.

MolAR searches for plugin in this order:
1. `MOLAR_GROMACS_PLUGIN` environment variable (runtime override).
2. Path baked in at compile time when Gromacs env vars were set during build.
3. System library search path.

### Building the plugin

The plugin must be compiled against a _local_ Gromacs installation. Modern Gromacs does not expose all required data structures in its public API, so you need access to both the Gromacs **source tree** and its **build directory**.

1. Compile Gromacs from source (see the [Gromacs installation guide](https://manual.gromacs.org/current/install-guide/index.html)).

2. In the root of the MolAR workspace (or your project using `molar`), create `.cargo/config.toml`:
```toml
[env]
# Path to the Gromacs source tree
GROMACS_SOURCE_DIR = "/path/to/gromacs_src"
# Path to the Gromacs build directory (for generated headers)
GROMACS_BUILD_DIR  = "/path/to/gromacs_src/build"
# Directory containing libgromacs.so
GROMACS_LIB_DIR    = "/installed/gromacs/lib64"
```
A template is provided: `cp config.toml.template .cargo/config.toml`.

3. Build the plugin:
```shell
cargo build -p molar_gromacs -r
```
The plugin is written to `target/release/build/molar_gromacs-*/out/libmolar_gromacs_plugin.so`. This path is baked into the molar library automatically, so no further configuration is needed if you used molar at the same machine.

### Using the plugin at runtime

If you want to use a plugin built separately, set the `MOLAR_GROMACS_PLUGIN` environment variable:

```shell
export MOLAR_GROMACS_PLUGIN=/path/to/libmolar_gromacs_plugin
```

This is especially useful when using MolAR [Python bindings](#python-bindings) - you can install pre-compiled bindings from PyPI and then compile the Gromacs plugin on your machine.

## AMBER NetCDF trajectories

MolAR can read and write AMBER NetCDF (`.nc` / `.ncdf`) trajectory files. This requires the NetCDF and HDF5 C libraries to be installed on your system.

**Linux (Fedora/RHEL):**
```sh
sudo dnf install hdf5-devel netcdf-devel
```

**Linux (Ubuntu/Debian):**
```sh
sudo apt install libhdf5-dev libnetcdf-dev
```

**macOS:**
```sh
brew install netcdf
```

Once the libraries are installed, enable the `netcdf` feature when adding molar to your project:

```toml
# Cargo.toml
[dependencies]
molar = { version = "1", features = ["netcdf"] }
```

Or from the command line:
```sh
cargo build --features molar/netcdf
```

# Selection syntax

MolAR uses a powerful and flexible selection language similar to VMD and Pteros. Selections are composed of logical expressions that combine keywords, comparisons, and geometric criteria to identify specific atoms.

## Keywords

Keywords select atoms by their properties:

| Keyword | Aliases | Description | Example |
|---------|---------|-------------|---------|
| `index` | — | Atom index (0-based) | `index 0 10 20:30` |
| `resid` | — | Residue ID | `resid 1 5:10` |
| `resindex` | — | Residue index (0-based) | `resindex 0:5` |
| `resname` | — | Residue name | `resname ALA GLY` |
| `name` | — | Atom name | `name CA CB` |
| `chain` | — | Chain ID | `chain A B C` |

### Keyword syntax

Integer keywords (`index`, `resid`, `resindex`) accept ranges and single values:
```ignore
resid 5                    # Single residue
resid 1:10                 # Range (inclusive)
resid 1 5 10:20            # Multiple selections, implicit OR
```

String keywords (`name`, `resname`, `chain`) accept multiple values and regex patterns:
```ignore
resname ALA GLY            # Multiple residues
name CA CB CG              # Multiple atom names
name /C.+/               # Regex pattern: C followed by at least one symbol
chain A /[AB]/             # Chain A or chains matching [AB]
```
```shell
## Molecular groups

Pre-defined selections for common molecular groups:

| Keyword | Description |
|---------|-------------|
| `protein` | All protein atoms |
| `backbone` | Protein backbone atoms (N, CA, C, O) |
| `sidechain` | Protein sidechain atoms |
| `water` | Water molecules |
| `hydrogen` | All hydrogen atoms |
| `not_water` or `now` | All non-water atoms |
| `not_hydrogen` or `noh` | All non-hydrogen atoms |

Examples:
```ignore
protein                    # All protein atoms
backbone and chain A       # Backbone of chain A
water or hydrogen          # Water or hydrogens
not hydrogen               # Everything except hydrogens
```

## Comparisons

Compare numeric properties using mathematical expressions:

```ignore
x < 5.0                    # X coordinate less than 5
y >= 10.0 and y <= 20.0    # Y between 10 and 20
z 15.0:25.0                # Shorthand for chained comparison
mass > 12.0                # Atomic mass greater than 12
charge != 0.0              # Non-neutral atoms
vdw < 2.0                  # Van der Waals radius less than 2
occupancy > 0              # non-zero occupancy
beta > 0.5                 # B-factors above 0.5
```

### Comparison operators

- `==` — Equal
- `!=` — Not equal
- `<` — Less than
- `<=` — Less than or equal
- `>` — Greater than
- `>=` — Greater than or equal

### Chained comparisons

Compare between two values:
```ignore
10 < resid < 20            # Residues between 10 and 20
0 <= x <= 5.0              # X coordinate between 0 and 5
15.0 < y < 25.0            # Y strictly between 15 and 25
```

### Mathematical expressions

Use math operations and functions in comparisons:

```ignore
sqrt(x^2 + y^2) < 10.0     # Distance from origin in XY plane less than 10
abs(z) > 5.0               # Absolute Z coordinate
sin(x) * 2.0 == 1.0        # Trigonometric operations
```

Supported functions: `abs`, `sqrt`, `sin`, `cos`

## Geometric expressions

### Distance-based selections

#### Distance to point
```ignore
dist point [1.0, 2.0, 3.0] < 1.0        # Within 5 Å of a given point
dist pbc point [1.0, 2.0, 3.0] > 2.0    # With periodic boundary conditions
```

#### Distance to line
```ignore
dist line [1,2,3] [4,5,6] < 3.0            # Two points defining the line
dist line [1,2,3] dir [1,0,0] >= 1.3       # Point and direction
```

#### Distance to plane
```ignore
dist plane [0,0,0] [1,0,0] [0,1,0] < 1.0   # Three-point definition
dist plane [0,0,0] normal [0,0,1]  < 1.0   # Point and normal
```

### Center of mass/geometry

Calculate geometric properties of selections:

```ignore
within 5.0 of com of protein           # Within 5 Å of protein COM
within 3.0 pbc of cog of chain A       # Within 3 Å of chain A COG
within 2.0 of com pbc yyn of water     # PBC only in X and Y
within 2.0 of com pbc 110 of water     # PBC only in X and Y
```  
 
### PBC options

Periodic boundary condition modes for geometric expressions:

- `pbc` — Apply PBC in all dimensions
- `pbc 1 0 1` or `pbc yny` — Apply PBC selectively (X, Y, Z), spaces are optional
- `nopbc` — No periodic boundary conditions (default)

## Logical operators

Combine selections using logical operations:

| Operator | Description | Example |
|----------|-------------|---------|
| `and` | Both conditions true | `name CA and chain A` |
| `or` | Either condition true | `resname ALA or resname GLY` |
| `not` | Negate condition | `not water` |

### Operator precedence (highest to lowest)

1. `not` — Negation
2. `same ... as` — Same residue/chain
3. `within ... of` — Distance selections
4. `and` — Logical AND
5. `or` — Logical OR

Parentheses can be used to override precedence:
```ignore
(resname ALA or resname GLY) and backbone    # Parentheses for clarity
name CA or (name CB and chain A)             # Different grouping
```

### Same residue/chain

Select atoms in the same residue or chain as matched atoms:

```ignore
same residue as name CB    # All atoms in residues containing CB atoms
same chain as resid 5      # All atoms in chains containing residue 5
```

## Complete examples

```ignore
// Simple selections
"name CA"                              // Alpha carbons
"resname ALA"                          // Alanine residues
"chain A"                              // Chain A

// Combined with logic
"protein and backbone"                 // Protein backbone
"not water and not hydrogen"           // Non-water, non-hydrogen
"(resname ALA or resname GLY) and backbone"

// Numeric comparisons
"x < 0 and y < 0"                     // Negative X and Y
"10 < resid < 20"                     // Residues 10-20

// Geometric selections
"within 5.0 of [0, 0, 0]"                     // Within 5 Å of origin
"within 3.0 pbc of com of protein"            // Near protein center of mass with pbc
"same residue as within 2.0 of com of water"  // Residues near water

// Complex combined
"backbone and chain A and resid 1:50"
"(protein or water) and within 10.0 of com of protein"
```

# Tutorial
We will write an example program that reads a file of some molecular system containing TIP3P water molecules, convert all water to TIP4P and saves this as a new file. TIP3P water has 3 particles (oxygen and two hydrogens), while TIP4P [has 4]((http://www.sklogwiki.org/SklogWiki/index.php/TIP4P/2005_model_of_water)) (oxygen, two hydrogens and a dummy particle). Our goal is to add these dummy particles to each water molecule.

## Preparing the stage
First, let's create a new Rust project called `tip3to4`: 
```shell
cargo new tip3to4
```

Then, let's add dependencies: the MolAR itself and anyhow crate for easy error reporting:
```shell
cargo add molar anyhow
```

In `src/main.rs` add needed boilerplate:
```rust,ignore
// For processing command line arguments
use std::env;
// Import all basic things from molar
use molar::prelude::*;
// For error handling
use anyhow::Result;

fn main() -> Result<()> {
    // Get the command line arguments
    let args: Vec<String> = env::args().collect();

    // Here our program is going to be written

    // Report successful completion of the program
    Ok(())
}
```

Now we can start writing our program.

## Reading an input file
The simples way of loading the molecular system in MolAR is to use a `Source` - an object that holds a `Topology` and `State` of the system and is used to create atom selections for manipulating this data:

```rust,ignore
// Load the source file from the first command line argument
let src = System::from_file(&args[0])?;
```

The file type (PDB, GRO, etc) is automatically recognized by its extention.
If reading the file fails for whatever reason the `?` operator will return an error, which will be nicely printed by `anyhow` crate.

## Making selections
Now we need to select all waters that are going to be converted to TIP4. We also need to select all non-water part of the system to keep it as is.

In MolAR selection (`Sel`) is just a list of atom indexes, which has to be "bound" to the `System` to do useful work using `system.bind(&sel)` for read-only acces or 
`system.bind_mut(&sel)` for read-write access. There are also convenience methods that produce bound selection right away. This is useful if you know that you won't re-bind the same `System` for different type of access often.

```rust,ignore
let water = sys.select_bound("resname TIP3")?;
let non_water = sys.select_bound("not resname TIP3")?;
```

Selections are created with the syntax that is very similar to one used in VMD Pteros and Gromacs. Here we select water and non-water by residue name.

In MolAR empty selections are not permitted, so if no atoms are selected (or if anything else goes wrong) the error will be reported.

## Looping over indiviudual water molecules
We selected all water molecules as a single selection but we need to loop over individual water molecules to add an additional dummy particle to each of them. In order to do this we are splitting a selection to fragments by the residue index:

```rust,ignore
// Go over water molecules one by one                   
for mol in water.split_resindex_bound() {
    // Do something with mol
}
```

The method `split_resindex_bound()` returns a Rust iterator, which produces contigous bound selections containig distinct residue index each. There are many other ways of splitting selections into parts using arbitrary logic in MolAR, but this simplest one is what we need now. 

## Working with coordinates
Now we need to get the coordinates of atoms for current water molecules and compute a position of the dummy atom.

```rust,ignore
// Go over water molecules one by one                   
for mol in water.split_resindex_bound() {
    // TIP3 is arranged as O->H->H
    // so atom 0 is O, atoms 1 and 2 are H
    // Get cooridnates
    let o_pos = mol.get_pos(0).unwrap();
    let h1_pos = mol.get_pos(1).unwrap();
    let h2_pos = mol.get_pos(2).unwrap();
    // Get center of masses of H
    let hc = 0.5*(h1_pos.coords + h2_pos.coords);
    // Unit vector from o to hc
    let v = (hc-o_pos.coords).normalize();	
    // Position of the M dummy particle in TIP4
    let m_pos = o_pos + v*0.01546;
    // Dummy atom M
    let m_at = Atom::from(mol.first_atom())
        .with_name("M")
        .with_resname("TIP4");
    println!("{:?} {:?}",m_at,m_pos);
}
```

First, we are getting the coordinates of oxigen and two hydrogens. `get_pos(n)` returns the position of n-th atom in selection. Since n may potentially be out of range, it returns an `Option<&Pos>`. We are sure that there are just 3 atoms in water molecule, so we just unwrapping an option.

Then we are computing the position of the dummy atom, which is on the bisection of H-O-H angle at the distance of 0.01546 from the oxygen.

Finally, we are constructing a new `Atom` from the first atom of our water molecule using `Atom::from()`, and then overriding the name to 'M' and the residue name to 'TIP4' using fluent setters. All other fields (resid, resindex, etc) are copied from the original atom.

We are printing our new dummy atom and its position just to be sure that everything works as intended.

## Constructing output system
All this is fine, but we still have no system to write our converted water molecules to. Let's fix this and modify the beginning of our main funtion like this:

```rust,ignore
// Load the source file from the first command line argument
let sys = System::from_file(&args[0])?;

// Make empty output system
let mut out = System::default();

```

The first thing that we add to out new empty system is all non-water atoms:
```rust,ignore
// Add non-water atoms to the output
out.append(&non_water);
```

Now, at the end of our loop over water molecules, we can add new dummy atoms properly to the new system:
```rust,ignore
// Add new converted water molecule
let added = out.append_atoms(
    mol.iter_atoms().chain(std::iter::once(&m_at)),
    mol.iter_pos().chain(std::iter::once(&m_pos)),
)?;
```

Here `append_atoms()` accepts two iterators: the first yielding atoms and the second yielding their corresponding coordinates. We chain the existing water molecule atoms/positions with the newly constructed dummy atom using `std::iter::once()`. The method returns a selection of all added atoms.

We also need to change the resname of the added atoms from TIP3 to TIP4. Since `append_atoms()` returned a selection of all added atoms, we can bind it mutably and set the new residue name:

```rust,ignore
// Change resname for added atoms
// Note the use of bind_mut()!
out.bind_mut(&added).set_same_resname("TIP4");
```

## Writing the output file
Out output system is now fully constructed but it still lacks an important element - the periodic box description. Most molecular systems originating from MD are periodic and the information about the periodic box has to be copied to our newly constructed system:

```rust,ignore
// Transfer the box from original file
out.set_box_from(&src);
```

Here we provide a reference to the input system, so the box is cloned from it to the output system.

Finally we are ready to write the output file:

```rust,ignore
// Write out new system
out.save(&args[1])?;
```

Again, file format will be determined by extension. The file name is provided by the second command line argument.

## The final result
The complete program looks like this:
```rust,no_run
// For processing command line arguments
use std::env;
// Import all baic things from molar
use molar::prelude::*;
// For error handling
use anyhow::Result;

fn main() -> Result<()> {
    // Get the command line arguments
    let args: Vec<String> = env::args().collect();

    // Load the source file from the first command line argument
    let sys = System::from_file(&args[0])?;

    // Make empty output system
    let mut out = System::default();

    // Select water and non-water. 
    // Normally selections are just indexes of atoms, which have to be
    // "bound" to system to do useful work using 
    // `sys.bind(&sel)` for read-only acces or 
    // `sys.bind_mut(&sel)` for read-write access. 
    // In this particular case we use `select_bound()` to get 
    // bound selections in one shot. This works here because we know in advance
    // that `sys` is used read-only and we'll never re-bind it for mut access.
    let water = sys.select_bound("resname TIP3")?;
    let non_water = sys.select_bound("not resname TIP3")?;

    // Add non-water atoms to the output
    out.append(&non_water);

    // Go over water molecules one by one                   
    for mol in water.split_resindex_bound() {
        // TIP3 is arranged as O->H->H
        // so atom 0 is O, atoms 1 and 2 are H
	    // Get cooridnates
        let o_pos = mol.get_pos(0).unwrap();
        let h1_pos = mol.get_pos(1).unwrap();
        let h2_pos = mol.get_pos(2).unwrap();
	    // Get center of masses of H
	    let hc = 0.5*(h1_pos.coords + h2_pos.coords);
	    // Unit vector from o to hc
	    let v = (hc-o_pos.coords).normalize();	
	    // Position of the M dummy particle in TIP4
	    let m_pos = o_pos + v*0.01546;
        // Dummy atom M
        let m_at = Atom::from(mol.first_atom())
            .with_name("M");

        // Add new converted water molecule
        let added = out.append_atoms(
            mol.iter_atoms().chain(std::iter::once(&m_at)),
            mol.iter_pos().chain(std::iter::once(&m_pos)),
        )?;

        // Change resname for added atoms
        // Note the use of bind_mut()!
        out.bind_mut(&added).set_same_resname("TIP4");
    }

    // Transfer the box
    out.set_box_from(&sys);

    // Write out new system
    out.save(&args[1])?;

    // Report successful completion of the program
    Ok(())
}
```

# Doing things in parallel

## Iterating in parallel within selection

You can iterate over coordinates, atoms and particles within the selection
in parallel using "par" versions of the corresponding iterators:

```rust,no_run
// Import all basic things from molar
use molar::prelude::*;
// For error handling
use anyhow::Result;

fn main() -> Result<()> {
    // Load molecular system
    let mut sys = System::from_file("tests/protein.pdb")?;
    
    // Create immutable selection
    let sel = sys.select_bound("name CA")?;

    // Parallel iteration over positions
    // Calculate sum of X coordinates of all CA atoms in parallel
    let sum_x: f32 = sel.par_iter_pos()
        .map(|pos| pos.x)
        .sum();
    println!("Sum of X coordinates: {}", sum_x);

    // Parallel iteration over atoms  
    // Count ALA residues in parallel
    let ala_count = sel.par_iter_atoms()
        .filter(|atom| atom.resname == "ALA")
        .count();
    println!("CA in ALA: {}", ala_count);

    // Parallel iteration over particles
    // Combine atom name and position information in parallel
    let particle_data: Vec<String> = sel.par_iter_particle()
        .map(|p| format!("{}-{}", p.atom.name, p.pos.x))
        .collect();
    println!("Collected {} particles", particle_data.len());

    // Mutable parallel iteration
    // Translate all positions in parallel (requires mutable selection)
    let mut sel_mut = sys.select_bound_mut("name CA")?;

    sel_mut.par_iter_pos_mut().for_each(|pos| {
        pos.x += 1.0;
        pos.y += 2.0;
        pos.z += 3.0;
    });

    println!("Translated all CA atoms");

    Ok(())
}
```

## Parallel splits

You may perform operations on __non-overlapping__ selections in parallel, which typically gives a huge speed up in such tasks as removing periodicity for individual molecules, which are rather heavy computationally.

Here is how you can do this:

```rust,no_run
// Import all baic things from molar
use molar::prelude::*;
// For error handling
use anyhow::Result;

fn main() -> Result<()> {
    // Load lipid membrane
    let mut sys = System::from_file("tests/membr.gro")?;

    // Make a parallel split: distinct selection for each POPG lipid.
    let par = sys.split_par(|p| {
        if p.atom.resname == "POPG" {
            // Whenever distinct result is returned form this closure
            // new selection is created, so each distinct POPG residue
            // becomes a separate selection.
            Some(p.atom.resindex)
        } else {
            // All other atoms are ignored
            None
        }
    })?;

    // Get rayon parallel iterator over selections in split
    sys.iter_par_split_mut(&par)
        // Run unwrap on each selection in parallel
        .try_for_each(|mut sel| sel.unwrap_simple())?; 
    Ok(())
}
```

# Analysis tasks

## Motivation

The vast majority of molecular dynamics trajectory analysis tasks follows the same pattern:
1. Do initialization and pre-processing (read and process the parameters, allocate and initialize data structures, create atom selections, etc.)
2. On each trajectory frame call an analysis function, which computes needed properties.
3. Do post-processing (compute averages, write results to file).

Reading trajectories themselves also requires some standard options:
1. Ability to start from given frame or time stamp.
2. Ability to stop at given frame or time stamp.
3. Ability to read each N-th frame (decimation).
4. Ability to report the progress with given periodicity.

These patterns are so common that it very quickly become annoynig and repetitive to implement them from scratch for each analysis task. That is why MolAR provides [AnalysisTask] trait, which automates all the common steps and takes care of all the boilerplate.

## Implementing analysis tasks

Here is an example of implementing custom analysis task, which prints a center of mass for a user provided selection on each frame and also computes an average center of mass over the trajectory:

```rust,no_run
// Import all basic things from molar
use molar::prelude::*;
// For error handling
use anyhow::Result;
// For processing custom command line arguments
use clap::Args;

// User-defined command line arguments
#[derive(Args, Debug, Default)]
struct UserArgs {
    // Selection string
    #[arg(long, default_value = "all")]
    sel: String,
}

// Our analysis task type
struct ComTask {
    // Selection to use
    sel: Sel,
    // COM vector
    com_aver: nalgebra::Vector3<f32>,
}

// Implement AnalysisTask trait
// The type with custom parameters is provided as generic parameter
impl AnalysisTask<UserArgs> for ComTask {
    // Name of the analysis task
    fn task_name() -> String {
        "Center of mass computation".to_owned()
    }

    // Constructor of our analysis type
    // It is called on the first valid trajectory frame.
    // Context contains all needed data such as topology, 
    // state and parsed command line arguments
    fn new(context: &mut AnalysisContext<UserArgs>) -> anyhow::Result<Self> {
        // Create our selection from the user-supplied string.
        // Arguments are stored in context.args.
        let sel = context.sys.select(&context.args.sel)?;
        // Create our analysis type instance
        Ok(Self {
            sel,
            com_aver: nalgebra::Vector3::zeros(),
        })
    }

    // Function to be called at each frame.
    fn process_frame(&mut self, context: &mut AnalysisContext<UserArgs>) -> anyhow::Result<()> {
        // Compute the center of mass
        let com = context.sys.bind(&self.sel).center_of_mass()?;
        // Report current center of mass. We get current time stamp from the context
        println!("time={}, com={}", context.sys.get_time(), com);
        // Add to average
        self.com_aver += com.coords;
        Ok(())
    }

    // Post-processing
    fn post_process(&mut self, context: &mut AnalysisContext<UserArgs>) -> anyhow::Result<()> {
        // Compute average
        self.com_aver /= context.consumed_frames as f32;
        println!("average com={}", self.com_aver);
        Ok(())
    }
}

// Run our task
fn main() -> anyhow::Result<()> {
    ComTask::run()?;
    Ok(())
}

```

## Trajectory processing aguments

All analysis tasks accept a number of common command line arguments (see [TrajAnalysisArgs]) allowing to select which frames to process and which files to read.

For example, the following command line will run our analysis task sequencially for two trajectories starting from frame 150 and ending when reaching 100 ns, while reporting progress each 100 frames. Custom selection is provided in the `--sel` argument, which is declared and recognised by `ComTask`
```shell
analysis_task -f structure.pdb traj_1.xtc traj_2.xtc -b 150 -e 100ns -log 100 --sel "resid 10:20"
```


# Python bindings

MolAR provides convenient Python bindings, creatively named `pymolar`, which has its own ["Pythonic" API](https://yesint.github.io/molar/).

The bindings are made as performant as possible, but they are not as fast as the native Rust functions. Nevertheless, `pymolar` is still significanly faster than MDAnalysis or other similar pure-python libraries.

## Installation

It is highly recommended to use a Python virtual environment. It is assumed that `pip` is used for installation, but you can use any Python package manager.

### Installing from PyPI

```shell
pip install pymolar
```

Installed package _does not_ include the Gromacs plugin. If you want to read TPR or CPT files you have to compile the plugin and link against the local Gromacs installation as described [here](#gromacs-tpr-and-cpt-support).

### Compiling from source

You may compile `pymolar` from source. In this case Gromacs plugin is also compiled if the corresponding paths are provided as described [here](#gromacs-tpr-and-cpt-support).

```shell
#1. Install maturin in the current virtual environment
pip install maturin
#2. Go to molar_python subfolder of molar source tree
cd <path_to_molar>/molar_python
#3. Compile the bindings
maturin build -r
#4. Install the bindings in the current virtual environment
python -m pip install .
```

## Usage

Pymolar bindings could be used in two modes: manual and as [analysis tasks](#analysis-tasks). In the first case you have a full fine-grained control on how you read your input files, but this may involve a lot of boilerplate. In contrast, "analysis tasks" hide most of the input handling from the user providing a very simple command-line interface to load structure and trajectory files, skip frames, begin and end reading at particular frame or time stamp, etc. The user only needs to implement three methods: `pre_process`, `process_frame` and `post_process`, which are called during the analysis.

As an example we will write a script that prints center of mass of CA protein atoms on each trajectory frame and then computes the average center of masses for the whole trajectory:

```python
#file: average.py

# Import pymolar
from pymolar import *
import numpy as np

# Create an analysis task
class MyTask(AnalysisTask):
    # Register a custom command line argument for selection
    def register_args(self,parser):
        parser.add_argument('--sel',default="protein")
        
    # This method is called before starting trajectory processing
    def pre_process(self):
        # Create a selection using selection string from the command line
        # self.src contains a Source with the first state read
        self.sel = self.src(self.args.sel)
        # Average center of masses
        self.com_ave = np.array([0.0,0.0,0.0])


    # This method is called on each trajectory frame
    def process_frame(self):
        cm = self.sel.com()
        print(f"time: {self.state.time}, com: {cm}")
        self.com_ave += cm


    # This method is called after trajectory processing is finished
    def post_process(self):
        self.com_ave /= self.consumed_frames
        print(f"Average com: {self.com_ave}")


if __name__ == "__main__":
    # Run the analysis task.
    MyTask()

```

This script can be run as following:
```shell
python3 average.py -f struct.gro traj.xtc -e 100 --sel "resid 1:10"
```