bmi-driver 0.0.2

Basic Model Interface (BMI) driver with support for C Fortran, and Python models.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
# bmi-driver

A parallel BMI (Basic Model Interface) orchestrator that dynamically loads and couples C, Fortran, and Python hydrological models. It reads NetCDF forcing data, resolves model dependencies, handles automatic unit conversion, and runs across many geographic locations in parallel.

The binary is named `bmi-driver`.

## Building

### Default (all adapters)

```bash
cargo build --release
```

By default this builds with C, Fortran, and Python adapter support. The build will check for required system libraries and print clear install instructions if anything is missing.

### C adapter only

```bash
cargo build --release --no-default-features
```

Builds with only the C shared library adapter and the built-in SLOTH dummy model. This has the fewest system dependencies (only `libnetcdf`).

### Selecting features

```bash
cargo build --release --no-default-features --features fortran
cargo build --release --no-default-features --features python
cargo build --release --no-default-features --features fortran,python
```

- **`fortran`** -- Enables the `bmi_fortran` adapter for loading Fortran shared libraries. No additional build dependencies; a Fortran runtime must be available at execution time.
- **`python`** -- Enables the `bmi_python` adapter for loading Python BMI models via [PyO3]https://pyo3.rs. Requires Python development headers at build time (e.g., `python3-dev`). The Python environment must have the target BMI packages installed at runtime.

### Install

```bash
cargo install --path .
```

This installs `bmi-driver` to `~/.cargo/bin/`.

### System dependencies

| Library | Purpose | Install |
|---------|---------|---------|
| `libnetcdf` | Reading NetCDF forcing files | `apt install libnetcdf-dev` / `dnf install netcdf-devel` / `brew install netcdf` |
| `pkg-config` | Library discovery at build time | `apt install pkg-config` / `dnf install pkgconfig` / `brew install pkg-config` |
| Python dev headers | Only with `python` feature (default) | `apt install python3-dev` / `dnf install python3-devel` / `brew install python3` |

SQLite is bundled and compiled from source automatically -- no system `libsqlite3` package is needed.

Model shared libraries (`.so`) must be accessible at the paths specified in the config.

## Usage

```
bmi-driver <data_dir> [OPTIONS]
```

### Options

| Flag | Description |
|------|-------------|
| `-j, --jobs N` | Number of parallel worker processes (default: CPU count) |
| `--units` | Print all unit conversion info and exit without running |
| `--minify` | Strip unused fields from realization.json and exit |

### Data directory layout

```
<data_dir>/
  config/
    realization.json            # Model chain configuration
    *.gpkg                      # GeoPackage with location IDs
  config/cat_config/
    <ModelName>/{{id}}.input    # Per-location model configs
  forcings/
    forcings.nc                 # NetCDF forcing data
  outputs/
    bmi-driver/                 # Output CSVs written here
      <location_id>.csv
```

The GeoPackage must contain a `divides` table with a `divide_id` text column listing all location IDs.

### How it runs

1. **Parent process** reads config, queries location IDs from the GeoPackage, checks for unmapped model inputs (offering to auto-add suggested mappings), and prints active unit conversions.
2. Parent divides locations into chunks and spawns N worker subprocesses.
3. **Each worker** iterates its assigned locations: initializes models, runs all timesteps, writes a CSV per location, and reports progress counts to the parent via stdout.
4. Parent displays nested progress bars (overall + per-worker) using the reported counts.

### Library usage

```rust
use bmi_driver::{ModelRunner, BmiError};

fn main() -> Result<(), BmiError> {
    let mut runner = ModelRunner::from_config("config/realization.json")?;
    runner.initialize("cat-123")?;
    runner.run()?;

    let outputs = runner.main_outputs()?;
    println!("Completed {} timesteps", outputs.len());

    runner.finalize()?;
    Ok(())
}
```

## Configuration

The config file is `<data_dir>/config/realization.json`. It defines the model chain, forcing data source, and time range. The format is compatible with [ngen](https://github.com/NOAA-OWP/ngen) realization configs -- extra ngen-specific fields are silently ignored.

### Minimal example

```json
{
  "global": {
    "formulations": [
      {
        "name": "bmi_multi",
        "params": {
          "main_output_variable": "Q_OUT",
          "output_variables": ["Q_OUT"],
          "modules": [
            {
              "name": "bmi_c",
              "params": {
                "model_type_name": "CFE",
                "library_file": "/path/to/libcfebmi.so",
                "init_config": "./config/cat_config/CFE/{{id}}.ini",
                "registration_function": "register_bmi_cfe",
                "main_output_variable": "Q_OUT",
                "variables_names_map": {
                  "atmosphere_water__liquid_equivalent_precipitation_rate": "precip_rate"
                }
              }
            }
          ]
        }
      }
    ],
    "forcing": {
      "path": "./forcings/forcings.nc"
    }
  },
  "time": {
    "start_time": "2010-01-01 00:00:00",
    "end_time": "2010-01-02 00:00:00",
    "output_interval": 3600
  }
}
```

### Top-level fields

| Field | Required | Description |
|-------|----------|-------------|
| `global.formulations` | yes | Array with one `bmi_multi` formulation entry |
| `global.forcing.path` | yes | Path to NetCDF forcing file (relative to data_dir) |
| `time.start_time` | yes | Simulation start, `"YYYY-MM-DD HH:MM:SS"` |
| `time.end_time` | yes | Simulation end, `"YYYY-MM-DD HH:MM:SS"` |
| `time.output_interval` | no | Output interval in seconds (default: 3600) |

### Formulation fields

| Field | Required | Description |
|-------|----------|-------------|
| `name` | yes | Must be `"bmi_multi"` for multi-model orchestration |
| `params.modules` | yes | Array of module configs (see below) |
| `params.main_output_variable` | no | Primary output variable name |
| `params.output_variables` | no | Variables to include in output CSV. If empty, all model outputs are written. |

### Module fields (shared across all adapter types)

| Field | Required | Description |
|-------|----------|-------------|
| `name` | yes | Adapter type: `"bmi_c"`, `"bmi_fortran"`, `"bmi_python"`, or `"bmi_c++"` |
| `params.model_type_name` | yes | Model identifier (e.g., `"CFE"`, `"NoahOWP"`, `"SLOTH"`) |
| `params.init_config` | yes | Path to model config file. `{{id}}` is replaced with the location ID at runtime. |
| `params.main_output_variable` | yes | The primary output variable of this model |
| `params.library_file` | depends | Path to shared library. Required for C and Fortran adapters. |
| `params.registration_function` | no | Name of the C registration function (default: `"register_bmi"`) |
| `params.variables_names_map` | no | Maps model input names to source variable names (from forcings or upstream models) |
| `params.model_params` | no | Key-value parameters set on the model after initialization via `set_value()` |
| `params.python_type` | no | Python class path for the Python adapter (see below) |

## Module types

### bmi_c -- C shared libraries

Loads a C shared library via `dlopen` with `RTLD_GLOBAL` (so symbols are shared across libraries). Looks up the registration function by name, calls it to populate a struct of BMI function pointers, then calls through those pointers for all BMI operations.

```json
{
  "name": "bmi_c",
  "params": {
    "model_type_name": "CFE",
    "library_file": "/path/to/libcfebmi.so",
    "init_config": "./config/cat_config/CFE/{{id}}.ini",
    "registration_function": "register_bmi_cfe",
    "main_output_variable": "Q_OUT",
    "variables_names_map": {
      "atmosphere_water__liquid_equivalent_precipitation_rate": "QINSUR",
      "water_potential_evaporation_flux": "EVAPOTRANS"
    }
  }
}
```

The shared library must export a registration function matching this signature:

```c
Bmi* register_bmi_cfe(Bmi* model);
```

This function fills in the `Bmi` struct's function pointers and returns the pointer. The default function name is `register_bmi` if `registration_function` is omitted.

### bmi_fortran -- Fortran shared libraries

**Requires:** `--features fortran`

Loads Fortran BMI models. The adapter supports two modes:

1. **Single library** -- the `.so` implements both the registration function and the BMI interface directly.
2. **With middleware** -- a separate C-to-Fortran middleware library translates C-calling-convention calls to Fortran. The runner auto-detects a middleware library if one exists alongside the model library.

The adapter passes an opaque handle pointer as the first argument to all BMI calls, matching the convention used by Fortran BMI implementations.

```json
{
  "name": "bmi_fortran",
  "params": {
    "model_type_name": "NoahOWP",
    "library_file": "/path/to/libsurfacebmi.so",
    "init_config": "./config/cat_config/NOAH-OWP-M/{{id}}.input",
    "main_output_variable": "QINSUR",
    "variables_names_map": {
      "PRCPNONC": "precip_rate",
      "SFCTMP": "TMP_2maboveground",
      "SFCPRS": "PRES_surface",
      "Q2": "SPFH_2maboveground",
      "UU": "UGRD_10maboveground",
      "VV": "VGRD_10maboveground",
      "LWDN": "DLWRF_surface",
      "SOLDN": "DSWRF_surface"
    }
  }
}
```

### bmi_python -- Python BMI classes

**Requires:** `--features python`

Loads a Python class that implements the BMI interface. Values are exchanged via numpy arrays. There are two ways to specify the Python class:

**Option 1: `python_type` (preferred)** -- a dotted module path where the last component is the class name:

```json
{
  "name": "bmi_python",
  "params": {
    "model_type_name": "LSTM",
    "python_type": "bmi_lstm.bmi_LSTM",
    "init_config": "./config/cat_config/LSTM/{{id}}.yml",
    "main_output_variable": "land_surface_water__runoff_volume_flux",
    "variables_names_map": {
      "atmosphere_water__liquid_equivalent_precipitation_rate": "precip_rate",
      "land_surface_air__temperature": "TMP_2maboveground"
    }
  }
}
```

This imports `bmi_lstm` and instantiates `bmi_LSTM()`. The package must be installed or on `PYTHONPATH`.

**Option 2: `library_file` + `registration_function`** -- loads a `.py` file directly:

```json
{
  "name": "bmi_python",
  "params": {
    "model_type_name": "MyModel",
    "library_file": "/path/to/my_model.py",
    "registration_function": "MyBmiClass",
    "init_config": "./config/cat_config/MyModel/{{id}}.yml",
    "main_output_variable": "output_var"
  }
}
```

This reads the file, adds its parent directory to `sys.path`, imports the module by filename, and instantiates `MyBmiClass()`.

### bmi_c++ (SLOTH) -- Built-in dummy model

The `bmi_c++` adapter name with `model_type_name: "SLOTH"` activates the built-in SLOTH (Simple Lightweight Output Testing Handler) model. SLOTH returns constant values and requires no shared library. It is configured entirely through `model_params`.

Variables are defined using a special string format:

```
"variable_name(count,type,units,location)": value
```

| Component | Description |
|-----------|-------------|
| `variable_name` | The BMI output variable name |
| `count` | Array size (typically `1` for scalar) |
| `type` | Data type: `double`, `float`, or `int` |
| `units` | Unit string (e.g., `m`, `1`, `mm/s`) |
| `location` | BMI grid location (e.g., `node`) |
| `value` | Constant value returned for all timesteps |

```json
{
  "name": "bmi_c++",
  "params": {
    "model_type_name": "SLOTH",
    "init_config": "/dev/null",
    "main_output_variable": "z",
    "model_params": {
      "sloth_ice_fraction_schaake(1,double,m,node)": 0.0,
      "sloth_ice_fraction_xinanjiang(1,double,1,node)": 0.0,
      "sloth_soil_moisture_profile(1,double,1,node)": 0.0
    }
  }
}
```

## Model dependency resolution

Models are loaded in dependency order, not config order. The runner uses `variables_names_map` to determine dependencies:

- The **keys** are the model's input variable names.
- The **values** are source variable names (from forcings or upstream model outputs).

The runner performs an iterative topological sort:

1. Start with all forcing variables as available.
2. Find a module whose `variables_names_map` values are all satisfied by available variables.
3. Load that module, add its output variables to the available set.
4. Repeat until all modules are loaded.

This means modules can be listed in any order in the config. If a circular dependency or missing variable is detected, the runner reports an error.

## Unit conversion

The runner automatically converts units between source variables and model inputs. When a model input is mapped to a source, the runner reads both sides' unit strings and builds a linear conversion (`output = input * scale + offset`).

### Supported conversions

| Category | Examples |
|----------|----------|
| Length | m, mm, cm, km, ft, in |
| Pressure | Pa, kPa, hPa, mb, atm, bar |
| Temperature | K, C, F (with offset handling for K/C/F conversions) |
| Rates | mm/s, mm/h, m/s, m s^-1, mm s-1 |
| Mass flux | kg m^-2 s^-1 to/from mm/s (assumes water density 1000 kg/m^3) |
| Dimensionless | `""`, `"1"`, `"-"`, `"m/m"`, `"none"` |

### Notation variants

All of these are recognized as equivalent:

- `mm s^-1`, `mm/s`, `mm s-1`
- `kg m^-2 s^-1`, `kg/m^2/s`, `kg m-2 s-1`
- `W m^-2`, `W/m^2`, `W m-2`

Use `--units` to print all active conversions for debugging:

```bash
bmi-driver <data_dir> --units
```

If units are unknown or incompatible, the runner falls back to an identity conversion (no change) and prints a warning.

## Variable aliases

The runner knows common aliases between AORC forcing field names and CSDMS standard names. When it detects unmapped model inputs that match an available variable under a different name, it suggests adding the mapping.

| Short name | CSDMS standard name |
|------------|-------------------|
| `precip_rate` | `atmosphere_water__liquid_equivalent_precipitation_rate` |
| `TMP_2maboveground` | `land_surface_air__temperature` |
| `UGRD_10maboveground` | `land_surface_wind__x_component_of_velocity` |
| `VGRD_10maboveground` | `land_surface_wind__y_component_of_velocity` |
| `DLWRF_surface` | `land_surface_radiation__incoming_longwave_flux` |
| `DSWRF_surface` | `land_surface_radiation__incoming_shortwave_flux` |
| `PRES_surface` | `land_surface_air__pressure` |
| `SPFH_2maboveground` | `land_surface_air__specific_humidity` |
| `APCP_surface` | `land_surface_water__precipitation_volume_flux` |

On the first run, if unmapped inputs are found, the runner prompts:

```
Found unmapped model inputs that match available variables:
  [1] CFE: "atmosphere_water__liquid_equivalent_precipitation_rate" <- "QINSUR"

Add these mappings to realization.json? [y/N]
```

Answering `y` updates the config file in place.

## Config minification

Use `--minify` to strip a realization.json down to only the fields bmi-driver reads:

```bash
bmi-driver <data_dir> --minify
```

This removes:
- ngen-specific fields (`routing`, `forcing.provider`, `allow_exceed_end_time`, `fixed_time_step`, `uses_forcing_file`, etc.)
- Unknown keys not in the config schema
- Empty default-valued fields

The minified config is still valid for subsequent runs.

## NetCDF forcing format

The forcing file must be a NetCDF file with:

- A variable `ids` containing location ID strings
- A variable `Time` containing epoch timestamps (int64)
- Dimensions `catchment-id` (locations) and `time` (timesteps)
- Data variables with dimensions `[catchment-id, time]`

All forcing data for a location is preloaded into memory before running that location's models.

## Output

Each location produces a CSV at `<data_dir>/outputs/bmi-driver/<location_id>.csv`:

```csv
Time Step,Time,Q_OUT,EVAPOTRANS,QINSUR,RAIN_RATE
0,2010-01-01 00:00:00,0.000000000,0.000000000,0.000000000,0.000000000
1,2010-01-01 01:00:00,0.000012345,0.000000100,0.000001234,0.000000500
...
```

Columns are determined by `output_variables` in the formulation params. If `output_variables` is empty, all model output variables are included.

## Project structure

```
src/
├── main.rs          # CLI entry point (bmi-driver binary)
├── lib.rs           # Public exports
├── traits.rs        # Bmi trait and types
├── error.rs         # Error types
├── ffi.rs           # C FFI struct definition
├── library.rs       # Dynamic library loading (dlopen/dlsym)
├── config.rs        # Configuration parsing
├── forcings.rs      # NetCDF forcing data
├── runner.rs        # Model orchestration and dependency resolution
├── units.rs         # Automatic unit conversion
├── aliases.rs       # AORC ↔ CSDMS variable name aliases
└── adapters/
    ├── mod.rs       # Adapter exports
    ├── c.rs         # C BMI adapter
    ├── fortran.rs   # Fortran BMI adapter (feature-gated)
    ├── python.rs    # Python BMI adapter (feature-gated)
    └── sloth.rs     # SLOTH dummy model
```