Crate fortformat

Parse and work with Fortran format specifications.

§Crate structure

Most users should focus on the following four modules:

  • format_specs: handles parsing the format strings and provides types to represent each one in Rust. If you just need to get information about a format string, try this module.
  • ser, de: handle serializing and deserializing data in Fortran fixed format using the serde crate. Require the serde feature to be enabled.
  • dataframes: an extension of the serde functionality that allows deserializing multiple rows of data into a polars DataFrame. Requires the dataframes feature to be enabled.

Many of the public functions are also available in the crate root.

§What are format specifications?

Fortran programs can write and read data using fixed format. In fixed format, each piece of data is written to a text or binary file with a specific number of bytes. When reading or writing text, the formatting for each value is given by a format string, which will look something like:

(a12,i5,f12.4,e11.5)

The above string can be interpreted as:

  • a string with 12 characters (a12),
  • an integer that has 5 characters, including a negative sign if needed (i5),
  • a float with 12 characters (including a decimal point and possibly a negative sign) and 4 digits after the decimal point (f12.4), and
  • a float written in engineering/scientific notation (e.g. 6.022E+23) using 11 characters, with 5 digits after the decimal point (e11.5).

What makes such data tricky to read in more modern programming languages is that these fields can and do abut. The following is a perfectly valid string with the above format:

Hello,world!123459999999.99996.02214E+23

The four values are actually Hello,world!, 12345, 9999999.9999, and 6.02214E+23, but without knowing the format string, it can be difficult to separate out the values with no delimiting characters.
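Once the widths are known, separating the values is only a matter of slicing at fixed offsets. The following is a minimal sketch using just the standard library, not this crate's API; the helper name `split_fixed` is hypothetical, and the widths are copied by hand from the format string above.

```rust
/// Split a fixed-width record into fields of the given byte widths.
/// Returns `None` if the line is too short for the requested widths
/// or if a cut would fall inside a multi-byte character.
fn split_fixed<'a>(line: &'a str, widths: &[usize]) -> Option<Vec<&'a str>> {
    let mut fields = Vec::with_capacity(widths.len());
    let mut start = 0;
    for &w in widths {
        let end = start + w;
        // `str::get` returns None for out-of-range or non-boundary ranges.
        fields.push(line.get(start..end)?);
        start = end;
    }
    Some(fields)
}

fn main() {
    // Widths taken from the format (a12,i5,f12.4,e11.5).
    let line = "Hello,world!123459999999.99996.02214E+23";
    let fields = split_fixed(line, &[12, 5, 12, 11]).expect("line too short");
    for field in &fields {
        println!("[{field}]");
    }
}
```

Each slice is still text; converting it to a number is a separate step.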

§A brief overview of format specs

Because Fortran is such an old language, information about its format syntax can be difficult to find. Here is a short summary of the syntax as implemented in this crate (and tested against gfortran-generated output).

§Basic syntax

A format string starts with a (, followed by one or more fields separated by commas, and ends with a ). (Alternate formats accepted by Fortran are not yet implemented.)

A field consists of at least a letter indicating the type to be written and how it is to be formatted. It may be preceded by a number indicating how many times to repeat it, and followed by one or more numbers (usually with a decimal point separator) that indicate its width and precision. A field may also start with a modifier, which impacts all following fields until another modifier of the same type is given.

§Field types

  • a = string/character type. a by itself means a single character. Strings are indicated by aN, e.g. a5, a12, or a128. The number gives the maximum number of characters in the string.
  • i, o, and z = integer type. Must have a width, that is, i5 is valid but i alone is not. If a second number is given, as in i5.3, it sets the minimum number of digits to write. If the integer has fewer digits than this, it is zero-padded. For example, 42 formatted as i5.3 would be written as “  042” (two leading spaces): 5 total width, 3 digits required. Fortran does not distinguish between signed and unsigned integers. A negative sign takes up one of the available characters given by the width. o and z write the integer in octal and hexadecimal, respectively.
  • f, e, d, and g = real/float type. Must have both a width and precision, i.e. f8.3 is valid, but neither f8 nor f is. In these types, the number after the decimal point (3 in f8.3) indicates the number of digits written after the decimal place. f will always write out numbers normally while e will use scientific/engineering notation (e.g. 6.022E+23). d is similar to e, but is intended for 64-bit floats, and represents this with a D in place of the E: 6.022D+23. g will choose the format based on the magnitude of the value.
  • x = a positional specifier. This does not correspond to a value; it merely “positions” the next value. x represents a single space, and is the most common positional specifier.
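To make the integer width and minimum-digit rules concrete, here is a small standard-library sketch. The function `format_int` is hypothetical and is not part of this crate. It also assumes the usual Fortran behavior of filling the field with asterisks when a value does not fit.

```rust
/// Format `value` in the style of Fortran's `iW.M`: right-justified in
/// `width` characters and zero-padded to at least `min_digits` digits.
/// A plain `iW` corresponds to `min_digits = 1` here.
fn format_int(value: i64, width: usize, min_digits: usize) -> String {
    // Zero-pad the magnitude first, then attach the sign.
    let digits = format!("{:0>min_digits$}", value.unsigned_abs());
    let sign = if value < 0 { "-" } else { "" };
    let body = format!("{sign}{digits}");
    if body.len() > width {
        // Assumption: overflow is shown as a field full of asterisks.
        "*".repeat(width)
    } else {
        // Right-justify within the total width; the sign uses one column.
        format!("{body:>width$}")
    }
}

fn main() {
    println!("[{}]", format_int(42, 5, 3)); // [  042]
    println!("[{}]", format_int(-42, 5, 3)); // [ -042]
    println!("[{}]", format_int(123456, 5, 1)); // [*****]
}
```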

§Repeats and subgroups

A given specifier may be repeated by prefixing it with a number. For example, the string (4i5.3) indicates that four 5-character-wide integers will be written.

Multiple specifiers may be repeated by grouping them with parentheses. For example, the string (a12,3(i5,e13.5)) means one 12-character string will be written, followed by 3 sets consisting of a 5-character integer and a 13-character float. Fully expanded, this will be:

(a12,i5,e13.5,i5,e13.5,i5,e13.5)
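The expansion can be sketched as a small recursive parser. This is an illustration only, not how this crate parses formats. The names `expand` and `parse_group` are hypothetical, whitespace inside the string is not handled, and modifiers such as 1p are out of scope (a leading number is always read as a repeat count).

```rust
/// Expand repeat counts and parenthesized groups into a flat field list.
fn expand(fmt: &str) -> Result<Vec<String>, String> {
    let chars: Vec<char> = fmt.trim().chars().collect();
    let mut pos = 0;
    let fields = parse_group(&chars, &mut pos)?;
    if pos != chars.len() {
        return Err(format!("unexpected trailing input at position {pos}"));
    }
    Ok(fields)
}

/// Parse `(item,item,...)`, where `chars[*pos]` must be the opening `(`.
fn parse_group(chars: &[char], pos: &mut usize) -> Result<Vec<String>, String> {
    if chars.get(*pos) != Some(&'(') {
        return Err(format!("expected '(' at position {}", *pos));
    }
    *pos += 1;
    let mut out = Vec::new();
    loop {
        // Optional leading repeat count, e.g. the 3 in 3(i5,e13.5).
        let start = *pos;
        while *pos < chars.len() && chars[*pos].is_ascii_digit() {
            *pos += 1;
        }
        let repeat = if *pos > start {
            let text: String = chars[start..*pos].iter().collect();
            text.parse::<usize>().map_err(|e| e.to_string())?
        } else {
            1
        };
        // The item is either a nested group or a single field specifier.
        let item: Vec<String> = if chars.get(*pos) == Some(&'(') {
            parse_group(chars, pos)?
        } else {
            let start = *pos;
            while *pos < chars.len() && !matches!(chars[*pos], ',' | '(' | ')') {
                *pos += 1;
            }
            if *pos == start {
                return Err(format!("expected a field at position {start}"));
            }
            vec![chars[start..*pos].iter().collect::<String>()]
        };
        for _ in 0..repeat {
            out.extend(item.iter().cloned());
        }
        // After an item comes a separator or the end of the group.
        match chars.get(*pos) {
            Some(',') => *pos += 1,
            Some(')') => {
                *pos += 1;
                return Ok(out);
            }
            _ => return Err(format!("expected ',' or ')' at position {}", *pos)),
        }
    }
}

fn main() {
    let fields = expand("(a12,3(i5,e13.5))").expect("valid format");
    println!("({})", fields.join(","));
}
```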

§Modifiers

  • p = scale a real/float number before writing it. This is always written as Np, where N is a positive or negative integer. This has slightly different effects for f versus e/d formats.
    • For f formats, the number is multiplied by 10^N, so given a format 2pf7.3, the number 3.14 would be written as 314.000. Conversely, -2pf7.3 would write it out as 0.031. In both cases, the number of digits after the decimal is unchanged at 3.
    • For e or d formats, the decimal place shifts when N > 1, so 2pe9.3 would print 3.14 as 31.40E-01. For N < 0, it only shifts the place of the numbers, so -2pe8.3 would print 3.14 as 0.003E+03. N = 1 is a bit of a special case, in that it shifts the digits to write the value as e.g. 3.140E+00 instead of the default 0.314E+01.
    • Note that f types actually have their value changed, while e and d types print the same value, just with different exponents.
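The f case is simple enough to sketch with the standard library alone. The function `format_f_scaled` below is hypothetical and not part of this crate, and Rust's rounding of the last digit is assumed (not verified) to match gfortran's in every case.

```rust
/// Write `value` roughly as `<scale>pf<width>.<precision>` would:
/// multiply by 10^scale, then print with a fixed number of decimals,
/// right-justified in `width` characters.
fn format_f_scaled(value: f64, scale: i32, width: usize, precision: usize) -> String {
    let scaled = value * 10f64.powi(scale);
    let text = format!("{scaled:.precision$}");
    if text.len() > width {
        // Assumption: a value that does not fit is shown as asterisks.
        "*".repeat(width)
    } else {
        format!("{text:>width$}")
    }
}

fn main() {
    println!("[{}]", format_f_scaled(3.14, 2, 7, 3)); // [314.000]
    println!("[{}]", format_f_scaled(3.14, -2, 7, 3)); // [  0.031]
    println!("[{}]", format_f_scaled(3.14, 0, 7, 3)); // [  3.140]
}
```

The e and d cases are not covered here, since they move the decimal point and adjust the exponent rather than changing the value.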

Modifiers don’t just affect the field they are attached to, but affect all relevant fields later in the format string (until the next instance of the same modifier). So in a string like:

(f5.3,1pe12.5,e12.5,f6.3,-2pf7.5,f7.5)

  • the first f5.3 is unaffected,
  • the next e12.5,e12.5,f6.3 are all affected by the 1p modifier, and
  • the final f7.5,f7.5 are both affected by the -2p modifier.
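This carry-over can be sketched as a single pass that remembers the most recent scale factor. The function `active_scales` is hypothetical and not part of this crate. It expects an already flattened list of lowercase fields, and it assumes that g fields are affected in the same way as f, e, and d.

```rust
/// Pair each field with the scale factor in effect when it is reached.
/// Fields that are not real-valued are reported with a scale of 0.
fn active_scales(fields: &[&str]) -> Vec<(String, i32)> {
    let mut scale = 0; // no scaling until a `p` modifier appears
    let mut out = Vec::new();
    for field in fields {
        let mut spec = *field;
        // A leading `<int>p` updates the scale for this and all later fields.
        if let Some(idx) = spec.find('p') {
            if let Ok(n) = spec[..idx].parse::<i32>() {
                scale = n;
                spec = &spec[idx + 1..];
            }
        }
        // Assumption: only f, e, d, and g fields use the scale factor.
        let is_real = matches!(spec.chars().next(), Some('f' | 'e' | 'd' | 'g'));
        out.push((spec.to_string(), if is_real { scale } else { 0 }));
    }
    out
}

fn main() {
    let fields = ["f5.3", "1pe12.5", "e12.5", "f6.3", "-2pf7.5", "f7.5"];
    for (spec, scale) in active_scales(&fields) {
        println!("{spec}: scale {scale}");
    }
}
```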

§Re-exports

pub use format_specs::FortField;
pub use format_specs::FortFormat;
pub use fort_error::FError;
pub use de::from_str;
pub use de::from_str_custom;
pub use de::from_str_with_fields;
pub use de::from_str_with_fields_custom;
pub use de::DeSettings;
pub use ser::to_bytes;
pub use ser::to_bytes_with_fields;
pub use ser::to_string;
pub use ser::to_string_with_fields;
pub use ser::to_writer;
pub use ser::to_writer_with_fields;
pub use serde_common::SError;
pub use serde_common::DError;
pub use dataframes::read_to_dataframe;

§Modules

dataframes
Read Fortran-formatted data directly as a DataFrame.
de
Deserialize data written according to a Fortran format string.
format_specs
Represent Fortran formats as Rust types.
fort_error
Errors in Fortran format strings or values.
ser
Serialize data according to a Fortran format string.
serde_common
Errors in serializing/deserializing Fortran-formatted data.