Parse and work with Fortran format specifications.
§Crate structure
Most users should focus on the following four modules:
- format_specs: handles parsing the format strings and provides types to represent each one in Rust. If you just need to get information about a format string, try this module.
- ser, de: handles serializing and deserializing data in Fortran fixed format using the serde crate. Requires the serde feature be activated.
- dataframes: an extension of the serde functionality which allows deserializing multiple rows of data into a polars DataFrame. Requires the dataframes feature be activated.
Many of the public functions are also available in the crate root.
§What are format specifications?
Fortran programs can write and read data using fixed format. In fixed format, each piece of data is written to a text or binary file with a specific number of bytes. When reading or writing text, the formatting for each value is given by a format string, which will look something like:
(a12,i5,f12.4,e11.5)
The above string can be interpreted as:
- a string with 12 characters (a12),
- an integer that has 5 characters, including a negative sign if needed (i5),
- a float with 12 characters (including a decimal point and possibly a negative sign), and 4 digits after the decimal point (f12.4), and
- a float written in engineering/scientific notation (e.g. 6.022E+23) using 11 characters and with 5 digits after the decimal place (e11.5).
What makes such data tricky to read in more modern programming languages is that these fields can and do abut. The following is a perfectly valid string with the above format:
Hello,world!123459999999.99996.02214E+23
The four values are actually Hello,world!, 12345, 9999999.9999, and 6.02214E+23, but without knowing the format string, it can be difficult to separate out the values with no delimiting characters.
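To make the abutting-fields problem concrete, here is a minimal sketch that slices the example line using only the widths implied by (a12,i5,f12.4,e11.5). The function name split_fixed is hypothetical and not part of this crate's API, and the sketch assumes ASCII input so that one character is one byte.

```rust
/// Split `line` into consecutive slices of the given byte widths.
/// Hypothetical helper for illustration; not part of this crate.
/// Returns None if the line is shorter than the widths require.
fn split_fixed<'a>(line: &'a str, widths: &[usize]) -> Option<Vec<&'a str>> {
    let mut out = Vec::with_capacity(widths.len());
    let mut start = 0;
    for &w in widths {
        let end = start + w;
        // `get` returns None rather than panicking on a short line.
        out.push(line.get(start..end)?);
        start = end;
    }
    Some(out)
}
```

Calling split_fixed with the widths [12, 5, 12, 11] on the line above recovers the four values as text; converting them to numbers is a separate step.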
§A brief overview of format specs
Because Fortran is such an old language, it can be difficult to find information about its format syntax. Here is a short summary of the syntax as implemented in this crate (and tested against gfortran-generated output).
§Basic syntax
A format string starts with a (, followed by one or more fields separated by commas, and ends with a ). (Alternate formats accepted by Fortran are not yet implemented.)
A field consists of at least a letter indicating the type to be written and how it is to be formatted. It may be preceded by a number indicating how many times to repeat it, and followed by one or more numbers (usually with a decimal point separator) that indicate its width and precision. A field may also start with a modifier, which impacts all following fields until another modifier of the same type is given.
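As a rough illustration of that anatomy (optional repeat count, type letter, optional width and precision), the sketch below parses a single field such as 4i5.3. The Field struct and parse_field function are hypothetical names, not this crate's types, and the sketch deliberately ignores modifiers and nested groups.

```rust
/// Hypothetical representation of one field; not this crate's FortField.
#[derive(Debug, PartialEq)]
struct Field {
    repeat: usize,
    kind: char,
    width: Option<u32>,
    precision: Option<u32>,
}

/// Parse a single field like "f12.4" or "4i5.3".
/// Returns None for anything this sketch does not understand.
fn parse_field(s: &str) -> Option<Field> {
    let s = s.trim().to_ascii_lowercase();
    // Everything before the first letter is the repeat count (default 1).
    let letter_at = s.find(|c: char| c.is_ascii_alphabetic())?;
    let repeat = if letter_at == 0 { 1 } else { s[..letter_at].parse().ok()? };
    let kind = s[letter_at..].chars().next()?;
    // Everything after the letter is "width" or "width.precision".
    let rest = &s[letter_at + 1..];
    let (w, p) = match rest.split_once('.') {
        Some((w, p)) => (w, Some(p)),
        None => (rest, None),
    };
    let width = if w.is_empty() { None } else { Some(w.parse().ok()?) };
    let precision = match p {
        Some(p) => Some(p.parse().ok()?),
        None => None,
    };
    Some(Field { repeat, kind, width, precision })
}
```

Under these assumptions, f12.4 parses to a single f field with width 12 and precision 4, while a bare a parses with no width at all.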
§Field types
- a = string/character type. a by itself means a single character. Strings are indicated by aN, e.g. a5, a12, or a128. The number gives the maximum number of characters in the string.
- i, o, and z = integer type. Must have a width, that is, i5 is valid but i alone is not. If a second number is given, as in i5.3, the second number indicates the minimum number of digits that must be written. If the integer has fewer digits than this, it is zero-padded. For example, 42 formatted as i5.3 would be written as "  042" (5 total width, 3 digits required). Fortran does not distinguish between signed and unsigned integers. A negative sign takes up one of the available characters given by the width. o and z write the integer in octal and hexadecimal, respectively.
- f, e, d, and g = real/float type. Must have both a width and precision, i.e. f8.3 is valid, but neither f8 nor f is. In these types, the number after the decimal (3 in f8.3) indicates the number of digits written after the decimal place. f will always write out numbers normally while e will use scientific/engineering notation (e.g. 6.022E+23). d is similar to e, but is intended for 64-bit floats, and represents this with a D in place of the E: 6.022D+23. g will choose the format based on the magnitude of the value.
- x = a positional specifier. This does not correspond to a value, it merely "positions" the next value. x represents a single space, and is the most common positional specifier.
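The zero-padding rule for integer fields such as i5.3 can be sketched with the standard formatter. The function format_int is a hypothetical helper, not this crate's serializer; it returns None when the value does not fit in the width, which is a simplification chosen for this sketch.

```rust
/// Format `value` right-aligned in `width` characters with at least
/// `min_digits` digits, zero-padding the digits if needed.
/// Hypothetical helper; returns None if the result would not fit.
fn format_int(value: i64, width: usize, min_digits: usize) -> Option<String> {
    // Zero-pad the magnitude first so the sign stays outside the padding.
    let digits = format!("{:0>min$}", value.unsigned_abs(), min = min_digits);
    let body = if value < 0 { format!("-{}", digits) } else { digits };
    if body.len() > width {
        return None;
    }
    // Right-align within the total field width using spaces.
    Some(format!("{:>w$}", body, w = width))
}
```

Note that the negative sign consumes one of the five available characters, so -42 leaves only one leading space.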
§Repeats and subgroups
A given specifier may be repeated by prefixing it with a number. For example, the string (4i5.3) indicates that four 5-character-wide integers will be written.
Multiple specifiers may be repeated by grouping them with parentheses. For example, the string
(a12,3(i5,e13.5))
means one 12-character string will be written, followed by 3 sets consisting of
a 5-character integer and a 13-character float. Fully expanded, this will be:
(a12,i5,e13.5,i5,e13.5,i5,e13.5)
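The expansion above can be sketched as a small recursive routine. The names expand and parse_group are hypothetical and this is not how the crate's parser is necessarily written; the sketch handles only repeat counts and parenthesised groups, and does not handle modifiers or embedded whitespace.

```rust
/// Expand repeat counts and groups into a flat list of fields.
/// Hypothetical helper; returns None on input it cannot handle.
fn expand(format: &str) -> Option<Vec<String>> {
    let chars: Vec<char> = format.trim().chars().collect();
    let mut pos = 0;
    let fields = parse_group(&chars, &mut pos)?;
    // The outermost group must consume the whole string.
    if pos == chars.len() { Some(fields) } else { None }
}

/// Parse one "(item,item,...)" group starting at `*pos`.
fn parse_group(chars: &[char], pos: &mut usize) -> Option<Vec<String>> {
    if chars.get(*pos).copied() != Some('(') {
        return None;
    }
    *pos += 1;
    let mut out = Vec::new();
    loop {
        // Optional repeat count in front of a field or a nested group.
        let start = *pos;
        while *pos < chars.len() && chars[*pos].is_ascii_digit() {
            *pos += 1;
        }
        let repeat: usize = if *pos > start {
            chars[start..*pos].iter().collect::<String>().parse().ok()?
        } else {
            1
        };
        // The item is either a nested group or a single field.
        let item: Vec<String> = if chars.get(*pos).copied() == Some('(') {
            parse_group(chars, pos)?
        } else {
            let field_start = *pos;
            while *pos < chars.len() && chars[*pos] != ',' && chars[*pos] != ')' {
                *pos += 1;
            }
            if *pos == field_start {
                return None;
            }
            vec![chars[field_start..*pos].iter().collect()]
        };
        for _ in 0..repeat {
            out.extend(item.iter().cloned());
        }
        // A comma continues the group; a closing parenthesis ends it.
        match chars.get(*pos).copied() {
            Some(',') => *pos += 1,
            Some(')') => {
                *pos += 1;
                return Some(out);
            }
            _ => return None,
        }
    }
}
```

Because the repeat count is applied after the nested group has been parsed, groups nested more than one level deep expand in the same way.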
§Modifiers
- p = scale a real/float number before writing it. This is always written as Np, where N is a positive or negative integer. This has slightly different effects for f versus e/d formats.
  - An f format means that the number will be multiplied by 10^N, so given a format 2pf7.3, the number 3.14 would be written as 314.000. Conversely, -2pf7.3 would write it out as 0.031. In both cases, the number of digits after the decimal is unchanged at 3.
  - For e or d formats, the decimal place shifts when N > 1, so 2pe9.3 would print 3.14 as 31.40E-01. For N < 0, it only shifts the place of the numbers, so -2pe8.3 would print 3.14 as 0.003E+03. N = 1 is a bit of a special case, in that it shifts the digits to write the value as e.g. 3.140E+00 instead of the default 0.314E+01.
  - Note that f types actually have their value changed, while e and d types print the same value, just with different exponents.
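For the f case, the effect of a scale factor can be sketched as a multiplication by 10^N followed by ordinary fixed-point formatting. The function write_scaled_f is a hypothetical illustration, not this crate's serializer, and it covers only f formats; the e and d behaviour described above is not modelled here.

```rust
/// Write `value` as an f field of the given width and precision after
/// applying a scale factor of `scale` (the N in "Np").
/// Hypothetical helper covering f formats only.
fn write_scaled_f(value: f64, scale: i32, width: usize, precision: usize) -> String {
    // An f format multiplies the value itself by 10^N before writing.
    let scaled = value * 10f64.powi(scale);
    // Numbers are right-aligned by default, so the field is space-padded on the left.
    format!("{:w$.p$}", scaled, w = width, p = precision)
}
```

With width 7 and precision 3, a scale of 2 fills the field exactly for 3.14, while a scale of -2 leaves two leading spaces in front of 0.031.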
Modifiers don’t just affect the field they are attached to, but affect all relevant fields later in the format string (until the next instance of the same modifier). So in a string like:
(f5.3,1pe12.5,e12.5,f6.3,-2pf7.5,f7.5)
- the first f5.3 is unaffected,
- the next e12.5,e12.5,f6.3 are all affected by the 1p modifier, and
- the final f7.5,f7.5 are both affected by the -2p modifier.
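This carry-over can be sketched by walking the fields in order while tracking the most recent scale factor. The function apply_scale is a hypothetical helper, not part of this crate, and the sketch assumes the fields have already been split on commas and that the scale starts at 0.

```rust
/// Pair each field with the scale factor in effect when it is reached.
/// Hypothetical helper; assumes a starting scale of 0.
fn apply_scale(fields: &[&str]) -> Vec<(i32, String)> {
    let mut scale = 0;
    let mut out = Vec::new();
    for field in fields {
        let lower = field.to_ascii_lowercase();
        let mut rest = lower.as_str();
        // A leading "Np" updates the scale for this and all later fields.
        if let Some(idx) = lower.find('p') {
            if let Ok(n) = lower[..idx].parse::<i32>() {
                scale = n;
                rest = &lower[idx + 1..];
            }
        }
        // A bare modifier with nothing after it produces no field.
        if !rest.is_empty() {
            out.push((scale, rest.to_string()));
        }
    }
    out
}
```

Applied to the six fields of the format string above, the first field keeps scale 0, the next three carry 1, and the last two carry -2.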
§Re-exports
pub use format_specs::FortField;
pub use format_specs::FortFormat;
pub use fort_error::FError;
pub use de::from_str;
pub use de::from_str_custom;
pub use de::from_str_with_fields;
pub use de::from_str_with_fields_custom;
pub use de::DeSettings;
pub use ser::to_bytes;
pub use ser::to_bytes_with_fields;
pub use ser::to_string;
pub use ser::to_string_with_fields;
pub use ser::to_writer;
pub use ser::to_writer_with_fields;
pub use serde_common::SError;
pub use serde_common::DError;
pub use dataframes::read_to_dataframe;
§Modules
- dataframes: Read Fortran-formatted data directly as a DataFrame.
- de: Deserialize data written according to a Fortran format string.
- format_specs: Represent Fortran formats as Rust types.
- fort_error: Errors in Fortran format strings or values.
- ser: Serialize data according to a Fortran format string.
- serde_common: Errors in serializing/deserializing Fortran-formatted data.