Rust library for loading, structuring, and querying astronomical observation datasets — with trajectory grouping, multi-observer support, and efficient lookups.
photom provides a type-safe pipeline for ingesting astrometric and photometric
measurements, associating them with ground-based observatories, and grouping them into
trajectories of moving objects. The library is designed around one primary dataset
type — observation_dataset::ObsDataset for flat observation collections.
§Features
- Polars ingestion (polars feature) — load observations from a DataFrame or LazyFrame with full schema validation.
- Parallel iteration (parallel feature) — process observations, nights, and trajectories in parallel via rayon with zero data copying.
- ADES ingestion (ades feature) — load observations directly from MPC ADES XML files (observation_dataset::ObsDataset::from_ades), supporting both structured (obsBlock/obsContext) and flat formats, with automatic MPC observer resolution.
- MPC 80-column ingestion (mpc_80_col feature) — load observations from the classic MPC fixed-width 80-column ASCII format (observation_dataset::ObsDataset::from_mpc_80_col), with automatic trajectory grouping and nom-based field parsing.
- Parquet ingestion via DataFusion (datafusion feature) — load observations from any Parquet file reachable by URI (file://, http://, https://, hdfs://) using Apache Arrow / DataFusion (observation_dataset::ObsDataset::from_parquet_uri and its async counterpart), with automatic contiguous index optimisation.
- Serialisation / deserialisation (serde feature) — persist and restore an observation_dataset::ObsDataset (and all constituent types) via serde. Runtime-only state (the lazy MPC observatory cache and all derived index maps) is excluded from the serialised form and rebuilt transparently on deserialisation.
- Multi-observer support — MPC observatory codes (resolved lazily from the MPC website), custom geodetic sites (interned and deduplicated), or unknown observer.
- Trajectory grouping — group observations by a traj_id column; supports both integer (UInt32) and string (String) identifiers.
- Three astrometric error models — FCCT14, CBM10, and VFCC17, used to assign measurement accuracies to MPC-coded observatories.
§Modules
| Module | Description |
|---|---|
coordinates | Celestial coordinate types and coordinate-system conversions |
coordinates::equatorial | coordinates::equatorial::EquCoord — equatorial sky position (RA, Dec) with 1-σ uncertainties, Vincenty angular separation, spherical midpoint, and covariance propagation |
coordinates::cartesian | coordinates::cartesian::CartesianCoord / coordinates::cartesian::CartesianCoordCov — Cartesian unit-sphere position with optional 3×3 covariance and inverse propagation back to equatorial coordinates |
coordinates::cov2 | coordinates::cov2::Cov2 — symmetric 2×2 covariance matrix for tangent-plane error ellipses; eigenvalues, Mahalanobis distance, and isotropic inflation |
coordinates::gnomonic_projection | coordinates::gnomonic_projection::TangentPlane / coordinates::gnomonic_projection::TangentPoint / coordinates::gnomonic_projection::TangentVec — gnomonic (tangent-plane) projection between equatorial sky coordinates and a local 2-D Cartesian frame |
photometry | Photometric measurement types: apparent magnitude, uncertainty, and bandpass filter (photometry::Photometry, photometry::Filter) |
observation_dataset | Core observation types (observation_dataset::observation::Observation, observation_dataset::ObsDataset) |
observer | Ground-based observatory representation (observer::Observer) and geodetic utilities |
observer::error_model | Astrometric error model variants (observer::error_model::ObsErrorModel: FCCT14, CBM10, VFCC17) |
constants | Physical and geodetic constants (Earth axes, AU, etc.) |
io | Internal ingestion backends (Polars adapter, schema validation) |
io::ades | ADES XML ingestion backend (io::ades) |
io::mpc_80_col | MPC 80-column ingestion backend (io::mpc_80_col) |
io::datafusion | DataFusion/Arrow Parquet ingestion backend (io::datafusion) |
§Type Aliases
The crate exports five primitive type aliases used throughout the API to make units explicit in function signatures:
| Alias | Underlying type | Unit |
|---|---|---|
Arcseconds | f64 | Angle in arcseconds |
Radians | f64 | Angle in radians |
Degrees | f64 | Angle in degrees |
MJDTT | f64 | Modified Julian Date (Terrestrial Time) in days |
Meters | f64 | Distance in metres |
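Because each alias is a plain f64, the compiler will not reject an arcsecond value passed where radians are expected; the alias only documents intent in a signature. The sketch below illustrates that convention with locally redefined aliases. The helpers arcsec_to_rad and deg_to_rad are hypothetical names chosen for illustration and are not confirmed photom functions.

```rust
use std::f64::consts::PI;

// Local stand-ins that mirror the alias names in the table above.
pub type Arcseconds = f64;
pub type Degrees = f64;
pub type Radians = f64;

/// Hypothetical helper: convert an angle from arcseconds to radians.
pub fn arcsec_to_rad(angle: Arcseconds) -> Radians {
    angle * PI / (180.0 * 3600.0)
}

/// Hypothetical helper: convert an angle from degrees to radians.
pub fn deg_to_rad(angle: Degrees) -> Radians {
    angle.to_radians()
}
```

For scale, one arcsecond is roughly 4.85e-6 rad, so an uncertainty of 1.745e-5 rad corresponds to 0.001 degrees, or about 3.6 arcseconds.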
§DataFrame Schema
Requires the polars feature.
When loading data via observation_dataset::ObsDataset::from_polars or
observation_dataset::ObsDataset::from_lazy, the input frame must conform
to the following column layout.
§Mandatory base columns (non-nullable)
| Column | Polars type | Description |
|---|---|---|
id | UInt64 | Unique observation identifier |
ra | Float64 | Right ascension (radians) |
ra_err | Float64 | Right ascension uncertainty (radians) |
dec | Float64 | Declination (radians) |
dec_err | Float64 | Declination uncertainty (radians) |
magnitude | Float64 | Apparent magnitude |
mag_err | Float64 | Magnitude uncertainty |
filter | String | Photometric filter label |
mjd_tt | Float64 | Epoch (MJD, Terrestrial Time) |
§Optional observer columns (nullable; column may be absent)
| Column | Polars type | Description |
|---|---|---|
obs_lon | Float64 | Geodetic longitude (radians, east positive) |
obs_lat | Float64 | Geodetic latitude (radians) |
obs_alt | Float64 | Altitude above ellipsoid (metres) |
obs_ra_acc | Float64 | RA accuracy (radians) — required when the geodetic triplet is set |
obs_dec_acc | Float64 | Dec accuracy (radians) — required when the geodetic triplet is set |
mpc_code_obs | String | Three-byte ASCII MPC code (takes precedence over geodetic columns) |
§Optional grouping columns
| Column | Polars type | Description |
|---|---|---|
traj_id | UInt32 or String | Trajectory identifier; nullable — null rows are loaded into the ObsDataset but are not assigned to any trajectory |
night_id | UInt32 | Night identifier; nullable — null rows are included in the ObsDataset but are not assigned to any night |
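The null-handling rule for grouping columns can be pictured with a small std-only sketch: every row stays in the observation list, but only rows with a non-null identifier are recorded in the group index. The function build_split_index and its HashMap layout are assumptions made for illustration; the crate's actual index types are not shown in this document.

```rust
use std::collections::HashMap;

/// Build a split index: trajectory id -> positions of its observations.
/// Rows whose id is `None` remain in the dataset but are not indexed.
pub fn build_split_index(traj_ids: &[Option<u32>]) -> HashMap<u32, Vec<usize>> {
    let mut index: HashMap<u32, Vec<usize>> = HashMap::new();
    for (pos, id) in traj_ids.iter().enumerate() {
        if let Some(id) = id {
            // Positions are pushed in row order, so each list stays sorted.
            index.entry(*id).or_default().push(pos);
        }
    }
    index
}
```

With the input [Some(7), None, Some(7), Some(3)], row 1 is still counted as an observation but appears under no trajectory.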
§Observer resolution (per row, in precedence order)
1. mpc_code_obs non-null → observer::dataset::ObserverId::MpcCode (MPC site, resolved lazily).
2. obs_lon, obs_lat, and obs_alt all non-null → observer::dataset::ObserverId::IntId (geodetic site; obs_ra_acc and obs_dec_acc must also be non-null).
3. Otherwise → no observer (None).
A partially-null geodetic triplet or a complete triplet without accuracy values causes the ingestion to return an error.
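The precedence and error rules above can be sketched as a single function over one row's optional observer values. ResolvedObserver, ObserverColumns, and resolve_observer are hypothetical names used only for this illustration; the sketch also assumes that a non-null MPC code short-circuits any check of the geodetic columns, which the text implies but does not state outright.

```rust
/// Hypothetical stand-in for the observer resolved for one row.
#[derive(Debug, PartialEq)]
pub enum ResolvedObserver {
    Mpc(String),
    Geodetic { lon: f64, lat: f64, alt: f64, ra_acc: f64, dec_acc: f64 },
}

/// The optional observer columns of a single row.
#[derive(Default)]
pub struct ObserverColumns<'a> {
    pub mpc_code_obs: Option<&'a str>,
    pub obs_lon: Option<f64>,
    pub obs_lat: Option<f64>,
    pub obs_alt: Option<f64>,
    pub obs_ra_acc: Option<f64>,
    pub obs_dec_acc: Option<f64>,
}

pub fn resolve_observer(row: &ObserverColumns) -> Result<Option<ResolvedObserver>, String> {
    // 1. An MPC code takes precedence over the geodetic columns.
    if let Some(code) = row.mpc_code_obs {
        return Ok(Some(ResolvedObserver::Mpc(code.to_owned())));
    }
    match (row.obs_lon, row.obs_lat, row.obs_alt) {
        // 2. A complete triplet also needs both accuracy values.
        (Some(lon), Some(lat), Some(alt)) => match (row.obs_ra_acc, row.obs_dec_acc) {
            (Some(ra_acc), Some(dec_acc)) => {
                Ok(Some(ResolvedObserver::Geodetic { lon, lat, alt, ra_acc, dec_acc }))
            }
            _ => Err("complete geodetic triplet without accuracy values".to_owned()),
        },
        // 3. Nothing set: the row has no observer.
        (None, None, None) => Ok(None),
        _ => Err("partially-null geodetic triplet".to_owned()),
    }
}
```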
§Ingestion arguments (FromPolarsArgs)
Requires the polars feature.
Both observation_dataset::ObsDataset::from_polars and
observation_dataset::ObsDataset::from_lazy accept a
FromPolarsArgs value that controls how the ingestion pipeline behaves.
Use FromPolarsArgs::default() to get sensible out-of-the-box settings,
or construct the struct explicitly to override individual fields.
| Field | Type | Default | Description |
|---|---|---|---|
error_model | Option<ObsErrorModel> | None | Astrometric error model used to assign accuracies to MPC-coded observatories; None leaves MPC observer accuracies unset until ObsDataset::set_error_model is called |
do_rechunk | Option<bool> | Some(false) | When true, forces all multi-chunk columns to be merged into a single contiguous Arrow chunk before ingestion; set to Some(false) when the caller has already guaranteed single-chunk layout (e.g. after reading a Parquet file with rechunk: true) |
contiguous_choice | Option<ContiguousChoice> | Some(ContiguousNight) | Which grouping column (if any) to sort the frame by before iteration; sorting allows the corresponding index to use compact contiguous ranges instead of per-row index vectors (see below) |
§Contiguous index optimisation (ContiguousChoice)
By default the ingestion pipeline sorts the input frame by night_id
(ContiguousChoice::ContiguousNight) so that all observations belonging to
the same night occupy a single contiguous block in the output observations
vector. This lets the night index store a compact (start, end) range for
each night instead of a Vec of scattered positions, which saves memory and
improves cache locality during sequential and parallel night iteration.
Setting contiguous_choice to ContiguousChoice::ContiguousTraj applies the
same optimisation to trajectories instead. Setting it to None disables the
sort entirely; both indices will use the Vec-based split representation.
Only one grouping column can be made contiguous at a time. The other column (if present in the frame) is always built as a split index.
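To make the range representation concrete, here is a std-only sketch of how a contiguous index could be derived once the identifiers are sorted. The function build_contiguous_index is hypothetical, it sorts a bare list of night ids rather than whole observation rows, and it assumes half-open (start, end) ranges; the crate's real layout may differ.

```rust
use std::collections::BTreeMap;

/// Sort the night ids, then record one half-open `(start, end)` range per night.
/// Returns the sorted ids together with the range index.
pub fn build_contiguous_index(
    mut night_ids: Vec<u32>,
) -> (Vec<u32>, BTreeMap<u32, (usize, usize)>) {
    night_ids.sort_unstable();
    let mut index = BTreeMap::new();
    let mut start = 0;
    while start < night_ids.len() {
        let id = night_ids[start];
        // Advance `end` past every entry that shares this id.
        let mut end = start + 1;
        while end < night_ids.len() && night_ids[end] == id {
            end += 1;
        }
        index.insert(id, (start, end));
        start = end;
    }
    (night_ids, index)
}
```

Each night costs two integers regardless of how many observations it holds, whereas a split index stores one position per observation.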
use photom::io::polars::{ContiguousChoice, FromPolarsArgs};
use photom::observer::error_model::ObsErrorModel;
use photom::observation_dataset::ObsDataset;
// Sort by traj_id so trajectory iteration is more efficient.
let dataset = ObsDataset::from_polars(
    &df,
    FromPolarsArgs {
        error_model: Some(ObsErrorModel::FCCT14),
        contiguous_choice: Some(ContiguousChoice::ContiguousTraj),
        ..Default::default()
    },
)?;
§Usage Examples
§Build a minimal DataFrame and load observations
use polars::prelude::*;
use photom::io::polars::FromPolarsArgs;
use photom::observation_dataset::ObsDataset;
use photom::observer::error_model::ObsErrorModel;
// Construct a two-row DataFrame matching the required schema.
// RA and Dec are in radians; errors are in radians.
// Observer accuracy columns (obs_ra_acc, obs_dec_acc) are also in radians.
let df = df! {
    "id" => &[1_u64, 2_u64],
    "ra" => &[1.4633_f64, 1.4682_f64], // radians
    "ra_err" => &[1.745e-5_f64, 1.745e-5_f64], // radians (0.001 deg, ~3.6 arcsec)
    "dec" => &[0.3840_f64, 0.3847_f64], // radians
    "dec_err" => &[1.745e-5_f64, 1.745e-5_f64], // radians (0.001 deg, ~3.6 arcsec)
    "magnitude" => &[19.3_f64, 19.5_f64],
    "mag_err" => &[0.05_f64, 0.05_f64],
    "filter" => &["r", "r"],
    "mjd_tt" => &[60000.0_f64, 60000.03_f64],
}?;
// Ingestion options go through FromPolarsArgs; unset fields keep their defaults.
let dataset = ObsDataset::from_polars(
    &df,
    FromPolarsArgs {
        error_model: Some(ObsErrorModel::FCCT14),
        ..Default::default()
    },
)?;
for obs in dataset.iter_observations() {
    println!("{} {:?}", obs.id, obs.equ_coord);
}
§Use an MPC observatory code
Add an optional mpc_code_obs column (String, nullable) to associate each
observation with an MPC-registered observatory. The accuracy values for MPC
sites are derived from the chosen ObsErrorModel.
use polars::prelude::*;
use photom::io::polars::FromPolarsArgs;
use photom::observation_dataset::ObsDataset;
use photom::observer::error_model::ObsErrorModel;
let df = df! {
    "id" => &[1_u64],
    "ra" => &[1.4633_f64], // radians
    "ra_err" => &[1.745e-5_f64], // radians
    "dec" => &[0.3840_f64], // radians
    "dec_err" => &[1.745e-5_f64], // radians
    "magnitude" => &[19.3_f64],
    "mag_err" => &[0.05_f64],
    "filter" => &["r"],
    "mjd_tt" => &[60000.0_f64],
    "mpc_code_obs" => &[Some("F51")], // Haleakalā Pan-STARRS 1
}?;
// The error model supplies the accuracy values for MPC-coded sites.
let dataset = ObsDataset::from_polars(
    &df,
    FromPolarsArgs {
        error_model: Some(ObsErrorModel::FCCT14),
        ..Default::default()
    },
)?;
§Group observations by trajectory
use polars::prelude::*;
use photom::observation_dataset::ObsDataset;
use photom::io::polars::FromPolarsArgs;
use photom::TrajId;
// traj_id can be UInt32 or String; null rows are loaded but not grouped.
let df = df! {
    "id" => &[1_u64, 2_u64, 3_u64],
    "ra" => &[1.4633_f64, 1.4682_f64, 0.1745_f64], // radians
    "ra_err" => &[1.745e-5_f64; 3], // radians
    "dec" => &[0.3840_f64, 0.3847_f64, 0.0873_f64], // radians
    "dec_err" => &[1.745e-5_f64; 3], // radians
    "magnitude" => &[19.3_f64, 19.5_f64, 18.0_f64],
    "mag_err" => &[0.05_f64; 3],
    "filter" => &["r", "r", "g"],
    "mjd_tt" => &[60000.0_f64, 60000.03_f64, 60001.0_f64],
    "traj_id" => &[Some("2020 AV2"), Some("2020 AV2"), None],
}?;
let dataset = ObsDataset::from_polars(&df, FromPolarsArgs::default())?;
let tid = TrajId::Str("2020 AV2".to_owned());
if let Some(iter) = dataset.iter_trajectory_observations(&tid) {
    println!("{} observations in trajectory", iter.count());
}
§Load observations from a LazyFrame
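use polars::prelude::*;
use photom::io::polars::FromPolarsArgs;
use photom::observation_dataset::ObsDataset;
use photom::observer::error_model::ObsErrorModel;
// Any DataFrame can be turned into a LazyFrame with .lazy().
let args = FromPolarsArgs { error_model: Some(ObsErrorModel::VFCC17), ..Default::default() };
let dataset = ObsDataset::from_lazy(df.lazy(), args)?;
§Coordinate utilities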
coordinates::equatorial::EquCoord bundles a sky position (RA, Dec) with
its 1-σ uncertainties, all stored in radians.
use photom::coordinates::equatorial::EquCoord;
// Construct from degrees — values are converted to radians internally.
let a = EquCoord::from_degrees(10.0, 0.001, 20.0, 0.001);
let b = EquCoord::from_degrees(10.5, 0.001, 20.5, 0.001);
// Great-circle separation via the Vincenty formula (result in radians).
let sep = a.angular_separation(&b);
// Vector-averaging midpoint on the sphere.
let mid = a.spherical_midpoint(&b);
To propagate astrometric uncertainties through the spherical-to-Cartesian
mapping use coordinates::equatorial::EquCoordCov::to_cartesian_cov, which
returns a coordinates::cartesian::CartesianCoordCov containing the full
3×3 covariance matrix. The inverse conversion is
coordinates::cartesian::CartesianCoordCov::to_equatorial_cov.
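For readers unfamiliar with the Vincenty form of the great-circle separation mentioned in this section, the following is a minimal standalone sketch on raw radian values. The free function angular_separation below is an illustration of the formula only; it is not the crate's EquCoord method and makes no claim about how that method is implemented.

```rust
/// Great-circle separation (radians) between two (RA, Dec) positions in radians,
/// using the Vincenty formula specialised to a sphere. The atan2 form stays
/// well-conditioned for both very small and near-antipodal separations.
pub fn angular_separation(ra1: f64, dec1: f64, ra2: f64, dec2: f64) -> f64 {
    let (sin_d1, cos_d1) = dec1.sin_cos();
    let (sin_d2, cos_d2) = dec2.sin_cos();
    let (sin_dra, cos_dra) = (ra2 - ra1).sin_cos();
    // Numerator: norm of the cross product of the two unit vectors.
    let num = (cos_d2 * sin_dra).hypot(cos_d1 * sin_d2 - sin_d1 * cos_d2 * cos_dra);
    // Denominator: their dot product.
    let den = sin_d1 * sin_d2 + cos_d1 * cos_d2 * cos_dra;
    num.atan2(den)
}
```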
§2-D covariance on the tangent plane
coordinates::cov2::Cov2 is a compact symmetric 2×2 covariance matrix
designed for astrometric error ellipses expressed in a local tangent-plane
frame. It supports eigenvalue decomposition, Mahalanobis distance, and
isotropic inflation.
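The three operations have simple closed forms for a symmetric 2×2 matrix, sketched below on a hypothetical stand-in type. SymCov2 and its field names are assumptions for illustration and do not describe the internals of Cov2; the sketch also assumes that a non-positive determinant is reported as None.

```rust
/// Hypothetical stand-in for a symmetric 2x2 covariance [[xx, xy], [xy, yy]].
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SymCov2 {
    pub xx: f64,
    pub xy: f64,
    pub yy: f64,
}

impl SymCov2 {
    /// Eigenvalues as (lambda_min, lambda_max): mean of the diagonal -/+ a radius.
    pub fn eigenvalues(&self) -> (f64, f64) {
        let mean = 0.5 * (self.xx + self.yy);
        let radius = (0.5 * (self.xx - self.yy)).hypot(self.xy);
        (mean - radius, mean + radius)
    }

    /// Squared Mahalanobis distance of the offset (dx, dy), i.e. v^T C^-1 v.
    /// Returns None when the matrix is singular or not positive definite.
    pub fn mahalanobis_sq(&self, dx: f64, dy: f64) -> Option<f64> {
        let det = self.xx * self.yy - self.xy * self.xy;
        if det <= 0.0 {
            return None;
        }
        Some((self.yy * dx * dx - 2.0 * self.xy * dx * dy + self.xx * dy * dy) / det)
    }

    /// Add q to both diagonal terms, i.e. C + q*I.
    pub fn inflate_isotropic(&self, q: f64) -> Self {
        Self { xx: self.xx + q, xy: self.xy, yy: self.yy + q }
    }
}
```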
use photom::coordinates::cov2::Cov2;
use photom::coordinates::equatorial::EquCoord;
use photom::coordinates::gnomonic_projection::TangentVec;
// Build from the marginal 1-σ errors of an EquCoord.
let coord = EquCoord::from_degrees(45.0, 0.001, 20.0, 0.002);
let cov = Cov2::from_equ(&coord);
// Semi-axes of the 1-σ confidence ellipse.
let sigma_major = cov.lambda_max().max(0.0).sqrt();
let sigma_minor = cov.lambda_min().max(0.0).sqrt();
// Mahalanobis distance for an offset vector (radians).
let offset = TangentVec { dx: 1e-4, dy: 0.0 };
if let Some(d2) = cov.mahalanobis_sq(offset) {
let _ = d2.sqrt(); // normalised distance
}
// Add isotropic process noise q·I (Kalman-style inflation).
let q = 1e-8_f64;
let inflated = cov.inflate_isotropic(q);
§Gnomonic (tangent-plane) projection
coordinates::gnomonic_projection::TangentPlane projects sky positions
near a chosen tangent point $(\alpha_0, \delta_0)$ onto a local 2-D
Cartesian frame. Great circles project to straight lines, making this
representation well-suited for short-arc astrometry and kinematic linking.
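The projection itself is the standard gnomonic mapping. The sketch below writes it out on raw radian values, with the plane axes pointing east and north at the tangent point. The functions gnomonic_project and gnomonic_unproject are hypothetical free functions for illustration, not the TangentPlane API, and the choice to return None for points 90 degrees or more from the tangent point is an assumption.

```rust
/// Project (ra, dec) onto the plane tangent at (ra0, dec0); all angles in radians.
/// Returns (xi, eta) with xi pointing east and eta pointing north, or None when
/// the point lies 90 degrees or more from the tangent point.
pub fn gnomonic_project(ra0: f64, dec0: f64, ra: f64, dec: f64) -> Option<(f64, f64)> {
    let (sin_d0, cos_d0) = dec0.sin_cos();
    let (sin_d, cos_d) = dec.sin_cos();
    let (sin_da, cos_da) = (ra - ra0).sin_cos();
    // Cosine of the angular distance from the tangent point.
    let cos_c = sin_d0 * sin_d + cos_d0 * cos_d * cos_da;
    if cos_c <= 0.0 {
        return None;
    }
    let xi = cos_d * sin_da / cos_c;
    let eta = (cos_d0 * sin_d - sin_d0 * cos_d * cos_da) / cos_c;
    Some((xi, eta))
}

/// Inverse mapping: recover (ra, dec) in radians from plane coordinates.
pub fn gnomonic_unproject(ra0: f64, dec0: f64, xi: f64, eta: f64) -> (f64, f64) {
    let (sin_d0, cos_d0) = dec0.sin_cos();
    let x = cos_d0 - eta * sin_d0;
    let ra = ra0 + xi.atan2(x);
    let dec = (sin_d0 + eta * cos_d0).atan2(x.hypot(xi));
    (ra, dec)
}
```

The tangent point maps to the origin of the plane, and a projection followed by its inverse returns the original position up to floating-point rounding.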
use photom::coordinates::equatorial::EquCoord;
use photom::coordinates::gnomonic_projection::{TangentPlane, TangentVec};
// Define the tangent point (degrees, converted internally to radians).
let ref_coord = EquCoord::from_degrees(45.0, 0.0, 20.0, 0.0);
let plane = TangentPlane::new(ref_coord);
// Project a nearby sky position.
let target = EquCoord::from_degrees(45.5, 0.0, 20.5, 0.0);
let tp = plane.project(&target);
// Inverse projection: recover equatorial coordinates.
let sky = tp.unproject();
// Squared Euclidean distance between two projected points (radians²).
let other = plane.project(&EquCoord::from_degrees(45.1, 0.0, 20.1, 0.0));
let d2 = tp.dist2(&other);
// Translate a projected point by a displacement vector.
let v = TangentVec { dx: 1e-3, dy: -1e-3 };
let shifted = tp + v;
§Parallel iteration
Requires the parallel feature.
When the parallel feature is enabled, observation_dataset::ObsDataset gains a
family of par_iter_* methods that return
rayon::iter::ParallelIterator
values instead of standard iterators. These methods take &self and can be called
while other shared borrows of the dataset are live.
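The reason a shared borrow is enough can be shown without rayon: read-only work over a slice can be split across threads while the caller keeps its own reference to the data. The sketch below uses std scoped threads purely as a stand-in for the rayon-based methods; par_count_brighter is a hypothetical function and says nothing about how the crate schedules its work.

```rust
use std::thread;

/// Count magnitudes brighter (numerically smaller) than `limit`, splitting the
/// slice into chunks that are processed by scoped worker threads.
pub fn par_count_brighter(mags: &[f64], limit: f64, n_threads: usize) -> usize {
    // At least one element per chunk, even for empty input or zero threads.
    let chunk_len = mags.len().div_ceil(n_threads.max(1)).max(1);
    thread::scope(|scope| {
        let handles: Vec<_> = mags
            .chunks(chunk_len)
            .map(|chunk| scope.spawn(move || chunk.iter().filter(|&&m| m < limit).count()))
            .collect();
        // Join every worker and add up the per-chunk counts.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .sum()
    })
}
```

Nothing is copied or locked: each worker only reads its chunk through a shared reference, which is the same property that lets the par_iter_* methods take &self.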
use photom::observation_dataset::ObsDataset;
use rayon::iter::ParallelIterator;
// Iterate over every observation in parallel.
let count = dataset.par_iter_observations().count();
// Iterate over every (NightId, &Observation) pair in parallel.
// Returns None if the dataset was built without a night_id column.
if let Some(par_iter) = dataset.par_iter_full_night() {
    par_iter.for_each(|(night_id, obs)| {
        println!("night {:?}: obs id {}", night_id, obs.id());
    });
}
§The polars Feature
Polars-based ingestion is gated behind the optional polars feature. To enable it,
add the following to your Cargo.toml:
[dependencies]
photom = { version = "0.1", features = ["polars"] }
Without this feature the crate is still fully usable: all types, constants, and
astrometric utilities are available; only the from_polars and from_lazy
constructors on observation_dataset::ObsDataset are absent.
§The parallel Feature
Parallel iteration is gated behind the optional parallel feature, which brings in
rayon as a dependency. To enable it, add the following to
your Cargo.toml:
[dependencies]
photom = { version = "0.1", features = ["parallel"] }
The parallel feature can be combined freely with the polars feature:
photom = { version = "0.1", features = ["polars", "parallel"] }
When enabled, every method documented in
observation_dataset::parallel becomes available on
observation_dataset::ObsDataset. All parallel methods take &self, so they do
not conflict with outstanding shared borrows of the dataset.
§The ades Feature
ADES ingestion is gated behind the optional ades feature. To enable it, add the
following to your Cargo.toml:
[dependencies]
photom = { version = "0.1", features = ["ades"] }
Without this feature the crate is still fully usable: all types, constants, and
astrometric utilities are available; only the from_ades constructor on
observation_dataset::ObsDataset is absent.
The ades feature can be combined freely with the polars and parallel features:
photom = { version = "0.1", features = ["polars", "parallel", "ades"] }
§Loading an ADES file
use photom::observation_dataset::ObsDataset;
// error_ra and error_dec are optional fallback uncertainties in arcseconds,
// used when the XML record does not supply rmsRA/rmsDec or precRA/precDec.
let dataset = ObsDataset::from_ades("observations.xml", Some(0.5), Some(0.5))?;
§Uncertainty resolution (per observation, in precedence order)
1. rmsRA/rmsDec present in the XML record → used directly (arcseconds).
2. precRA/precDec present → used as the uncertainty (arcseconds).
3. Fallback error_ra/error_dec arguments → applied uniformly when neither of the above fields is available.
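This precedence chain maps directly onto Option::or. The sketch below handles one coordinate; resolve_uncertainty is a hypothetical helper, and returning None when no source is available is an assumption, since the text does not say what the loader does in that case.

```rust
/// Pick the astrometric uncertainty (arcseconds) for one coordinate:
/// the reported rms value first, then the precision field, then the fallback.
pub fn resolve_uncertainty(
    rms: Option<f64>,
    prec: Option<f64>,
    fallback: Option<f64>,
) -> Option<f64> {
    rms.or(prec).or(fallback)
}
```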
§Observer representation
Each observation’s stn field is stored as an
observer::dataset::ObserverId::MpcCode and resolved lazily from the MPC
observatory list the first time accuracy values are requested.
§The datafusion Feature
Parquet ingestion via Apache Arrow and DataFusion is gated behind the optional
datafusion feature. When enabled it brings in the datafusion and object_store
crates and exposes both a blocking entry point
(ObsDataset::from_parquet_uri) and an async counterpart of the
same function, so callers without an async runtime can still use the loader.
To enable it, add the following to your Cargo.toml:
[dependencies]
photom = { version = "0.1", features = ["datafusion"] }
Without this feature the crate is still fully usable: all types, constants, and
astrometric utilities remain available; only the from_parquet_uri constructor on
observation_dataset::ObsDataset is absent.
The datafusion feature can be combined freely with the other optional features:
photom = { version = "0.1", features = ["polars", "parallel", "ades", "datafusion"] }
§Parquet URI scheme
The loader accepts any URI that object_store can resolve. The following schemes are
supported out of the box:
| Scheme | Backend |
|---|---|
file:// | Local filesystem |
http:// | Plain HTTP object store |
https:// | TLS-encrypted HTTP object store |
hdfs:// | Hadoop Distributed File System (requires the hdfs Cargo feature on object_store) |
§Parquet column schema
The Parquet file must contain the following Arrow-typed columns. Column names and Arrow types must match exactly; the loader returns an error for any schema mismatch.
§Mandatory base columns (non-nullable)
| Column | Arrow type | Description |
|---|---|---|
id | UInt64 | Unique observation identifier |
ra | Float64 | Right ascension (radians) |
ra_err | Float64 | Right ascension uncertainty (radians) |
dec | Float64 | Declination (radians) |
dec_err | Float64 | Declination uncertainty (radians) |
magnitude | Float64 | Apparent magnitude |
mag_err | Float64 | Magnitude uncertainty |
filter | Utf8, UInt8, UInt16, or UInt32 | Photometric filter label or code |
mjd_tt | Float64 | Epoch (MJD, Terrestrial Time) |
§Optional observer columns (nullable; column may be absent)
| Column | Arrow type | Description |
|---|---|---|
obs_lon | Float64 | Geodetic longitude (radians, east positive) |
obs_lat | Float64 | Geodetic latitude (radians) |
obs_alt | Float64 | Altitude above ellipsoid (metres) |
obs_ra_acc | Float64 | RA accuracy (radians) — required when the geodetic triplet is set |
obs_dec_acc | Float64 | Dec accuracy (radians) — required when the geodetic triplet is set |
mpc_code_obs | Utf8 | Three-byte ASCII MPC code (takes precedence over geodetic columns) |
§Optional index columns
| Column | Arrow type | Description |
|---|---|---|
night_id | UInt32 | Night identifier; nullable — null rows are included but not assigned to any night |
traj_id | UInt32 or Utf8 | Trajectory identifier; nullable — null rows are loaded but not assigned to any trajectory |
§Loading a Parquet file
use photom::observation_dataset::ObsDataset;
use photom::io::datafusion::LoadObsArgs;
let dataset = ObsDataset::from_parquet_uri(
    "file:///data/observations.parquet",
    LoadObsArgs::default(),
)?;
§Ingestion arguments (LoadObsArgs)
Both the async and the blocking variants of from_parquet_uri accept a LoadObsArgs
value that controls how the ingestion pipeline behaves. Use LoadObsArgs::default()
to get sensible out-of-the-box settings, or construct the struct explicitly to override
individual fields.
| Field | Type | Default | Description |
|---|---|---|---|
error_model | Option<ObsErrorModel> | None | Astrometric error model used to assign accuracies to MPC-coded observatories; None leaves MPC observer accuracies unset until ObsDataset::set_error_model is called |
contiguous_choice | Option<ContiguousChoice> | Some(ContiguousNight) | Which grouping column (if any) to sort the query by before collecting; sorting allows the corresponding index to use compact contiguous ranges instead of per-row index vectors (see below) |
The contiguous_choice field (defaulting to ContiguousNight) causes DataFusion to
append an ORDER BY clause to the internal SQL query before collecting the record
batches. As a result, all observations belonging to the same night occupy a contiguous
block in the output observations vector, enabling the night index to store a compact
(start, end) range instead of a Vec of scattered positions. This is the same
contiguous index optimisation applied by the Polars loader via FromPolarsArgs.
§Minimum Supported Rust Version
photom requires Rust 1.94.0 or later.
Re-exports§
pub use io::mpc_80_col::Mpc80ColError;
pub use io::ades::AdesError;
pub use io::serde::IndexLayout;
pub use io::serde::ObsDatasetSeed;
pub use observation_dataset::builder::LoadWarning;
pub use observation_dataset::builder::ObsDatasetBuilder;
pub use crate::traj_id::TrajId;
Modules§
- constants
- Physical and astronomical constants used throughout the crate.
- coordinates
- Celestial coordinate types and coordinate-system conversions.
- io
- I/O backends for loading astronomical observation data into photom types.
- observation_dataset
- Core observation data types for the photom crate.
- observer
- Observer metadata and geodetic conversion utilities.
- photometry
- Photometric measurement types used throughout the pipeline.
- traj_id
Structs§
- NightId
- Logical identifier for a night of observation.