
Crate photom


Rust library for loading, structuring, and querying astronomical observation datasets — with trajectory grouping, multi-observer support, and efficient lookups.

photom provides a type-safe pipeline for ingesting astrometric and photometric measurements, associating them with ground-based observatories, and grouping them into trajectories of moving objects. The library is built around a single primary dataset type, observation_dataset::ObsDataset, which holds a flat collection of observations.

§Features

  • Polars ingestion (polars feature) — load observations from a DataFrame or LazyFrame with full schema validation.
  • Parallel iteration (parallel feature) — process observations, nights, and trajectories in parallel via rayon with zero data copying.
  • ADES ingestion (ades feature) — load observations directly from MPC ADES XML files (observation_dataset::ObsDataset::from_ades), supporting both structured (obsBlock/obsContext) and flat formats, with automatic MPC observer resolution.
  • MPC 80-column ingestion (mpc_80_col feature) — load observations from the classic MPC fixed-width 80-column ASCII format (observation_dataset::ObsDataset::from_mpc_80_col), with automatic trajectory grouping and nom-based field parsing.
  • Parquet ingestion via DataFusion (datafusion feature) — load observations from any Parquet file reachable by URI (file://, http://, https://, hdfs://) using Apache Arrow / DataFusion (observation_dataset::ObsDataset::from_parquet_uri and its async counterpart), with automatic contiguous index optimisation.
  • Serialisation / deserialisation (serde feature) — persist and restore an observation_dataset::ObsDataset (and all constituent types) via serde. Runtime-only state (the lazy MPC observatory cache and all derived index maps) is excluded from the serialised form and rebuilt transparently on deserialisation.
  • Multi-observer support — MPC observatory codes (resolved lazily from the MPC website), custom geodetic sites (interned and deduplicated), or unknown observer.
  • Trajectory grouping — group observations by a traj_id column; supports both integer (UInt32) and string (String) identifiers.
  • Three astrometric error models — FCCT14, CBM10, and VFCC17, used to assign measurement accuracies to MPC-coded observatories.

§Modules

| Module | Description |
|---|---|
| coordinates | Celestial coordinate types and coordinate-system conversions |
| coordinates::equatorial | coordinates::equatorial::EquCoord: equatorial sky position (RA, Dec) with 1-σ uncertainties, Vincenty angular separation, spherical midpoint, and covariance propagation |
| coordinates::cartesian | coordinates::cartesian::CartesianCoord / coordinates::cartesian::CartesianCoordCov: Cartesian unit-sphere position with optional 3×3 covariance and inverse propagation back to equatorial coordinates |
| coordinates::cov2 | coordinates::cov2::Cov2: symmetric 2×2 covariance matrix for tangent-plane error ellipses; eigenvalues, Mahalanobis distance, and isotropic inflation |
| coordinates::gnomonic_projection | coordinates::gnomonic_projection::TangentPlane / coordinates::gnomonic_projection::TangentPoint / coordinates::gnomonic_projection::TangentVec: gnomonic (tangent-plane) projection between equatorial sky coordinates and a local 2-D Cartesian frame |
| photometry | Photometric measurement types: apparent magnitude, uncertainty, and bandpass filter (photometry::Photometry, photometry::Filter) |
| observation_dataset | Core observation types (observation_dataset::observation::Observation, observation_dataset::ObsDataset) |
| observer | Ground-based observatory representation (observer::Observer) and geodetic utilities |
| observer::error_model | Astrometric error model variants (observer::error_model::ObsErrorModel: FCCT14, CBM10, VFCC17) |
| constants | Physical and geodetic constants (Earth axes, AU, etc.) |
| io | Internal ingestion backends (Polars adapter, schema validation) |
| io::ades | ADES XML ingestion backend (io::ades) |
| io::mpc_80_col | MPC 80-column ingestion backend (io::mpc_80_col) |
| io::datafusion | DataFusion/Arrow Parquet ingestion backend (io::datafusion) |

§Type Aliases

Five of the crate's type aliases are plain f64 synonyms whose purpose is to make units explicit in function signatures:

| Alias | Underlying type | Unit |
|---|---|---|
| Arcseconds | f64 | Angle in arcseconds |
| Radians | f64 | Angle in radians |
| Degrees | f64 | Angle in degrees |
| MJDTT | f64 | Modified Julian Date (Terrestrial Time) in days |
| Meters | f64 | Distance in metres |
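
Because these are type aliases rather than newtypes, they document intent but the compiler will not reject an arcsecond value passed where radians are expected. The sketch below shows the conversions a caller typically needs before filling the radian-valued columns; the aliases are redeclared locally, and the helpers (arcsec_to_rad, rad_to_arcsec, deg_to_rad) are hypothetical, not part of photom's documented API.

```rust
// Local mirrors of the unit aliases; in photom they are likewise plain f64.
type Radians = f64;
type Arcseconds = f64;
type Degrees = f64;

/// Arcseconds per radian: 180 * 3600 / π (about 206 264.8).
const ARCSEC_PER_RAD: f64 = 180.0 * 3600.0 / std::f64::consts::PI;

/// Hypothetical helper: arcseconds to radians.
fn arcsec_to_rad(a: Arcseconds) -> Radians {
    a / ARCSEC_PER_RAD
}

/// Hypothetical helper: radians to arcseconds.
fn rad_to_arcsec(r: Radians) -> Arcseconds {
    r * ARCSEC_PER_RAD
}

/// Hypothetical helper: degrees to radians.
fn deg_to_rad(d: Degrees) -> Radians {
    d.to_radians()
}
```

For example, an error of 0.001° converts to 1.745e-5 rad, which is about 3.6 arcsec rather than 1 arcsec.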

§DataFrame Schema

Requires the polars feature.

When loading data via observation_dataset::ObsDataset::from_polars or observation_dataset::ObsDataset::from_lazy, the input frame must conform to the following column layout.

§Mandatory base columns (non-nullable)

| Column | Polars type | Description |
|---|---|---|
| id | UInt64 | Unique observation identifier |
| ra | Float64 | Right ascension (radians) |
| ra_err | Float64 | Right ascension uncertainty (radians) |
| dec | Float64 | Declination (radians) |
| dec_err | Float64 | Declination uncertainty (radians) |
| magnitude | Float64 | Apparent magnitude |
| mag_err | Float64 | Magnitude uncertainty |
| filter | String | Photometric filter label |
| mjd_tt | Float64 | Epoch (MJD, Terrestrial Time) |

§Optional observer columns (nullable; column may be absent)

| Column | Polars type | Description |
|---|---|---|
| obs_lon | Float64 | Geodetic longitude (radians, east positive) |
| obs_lat | Float64 | Geodetic latitude (radians) |
| obs_alt | Float64 | Altitude above ellipsoid (metres) |
| obs_ra_acc | Float64 | RA accuracy (radians); required when the geodetic triplet is set |
| obs_dec_acc | Float64 | Dec accuracy (radians); required when the geodetic triplet is set |
| mpc_code_obs | String | Three-byte ASCII MPC code (takes precedence over geodetic columns) |

§Optional grouping columns

| Column | Polars type | Description |
|---|---|---|
| traj_id | UInt32 or String | Trajectory identifier; nullable: null rows are loaded into the ObsDataset but are not assigned to any trajectory |
| night_id | UInt32 | Night identifier; nullable: null rows are included in the ObsDataset but are not assigned to any night |
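
The null-handling rule for the grouping columns can be pictured as a "split" index that maps each identifier to the row positions it owns and simply skips null rows, which stay in the dataset. This is a minimal sketch; TrajKey and build_split_index are hypothetical stand-ins, not photom's TrajId or its internal index type.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a trajectory identifier (integer or string).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum TrajKey {
    Int(u32),
    Str(String),
}

/// Build a split index: identifier -> row positions, in original row order.
/// Rows whose identifier is null (`None`) are not assigned to any group.
fn build_split_index(traj_col: &[Option<TrajKey>]) -> HashMap<TrajKey, Vec<usize>> {
    let mut index: HashMap<TrajKey, Vec<usize>> = HashMap::new();
    for (row, key) in traj_col.iter().enumerate() {
        if let Some(k) = key {
            index.entry(k.clone()).or_default().push(row);
        }
    }
    index
}
```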

§Observer resolution (per row, in precedence order)

  1. mpc_code_obs non-null → observer::dataset::ObserverId::MpcCode (MPC site, resolved lazily).
  2. obs_lon, obs_lat, and obs_alt all non-null → observer::dataset::ObserverId::IntId (geodetic site; obs_ra_acc and obs_dec_acc must also be non-null).
  3. Otherwise → no observer (None).

A partially-null geodetic triplet or a complete triplet without accuracy values causes the ingestion to return an error.
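
The per-row precedence rules and the two error cases can be sketched as a single function. The types and the function below (ObserverCols, ResolvedObserver, resolve_observer) are hypothetical and only mirror the rules stated above; they are not photom's implementation. How the loader treats accuracy columns supplied without any geodetic triplet is not specified here, so the sketch ignores them in that case.

```rust
/// Hypothetical result of observer resolution for one row.
#[derive(Debug, PartialEq)]
enum ResolvedObserver {
    MpcCode([u8; 3]),
    Geodetic { lon: f64, lat: f64, alt: f64, ra_acc: f64, dec_acc: f64 },
}

/// Hypothetical view of one row's nullable observer columns.
#[derive(Clone, Copy)]
struct ObserverCols<'a> {
    mpc_code_obs: Option<&'a str>,
    obs_lon: Option<f64>,
    obs_lat: Option<f64>,
    obs_alt: Option<f64>,
    obs_ra_acc: Option<f64>,
    obs_dec_acc: Option<f64>,
}

fn resolve_observer(c: &ObserverCols<'_>) -> Result<Option<ResolvedObserver>, String> {
    // Rule 1: a non-null MPC code wins over any geodetic columns.
    if let Some(code) = c.mpc_code_obs {
        if !code.is_ascii() {
            return Err(format!("MPC code {code:?} is not ASCII"));
        }
        let bytes: [u8; 3] = code
            .as_bytes()
            .try_into()
            .map_err(|_| format!("MPC code {code:?} is not three bytes"))?;
        return Ok(Some(ResolvedObserver::MpcCode(bytes)));
    }
    match (c.obs_lon, c.obs_lat, c.obs_alt) {
        // Rule 2: a complete triplet also needs both accuracy values.
        (Some(lon), Some(lat), Some(alt)) => match (c.obs_ra_acc, c.obs_dec_acc) {
            (Some(ra_acc), Some(dec_acc)) => {
                Ok(Some(ResolvedObserver::Geodetic { lon, lat, alt, ra_acc, dec_acc }))
            }
            _ => Err("geodetic triplet without obs_ra_acc/obs_dec_acc".to_string()),
        },
        // Rule 3: no geodetic columns at all means no observer.
        (None, None, None) => Ok(None),
        // A partially-null triplet is rejected.
        _ => Err("partially-null geodetic triplet".to_string()),
    }
}
```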

§Ingestion arguments (FromPolarsArgs)

Requires the polars feature.

Both observation_dataset::ObsDataset::from_polars and observation_dataset::ObsDataset::from_lazy accept a FromPolarsArgs value that controls how the ingestion pipeline behaves. Use FromPolarsArgs::default() to get sensible out-of-the-box settings, or construct the struct explicitly to override individual fields.

| Field | Type | Default | Description |
|---|---|---|---|
| error_model | `Option<ObsErrorModel>` | None | Astrometric error model used to assign accuracies to MPC-coded observatories; None leaves MPC observer accuracies unset until ObsDataset::set_error_model is called |
| do_rechunk | `Option<bool>` | Some(false) | When true, forces all multi-chunk columns to be merged into a single contiguous Arrow chunk before ingestion; set to Some(false) when the caller has already guaranteed single-chunk layout (e.g. after reading a Parquet file with rechunk: true) |
| contiguous_choice | `Option<ContiguousChoice>` | Some(ContiguousNight) | Which grouping column (if any) to sort the frame by before iteration; sorting allows the corresponding index to use compact contiguous ranges instead of per-row index vectors (see below) |

§Contiguous index optimisation (ContiguousChoice)

By default the ingestion pipeline sorts the input frame by night_id (ContiguousChoice::ContiguousNight) so that all observations belonging to the same night occupy a single contiguous block in the output observations vector. This lets the night index store a compact (start, end) range for each night instead of a Vec of scattered positions, which saves memory and improves cache locality during sequential and parallel night iteration.

Setting contiguous_choice to ContiguousChoice::ContiguousTraj applies the same optimisation to trajectories instead. Setting it to None disables the sort entirely; both indices will use the Vec-based split representation.

Only one grouping column can be made contiguous at a time. The other column (if present in the frame) is always built as a split index.
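
The difference between the two index layouts can be illustrated with a small sketch: a stable sort by night_id (nulls last) yields a row permutation, after which each night occupies one (start, end) range of the sorted positions instead of a Vec of scattered row numbers. The function name and its return shape are assumptions made for illustration only.

```rust
use std::ops::Range;

/// Returns (row permutation, per-night ranges into that permutation).
/// Null night_ids sort last and are not assigned to any range.
fn contiguous_night_ranges(night_ids: &[Option<u32>]) -> (Vec<usize>, Vec<(u32, Range<usize>)>) {
    let mut order: Vec<usize> = (0..night_ids.len()).collect();
    // Stable sort: `false` (non-null) before `true` (null), then by night id.
    order.sort_by_key(|&row| (night_ids[row].is_none(), night_ids[row]));

    let mut ranges: Vec<(u32, Range<usize>)> = Vec::new();
    for (pos, &row) in order.iter().enumerate() {
        let Some(night) = night_ids[row] else { break }; // only nulls remain
        if let Some((n, r)) = ranges.last_mut() {
            if *n == night {
                r.end = pos + 1; // extend the current night's block
                continue;
            }
        }
        ranges.push((night, pos..pos + 1)); // start a new block
    }
    (order, ranges)
}
```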

```rust
use photom::io::polars::{ContiguousChoice, FromPolarsArgs};
use photom::observer::error_model::ObsErrorModel;
use photom::observation_dataset::ObsDataset;

// `df` is a polars DataFrame following the schema described above.
// Sort by traj_id so that trajectory iteration is more efficient.
let dataset = ObsDataset::from_polars(
    &df,
    FromPolarsArgs {
        error_model: Some(ObsErrorModel::FCCT14),
        contiguous_choice: Some(ContiguousChoice::ContiguousTraj),
        ..Default::default()
    },
)?;
```

§Usage Examples

§Build a minimal DataFrame and load observations

```rust
use polars::prelude::*;
use photom::io::polars::FromPolarsArgs;
use photom::observation_dataset::ObsDataset;
use photom::observer::error_model::ObsErrorModel;

// Construct a two-row DataFrame matching the mandatory schema.
// RA, Dec and their errors are in radians. The optional observer columns
// are omitted, so these observations have no observer.
let df = df! {
    "id"        => &[1_u64, 2_u64],
    "ra"        => &[1.4633_f64, 1.4682_f64],     // radians
    "ra_err"    => &[1.745e-5_f64, 1.745e-5_f64], // radians (0.001°, ~3.6 arcsec)
    "dec"       => &[0.3840_f64, 0.3847_f64],     // radians
    "dec_err"   => &[1.745e-5_f64, 1.745e-5_f64], // radians (0.001°, ~3.6 arcsec)
    "magnitude" => &[19.3_f64, 19.5_f64],
    "mag_err"   => &[0.05_f64, 0.05_f64],
    "filter"    => &["r", "r"],
    "mjd_tt"    => &[60000.0_f64, 60000.03_f64],
}?;

let dataset = ObsDataset::from_polars(
    &df,
    FromPolarsArgs {
        error_model: Some(ObsErrorModel::FCCT14),
        ..Default::default()
    },
)?;
for obs in dataset.iter_observations() {
    println!("{} {:?}", obs.id, obs.equ_coord);
}
```

§Use an MPC observatory code

Add an optional mpc_code_obs column (String, nullable) to associate each observation with an MPC-registered observatory. The accuracy values for MPC sites are derived from the chosen ObsErrorModel.

```rust
use polars::prelude::*;
use photom::io::polars::FromPolarsArgs;
use photom::observation_dataset::ObsDataset;
use photom::observer::error_model::ObsErrorModel;

let df = df! {
    "id"           => &[1_u64],
    "ra"           => &[1.4633_f64],    // radians
    "ra_err"       => &[1.745e-5_f64],  // radians
    "dec"          => &[0.3840_f64],    // radians
    "dec_err"      => &[1.745e-5_f64],  // radians
    "magnitude"    => &[19.3_f64],
    "mag_err"      => &[0.05_f64],
    "filter"       => &["r"],
    "mjd_tt"       => &[60000.0_f64],
    "mpc_code_obs" => &[Some("F51")],   // Haleakalā Pan-STARRS 1
}?;

// The error model supplies the astrometric accuracy of the F51 site.
let dataset = ObsDataset::from_polars(
    &df,
    FromPolarsArgs {
        error_model: Some(ObsErrorModel::FCCT14),
        ..Default::default()
    },
)?;
```

§Group observations by trajectory

```rust
use polars::prelude::*;
use photom::observation_dataset::ObsDataset;
use photom::io::polars::FromPolarsArgs;
use photom::TrajId;

// traj_id can be UInt32 or String; null rows are loaded but not grouped.
let df = df! {
    "id"        => &[1_u64, 2_u64, 3_u64],
    "ra"        => &[1.4633_f64, 1.4682_f64, 0.1745_f64],  // radians
    "ra_err"    => &[1.745e-5_f64; 3],                      // radians
    "dec"       => &[0.3840_f64, 0.3847_f64, 0.0873_f64],  // radians
    "dec_err"   => &[1.745e-5_f64; 3],                      // radians
    "magnitude" => &[19.3_f64, 19.5_f64, 18.0_f64],
    "mag_err"   => &[0.05_f64; 3],
    "filter"    => &["r", "r", "g"],
    "mjd_tt"    => &[60000.0_f64, 60000.03_f64, 60001.0_f64],
    "traj_id"   => &[Some("2020 AV2"), Some("2020 AV2"), None],
}?;

let dataset = ObsDataset::from_polars(&df, FromPolarsArgs::default())?;
let tid = TrajId::Str("2020 AV2".to_owned());
if let Some(iter) = dataset.iter_trajectory_observations(&tid) {
    println!("{} observations in trajectory", iter.count());
}
```

§Load observations from a LazyFrame

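```rust
use polars::prelude::*; // IntoLazy provides .lazy()
use photom::{io::polars::FromPolarsArgs, observation_dataset::ObsDataset};
use photom::observer::error_model::ObsErrorModel;

// Any DataFrame can be turned into a LazyFrame with .lazy().
let args = FromPolarsArgs { error_model: Some(ObsErrorModel::VFCC17), ..Default::default() };
let dataset = ObsDataset::from_lazy(df.lazy(), args)?;
```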

§Coordinate utilities

coordinates::equatorial::EquCoord bundles a sky position (RA, Dec) with its 1-σ uncertainties, all stored in radians.

```rust
use photom::coordinates::equatorial::EquCoord;

// Construct from degrees: values are converted to radians internally.
let a = EquCoord::from_degrees(10.0, 0.001, 20.0, 0.001);
let b = EquCoord::from_degrees(10.5, 0.001, 20.5, 0.001);

// Great-circle separation via the Vincenty formula (result in radians).
let sep = a.angular_separation(&b);

// Vector-averaging midpoint on the sphere.
let mid = a.spherical_midpoint(&b);
```
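
For checking results by hand, the two formulas named above can be written as standalone functions on raw radians. This is a sketch of the standard formulas, not photom's implementation, and the function names are hypothetical. The Vincenty form uses atan2, which stays well-conditioned for both very small and near-antipodal separations; the vector-averaging midpoint is undefined for exactly antipodal points.

```rust
/// Great-circle separation (radians) between (ra1, dec1) and (ra2, dec2),
/// using the Vincenty special case of the spherical distance formula.
fn vincenty_separation(ra1: f64, dec1: f64, ra2: f64, dec2: f64) -> f64 {
    let (s1, c1) = dec1.sin_cos();
    let (s2, c2) = dec2.sin_cos();
    let (sd, cd) = (ra2 - ra1).sin_cos();
    let num_x = c2 * sd;
    let num_y = c1 * s2 - s1 * c2 * cd;
    let den = s1 * s2 + c1 * c2 * cd;
    num_x.hypot(num_y).atan2(den)
}

/// Vector-averaging midpoint: sum the two unit vectors, convert back to (ra, dec).
/// No normalisation is needed because atan2 only uses ratios.
fn spherical_midpoint(ra1: f64, dec1: f64, ra2: f64, dec2: f64) -> (f64, f64) {
    let unit = |ra: f64, dec: f64| [dec.cos() * ra.cos(), dec.cos() * ra.sin(), dec.sin()];
    let (a, b) = (unit(ra1, dec1), unit(ra2, dec2));
    let s = [a[0] + b[0], a[1] + b[1], a[2] + b[2]];
    let ra = s[1].atan2(s[0]).rem_euclid(std::f64::consts::TAU);
    let dec = s[2].atan2(s[0].hypot(s[1]));
    (ra, dec)
}
```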

To propagate astrometric uncertainties through the spherical-to-Cartesian mapping, use coordinates::equatorial::EquCoordCov::to_cartesian_cov, which returns a coordinates::cartesian::CartesianCoordCov containing the full 3×3 covariance matrix. The inverse conversion is coordinates::cartesian::CartesianCoordCov::to_equatorial_cov.
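
To first order, this kind of propagation is Σ_xyz = J Σ_αδ Jᵀ, where J is the 3×2 Jacobian of the unit vector u = (cos δ cos α, cos δ sin α, sin δ) with respect to (α, δ). The sketch below assumes uncorrelated uncertainties on α and δ themselves (not on α cos δ); photom's EquCoordCov may carry a full 2×2 matrix and use different conventions, and the function name is hypothetical.

```rust
/// First-order propagation of uncorrelated (sigma_ra, sigma_dec), in radians,
/// into the 3×3 covariance of the unit vector u(α, δ).
fn equatorial_to_cartesian_cov(ra: f64, dec: f64, sigma_ra: f64, sigma_dec: f64) -> [[f64; 3]; 3] {
    let (sa, ca) = ra.sin_cos();
    let (sd, cd) = dec.sin_cos();
    // Columns of the Jacobian J: ∂u/∂α and ∂u/∂δ.
    let du_dra = [-cd * sa, cd * ca, 0.0];
    let du_ddec = [-sd * ca, -sd * sa, cd];
    let (var_ra, var_dec) = (sigma_ra * sigma_ra, sigma_dec * sigma_dec);
    // With a diagonal input covariance, J Σ Jᵀ is a sum of two outer products.
    let mut cov = [[0.0; 3]; 3];
    for i in 0..3 {
        for j in 0..3 {
            cov[i][j] = var_ra * du_dra[i] * du_dra[j] + var_dec * du_ddec[i] * du_ddec[j];
        }
    }
    cov
}
```

The resulting matrix has rank 2: it has no component along u itself, since a unit vector can only move tangentially to the sphere.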

§2-D covariance on the tangent plane

coordinates::cov2::Cov2 is a compact symmetric 2×2 covariance matrix designed for astrometric error ellipses expressed in a local tangent-plane frame. It supports eigenvalue decomposition, Mahalanobis distance, and isotropic inflation.

```rust
use photom::coordinates::cov2::Cov2;
use photom::coordinates::equatorial::EquCoord;
use photom::coordinates::gnomonic_projection::TangentVec;

// Build from the marginal 1-σ errors of an EquCoord.
let coord = EquCoord::from_degrees(45.0, 0.001, 20.0, 0.002);
let cov = Cov2::from_equ(&coord);

// Semi-axes of the 1-σ confidence ellipse.
let sigma_major = cov.lambda_max().max(0.0).sqrt();
let sigma_minor = cov.lambda_min().max(0.0).sqrt();

// Mahalanobis distance for an offset vector (radians).
let offset = TangentVec { dx: 1e-4, dy: 0.0 };
if let Some(d2) = cov.mahalanobis_sq(offset) {
    let _ = d2.sqrt(); // normalised distance
}

// Add isotropic process noise q·I (Kalman-style inflation).
let q = 1e-8_f64;
let inflated = cov.inflate_isotropic(q);
```
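
The three operations used above have short closed forms for a symmetric 2×2 matrix [[xx, xy], [xy, yy]]. The sketch below spells them out on a hypothetical Cov2Sketch type; it illustrates the mathematics and is not photom's Cov2 source.

```rust
/// Hypothetical symmetric 2×2 covariance [[xx, xy], [xy, yy]] (radians²).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Cov2Sketch {
    xx: f64,
    xy: f64,
    yy: f64,
}

impl Cov2Sketch {
    /// Eigenvalues (max, min) of a symmetric 2×2 matrix: mean ± radius.
    fn eigenvalues(&self) -> (f64, f64) {
        let mean = 0.5 * (self.xx + self.yy);
        let radius = (0.5 * (self.xx - self.yy)).hypot(self.xy);
        (mean + radius, mean - radius)
    }

    /// Squared Mahalanobis distance dᵀ Σ⁻¹ d, using the explicit 2×2 inverse.
    /// Returns None unless Σ is positive definite (xx > 0 and det > 0).
    fn mahalanobis_sq(&self, dx: f64, dy: f64) -> Option<f64> {
        let det = self.xx * self.yy - self.xy * self.xy;
        if self.xx <= 0.0 || det <= 0.0 {
            return None;
        }
        Some((self.yy * dx * dx - 2.0 * self.xy * dx * dy + self.xx * dy * dy) / det)
    }

    /// Isotropic inflation Σ + q·I: only the diagonal changes.
    fn inflate_isotropic(&self, q: f64) -> Self {
        Self { xx: self.xx + q, yy: self.yy + q, ..*self }
    }
}
```

Adding q·I shifts both eigenvalues by q and leaves the eigenvectors unchanged, which is why it is a convenient way to inflate an error ellipse uniformly.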

§Gnomonic (tangent-plane) projection

coordinates::gnomonic_projection::TangentPlane projects sky positions near a chosen tangent point $(\alpha_0, \delta_0)$ onto a local 2-D Cartesian frame. Great circles project to straight lines, making this representation well-suited for short-arc astrometry and kinematic linking.

```rust
use photom::coordinates::equatorial::EquCoord;
use photom::coordinates::gnomonic_projection::{TangentPlane, TangentVec};

// Define the tangent point (degrees, converted internally to radians).
let ref_coord = EquCoord::from_degrees(45.0, 0.0, 20.0, 0.0);
let plane = TangentPlane::new(ref_coord);

// Project a nearby sky position.
let target = EquCoord::from_degrees(45.5, 0.0, 20.5, 0.0);
let tp = plane.project(&target);

// Inverse projection: recover equatorial coordinates.
let sky = tp.unproject();

// Squared Euclidean distance between two projected points (radians²).
let other = plane.project(&EquCoord::from_degrees(45.1, 0.0, 20.1, 0.0));
let d2 = tp.dist2(&other);

// Translate a projected point by a displacement vector.
let v = TangentVec { dx: 1e-3, dy: -1e-3 };
let shifted = tp + v;
```
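
The standard gnomonic formulas behind such a projection are short. In the sketch below (hypothetical function names, raw radians, not photom's code), cos c is the cosine of the angular distance from the tangent point; points 90° or more away have no finite projection, so the forward function returns None for them. The inverse rebuilds the direction p₀ + ξ e_α + η e_δ and converts it back to (α, δ), which avoids the singularity at the tangent point.

```rust
/// Project (ra, dec) onto the tangent plane at (ra0, dec0); all angles in radians.
fn gnomonic_project(ra0: f64, dec0: f64, ra: f64, dec: f64) -> Option<(f64, f64)> {
    let (s0, c0) = dec0.sin_cos();
    let (s, c) = dec.sin_cos();
    let (sd, cd) = (ra - ra0).sin_cos();
    let cos_c = s0 * s + c0 * c * cd; // cosine of distance to the tangent point
    if cos_c <= 0.0 {
        return None; // on or beyond the horizon of the tangent plane
    }
    let xi = c * sd / cos_c;
    let eta = (c0 * s - s0 * c * cd) / cos_c;
    Some((xi, eta))
}

/// Inverse projection back to (ra, dec), with ra wrapped into [0, 2π).
fn gnomonic_unproject(ra0: f64, dec0: f64, xi: f64, eta: f64) -> (f64, f64) {
    let (s0, c0) = dec0.sin_cos();
    let denom = c0 - eta * s0;
    let ra = (ra0 + xi.atan2(denom)).rem_euclid(std::f64::consts::TAU);
    let dec = (s0 + eta * c0).atan2(xi.hypot(denom));
    (ra, dec)
}
```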

§Parallel iteration

Requires the parallel feature.

When the parallel feature is enabled, observation_dataset::ObsDataset gains a family of par_iter_* methods that return types implementing rayon::iter::ParallelIterator instead of standard iterators. These methods take &self and can be called while other shared borrows of the dataset are live.

```rust
use photom::observation_dataset::ObsDataset;
use rayon::iter::ParallelIterator;

// `dataset` is an ObsDataset loaded as shown in the examples above.
// Iterate over every observation in parallel.
let count = dataset.par_iter_observations().count();

// Iterate over every (NightId, &Observation) pair in parallel.
// Returns None if the dataset was built without a night_id column.
if let Some(par_iter) = dataset.par_iter_full_night() {
    par_iter.for_each(|(night_id, obs)| {
        println!("night {:?}: obs id {}", night_id, obs.id());
    });
}
```
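
Why &self is enough can be seen without rayon: read-only workers only ever hold shared borrows, so the dataset can still be borrowed elsewhere. The sketch below uses std::thread::scope as a stand-in for rayon's work-stealing pool, with one thread per night purely for illustration; NightBlocks and par_mean_magnitude are hypothetical names, and the per-night blocks follow the contiguous layout described earlier. A night with an empty range would yield NaN.

```rust
use std::ops::Range;
use std::thread;

/// Hypothetical read-only dataset whose nights are stored as contiguous blocks.
struct NightBlocks {
    magnitudes: Vec<f64>,
    nights: Vec<(u32, Range<usize>)>,
}

impl NightBlocks {
    /// Mean magnitude per night, computed on scoped threads.
    /// Only `&self` is needed: every worker reads through a shared borrow.
    fn par_mean_magnitude(&self) -> Vec<(u32, f64)> {
        thread::scope(|s| {
            let handles: Vec<_> = self
                .nights
                .iter()
                .map(|(night, range)| {
                    let block = &self.magnitudes[range.clone()];
                    // `move` copies the `&u32` and `&[f64]` into the worker.
                    s.spawn(move || (*night, block.iter().sum::<f64>() / block.len() as f64))
                })
                .collect();
            // Joining in spawn order keeps the output ordered like `nights`.
            handles.into_iter().map(|h| h.join().unwrap()).collect()
        })
    }
}
```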

§The polars Feature

Polars-based ingestion is gated behind the optional polars feature. To enable it, add the following to your Cargo.toml:

```toml
[dependencies]
photom = { version = "0.1", features = ["polars"] }
```

Without this feature the crate is still fully usable: all types, constants, and astrometric utilities are available; only the from_polars and from_lazy constructors on observation_dataset::ObsDataset are absent.

§The parallel Feature

Parallel iteration is gated behind the optional parallel feature, which brings in rayon as a dependency. To enable it, add the following to your Cargo.toml:

```toml
[dependencies]
photom = { version = "0.1", features = ["parallel"] }
```

The parallel feature can be combined freely with the polars feature:

```toml
photom = { version = "0.1", features = ["polars", "parallel"] }
```

When enabled, every method documented in observation_dataset::parallel becomes available on observation_dataset::ObsDataset. All parallel methods take &self, so they do not conflict with outstanding shared borrows of the dataset.

§The ades Feature

ADES ingestion is gated behind the optional ades feature. To enable it, add the following to your Cargo.toml:

```toml
[dependencies]
photom = { version = "0.1", features = ["ades"] }
```

Without this feature the crate is still fully usable: all types, constants, and astrometric utilities are available; only the from_ades constructor on observation_dataset::ObsDataset is absent.

The ades feature can be combined freely with the polars and parallel features:

```toml
photom = { version = "0.1", features = ["polars", "parallel", "ades"] }
```

§Loading an ADES file

```rust
use photom::observation_dataset::ObsDataset;

// error_ra and error_dec are optional fallback uncertainties in arcseconds,
// used when the XML record does not supply rmsRA/rmsDec or precRA/precDec.
let dataset = ObsDataset::from_ades("observations.xml", Some(0.5), Some(0.5))?;
```

§Uncertainty resolution (per observation, in precedence order)

  1. rmsRA / rmsDec present in the XML record → used directly (arcseconds).
  2. precRA / precDec present → used as the uncertainty (arcseconds).
  3. Fallback error_ra / error_dec arguments → applied uniformly when neither of the above fields is available.
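
The precedence above is an ordered chain of optional values, which maps directly onto Option::or. This sketch resolves one axis at a time; the function names are hypothetical, the conversion to radians assumes uncertainties are stored internally in radians (as in the other ingestion schemas), and returning None when no source is available is an assumption because that case is not specified here.

```rust
/// Arcseconds to radians (π / 648 000 rad per arcsecond).
fn arcsec_to_rad(a: f64) -> f64 {
    a * std::f64::consts::PI / 648_000.0
}

/// One axis's uncertainty: rms value first, then prec value, then the
/// caller's fallback. Inputs are in arcseconds; the result is in radians.
fn resolve_uncertainty(rms: Option<f64>, prec: Option<f64>, fallback: Option<f64>) -> Option<f64> {
    rms.or(prec).or(fallback).map(arcsec_to_rad)
}
```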

§Observer representation

Each observation’s stn field is stored as an observer::dataset::ObserverId::MpcCode and resolved lazily from the MPC observatory list the first time accuracy values are requested.
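
Lazy resolution of this kind is commonly built on std::sync::OnceLock: the observatory table is loaded on first use and shared by all later lookups through &self. Because such a cache can always be rebuilt, it is also natural to leave it out of the serialised form. In this sketch, LazyObservatories, Site, and the `load` closure are hypothetical; the network fetch from the MPC list is not shown.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Hypothetical per-site record; a real observatory entry has more fields.
#[derive(Debug, Clone, PartialEq)]
struct Site {
    name: &'static str,
}

/// Hypothetical lazily populated table keyed by three-byte MPC code.
struct LazyObservatories {
    cache: OnceLock<HashMap<[u8; 3], Site>>,
}

impl LazyObservatories {
    fn new() -> Self {
        Self { cache: OnceLock::new() }
    }

    /// Runs `load` only on the first call; later calls reuse the cached table.
    fn lookup(&self, code: [u8; 3], load: impl FnOnce() -> HashMap<[u8; 3], Site>) -> Option<&Site> {
        self.cache.get_or_init(load).get(&code)
    }
}
```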

§The datafusion Feature

Parquet ingestion via Apache Arrow and DataFusion is gated behind the optional datafusion feature. When enabled, it brings in the datafusion and object_store crates and exposes a synchronous, blocking constructor (ObsDataset::from_parquet_uri) together with an async counterpart, so the loader can be used both with and without an async runtime. To enable it, add the following to your Cargo.toml:

```toml
[dependencies]
photom = { version = "0.1", features = ["datafusion"] }
```

Without this feature the crate is still fully usable: all types, constants, and astrometric utilities remain available; only the from_parquet_uri constructor on observation_dataset::ObsDataset is absent.

The datafusion feature can be combined freely with the other optional features:

```toml
photom = { version = "0.1", features = ["polars", "parallel", "ades", "datafusion"] }
```

§Parquet URI scheme

The loader accepts any URI that object_store can resolve, including the following schemes:

| Scheme | Backend |
|---|---|
| file:// | Local filesystem |
| http:// | Plain HTTP object store |
| https:// | TLS-encrypted HTTP object store |
| hdfs:// | Hadoop Distributed File System (requires the hdfs Cargo feature on object_store) |

§Parquet column schema

The Parquet file must contain the following Arrow-typed columns. Column names and Arrow types must match exactly; the loader returns an error for any schema mismatch.

§Mandatory base columns (non-nullable)

| Column | Arrow type | Description |
|---|---|---|
| id | UInt64 | Unique observation identifier |
| ra | Float64 | Right ascension (radians) |
| ra_err | Float64 | Right ascension uncertainty (radians) |
| dec | Float64 | Declination (radians) |
| dec_err | Float64 | Declination uncertainty (radians) |
| magnitude | Float64 | Apparent magnitude |
| mag_err | Float64 | Magnitude uncertainty |
| filter | Utf8, UInt8, UInt16, or UInt32 | Photometric filter label or code |
| mjd_tt | Float64 | Epoch (MJD, Terrestrial Time) |

§Optional observer columns (nullable; column may be absent)

| Column | Arrow type | Description |
|---|---|---|
| obs_lon | Float64 | Geodetic longitude (radians, east positive) |
| obs_lat | Float64 | Geodetic latitude (radians) |
| obs_alt | Float64 | Altitude above ellipsoid (metres) |
| obs_ra_acc | Float64 | RA accuracy (radians); required when the geodetic triplet is set |
| obs_dec_acc | Float64 | Dec accuracy (radians); required when the geodetic triplet is set |
| mpc_code_obs | Utf8 | Three-byte ASCII MPC code (takes precedence over geodetic columns) |

§Optional index columns

| Column | Arrow type | Description |
|---|---|---|
| night_id | UInt32 | Night identifier; nullable: null rows are included but not assigned to any night |
| traj_id | UInt32 or Utf8 | Trajectory identifier; nullable: null rows are loaded but not assigned to any trajectory |

§Loading a Parquet file

```rust
use photom::observation_dataset::ObsDataset;
use photom::io::datafusion::LoadObsArgs;

let dataset = ObsDataset::from_parquet_uri(
    "file:///data/observations.parquet",
    LoadObsArgs::default(),
)?;
```

§Ingestion arguments (LoadObsArgs)

Both the async and the blocking variants of from_parquet_uri accept a LoadObsArgs value that controls how the ingestion pipeline behaves. Use LoadObsArgs::default() to get sensible out-of-the-box settings, or construct the struct explicitly to override individual fields.

| Field | Type | Default | Description |
|---|---|---|---|
| error_model | `Option<ObsErrorModel>` | None | Astrometric error model used to assign accuracies to MPC-coded observatories; None leaves MPC observer accuracies unset until ObsDataset::set_error_model is called |
| contiguous_choice | `Option<ContiguousChoice>` | Some(ContiguousNight) | Which grouping column (if any) to sort the query by before collecting; sorting allows the corresponding index to use compact contiguous ranges instead of per-row index vectors (see below) |

The contiguous_choice field (defaulting to ContiguousNight) causes DataFusion to append an ORDER BY clause to the internal SQL query before collecting the record batches. As a result, all observations belonging to the same night occupy a contiguous block in the output observations vector, enabling the night index to store a compact (start, end) range instead of a Vec of scattered positions. This is the same contiguous index optimisation applied by the Polars loader via FromPolarsArgs.

§Minimum Supported Rust Version

photom requires Rust 1.94.0 or later.

§Re-exports

```rust
pub use io::mpc_80_col::Mpc80ColError;
pub use io::ades::AdesError;
pub use io::serde::IndexLayout;
pub use io::serde::ObsDatasetSeed;
pub use observation_dataset::builder::LoadWarning;
pub use observation_dataset::builder::ObsDatasetBuilder;
pub use crate::traj_id::TrajId;
```

§Modules

constants
Physical and astronomical constants used throughout the crate.
coordinates
Celestial coordinate types and coordinate-system conversions.
io
I/O backends for loading astronomical observation data into photom types.
observation_dataset
Core observation data types for the photom crate.
observer
Observer metadata and geodetic conversion utilities.
photometry
Photometric measurement types used throughout the pipeline.
traj_id

§Structs

NightId
Logical identifier for a night of observation.

§Traits

ToNotNan

§Type Aliases

Arcseconds
Arcseconds.
Degrees
Degrees.
MJDTT
Modified Julian Date (Terrestrial Time).
Meters
Meters.
ObsIndex
Zero-based position of an observation inside the observations vector of ObsDataset.
Radians
Radians.