timeglyph 0.2.0

Forensic timestamp decipherment — decode, encode, and identify the many ways systems inscribe time, with scored, cited, ambiguity-first interpretation.
//! `timeglyph` — forensic timestamp decipherment.
//!
//! A timestamp is *time inscribed as a symbol* — the raw integer or bytes a
//! system writes to mean an instant. This crate deciphers those inscriptions:
//! it decodes a known format to an instant, encodes an instant to any format,
//! and — the differentiator — **identifies** an unknown value by reporting every
//! plausible interpretation, *scored, with stated assumptions*, never "the
//! answer" (a single integer is usually underdetermined).
//!
//! # Design (see HANDOFF.md for the full record)
//! - Canonical spine: [`PosixNs`] — nanoseconds since the Unix epoch, proleptic
//!   Gregorian, **leap-second-ignoring (POSIX)**. It is *not* called UTC: UTC has
//!   discontinuities POSIX pretends away. Leap-aware scales (TAI/GPS/NTP) get
//!   their own instant types (to be added behind a `hifitime` feature).
//! - Calendar/tz math is **reused** (`jiff`), never reinvented. The value-add is
//!   the cited forensic format registry + scored auto-detection + byte decode.
//! - Panic-free (Paranoid Gatekeeper): every length/offset/width is checked.
//!
//! # Example
//!
//! ```
//! // Identify an unknown value: every plausible reading, ranked and scored —
//! // never a single verdict (a raw value is usually underdetermined).
//! let candidates = timeglyph::interpret::interpret_int(1_577_836_800);
//! let top = &candidates[0];
//! assert_eq!(top.format_id, "unix");
//! assert_eq!(top.rendered.as_deref(), Some("2020-01-01T00:00:00Z"));
//!
//! // Or decode under one known format by id.
//! let filetime = timeglyph::format("filetime").unwrap();
//! let instant = filetime.decode_int(132_223_104_000_000_000).unwrap();
//! assert_eq!(instant.to_rfc3339().as_deref(), Some("2020-01-01T00:00:00Z"));
//! ```
//!
//! # Further reading
//!
//! The authoritative, primary-source-cited reference for every supported format —
//! epochs, encodings, calendars, leap seconds, and the rollovers that eventually
//! break them — lives at <https://securityronin.github.io/timeglyph/>.
#![cfg_attr(test, allow(clippy::unwrap_used, clippy::expect_used))]

pub mod csv_enrich;
pub mod interpret;
/// Leap-aware time scales (GPS/TAI/NTP), behind the `leap` feature. Kept
/// separate from the POSIX [`PosixNs`] spine (HANDOFF §3).
#[cfg(feature = "leap")]
pub mod leap;
pub mod registry;

/// Errors from decoding, encoding, or rendering a timestamp.
#[derive(Debug, thiserror::Error)]
pub enum ChronoError {
    /// A value (or intermediate) fell outside the representable range.
    #[error("value out of representable range ({what}): {value}")]
    OutOfRange {
        /// What overflowed (e.g. "nanoseconds", "ticks").
        what: &'static str,
        /// The offending value.
        value: i128,
    },
    /// No format with the given id is registered.
    #[error("unknown format id: {0}")]
    UnknownFormat(String),
    /// Rendering the instant to a civil string failed (outside jiff's range).
    #[error("cannot render instant: {0}")]
    Render(String),
}

/// The canonical internal instant: **nanoseconds since 1970-01-01, POSIX
/// (leap-ignoring), proleptic Gregorian**. `i128` because some source epochs sit
/// more than 1e19 ns from Unix (FILETIME's 1601 epoch alone is ~1.16e19 ns),
/// which overflows `i64` (max ~9.22e18). The wide spine is load-bearing, not
/// luxury.
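/**
# Example

A standalone sketch of the overflow argument above. It uses only `std`, not
this crate's API, and the names `FILETIME_EPOCH_SECS` and `filetime_epoch_ns`
are illustrative assumptions rather than items defined here.

```rust
/// Seconds from 1601-01-01 to 1970-01-01 (illustrative constant).
const FILETIME_EPOCH_SECS: i128 = 11_644_473_600;

/// FILETIME's epoch expressed in nanoseconds relative to the Unix epoch.
fn filetime_epoch_ns() -> i128 {
    -FILETIME_EPOCH_SECS * 1_000_000_000
}

fn main() {
    let epoch_ns = filetime_epoch_ns();
    // The offset lies below i64::MIN (about -9.22e18), so an i64 spine
    // could not hold even the epoch itself.
    assert!(epoch_ns < i128::from(i64::MIN));
    println!("FILETIME epoch = {} ns from Unix", epoch_ns);
}
```

The same comparison fails for any epoch more than roughly 292 years from 1970
once it is counted in nanoseconds.
*/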
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, serde::Serialize)]
pub struct PosixNs(pub i128);

impl PosixNs {
    /// The Unix epoch (the zero of this scale).
    pub const UNIX_EPOCH: Self = Self(0);

    /// Render as an RFC 3339 / ISO 8601 UTC string. Returns `None` when the
    /// instant is outside the civil range `jiff` can represent (≈ years
    /// -9999..=9999) — surfaced as absence, never a panic.
    #[must_use]
    pub fn to_rfc3339(self) -> Option<String> {
        jiff::Timestamp::from_nanosecond(self.0)
            .ok()
            .map(|ts| ts.to_string())
    }
}

/// The tick unit a format counts in.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Unit {
    /// Whole seconds.
    Seconds,
    /// Milliseconds (Java/JS).
    Millis,
    /// Microseconds (Chrome/WebKit, PostgreSQL).
    Micros,
    /// 100-nanosecond intervals (FILETIME, .NET ticks).
    HundredNanos,
    /// Nanoseconds (APFS, Unix-ns).
    Nanos,
    /// Days (OLE Automation / Excel serial); stored values are usually fractional.
    Days,
}

impl Unit {
    /// Nanoseconds per tick of this unit.
    #[must_use]
    pub const fn nanos(self) -> i128 {
        match self {
            Self::Seconds => 1_000_000_000,
            Self::Millis => 1_000_000,
            Self::Micros => 1_000,
            Self::HundredNanos => 100,
            Self::Nanos => 1,
            Self::Days => 86_400 * 1_000_000_000,
        }
    }

    /// Decimal digits of *sub-second* resolution this unit can express
    /// (seconds/days → 0, millis → 3, micros → 6, 100-nanos → 7, nanos → 9).
    /// Drives auto-detect granularity scoring: a whole-second raw value is a
    /// poor fit for a finer unit, so it is penalised, never hidden.
    #[must_use]
    pub const fn sub_second_digits(self) -> u32 {
        match self {
            Self::Seconds | Self::Days => 0,
            Self::Millis => 3,
            Self::Micros => 6,
            Self::HundredNanos => 7,
            Self::Nanos => 9,
        }
    }
}

/// How a stored value maps to an instant.
#[derive(Debug, Clone, Copy)]
pub enum Strategy {
    /// `value` (integer ticks) × `unit` + `epoch_ns` = [`PosixNs`].
    LinearInt {
        /// The format's epoch as nanoseconds relative to the Unix epoch.
        epoch_ns: i128,
        /// The tick unit.
        unit: Unit,
    },
    /// `value` (floating ticks, e.g. OLE days as `f64`) × `unit` + `epoch_ns`.
    /// Lossy by nature; the registry entry must flag the precision caveat.
    LinearFloat {
        /// The format's epoch as nanoseconds relative to the Unix epoch.
        epoch_ns: i128,
        /// The tick unit.
        unit: Unit,
    },
    /// An ID with an embedded millisecond timestamp in its high bits: the low
    /// `shift_bits` bits are worker/sequence/random, so `value >> shift_bits` is
    /// milliseconds since `epoch_ns` (Snowflake/Discord/Twitter, and the same
    /// shape as ObjectId/UUIDv7 once those are byte-decoded).
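    /**
# Example

A standalone sketch of the shift-then-offset step described above, using only
`std`. The Discord epoch constant, the 22-bit shift, and the helper name
`snowflake_to_unix_ns` are assumptions of this sketch, not values read from
this crate's registry.

```rust
/// Discord's snowflake epoch, 2015-01-01T00:00:00Z, in Unix milliseconds
/// (an assumption of this sketch).
const DISCORD_EPOCH_MS: i128 = 1_420_070_400_000;

/// Low bits that hold worker, process, and sequence fields.
const SHIFT_BITS: u32 = 22;

/// Extract the embedded timestamp as nanoseconds since the Unix epoch.
fn snowflake_to_unix_ns(id: u64) -> i128 {
    // Discard the low bits, leaving milliseconds since the scheme's epoch.
    let ms_since_epoch = i128::from(id >> SHIFT_BITS);
    (ms_since_epoch + DISCORD_EPOCH_MS) * 1_000_000
}

fn main() {
    // Synthetic id: 1000 ms after the scheme epoch, arbitrary low bits.
    let id = (1_000_u64 << SHIFT_BITS) | 0x2A_BCDE;
    let ns = snowflake_to_unix_ns(id);
    assert_eq!(ns, (DISCORD_EPOCH_MS + 1_000) * 1_000_000);
    println!("{}", ns);
}
```

The low bits never influence the result, which is why the reverse direction
(instant to id) has no single answer.
    */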
    EmbeddedMillis {
        /// The scheme's epoch as nanoseconds relative to the Unix epoch.
        epoch_ns: i128,
        /// Number of low bits to discard before reading the ms timestamp.
        shift_bits: u32,
    },
    /// A bit-packed civil datetime (FAT/DOS, SYSTEMTIME, exFAT): the integer is
    /// not a linear offset but packed calendar fields, so decoding needs a
    /// dedicated unpacker. The function returns the instant; tz semantics (e.g.
    /// FAT's LOCAL naive time) are carried on the [`Format`] entry.
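    /**
# Example

A standalone sketch of what a packed unpacker has to do, using the FAT/DOS
16-bit date and time words. The struct `DosFields` and the function
`unpack_dos` are hypothetical names for illustration; how this crate packs the
two words into one `i64` is not shown here and is left out of the sketch.

```rust
/// Calendar fields unpacked from a FAT/DOS date and time pair.
#[derive(Debug, PartialEq, Eq)]
struct DosFields {
    year: u16,
    month: u8,
    day: u8,
    hour: u8,
    minute: u8,
    second: u8,
}

/// Unpack the DOS date and time words. Returns `None` when a field is outside
/// its coarse calendar range (days-per-month is not checked here).
fn unpack_dos(date: u16, time: u16) -> Option<DosFields> {
    let fields = DosFields {
        year: 1980 + (date >> 9),           // 7 bits, years since 1980
        month: ((date >> 5) & 0x0F) as u8,  // 4 bits
        day: (date & 0x1F) as u8,           // 5 bits
        hour: (time >> 11) as u8,           // 5 bits
        minute: ((time >> 5) & 0x3F) as u8, // 6 bits
        second: ((time & 0x1F) * 2) as u8,  // 5 bits, 2-second resolution
    };
    let valid = (1..=12).contains(&fields.month)
        && (1..=31).contains(&fields.day)
        && fields.hour < 24
        && fields.minute < 60
        && fields.second < 60;
    if valid {
        Some(fields)
    } else {
        None
    }
}

fn main() {
    // 2020-01-01 13:37:42, packed by hand.
    let date: u16 = (40 << 9) | (1 << 5) | 1;
    let time: u16 = (13 << 11) | (37 << 5) | 21;
    println!("{:?}", unpack_dos(date, time));
}
```

The fields are naive local time, so turning them into an instant still needs
the timezone semantics carried on the format entry.
    */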
    Packed(fn(i64) -> Result<PosixNs, ChronoError>),
    // TODO(HANDOFF): SYSTEMTIME / exFAT (offset field) packed layouts;
    // ASN.1 / EXIF / RFC-2822 string forms.
}

/// Timezone semantics of a format's stored value — NOT garnish: FAT stores local
/// time, EXIF often lacks an offset, Event Logs store UTC but display local.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TzSemantics {
    /// The value denotes UTC (POSIX, leap-ignoring).
    Utc,
    /// The value denotes naive *local* time with no recorded offset (FAT/DOS).
    LocalNaive,
    /// The value carries its own offset (exFAT tz field, EXIF with offset).
    OffsetEmbedded,
}

/// Leap-second semantics — the partition Codex flagged. Most forensic epochs are
/// POSIX (leap-ignoring); only the GPS/TAI/NTP family needs true leap math.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LeapSemantics {
    /// UTC-labelled but leap-ignoring (pure constant offset to Unix). The norm.
    PosixIgnored,
    /// True leap-aware scale (GPS/TAI/NTP) — handled by a separate instant type.
    LeapAware,
}

/// One forensic timestamp format: evidence metadata, not just a converter.
#[derive(Debug, Clone, Copy)]
pub struct Format {
    /// Stable id (e.g. `"filetime"`).
    pub id: &'static str,
    /// Human label (e.g. `"Windows FILETIME"`).
    pub label: &'static str,
    /// Where it's found / who writes it.
    pub family: &'static str,
    /// How the value maps to an instant.
    pub strategy: Strategy,
    /// Authoritative spec citation (clean-room provenance for the paper).
    pub citation: &'static str,
    /// Timezone semantics.
    pub tz: TzSemantics,
    /// Leap-second semantics.
    pub leap: LeapSemantics,
    /// Observed forensic plausibility window `[from, to)` in [`PosixNs`] — used
    /// to rank auto-detect candidates (NOT to assert a single answer).
    pub plausible: (i128, i128),
}

impl Format {
    /// Decode an integer value under this format. Returns an error (never
    /// panics) on arithmetic overflow, on a negative value for an embedded-id
    /// format, or when the format is float-only.
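    /**
# Example

A standalone sketch of the checked linear step (ticks times unit plus epoch),
using only `std`. The helper `decode_linear` and the two constants are
illustrative assumptions; they mirror the arithmetic but are not this crate's
API.

```rust
/// FILETIME's epoch (1601-01-01) in nanoseconds relative to the Unix epoch.
const FILETIME_EPOCH_NS: i128 = -11_644_473_600 * 1_000_000_000;

/// Nanoseconds per FILETIME tick (100 ns).
const TICK_NS: i128 = 100;

/// ticks * tick_ns + epoch_ns, with every step checked for overflow.
fn decode_linear(ticks: i64, tick_ns: i128, epoch_ns: i128) -> Option<i128> {
    i128::from(ticks)
        .checked_mul(tick_ns)?
        .checked_add(epoch_ns)
}

fn main() {
    let ns = decode_linear(132_223_104_000_000_000, TICK_NS, FILETIME_EPOCH_NS);
    // 2020-01-01T00:00:00Z is 1_577_836_800 seconds after the Unix epoch.
    assert_eq!(ns, Some(1_577_836_800 * 1_000_000_000));
    println!("{:?}", ns);
}
```

Overflow surfaces as `None` in the sketch, where the method above reports an
out-of-range error.
    */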
    pub fn decode_int(&self, value: i64) -> Result<PosixNs, ChronoError> {
        match self.strategy {
            Strategy::LinearInt { epoch_ns, unit } => {
                let ticks = i128::from(value);
                let ns = ticks
                    .checked_mul(unit.nanos())
                    .and_then(|t| t.checked_add(epoch_ns))
                    .ok_or(ChronoError::OutOfRange {
                        what: "nanoseconds",
                        value: ticks,
                    })?;
                Ok(PosixNs(ns))
            }
            Strategy::EmbeddedMillis {
                epoch_ns,
                shift_bits,
            } => {
                // IDs are unsigned; a negative value is not a valid ID encoding.
                let raw = u64::try_from(value).map_err(|_| ChronoError::OutOfRange {
                    what: "embedded-id (negative)",
                    value: i128::from(value),
                })?;
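                // A plain `>>` panics in debug builds when `shift_bits >= 64`;
                // shifting every bit out leaves a zero millisecond count.
                let ms = i128::from(raw.checked_shr(shift_bits).unwrap_or(0));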
                let ns = ms
                    .checked_mul(Unit::Millis.nanos())
                    .and_then(|t| t.checked_add(epoch_ns))
                    .ok_or(ChronoError::OutOfRange {
                        what: "nanoseconds",
                        value: ms,
                    })?;
                Ok(PosixNs(ns))
            }
            Strategy::Packed(decode) => decode(value),
            Strategy::LinearFloat { .. } => Err(ChronoError::OutOfRange {
                what: "float-format decoded as integer",
                value: i128::from(value),
            }),
        }
    }

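    /// Decode a floating value (OLE days etc.). Lossy by nature, as noted on
    /// [`Strategy::LinearFloat`]. Returns an error on non-finite or
    /// out-of-range input and on integer-only formats.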
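    /**
# Example

A standalone sketch of a guarded float decode for OLE Automation days, using
only `std`. The helper `decode_ole_days` and its constants are assumptions of
this sketch; the epoch offset relies on the well-known fact that serial 25569
is 1970-01-01.

```rust
/// OLE Automation's epoch (1899-12-30) sits 25_569 days before the Unix epoch.
const OLE_EPOCH_NS: i128 = -25_569 * 86_400 * 1_000_000_000;

/// Nanoseconds per day, as a float for scaling fractional days.
const DAY_NS: f64 = 86_400.0 * 1_000_000_000.0;

/// Decode fractional OLE days to nanoseconds since the Unix epoch.
/// Returns `None` for non-finite or absurdly large inputs.
fn decode_ole_days(days: f64) -> Option<i128> {
    if !days.is_finite() {
        return None;
    }
    let scaled = (days * DAY_NS).round();
    // Stay below i128::MAX (about 1.7e38) so the cast cannot saturate.
    if !scaled.is_finite() || scaled.abs() >= 1.0e38 {
        return None;
    }
    (scaled as i128).checked_add(OLE_EPOCH_NS)
}

fn main() {
    // Serial 43831.5 is noon on 2020-01-01.
    println!("{:?}", decode_ole_days(43_831.5));
    // NaN is rejected instead of being cast to a plausible-looking instant.
    assert_eq!(decode_ole_days(f64::NAN), None);
}
```

At this magnitude an `f64` can no longer resolve single nanoseconds, which is
the precision caveat in practice.
    */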
    pub fn decode_float(&self, value: f64) -> Result<PosixNs, ChronoError> {
        match self.strategy {
            Strategy::LinearFloat { epoch_ns, unit } => {
                // Reject non-finite or absurd magnitudes rather than let the
                // float→int cast saturate into a plausible-but-wrong instant.
                if !value.is_finite() {
                    return Err(ChronoError::OutOfRange {
                        what: "non-finite float value",
                        value: 0,
                    });
                }
                let scaled = (value * unit.nanos() as f64).round();
                // 1e38 < i128::MAX (~1.7e38): a safe ceiling below the saturating
                // cast boundary, well past any civil-range date.
                if !scaled.is_finite() || scaled.abs() >= 1.0e38 {
                    return Err(ChronoError::OutOfRange {
                        what: "float value out of representable range",
                        value: 0,
                    });
                }
                let ns = (scaled as i128)
                    .checked_add(epoch_ns)
                    .ok_or(ChronoError::OutOfRange {
                        what: "nanoseconds",
                        value: scaled as i128,
                    })?;
                Ok(PosixNs(ns))
            }
            Strategy::LinearInt { .. } | Strategy::EmbeddedMillis { .. } | Strategy::Packed(_) => {
                Err(ChronoError::OutOfRange {
                    what: "integer format decoded as float",
                    value: 0,
                })
            }
        }
    }

    /// Encode an instant to this format's integer value (truncating toward the
    /// epoch at the unit granularity). Returns an error on overflow and for
    /// float, embedded-id, and packed formats, which have no integer
    /// re-encoding.
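    /**
# Example

A standalone sketch of what "truncating toward the epoch" means for instants
on either side of it, using only `std`. The helper `encode_truncating` is a
hypothetical name that mirrors the arithmetic, not this crate's API.

```rust
use std::convert::TryFrom;

/// Encode `instant_ns` as whole ticks since `epoch_ns`, truncating toward the
/// epoch. Returns `None` on overflow or a zero tick size.
fn encode_truncating(instant_ns: i128, epoch_ns: i128, tick_ns: i128) -> Option<i64> {
    let rel = instant_ns.checked_sub(epoch_ns)?;
    // Integer division in Rust truncates toward zero, i.e. toward the epoch.
    let ticks = rel.checked_div(tick_ns)?;
    i64::try_from(ticks).ok()
}

fn main() {
    const SECOND_NS: i128 = 1_000_000_000;
    // 1.5 s after the epoch and 1.5 s before it both lose the half second.
    assert_eq!(encode_truncating(1_500_000_000, 0, SECOND_NS), Some(1));
    assert_eq!(encode_truncating(-1_500_000_000, 0, SECOND_NS), Some(-1));
    // Flooring division would instead move the pre-epoch instant to -2.
    assert_eq!((-1_500_000_000_i128).div_euclid(SECOND_NS), -2);
}
```

Decoding an encoded value therefore lands up to one tick closer to the epoch
than the original instant, on either side of it.
    */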
    pub fn encode_int(&self, instant: PosixNs) -> Result<i64, ChronoError> {
        match self.strategy {
            Strategy::LinearInt { epoch_ns, unit } => {
                let rel = instant
                    .0
                    .checked_sub(epoch_ns)
                    .ok_or(ChronoError::OutOfRange {
                        what: "nanoseconds",
                        value: instant.0,
                    })?;
                let ticks = rel / unit.nanos();
                i64::try_from(ticks).map_err(|_| ChronoError::OutOfRange {
                    what: "ticks",
                    value: ticks,
                })
            }
            Strategy::LinearFloat { .. } => Err(ChronoError::OutOfRange {
                what: "float-format encoded as integer",
                value: 0,
            }),
            Strategy::EmbeddedMillis { .. } => Err(ChronoError::OutOfRange {
                // Encoding would have to invent the worker/sequence low bits; a
                // round-trip is not defined for ID schemes.
                what: "embedded-id format cannot be re-encoded from an instant",
                value: 0,
            }),
            Strategy::Packed(_) => Err(ChronoError::OutOfRange {
                what: "packed format cannot be re-encoded from an instant",
                value: 0,
            }),
        }
    }
}

/// Look up a registered format by id. Returns [`ChronoError::UnknownFormat`]
/// when no registered format has that id.
pub fn format(id: &str) -> Result<&'static Format, ChronoError> {
    registry::FORMATS
        .iter()
        .find(|f| f.id == id)
        .ok_or_else(|| ChronoError::UnknownFormat(id.to_string()))
}