Struct Database

pub struct Database<'a> { /* private fields */ }

Top-level handle to a buffered NSF file.

Holds a borrowed slice of the full file bytes. Cheap to construct - no copies are made. The parser walks the file lazily; consumers pay for what they enumerate.

Implementations

impl<'a> Database<'a>

pub fn open(bytes: &'a [u8]) -> Result<Self, NsfError>

Open an NSF from a full-file byte buffer. Validates the file header and DBINFO; lazy on everything else.

Parsed database header.

pub fn has_data_rrv(&self) -> bool

True when the database carries a populated data RRV bucket. A fresh / never-instantiated template will return false here - it has design notes via the non-data RRV but no data notes.

pub fn data_rrv_iter( &self, ) -> Result<Option<(RrvBucketHeader, RrvIter<'a>)>, NsfError>

Parse and iterate the data RRV bucket if present. Returns the bucket header for diagnostics plus an iterator over the non-empty RRV entries.

The data RRV bucket’s file position is reported in 256-byte units in DBINFO; this method converts to a byte offset and reads rrv_bucket_size bytes from that point.
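The unit conversion described above can be sketched as follows. This is a minimal illustration, not the crate's implementation: `bucket_window` is a hypothetical helper, and treating a zero position or size as "no bucket" is an assumption.

```rust
/// DBINFO reports bucket positions in 256-byte units (per the text above).
const POSITION_UNIT: usize = 256;

/// Hypothetical helper: slice one RRV bucket out of the full-file buffer.
/// Returns `None` for an absent bucket or a window that leaves the buffer.
fn bucket_window(file: &[u8], position_units: u32, bucket_size: u32) -> Option<&[u8]> {
    // Assumption: a zero position or size means the bucket is not present.
    if position_units == 0 || bucket_size == 0 {
        return None;
    }
    // Convert 256-byte units to a byte offset, guarding against overflow.
    let start = (position_units as usize).checked_mul(POSITION_UNIT)?;
    let end = start.checked_add(bucket_size as usize)?;
    // `get` returns `None` instead of panicking on an out-of-range window.
    file.get(start..end)
}
```

Using `get` rather than direct indexing means a corrupt position yields `None` instead of a panic.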

pub fn data_note_count(&self) -> Result<u64, NsfError>

Convenience: count non-empty entries in the data RRV. Walks the bucket but does not retain the per-entry state.

pub fn has_non_data_rrv(&self) -> bool

True when the database carries a populated non-data RRV bucket. Design notes (forms, views) and, in databases like fakenames.nsf, the bulk of document notes are reached through the non-data RRV rather than the data RRV.

pub fn non_data_rrv_iter( &self, ) -> Result<Option<(RrvBucketHeader, RrvIter<'a>)>, NsfError>

Parse and iterate the non-data RRV bucket if present. Mirrors Self::data_rrv_iter but reads from non_data_rrv_bucket_position. Most bucket-slot RRV entries (the ones Self::resolve_bucket_slot resolves) live here.

pub fn data_rrv_take(&self, limit: usize) -> Result<Vec<RrvEntry>, NsfError>

Collect at most limit RRV entries from the data RRV for preview / list rendering. Useful for “show the first 200 notes in the viewer” without walking 40,000 entries up front.
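The point of a bounded preview is that a lazy iterator stops parsing once the limit is reached. A minimal sketch of that pattern, assuming the iterator yields `Result` items; `Entry` and `take_entries` are hypothetical stand-ins, not the crate's `RrvEntry` or its method:

```rust
/// Hypothetical stand-in for an RRV entry; the real `RrvEntry` fields are not shown here.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    rrv_identifier: u32,
}

/// Collect at most `limit` entries from a lazy iterator, stopping at the first error.
/// Entries beyond `limit` are never pulled from the underlying iterator.
fn take_entries<I, E>(iter: I, limit: usize) -> Result<Vec<Entry>, E>
where
    I: Iterator<Item = Result<Entry, E>>,
{
    // Collecting an iterator of `Result`s into `Result<Vec<_>, _>` short-circuits on `Err`.
    iter.take(limit).collect()
}
```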

pub fn information2(&self) -> Result<Information2, NsfError>

Parse the database information extension block 2 (file offset 520, 124 bytes). Carries the four superblock positions, the two BDB positions, and the bucket-size settings.

pub fn superblocks(&self) -> Result<Vec<(usize, Superblock)>, NsfError>

Parse every populated superblock copy (skipping uninitialized slots). Each entry is (slot_index, Superblock) so callers can report which copy was loaded. Domino allocates 4 slots and rotates commits across them; instantiated databases typically have 3 populated and 1 empty, with the freshest by modification_time authoritative (use Self::freshest_superblock).

Forensic-tool-grade resilience: slots are skipped silently when any of these conditions hold, rather than crashing the load:

Slot is empty (position or size zero).
Slot’s declared byte offset extends past the file end.
Slot’s body does not start with the superblock signature 0E 00. This catches fresh-template uninitialized regions that Domino allocates with allocation_granularity but never commits to (empirically these are filled with AA AA AA AA, e.g. SB3 of comparedbs.ntf).

Other parse failures (e.g. an unexpected short read mid-header) are not expected in practice with a fully buffered NSF and surface as errors. Because the superblock is written as redundant write-ahead-log copies, silently dropping an unreadable slot normally still leaves at least one valid copy.
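The three skip conditions listed above can be sketched as a single filter. This is an illustration under assumptions: `usable_slot` is a hypothetical helper, and `position` is taken to be already converted to a byte offset.

```rust
/// Superblock signature bytes, as given in the text above.
const SUPERBLOCK_SIGNATURE: [u8; 2] = [0x0E, 0x00];

/// Hypothetical helper: return the slot body if it is worth parsing,
/// or `None` when the slot should be skipped silently.
fn usable_slot(file: &[u8], position: usize, size: usize) -> Option<&[u8]> {
    // Condition 1: the slot is empty (position or size zero).
    if position == 0 || size == 0 {
        return None;
    }
    // Condition 2: the declared extent runs past the end of the file.
    let end = position.checked_add(size)?;
    let body = file.get(position..end)?;
    // Condition 3: the body does not start with the signature
    // (e.g. an uninitialized region filled with AA bytes).
    if !body.starts_with(&SUPERBLOCK_SIGNATURE) {
        return None;
    }
    Some(body)
}
```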

pub fn freshest_superblock( &self, ) -> Result<Option<(usize, Superblock)>, NsfError>

Convenience: parse all populated superblocks and return the freshest one by modification_time. The other populated copies are write-ahead-log redundancy and should be ignored once this one is loaded. Returns None if no superblock slots are populated (extremely rare; would indicate a partially-initialized NSF).
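Selecting the freshest copy amounts to a maximum over modification_time. A minimal sketch, with a hypothetical `Sb` struct standing in for `Superblock` and an assumed `u64` timestamp representation:

```rust
/// Hypothetical stand-in for `Superblock`; only the field used for ordering is modelled,
/// and its `u64` type is an assumption.
#[derive(Debug, Clone, PartialEq)]
struct Sb {
    modification_time: u64,
}

/// Pick the (slot_index, superblock) pair with the greatest modification time.
/// Returns `None` when no slot was populated.
fn freshest(slots: Vec<(usize, Sb)>) -> Option<(usize, Sb)> {
    slots
        .into_iter()
        .max_by_key(|(_, sb)| sb.modification_time)
}
```

Note that `max_by_key` returns the last of several equal maxima; how the crate breaks ties is not stated in the source.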

pub fn decompressed_superblock_body(&self) -> Result<Option<Vec<u8>>, NsfError>

Decompress the freshest superblock’s body (the CX-compressed region that carries the bucket-descriptor array). Returns None when the database has no superblock.

Body layout from the superblock byte offset, per the reference: [0,100) header, then the compressed region of length size - 112 (100-byte header + 12-byte footer removed), of which the first 4 bytes are a prefix the decompressor skips. The decompressed length is the header’s uncompressed_size field.
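The layout arithmetic above can be written out as a range computation. A sketch only: `decompressor_input` is a hypothetical helper, and the constant names are introduced here for illustration.

```rust
use std::ops::Range;

const HEADER_LEN: usize = 100; // superblock header, [0, 100)
const FOOTER_LEN: usize = 12; // trailing footer
const SKIPPED_PREFIX: usize = 4; // prefix of the compressed region the decompressor skips

/// Hypothetical helper: byte range, relative to the superblock start,
/// that would be handed to the decompressor for a superblock of `size` bytes.
fn decompressor_input(size: usize) -> Option<Range<usize>> {
    // Compressed region length is size - 112 (header and footer removed).
    let compressed_len = size.checked_sub(HEADER_LEN + FOOTER_LEN)?;
    if compressed_len < SKIPPED_PREFIX {
        return None; // too small to hold even the skipped prefix
    }
    let start = HEADER_LEN + SKIPPED_PREFIX;
    let end = HEADER_LEN + compressed_len;
    Some(start..end)
}
```

For example, a 512-byte superblock has a 400-byte compressed region at [100, 500), of which [104, 500) is decompressor input.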

pub fn summary_bucket_offsets(&self) -> Result<Vec<u64>, NsfError>

Build the global summary-bucket descriptor map: a 0-based vector of file byte offsets where offsets[bucket_index - 1] is the byte offset of the summary bucket an RRV bucket-slot entry’s bucket_index refers to (bucket_index is 1-based on disk).
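Because bucket_index is 1-based on disk and the vector is 0-based, a lookup has to reject index 0 before subtracting. A minimal sketch with a hypothetical helper name:

```rust
/// Hypothetical helper: map a 1-based on-disk `bucket_index` to the byte
/// offset stored at `offsets[bucket_index - 1]`.
fn summary_bucket_offset(offsets: &[u64], bucket_index: u32) -> Option<u64> {
    // Index 0 is invalid on disk; `checked_sub` turns it into `None`
    // instead of underflowing.
    let zero_based = (bucket_index as usize).checked_sub(1)?;
    offsets.get(zero_based).copied()
}
```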

Multi-page geometry

On modern ODS the summary bucket descriptors are spread across number_of_summary_bucket_descriptor_pages pages. The decompressed superblock body begins with a page index of (pages - 1) stride-14 records (the page’s file_position is the first 4 bytes of each record); those point to the out-of-body pages. The final (resident) page’s descriptor array is inline in the body at SUMMARY_RESIDENT_PREFIX + (pages - 1) * SUMMARY_DESCRIPTOR_BYTES. Single-page databases (pages <= 1) have only the resident page at the libnsfdb-documented offset 224.

libnsfdb itself only handles a single descriptor page (it errors on > 1), so the multi-page geometry here was reverse-engineered and validated against the rrv_identifier identity oracle (see Self::enumerate_notes). The out-of-body page header size (OUT_OF_BODY_PAGE_HEADER) and per-page descriptor count (PER_OUT_OF_BODY_PAGE) are empirical constants; mis-fits surface as identity-gate failures in Self::enumerate_notes rather than as silently wrong records.

pub fn resolve_bucket_slot( &self, bucket_index: u32, slot_index: u16, ) -> Result<&'a [u8], NsfError>

Resolve a single RRV bucket-slot pair to the raw bytes of the slot’s record, using the summary-bucket descriptor map.

This is the physical resolution step: it does not identity-check the result. For verified note enumeration (where each resolved record is confirmed to carry the requested rrv_identifier), use Self::enumerate_notes. Rebuilds the descriptor map on each call; callers resolving many entries should prefer enumerate_notes, which builds the map once.

pub fn bucket_descriptor_block( &self, ) -> Result<Option<BucketDescriptorBlock>, NsfError>

Parse the freshest Bucket Descriptor Block (BDB) - the master index of every RRV bucket in the database. Returns None when no BDB slot is populated (a fresh / never-instantiated shell). Of the two BDB copies in Information2 (primary + write-ahead-log redundancy) the one with the higher write_count is authoritative.
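The choice between the two BDB copies can be sketched as below. `Bdb` is a hypothetical stand-in for `BucketDescriptorBlock`, the `u32` type of write_count is assumed, and preferring the primary on a tie is an assumption the source does not address.

```rust
/// Hypothetical stand-in for `BucketDescriptorBlock`; only `write_count` is modelled.
#[derive(Debug, Clone, PartialEq)]
struct Bdb {
    write_count: u32,
}

/// Return the copy with the higher `write_count`, or whichever copy exists.
fn authoritative(primary: Option<Bdb>, secondary: Option<Bdb>) -> Option<Bdb> {
    match (primary, secondary) {
        // Both populated: the higher write count wins (ties keep the primary, by assumption).
        (Some(a), Some(b)) => Some(if b.write_count > a.write_count { b } else { a }),
        // At most one populated: return it, or `None` for a never-instantiated shell.
        (a, b) => a.or(b),
    }
}
```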

pub fn enumerate_notes(&self) -> Result<NoteEnumeration, NsfError>

Enumerate every note in the database by walking the BDB -> all RRV buckets -> each RRV entry, resolving each to a note record.

Every resolution is identity-gated: a note is only accepted if the resolved record’s rrv_identifier (note header offset 6) equals the RRV entry’s identifier. This is the chain-of-custody guarantee - a record is never returned unless it provably is the note the RRV entry points to. Entries for which no candidate passes the gate are counted in unresolved rather than returned as possibly-wrong evidence.
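The identity gate itself is a small comparison. A minimal sketch, assuming the identifier is stored as a little-endian u32 (the byte order is not stated above) and using hypothetical helper names:

```rust
/// Offset of `rrv_identifier` within the note header, as stated in the text above.
const RRV_IDENTIFIER_OFFSET: usize = 6;

/// Read the record's identifier; little-endian byte order is an assumption.
fn record_identifier(record: &[u8]) -> Option<u32> {
    let b = record.get(RRV_IDENTIFIER_OFFSET..RRV_IDENTIFIER_OFFSET + 4)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// A record passes the gate only on an exact identifier match.
/// Records too short to hold the field never pass.
fn passes_identity_gate(record: &[u8], expected: u32) -> bool {
    record_identifier(record) == Some(expected)
}
```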

Group-marker recovery

A small set of summary-descriptor slots (the page’s group-boundary slots) carry group-marker flag bits inside the file_position field: the low nibble, or bits 16-19 (in which case the true high nibble matches the locally-sequential neighbours). For each bucket-slot entry the resolver tries the raw descriptor first, then these marker-corrected candidates, accepting the first that passes the identity gate. Because acceptance requires an exact 32-bit rrv_identifier match, a wrong candidate cannot be accepted - the recovery is heuristic in what it tries but never in what it returns.
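The "try candidates in order, accept the first that passes the gate" pattern can be sketched generically. This does not reproduce the crate's marker-correction rules: the candidate positions are supplied by the caller, `first_verified` and `resolve` are hypothetical names, and little-endian byte order is assumed.

```rust
/// Read the u32 identifier at note header offset 6 (little-endian is an assumption).
fn record_identifier(record: &[u8]) -> Option<u32> {
    let b = record.get(6..10)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Try each candidate position in order (raw descriptor first, then any
/// corrected variants) and return the first record that passes the gate.
/// `None` means the entry would be counted as unresolved.
fn first_verified<'a, F>(candidates: &[u32], expected: u32, resolve: F) -> Option<&'a [u8]>
where
    F: Fn(u32) -> Option<&'a [u8]>,
{
    candidates
        .iter()
        // Positions that do not resolve to any bytes are simply skipped.
        .filter_map(|&position| resolve(position))
        // Acceptance requires an exact identifier match.
        .find(|record| record_identifier(record) == Some(expected))
}
```

The heuristic part lives entirely in how the candidate list is built; the acceptance test is the same exact match for every candidate.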

pub fn non_summary_data(&self, note: &ResolvedNote) -> Option<&'a [u8]>

Return a note’s non-summary data object - the separately-stored large payload that holds rich-text ($Body / mail bodies), file attachments (OBJECT items), and other items too big for the inline summary. None when the note has no non-summary data.

Location: non_summary_data_identifier << 8 is the byte offset of the object, which opens with a header - signature 0x0010, then a u32 size and the owning note’s u32 rrv_identifier (both validated here) - followed by the payload (a CD-record stream for rich text, or object segments for attachments). The returned slice is the whole object including that header; record-level decoding (CD records, attachment extraction) happens in a later step.
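The location and header checks above can be sketched as follows. Several details are assumptions made for illustration: a u16 signature at offset 0, the size at offset 2, the rrv_identifier at offset 6, little-endian fields, a size that counts the whole object including its header, and identifier 0 meaning "no object". `non_summary_object` is a hypothetical helper.

```rust
const OBJECT_SIGNATURE: u16 = 0x0010;
const OBJECT_HEADER_LEN: usize = 10; // assumed: u16 signature + u32 size + u32 rrv_identifier

/// Hypothetical helper: locate and validate a note's non-summary data object.
fn non_summary_object(file: &[u8], identifier: u32, owner_rrv: u32) -> Option<&[u8]> {
    if identifier == 0 {
        return None; // assumption: zero means the note has no non-summary data
    }
    // identifier << 8 is the byte offset; multiply with an overflow check.
    let start = (identifier as usize).checked_mul(256)?;
    let header = file.get(start..start.checked_add(OBJECT_HEADER_LEN)?)?;
    let signature = u16::from_le_bytes([header[0], header[1]]);
    let size = u32::from_le_bytes([header[2], header[3], header[4], header[5]]) as usize;
    let rrv = u32::from_le_bytes([header[6], header[7], header[8], header[9]]);
    // Validate the signature, the owning note, and a plausible size.
    if signature != OBJECT_SIGNATURE || rrv != owner_rrv || size < OBJECT_HEADER_LEN {
        return None;
    }
    // Return the whole object, header included, if it fits in the file.
    file.get(start..start.checked_add(size)?)
}
```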

pub fn note_content(&self, note: &ResolvedNote) -> Option<NoteContent>

Decode a note’s rich-text body and attachments from its non-summary data (CD-record stream). Returns None when the note has no non-summary data or it decodes to nothing. See crate::cd.

pub fn note_items(&self, note: &ResolvedNote) -> Vec<NoteItem<'a>>

Parse the items (fields) of a resolved note: each item’s name id, type/flags, and raw value bytes. See crate::item for the layout and what is / isn’t decoded (field-name resolution is a later slice).

The record window is bounded to the note’s declared size so item values cannot read into a neighbouring record.
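Bounding the window before reading item values can be sketched like this. `item_value` and its parameters are hypothetical; the actual item layout is described in crate::item and is not reproduced here.

```rust
/// Hypothetical helper: read an item value without leaving the note's declared size.
/// `record` may extend into neighbouring records; `declared_size` is the note's own length.
fn item_value(
    record: &[u8],
    declared_size: usize,
    value_offset: usize,
    value_len: usize,
) -> Option<&[u8]> {
    // First clamp the view to the note's declared size.
    let window = record.get(..declared_size)?;
    // Then slice inside that window, so an oversized value yields `None`
    // rather than bytes that belong to the next record.
    let end = value_offset.checked_add(value_len)?;
    window.get(value_offset..end)
}
```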