sherlock-nsf-parser
A pure-Rust, read-only parser for IBM / HCL Lotus Notes Storage Facility (NSF) databases.
No FFI. No Notes client. No Domino server. No C library to link. Point it at a .nsf /
.ntf file and read what is inside, from any platform Rust compiles for.
This is, to our knowledge, the first open-source pure-Rust NSF reader. It is the parsing engine behind the Sherlock NSF Viewer forensic GUI.
Why this exists
Organizations spent two decades putting mail, contacts, calendars, and application data into
Lotus Notes. That data is still sitting in .nsf files long after the Notes client was
uninstalled, the Domino server was decommissioned, and the people who ran it moved on.
Reading those files traditionally meant standing up a Notes/Domino environment or buying a five-figure forensic suite. The format is also almost entirely undocumented in public, which is why open tooling has historically depended on the proprietary Notes C API.
This crate reads the on-disk structures directly, with a hard correctness guarantee (see Forensic posture) and zero runtime dependencies.
Status
Active development, pre-1.0. The public API may change between 0.x releases as format
coverage grows.
What works today
- File identification - distinguishes NSF / NTF / NSG / mail.box and related shapes.
- Database header / DBINFO - ODS version, database id (DBID), encryption and template flags, bucket / RRV positions and sizes, Bucket Descriptor Block position.
- Superblock parsing with freshest-of-copies selection (multi-page summary bucket descriptor map).
- Bucket Descriptor Block (BDB) - the master index of every RRV bucket, plus the Unique Name Key table that gives fields their real names and types.
- RRV bucket / entry decoding and bucket-slot resolution.
- Identity-gated full-database note enumeration - walks every note in the database and verifies each resolved record against its own NoteID before trusting it.
- Per-note items with real field names and authoritative typing (TEXT, TEXT_LIST, TIME, NUMBER, FORMULA, COMPOSITE, OBJECT, and more).
- Rich-text bodies - walks the Composite Data (CD) record stream and reconstructs the message body text.
- Attachment extraction - pulls embedded images and files out of the non-summary object stream, byte-for-byte.
- TIMEDATE timestamp decoding (clock view + identifier view) and ODS version mapping.
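To illustrate the clock view in the last item, here is a minimal sketch of decoding an 8-byte TIMEDATE into a Unix timestamp. It assumes the layout described in the public HCL C API documentation: a little-endian first word counting hundredths of a second since midnight GMT, and a second word whose low 24 bits hold a Julian Day number and whose high byte holds DST and time-zone bits. `ClockView` and `decode_timedate` are hypothetical names for this example, not this crate's API. The identifier view and special wildcard values are not handled.

```rust
/// Julian Day Number of 1970-01-01, used to rebase onto the Unix epoch.
const UNIX_EPOCH_JDN: i64 = 2_440_588;

/// Hypothetical decoded form of a TIMEDATE (illustration only, not the crate's type).
#[derive(Debug, PartialEq)]
struct ClockView {
    unix_seconds: i64,      // whole seconds since 1970-01-01T00:00:00Z
    hundredths: u32,        // leftover hundredths of a second (0..=99)
    dst: bool,              // bit 31: daylight saving time observed
    east_of_gmt: bool,      // bit 30: zone lies east of Greenwich
    zone_hours: u8,         // bits 27-24: whole hours of offset
    zone_quarter_hours: u8, // bits 29-28: extra 15-minute intervals
}

/// Decode the assumed TIMEDATE layout. Taking a fixed-size array means
/// every index below is in bounds, so this cannot panic.
fn decode_timedate(raw: [u8; 8]) -> ClockView {
    let ticks = u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]);
    let date_word = u32::from_le_bytes([raw[4], raw[5], raw[6], raw[7]]);

    let julian_day = i64::from(date_word & 0x00FF_FFFF);
    let flags = (date_word >> 24) as u8;

    ClockView {
        unix_seconds: (julian_day - UNIX_EPOCH_JDN) * 86_400 + i64::from(ticks / 100),
        hundredths: ticks % 100,
        dst: flags & 0x80 != 0,
        east_of_gmt: flags & 0x40 != 0,
        zone_hours: flags & 0x0F,
        zone_quarter_hours: (flags >> 4) & 0x03,
    }
}
```

The stored time is already GMT, so the zone bits only describe how the value was originally displayed. They are not needed to compute `unix_seconds`.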
Not yet
- Decrypting database/item-level encrypted NSFs (encryption is detected and flagged, not decrypted; requires .id file parsing + RSA private key unwrap).
- Form-based semantic dispatch (Memo / Person / Appointment field schemas).
- Full file-attachment segment coverage for every CD file-segment variant.
- Writing or replicating NSFs. This crate is read-only, forever.
Forensic posture
This crate is built for evidence work, and two properties are non-negotiable:
- Read-only. The parser never writes to, mmaps for write, or otherwise mutates the source file. It takes an immutable &[u8]. The file on disk is exactly what it was.
- Identity-gated resolution. NSF addresses notes through layers of indirection (RRV buckets, bucket slots, file positions). A resolution step is only trusted when the record it lands on reports the same NoteID (rrv_identifier) that was used to look it up. Records that do not match are reported as unresolved rather than silently returned. The parser would rather tell you it could not resolve a note than hand you the wrong one.
enumerate_notes() returns both the identity-verified notes and the count of entries it could
not resolve, so you always know the coverage of any extraction.
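The identity gate can be sketched in a few lines. This is a simplified model under stated assumptions, not the crate's implementation: `Record`, `Enumeration`, and `enumerate_verified` are hypothetical names, and a `HashMap` from file position to record stands in for the real on-disk lookup.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for whatever is stored at a resolved file position.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    rrv_identifier: u32, // the NoteID the record itself reports
    payload: Vec<u8>,
}

/// Hypothetical result shape: verified notes plus a count of misses.
#[derive(Debug, Default)]
struct Enumeration {
    notes: Vec<Record>,
    unresolved: usize,
}

/// `index` pairs each NoteID with the position its lookup resolved to.
/// A record is accepted only if it reports the NoteID used to find it.
fn enumerate_verified(index: &[(u32, u64)], records: &HashMap<u64, Record>) -> Enumeration {
    let mut out = Enumeration::default();
    for &(note_id, position) in index {
        match records.get(&position) {
            // Identity matches: trust the record.
            Some(rec) if rec.rrv_identifier == note_id => out.notes.push(rec.clone()),
            // Nothing there, or the wrong note: count it, never return it.
            _ => out.unresolved += 1,
        }
    }
    out
}
```

Because both counts are returned, coverage for a run is simply `notes.len()` out of `notes.len() + unresolved`.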
The crate also sets #![forbid(unsafe_code)]. Parsing is deterministic, uses no global state, and does not panic on malformed input.
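One common way to get panic-free parsing over an immutable &[u8] in safe Rust is to route every read through a bounds-checked helper that returns an error instead of indexing directly. The sketch below shows that pattern. `ParseError`, `take`, `read_u16_le`, and `read_u32_le` are hypothetical names, and little-endian fields are an assumption. It is not taken from this crate's source.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ParseError {
    /// The buffer ended before `needed` bytes could be read at `offset`.
    Truncated { offset: usize, needed: usize },
}

/// Borrow `len` bytes starting at `offset`, or report truncation.
/// `checked_add` guards against offset overflow; `get` guards the upper bound.
fn take(buf: &[u8], offset: usize, len: usize) -> Result<&[u8], ParseError> {
    offset
        .checked_add(len)
        .and_then(|end| buf.get(offset..end))
        .ok_or(ParseError::Truncated { offset, needed: len })
}

/// Read a little-endian u16 without panicking on short input.
fn read_u16_le(buf: &[u8], offset: usize) -> Result<u16, ParseError> {
    let b = take(buf, offset, 2)?;
    // `take` returned exactly 2 bytes, so these indexes are in bounds.
    Ok(u16::from_le_bytes([b[0], b[1]]))
}

/// Read a little-endian u32 without panicking on short input.
fn read_u32_le(buf: &[u8], offset: usize) -> Result<u32, ParseError> {
    let b = take(buf, offset, 4)?;
    // `take` returned exactly 4 bytes, so these indexes are in bounds.
    Ok(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}
```

With helpers like these, a truncated or corrupted file surfaces as an `Err` the caller can record, and the source bytes are only ever borrowed immutably.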
Usage
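    [dependencies]
    sherlock-nsf-parser = "0.1"

    use sherlock_nsf_parser::Database;

    // The path below is a placeholder; substitute your own file.
    let bytes = std::fs::read("path/to/database.nsf")?;
    let db = Database::open(&bytes)?;

    // Walk every note, identity-verified.
    let result = db.enumerate_notes()?;
    println!("{} notes identity-verified", result.notes.len());

    // Field names and types live in the Bucket Descriptor Block.
    let bdb = db.bucket_descriptor_block()?;
    for note in &result.notes {
        // ... inspect the note's items here ...
    }
    # Ok::<(), Box<dyn std::error::Error>>(())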
Just want to know what a file is?
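    use sherlock_nsf_parser::identify_file;

    // The path below is a placeholder; substitute your own file.
    let bytes = std::fs::read("path/to/database.nsf")?;
    match identify_file(&bytes) {
        // ... branch on the reported file kind (NSF / NTF / NSG / mail.box) ...
        _ => {}
    }
    # Ok::<(), Box<dyn std::error::Error>>(())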
Proven against real data
The enumeration and resolution paths are validated against the canonical 142 MB Mindoo
fakenames.nsf Domino directory, where the parser identity-verifies over 42,000 documents and
correctly reconstructs rich-text bodies and attachments (including a multi-megabyte JPEG rebuilt
from its CD image segments). The corpus suite also covers HCL Domino 6.x-9.0.1 templates and
OpenNTF XPages demo databases.
The Sherlock NSF Viewer
This crate is the open-source engine. If you want a polished desktop application on top of it (browse, filter, keyboard navigation, attachment save, structured export, signed chain-of-custody reports), that is the Sherlock NSF Viewer, a commercial forensic tool from Sherlock Forensics. Free to view and browse; Pro unlocks export and reporting.
License
Licensed under the Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0).
Contribution
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be licensed as above, without any additional terms or conditions. Test fixtures from previously-unsupported NSF variants are especially welcome.
Format reference
This crate cross-references Joachim Metz's libnsfdb notes (https://github.com/libyal/libnsfdb) and the public HCL Domino C API documentation, but the note-addressing, item-typing, and CD-record work here was reverse-engineered directly from observed file structures.
Disclaimer
Not affiliated with, endorsed by, or sponsored by IBM or HCL. "Lotus Notes", "Domino", and related marks belong to their respective owners. This is a clean-room reader developed from publicly observable file structures for interoperability and digital-forensics purposes.