* content checksums
* incorporate the content soundex from raap
see: https://phext.io/api.php?seed=raap&token=research&coordinate=1.1.1/1.1.1/1.1.2
* conversion methods
* tar
* zip
* sqlite
* local file system
* non-linear flows
Q: what happens to information as it flows along a path of phext coordinates?
say we want to define stable regions early in the file...
we could define a phext-based mask to assist with indexing
* hierarchical mobs
* DB emulation
* Liquid Peanut Butter
* it's a bit nutty!
* fast indexing
* checksum forking: record expected offsets and checksums in .checksum files
* hierarchy map in memory
* memory-mapped I/O
* investigate data sources
* https://huggingface.co/learn/nlp-course/chapter2/4?fw=pt
* https://commoncrawl.org/get-started
* https://medium.com/@zolayola/public-data-sets-in-the-era-of-llms-0a4e89bda658