pdfmuse-core — deterministic PDF/DOCX parser core.
The naive parse() lands in PER-33, and the hand-written content-stream
interpreter (the real value) in PER-36. The unified IR, the data foundation
that every binding serializes byte-identically, lives in ir.
Re-exports§
pub use error::PdfmuseError;
pub use error::Result;
Modules§
- backend
- Pluggable vision backend — the ML boundary.
- error
- Structured error type.
- ir
- Unified intermediate representation (IR).
Structs§
- Chunk
- A retrieval unit: a block’s text plus the context needed to cite it.
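
To make the idea of a retrieval unit concrete, here is a minimal sketch of chunking with heading context. It does not use the crate's real types: `Block`, `BlockKind`, the fields on `Chunk`, and `chunk_blocks` are all hypothetical stand-ins, since this page does not show the actual definitions. It also assumes that a heading block becomes a chunk of its own and cites itself as its heading.

```rust
// Hypothetical stand-ins for illustration only; the real IR lives in the `ir` module
// and its field names are not shown on this page.
#[derive(Debug, Clone, PartialEq)]
enum BlockKind {
    Heading,
    Paragraph,
}

#[derive(Debug, Clone)]
struct Block {
    kind: BlockKind,
    text: String,
    page: usize,
}

// Assumed shape of a retrieval unit: the block text plus what is needed to cite it.
#[derive(Debug, Clone, PartialEq)]
struct Chunk {
    text: String,
    page: usize,
    heading: Option<String>,
}

// One chunk per non-empty block, carrying the most recent heading seen so far.
fn chunk_blocks(blocks: &[Block]) -> Vec<Chunk> {
    let mut heading: Option<String> = None;
    let mut out = Vec::new();
    for block in blocks {
        let text = block.text.trim();
        if text.is_empty() {
            continue; // whitespace-only blocks produce no chunk
        }
        if block.kind == BlockKind::Heading {
            // Assumption: update the context first, so a heading chunk cites itself.
            heading = Some(text.to_string());
        }
        out.push(Chunk {
            text: text.to_string(),
            page: block.page,
            heading: heading.clone(),
        });
    }
    out
}
```

Carrying the heading and page on every chunk means a retrieval hit can be cited without going back to the full document, which is presumably why the context travels with the text.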
Enums§
Functions§
- chunk
- Split doc into chunks (one per non-empty block), tracking heading context.
- parse
- Parse data into the unified ir::Document.
- parse_with_password
- Like parse, but supplies a password for encrypted PDFs.
- to_json
- Serialize the entire Document to pretty-printed JSON.
- to_markdown
- Render doc to GitHub-flavored Markdown, pages and blocks in order.
- to_text
- Render doc to plain reading-order text: no Markdown syntax, just the block text joined by newlines. The cheapest useful output for search / ATS / feeding an LLM, and (via the bindings) avoids materializing the full IR on the host.
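
The plain-text rendering described for to_text can be pictured as a flat walk over pages and blocks. The sketch below is an illustration under assumptions, not the crate's implementation: `Doc`, `Page`, `TextBlock`, and `to_text_sketch` are hypothetical names, and the real ir::Document may be shaped differently.

```rust
// Hypothetical, simplified document shape; not the crate's actual IR.
struct TextBlock {
    text: String,
}

struct Page {
    blocks: Vec<TextBlock>,
}

struct Doc {
    pages: Vec<Page>,
}

// Walk pages and blocks in order and join the block text with newlines.
// No Markdown syntax is emitted; only the raw text survives.
fn to_text_sketch(doc: &Doc) -> String {
    doc.pages
        .iter()
        .flat_map(|page| page.blocks.iter())
        .map(|block| block.text.as_str())
        .collect::<Vec<_>>()
        .join("\n")
}
```

Because the output is a single string, a caller that only needs searchable text never has to hold the structured form, which matches the note above about not materializing the full IR on the host.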