Crate fleischwolf_core

Core data model for fleischwolf.

This crate is the Rust counterpart of the docling-core Python package: it owns the unified DoclingDocument representation that every backend produces and every serializer consumes. Keeping it dependency-light and separate from the conversion logic mirrors the Python split between docling-core (the schema) and docling (the converters).

Phase 0 models a simplified, linear node tree that is enough to round-trip through Markdown. The faithful, $ref-based schema that matches docling-core’s JSON wire format lands in Phase 1 (see MIGRATION.md).
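
To illustrate what a $ref-based layout involves compared with a linear node list, here is a minimal sketch of resolving references back into reading order. It assumes refs shaped like "#/texts/0", as in docling-core's JSON. RefDoc, parse_ref and flatten are hypothetical names for illustration, not part of this crate's API.

```rust
/// Hypothetical ref-based document: items live in a typed collection and the
/// body lists them by JSON-pointer-style strings such as "#/texts/0".
struct RefDoc {
    texts: Vec<String>,
    body_children: Vec<String>,
}

/// Split a ref like "#/texts/3" into its collection name and index.
/// Returns None for anything that does not have that shape.
fn parse_ref(r: &str) -> Option<(&str, usize)> {
    let rest = r.strip_prefix("#/")?;
    let (collection, index) = rest.split_once('/')?;
    Some((collection, index.parse().ok()?))
}

/// Flatten the body into reading order by resolving each ref.
/// Only the "texts" collection exists in this sketch; refs into other
/// collections, or with out-of-range indices, are skipped.
fn flatten(doc: &RefDoc) -> Vec<&str> {
    doc.body_children
        .iter()
        .filter_map(|r| match parse_ref(r)? {
            ("texts", i) => doc.texts.get(i).map(String::as_str),
            _ => None,
        })
        .collect()
}
```

A linear model needs no such resolution step because the order of the nodes is the order of the document. That simplicity is the trade-off Phase 0 makes.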

Modules

base64
Minimal standard-alphabet Base64 codec (RFC 4648): encode embeds image bytes as data: URIs, and decode reads them back out. Implementing only these two operations in-crate avoids an external dependency.
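
As a rough idea of what such a codec involves, the sketch below implements padded, standard-alphabet Base64 by hand. The names encode, decode, sextet and data_uri are assumptions chosen for illustration. The module's real signatures are not shown on this page and may differ.

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Encode bytes with the standard alphabet and '=' padding (RFC 4648, section 4).
fn encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into the low 24 bits of a u32.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        out.push(if chunk.len() > 1 { ALPHABET[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if chunk.len() > 2 { ALPHABET[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Map one Base64 character back to its 6-bit value.
fn sextet(c: u8) -> Option<u32> {
    match c {
        b'A'..=b'Z' => Some((c - b'A') as u32),
        b'a'..=b'z' => Some((c - b'a') as u32 + 26),
        b'0'..=b'9' => Some((c - b'0') as u32 + 52),
        b'+' => Some(62),
        b'/' => Some(63),
        _ => None,
    }
}

/// Decode padded standard-alphabet Base64; returns None on malformed input.
fn decode(input: &str) -> Option<Vec<u8>> {
    let bytes = input.as_bytes();
    if bytes.len() % 4 != 0 {
        return None;
    }
    let quads = bytes.len() / 4;
    let mut out = Vec::with_capacity(quads * 3);
    for (i, quad) in bytes.chunks(4).enumerate() {
        // Padding may only appear at the end of the final quad.
        let pad = quad.iter().rev().take_while(|&&c| c == b'=').count();
        if pad > 2 || (pad > 0 && i + 1 != quads) {
            return None;
        }
        let mut n = 0u32;
        for &c in &quad[..4 - pad] {
            n = (n << 6) | sextet(c)?;
        }
        // Shift the missing sextets in as zero bits, then peel off bytes.
        n <<= 6 * pad as u32;
        out.push((n >> 16) as u8);
        if pad < 2 {
            out.push((n >> 8) as u8);
        }
        if pad < 1 {
            out.push(n as u8);
        }
    }
    Some(out)
}

/// Build a data: URI for embedding encoded image bytes.
fn data_uri(mimetype: &str, bytes: &[u8]) -> String {
    format!("data:{};base64,{}", mimetype, encode(bytes))
}
```

This sketch does not reject non-canonical trailing bits in the final quad, which a stricter decoder might.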

Structs

DoclingDocument
The unified, format-agnostic document produced by every backend.
MarkdownStreamer
Incremental Markdown serializer: feed it finalized batches of Nodes in document order and receive Markdown chunks whose concatenation is byte-identical to the output of to_markdown_images over the same nodes. This is the streaming counterpart of the buffered serializer, used to emit a document’s Markdown in chunks (e.g. page by page, as the parallel PDF pipeline finishes pages) instead of building the whole string up front.
PictureImage
An extracted picture’s raw encoded bytes plus its mimetype and pixel size — the fleischwolf analogue of docling-core’s ImageRef.
Table
A simple row-major table. rows[0] is the header row.
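
The byte-identical guarantee of the streaming serializer and the rows[0]-as-header convention can both be illustrated with a small sketch. Every type and function below (SketchTable, SketchNode, SketchStreamer, render_block, to_markdown) is hypothetical and far simpler than the crate's real types. The point is only that the streamer carries enough state across batches for separators not to depend on where the batch boundaries fall.

```rust
/// Hypothetical row-major table; rows[0] is treated as the header row.
struct SketchTable {
    rows: Vec<Vec<String>>,
}

/// Hypothetical content node, far smaller than the crate's Node enum.
enum SketchNode {
    Paragraph(String),
    Table(SketchTable),
}

/// Render one node as a Markdown block without surrounding blank lines.
fn render_block(node: &SketchNode) -> String {
    match node {
        SketchNode::Paragraph(text) => text.clone(),
        SketchNode::Table(table) => {
            let mut lines = Vec::new();
            for (i, row) in table.rows.iter().enumerate() {
                lines.push(format!("| {} |", row.join(" | ")));
                if i == 0 {
                    // The delimiter row after rows[0] marks it as the header.
                    let dashes = vec!["---"; row.len()];
                    lines.push(format!("| {} |", dashes.join(" | ")));
                }
            }
            lines.join("\n")
        }
    }
}

/// Buffered serializer: all blocks at once, separated by one blank line.
fn to_markdown(nodes: &[SketchNode]) -> String {
    nodes.iter().map(render_block).collect::<Vec<_>>().join("\n\n")
}

/// Streaming serializer: remembers whether anything was emitted so that
/// the separator before each block does not depend on batch boundaries.
#[derive(Default)]
struct SketchStreamer {
    emitted_any: bool,
}

impl SketchStreamer {
    /// Serialize one batch and return the Markdown chunk for it.
    fn push(&mut self, batch: &[SketchNode]) -> String {
        let mut chunk = String::new();
        for node in batch {
            if self.emitted_any {
                chunk.push_str("\n\n");
            }
            chunk.push_str(&render_block(node));
            self.emitted_any = true;
        }
        chunk
    }
}
```

Pushing the same nodes in any split of batches, including empty ones, and concatenating the returned chunks yields the same string as to_markdown over the full slice.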

Enums

DocItemLabel
Semantic role of a document item, mirroring docling-core’s DocItemLabel.
ImageMode
How pictures are rendered (mirrors docling-core’s ImageRefMode).
Node
A single piece of document content.
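
To show how a rendering mode might affect Markdown output, the sketch below switches on a mode when serializing a picture. The variant names are borrowed from docling-core's ImageRefMode (placeholder, embedded, referenced) and are an assumption about ImageMode, whose actual variants are not listed on this page. SketchPicture and render_picture are likewise hypothetical.

```rust
/// Hypothetical stand-in for ImageMode; the variant names follow
/// docling-core's ImageRefMode and may differ from the crate's own.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SketchImageMode {
    Placeholder,
    Embedded,
    Referenced,
}

/// Hypothetical picture record: a mimetype, an already Base64-encoded
/// payload, and an optional external location.
struct SketchPicture {
    mimetype: String,
    base64_payload: String,
    uri: Option<String>,
}

/// Render a picture as Markdown according to the chosen mode.
fn render_picture(pic: &SketchPicture, mode: SketchImageMode) -> String {
    match (mode, &pic.uri) {
        // Inline the bytes as a data: URI.
        (SketchImageMode::Embedded, _) => format!(
            "![Image](data:{};base64,{})",
            pic.mimetype, pic.base64_payload
        ),
        // Point at the external file when one is known.
        (SketchImageMode::Referenced, Some(uri)) => format!("![Image]({})", uri),
        // Placeholder mode, or Referenced with nowhere to point.
        _ => "<!-- image -->".to_string(),
    }
}
```

Falling back to the placeholder when a referenced picture has no location is a design choice of this sketch, not documented behaviour.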