Function normalize_document 

pub fn normalize_document(doc: ParsedDocument) -> ParsedDocument

Normalizes a parsed document by applying all field-level normalizations.

This is the primary entry point for normalizing documents after parsing. It ensures consistent processing regardless of how the document was created.

§Normalization Steps

This function applies all normalizations in the following order:

  1. Unicode NFC normalization - Field names are normalized to NFC form
  2. Bidi stripping - Invisible bidirectional control characters are removed
  3. HTML comment fence fixing - Trailing text after --> is preserved
  4. Guillemet conversion - <<text>> is converted to «text» in BODY fields
  5. Chevron stripping - <<text>> is stripped to text in other fields
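To make steps 2, 4, and 5 concrete, here is a minimal standalone sketch using only the standard library. It is not quillmark_core's implementation: every helper name below (`is_bidi_control`, `strip_bidi`, `convert_guillemets`, `strip_chevrons`, `normalize_field`) is hypothetical, the set of bidi control characters is an assumption, and the plain string replacement also rewrites unpaired `<<` or `>>`, which the real function may treat differently. NFC normalization and comment fence fixing are left out because they need more than a few lines of std-only code.

```rust
// Illustrative stand-ins only; not the crate's actual code.

/// Assumed set of invisible bidirectional control characters:
/// ALM, LRM, RLM, the embedding/override range, and the isolate range.
fn is_bidi_control(c: char) -> bool {
    matches!(
        c,
        '\u{061C}'
            | '\u{200E}'
            | '\u{200F}'
            | '\u{202A}'..='\u{202E}'
            | '\u{2066}'..='\u{2069}'
    )
}

/// Step 2 (sketch): drop every bidi control character.
fn strip_bidi(input: &str) -> String {
    input.chars().filter(|c| !is_bidi_control(*c)).collect()
}

/// Step 4 (sketch): `<<text>>` becomes `«text»`.
fn convert_guillemets(input: &str) -> String {
    input.replace("<<", "«").replace(">>", "»")
}

/// Step 5 (sketch): `<<text>>` becomes `text`.
fn strip_chevrons(input: &str) -> String {
    input.replace("<<", "").replace(">>", "")
}

/// Applies the sketched steps to one field value. The field name decides
/// whether chevrons are converted (BODY) or stripped (any other field).
fn normalize_field(name: &str, value: &str) -> String {
    let cleaned = strip_bidi(value);
    if name == "BODY" {
        convert_guillemets(&cleaned)
    } else {
        strip_chevrons(&cleaned)
    }
}
```

Under these assumptions, `normalize_field("BODY", "Hello <<name>>")` yields `Hello «name»`, while the same value in a `title` field yields `Hello name`.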

§When to Use

Call this function after parsing and before rendering:

use quillmark_core::{ParsedDocument, normalize::normalize_document};

let markdown = "---\ntitle: Example\n---\n\nBody with <<placeholder>>";
let doc = ParsedDocument::from_markdown(markdown).unwrap();
let normalized = normalize_document(doc);
// Use normalized document for rendering...

§Direct API Usage

If you’re constructing a ParsedDocument directly via crate::parse::ParsedDocument::new rather than parsing from markdown, you MUST call this function to ensure consistent normalization:

use quillmark_core::{ParsedDocument, QuillValue, normalize::normalize_document};
use std::collections::HashMap;

// Direct construction (e.g., from API or database)
let mut fields = HashMap::new();
fields.insert("title".to_string(), QuillValue::from_json(serde_json::json!("Test")));
fields.insert("BODY".to_string(), QuillValue::from_json(serde_json::json!("<<content>>")));

let doc = ParsedDocument::new(fields);
let normalized = normalize_document(doc);

// Body now has guillemets converted
assert_eq!(normalized.body().unwrap(), "«content»");

§Idempotency

This function is idempotent: calling it more than once produces the same result as calling it once. Repeated calls are still wasted work, so avoid them.
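The idempotency property can be checked mechanically by comparing one pass against two. The sketch below does this for a hypothetical stand-in, `normalize_body`, which only mimics the guillemet conversion; it is an assumption for illustration and not the crate's code. The generic helper `is_idempotent_on` is likewise a made-up name.

```rust
/// Hypothetical stand-in for one normalization pass over a BODY string.
fn normalize_body(input: &str) -> String {
    input.replace("<<", "«").replace(">>", "»")
}

/// Returns true when applying `f` a second time leaves the output of the
/// first application unchanged, i.e. f(f(x)) == f(x) for this input.
fn is_idempotent_on<F>(f: F, input: &str) -> bool
where
    F: Fn(&str) -> String,
{
    let once = f(input);
    let twice = f(&once);
    once == twice
}
```

A check such as `is_idempotent_on(normalize_body, "Body with <<placeholder>>")` fits naturally in a unit test; the same shape would apply to `normalize_document` itself if `ParsedDocument` supports cloning and equality comparison, which this page does not confirm.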