Document extraction for halldyll-media
Extracts documents from HTML with support for:
- PDF links
- Office documents (Word, Excel, PowerPoint)
- E-books (EPUB)
- Download links
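One plausible way to tell these categories apart is by file extension. The sketch below is illustrative only: `DocKind` and `classify_url` are hypothetical names, not part of the halldyll-media API, and the extension table is an assumption about which suffixes map to each category.

```rust
/// Hypothetical document categories mirroring the list above.
/// (Not the crate's own type; introduced for illustration.)
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum DocKind {
    Pdf,
    Word,
    Excel,
    PowerPoint,
    Epub,
}

/// Classify a URL by its file extension.
/// Assumption: the extension alone identifies the document type.
pub fn classify_url(url: &str) -> Option<DocKind> {
    // Drop the query string and fragment so "report.pdf?dl=1" still matches.
    let path = url
        .split(|c: char| c == '?' || c == '#')
        .next()
        .unwrap_or("");
    // Take the last path segment, then the text after its final dot.
    let file = path.rsplit('/').next().unwrap_or("");
    let (_, ext) = file.rsplit_once('.')?;
    // Compare case-insensitively so ".PDF" and ".pdf" are treated alike.
    match ext.to_ascii_lowercase().as_str() {
        "pdf" => Some(DocKind::Pdf),
        "doc" | "docx" => Some(DocKind::Word),
        "xls" | "xlsx" => Some(DocKind::Excel),
        "ppt" | "pptx" => Some(DocKind::PowerPoint),
        "epub" => Some(DocKind::Epub),
        _ => None,
    }
}
```

Under these assumptions, `classify_url("/files/slides.pptx#page=2")` yields `Some(DocKind::PowerPoint)`, while a link such as `/index.html` yields `None`. Generic download links that carry no recognizable extension would need a different signal (for example a `download` attribute), which this sketch does not cover.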
Functions
- count_by_type - Count documents by type
- extract_documents - Extract all documents from HTML document
- extract_documents_from_html - Extract documents from HTML string
- get_document_urls - Get all document URLs from HTML
- get_office_docs - Get Office documents
- get_pdfs - Get PDFs only
- has_documents - Check if HTML has documents
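The signatures of these functions are not shown on this page, so the following is not a usage example of the crate. It is a minimal, standard-library-only sketch of the kind of work such functions might do: collecting link targets from an HTML string, keeping those that look like documents, and counting them per extension. All names here (`href_values`, `document_urls`, `count_by_extension`, `DOC_EXTENSIONS`) are hypothetical, and the naive string scan stands in for a real HTML parser.

```rust
use std::collections::HashMap;

/// Extensions treated as documents in this sketch (an assumption).
const DOC_EXTENSIONS: [&str; 8] =
    ["pdf", "doc", "docx", "xls", "xlsx", "ppt", "pptx", "epub"];

/// Naively collect the values of double-quoted `href="..."` attributes.
/// A real extractor would use an HTML parser; this scan is illustrative only.
pub fn href_values(html: &str) -> Vec<&str> {
    let mut out = Vec::new();
    let mut rest = html;
    while let Some(start) = rest.find("href=\"") {
        // Skip past the 6 bytes of `href="` to the attribute value.
        let after = &rest[start + 6..];
        match after.find('"') {
            Some(end) => {
                out.push(&after[..end]);
                rest = &after[end + 1..];
            }
            // Unterminated attribute: stop scanning.
            None => break,
        }
    }
    out
}

/// Return the lowercase extension if the URL looks like a document.
fn doc_extension(url: &str) -> Option<String> {
    // Ignore query string and fragment, then inspect the last path segment.
    let path = url
        .split(|c: char| c == '?' || c == '#')
        .next()
        .unwrap_or("");
    let file = path.rsplit('/').next().unwrap_or("");
    let (_, ext) = file.rsplit_once('.')?;
    let ext = ext.to_ascii_lowercase();
    if DOC_EXTENSIONS.contains(&ext.as_str()) {
        Some(ext)
    } else {
        None
    }
}

/// Keep only the link targets that look like documents, in page order.
pub fn document_urls(html: &str) -> Vec<&str> {
    href_values(html)
        .into_iter()
        .filter(|url| doc_extension(url).is_some())
        .collect()
}

/// Count document links per lowercase extension.
pub fn count_by_extension(html: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for url in href_values(html) {
        if let Some(ext) = doc_extension(url) {
            *counts.entry(ext).or_insert(0) += 1;
        }
    }
    counts
}
```

In this sketch a "has documents" check reduces to `!document_urls(html).is_empty()`, and a PDF-only filter is a further `filter` on the extension. Whether the crate structures its functions this way is not stated here.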