cf-file-parser 0.1.22

File Parser Module

File parsing module for CyberFabric / ModKit.

Overview

The cf-file-parser crate implements the file-parser module and registers REST routes.

All document extraction is handled by a single unified backend — kreuzberg =4.9.4 — which replaces the previous per-format library set (tl, pdf-extract, calamine, pptx-to-md).

Supported formats:

Extension(s)            Format
pdf                     PDF
html, htm               HTML
xlsx, xls, xlsm, xlsb   Excel spreadsheets
pptx                    PowerPoint presentations
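
As a rough illustration of the table above, the following sketch maps a file name to a format by its extension. The `DocumentFormat` enum and the `detect_format` function are hypothetical names invented for this example, and the case-insensitive comparison is an assumption; the crate may detect formats differently (for instance by delegating to kreuzberg).

```rust
use std::path::Path;

/// Hypothetical format tag used only for this sketch; the crate's real types may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DocumentFormat {
    Pdf,
    Html,
    Excel,
    PowerPoint,
}

/// Maps a file name to a format using the extension table above.
/// Extensions are compared case-insensitively (an assumption, not confirmed by the source).
pub fn detect_format(file_name: &str) -> Option<DocumentFormat> {
    // A missing or non-UTF-8 extension yields `None`.
    let ext = Path::new(file_name)
        .extension()?
        .to_str()?
        .to_ascii_lowercase();

    match ext.as_str() {
        "pdf" => Some(DocumentFormat::Pdf),
        "html" | "htm" => Some(DocumentFormat::Html),
        "xlsx" | "xls" | "xlsm" | "xlsb" => Some(DocumentFormat::Excel),
        "pptx" => Some(DocumentFormat::PowerPoint),
        // Anything else is outside the supported set listed above.
        _ => None,
    }
}
```

With this sketch, `detect_format("index.htm")` yields `Some(DocumentFormat::Html)`, while an unlisted extension such as `docx` yields `None`.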

Configuration

modules:
  file-parser:
    config:
      max_file_size_mb: 100
      # Required. Only files under this directory are accessible via parse-local.
      # Symlinks that resolve outside this directory are also blocked.
      allowed_local_base_dir: /data/documents
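
To make the two settings concrete, here is a minimal sketch of a struct mirroring the YAML keys above, with a helper that turns the megabyte limit into a byte count. `FileParserConfig` and its methods are hypothetical names for illustration only, and the sketch assumes 1 MB means 1024 * 1024 bytes and that a file exactly at the limit is accepted; the module itself may define either differently.

```rust
use std::path::PathBuf;

/// Hypothetical mirror of the YAML keys above; not the crate's actual config type.
#[derive(Debug, Clone)]
pub struct FileParserConfig {
    pub max_file_size_mb: u64,
    pub allowed_local_base_dir: PathBuf,
}

impl FileParserConfig {
    /// Size limit in bytes, assuming 1 MB = 1024 * 1024 bytes (an assumption).
    pub fn max_file_size_bytes(&self) -> u64 {
        // saturating_mul avoids overflow on absurdly large configured values.
        self.max_file_size_mb.saturating_mul(1024 * 1024)
    }

    /// Returns true when a file of `size_bytes` is within the limit (inclusive).
    pub fn accepts_size(&self, size_bytes: u64) -> bool {
        size_bytes <= self.max_file_size_bytes()
    }
}
```

Under these assumptions, the example value `max_file_size_mb: 100` corresponds to a limit of 104,857,600 bytes.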

Security: Local Path Restrictions

The parse-local endpoints validate every requested file path before the file is opened:

  1. Paths containing .. components are always rejected.
  2. The requested path is canonicalized (symlinks resolved) and must fall under allowed_local_base_dir.
  3. allowed_local_base_dir is required — the module will fail to start if it is missing or the path cannot be resolved.
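
The checks above can be sketched with the standard library alone. `resolve_allowed_path` is a hypothetical helper written for this example, not the crate's actual API, and the error kinds it returns are likewise assumptions. In the module the base directory is resolved once at startup; the sketch resolves it on every call only to stay self-contained.

```rust
use std::io;
use std::path::{Component, Path, PathBuf};

/// Hypothetical helper illustrating the three checks; not the crate's actual API.
pub fn resolve_allowed_path(base_dir: &Path, requested: &Path) -> io::Result<PathBuf> {
    // Check 1: reject any `..` component before resolving anything.
    if requested
        .components()
        .any(|c| matches!(c, Component::ParentDir))
    {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "path contains a '..' component",
        ));
    }

    // Check 3 (analogue): the base directory itself must resolve.
    // A missing directory makes `canonicalize` return an error.
    let base = base_dir.canonicalize()?;

    // Check 2: resolve symlinks, then require the result to sit under the base.
    // `Path::starts_with` compares whole components, so `/data/documents-old`
    // does not count as being under `/data/documents`.
    let resolved = requested.canonicalize()?;
    if !resolved.starts_with(&base) {
        return Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            "path is outside the allowed base directory",
        ));
    }

    Ok(resolved)
}
```

Because the comparison runs on the canonicalized path, a symlink inside the base directory that points elsewhere fails check 2 even though its textual path looks acceptable. Note that `canonicalize` resolves a relative `requested` path against the current working directory, so a real implementation would likely decide explicitly how relative paths are handled.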

License

This module is licensed under Apache-2.0.

Third-party dependency: kreuzberg

This module depends on kreuzberg, pinned at =4.9.4 (Elastic License 2.0).

Version range                           License
≤ 4.7.4                                 MIT
≥ 4.8.0 (including =4.9.4 used here)    Elastic License 2.0 (EL-2.0)

ℹ️ EL-2.0 is permitted for this use case. The deny.toml license policy includes an explicit exception for kreuzberg =4.9.4 with documented rationale: CyberFabric's document parsing is incidental to the platform — it is not sold as a standalone document-parsing product competing with kreuzberg.

EL-2.0 key restrictions to be aware of:

  • You may not provide the software (or a product whose primary functionality is substantially the same as kreuzberg) to third parties as a hosted or managed service.
  • You may not build a product sold primarily as a document-parsing service that competes with kreuzberg.

The dependency is pinned with =4.9.4 in Cargo.toml to prevent silent upgrades. Any version bump must be reviewed for license changes and approved by the maintainers before merging.