docbox-core 0.5.0

Docbox core business logic and functionality
<h1>
<img src="assets/box.svg" width="22px" height="22px" />
Docbox
</h1>

![Tests Status](https://img.shields.io/github/actions/workflow/status/docbox-nz/docbox/test.yml?style=for-the-badge&label=Tests)

[**Website →**](https://docbox-nz.pages.dev/)

**Docbox** is a modern, multi-tenant file management, processing, and search platform designed to seamlessly integrate into your application. It provides powerful capabilities for securely storing, processing, and retrieving documents.

**Docbox** is designed to run **behind your main service**, where your application acts as a **proxy**, forwarding requests to Docbox **after performing authentication and access control**. This keeps the core service secure and private.

Files, folders, and links are organized into logical groups called **Document Boxes**. Each Document Box is uniquely identified by a **scope string** (e.g. `user:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx:files`). The scope is embedded directly in the URL (e.g. `/box/user:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx:files/file/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx`), so your proxy can enforce access control efficiently and reliably from the request path alone, without inspecting the rest of the request.
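As a rough illustration of path-based access control, the sketch below shows how a proxying application might pull the scope out of a request path and decide whether to forward the request. It is not part of Docbox: the function names (`scope_from_path`, `user_may_access`, `should_forward`) and the rule that a user may only reach scopes of the form `user:{id}:{name}` are assumptions made for this example, and a real proxy would also need to handle percent-encoded paths.

```rust
/// Extracts the Document Box scope from a path shaped like `/box/{scope}/...`.
/// Returns `None` when the path does not start with `/box/` or the scope is empty.
/// (Hypothetical helper; assumes the path is already percent-decoded.)
fn scope_from_path(path: &str) -> Option<&str> {
    let rest = path.strip_prefix("/box/")?;
    // The scope is the first path segment after `/box/`.
    let scope = rest.split('/').next()?;
    if scope.is_empty() {
        None
    } else {
        Some(scope)
    }
}

/// Hypothetical access rule for this example: a user may only access scopes
/// with exactly three `:`-separated parts, `user:{their own id}:{name}`.
fn user_may_access(user_id: &str, scope: &str) -> bool {
    let mut parts = scope.split(':');
    matches!(
        (parts.next(), parts.next(), parts.next(), parts.next()),
        (Some("user"), Some(id), Some(_), None) if id == user_id
    )
}

/// Decides whether the proxy should forward the request to Docbox,
/// looking only at the request path and the authenticated user's id.
fn should_forward(user_id: &str, path: &str) -> bool {
    match scope_from_path(path) {
        Some(scope) => user_may_access(user_id, scope),
        None => false,
    }
}
```

With this rule, `should_forward("123", "/box/user:123:files/file/abc")` allows the request, while the same path for user `"456"` is rejected before anything reaches Docbox.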

## 🚀 Features

- 📂 File Management
    - Uploading and renaming files
    - Folders
    - "Link" files
- 🔍 Full-Text Search
    - Powerful Typesense-based queries
    - Full-text results down to the individual page level
- ⚙️ Document Processing
    - Generate PDF files for viewing office files in browsers
        - Powered by LibreOffice, which generates PDF files from common office formats
    - Extract text from documents
        - Extracted text is full-text searchable on a page-by-page basis
    - Generate thumbnails for PDF files (and office files)
        - Thumbnails in multiple sizes for better viewing performance in the browser
    - Email (.eml message/rfc822) file processing
        - Inlining images
        - Extracting and processing attachments
        - Text and HTML extraction
        - Full-text searchable email contents
- 🌐 Web-scraping
    - Built-in support for scraping OGP metadata, favicons, and other details from links stored within Docbox
- 🏢 Multi-Tenancy
    - Fully isolated per-tenant database design for secure data separation
- 🔑 HTTPS support
    - HTTPS for server-to-Docbox communication, provided by [rustls](https://github.com/rustls/rustls)

## 📦 Use Cases

- Secure, structured document storage
- Browser-based previews and thumbnails for office documents
- Email ingestion and indexing for discovery or archiving

## 🛠️ Technology Stack

- **Backend**: Rust
- **Database**: PostgreSQL
- **Search Engine**: Typesense, OpenSearch, or PostgreSQL
- **Storage**: S3-compatible object storage (e.g. AWS S3, MinIO)

![Docbox Diagram](assets/docbox.drawio.png)