chatpack
Rust library for converting chat exports into compact, LLM- and RAG-ready data.
API Docs | Export Guide | Benchmarks | Website
Overview
chatpack is the core Rust crate behind the Chatpack ecosystem. It parses chat exports from Telegram, WhatsApp, Instagram, and Discord, normalizes them into one Message type, and writes token-efficient CSV, JSON, or JSONL output for LLM analysis, RAG ingestion, archival, and analytics.
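To illustrate what normalizing several platform shapes into one Message type involves, here is a std-only sketch. The `TelegramRecord`, `DiscordRecord`, and `Message` structs are hypothetical stand-ins, not chatpack's actual types. Each platform-specific record is converted into the shared type with a `From` impl, so later stages only ever see one shape.

```rust
/// Hypothetical common message type (not chatpack's real `Message`).
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub sender: String,
    pub content: String,
    /// Unix timestamp in seconds, when the source provides one.
    pub timestamp: Option<i64>,
}

/// Hypothetical Telegram-shaped record.
pub struct TelegramRecord {
    pub from: String,
    pub text: String,
    pub unix_time: i64,
}

/// Hypothetical Discord-shaped record with no timestamp in this sketch.
pub struct DiscordRecord {
    pub author: String,
    pub content: String,
}

impl From<TelegramRecord> for Message {
    fn from(r: TelegramRecord) -> Self {
        Message { sender: r.from, content: r.text, timestamp: Some(r.unix_time) }
    }
}

impl From<DiscordRecord> for Message {
    fn from(r: DiscordRecord) -> Self {
        Message { sender: r.author, content: r.content, timestamp: None }
    }
}

fn main() {
    // Records from two platforms end up in one Vec<Message>.
    let messages: Vec<Message> = vec![
        TelegramRecord { from: "Alice".into(), text: "Hi".into(), unix_time: 1_700_000_000 }.into(),
        DiscordRecord { author: "Bob".into(), content: "Hello".into() }.into(),
    ];
    for m in &messages {
        println!("{}: {}", m.sender, m.content);
    }
}
```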
Raw messenger exports often spend most of their tokens on nested JSON, repeated field names, and metadata. In a real Telegram sample with 34,478 messages, CSV output reduced 11.2M raw-export tokens to about 850K tokens: 13.2x smaller.
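A rough way to see where the savings come from: the std-only sketch below serializes the same messages as a JSON array of objects and as CSV, then compares byte lengths. Bytes are only a proxy for tokens, the sample data is made up, and escaping is deliberately omitted because the sample strings contain no commas, quotes, or newlines. The 13.2x figure above comes from the real Telegram sample, not from this code.

```rust
/// Made-up (sender, content) pairs with no characters that would need escaping.
const SAMPLE: &[(&str, &str)] = &[
    ("Alice", "see you at noon"),
    ("Bob", "sounds good"),
    ("Alice", "bring the slides"),
];

/// JSON array of objects: the field names are repeated for every message.
fn to_json(msgs: &[(&str, &str)]) -> String {
    let items: Vec<String> = msgs
        .iter()
        .map(|(s, c)| format!("{{\"sender\":\"{s}\",\"content\":\"{c}\"}}"))
        .collect();
    format!("[{}]", items.join(","))
}

/// CSV: the column names appear once, in the header row.
fn to_csv(msgs: &[(&str, &str)]) -> String {
    let mut out = String::from("sender,content\n");
    for (s, c) in msgs {
        out.push_str(&format!("{s},{c}\n"));
    }
    out
}

fn main() {
    let json = to_json(SAMPLE);
    let csv = to_csv(SAMPLE);
    println!("JSON: {} bytes, CSV: {} bytes", json.len(), csv.len());
}
```

The gap grows with message count, because JSON pays the per-object key overhead on every message while CSV pays it once.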
| Platform | Input | Notes |
|---|---|---|
| Telegram | JSON | Parses Telegram Desktop result.json, formatted text, replies, edits, and service-message filtering |
| WhatsApp | TXT | Auto-detects US and European date formats, multiline messages, media placeholders, and common system messages |
| Instagram | JSON | Parses Meta message_*.json files, fixes common mojibake, and returns chronological messages |
| Discord | JSON, TXT, CSV | Supports DiscordChatExporter outputs, attachments, stickers, replies, and edited timestamps where available |
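Meta's JSON exports are known to store UTF-8 bytes as individual \u00XX escapes, so "é" arrives as "Ã©". The sketch below shows one common repair: reinterpret each character as a Latin-1 byte and decode the result as UTF-8. It illustrates the technique only; it is not chatpack's implementation, and `fix_mojibake` is a hypothetical name.

```rust
/// Hypothetical helper: undo "UTF-8 bytes stored as Latin-1 code points" mojibake.
/// Returns the input unchanged when it cannot be that kind of mojibake.
fn fix_mojibake(s: &str) -> String {
    // Every character must fit in one byte (U+0000..=U+00FF) to be a smuggled byte.
    if s.chars().any(|c| u32::from(c) > 0xFF) {
        return s.to_string();
    }
    let bytes: Vec<u8> = s.chars().map(|c| u32::from(c) as u8).collect();
    // Valid UTF-8 means the repair worked; otherwise keep the original text.
    String::from_utf8(bytes).unwrap_or_else(|_| s.to_string())
}

fn main() {
    let garbled = "caf\u{00C3}\u{00A9}"; // how "café" can appear in a Meta export
    println!("{}", fix_mojibake(garbled)); // prints "café"
}
```

The heuristic is safe on ASCII and on text that is already correct UTF-8 (it falls back to the input), but it can misfire on genuine Latin-1 text that happens to form valid UTF-8 byte sequences.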
Install
Minimal builds can disable the default features and enable only the parsers and writers they need:
```toml
[dependencies]
chatpack = { version = "0.6.0", default-features = false, features = ["telegram", "csv-output"] }
```
Quick Start
```rust
use std::path::Path;
use chatpack::prelude::*;
```
Common Workflows
Parse from a string when the export is already in memory:
```rust
use chatpack::prelude::*;
```
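For a sense of what parsing an in-memory export involves, here is a std-only sketch for a simplified WhatsApp-style text layout ("date, time - Sender: text"). The layout, the `parse_txt` and `classify` names, and the rules below are assumptions for illustration: a header without a "Sender: " part is treated as a system notice and dropped, and a line without a header continues the previous message. chatpack's own WhatsApp parser also handles date-format detection and media placeholders, which this skips.

```rust
#[derive(Debug, PartialEq)]
struct Msg {
    sender: String,
    content: String,
}

/// How a single line of the hypothetical export is interpreted.
enum Line<'a> {
    Message(&'a str, &'a str),
    System,
    Continuation,
}

fn classify(line: &str) -> Line<'_> {
    match line.split_once(" - ") {
        // A header needs a "date, time" stamp that starts with a digit.
        Some((stamp, rest))
            if stamp.contains(", ") && stamp.starts_with(|c: char| c.is_ascii_digit()) =>
        {
            match rest.split_once(": ") {
                Some((sender, text)) => Line::Message(sender, text),
                None => Line::System,
            }
        }
        _ => Line::Continuation,
    }
}

/// Parse an export that is already in memory as a &str.
fn parse_txt(input: &str) -> Vec<Msg> {
    let mut out: Vec<Msg> = Vec::new();
    for line in input.lines() {
        match classify(line) {
            Line::Message(sender, text) => out.push(Msg {
                sender: sender.to_string(),
                content: text.to_string(),
            }),
            // System notices are dropped in this sketch.
            Line::System => {}
            // Multiline message: append to the previous message, if any.
            Line::Continuation => {
                if let Some(last) = out.last_mut() {
                    last.content.push('\n');
                    last.content.push_str(line);
                }
            }
        }
    }
    out
}

fn main() {
    let export = "12/25/23, 10:15 - Alice: first line\nsecond line\n12/25/23, 10:16 - Bob: ok";
    for m in parse_txt(export) {
        println!("{:?}", m);
    }
}
```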
Stream large files when loading the full export is not practical:
```rust
use std::path::Path;
use chatpack::prelude::*;
```
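The idea behind streaming is to hold one record at a time rather than the whole export. A minimal std-only sketch, assuming a line-oriented input where each line is one `sender<TAB>content` record (a hypothetical format), reads through `BufRead` and hands each record to a callback. chatpack's native streaming parsers and progress tracking are not modeled here; the returned count is only a stand-in for progress reporting.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader, Cursor};
use std::path::Path;

/// Read records one line at a time and pass each to `on_message`.
/// Memory use is bounded by the longest line, not by the file size.
/// Returns how many records were processed.
fn stream_records<R: BufRead>(reader: R, mut on_message: impl FnMut(&str, &str)) -> io::Result<usize> {
    let mut count = 0;
    for line in reader.lines() {
        let line = line?;
        // Hypothetical "sender<TAB>content" record; malformed lines are skipped.
        if let Some((sender, content)) = line.split_once('\t') {
            on_message(sender, content);
            count += 1;
        }
    }
    Ok(count)
}

/// Convenience wrapper for a file on disk.
fn stream_file(path: &Path, on_message: impl FnMut(&str, &str)) -> io::Result<usize> {
    stream_records(BufReader::new(File::open(path)?), on_message)
}

fn main() -> io::Result<()> {
    // An in-memory reader stands in for a large file.
    let data = Cursor::new("Alice\thello\nBob\thi there\n");
    let n = stream_records(data, |sender, content| println!("{sender}: {content}"))?;
    println!("processed {n} records");
    Ok(())
}
```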
Choose output based on the downstream task:
| Output | Best for | Why |
|---|---|---|
| CSV | LLM context windows, spreadsheets | Most compact; sender/content only by default |
| JSONL | RAG, vector DB ingestion, streaming pipelines | One message per line |
| JSON | APIs, archival, structured post-processing | Full JSON array |
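To make the JSONL row concrete: each message becomes one self-contained JSON object on its own line, so newlines inside message text must be escaped. The std-only sketch below uses a hypothetical `to_jsonl` helper and a hand-rolled escaper covering what JSON requires (quote, backslash, and control characters below U+0020); a real pipeline would normally use a serializer such as serde_json instead.

```rust
/// Escape a string for use inside a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must use \u escapes.
            c if u32::from(c) < 0x20 => out.push_str(&format!("\\u{:04x}", u32::from(c))),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical helper: one JSON object per line, newline-terminated.
fn to_jsonl(msgs: &[(&str, &str)]) -> String {
    msgs.iter()
        .map(|(sender, content)| {
            format!(
                "{{\"sender\":\"{}\",\"content\":\"{}\"}}\n",
                escape_json(sender),
                escape_json(content)
            )
        })
        .collect()
}

fn main() {
    let msgs = [("Alice", "line one\nline two"), ("Bob", "say \"hi\"")];
    print!("{}", to_jsonl(&msgs));
}
```

Because every line parses independently, a consumer can ingest or resume a JSONL file line by line, which is what makes it convenient for RAG and streaming pipelines.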
Optional metadata is controlled by OutputConfig:
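```rust
// Compact default: sender and content only.
let compact = OutputConfig::new();
// All optional metadata fields.
let detailed = OutputConfig::all();
// Compact output plus timestamps.
let timestamps_only = OutputConfig::new().with_timestamps();
```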
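The configuration lines above suggest a builder-style API. The sketch below is a hypothetical, self-contained `ColumnConfig` that mimics that shape (`new()` for the compact sender/content pair, `all()` for every optional field, a chained `with_timestamps()`) and derives a CSV header from it. The field names and column order are assumptions, not chatpack's actual OutputConfig.

```rust
/// Hypothetical stand-in for an output configuration; not chatpack's OutputConfig.
#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct ColumnConfig {
    timestamps: bool,
    ids: bool,
    replies: bool,
    edited: bool,
}

impl ColumnConfig {
    /// Compact default: sender and content only.
    fn new() -> Self {
        Self::default()
    }

    /// Every optional column enabled.
    fn all() -> Self {
        Self { timestamps: true, ids: true, replies: true, edited: true }
    }

    /// Builder-style toggle: takes the config by value and returns it, so calls chain.
    fn with_timestamps(mut self) -> Self {
        self.timestamps = true;
        self
    }

    /// CSV header for the enabled columns, in a fixed order.
    fn header(&self) -> String {
        let mut cols = Vec::new();
        if self.ids {
            cols.push("id");
        }
        if self.timestamps {
            cols.push("timestamp");
        }
        cols.push("sender");
        cols.push("content");
        if self.replies {
            cols.push("reply_to");
        }
        if self.edited {
            cols.push("edited");
        }
        cols.join(",")
    }
}

fn main() {
    println!("{}", ColumnConfig::new().header());
    println!("{}", ColumnConfig::new().with_timestamps().header());
    println!("{}", ColumnConfig::all().header());
}
```

Taking `self` by value keeps the config a small `Copy` value and lets callers build it in a single expression.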
Feature Flags
The default feature set is full, which enables every parser, CSV/JSON output, and streaming support.
| Feature | Description | Default |
|---|---|---|
| full | All parsers, outputs, and streaming | Yes |
| telegram | Telegram JSON parser | Yes |
| whatsapp | WhatsApp TXT parser | Yes |
| instagram | Instagram JSON parser | Yes |
| discord | Discord JSON/TXT/CSV parser | Yes |
| csv-output | CSV writer and string conversion | Yes |
| json-output | JSON and JSONL writers/string conversion | Yes |
| streaming | Native streaming parsers and progress tracking | Yes |
| async | Tokio-based async parser support, currently Telegram | No |
Documentation
| Resource | Description |
|---|---|
| API Docs | Public Rust API, modules, traits, and examples |
| Export Guide | How to prepare Telegram, WhatsApp, Instagram, and Discord files |
| Benchmarks | Compression data, current benchmark groups, and local benchmark commands |
| examples/library_usage.rs | Basic library usage patterns |
| examples/rag_integration.rs | Example chunking flow for RAG systems |
Related Tools
This repository is the Rust core library. Other Chatpack tools live separately:
| Repository | Purpose |
|---|---|
| chatpack-cli | Command-line interface |
| chatpack-web | Browser/WASM interface |
| chatpack-python | Python bindings |
Development
The crate uses the Rust 2024 edition, which requires Rust 1.85 or newer. CI currently builds and tests on stable Rust across Linux, macOS, and Windows.