Parse and convert chat exports from messaging platforms into LLM-friendly formats.
§Overview
Chatpack provides a unified API for parsing chat exports from popular messaging platforms and converting them into formats optimized for Large Language Models. It handles platform-specific quirks (encoding issues, date formats, message types) and provides tools for filtering, merging, and exporting messages.
Supported platforms:
| Platform | Export Format | Special Handling |
|---|---|---|
| Telegram | JSON | Service messages, forwarded messages |
| WhatsApp | TXT | Auto-detects 4 locale-specific date formats |
| Instagram | JSON | Fixes Mojibake encoding from Meta exports |
| Discord | JSON/TXT/CSV | Attachments, stickers, replies |
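To make the Mojibake entry in the table concrete, here is a minimal std-only sketch of one common repair. It assumes the garbling is of the kind where each UTF-8 byte was decoded as a separate Latin-1 character. The function name `fix_latin1_mojibake` is hypothetical and this is not necessarily how chatpack implements its fix.

```rust
use std::convert::TryFrom;

/// Hypothetical helper (not part of chatpack): repairs text whose UTF-8
/// bytes were each decoded as a separate Latin-1 (ISO-8859-1) character.
/// Returns `None` when the input cannot be that kind of mis-decoding.
fn fix_latin1_mojibake(garbled: &str) -> Option<String> {
    // Map every char back to the single byte it came from. A char above
    // U+00FF cannot be a Latin-1 byte, so the whole conversion yields `None`.
    let bytes: Option<Vec<u8>> = garbled
        .chars()
        .map(|c| u8::try_from(u32::from(c)).ok())
        .collect();
    // Re-decode the recovered bytes as UTF-8; invalid sequences yield `None`.
    String::from_utf8(bytes?).ok()
}
```

A caller would typically fall back to the original string when the helper returns `None`, since text that is already correct (for example, a properly decoded `é`) does not round-trip through this repair.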
§Quick Start
use chatpack::prelude::*;
// Parse Telegram export
let parser = create_parser(Platform::Telegram);
let messages = parser.parse("export.json".as_ref())?;
// Filter, merge, and export
let filtered = apply_filters(messages, &FilterConfig::new().with_sender("Alice"));
let merged = merge_consecutive(filtered);
write_csv(&merged, "output.csv", &OutputConfig::default())?;
§Core Concepts
§Message
Message is the universal representation of a chat message across all platforms:
use chatpack::Message;
let msg = Message::new("Alice", "Hello, world!");
assert_eq!(msg.sender, "Alice");
assert_eq!(msg.content, "Hello, world!");
§Parser Trait
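As a rough illustration of what a shared parser interface buys you, the std-only sketch below defines a trait whose file-based method delegates to its string-based one, so each format only has to implement string parsing. The names `ExportParser` and `ColonLineParser`, the `sender: content` line format, and the tuple return type are all hypothetical; chatpack's actual trait and types may differ.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical trait for illustration only (not chatpack's `Parser`).
trait ExportParser {
    /// Parses an export that is already in memory into (sender, content) pairs.
    fn parse_str(&self, content: &str) -> io::Result<Vec<(String, String)>>;

    /// Default implementation: read the file, then reuse `parse_str`.
    fn parse(&self, path: &Path) -> io::Result<Vec<(String, String)>> {
        let content = fs::read_to_string(path)?;
        self.parse_str(&content)
    }
}

/// Hypothetical parser for a made-up `sender: content` line format.
struct ColonLineParser;

impl ExportParser for ColonLineParser {
    fn parse_str(&self, content: &str) -> io::Result<Vec<(String, String)>> {
        Ok(content
            .lines()
            // Lines without a ": " separator are skipped.
            .filter_map(|line| line.split_once(": "))
            .map(|(sender, text)| (sender.to_string(), text.to_string()))
            .collect())
    }
}
```

Because the trait is object-safe, calling code can hold a `Box<dyn ExportParser>` and stay independent of the concrete format.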
All platform parsers implement the Parser trait, providing
a consistent interface:
use chatpack::parser::Parser;
use chatpack::parsers::WhatsAppParser;
let parser = WhatsAppParser::new();
// Parse from file
let messages = parser.parse("chat.txt".as_ref())?;
// Or parse from string
let content = "[1/15/24, 10:30:45 AM] Alice: Hello";
let messages = parser.parse_str(content)?;
§Common Patterns
§Filter by Date Range
use chatpack::prelude::*;
let messages = vec![
Message::new("Alice", "Old message"),
Message::new("Bob", "Recent message"),
];
let filter = FilterConfig::new()
.with_date_from("2024-01-01")?
.with_date_to("2024-12-31")?;
let filtered = apply_filters(messages, &filter);
§Merge Consecutive Messages
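Merging collapses each run of consecutive messages from one sender into a single message. The std-only sketch below shows that idea in one pass. It is an illustration, not chatpack's implementation: `SketchMessage` and `merge_runs` are hypothetical names, the newline separator is an assumption, and the time-window check is left out.

```rust
/// Hypothetical stand-in for a chat message (not chatpack's `Message`).
#[derive(Debug, Clone, PartialEq)]
struct SketchMessage {
    sender: String,
    content: String,
}

/// Merges runs of consecutive messages that share a sender.
/// Contents are joined with a newline (an assumption of this sketch).
fn merge_runs(messages: Vec<SketchMessage>) -> Vec<SketchMessage> {
    let mut merged: Vec<SketchMessage> = Vec::new();
    for msg in messages {
        // Does this message continue the run that ends the output so far?
        let continues_run = merged
            .last()
            .map_or(false, |last| last.sender == msg.sender);
        if continues_run {
            let last = merged.last_mut().expect("output is non-empty here");
            last.content.push('\n');
            last.content.push_str(&msg.content);
        } else {
            // A different sender (or the first message) starts a new run.
            merged.push(msg);
        }
    }
    merged
}
```

Only adjacent messages are combined, so a sender who speaks again after someone else replies starts a new merged message.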
Combine messages from the same sender within a time window:
use chatpack::prelude::*;
let messages = vec![
Message::new("Alice", "Hello"),
Message::new("Alice", "How are you?"),
Message::new("Bob", "I'm fine!"),
];
let merged = merge_consecutive(messages);
assert_eq!(merged.len(), 2); // Alice's messages merged
assert!(merged[0].content.contains("Hello"));
assert!(merged[0].content.contains("How are you?"));
§Stream Large Files
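Streaming means yielding one message at a time instead of collecting the whole export first, so memory use stays roughly constant regardless of file size. The std-only sketch below shows the shape of such an iterator over a made-up `sender: content` line format. `StreamedMessage` and `stream_messages` are hypothetical names, and a real Telegram JSON stream needs an incremental JSON reader rather than line splitting.

```rust
use std::io::{self, BufRead};

/// Hypothetical stand-in for a chat message (not chatpack's `Message`).
#[derive(Debug, PartialEq)]
struct StreamedMessage {
    sender: String,
    content: String,
}

/// Lazily yields one message per `sender: content` line.
/// Lines without a ": " separator are skipped; I/O errors are passed through.
fn stream_messages<R: BufRead>(
    reader: R,
) -> impl Iterator<Item = io::Result<StreamedMessage>> {
    reader.lines().filter_map(|line| match line {
        Ok(text) => text.split_once(": ").map(|(sender, content)| {
            Ok(StreamedMessage {
                sender: sender.to_string(),
                content: content.to_string(),
            })
        }),
        Err(err) => Some(Err(err)),
    })
}
```

Each item is a `Result`, so the caller decides per message whether to stop on an error or skip it, which mirrors the `let msg = result?;` pattern used with iterator-based streaming APIs.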
Process files larger than available memory:
use chatpack::prelude::*;
let parser = create_streaming_parser(Platform::Telegram);
for result in parser.stream("huge_export.json".as_ref())? {
let msg = result?;
println!("{}: {}", msg.sender, msg.content);
}
§Export to Multiple Formats
use chatpack::prelude::*;
let messages = vec![Message::new("Alice", "Hello!")];
let config = OutputConfig::new().with_timestamps();
// CSV - best for LLM context (13x token compression)
write_csv(&messages, "output.csv", &config)?;
// JSON - structured array for APIs
write_json(&messages, "output.json", &config)?;
// JSONL - one object per line for RAG pipelines
write_jsonl(&messages, "output.jsonl", &config)?;
§Module Structure
| Module | Description |
|---|---|
| parser | Unified parser API with Parser trait and Platform enum |
| parsers | Platform-specific implementations: TelegramParser, WhatsAppParser, etc. |
| config | Parser configurations: TelegramConfig, WhatsAppConfig, etc. |
| core | Core types: Message, OutputConfig, FilterConfig |
| streaming | Memory-efficient streaming parsers for large files |
| format | Output formats: OutputFormat, write_to_format |
| error | Error types: ChatpackError, Result |
| prelude | Convenient re-exports for common usage |
§Feature Flags
Enable only the features you need to minimize compile time and dependencies:
| Feature | Description | Dependencies |
|---|---|---|
| telegram | Telegram JSON parser | serde_json |
| whatsapp | WhatsApp TXT parser | regex |
| instagram | Instagram JSON parser | serde_json |
| discord | Discord multi-format parser | serde_json, regex, csv |
| csv-output | CSV output writer | csv |
| json-output | JSON/JSONL output writers | serde_json |
| streaming | Streaming parsers for large files | - |
| async | Async parser support | tokio |
| full | All features (default) | all above |
# Cargo.toml - minimal configuration
[dependencies]
chatpack = { version = "0.5", default-features = false, features = ["telegram", "csv-output"] }
§Serialization
All public types implement serde::Serialize and serde::Deserialize:
use chatpack::Message;
let msg = Message::new("Alice", "Hello!");
let json = serde_json::to_string(&msg).unwrap();
let parsed: Message = serde_json::from_str(&json).unwrap();
assert_eq!(msg.content, parsed.content);
Re-exports§
pub use error::ChatpackError;
pub use error::Result;
pub use message::Message;
Modules§
- config: Configuration types for parsers and output.
- core: Core processing logic for chatpack.
- error: Unified error types for chatpack.
- format: Output format types for the chatpack library.
- message: Universal message type for all chat platforms.
- parser: Unified parser API for chat exports.
- parsers: Platform-specific chat export parsers.
- parsing: Shared parsing utilities for all platforms.
- prelude: Convenient re-exports for common usage patterns.
- progress: Progress reporting types for long-running operations.
- streaming: Memory-efficient streaming parsers for large chat exports.