Crate chatpack

Parse and convert chat exports from messaging platforms into LLM-friendly formats.

§Overview

Chatpack provides a unified API for parsing chat exports from popular messaging platforms and converting them into formats optimized for Large Language Models. It handles platform-specific quirks (encoding issues, date formats, message types) and provides tools for filtering, merging, and exporting messages.

Supported platforms:

| Platform | Export Format | Special Handling |
|---|---|---|
| Telegram | JSON | Service messages, forwarded messages |
| WhatsApp | TXT | Auto-detects 4 locale-specific date formats |
| Instagram | JSON | Fixes Mojibake encoding from Meta exports |
| Discord | JSON/TXT/CSV | Attachments, stickers, replies |
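
The "Mojibake" entry refers to a common corruption in which UTF-8 bytes are stored as if each byte were a separate Latin-1 character. The sketch below shows the usual repair using only the standard library. `fix_mojibake` is a hypothetical name chosen for illustration; it is not taken from chatpack's API, and the crate's actual handling may differ.

```rust
/// Hypothetical helper (not chatpack API): undoes "UTF-8 bytes read as Latin-1".
fn fix_mojibake(input: &str) -> String {
    // In corrupted text every character is a code point in 0..=255 that
    // stands for one original byte. A code point above 255 means the text
    // was not corrupted this way, so `collect` yields `None`.
    let bytes: Option<Vec<u8>> = input
        .chars()
        .map(|c| {
            let cp = c as u32;
            if cp <= 0xFF { Some(cp as u8) } else { None }
        })
        .collect();

    match bytes {
        // Reinterpret the recovered bytes as UTF-8; if they are not valid
        // UTF-8, keep the input unchanged.
        Some(raw) => String::from_utf8(raw).unwrap_or_else(|_| input.to_string()),
        None => input.to_string(),
    }
}
```

Text that is already correct passes through unchanged in most cases, because plain ASCII round-trips and a lone accented Latin-1 character is not valid UTF-8 on its own. A rare false positive remains possible when genuine Latin-1 text happens to form a valid UTF-8 byte sequence.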

§Quick Start

use chatpack::prelude::*;

// Parse Telegram export
let parser = create_parser(Platform::Telegram);
let messages = parser.parse("export.json".as_ref())?;

// Filter, merge, and export
let filtered = apply_filters(messages, &FilterConfig::new().with_sender("Alice"));
let merged = merge_consecutive(filtered);
write_csv(&merged, "output.csv", &OutputConfig::default())?;

§Core Concepts

§Message

Message is the universal representation of a chat message across all platforms:

use chatpack::Message;

let msg = Message::new("Alice", "Hello, world!");
assert_eq!(msg.sender, "Alice");
assert_eq!(msg.content, "Hello, world!");

§Parser Trait

All platform parsers implement the Parser trait, providing a consistent interface:

use chatpack::parser::Parser;
use chatpack::parsers::WhatsAppParser;

let parser = WhatsAppParser::new();

// Parse from file
let messages = parser.parse("chat.txt".as_ref())?;

// Or parse from string
let content = "[1/15/24, 10:30:45 AM] Alice: Hello";
let messages = parser.parse_str(content)?;
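
The "parse from file or from string" pattern can be illustrated with a small stand-alone trait. Everything in this sketch (`SimpleParser`, `SimpleMessage`, `BracketLineParser`) is hypothetical and only mirrors the shape of the interface described above; it is not chatpack's implementation, and the line format it accepts is a simplification.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Stand-in for a parsed message (hypothetical, not chatpack's `Message`).
#[derive(Debug, PartialEq)]
struct SimpleMessage {
    sender: String,
    content: String,
}

/// Hypothetical trait: implementors supply `parse_str`, and `parse` comes for free.
trait SimpleParser {
    /// Parse content that is already in memory.
    fn parse_str(&self, content: &str) -> io::Result<Vec<SimpleMessage>>;

    /// Default implementation: read the whole file, then delegate to `parse_str`.
    fn parse(&self, path: &Path) -> io::Result<Vec<SimpleMessage>> {
        let content = fs::read_to_string(path)?;
        self.parse_str(&content)
    }
}

/// Toy parser for lines shaped like `[timestamp] Sender: text`.
struct BracketLineParser;

impl SimpleParser for BracketLineParser {
    fn parse_str(&self, content: &str) -> io::Result<Vec<SimpleMessage>> {
        let mut messages = Vec::new();
        for line in content.lines() {
            // Split off the bracketed timestamp, then separate sender and text.
            let parsed = line
                .split_once("] ")
                .and_then(|(_, rest)| rest.split_once(": "));
            // Lines that do not match the expected shape are skipped.
            if let Some((sender, text)) = parsed {
                messages.push(SimpleMessage {
                    sender: sender.to_string(),
                    content: text.to_string(),
                });
            }
        }
        Ok(messages)
    }
}
```

Putting the file-reading logic in a default method keeps each platform parser focused on its own format, which is one plausible way to obtain a consistent interface across platforms.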

§Common Patterns

§Filter by Date Range

use chatpack::prelude::*;

let messages = vec![
    Message::new("Alice", "Old message"),
    Message::new("Bob", "Recent message"),
];

let filter = FilterConfig::new()
    .with_date_from("2024-01-01")?
    .with_date_to("2024-12-31")?;

let filtered = apply_filters(messages, &filter);
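
The `?` after `with_date_from` and `with_date_to` indicates that date strings are validated when the filter is built. A minimal sketch of that idea follows, using hypothetical names (`Date`, `parse_date`, `DatedMessage`, `filter_by_date`) and the assumption that messages without a date are dropped; chatpack's real behavior for undated messages is not stated in this document.

```rust
/// Hypothetical calendar date as (year, month, day); tuples compare in that order.
type Date = (u16, u8, u8);

/// Parses `YYYY-MM-DD`, returning an error message on malformed input.
fn parse_date(s: &str) -> Result<Date, String> {
    let parts: Vec<&str> = s.split('-').collect();
    if parts.len() != 3 {
        return Err(format!("expected YYYY-MM-DD, got {:?}", s));
    }
    let year = parts[0].parse::<u16>().map_err(|e| e.to_string())?;
    let month = parts[1].parse::<u8>().map_err(|e| e.to_string())?;
    let day = parts[2].parse::<u8>().map_err(|e| e.to_string())?;
    // Coarse range check only; it does not validate days per month.
    if !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return Err(format!("date out of range: {:?}", s));
    }
    Ok((year, month, day))
}

/// Stand-in message with an optional date (hypothetical type).
struct DatedMessage {
    sender: String,
    content: String,
    date: Option<Date>,
}

/// Keeps messages dated within `from..=to` (inclusive on both ends).
/// Assumption: messages without a date are dropped.
fn filter_by_date(messages: Vec<DatedMessage>, from: Date, to: Date) -> Vec<DatedMessage> {
    messages
        .into_iter()
        .filter(|m| m.date.map_or(false, |d| from <= d && d <= to))
        .collect()
}
```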

§Merge Consecutive Messages

Combine messages from the same sender within a time window:

use chatpack::prelude::*;

let messages = vec![
    Message::new("Alice", "Hello"),
    Message::new("Alice", "How are you?"),
    Message::new("Bob", "I'm fine!"),
];

let merged = merge_consecutive(messages);
assert_eq!(merged.len(), 2); // Alice's messages merged
assert!(merged[0].content.contains("Hello"));
assert!(merged[0].content.contains("How are you?"));
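
For intuition, the merging step can be sketched in a few lines of standard-library Rust. `ChatLine` and `merge_by_sender` are hypothetical names, the newline separator is an assumption, and the sketch ignores the time window mentioned above; it is not chatpack's implementation.

```rust
/// Stand-in message type (hypothetical, not chatpack's `Message`).
#[derive(Debug, Clone, PartialEq)]
struct ChatLine {
    sender: String,
    content: String,
}

/// Joins each run of consecutive messages from the same sender into one message.
/// Assumption: merged contents are separated by a newline.
fn merge_by_sender(messages: Vec<ChatLine>) -> Vec<ChatLine> {
    let mut merged: Vec<ChatLine> = Vec::new();
    for msg in messages {
        // If the previous output message has the same sender, append to it.
        if let Some(prev) = merged.last_mut() {
            if prev.sender == msg.sender {
                prev.content.push('\n');
                prev.content.push_str(&msg.content);
                continue;
            }
        }
        // Otherwise this message starts a new run.
        merged.push(msg);
    }
    merged
}
```

Only adjacent messages are combined, so a sender who speaks again after someone else replies starts a new merged message.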

§Stream Large Files

Process files larger than available memory:

use chatpack::prelude::*;

let parser = create_streaming_parser(Platform::Telegram);

for result in parser.stream("huge_export.json".as_ref())? {
    let msg = result?;
    println!("{}: {}", msg.sender, msg.content);
}
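
The loop above consumes an iterator of `Result` values, so only one message needs to be in memory at a time and each record can fail independently. A stand-alone sketch of that pattern is shown below. It assumes a simplified `sender<TAB>content` line format, and `StreamedMessage` and `stream_messages` are hypothetical names; chatpack's streaming parsers operate on the real export formats.

```rust
use std::io::{self, BufRead};

/// Stand-in message type (hypothetical).
#[derive(Debug, PartialEq)]
struct StreamedMessage {
    sender: String,
    content: String,
}

/// Lazily yields one message per `sender<TAB>content` line from any buffered reader.
/// Nothing is read until the iterator is advanced.
fn stream_messages<R: BufRead>(reader: R) -> impl Iterator<Item = io::Result<StreamedMessage>> {
    reader.lines().map(|line| {
        // Propagate I/O errors for this line without ending the iteration.
        let line = line?;
        match line.split_once('\t') {
            Some((sender, content)) => Ok(StreamedMessage {
                sender: sender.to_string(),
                content: content.to_string(),
            }),
            None => Err(io::Error::new(
                io::ErrorKind::InvalidData,
                format!("malformed line: {:?}", line),
            )),
        }
    })
}
```

Because the function accepts any `BufRead`, the same code works with a `BufReader` over a file or with an in-memory byte slice in tests.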

§Export to Multiple Formats

use chatpack::prelude::*;

let messages = vec![Message::new("Alice", "Hello!")];
let config = OutputConfig::new().with_timestamps();

// CSV - best for LLM context (13x token compression)
write_csv(&messages, "output.csv", &config)?;

// JSON - structured array for APIs
write_json(&messages, "output.json", &config)?;

// JSONL - one object per line for RAG pipelines
write_jsonl(&messages, "output.jsonl", &config)?;
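
One reason CSV tends to be compact is that field names appear once in the header instead of in every record. The sketch below builds CSV text by hand with RFC 4180 style quoting. `csv_field` and `to_csv` are hypothetical helpers, and the comma delimiter and `sender,content` header are assumptions; the columns and delimiter that `write_csv` actually emits are not specified in this document.

```rust
/// Quotes a CSV field only when it contains a delimiter, quote, or line break.
fn csv_field(value: &str) -> String {
    if value.contains(|c: char| matches!(c, ',' | '"' | '\n' | '\r')) {
        // Embedded quotes are doubled, then the whole field is wrapped in quotes.
        format!("\"{}\"", value.replace('"', "\"\""))
    } else {
        value.to_string()
    }
}

/// Renders (sender, content) pairs as CSV text with a single header row.
fn to_csv(rows: &[(&str, &str)]) -> String {
    let mut out = String::from("sender,content\n");
    for (sender, content) in rows {
        out.push_str(&csv_field(sender));
        out.push(',');
        out.push_str(&csv_field(content));
        out.push('\n');
    }
    out
}
```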

§Module Structure

| Module | Description |
|---|---|
| parser | Unified parser API with Parser trait and Platform enum |
| parsers | Platform-specific implementations: TelegramParser, WhatsAppParser, etc. |
| config | Parser configurations: TelegramConfig, WhatsAppConfig, etc. |
| core | Core types: Message, OutputConfig, FilterConfig |
| streaming | Memory-efficient streaming parsers for large files |
| format | Output formats: OutputFormat, write_to_format |
| error | Error types: ChatpackError, Result |
| prelude | Convenient re-exports for common usage |

§Feature Flags

Enable only the features you need to minimize compile time and dependencies:

| Feature | Description | Dependencies |
|---|---|---|
| telegram | Telegram JSON parser | serde_json |
| whatsapp | WhatsApp TXT parser | regex |
| instagram | Instagram JSON parser | serde_json |
| discord | Discord multi-format parser | serde_json, regex, csv |
| csv-output | CSV output writer | csv |
| json-output | JSON/JSONL output writers | serde_json |
| streaming | Streaming parsers for large files | - |
| async | Async parser support | tokio |
| full | All features (default) | all above |

# Cargo.toml - minimal configuration
[dependencies]
chatpack = { version = "0.5", default-features = false, features = ["telegram", "csv-output"] }

§Serialization

All public types implement serde::Serialize and serde::Deserialize:

use chatpack::Message;

let msg = Message::new("Alice", "Hello!");
let json = serde_json::to_string(&msg).unwrap();
let parsed: Message = serde_json::from_str(&json).unwrap();
assert_eq!(msg.content, parsed.content);

§Re-exports

pub use error::ChatpackError;
pub use error::Result;
pub use message::Message;

§Modules

config
Configuration types for parsers and output.
core
Core processing logic for chatpack.
error
Unified error types for chatpack.
format
Output format types for the chatpack library.
message
Universal message type for all chat platforms.
parser
Unified parser API for chat exports.
parsers
Platform-specific chat export parsers.
parsing
Shared parsing utilities for all platforms.
prelude
Convenient re-exports for common usage patterns.
progress
Progress reporting types for long-running operations.
streaming
Memory-efficient streaming parsers for large chat exports.