# interpolize
a rust program that scrapes discord, learns how your friends talk, and generates new messages in their collective voice. yes, this is what we're doing with our lives.
## what it does
- scrapes messages from discord channels using your user token
- builds word embeddings from scratch because apparently we hate ourselves
- trains a variable-order markov chain on the corpus
- interpolates between channel "vibes" using weighted centroid math
- spits out responses in a pretty terminal shell
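
for the curious, here's a minimal sketch of what a variable-order markov chain with backoff can look like. it's std-only and illustrative, not the actual code in this repo: `MarkovChain`, `train`, `candidates`, and `generate` are made-up names, the tokenizer is plain whitespace splitting, and the style-vector biasing is left out entirely.

```rust
use std::collections::HashMap;

/// Hypothetical variable-order Markov model: stores next-token counts for
/// every context of length 0..=max_order.
pub struct MarkovChain {
    max_order: usize,
    table: HashMap<Vec<String>, HashMap<String, u32>>,
}

impl MarkovChain {
    pub fn new(max_order: usize) -> Self {
        Self { max_order, table: HashMap::new() }
    }

    /// Count every (context, next token) pair in one message.
    pub fn train(&mut self, message: &str) {
        let tokens: Vec<String> = message.split_whitespace().map(str::to_lowercase).collect();
        for i in 0..tokens.len() {
            for order in 0..=self.max_order.min(i) {
                let context = tokens[i - order..i].to_vec();
                *self
                    .table
                    .entry(context)
                    .or_default()
                    .entry(tokens[i].clone())
                    .or_insert(0) += 1;
            }
        }
    }

    /// Back off from the longest context to shorter ones until one was seen in training.
    pub fn candidates(&self, history: &[String]) -> Option<&HashMap<String, u32>> {
        let longest = self.max_order.min(history.len());
        (0..=longest)
            .rev()
            .find_map(|order| self.table.get(&history[history.len() - order..]))
    }

    /// Sample `max_len` tokens (fewer only if the model is untrained).
    /// `rng` is any source of pseudo-random u32 values (assumption: bring your own).
    pub fn generate(&self, max_len: usize, mut rng: impl FnMut() -> u32) -> Vec<String> {
        let mut out: Vec<String> = Vec::new();
        while out.len() < max_len {
            let Some(counts) = self.candidates(&out) else { break };
            let total: u32 = counts.values().sum();
            let mut roll = rng() % total;
            // Sort so the same rng sequence always gives the same output
            // (HashMap iteration order is not stable).
            let mut items: Vec<_> = counts.iter().collect();
            items.sort();
            for (token, &count) in items {
                if roll < count {
                    out.push(token.clone());
                    break;
                }
                roll -= count;
            }
        }
        out
    }
}
```

the "variable-order" part is `candidates`: it tries the longest context it has and falls back to shorter ones when the long one was never seen. this sketch has no start/end markers, so it happily begins mid-sentence and runs until `max_len`.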
## why
good question.
## install
```bash
git clone <repo>
cd interpolize
cargo build --release
cp config.example.toml config.toml
```
then fill in `config.toml` with your token and channel IDs. you know what you're doing.
## usage
```bash
./target/release/interpolize scrape # steal messages
./target/release/interpolize train # build embeddings (slow, go touch grass)
./target/release/interpolize chat # talk to the excuse that is this program
```
## config
```toml
[discord]
token = "your_token_here"
[embeddings]
storage_path = "embeddings.bin"
vector_dim = 128
window_size = 4
[retrieval]
style_k = 8
context_k = 5
thread_depth = 4
[[channels]]
id = "123456789"
name = "general"
weight = 0.5
scrape_limit = 1000
```
weights don't need to sum to 1. they get normalized before the channel centroids are blended, so only the ratios matter. we do the math so you don't have to.
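
roughly, "the math" could look like the sketch below. this assumes the weights are simply divided by their sum and the blended style vector is a weighted average of per-channel centroids. the function names (`normalize_weights`, `centroid`, `interpolate`) are hypothetical and not taken from the codebase.

```rust
/// Scale raw channel weights so they sum to 1 (assumed normalization scheme).
pub fn normalize_weights(weights: &[f32]) -> Vec<f32> {
    let total: f32 = weights.iter().sum();
    assert!(total > 0.0, "at least one channel needs a positive weight");
    weights.iter().map(|w| w / total).collect()
}

/// Mean of one channel's message vectors (all vectors must share a dimension).
pub fn centroid(vectors: &[Vec<f32>]) -> Vec<f32> {
    let mut mean = vec![0.0; vectors[0].len()];
    for v in vectors {
        for (m, x) in mean.iter_mut().zip(v) {
            *m += x / vectors.len() as f32;
        }
    }
    mean
}

/// Weighted blend of per-channel centroids into a single style vector.
pub fn interpolate(centroids: &[Vec<f32>], weights: &[f32]) -> Vec<f32> {
    let norm = normalize_weights(weights);
    let mut style = vec![0.0; centroids[0].len()];
    for (c, w) in centroids.iter().zip(&norm) {
        for (s, x) in style.iter_mut().zip(c) {
            *s += w * x;
        }
    }
    style
}
```

so weights of `0.5` and `1.5` behave exactly like `0.25` and `0.75`: the second channel contributes three times as much to the blended vibe.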
## how it actually works
1. builds a co-occurrence matrix from all scraped messages
2. applies PPMI to get meaningful word relationships
3. runs truncated SVD via power iteration to get dense vectors
4. computes per-channel centroids, interpolates them by weight
5. at query time: KNN search for relevant messages + style examples
6. feeds both into a variable-order markov chain biased toward the style vector
7. streams output token by token so it looks cooler than it is
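
steps 1 and 2 above can be sketched in a few lines. this is a toy version under stated assumptions: whitespace tokenization, a symmetric window, a hash map instead of a real sparse matrix, and natural log for PMI. `cooccurrence` and `ppmi` are illustrative names, not the repo's API.

```rust
use std::collections::HashMap;

/// Count symmetric co-occurrences of tokens at most `window` positions apart.
pub fn cooccurrence(messages: &[&str], window: usize) -> HashMap<(String, String), f64> {
    let mut counts = HashMap::new();
    for msg in messages {
        let toks: Vec<&str> = msg.split_whitespace().collect();
        for i in 0..toks.len() {
            let end = (i + window + 1).min(toks.len());
            for j in i + 1..end {
                // Record the pair in both directions so the matrix stays symmetric.
                *counts.entry((toks[i].to_string(), toks[j].to_string())).or_insert(0.0) += 1.0;
                *counts.entry((toks[j].to_string(), toks[i].to_string())).or_insert(0.0) += 1.0;
            }
        }
    }
    counts
}

/// PPMI(w, c) = max(0, ln(P(w, c) / (P(w) * P(c)))), estimated from counts.
/// Pairs whose PMI is not positive are dropped, which keeps the result sparse.
pub fn ppmi(counts: &HashMap<(String, String), f64>) -> HashMap<(String, String), f64> {
    let total: f64 = counts.values().sum();
    // Row sums double as column sums because the counts are symmetric.
    let mut row: HashMap<&str, f64> = HashMap::new();
    for ((w, _), n) in counts {
        *row.entry(w.as_str()).or_insert(0.0) += n;
    }
    counts
        .iter()
        .filter_map(|((w, c), n)| {
            let pmi = (n * total / (row[w.as_str()] * row[c.as_str()])).ln();
            (pmi > 0.0).then(|| ((w.clone(), c.clone()), pmi))
        })
        .collect()
}
```

the point of PPMI over raw counts: words that show up next to everything (like "the") get pushed toward zero, while pairs that co-occur more often than chance keep a positive score.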
## caveats
- the SVD is slow on large vocabs. that's the price of not using dependencies like a normal person.
- output quality depends entirely on how much your friends type. pray they're prolific.
- interpolize may or may not generate messages that would interest INTERPOL. not our problem.
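
on the SVD caveat: here's a rough sketch of power iteration for the top singular value and vector of a matrix, to show where the time goes. it uses a dense `Vec<Vec<f64>>` and a fixed start vector to stay short, both of which are assumptions for illustration. the real thing presumably works on a sparse PPMI matrix and repeats this once per output dimension, deflating in between.

```rust
/// Multiply a dense row-major matrix by a vector.
fn matvec(m: &[Vec<f64>], v: &[f64]) -> Vec<f64> {
    m.iter()
        .map(|row| row.iter().zip(v).map(|(a, b)| a * b).sum::<f64>())
        .collect()
}

/// Multiply the transpose of `m` by a vector.
fn matvec_t(m: &[Vec<f64>], v: &[f64]) -> Vec<f64> {
    let mut out = vec![0.0; m[0].len()];
    for (row, x) in m.iter().zip(v) {
        for (o, a) in out.iter_mut().zip(row) {
            *o += a * x;
        }
    }
    out
}

/// Euclidean length of a vector.
fn norm(v: &[f64]) -> f64 {
    v.iter().map(|x| x * x).sum::<f64>().sqrt()
}

/// Top singular value and right singular vector of `m`,
/// found by power iteration on M^T M.
pub fn top_singular(m: &[Vec<f64>], iters: usize) -> (f64, Vec<f64>) {
    let n = m[0].len();
    // Deterministic non-zero start; a real run would more likely use a random vector.
    let mut v: Vec<f64> = (0..n).map(|i| 1.0 + i as f64).collect();
    let len = norm(&v);
    v.iter_mut().for_each(|x| *x /= len);
    for _ in 0..iters {
        // One step: v <- M^T (M v), then renormalize to unit length.
        let mut next = matvec_t(m, &matvec(m, &v));
        let len = norm(&next);
        if len == 0.0 {
            break; // zero matrix, nothing to converge to
        }
        next.iter_mut().for_each(|x| *x /= len);
        v = next;
    }
    let sigma = norm(&matvec(m, &v));
    (sigma, v)
}
```

every iteration is two full matrix-vector products, and you pay that for every iteration of every one of the `vector_dim` components. with a big vocab, that adds up fast.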
## license
do whatever