interpolize 5.0.1

# interpolize

a rust program that scrapes discord, learns how your friends talk, and generates new messages in their collective voice. yes, this is what we're doing with our lives.

## what it does

- scrapes messages from discord channels using your user token
- builds word embeddings from scratch because apparently we hate ourselves
- trains a variable-order markov chain on the corpus
- interpolates between channel "vibes" using weighted centroid math
- spits out responses in a pretty terminal shell

## why

good question.

## install

- **recommended:** get cargo, then install interpolize with it

```bash
cargo install interpolize
```

- or grab a binary from the [releases](https://codeberg.org/Apzhyn/interpolize/releases), put it in a directory, copy `config.example.toml` to `config.toml` in that same directory, edit the config, and it should work.

- or build it from source:

```bash
git clone <repo>
cd interpolize
cargo build --release
cp config.example.toml config.toml
```

then fill in `config.toml` with your token and channel IDs. you know what you're doing.

## usage

```bash
./interpolize scrape    # steal messages
./interpolize train     # build embeddings (slow, go touch grass)
./interpolize chat      # talk to the excuse that is this program
```

## config

```toml
[discord]
token = "your_token_here"

[embeddings]
storage_path = "embeddings.bin"
vector_dim = 128
window_size = 4

[retrieval]
style_k = 8
context_k = 5
thread_depth = 4

[[channels]]
id = "123456789"
name = "general"
weight = 0.5
scrape_limit = 1000
```

weights don't need to sum to 1. we do the math so you don't have to.
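
if you're curious what "the math" might look like, here's a minimal sketch. it assumes the weights are normalized by a plain divide-by-sum before the channel centroids are blended. `normalize_weights` and `blend_centroids` are made-up names for illustration, not the crate's actual API.

```rust
/// Normalize raw channel weights so they sum to 1.
/// Returns None if the total is not a positive, finite number.
/// (hypothetical helper, not taken from the interpolize source)
fn normalize_weights(raw: &[f32]) -> Option<Vec<f32>> {
    let total: f32 = raw.iter().sum();
    if !total.is_finite() || total <= 0.0 {
        return None;
    }
    Some(raw.iter().map(|w| w / total).collect())
}

/// Blend per-channel centroids into a single style vector.
/// Each centroid is scaled by its normalized weight and the results are summed.
/// Returns None on empty input, mismatched lengths, or unusable weights.
fn blend_centroids(centroids: &[Vec<f32>], raw_weights: &[f32]) -> Option<Vec<f32>> {
    if centroids.is_empty() || centroids.len() != raw_weights.len() {
        return None;
    }
    let dim = centroids[0].len();
    if centroids.iter().any(|c| c.len() != dim) {
        return None;
    }
    let weights = normalize_weights(raw_weights)?;
    let mut out = vec![0.0f32; dim];
    for (centroid, w) in centroids.iter().zip(&weights) {
        for (o, x) in out.iter_mut().zip(centroid) {
            *o += w * x;
        }
    }
    Some(out)
}
```

under that assumption, weights of `0.5` and `1.5` behave exactly like `0.25` and `0.75`: only the ratios matter.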

## how it actually works

1. builds a co-occurrence matrix from all scraped messages
2. applies PPMI to get meaningful word relationships
3. runs truncated SVD via power iteration to get dense vectors
4. computes per-channel centroids, interpolates them by weight
5. at query time: KNN search for relevant messages + style examples
6. feeds both into a variable-order markov chain biased toward the style vector
7. streams output token by token so it looks cooler than it is
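
steps 1 and 2 can be sketched roughly like this. it's an illustration, not the real implementation: it assumes whitespace tokenization, a symmetric window, natural-log PMI, and a sparse `HashMap` instead of whatever matrix layout the crate uses. the function names are hypothetical.

```rust
use std::collections::HashMap;

/// Count symmetric co-occurrences: every pair of words at most `window`
/// tokens apart in the same message is counted in both directions.
fn cooccurrence(messages: &[&str], window: usize) -> HashMap<(String, String), f64> {
    let mut counts = HashMap::new();
    for msg in messages {
        let toks: Vec<&str> = msg.split_whitespace().collect();
        for (i, w) in toks.iter().enumerate() {
            let end = (i + window + 1).min(toks.len());
            for c in &toks[i + 1..end] {
                *counts.entry((w.to_string(), c.to_string())).or_insert(0.0) += 1.0;
                *counts.entry((c.to_string(), w.to_string())).or_insert(0.0) += 1.0;
            }
        }
    }
    counts
}

/// PPMI(w, c) = max(0, ln(P(w, c) / (P(w) * P(c)))).
/// Probabilities are estimated from the counts; pairs whose PMI is not
/// positive are dropped, which keeps the result sparse.
fn ppmi(counts: &HashMap<(String, String), f64>) -> HashMap<(String, String), f64> {
    let total: f64 = counts.values().sum();
    let mut row: HashMap<&str, f64> = HashMap::new();
    let mut col: HashMap<&str, f64> = HashMap::new();
    for ((w, c), n) in counts {
        *row.entry(w.as_str()).or_insert(0.0) += n;
        *col.entry(c.as_str()).or_insert(0.0) += n;
    }
    let mut out = HashMap::new();
    for ((w, c), n) in counts {
        // P(w,c) / (P(w) P(c)) simplifies to n * total / (row_sum * col_sum)
        let pmi = (n * total / (row[w.as_str()] * col[c.as_str()])).ln();
        if pmi > 0.0 {
            out.insert((w.clone(), c.clone()), pmi);
        }
    }
    out
}
```

calling `ppmi(&cooccurrence(&messages, 4))` would line up with the `window_size = 4` from the example config, assuming that setting means what it looks like it means.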

## caveats

- the SVD is slow on large vocabs. that's the price of not using dependencies like a normal person.
- output quality depends entirely on how much your friends type. pray they're prolific.
- interpolize may or may not generate messages that would interest INTERPOL. not our problem.
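
about that slow SVD: here's a rough sketch of power iteration for a single singular vector, to show where the time goes. it assumes a dense row-major matrix and a fixed iteration count; the real code may differ, and `top_singular` is a hypothetical name.

```rust
/// Dot product of two equal-length slices.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Approximate the top right-singular vector and singular value of `a`
/// (a non-empty list of equal-length rows) by power iteration on A^T A.
/// The uniform start vector is an assumption of this sketch; it fails to
/// converge if it happens to be orthogonal to the top singular vector.
fn top_singular(a: &[Vec<f64>], iters: usize) -> (Vec<f64>, f64) {
    assert!(!a.is_empty(), "matrix must have at least one row");
    let cols = a[0].len();
    let mut v = vec![1.0 / (cols as f64).sqrt(); cols];
    let mut sigma = 0.0;
    for _ in 0..iters {
        // u = A v
        let u: Vec<f64> = a.iter().map(|row| dot(row, &v)).collect();
        // w = A^T u
        let mut w = vec![0.0; cols];
        for (row, ui) in a.iter().zip(&u) {
            for (wj, aij) in w.iter_mut().zip(row) {
                *wj += aij * ui;
            }
        }
        let norm = dot(&w, &w).sqrt();
        if norm == 0.0 {
            break;
        }
        // with v a unit vector, ||A v|| estimates the top singular value
        sigma = dot(&u, &u).sqrt();
        v = w.iter().map(|x| x / norm).collect();
    }
    (v, sigma)
}
```

each iteration touches every matrix entry twice, and a truncated SVD built this way repeats the whole loop once per output dimension (with deflation in between), so cost scales with matrix size times `vector_dim` times the iteration count.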

## license

do whatever