interpolize 1.0.0

a rust program that scrapes discord, learns how your friends talk, and generates new messages in their collective voice. yes, this is what we're doing with our lives.
interpolize-1.0.0 is not a library.

what it does

  • scrapes messages from discord channels using your user token
  • builds word embeddings from scratch because apparently we hate ourselves
  • trains a variable-order markov chain on the corpus
  • interpolates between channel "vibes" using weighted centroid math
  • spits out responses in a pretty terminal shell
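for the curious, here is a rough sketch of what "variable-order markov chain" means in practice: store next-token counts for contexts of several lengths, then back off from the longest matching context to shorter ones. this is an illustration, not the code in this repo. `VarOrderMarkov`, `train`, and `next_token` are made-up names, tokenization is plain whitespace splitting, and it greedily picks the most frequent continuation instead of sampling or applying any style bias.

```rust
use std::collections::HashMap;

/// Hypothetical variable-order Markov model (names are illustrative only).
/// Stores next-token counts for every context of length 1..=max_order.
struct VarOrderMarkov {
    max_order: usize,
    table: HashMap<Vec<String>, HashMap<String, u32>>,
}

impl VarOrderMarkov {
    fn new(max_order: usize) -> Self {
        Self { max_order, table: HashMap::new() }
    }

    /// Count every (context, next token) pair found in one message.
    fn train(&mut self, message: &str) {
        let tokens: Vec<String> = message
            .split_whitespace()
            .map(str::to_lowercase)
            .collect();
        for i in 1..tokens.len() {
            // Contexts cannot be longer than the number of preceding tokens.
            for order in 1..=self.max_order.min(i) {
                let ctx = tokens[i - order..i].to_vec();
                *self
                    .table
                    .entry(ctx)
                    .or_default()
                    .entry(tokens[i].clone())
                    .or_insert(0) += 1;
            }
        }
    }

    /// Most frequent continuation, trying the longest known context first
    /// and backing off to shorter ones. Ties go to the lexicographically
    /// smaller token so the result is deterministic.
    fn next_token(&self, history: &[String]) -> Option<&str> {
        let longest = self.max_order.min(history.len());
        for order in (1..=longest).rev() {
            let ctx = &history[history.len() - order..];
            if let Some(counts) = self.table.get(ctx) {
                return counts
                    .iter()
                    .max_by(|a, b| a.1.cmp(b.1).then_with(|| b.0.cmp(a.0)))
                    .map(|(tok, _)| tok.as_str());
            }
        }
        None
    }
}
```

a real generator would sample from the counts (and, per the steps further down, bias that sampling toward the style vector) rather than always taking the top entry.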

why

good question.

install

git clone <repo>
cd interpolize
cargo build --release
cp config.example.toml config.toml

then fill in config.toml with your token and channel IDs. you know what you're doing.

usage

./interpolize scrape    # steal messages
./interpolize train     # build embeddings (slow, go touch grass)
./interpolize chat      # talk to this excuse for a program

config

[discord]
token = "your_token_here"

[embeddings]
storage_path = "embeddings.bin"
vector_dim = 128
window_size = 4

[retrieval]
style_k = 8
context_k = 5
thread_depth = 4

[[channels]]
id = "123456789"
name = "general"
weight = 0.5
scrape_limit = 1000

weights don't need to sum to 1. we do the math so you don't have to.
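a minimal sketch of "the math", assuming the weights are simply divided by their sum and then used for a weighted average of per-channel centroids. the function names here are hypothetical and the real implementation may differ. it also assumes weights are non-negative.

```rust
/// Normalize raw channel weights so they sum to 1.
/// Returns None if there are no weights or the total is not a positive,
/// finite number. Assumes individual weights are non-negative.
fn normalize_weights(raw: &[f32]) -> Option<Vec<f32>> {
    let total: f32 = raw.iter().sum();
    if raw.is_empty() || !total.is_finite() || total <= 0.0 {
        return None;
    }
    Some(raw.iter().map(|w| w / total).collect())
}

/// Weighted average of per-channel centroid vectors (hypothetical helper).
/// Returns None on mismatched lengths or unusable weights.
fn interpolate_centroids(centroids: &[Vec<f32>], raw_weights: &[f32]) -> Option<Vec<f32>> {
    if centroids.len() != raw_weights.len() {
        return None;
    }
    let weights = normalize_weights(raw_weights)?;
    let dim = centroids.first()?.len();
    if centroids.iter().any(|c| c.len() != dim) {
        return None;
    }
    let mut out = vec![0.0f32; dim];
    for (centroid, w) in centroids.iter().zip(&weights) {
        for (o, x) in out.iter_mut().zip(centroid) {
            *o += w * x;
        }
    }
    Some(out)
}
```

under this scheme, weights of 0.5 and 1.5 behave exactly like 0.25 and 0.75: only the ratios matter.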

how it actually works

  1. builds a co-occurrence matrix from all scraped messages
  2. applies PPMI to get meaningful word relationships
  3. runs truncated SVD via power iteration to get dense vectors
  4. computes per-channel centroids, interpolates them by weight
  5. at query time: KNN search for relevant messages + style examples
  6. feeds both into a variable-order markov chain biased toward the style vector
  7. streams output token by token so it looks cooler than it is
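steps 1 and 2 can be sketched roughly like this. it is an illustration under assumptions, not the code in this repo: whitespace tokenization, a symmetric window (compare `window_size` in the config), sparse storage in a `HashMap`, and the textbook definition PPMI(w, c) = max(0, ln(p(w, c) / (p(w) p(c)))). the function names are made up.

```rust
use std::collections::HashMap;

/// Symmetric co-occurrence counts: each pair of tokens at most `window`
/// positions apart in a message is counted once in each direction.
fn cooccurrence(messages: &[&str], window: usize) -> HashMap<(String, String), f64> {
    let mut counts = HashMap::new();
    for msg in messages {
        let toks: Vec<&str> = msg.split_whitespace().collect();
        for i in 0..toks.len() {
            let end = (i + window + 1).min(toks.len());
            for j in i + 1..end {
                *counts
                    .entry((toks[i].to_string(), toks[j].to_string()))
                    .or_insert(0.0) += 1.0;
                *counts
                    .entry((toks[j].to_string(), toks[i].to_string()))
                    .or_insert(0.0) += 1.0;
            }
        }
    }
    counts
}

/// Positive pointwise mutual information. Only cells with PMI > 0 are kept,
/// which is what keeps the matrix sparse.
fn ppmi(counts: &HashMap<(String, String), f64>) -> HashMap<(String, String), f64> {
    let total: f64 = counts.values().sum();
    // Row sums; the counts are symmetric, so these double as column sums.
    let mut row: HashMap<&str, f64> = HashMap::new();
    for ((w, _), n) in counts {
        *row.entry(w.as_str()).or_insert(0.0) += n;
    }
    let mut out = HashMap::new();
    for ((w, c), n) in counts {
        // p(w,c) / (p(w) p(c)) simplifies to n * total / (row_w * row_c).
        let pmi = (n * total / (row[w.as_str()] * row[c.as_str()])).ln();
        if pmi > 0.0 {
            out.insert((w.clone(), c.clone()), pmi);
        }
    }
    out
}
```

pairs that co-occur less often than chance would predict get a negative PMI and are dropped, so the surviving cells are the "meaningful" relationships step 2 refers to.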

caveats

  • the SVD is slow on large vocabs. that's the price of not using dependencies like a normal person.
  • output quality depends entirely on how much your friends type. pray they're prolific.
  • interpolize may or may not generate messages that would interest INTERPOL. not our problem.
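to make the SVD caveat concrete, here is a rough sketch of power iteration with deflation on a small dense symmetric matrix. it is not the code in this repo, and the names are hypothetical. it assumes the input is symmetric positive semi-definite (for an SVD of a matrix M you would feed it something like MᵀM, whose eigenvalues are the squared singular values), and it uses a fixed start vector, which can fail in the unlucky case where that vector is orthogonal to the component being searched for.

```rust
/// Dominant eigenvalue and eigenvector of a symmetric PSD matrix.
fn power_iteration(m: &[Vec<f64>], iters: usize) -> (f64, Vec<f64>) {
    let n = m.len();
    // Deterministic, non-uniform start vector (1, 2, ..., n), normalized.
    let mut v: Vec<f64> = (0..n).map(|i| (i + 1) as f64).collect();
    let start_norm = v.iter().map(|x| x * x).sum::<f64>().sqrt();
    for x in v.iter_mut() {
        *x /= start_norm;
    }
    let mut lambda = 0.0;
    for _ in 0..iters {
        // One dense matrix-vector product: O(n^2) work per iteration.
        let mv: Vec<f64> = m
            .iter()
            .map(|row| row.iter().zip(&v).map(|(a, b)| a * b).sum::<f64>())
            .collect();
        let norm = mv.iter().map(|x| x * x).sum::<f64>().sqrt();
        if norm == 0.0 {
            break;
        }
        lambda = norm;
        v = mv.into_iter().map(|x| x / norm).collect();
    }
    (lambda, v)
}

/// Top-k eigenpairs: find the dominant pair, subtract it out, repeat.
fn top_k_eigen(m: &[Vec<f64>], k: usize, iters: usize) -> Vec<(f64, Vec<f64>)> {
    let mut a: Vec<Vec<f64>> = m.to_vec();
    let mut out = Vec::new();
    for _ in 0..k {
        let (lambda, v) = power_iteration(&a, iters);
        // Deflation: remove lambda * v * v^T so the next pass converges
        // to the next-largest component.
        for i in 0..a.len() {
            for j in 0..a.len() {
                a[i][j] -= lambda * v[i] * v[j];
            }
        }
        out.push((lambda, v));
    }
    out
}
```

the cost is roughly k × iterations × n² for a dense n × n matrix, which is why a large vocabulary hurts.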

license

do whatever