seqcompress 0.1.0

A toy compression algorithm that combines sequences of bytes into smaller strings
Repository: jewlexx/sqc

Sequential Compression

This is a proof-of-concept "toy" compression algorithm that takes runs of matching bytes and represents them as text.

This allows for unparalleled compression for a string such as aaaaaaaaaaaaaaaaaaaaaaaaaaa, but falls apart with anything more complex.

For example, the string above would be represented as 97x27, 97 being the decimal value of the UTF-8 byte for the character "a". This representation takes up a mere 5 bytes, whereas the uncompressed string takes up 27 bytes.
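The encoding step can be sketched in a few lines of Rust. This is a minimal illustration and not necessarily how the crate itself does it: the function name `compress` is hypothetical, and putting a newline between entries (with no trailing newline) is an assumption based on the Spec section.

```rust
/// Run-length encode `input` as text: one `<byte value>x<run length>` entry per run.
/// Hypothetical helper for illustration only, not the crate's actual API.
pub fn compress(input: &[u8]) -> String {
    let mut out = String::new();
    let mut bytes = input.iter().copied().peekable();

    while let Some(byte) = bytes.next() {
        // Count how many times this byte repeats consecutively.
        let mut count: u64 = 1;
        while bytes.next_if_eq(&byte).is_some() {
            count += 1;
        }

        // Assumption: entries are separated by a newline, with no trailing newline.
        if !out.is_empty() {
            out.push('\n');
        }
        out.push_str(&format!("{byte}x{count}"));
    }

    out
}
```

Calling `compress` on 27 copies of the byte `b'a'` should give back the 5-byte string `97x27`, while an input with no repeats gets one entry per byte, which is where the size blow-up comes from.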

Benchmarks

I compressed a 1GB file consisting of 0 (the binary zero byte, not the character "0") repeated 8 billion times. This was compressed to the 12-byte string 0x8000000000, which is equivalent to about a 99.99999985% decrease in size.

By comparison, if you try to compress the Cargo.toml file in this repository, it goes from 180 bytes to 977 bytes, which is a 442.778% increase in size.
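The percentage for Cargo.toml can be checked with a small calculation. The helper below is only a sketch, and the name `size_change_percent` is made up for this example. A positive result means the output grew; a negative result means it shrank.

```rust
/// Percentage change from `original` to `compressed` size, in bytes.
/// Positive means the file grew, negative means it shrank.
/// Hypothetical helper for illustration only.
pub fn size_change_percent(original: u64, compressed: u64) -> f64 {
    (compressed as f64 - original as f64) / original as f64 * 100.0
}
```

For the Cargo.toml figures, `size_change_percent(180, 977)` works out to (977 - 180) / 180 * 100, or roughly 442.78.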

Spec

Every compression algorithm needs a specification :)

  • Compressed files must be represented as plain text files using UTF-8 encoding.
  • Every run of identical bytes is represented as the byte's value as a decimal integer, followed by the character x, followed by the number of times it was repeated.
  • Every sequence is newline delimited.
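
A decoder that follows these three rules could look something like the sketch below. The function name `decompress`, returning `None` on malformed input, and skipping empty lines are all assumptions made for the example. The spec itself does not say how errors should be handled.

```rust
/// Decode text in the `<byte value>x<run length>` line format back into raw bytes.
/// Hypothetical helper for illustration only; returns `None` if any line is malformed.
pub fn decompress(text: &str) -> Option<Vec<u8>> {
    let mut out = Vec::new();

    for line in text.lines() {
        // Assumption: empty lines (such as one left by a trailing newline) are ignored.
        if line.is_empty() {
            continue;
        }

        // Split at the first `x` into the byte value and the repeat count.
        let (byte, count) = line.split_once('x')?;
        let byte: u8 = byte.parse().ok()?;
        let count: usize = count.parse().ok()?;

        // Expand the run back into `count` copies of `byte`.
        out.extend(std::iter::repeat(byte).take(count));
    }

    Some(out)
}
```

Note that this expands every run in memory, so a line like 0x8000000000 would try to allocate the full 8 billion bytes. A real decoder would probably want to stream the output to a writer instead.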

Made with 💗 by Juliette Cordor