ts_zip: Text Compression using Large Language Models

The ts_zip utility can compress (and hopefully decompress) text files using a Large Language Model. The compression ratio is much higher than with other compression tools: on the files listed below, its output is roughly 30% to 55% smaller than that of xz. There are some caveats of course:

Compression Ratio

The compression ratio is given in bits per byte (bpb), i.e. the number of compressed bits per byte of the original file. Lower is better.

File               Original size   xz                    ts_zip
                   (bytes)         (bytes)     (bpb)     (bytes)     (bpb)
alice29.txt        152089          48492       2.551     21713       1.142
book1              768771          261116      2.717     137477      1.431
enwik8             100000000       24865244    1.989     13825741    1.106
enwik9             1000000000      213370900   1.707     135443237   1.084
linux-1.2.13.tar   9379840         1689468     1.441     1196859     1.021
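
The bpb columns can be rechecked from the byte counts alone: bpb is 8 times the compressed size divided by the original size. The short Python sketch below is only an illustration and is not part of ts_zip; the names bits_per_byte, SIZES and bpb_table are made up for this example, and the sizes are copied from the table above.

```python
def bits_per_byte(compressed_bytes: int, original_bytes: int) -> float:
    """Compressed size expressed in bits per byte of the original file."""
    if original_bytes <= 0:
        raise ValueError("original size must be positive")
    return 8 * compressed_bytes / original_bytes


# Sizes in bytes, copied from the table above: (original, xz, ts_zip).
# The dictionary itself is a hypothetical helper for this illustration.
SIZES = {
    "alice29.txt": (152089, 48492, 21713),
    "book1": (768771, 261116, 137477),
    "enwik8": (100000000, 24865244, 13825741),
    "enwik9": (1000000000, 213370900, 135443237),
    "linux-1.2.13.tar": (9379840, 1689468, 1196859),
}


def bpb_table(sizes=SIZES):
    """Map each file name to (xz bpb, ts_zip bpb), rounded to 3 decimals."""
    return {
        name: (
            round(bits_per_byte(xz, original), 3),
            round(bits_per_byte(ts_zip, original), 3),
        )
        for name, (original, xz, ts_zip) in sizes.items()
    }


if __name__ == "__main__":
    for name, (xz_bpb, ts_zip_bpb) in bpb_table().items():
        print(f"{name:18s} xz {xz_bpb:.3f}   ts_zip {ts_zip_bpb:.3f}")
```

The same function can be applied to the output of any other compressor to compare it on an equal footing, since bpb is independent of the original file size.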

Results and speed for other programs on enwik8 and enwik9 are available at the Large Text Compression Benchmark.

Download

Technical information


Fabrice Bellard - https://bellard.org/