NNCP: Lossless Data Compression with Neural Networks

NNCP is an experiment to build a practical lossless data compressor with neural networks. The latest version uses a Transformer model.

The papers nncp_v2.1.pdf and nncp.pdf describe the algorithms and results of previous releases of NNCP.

The current release of NNCP is implemented in C and uses LibNC to get better performance than PyTorch.

Compression ratio

Result for enwik8:

Program            Compr. size (bytes)  Ratio (bpb)
gzip                        36 445 248         2.92
xz                          24 865 244         1.99
NNCP (2023-10-21)           14 915 298         1.19
CMIX (v19)                  14 837 987         1.19
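
The ratio in bits per byte (bpb) can be recomputed from the compressed sizes. The sketch below assumes the usual definition (compressed bits divided by original bytes) and that enwik8 is exactly 100 000 000 bytes long; the function and variable names are illustrative and do not come from NNCP.

```python
# Assumption: enwik8 is the first 10^8 bytes of an English Wikipedia dump.
ENWIK8_SIZE = 100_000_000


def bits_per_byte(compressed_size: int, original_size: int = ENWIK8_SIZE) -> float:
    """Average number of compressed bits spent per original byte."""
    return 8 * compressed_size / original_size


# Compressed sizes in bytes, copied from the enwik8 results.
enwik8_sizes = {
    "gzip": 36_445_248,
    "xz": 24_865_244,
    "NNCP (2023-10-21)": 14_915_298,
    "CMIX (v19)": 14_837_987,
}

for program, size in enwik8_sizes.items():
    print(f"{program:<18} {bits_per_byte(size):.2f} bpb")
```

Rounded to two decimals, this gives 2.92, 1.99, 1.19 and 1.19 bpb, which agrees with the Ratio column.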

Result for enwik9:

Program            Compr. size (bytes)  Ratio (bpb)  Program size (zip, bytes)  Total (bytes)
gzip                       322 591 995         2.58                     38 801    322 630 796
xz                         197 331 816         1.58                     36 752    197 368 568
CMIX (v19)                 111 470 932        0.892                    223 485    111 694 417
NNCP (2023-10-21)          106 632 363        0.853                    628 955    107 261 318

* The results for the other programs are from the Large Text Compression Benchmark.
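
The Total column for enwik9 appears to be the compressed size plus the size of the zipped program, and all four rows are consistent with that reading. The sketch below recomputes the totals under that assumption and ranks the programs by total size; the names used are illustrative only.

```python
# (compressed size, zipped program size) in bytes, copied from the enwik9 results.
enwik9_results = {
    "gzip": (322_591_995, 38_801),
    "xz": (197_331_816, 36_752),
    "CMIX (v19)": (111_470_932, 223_485),
    "NNCP (2023-10-21)": (106_632_363, 628_955),
}


def total_size(compressed_size: int, program_size: int) -> int:
    """Assumed definition of Total: compressed data plus zipped program."""
    return compressed_size + program_size


# Smallest total first.
ranked = sorted(enwik9_results, key=lambda name: total_size(*enwik9_results[name]))

for program in ranked:
    print(f"{program:<18} {total_size(*enwik9_results[program]):>11,d} bytes")
```

Counting the program size penalizes compressors that ship a large decompressor. With these figures NNCP still has the smallest total, although its zipped program is the largest of the four.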

Download

Related Links


Fabrice Bellard - https://bellard.org/