It has the following characteristics:
Model(3) | CPU Speed (tokens/s) | GPU Speed (tokens/s) |
---|---|---|
gptj_6B_q8 | 12.6 | 84.2 |
flan_t5_xxl_q8 | 12.1 | 83.3 |
gptneox_20B_q4 | 4.3 | 40.8 |
gptneox_20B_q8 | 3.5 | 27.4 |
llama_65B_q4 | 1.4 | 13.8 |
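The two speed columns can be compared with a little arithmetic. The sketch below copies the five rows of the table above and computes the GPU/CPU speedup plus a rough wall-clock estimate for generating a given number of tokens. The helper names `speedup` and `seconds_for` are hypothetical, and the estimate assumes the measured rate holds steadily for the whole generation (it ignores prompt processing and model loading).

```python
# CPU and GPU generation speeds (tokens/s), copied from the table above.
SPEEDS = {
    "gptj_6B_q8": (12.6, 84.2),
    "flan_t5_xxl_q8": (12.1, 83.3),
    "gptneox_20B_q4": (4.3, 40.8),
    "gptneox_20B_q8": (3.5, 27.4),
    "llama_65B_q4": (1.4, 13.8),
}


def speedup(model: str) -> float:
    """Ratio of GPU to CPU throughput for one model."""
    cpu, gpu = SPEEDS[model]
    return gpu / cpu


def seconds_for(model: str, n_tokens: int, device: str = "gpu") -> float:
    """Rough time to generate n_tokens, assuming a constant token rate."""
    cpu, gpu = SPEEDS[model]
    if device == "gpu":
        rate = gpu
    elif device == "cpu":
        rate = cpu
    else:
        raise ValueError("device must be 'gpu' or 'cpu'")
    return n_tokens / rate


if __name__ == "__main__":
    for name in SPEEDS:
        print(f"{name:16s} speedup {speedup(name):4.1f}x, "
              f"500 tokens on GPU ~{seconds_for(name, 500):5.1f} s")
```

On these figures the GPU is roughly 6.7x (gptj_6B_q8) to 9.9x (llama_65B_q4) faster than the CPU.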
Language Models:
bloom_560M | 2 | 29.176 | 36.8% | 35.8% | 51.4% | 63.7% | 36.0% | 44.7% |
codegen_6B_mono_q4 | 5 | 69.409 | 28.0% | 35.7% | 51.1% | 60.2% | 38.0% | 42.6% |
codegen_6B_mono_q8 | 8 | 67.262 | 28.1% | 35.8% | 50.8% | 60.1% | 39.1% | 42.8% |
fairseq_gpt_13B | 27 | 3.567 | 71.9% | 72.7% | 67.5% | 77.6% | 70.1% | 71.9% |
fairseq_gpt_13B_q4 | 9 | 3.646 | 71.2% | 72.5% | 67.6% | 77.4% | 70.6% | 71.9% |
fairseq_gpt_13B_q8 | 15 | 3.565 | 71.8% | 72.7% | 67.2% | 77.7% | 70.0% | 71.9% |
flan_t5_base | 1 | 12.891 | 54.2% | 36.5% | 54.7% | 65.8% | 62.1% | 54.7% |
flan_t5_base_q8 | 1 | 13.098 | 54.2% | 36.4% | 54.2% | 65.7% | 61.8% | 54.5% |
flan_t5_small | 1 | 23.343 | 46.7% | 29.2% | 50.0% | 62.4% | 47.9% | 47.2% |
flan_t5_small_q8 | 1 | 23.449 | 46.7% | 29.2% | 49.7% | 62.4% | 48.2% | 47.2% |
flan_t5_xxl_q4 | 7 | 3.010 | 77.7% | 71.5% | 73.4% | 77.6% | 71.8% | 74.4% |
flan_t5_xxl_q8 | 13 | 3.049 | 77.8% | 72.1% | 75.1% | 77.8% | 73.1% | 75.2% |
flan_ul2_20B_q4 | 12 | - | 74.1% | 24.3% | 51.1% | 49.9% | 78.8% | 55.6% |
flan_ul2_20B_q8 | 22 | - | 74.4% | 24.4% | 52.0% | 50.6% | 77.3% | 55.7% |
gpt2_117M | 1 | 40.110 | 32.9% | 31.1% | 52.1% | 62.9% | 27.3% | 41.3% |
gpt2_1558M | 4 | 10.637 | 51.3% | 50.8% | 58.4% | 70.8% | 53.2% | 56.9% |
gpt2_1558M_q8 | 2 | 10.655 | 51.2% | 50.8% | 58.6% | 70.8% | 53.2% | 56.9% |
gpt2_345M | 1 | 18.272 | 43.5% | 39.4% | 53.3% | 67.7% | 43.1% | 49.4% |
gpt2_345M_q8 | 1 | 18.452 | 43.1% | 39.4% | 53.1% | 67.5% | 41.9% | 49.0% |
gpt2_774M | 2 | 12.966 | 47.8% | 45.4% | 55.6% | 70.4% | 48.5% | 53.5% |
gpt2_774M_q8 | 1 | 12.928 | 47.9% | 45.4% | 55.3% | 70.3% | 48.2% | 53.4% |
gptj_6B | 13 | 4.124 | 69.0% | 66.2% | 64.8% | 75.5% | 66.9% | 68.5% |
gptj_6B_q4 | 4 | 4.153 | 68.9% | 65.7% | 63.9% | 74.4% | 67.0% | 68.0% |
gptj_6B_q8 | 7 | 4.122 | 69.1% | 66.2% | 64.4% | 75.4% | 66.4% | 68.3% |
gptneox_20B | 43 | 3.657 | 72.6% | 71.4% | 65.5% | 77.5% | 73.3% | 72.0% |
gptneox_20B_q4 | 13 | 3.711 | 72.0% | 69.3% | 64.8% | 76.7% | 70.8% | 70.7% |
gptneox_20B_q8 | 23 | 3.659 | 72.6% | 71.3% | 65.8% | 77.3% | 72.9% | 72.0% |
llama_13B_q4 | 8 | 3.130 | 77.1% | 78.6% | 72.2% | 78.3% | 77.8% | 76.8% |
llama_13B_q8 | 15 | 3.178 | 76.5% | 79.1% | 73.2% | 79.1% | 77.1% | 77.0% |
llama_30B_q4 | 20 | 2.877 | 77.5% | 82.4% | 75.7% | 80.2% | 80.2% | 79.2% |
llama_30B_q8 | 36 | 2.853 | 77.7% | 82.7% | 76.3% | 80.3% | 80.4% | 79.5% |
llama_65B_q4 | 39 | 2.760 | 78.5% | 83.9% | 76.6% | 81.4% | 83.2% | 80.7% |
llama_7B | 14 | 3.463 | 73.6% | 76.2% | 70.4% | 78.1% | 75.4% | 74.7% |
llama_7B_q4 | 5 | 3.549 | 73.2% | 75.5% | 70.4% | 78.0% | 74.7% | 74.4% |
llama_7B_q8 | 8 | 3.453 | 73.7% | 76.1% | 70.2% | 78.0% | 75.5% | 74.7% |
opt_125M | 1 | 26.028 | 37.9% | 31.3% | 50.2% | 63.2% | 23.4% | 41.2% |
opt_30B_q4 | 19 | 3.656 | 71.5% | 72.1% | 68.0% | 77.4% | 69.9% | 71.8% |
opt_30B_q8 | 34 | 3.628 | 71.6% | 72.3% | 68.2% | 77.7% | 71.4% | 72.3% |
opt_66B_q4 | 40 | 3.308 | 73.4% | 74.4% | 68.4% | 78.5% | 75.0% | 73.9% |
pythia_deduped_1.4B | 3 | 6.546 | 63.1% | 52.2% | 57.1% | 72.7% | 52.6% | 59.5% |
pythia_deduped_1.4B_q8 | 2 | 6.577 | 63.3% | 52.1% | 55.7% | 73.1% | 53.0% | 59.4% |
pythia_deduped_12B | 25 | 3.854 | 70.9% | 69.2% | 63.9% | 76.3% | 70.8% | 70.2% |
pythia_deduped_12B_q4 | 8 | 4.187 | 69.2% | 68.5% | 63.1% | 76.4% | 69.6% | 69.4% |
pythia_deduped_12B_q8 | 14 | 3.857 | 70.9% | 69.2% | 64.2% | 76.1% | 70.9% | 70.3% |
pythia_deduped_160M | 1 | 26.380 | 36.9% | 32.3% | 51.4% | 63.8% | 23.2% | 41.5% |
pythia_deduped_1B | 3 | 7.273 | 58.5% | 49.0% | 54.5% | 71.0% | 49.9% | 56.6% |
pythia_deduped_1B_q8 | 2 | 7.286 | 58.4% | 49.0% | 54.9% | 70.9% | 49.0% | 56.5% |
pythia_deduped_2.8B | 6 | 4.787 | 67.1% | 61.6% | 60.9% | 74.4% | 65.5% | 65.9% |
pythia_deduped_2.8B_q8 | 4 | 4.778 | 66.9% | 61.5% | 61.2% | 74.5% | 65.6% | 66.0% |
pythia_deduped_410M | 1 | 10.827 | 51.7% | 40.8% | 54.0% | 67.2% | 43.0% | 51.4% |
pythia_deduped_410M_q8 | 1 | 10.729 | 51.8% | 40.7% | 53.8% | 67.1% | 42.7% | 51.2% |
pythia_deduped_6.9B | 15 | 4.195 | 69.1% | 65.7% | 63.9% | 75.1% | 66.1% | 68.0% |
pythia_deduped_6.9B_q4 | 5 | 4.344 | 68.3% | 65.0% | 62.5% | 75.3% | 66.3% | 67.5% |
pythia_deduped_6.9B_q8 | 8 | 4.187 | 69.4% | 65.7% | 63.6% | 75.5% | 66.8% | 68.2% |
pythia_deduped_70M | 1 | 96.126 | 25.6% | 28.3% | 54.4% | 60.4% | 13.1% | 36.3% |
rwkv_14B | 29 | 3.819 | 71.6% | 70.2% | 63.1% | 77.5% | 47.2% | 65.9% |
rwkv_14B_q4 | 9 | 4.076 | 68.3% | 69.8% | 63.1% | 77.1% | 45.0% | 64.7% |
rwkv_14B_q8 | 16 | 3.806 | 71.9% | 70.2% | 63.0% | 77.5% | 47.1% | 65.9% |
rwkv_raven_v8_14B_q4 | 9 | 4.296 | 67.0% | 70.6% | 63.8% | 76.7% | 43.1% | 64.2% |
rwkv_raven_v9_14B_q4 | 9 | 4.460 | 66.4% | 70.6% | 63.0% | 77.2% | 42.3% | 63.9% |
rwkv_7B | 16 | 4.396 | 67.5% | 65.6% | 61.9% | 75.6% | 39.7% | 62.1% |
rwkv_7B_q4 | 5 | 4.939 | 64.7% | 64.8% | 61.2% | 75.4% | 38.4% | 60.9% |
rwkv_7B_q8 | 9 | 4.395 | 67.5% | 65.6% | 61.6% | 75.9% | 40.2% | 62.2% |
stablelm-base-alpha-7b | 15 | 10.015 | 54.2% | 41.0% | 50.7% | 66.1% | 43.6% | 51.1% |
Additional Models:
Model | | Description |
---|---|---|
m2m100_1_2B_q8 | 2 | Translation between 100 languages |
nllb200_1.3B_q8 | 2 | Translation between 200 languages |
nllb200_3.3B_q8 | 5 | Translation between 200 languages |
sd-v1-4 | 3 | Stable Diffusion text-to-image version 1.4 |
SHA256 of all the models: sha256.txt.
Notes: