TextSynth Server is a web server providing a REST API to large language models. These models can be used, for example, for text completion, question answering, classification, chat, translation, image generation, ...
It has the following characteristics:
- Simple command line tools (ts_test, ts_sd) are provided to test the various models.
- The CPU version is released as binary code under the MIT license. The GPU version and the tools to convert and quantize the models are commercial software.
TextSynth Server works only on x86 CPUs supporting AVX2 (all Intel CPUs since 2013 support it). The installation was tested on Fedora and CentOS/RockyLinux 8 distributions. Other distributions should work provided the libjpeg and libmicrohttpd libraries are installed.
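You can quickly check whether your CPU supports AVX2 by looking for the avx2 flag in /proc/cpuinfo:

grep -m1 -o avx2 /proc/cpuinfo

If the command prints avx2, the CPU is supported; if it prints nothing, the server will not run on this machine.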
Install the libjpeg and libmicrohttpd libraries. If you use Fedora, RHEL, CentOS or RockyLinux, you can type as root:
dnf install libjpeg libmicrohttpd
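On Debian or Ubuntu, the equivalent runtime packages can be installed with apt; note that the package names below are taken from recent Ubuntu releases and may differ on your system:

sudo apt install libjpeg-turbo8 libmicrohttpd12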
ts_test can be used without these libraries. ts_sd needs libjpeg. ts_server needs libjpeg and libmicrohttpd.
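After unpacking the archive (below), you can check that the binaries find these libraries with ldd, for example:

ldd ./ts_server | grep -E 'libjpeg|libmicrohttpd'

A line marked "not found" indicates a library that is still missing.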
tar xvf ts_server-##version##.tar.gz
cd ts_server-##version##
where ##version## is the version of the program. Then download a small model and try a text completion:
wget https://www2.bellard.org/models/gpt2_117M.bin
./ts_test -m gpt2_117M.bin g "The Linux kernel is"
You can use more CPU cores with the -T option:
./ts_test -T 4 -m gpt2_117M.bin g "The Linux kernel is"
The optimal number of cores depends on the system configuration.
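One simple way to find a good value is to time the same completion with several thread counts; the sketch below assumes a bash shell and the gpt2_117M.bin model downloaded above:

# time the same completion with 1, 2, 4 and 8 threads
for t in 1 2 4 8; do
  echo "threads: $t"
  time ./ts_test -T $t -m gpt2_117M.bin g "The Linux kernel is" > /dev/null
done

Pick the smallest thread count after which the run time stops improving.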
Then launch the server:
./ts_server ts_server.cfg
You can edit the ts_server.cfg JSON configuration file if you want to use another model.
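For illustration only, a configuration selecting a model might look as follows; apart from cuda, which the GPU instructions below refer to, the key names here are assumptions, so check them against the ts_server.cfg file shipped in the archive:

{
  /* port the REST API listens on (assumed key name) */
  local_port: 8080,
  /* uncomment to enable GPU support (see the GPU section below) */
  /* cuda: true, */
  /* models served by this instance (assumed key names) */
  models: [
    { name: "gpt2_117M", filename: "gpt2_117M.bin" },
  ],
}

With the server running, you can send a completion request with curl: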
curl http://localhost:8080/v1/engines/gpt2_117M/completions \
-H "Content-Type: application/json" \
-d '{"prompt": "The Linux kernel is", "max_tokens": 100}'
The full request syntax is documented at https://textsynth.com/documentation.html.
Now you are ready to load a larger model and to use it from your application.
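As a starting point for scripting, the fragment below sends a prompt and prints only the generated text; it assumes the jq tool is installed and that the response carries the completion in a text field, as in the TextSynth API:

# send a prompt and print only the generated text (assumes jq is installed)
curl -s http://localhost:8080/v1/engines/gpt2_117M/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "The Linux kernel is", "max_tokens": 100}' | jq -r .text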
To use the GPU version of the server you need an Nvidia Ampere, Ada or Hopper GPU (e.g. RTX 3090, RTX A6000 or A100) with CUDA 11.x or 12.x installed. Enough GPU memory must be available to load the model.
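You can first check that the GPU and the CUDA driver are visible:

nvidia-smi

The command lists the detected GPUs and the installed driver and CUDA versions.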
You can then test GPU support with the ts_test utility:
./ts_test --cuda -m gpt2_117M.bin g "The Linux kernel is"
Then edit the ts_server.cfg configuration file to enable GPU support by uncommenting cuda: true, and run the server:
./ts_server ts_server.cfg
curl http://localhost:8080/v1/engines/gpt2_117M/completions \
-H "Content-Type: application/json" \
-d '{"prompt": "The Linux kernel is", "max_tokens": 100}'
TextSynth Server works only on x86 CPUs supporting AVX2 (all Intel CPUs since 2013 support it). Windows support is experimental.
cd ts_server-##version##
where ##version## is the version of the program. Then download a small model and try a text completion:
wget https://www2.bellard.org/models/gpt2_117M.bin
ts_test -m gpt2_117M.bin g "The Linux kernel is"
You can use more CPU cores with the -T option:
ts_test -T 4 -m gpt2_117M.bin g "The Linux kernel is"
The optimal number of cores depends on the system configuration.
Then launch the server:
ts_server ts_server.cfg
You can edit the ts_server.cfg JSON configuration file if you want to use another model.
curl http://localhost:8080/v1/engines/gpt2_117M/completions \
-H "Content-Type: application/json" \
-d '{"prompt": "The Linux kernel is", "max_tokens": 100}'
The full request syntax is documented at https://textsynth.com/documentation.html.
Now you are ready to load a larger model and to use it from your application.
Here are some examples with the utilities:
Text completion:
./ts_test -m gpt2_117M.bin g "Hello, my name is"
Short text compression and decompression:
./ts_test -m gpt2_117M.bin cs "Hello, how are you ?"
./ts_test -m gpt2_117M.bin ds "##msg##"
where ##msg## is the compressed message.
Translation:
./ts_test -m m2m100_1_2B_q8.bin translate en fr "The dispute focuses \
on the width of seats provided on long-haul flights for economy \
passengers."
assuming you downloaded the m2m100_1_2B_q8.bin model.
Image generation:
./ts_sd -o out.jpg "an astronaut riding a horse"
assuming you downloaded sd-v1-4.bin.
The CPU version of the TextSynth Server software is provided as binary code under the MIT license (see the LICENSE file). The GPU version is commercial software. Please contact us for the exact licensing terms.