TextSynth Server

Table of Contents

1 Introduction

TextSynth Server is a web server exposing a REST API to large language models. It can be used, for example, for text completion, question answering, classification, chat, translation, image generation, and more.

It has the following characteristics:

The CPU version is released as binary code under the MIT license. The GPU version and the tools to convert and quantize the models are commercial software.

2 Quick Start

2.1 Linux

2.1.1 First steps

The TextSynth Server works only on x86 CPUs supporting AVX2 (all Intel CPUs since 2013 support it). The installation was tested on Fedora and CentOS/RockyLinux 8 distributions. Other distributions should work provided the libjpeg and libmicrohttpd libraries are installed.

  1. Install the libjpeg and libmicrohttpd libraries. If you use Fedora, RHEL, CentOS or RockyLinux, you can type as root:
      dnf install libjpeg libmicrohttpd
    

    ts_test can be used without these libraries. ts_sd needs libjpeg. ts_server needs libjpeg and libmicrohttpd.

  2. Extract the archive and go into its directory:
      tar xzf ts_server-##version##.tar.gz
    
      cd ts_server-##version##
    

    where ##version## is the version of the program.

  3. Download one small example model:
      wget https://www2.bellard.org/models/gpt2_117M.bin
    
  4. Use it to generate text with the "ts_test" utility:
      ./ts_test -m gpt2_117M.bin g "The Linux kernel is"
    

    You can use more CPU cores with the -T option:

      ./ts_test -T 4 -m gpt2_117M.bin g "The Linux kernel is"
    

    The optimal number of cores depends on the system configuration.

  5. Start the server:
      ./ts_server ts_server.cfg
    

    You can edit the ts_server.cfg JSON configuration file if you want to use another model.

  6. Try one request:
      curl http://localhost:8080/v1/engines/gpt2_117M/completions \
      -H "Content-Type: application/json" \
      -d '{"prompt": "The Linux kernel is", "max_tokens": 100}'
    

    The full request syntax is documented at https://textsynth.com/documentation.html.

    Now you are ready to load a larger model and to use it from your application.
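The curl request above can also be issued from application code. A minimal sketch in Python using only the standard library; the engine name gpt2_117M matches the model configured in ts_server.cfg, and the helper name make_completion_request is illustrative, not part of the TextSynth API:

```python
import json
from urllib import request

def make_completion_request(prompt, max_tokens=100,
                            base_url="http://localhost:8080"):
    # Build a POST request for the completions endpoint; the JSON body
    # mirrors the curl example: {"prompt": ..., "max_tokens": ...}.
    url = base_url + "/v1/engines/gpt2_117M/completions"
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode()
    return request.Request(url, data=body,
                           headers={"Content-Type": "application/json"})

req = make_completion_request("The Linux kernel is")
print(req.get_full_url())
# Sending the request requires a running ts_server, e.g.:
#   with request.urlopen(req) as resp:
#       print(json.load(resp))
```

See https://textsynth.com/documentation.html for the full request and response syntax.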

2.1.2 GPU usage (commercial version only)

You need an NVIDIA Ampere, Ada or Hopper GPU (e.g. RTX 3090, RTX A6000 or A100) with CUDA 11.x or 12.x installed in order to use the server. Enough GPU memory must be available to load the model.

  1. First ensure that it is working on CPU (See First steps).
  2. Then try to use the GPU with the ts_test utility:
      ./ts_test --cuda -m gpt2_117M.bin g "The Linux kernel is"
    
  3. Then edit the ts_server.cfg configuration to enable GPU support by uncommenting
      cuda: true
    

    and run the server:

      ./ts_server ts_server.cfg
    
  4. Assuming you have curl, try one request:
      curl http://localhost:8080/v1/engines/gpt2_117M/completions \
      -H "Content-Type: application/json" \
      -d '{"prompt": "The Linux kernel is", "max_tokens": 100}'
    

2.2 Windows

The TextSynth Server works only on x86 CPUs supporting AVX2 (all Intel CPUs since 2013 support it). The Windows support is experimental.

  1. Extract the ZIP archive, launch the shell and go into its directory:
      cd ts_server-##version##
    

    where ##version## is the version of the program.

  2. Download one small example model:
      wget https://www2.bellard.org/models/gpt2_117M.bin
    
  3. Use it to generate text with the "ts_test" utility:
      ts_test -m gpt2_117M.bin g "The Linux kernel is"
    

    You can use more CPU cores with the -T option:

      ts_test -T 4 -m gpt2_117M.bin g "The Linux kernel is"
    

    The optimal number of cores depends on the system configuration.

  4. Start the server:
      ts_server ts_server.cfg
    

    You can edit the ts_server.cfg JSON configuration file if you want to use another model.

  5. Assuming you installed curl (you can download it from https://curl.se/windows/), try one request:
      curl http://localhost:8080/v1/engines/gpt2_117M/completions \
      -H "Content-Type: application/json" \
      -d '{"prompt": "The Linux kernel is", "max_tokens": 100}'
    

    The full request syntax is documented at https://textsynth.com/documentation.html.

    Now you are ready to load a larger model and to use it from your application.
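When calling the server from code rather than curl, the JSON response body must be decoded. A minimal sketch in Python; it assumes, based on the API documentation linked above, that the completions response carries the generated continuation in a "text" field, and the sample body shown is invented for illustration:

```python
import json

def extract_text(body: bytes) -> str:
    # Decode a completions response body and return the generated text.
    # The exact response fields are defined at
    # https://textsynth.com/documentation.html.
    return json.loads(body)["text"]

# Hypothetical response body, as the server might return it:
sample_body = b'{"text": " a free operating system kernel."}'
print(extract_text(sample_body))
```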

3 Utilities

The package provides three utilities: ts_test, ts_sd and ts_server (see the quick start above for usage examples).

4 License

The CPU version of the TextSynth Server software is provided as binary code under the MIT license (see the LICENSE file). The GPU version is commercial software. Please contact us for the exact licensing terms.