The test program is available here.
This test is intended to measure the overhead of kernel space acceleration compared with user space acceleration. The primitive chosen (drawing a single pixel) is very simple, so I assume that this is the worst case we could have. I tested several different cases (with and without clipping in user space, and with different request sizes). The pixel coordinates were generated with a very simple function, so that its overhead is very small too (see the source code).
When clipping was enabled, about 1/4 of the pixels fell outside the clipping rectangle. Note that with kernel acceleration, clipping is always performed for security reasons.
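As an illustration of the user space case with clipping, a single-pixel primitive could look like the sketch below. This is not the code of the test program: the `Framebuffer` and `ClipRect` types, the 8-bit pixel layout and the function name are assumptions made for the example, and the clipping rectangle is assumed to lie inside the framebuffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical clipping rectangle: x in [x1, x2), y in [y1, y2). */
typedef struct {
    int x1, y1, x2, y2;
} ClipRect;

/* Hypothetical 8-bit framebuffer, one byte per pixel, row-major. */
typedef struct {
    uint8_t *pixels;
    int width, height;
} Framebuffer;

/*
 * Draw one pixel if it lies inside the clipping rectangle.
 * Returns 1 if the pixel was written, 0 if it was clipped.
 * The clipping rectangle is assumed to be contained in the framebuffer.
 */
static int draw_pixel_clipped(Framebuffer *fb, const ClipRect *clip,
                              int x, int y, uint8_t color)
{
    assert(fb != NULL && clip != NULL);
    if (x < clip->x1 || x >= clip->x2 || y < clip->y1 || y >= clip->y2)
        return 0;
    fb->pixels[(size_t)y * (size_t)fb->width + (size_t)x] = color;
    return 1;
}
```

The clipping test costs four comparisons per pixel, which is consistent with the small difference measured between the two user space runs.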
Here are the results:
| Benchmark description | Running time (in seconds) |
|---|---|
| User space, no clipping | 0.22 |
| Kernel space, no pixel clipped (clipping is always done), REQUEST_SIZE=1000 | 0.86 |
| User space, with clipping | 0.24 |
| Kernel space, REQUEST_SIZE=1000 (clipping is always done) | 0.80 |
| Kernel space, REQUEST_SIZE=100 | 1.24 |
| Kernel space, REQUEST_SIZE=10000 | 0.95 |
This test shows that kernel mode acceleration introduces a non-negligible overhead for each command: for the DrawPixel test it is about 3.3 times slower than user space acceleration (0.80 s against 0.24 s in the runs with clipping).
This is nevertheless quite a positive result, because it means that building the display list, doing the system call and interpreting the display list together cost about the time of two DrawPixel primitives, which is very small. For example, the X11 server should in every case have a much larger overhead in its protocol interpretation.
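The two figures quoted above follow from the running times in the table. The small sketch below only makes that arithmetic explicit; the function names are made up for the example.

```c
#include <assert.h>

/* Ratio of kernel space running time to user space running time. */
static double slowdown(double kernel_s, double user_s)
{
    assert(user_s > 0.0);
    return kernel_s / user_s;
}

/*
 * Extra time spent in the kernel space path, expressed in units of
 * one user space DrawPixel (both runs draw the same number of pixels).
 */
static double overhead_in_primitives(double kernel_s, double user_s)
{
    assert(user_s > 0.0);
    return (kernel_s - user_s) / user_s;
}
```

With the clipping runs (0.80 s and 0.24 s) this gives a slowdown of about 3.3 and an overhead of about 2.3 DrawPixel per command. With the runs where no pixel is clipped (0.86 s and 0.22 s) the same formulas give about 3.9 and 2.9.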
The optimal request size seems to be about 1000 words on this computer. This is not very big, so display list handling should not consume much memory in user space.
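To make the role of the request size concrete, here is a minimal sketch of how a user space library might batch DrawPixel commands into a display list and hand it over once the buffer is full. It is not taken from the test program: the opcode value, the four-word encoding of a pixel command and all function names are assumptions, and the system call is replaced by a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REQUEST_SIZE    1000  /* words per request, as in the benchmark */
#define OP_DRAW_PIXEL   1u    /* hypothetical opcode */
#define WORDS_PER_PIXEL 4     /* hypothetical encoding: opcode, x, y, color */

typedef struct {
    uint32_t words[REQUEST_SIZE];
    size_t used;              /* words currently queued */
    unsigned long flushes;    /* number of simulated system calls */
} DisplayList;

/*
 * Stand-in for the system call: a real implementation would pass
 * dl->words to the kernel, which clips and executes each command.
 */
static void display_list_flush(DisplayList *dl)
{
    assert(dl != NULL);
    if (dl->used == 0)
        return;
    dl->flushes++;
    dl->used = 0;
}

/* Queue one DrawPixel command, flushing first if it would not fit. */
static void display_list_draw_pixel(DisplayList *dl, uint32_t x,
                                    uint32_t y, uint32_t color)
{
    assert(dl != NULL);
    if (dl->used + WORDS_PER_PIXEL > REQUEST_SIZE)
        display_list_flush(dl);
    dl->words[dl->used++] = OP_DRAW_PIXEL;
    dl->words[dl->used++] = x;
    dl->words[dl->used++] = y;
    dl->words[dl->used++] = color;
}
```

Under this assumed encoding one request holds 250 pixels, so the fixed cost of the system call is shared by 250 commands. A smaller request size means more system calls for the same number of pixels, which matches the slower time measured with REQUEST_SIZE=100.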
This test shows that even for small primitives (up to lines of a few pixels), kernel space acceleration can be worthwhile.
By using inlining in the kernel code, we estimate that the penalty could come down to about one DrawPixel lost per accelerated primitive.