Tiny Big Float library
----------------------
Copyright (c) 2017 Fabrice Bellard
LibBF is a small library to handle arbitrary precision floating point
numbers. Its compiled size is about 16 KB of x86 code and has no
dependency on other libraries. It is not the fastest library nor the
smaller but it tries to be simple while using asymptotically optimal
algorithms. All the implemented operations have a near linear running
time.
The TinyPI example computes billions of digits of Pi using the
Chudnovsky formula.
Features:
- Arbitrary precision floating point numbers in base 2^b (b=64 or 32).
- basic operations: addition, subtraction, multiplication, division,
inverse square root.
- Multiplication using a SIMD optimized Number Theoretic Transform.
- Efficient base 10 conversion.
- MIT license.
Notes:
- The code was tested on a 64 bit x86 CPU. It should be portable to
other CPUs. The portable version handles numbers with up to 4*10^16
digits. The AVX2 version handles numbers with up to 8*10^12 digits.
- 32 bits: the code compiles on 32 bit architectures but it is not
designed to be efficient nor scalable in this case. The size of the
numbers is limited to about 10 million digits.
- The Number Theoretic Transform is not the fastest algorithm for
small to medium numbers (i.e. a few million digits), but it gets
better when the size of the numbers grows. There is no round-off
errors as with Fast Fourier Transform, the memory usage is much
smaller and it is potentially easier to parallelize. This code
contains an original SIMD (AVX2 on x86) implementation using 64 bit
floating point numbers. It relies on the fact that the fused
multiply accumulate (FMA) operation gives access to the full
precision of the product of two 64 bit floating point numbers. The
portable code relies on the fact that the C compiler supports a
double word integer type (i.e. 128 bit integers on 64 bit). The
modulo operations were replaced with multiplications which are
usually faster.
- Base conversion: the algorithm is not the fastest one but it is
simple and still gives a near linear running time.
- Rounding: the addition, subtraction and multiplication implement
the round to zero rounding mode. There is no exponent overflow
test. No Infinity nor NaN special numbers are supported. The
precision of the division and of the inverse square root is not
rigorously proved. The subtraction yields loss of precision in case
of cancellation.
- This library reuses some ideas from TachusPI (
http://bellard.org/pi/pi2700e9/tpi.html ) . It is about 4 times
slower to compute Pi but is much smaller and simpler.
- TinyPI: the "tinypi" executable uses the portable code. The
"tinypi-avx2" executable uses the AVX2 implementation. An x86 CPU of
at least the Intel Haswell generation is necessary for AVX2.
TODO:
- Support cancellation in subtraction.
- Change the base to 2 instead of 2^b (as in MPFR) without
substantially increasing the size of the code.
- Add a floating point base conversion function (the current one only
handles integers).
- Use a rigorously proved algorithm for the division and inverse
square root.
- Use OpenMP to parallelize some parts