Quadruple-precision float tutorial

The real128 class is a thin wrapper around the __float128 type available on GCC, Clang and the Intel compiler on some platforms [1].

__float128 is an implementation of the quadruple-precision IEEE 754 binary floating-point standard, which has a 113-bit significand, corresponding to roughly 34 decimal digits of precision (up to 36 significant digits are needed for a lossless decimal round trip). On most platforms, __float128 is implemented in software, and thus it is typically an order of magnitude slower than the standard floating-point C++ types. A notable exception is recent versions of the PowerPC architecture, which provide hardware-accelerated quadruple-precision floating-point arithmetic. Note that, even with software implementations, real128 can be expected to be noticeably faster than real. real128 is available in mp++ if the library is configured with the MPPP_WITH_QUADMATH option enabled (see the installation instructions).
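As a rough illustration of the extra precision, the sketch below works directly on the underlying __float128 type, using only built-in arithmetic so that neither mp++ nor libquadmath is needed. It assumes a compiler and platform that provide __float128 (for example GCC on x86-64); the names changes_one and precision_demo are made up for this example.

```cpp
#include <cassert>

// Hypothetical helper (not part of mp++): returns true if 1 + 2^-k is
// distinguishable from 1 in the floating-point type T.
template <typename T>
bool changes_one(int k)
{
    T eps = 1;
    for (int i = 0; i < k; ++i) {
        eps /= 2; // Exact: halving only lowers the exponent.
    }
    const T sum = T(1) + eps;
    return sum != T(1);
}

// Usage: double has a 53-bit significand, __float128 a 113-bit one.
inline void precision_demo()
{
    assert(changes_one<double>(52));
    assert(!changes_one<double>(60));      // 2^-60 is lost in double...
    assert(changes_one<__float128>(60));   // ...but kept in quadruple precision.
    assert(changes_one<__float128>(112));
    assert(!changes_one<__float128>(113)); // Beyond the 113-bit significand.
}
```

The same comparison should carry over to real128, since it wraps a __float128 value.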

Note

On Clang<7, __float128 cannot be used in mixed-mode operations with long double. Accordingly, real128 will disable interoperability with long double if Clang<7 is being used.

As a thin wrapper, real128 adds a few extra features on top of what __float128 already provides. Specifically, real128:

  • can interact with the other mp++ classes,

  • can be constructed from string-like objects,

  • supports the standard C++ iostream facilities.

Like __float128, real128 is a literal type, and thus it can be used for constexpr compile-time computations. Additionally, real128 provides constexpr implementations of a variety of functions which are not constexpr for __float128.

Note

The Intel compiler does not implement certain __float128 floating-point primitives as constant expressions. As a result, a few real128 functions which are constexpr on GCC and Clang are not constexpr when using the Intel compiler. These occurrences are marked in the API reference.
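The literal-type property can be illustrated on the underlying __float128 type alone, without mp++. The sketch below assumes GCC or Clang on a platform that provides __float128; ipow and constexpr_demo are hypothetical names introduced only for this example. Per the description above, real128 is expected to be usable in the same kind of constant expression.

```cpp
#include <cassert>

// Hypothetical helper (not part of mp++): x^n by repeated multiplication,
// written as a single return statement so it is constexpr even in C++11.
constexpr __float128 ipow(__float128 x, unsigned n)
{
    return n == 0 ? __float128(1) : x * ipow(x, n - 1);
}

// 2^100 + 1 needs 101 significand bits: it fits in the 113 bits of
// __float128, so the comparison below is decided by the compiler.
constexpr __float128 big = ipow(2, 100);
static_assert(big + 1 != big, "2^100 + 1 is exact in quadruple precision");
static_assert(ipow(3, 4) == 81, "small powers are exact too");

// Runtime counterpart in double, where the added 1 is rounded away.
inline void constexpr_demo()
{
    const double d = static_cast<double>(big);
    assert(d + 1 == d);
}
```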

In addition to the features common to all mp++ classes, the real128 API provides a few additional capabilities:

  • construction/conversion from/to __float128:

    real128 r{__float128(42)};                // Construction from a __float128.
    assert(r == 42);
    assert(static_cast<__float128>(r) == 42); // Conversion to __float128.
    
  • direct access to the internal __float128 instance (via the public m_value data member):

    real128 r{1};
    r.m_value += 1;                 // Directly modify the internal __float128 member.
    assert(r == 2);
    
    r.m_value = 0;
    assert(::cosq(r.m_value) == 1); // Call a libquadmath function directly on the internal member.
    
  • a variety of mathematical functions wrapping the libquadmath library routines. Note that the real128 function names drop the q suffix that appears in the names of the libquadmath routines and, as usual in mp++, they are meant to be found via argument-dependent lookup (ADL). Member function overloads for the unary functions are also available:

    real128 r{42};
    
    // Trigonometry.
    assert(cos(r) == ::cosq(r.m_value));
    assert(sin(r) == ::sinq(r.m_value));
    
    // Logarithms and exponentials.
    assert(exp(r) == ::expq(r.m_value));
    assert(log10(r) == ::log10q(r.m_value));
    
    // Etc.
    assert(lgamma(r) == ::lgammaq(r.m_value));
    assert(erf(r) == ::erfq(r.m_value));
    
    // Member function overloads.
    auto tmp = cos(r);
    assert(r.cos() == tmp); // NOTE: r.cos() will set r to its cosine.
    tmp = sin(r);
    assert(r.sin() == tmp); // NOTE: r.sin() will set r to its sine.
    
  • NaN-friendly hashing and comparison functions, for use in standard algorithms and containers;

  • a specialisation of the std::numeric_limits class template;

  • a selection of quadruple-precision compile-time mathematical constants.
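To show why NaN-friendly comparisons matter for standard algorithms, here is a minimal sketch on plain __float128 values. It is not the mp++ API: the names is_nan, nan_friendly_equal, nan_friendly_less and nan_demo are hypothetical, and the convention used (all NaNs are equal to each other and sort after every other value) is an assumption. The real128 reference describes the actual functions and their semantics.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// All names below are hypothetical; they only mimic the idea.
inline bool is_nan(__float128 x)
{
    return x != x; // NaN is the only value that compares unequal to itself.
}

inline bool nan_friendly_equal(__float128 a, __float128 b)
{
    return (is_nan(a) && is_nan(b)) || a == b;
}

// Strict weak ordering: NaNs are equivalent to each other and sort last.
inline bool nan_friendly_less(__float128 a, __float128 b)
{
    if (is_nan(a)) {
        return false;
    }
    if (is_nan(b)) {
        return true;
    }
    return a < b;
}

inline void nan_demo()
{
    const __float128 nan = std::numeric_limits<double>::quiet_NaN();
    assert(!(nan == nan));                // The built-in comparison fails...
    assert(nan_friendly_equal(nan, nan)); // ...the NaN-friendly one does not.

    std::vector<__float128> v{nan, 3, 1, nan, 2};
    std::sort(v.begin(), v.end(), nan_friendly_less);
    assert(v[0] == 1 && v[1] == 2 && v[2] == 3);
    assert(is_nan(v[3]) && is_nan(v[4]));
}
```

Passing the built-in operator< to std::sort would be undefined behaviour when NaNs are present, because it does not define a strict weak ordering over such values.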

The real128 reference contains the detailed description of all the features provided by real128.

Footnotes

User-defined literal

New in version 0.19.

A user-defined literal is available to construct mppp::real128 instances. The literal is defined within the inline namespace mppp::literals, and it supports decimal and hexadecimal representations:

using namespace mppp::literals;

auto r1 = 123.456_rq;   // r1 contains the quadruple-precision
                        // approximation of 123.456 (that is,
                        // 123.455999999999999999999999999999998).

auto r2 = 4.2e1_rq;     // Scientific notation can be used.

auto r3 = 0x1.12p-1_rq; // Hexadecimal floats are supported too.
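For readers unfamiliar with the notations, the following sketch spells out the values denoted by the scientific and hexadecimal forms above. It uses standard double literals rather than _rq, so it does not require mp++; hexadecimal floating-point literals are assumed to be available (they are standard since C++17). The names from_hex, from_sci and literal_demo are made up for this example.

```cpp
#include <cassert>

// In a hexadecimal float the digits after the point are base-16 fractions
// and p introduces a power-of-two exponent:
// 0x1.12p-1 = (1 + 1/16 + 2/256) * 2^-1 = 0.53515625
constexpr double from_hex = 0x1.12p-1;
static_assert(from_hex == 0.53515625, "hexadecimal significand, binary exponent");

// Scientific notation: 4.2e1 = 4.2 * 10^1 = 42.
constexpr double from_sci = 4.2e1;
static_assert(from_sci == 42.0, "decimal exponent");

inline void literal_demo()
{
    assert(from_hex * 2 == 1.0703125); // Undoing the 2^-1 scaling.
}
```

Both values are exactly representable in binary, so the comparisons hold exactly; a decimal value such as 123.456 is not, which is why r1 above holds only an approximation.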