Computations in extended precision#

As hinted in the installation instructions, heyoka supports computations not only in single and double precision, but also in extended precision. Specifically, heyoka currently supports:

the 80-bit IEEE extended-precision format (~21 decimal digits),
the 128-bit IEEE quadruple-precision format (~36 decimal digits).

How these extended precision floating-point types can be accessed and used from C++ varies depending on the platform. The 80-bit extended-precision format is available as the C++ long double type on most platforms based on Intel x86 processors. Quadruple-precision computations are supported either via the long double type (e.g., on 64-bit Linux ARM) or via the the mppp::real128 type (provided that the platform supports the nonstandard __float128 floating-point type and that heyoka was compiled with support for the mp++ library - see the installation instructions).

A simple example#

We will be assuming here that long double implements the 80-bit extended-precision floating-point format. In order to verify that heyoka indeed is able to work in extended precision, we will be monitoring the evolution of the energy constant in a high-precision numerical integration of the simple pendulum.

Let us begin as usual with the definition of the dynamical equations and the creation of the integrator object:

    // Create the symbolic variables x and v.
    auto [x, v] = make_vars("x", "v");

    // Create the integrator object
    // in extended precision.
    auto ta = taylor_adaptive<long double>{// Definition of the ODE system:
                                           // x' = v
                                           // v' = -9.8 * sin(x)
                                           {prime(x) = v, prime(v) = -9.8 * sin(x)},
                                           // Initial conditions
                                           // for x and v.
                                           {-1., 0.}};

In order to activate extended precision, we created an integrator object of type taylor_adaptive<long double> - that is, we specified long double, instead of the usual double, as the (only) template parameter for the taylor_adaptive class template. Note that, for simplicity, we still used double-precision values for the initial state: these values are automatically converted to long double by the integrator’s constructor. Note also that, when operating in extended precision, all numerical values encapsulated in an integrator are represented in extended precision - this includes not only the state vector, but also the time coordinate, the tolerance, the Taylor coefficients, etc. Similarly to double-precision integrators, the default value of the tolerance is the machine epsilon of long double.

Next, we define a small helper function that will allow us to monitor the evolution of the energy constant throughout the integration:

    // Create a small helper to compute the energy constant
    // from the state vector.
    auto compute_energy = [](const auto &sv) {
        using std::cos;

        return (sv[1] * sv[1]) / 2 + 9.8 * (1 - cos(sv[0]));
    };

Before starting the integration, we compute and store the initial energy for later use:

    // Compute and store the intial energy.
    const auto orig_E = compute_energy(ta.get_state());

We can now begin a step-by-step integration. At the end of each step, we will be computing and printing to screen the relative energy error:

    // Integrate for a few timesteps.
    for (auto i = 0; i < 20; ++i) {
        using std::abs;

        ta.step();

        std::cout << "Relative energy error: " << abs((orig_E - compute_energy(ta.get_state())) / orig_E) << '\n';
    }

Relative energy error: 0
Relative energy error: 0
Relative energy error: 9.62658e-20
Relative energy error: 0
Relative energy error: 9.62658e-20
Relative energy error: 9.62658e-20
Relative energy error: 9.62658e-20
Relative energy error: 1.92532e-19
Relative energy error: 1.92532e-19
Relative energy error: 1.92532e-19
Relative energy error: 1.92532e-19
Relative energy error: 9.62658e-20
Relative energy error: 1.92532e-19
Relative energy error: 9.62658e-20
Relative energy error: 9.62658e-20
Relative energy error: 9.62658e-20
Relative energy error: 9.62658e-20
Relative energy error: 9.62658e-20
Relative energy error: 0
Relative energy error: 9.62658e-20

The console output indeed confirms that energy is conserved at the level of the epsilon of the 80-bit extended-precision format (that is, \(\sim 10^{-19}\)).

Performing quadruple-precision integrations via the mppp::real128 type follows exactly the same pattern. That is, one just needs to replace long double with mppp::real128 when instantiating the integrator object.

Other classes and functions#

Besides the adaptive integrator, several other classes and functions in heyoka can be used in extended precision.

The event classes, for instance, can be constructed in extended precision by passing long double or mppp::real128 as the template parameter (instead of double). Note that the precision of an event must match the precision of the integrator object in which the event is used, otherwise an error will be produced at compilation time.

Full code listing#

#include <iostream>

#include <heyoka/heyoka.hpp>

using namespace heyoka;

int main()
{

#if !defined(HEYOKA_ARCH_PPC)

    // Create the symbolic variables x and v.
    auto [x, v] = make_vars("x", "v");

    // Create the integrator object
    // in extended precision.
    auto ta = taylor_adaptive<long double>{// Definition of the ODE system:
                                           // x' = v
                                           // v' = -9.8 * sin(x)
                                           {prime(x) = v, prime(v) = -9.8 * sin(x)},
                                           // Initial conditions
                                           // for x and v.
                                           {-1., 0.}};

    // Create a small helper to compute the energy constant
    // from the state vector.
    auto compute_energy = [](const auto &sv) {
        using std::cos;

        return (sv[1] * sv[1]) / 2 + 9.8 * (1 - cos(sv[0]));
    };

    // Compute and store the intial energy.
    const auto orig_E = compute_energy(ta.get_state());

    // Integrate for a few timesteps.
    for (auto i = 0; i < 20; ++i) {
        using std::abs;

        ta.step();

        std::cout << "Relative energy error: " << abs((orig_E - compute_energy(ta.get_state())) / orig_E) << '\n';
    }

#endif
}

Computations in extended precision

Contents

Computations in extended precision#

A simple example#

Other classes and functions#

Full code listing#