Real basics

Preliminaries

Let us load the mp++ runtime, include a few headers and add a couple of using directives to reduce typing:

#pragma cling add_include_path("$CONDA_PREFIX/include")
#pragma cling add_library_path("$CONDA_PREFIX/lib")
#pragma cling load("mp++")

#include <mp++/real.hpp>
#include <mp++/integer.hpp>
#include <mp++/rational.hpp>

using namespace mppp::literals;
using real = mppp::real;
using real_kind = mppp::real_kind;

Let us also include a few bits from the standard library:

#include <complex>
#include <initializer_list>
#include <iomanip>
#include <iostream>
#include <stdexcept>
#include <utility>
#include <vector>

Construction

real is a floating-point class in which the number of significant digits (i.e., the precision) is a runtime property of the individual class instances, rather than a fixed compile-time property of the class. This means that each real object is constructed with its own precision value, measured in bits (i.e., base-2 digits). When working with real values it is thus very important to be aware of how to set and manipulate the precision and how the precision of a real propagates throughout mathematical computations.

A default-constructed real is initialised to zero:

std::cout << real{} << '\n';
0

The precision of a default-constructed real is set to the minimum allowed value:

std::cout << "A default-constructed real has a precision of " << real{}.get_prec() << " bits\n";
A default-constructed real has a precision of 2 bits

Construction from fundamental C++ types sets a precision sufficient to represent exactly any value of that type:

std::cout << "Precision when constructing from an int  : " << real{42}.get_prec() << '\n';
std::cout << "Precision when constructing from a long  : " << real{42l}.get_prec() << '\n';
std::cout << "Precision when constructing from a float : " << real{42.f}.get_prec() << '\n';
std::cout << "Precision when constructing from a double: " << real{42.}.get_prec() << '\n';
Precision when constructing from an int  : 32
Precision when constructing from a long  : 64
Precision when constructing from a float : 24
Precision when constructing from a double: 53

This ensures that construction from a fundamental C++ type preserves exactly the input value.
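The precisions printed above can be related to the bit widths reported by the standard library. The following is a minimal sketch under an assumption: that the deduced precision equals the significand width for floating-point types and the full bit width (sign bit included) for integral types. The helper exact_bits() is hypothetical and not part of mp++:

```cpp
#include <limits>
#include <type_traits>

// Hypothetical helper (not part of mp++): number of bits needed to
// represent exactly any value of the fundamental type T.
template <typename T>
constexpr int exact_bits()
{
    static_assert(std::is_arithmetic<T>::value, "T must be an arithmetic type");
    // numeric_limits::digits excludes the sign bit of signed integral types;
    // for floating-point types it is the width of the significand.
    return std::numeric_limits<T>::digits
           + ((std::is_integral<T>::value && std::is_signed<T>::value) ? 1 : 0);
}
```

On a typical 64-bit Linux platform, exact_bits<int>(), exact_bits<long>(), exact_bits<float>() and exact_bits<double>() evaluate to 32, 64, 24 and 53 respectively, which matches the output above. The value for long is platform-dependent.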

Similarly, when constructing from an mp++ integer the precision is set to the bit width of the input value, counted in whole limbs (64 bits each on the platform used here):

std::cout << "Precision when constructing from 42_z1         : "
          << std::setw(5) << real{42_z1}.get_prec() << '\n';
std::cout << "Precision when constructing from 42_z1 * 2**256: "
          << std::setw(5) << real{42_z1 << 256}.get_prec() << '\n';
Precision when constructing from 42_z1         :    64
Precision when constructing from 42_z1 * 2**256:   320

When constructing from a rational the precision is set to the total bit width of numerator and denominator:

std::cout << "Precision when constructing from 4/3_q1: " << real{4/3_q1}.get_prec() << '\n';
Precision when constructing from 4/3_q1: 128

Note that when constructing from a rational, the input value is preserved exactly only if the denominator is a power of 2:

std::cout << "4/3_q1  = " << real{4/3_q1}.to_string() << '\n';
std::cout << "5/16_q1 = " << real{5/16_q1}.to_string() << '\n';
4/3_q1  = 1.333333333333333333333333333333333333335
5/16_q1 = 3.125000000000000000000000000000000000000e-1
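The reason is that every finite binary floating-point value is an integer multiplied by a power of 2, so a fraction in lowest terms has a finite binary expansion only if its denominator is a power of 2. A minimal sketch of this check, using hypothetical helpers that are not part of mp++ and plain unsigned integers instead of mp++ rationals:

```cpp
// Hypothetical helper: greatest common divisor (Euclid's algorithm).
constexpr unsigned long long gcd_ull(unsigned long long a, unsigned long long b)
{
    return b == 0 ? a : gcd_ull(b, a % b);
}

// Hypothetical helper: true if n is 1, 2, 4, 8, ...
constexpr bool is_power_of_two(unsigned long long n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

// True if num/den (den != 0) can be written exactly with finitely many binary digits.
constexpr bool has_finite_binary_expansion(unsigned long long num, unsigned long long den)
{
    // Reduce the fraction to lowest terms before inspecting the denominator.
    return den != 0 && is_power_of_two(den / gcd_ull(num, den));
}
```

With these definitions, has_finite_binary_expansion(4, 3) is false and has_finite_binary_expansion(5, 16) is true, in line with the two values printed above. Even when the check succeeds, the chosen precision must still be large enough to hold all the digits.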

Automatic precision deduction can be overridden by explicitly specifying the desired precision upon construction:

std::cout << "The 10-bit approximation of 1.3 is: " << real{1.3, 10}.to_string() << '\n';
std::cout << "The 16-bit approximation of 4/3 is: " << real{4/3_q1, 16}.to_string() << '\n';
std::cout << "The 3-bit approximation of 123 is : " << real{123, 3}.to_string() << '\n';
The 10-bit approximation of 1.3 is: 1.3008
The 16-bit approximation of 4/3 is: 1.33334
The 3-bit approximation of 123 is : 1.3e+2
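To see where these approximations come from, the rounding to a given number of significant bits can be emulated with plain double values. This is only a sketch: round_to_bits() is a hypothetical helper, not an mp++ function, it is limited to the 53 bits of a double, and it assumes round-to-nearest is in effect:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of mp++): round x to 'bits' significant
// binary digits using the current rounding mode (round-to-nearest by default).
double round_to_bits(double x, int bits)
{
    assert(bits >= 1 && bits <= 53);
    if (x == 0 || !std::isfinite(x)) {
        return x;
    }
    int exp = 0;
    // Decompose x = m * 2**exp, with 0.5 <= |m| < 1.
    const double m = std::frexp(x, &exp);
    // Scale the significand so that exactly 'bits' digits lie left of the binary point, then round.
    const double scaled = std::nearbyint(std::ldexp(m, bits));
    // Undo the scaling.
    return std::ldexp(scaled, exp - bits);
}
```

For instance, round_to_bits(1.3, 10) gives 1.30078125 and round_to_bits(123., 3) gives 128, consistent with the values 1.3008 and 1.3e+2 printed above.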

Like every mp++ class, real can be constructed from string-like objects. Construction from a string always requires an explicit precision value:

std::cout << "The 53-bit approximation of 1.1 is      : "
          << real{"1.1", 53}.to_string() << '\n';
std::cout << "The 113-bit approximation of 1.1 is     : "
          << real{"1.1", 113}.to_string() << '\n';
std::cout << "The 256-bit approximation of 1.1 is     : "
          << real{"1.1", 256}.to_string() << '\n';
std::cout << "The 256-bit approximation of infinity is: "
          << real{"inf", 256}.to_string() << '\n';
The 53-bit approximation of 1.1 is      : 1.1000000000000001
The 113-bit approximation of 1.1 is     : 1.10000000000000000000000000000000008
The 256-bit approximation of 1.1 is     : 1.100000000000000000000000000000000000000000000000000000000000000000000000000003
The 256-bit approximation of infinity is: inf

Construction from string representations in other bases is supported too:

std::cout << "In base 4 1.1 is, to a precision of 53 bits : "
          << real{"1.1", 4, 53}.to_string() << '\n';
std::cout << "In base 17 1.1 is, to a precision of 53 bits: "
          << real{"1.1", 17, 53}.to_string() << '\n';
std::cout << "In base 59 1.1 is, to a precision of 53 bits: "
          << real{"1.1", 59, 53}.to_string() << '\n';
In base 4 1.1 is, to a precision of 53 bits : 1.2500000000000000
In base 17 1.1 is, to a precision of 53 bits: 1.0588235294117647
In base 59 1.1 is, to a precision of 53 bits: 1.0169491525423728
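In base b the string "1.1" stands for 1 + 1/b, which explains the three values above. As a rough illustration with double values, the positional evaluation can be sketched as follows. The function parse_in_base() is hypothetical, it is not how mp++ parses strings, and for simplicity it accepts only the digit characters '0' to '9' and an optional '.':

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: value of a digit string such as "1.1" read in the given base.
// Only the characters '0'-'9' and '.' are supported in this sketch.
double parse_in_base(const std::string &s, int base)
{
    assert(base >= 2);
    double value = 0;
    double weight = 1;
    bool fractional = false;
    for (char c : s) {
        if (c == '.') {
            fractional = true;
            continue;
        }
        const int digit = c - '0';
        assert(digit >= 0 && digit <= 9 && digit < base);
        if (fractional) {
            // Each digit after the point weighs base**-1, base**-2, ...
            weight /= base;
            value += digit * weight;
        } else {
            // Horner's scheme for the integral part.
            value = value * base + digit;
        }
    }
    return value;
}
```

Here parse_in_base("1.1", 4) yields exactly 1.25, while bases 17 and 59 yield double approximations of 18/17 and 60/59.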

The copy and move constructors of real can be called with a new precision as an optional argument. If the new precision is smaller than the original one, rounding may occur; if it is larger, the value is preserved exactly:

std::cout << "The 23-bit approximation of 1.1 is: "
          << real{real{1.1}, 23}.to_string() << '\n';
std::cout << "Extending the 10-bit approximation of 1.1 to 20 bits yields: "
          << real{real{1.1, 10}, 20}.to_string() << '\n';
The 23-bit approximation of 1.1 is: 1.0999999
Extending the 10-bit approximation of 1.1 to 20 bits yields: 1.0996094

Move construction leaves the moved-from real object in an invalid state. The only valid operations on a moved-from real are:

  • destruction,

  • the invocation of the is_valid() member function,

  • copy/move assignment.

A moved-from real can be revived through re-assignment:

{
    real r1 = 42;
    
    // Move-construct r2 from r1.
    real r2{std::move(r1)};
    
    std::cout << "After move, r1 is " << (r1.is_valid() ? "valid" : "invalid") << '\n';
    
    // Revive r1 via assignment.
    r1 = r2;
    
    std::cout << "After re-assignment, r1 is " << (r1.is_valid() ? "valid" : "invalid") << '\n';
}
After move, r1 is invalid
After re-assignment, r1 is valid

The real constructors from real-valued types are implicit:

{
    real r0 = 5;
    real r1 = 6.f;
    real r2 = 1.23l;
    
    std::vector<real> v = {1, 2, 3};
}

The constructors from complex-valued types however are explicit:

{
    // real r0 = std::complex<double>{10, 0}; <-- Won't compile.
    
    // This works.
    real r0{std::complex<double>{10, 0}};
    
    try {
        real r1{std::complex<double>{10, 1}};
    } catch (const std::domain_error &de) {
        std::cerr << "Construction from complex values with nonzero imaginary part is not possible:\n"
                  << de.what() << '\n';
    }
}
Construction from complex values with nonzero imaginary part is not possible:
Cannot construct a real from a complex C++ value with a non-zero imaginary part of 1.000000

real also features a couple of specialised constructors, such as a constructor from an integral multiple of a power of 2 (i.e., \(n \times 2^e\)):

std::cout << "2 * 2**123 with a precision of 100 bits is: " << real{2l, 123, 100}.to_string() << '\n';
2 * 2**123 with a precision of 100 bits is: 2.1267647932558653966460912964486e+37

And a constructor from the special values \(\pm 0\), \(\pm \infty\) and NaN:

std::cout << "Constructing a negative zero (53 bits)    : "
          << real{real_kind::zero, -1, 53}.to_string() << '\n';
std::cout << "Constructing a positive infinity (53 bits): "
          << real{real_kind::inf, 53}.to_string() << '\n';
std::cout << "Constructing a NaN (53 bits)              : "
          << real{real_kind::nan, 53}.to_string() << '\n';
Constructing a negative zero (53 bits)    : -0.0000000000000000
Constructing a positive infinity (53 bits): inf
Constructing a NaN (53 bits)              : nan

Getting and setting the precision

The precision of a real is not fixed after construction, and it can be altered via the set_prec() and prec_round() member functions (or their free-function counterparts).

set_prec() is destructive: in addition to changing the precision of a real, it also resets its value to NaN:

{
    real r0 = 123;
    
    std::cout << "The initial precision of r0 is " << r0.get_prec()
              << ", the initial value is " << r0 << '\n';

    // Destructively change the precision.
    r0.set_prec(64);

    std::cout << "The new precision of r0 is " << r0.get_prec()
              << ", the new value is " << r0 << '\n';
}
The initial precision of r0 is 32, the initial value is 123
The new precision of r0 is 64, the new value is nan

In contrast, prec_round() will either preserve exactly the original value (if the new precision is higher than the old one) or it will perform a rounding operation (if the new precision is lower than the old one):

{
    real r0 = 123;
    
    std::cout << "The initial precision of r0 is " << r0.get_prec()
              << ", the initial value is " << r0 << '\n';

    // Change the precision preserving the original value.
    r0.prec_round(256);

    std::cout << "The new precision of r0 is " << r0.get_prec()
              << ", the new value is " << r0 << '\n';

    // Lower the precision.
    r0.prec_round(4);

    std::cout << "The new precision of r0 is " << r0.get_prec()
              << ", the new value is " << r0 << '\n';
}
The initial precision of r0 is 32, the initial value is 123
The new precision of r0 is 256, the new value is 123
The new precision of r0 is 4, the new value is 120

Assignment

The assignment operators behave exactly like the corresponding constructors: after assignment, the precision of the real object matches the deduced precision of the assignment argument:

{
    real r0;
    
    std::cout << "The initial precision of r0 is: " << r0.get_prec() << '\n';
    
    // Assign an int.
    r0 = 42;
    
    std::cout << "After assignment from an int, the precision of r0 is  : "
              << r0.get_prec() << '\n';
    
    // Assign a double.
    r0 = 1.2345;
    
    std::cout << "After assignment from a double, the precision of r0 is: "
              << r0.get_prec() << '\n';
}
The initial precision of r0 is: 2
After assignment from an int, the precision of r0 is  : 32
After assignment from a double, the precision of r0 is: 53

In order to override the automatic deduction behaviour, the set() family of member functions (and their free-function counterparts) can be used:

{
    real r0{real_kind::zero, 12};
    
    std::cout << "r0 has been created with a precision of " << r0.get_prec() << '\n';
 
    // Set to an int.
    r0.set(42);
    
    std::cout << "r0 is now " << r0.to_string() << ", the precision is still " << r0.get_prec() << '\n';

    // Set to a double.
    r0.set(1.1);
    
    std::cout << "r0 is now " << r0.to_string() << ", the precision is still " << r0.get_prec() << '\n';
}
r0 has been created with a precision of 12
r0 is now 4.2000e+1, the precision is still 12
r0 is now 1.1001, the precision is still 12

The set() functions also support setting from string-like entities:

{
    real r0{real_kind::zero, 12};
    
    std::cout << "r0 has been created with a precision of " << r0.get_prec() << '\n';
 
    // Set to a string.
    r0.set("1.3");
    
    std::cout << "r0 is now " << r0.to_string() << ", the precision is still " << r0.get_prec() << '\n';
}
r0 has been created with a precision of 12
r0 is now 1.2998, the precision is still 12

Specialised setters are also available:

{
    real r0{real_kind::zero, 12};
    
    std::cout << "r0 has been created with a precision of " << r0.get_prec() << '\n';
    
    // Set to nan, inf.
    r0.set_nan();
    std::cout << "r0 has been set to " << r0 << " with a precision of " << r0.get_prec() << '\n';
    r0.set_inf(-1);
    std::cout << "r0 has been set to " << r0 << " with a precision of " << r0.get_prec() << '\n';
    
    // Set to 42*2**-128.
    set_ui_2exp(r0, 42u, -128);
    std::cout << "r0 has been set to " << r0 << " with a precision of " << r0.get_prec() << '\n';
}
r0 has been created with a precision of 12
r0 has been set to nan with a precision of 12
r0 has been set to -inf with a precision of 12
r0 has been set to 1.23427e-37 with a precision of 12

Conversion

real can be converted to both fundamental C++ types and other mp++ classes:

{
    // Conversion to integral types truncates.
    std::cout << "1.1 converted to int is        : " << static_cast<int>(real{"1.1", 32}) << '\n';
    std::cout << "-2.1 converted to integer<1> is: " << static_cast<mppp::integer<1>>(real{"-2.1", 32}) << "\n\n";
    
    // Conversion to rational is exact.
    std::cout << "The 32-bit approximation of 1.1 is exactly: " << static_cast<mppp::rational<1>>(real{"1.1", 32}) << "\n\n";
    
    std::cout << "Extending the 12-bit approximation of 1.1 to 'double' yields: " << static_cast<double>(real{"1.1", 12}) << '\n';
}
1.1 converted to int is        : 1
-2.1 converted to integer<1> is: -2

The 32-bit approximation of 1.1 is exactly: 2362232013/2147483648

Extending the 12-bit approximation of 1.1 to 'double' yields: 1.1001

Note that, as usual, the conversion operator of real is explicit.
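The conversion to rational can be exact because a finite binary floating-point value is always an integer divided by a power of 2. The sketch below reproduces the fraction printed above with plain double arithmetic. The function to_fraction() is a hypothetical helper, not part of mp++, and it assumes a positive input, at most 53 bits and round-to-nearest:

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Hypothetical helper: round x > 0 to 'bits' significant binary digits and
// return the result as a reduced fraction (numerator, denominator).
std::pair<long long, long long> to_fraction(double x, int bits)
{
    assert(std::isfinite(x) && x > 0 && bits >= 1 && bits <= 53);
    int exp = 0;
    // Decompose x = m * 2**exp, with 0.5 <= m < 1.
    const double m = std::frexp(x, &exp);
    // The rounded significand, as an integer with at most 'bits' bits (plus a possible carry).
    long long num = std::llrint(std::ldexp(m, bits));
    // The value is num / 2**(bits - exp); this sketch only handles small nonnegative shifts.
    const int shift = bits - exp;
    assert(shift >= 0 && shift <= 62);
    long long den = 1LL << shift;
    // Reduce to lowest terms by removing common factors of 2.
    while (num % 2 == 0 && den > 1) {
        num /= 2;
        den /= 2;
    }
    return {num, den};
}
```

Calling to_fraction(1.1, 32) returns the pair (2362232013, 2147483648), the same fraction obtained above from the 32-bit real.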

The conversion can fail in some cases:

try {
    static_cast<int>(real{"inf", 32});
} catch (const std::domain_error &de) {
    std::cerr << de.what() << '\n';
}
Cannot convert a non-finite real to a C++ signed integral type

try {
    static_cast<int>(real{2l, 123, 100});
} catch (const std::overflow_error &oe) {
    std::cerr << oe.what() << '\n';
}
Conversion of the real 2.1267647932558653966460912964486e+37 to the type 'int' results in overflow

If exceptions are to be avoided, the non-throwing get() family of functions can be used instead of the conversion operator.

Like the other mp++ classes, real is contextually convertible to bool: zero converts to false and any other value converts to true. Note that this includes NaN:

if (real{123}) {
    std::cout << "123 is true\n";
}

if (!real{0}) {
    std::cout << "0 is false\n";
}

std::cout << "NaN is " << (real{"nan", 32} ? "true" : "false") << '\n';
123 is true
0 is false
NaN is true
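This mirrors the behaviour of the fundamental floating-point types, for which only values comparing equal to zero convert to false. A minimal illustration with double, where is_truthy() is a hypothetical helper introduced only for this example:

```cpp
#include <limits>

// Hypothetical helper: the result of converting a double to bool.
constexpr bool is_truthy(double x)
{
    return static_cast<bool>(x);
}

// A quiet NaN, for use in the checks below.
constexpr double double_nan = std::numeric_limits<double>::quiet_NaN();
```

With these definitions, is_truthy(123.) and is_truthy(double_nan) are true, whereas is_truthy(0.) is false.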

User-defined literals

The real class provides a few user-defined literals. The literals, as usual, are defined in the mppp::literals inline namespace, and they support decimal and hexadecimal representations for a few predefined precision values:

std::cout << "The 128-bit approximation of '123.456' is  : " << (123.456_r128).to_string() << '\n';
std::cout << "The 256-bit approximation of '42' is       : " << (4.2e1_r256).to_string() << '\n';
// Hexadecimal notation is supported too.
std::cout << "The 512-bit approximation of '0x1.12p-1' is: " << (0x1.12p-1_r512).to_string() << '\n';
The 128-bit approximation of '123.456' is  : 1.234559999999999999999999999999999999999e+2
The 256-bit approximation of '42' is       : 4.200000000000000000000000000000000000000000000000000000000000000000000000000000e+1
The 512-bit approximation of '0x1.12p-1' is: 5.35156250000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000e-1
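In the hexadecimal notation the significand is written in base 16 and the exponent after 'p' is a power of 2, so 0x1.12p-1 stands for (1 + 1/16 + 2/256) / 2 = 0.53515625. Since this value has a finite binary expansion, it is represented exactly, unlike 123.456. The same notation is understood by the standard library, as the following sketch shows. The function parse_hex_float() is a hypothetical wrapper around std::strtod() and is unrelated to the mp++ literals:

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: parse a decimal or hexadecimal floating-point string into a double.
double parse_hex_float(const char *s)
{
    assert(s != nullptr);
    char *end = nullptr;
    const double value = std::strtod(s, &end);
    // Make sure that at least one character was consumed.
    assert(end != s);
    return value;
}
```

Here parse_hex_float("0x1.12p-1") returns 0.53515625, in agreement with the last line of the output above.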