Real basics
Preliminaries
Let us load the mp++ runtime, include a few headers and add a using directive and a couple of type aliases to reduce typing:
#pragma cling add_include_path("$CONDA_PREFIX/include")
#pragma cling add_library_path("$CONDA_PREFIX/lib")
#pragma cling load("mp++")
#include <mp++/real.hpp>
#include <mp++/integer.hpp>
#include <mp++/rational.hpp>
using namespace mppp::literals;
using real = mppp::real;
using real_kind = mppp::real_kind;
Let us also include a few bits from the standard library:
#include <complex>
#include <initializer_list>
#include <iomanip>
#include <iostream>
#include <stdexcept>
#include <utility>
#include <vector>
Construction
real is a floating-point class in which the number of significant digits (i.e., the precision) is a runtime property of the individual class instances, rather than a fixed compile-time property of the class. This means that each real object is constructed with its own precision value, measured in bits (i.e., base-2 digits). When working with real values it is thus very important to be aware of how to set and manipulate the precision and how the precision of a real propagates throughout mathematical computations.
A default-constructed real is initialised to zero:
std::cout << real{} << '\n';
0
The precision of a default-constructed real is set to the minimum allowed value:
std::cout << "A default-constructed real has a precision of " << real{}.get_prec() << " bits\n";
A default-constructed real has a precision of 2 bits
Construction from fundamental C++ types sets a precision sufficient to represent exactly any value of that type:
std::cout << "Precision when constructing from an int : " << real{42}.get_prec() << '\n';
std::cout << "Precision when constructing from a long : " << real{42l}.get_prec() << '\n';
std::cout << "Precision when constructing from a float : " << real{42.f}.get_prec() << '\n';
std::cout << "Precision when constructing from a double: " << real{42.}.get_prec() << '\n';
Precision when constructing from an int : 32
Precision when constructing from a long : 64
Precision when constructing from a float : 24
Precision when constructing from a double: 53
This ensures that construction from a fundamental C++ type preserves exactly the input value.
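To see where these four numbers could come from, here is a minimal sketch that uses only the standard library. The helper exact_bits() is hypothetical and is not part of mp++; it merely counts the bits needed to hold every value of a fundamental type, which is an assumption about how such a precision might be deduced rather than a description of the mp++ internals.

```cpp
#include <limits>
#include <type_traits>

// Hypothetical helper (not part of mp++): number of bits needed to hold
// every value of the arithmetic type T exactly.
template <typename T>
constexpr int exact_bits()
{
    static_assert(std::is_arithmetic<T>::value, "T must be an arithmetic type");
    // For floating-point types, digits is the width of the significand.
    // For integral types, digits excludes the sign bit, so add it back.
    return std::numeric_limits<T>::digits
           + ((std::is_integral<T>::value && std::is_signed<T>::value) ? 1 : 0);
}
```

On a typical 64-bit Linux system, exact_bits<int>(), exact_bits<long>(), exact_bits<float>() and exact_bits<double>() evaluate to 32, 64, 24 and 53, in line with the output above.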
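Similarly, when constructing from an integer, the precision is set to the bit width of the input value. Judging by the output below, this width is counted in multiples of 64 bits rather than in significant bits (42 has only 6 significant bits):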
std::cout << "Precision when constructing from 42_z1 : "
<< std::setw(5) << real{42_z1}.get_prec() << '\n';
std::cout << "Precision when constructing from 42_z1 * 2**256: "
<< std::setw(5) << real{42_z1 << 256}.get_prec() << '\n';
Precision when constructing from 42_z1 : 64
Precision when constructing from 42_z1 * 2**256: 320
When constructing from a rational, the precision is set to the total bit width of numerator and denominator:
std::cout << "Precision when constructing from 4/3_q1: " << real{4/3_q1}.get_prec() << '\n';
Precision when constructing from 4/3_q1: 128
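The values 64, 320 and 128 printed above are all multiples of 64, even though 42 has only 6 significant bits. A minimal sketch of the arithmetic, assuming that widths are rounded up to whole 64-bit limbs, follows. Both limb_bits and limb_rounded_bits() are hypothetical names, and the limb size is inferred from the printed values rather than taken from the mp++ sources.

```cpp
#include <cstddef>

// Assumed limb width, inferred from the 64/320/128 values printed above.
constexpr std::size_t limb_bits = 64;

// Hypothetical helper: round a count of significant bits up to whole limbs.
// Assumes nbits > 0.
constexpr std::size_t limb_rounded_bits(std::size_t nbits)
{
    return (nbits + limb_bits - 1) / limb_bits * limb_bits;
}
```

Under this assumption, 42 (6 bits) maps to 64, 42 * 2**256 (262 bits) maps to 320, and 4/3 maps to limb_rounded_bits(3) + limb_rounded_bits(2) = 128.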
Note that when constructing from a rational, the input value is preserved exactly only if the denominator is a power of 2:
std::cout << "4/3_q1 = " << real{4/3_q1}.to_string() << '\n';
std::cout << "5/16_q1 = " << real{5/16_q1}.to_string() << '\n';
4/3_q1 = 1.333333333333333333333333333333333333335
5/16_q1 = 3.125000000000000000000000000000000000000e-1
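The power-of-2 condition can be checked ahead of time. The sketch below uses plain unsigned integers instead of mp++ types, and the helpers gcd_ull() and is_dyadic() are hypothetical. It only tests the denominator, so it assumes that the precision is large enough to hold the numerator.

```cpp
// Hypothetical helpers (not part of mp++).
inline unsigned long long gcd_ull(unsigned long long a, unsigned long long b)
{
    return b == 0 ? a : gcd_ull(b, a % b);
}

// True if num/den, reduced to lowest terms, has a power-of-2 denominator.
// den must be nonzero.
inline bool is_dyadic(unsigned long long num, unsigned long long den)
{
    const unsigned long long reduced_den = den / gcd_ull(num, den);
    // A power of 2 has a single bit set.
    return (reduced_den & (reduced_den - 1)) == 0;
}
```

Here is_dyadic(5, 16) is true and is_dyadic(4, 3) is false, which matches the exact and inexact results shown above.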
Automatic precision deduction can be overridden by explicitly specifying the desired precision upon construction:
std::cout << "The 10-bit approximation of 1.3 is: " << real{1.3, 10}.to_string() << '\n';
std::cout << "The 16-bit approximation of 4/3 is: " << real{4/3_q1, 16}.to_string() << '\n';
std::cout << "The 3-bit approximation of 123 is : " << real{123, 3}.to_string() << '\n';
The 10-bit approximation of 1.3 is: 1.3008
The 16-bit approximation of 4/3 is: 1.33334
The 3-bit approximation of 123 is : 1.3e+2
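The rounding behind these approximations can be reproduced with the standard library alone. The helper round_to_bits() below is hypothetical and works on double rather than real, so it is only a sketch of rounding to p significant bits and is limited to precisions of at most 53 bits.

```cpp
#include <cmath>

// Hypothetical helper (not part of mp++): round x to p significant bits,
// using the current rounding mode (round-to-nearest by default).
// Assumes 1 <= p <= 53 and a finite x.
inline double round_to_bits(double x, int p)
{
    int e = 0;
    // x == m * 2**e with 0.5 <= |m| < 1 (or m == 0).
    const double m = std::frexp(x, &e);
    // Keep p bits of the significand, then scale back.
    return std::ldexp(std::nearbyint(std::ldexp(m, p)), e - p);
}
```

For instance, round_to_bits(1.3, 10) is 1.30078125 and round_to_bits(123.0, 3) is 128, which print as 1.3008 and 1.3e+2 at the number of digits used above.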
Like every mp++ class, real can be constructed from string-like objects. Construction from a string always requires an explicit precision value:
std::cout << "The 53-bit approximation of 1.1 is : "
<< real{"1.1", 53}.to_string() << '\n';
std::cout << "The 113-bit approximation of 1.1 is : "
<< real{"1.1", 113}.to_string() << '\n';
std::cout << "The 256-bit approximation of 1.1 is : "
<< real{"1.1", 256}.to_string() << '\n';
std::cout << "The 256-bit approximation of infinity is: "
<< real{"inf", 256}.to_string() << '\n';
The 53-bit approximation of 1.1 is : 1.1000000000000001
The 113-bit approximation of 1.1 is : 1.10000000000000000000000000000000008
The 256-bit approximation of 1.1 is : 1.100000000000000000000000000000000000000000000000000000000000000000000000000003
The 256-bit approximation of infinity is: inf
Construction from string representations in other bases is supported too:
std::cout << "In base 4 1.1 is, to a precision of 53 bits : "
<< real{"1.1", 4, 53}.to_string() << '\n';
std::cout << "In base 17 1.1 is, to a precision of 53 bits: "
<< real{"1.1", 17, 53}.to_string() << '\n';
std::cout << "In base 59 1.1 is, to a precision of 53 bits: "
<< real{"1.1", 59, 53}.to_string() << '\n';
In base 4 1.1 is, to a precision of 53 bits : 1.2500000000000000
In base 17 1.1 is, to a precision of 53 bits: 1.0588235294117647
In base 59 1.1 is, to a precision of 53 bits: 1.0169491525423728
The copy and move constructors of real can be called with a new precision as an optional argument. If the new precision is smaller than the original one, rounding may occur:
std::cout << "The 23-bit approximation of 1.1 is: "
<< real{real{1.1}, 23}.to_string() << '\n';
std::cout << "Extending the 10-bit approximation of 1.1 to 20 bits yields: "
<< real{real{1.1, 10}, 20}.to_string() << '\n';
The 23-bit approximation of 1.1 is: 1.0999999
Extending the 10-bit approximation of 1.1 to 20 bits yields: 1.0996094
Move construction will leave the moved-from real object in an invalid state. After move-construction, the only valid operations on a moved-from real are:
destruction,
the invocation of the is_valid() member function,
copy/move assignment.
A moved-from real can be revived through re-assignment:
{
real r1 = 42;
// Move-construct r2 via r1.
real r2{std::move(r1)};
std::cout << "After move, r1 is " << (r1.is_valid() ? "valid" : "invalid") << '\n';
// Revive r1 via assignment.
r1 = r2;
std::cout << "After re-assignment, r1 is " << (r1.is_valid() ? "valid" : "invalid") << '\n';
}
After move, r1 is invalid
After re-assignment, r1 is valid
The real constructors from real-valued types are implicit:
{
real r0 = 5;
real r1 = 6.f;
real r2 = 1.23l;
std::vector<real> v = {1, 2, 3};
}
The constructors from complex-valued types, however, are explicit:
{
// real r0 = std::complex<double>{10, 0}; <-- Won't compile.
// This works.
real r0{std::complex<double>{10, 0}};
try {
real r1{std::complex<double>{10, 1}};
} catch (const std::domain_error &de) {
std::cerr << "Construction from complex values with nonzero imaginary part is not possible:\n"
<< de.what() << '\n';
}
}
Construction from complex values with nonzero imaginary part is not possible:
Cannot construct a real from a complex C++ value with a non-zero imaginary part of 1.000000
real also features a couple of specialised constructors, such as a constructor from an integral multiple of a power of 2 (here 2 * 2**123, i.e., 2**124):
std::cout << "2 * 2**123 with a precision of 100 bits is: " << real{2l, 123, 100}.to_string() << '\n';
2 * 2**123 with a precision of 100 bits is: 2.1267647932558653966460912964486e+37
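The printed value, about 2.13e+37, equals 2 * 2**123 = 2**124, since 2**123 alone is about 1.06e+37. A quick cross-check with the standard library is sketched below. The helper times_pow2() is hypothetical and uses double, which is enough here because the result has a single significant bit.

```cpp
#include <cmath>

// Hypothetical helper (not part of mp++): n * 2**e as a double.
// The result is exact as long as n fits in 53 bits and no overflow occurs.
inline double times_pow2(long n, int e)
{
    return std::ldexp(static_cast<double>(n), e);
}
```

With this helper, times_pow2(2, 123) compares equal to times_pow2(1, 124).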
And a constructor from the special values \(\pm 0\), \(\pm \infty\) and NaN:
std::cout << "Constructing a negative zero (53 bits) : "
<< real{real_kind::zero, -1, 53}.to_string() << '\n';
std::cout << "Constructing a positive infinity (53 bits): "
<< real{real_kind::inf, 53}.to_string() << '\n';
std::cout << "Constructing a NaN (53 bits) : "
<< real{real_kind::nan, 53}.to_string() << '\n';
Constructing a negative zero (53 bits) : -0.0000000000000000
Constructing a positive infinity (53 bits): inf
Constructing a NaN (53 bits) : nan
Getting and setting the precision
The precision of a real is not fixed after construction, and it can be altered via the set_prec() and prec_round() member functions (or their free-function counterparts).
set_prec() is destructive: in addition to changing the precision of a real, it will also reset its value to NaN:
{
real r0 = 123;
std::cout << "The initial precision of r0 is " << r0.get_prec()
<< ", the initial value is " << r0 << '\n';
// Destructively change the precision.
r0.set_prec(64);
std::cout << "The new precision of r0 is " << r0.get_prec()
<< ", the new value is " << r0 << '\n';
}
The initial precision of r0 is 32, the initial value is 123
The new precision of r0 is 64, the new value is nan
In contrast, prec_round() will either preserve the original value exactly (if the new precision is higher than the old one) or round it (if the new precision is lower than the old one):
{
real r0 = 123;
std::cout << "The initial precision of r0 is " << r0.get_prec()
<< ", the initial value is " << r0 << '\n';
// Change the precision preserving the original value.
r0.prec_round(256);
std::cout << "The new precision of r0 is " << r0.get_prec()
<< ", the new value is " << r0 << '\n';
// Lower the precision.
r0.prec_round(4);
std::cout << "The new precision of r0 is " << r0.get_prec()
<< ", the new value is " << r0 << '\n';
}
The initial precision of r0 is 32, the initial value is 123
The new precision of r0 is 256, the new value is 123
The new precision of r0 is 4, the new value is 120
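The built-in floating-point types offer a familiar analogy: widening a float (24-bit significand) to a double (53 bits) never changes the value, whereas narrowing a double to a float may round it. The helper fits_in_float() below is hypothetical and uses only built-in types, so it illustrates the principle rather than the mp++ API.

```cpp
// Hypothetical helper: does x survive a round trip through float?
// Assumes x is finite and within the range of float.
inline bool fits_in_float(double x)
{
    // Narrowing to 24 bits may round; widening back to 53 bits is exact.
    return static_cast<double>(static_cast<float>(x)) == x;
}
```

For example, fits_in_float(123.0) is true because 123 needs only 7 significant bits, while fits_in_float(1.1) is false because the 53-bit approximation of 1.1 does not fit in 24 bits.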
Assignment
The assignment operators behave exactly like the corresponding constructors. After assignment, the precision of the real object will match the deduced precision of the assignment argument:
{
real r0;
std::cout << "The initial precision of r0 is: " << r0.get_prec() << '\n';
// Assign an int.
r0 = 42;
std::cout << "After assignment from an int, the precision of r0 is : "
<< r0.get_prec() << '\n';
// Assign a double.
r0 = 1.2345;
std::cout << "After assignment from a double, the precision of r0 is: "
<< r0.get_prec() << '\n';
}
The initial precision of r0 is: 2
After assignment from an int, the precision of r0 is : 32
After assignment from a double, the precision of r0 is: 53
In order to override the automatic deduction behaviour, the set() family of member functions (and their free-function counterparts) can be used:
{
real r0{real_kind::zero, 12};
std::cout << "r0 has been created with a precision of " << r0.get_prec() << '\n';
// Set to an int.
r0.set(42);
std::cout << "r0 is now " << r0.to_string() << ", the precision is still " << r0.get_prec() << '\n';
// Set to a double.
r0.set(1.1);
std::cout << "r0 is now " << r0.to_string() << ", the precision is still " << r0.get_prec() << '\n';
}
r0 has been created with a precision of 12
r0 is now 4.2000e+1, the precision is still 12
r0 is now 1.1001, the precision is still 12
The set() functions also support setting from string-like entities:
{
real r0{real_kind::zero, 12};
std::cout << "r0 has been created with a precision of " << r0.get_prec() << '\n';
// Set to a string.
r0.set("1.3");
std::cout << "r0 is now " << r0.to_string() << ", the precision is still " << r0.get_prec() << '\n';
}
r0 has been created with a precision of 12
r0 is now 1.2998, the precision is still 12
Specialised setters are also available:
{
real r0{real_kind::zero, 12};
std::cout << "r0 has been created with a precision of " << r0.get_prec() << '\n';
// Set to nan, inf.
r0.set_nan();
std::cout << "r0 has been set to " << r0 << " with a precision of " << r0.get_prec() << '\n';
r0.set_inf(-1);
std::cout << "r0 has been set to " << r0 << " with a precision of " << r0.get_prec() << '\n';
// Set to 42*2**-128.
set_ui_2exp(r0, 42u, -128);
std::cout << "r0 has been set to " << r0 << " with a precision of " << r0.get_prec() << '\n';
}
r0 has been created with a precision of 12
r0 has been set to nan with a precision of 12
r0 has been set to -inf with a precision of 12
r0 has been set to 1.23427e-37 with a precision of 12
Conversion
real can be converted to both fundamental C++ types and other mp++ classes:
{
// Conversion to integral types truncates.
std::cout << "1.1 converted to int is : " << static_cast<int>(real{"1.1", 32}) << '\n';
std::cout << "-2.1 converted to integer<1> is: " << static_cast<mppp::integer<1>>(real{"-2.1", 32}) << "\n\n";
// Conversion to rational is exact.
std::cout << "The 32-bit approximation of 1.1 is exactly: " << static_cast<mppp::rational<1>>(real{"1.1", 32}) << "\n\n";
std::cout << "Extending the 12-bit approximation of 1.1 to 'double' yields: " << static_cast<double>(real{"1.1", 12}) << '\n';
}
1.1 converted to int is : 1
-2.1 converted to integer<1> is: -2
The 32-bit approximation of 1.1 is exactly: 2362232013/2147483648
Extending the 12-bit approximation of 1.1 to 'double' yields: 1.1001
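The rational printed above can be reproduced by hand: a p-bit binary approximation is an integer significand divided by a power of 2. The sketch below does this with double and long long. The helper dyadic_fraction() is hypothetical, and its stated limits on p and on the magnitude of x are assumptions made to keep the arithmetic within 64-bit integers.

```cpp
#include <cmath>
#include <utility>

// Hypothetical helper (not part of mp++): the p-bit approximation of x as
// an exact fraction {numerator, denominator} in lowest terms.
// Assumes 1 <= p <= 53, a finite x, and 0 <= p - e <= 62, where e is the
// binary exponent reported by frexp.
inline std::pair<long long, long long> dyadic_fraction(double x, int p)
{
    int e = 0;
    const double m = std::frexp(x, &e);
    // Integer significand with p bits, rounded in the current rounding mode.
    long long num = std::llrint(std::ldexp(m, p));
    if (num == 0) {
        return {0, 1};
    }
    int shift = p - e; // The denominator is 2**shift.
    // Reduce to lowest terms by cancelling common factors of 2.
    while (shift > 0 && num % 2 == 0) {
        num /= 2;
        --shift;
    }
    return {num, 1LL << shift};
}
```

With x = 1.1 and p = 32 this yields 2362232013/2147483648, the same fraction as in the output above.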
Note that, as usual, the conversion operator of real is explicit.
The conversion can fail in some cases:
try {
static_cast<int>(real{"inf", 32});
} catch (const std::domain_error &de) {
std::cerr << de.what() << '\n';
}
Cannot convert a non-finite real to a C++ signed integral type
try {
static_cast<int>(real{2l, 123, 100});
} catch (const std::overflow_error &oe) {
std::cerr << oe.what() << '\n';
}
Conversion of the real 2.1267647932558653966460912964486e+37 to the type 'int' results in overflow
If exceptions are to be avoided, the non-throwing get() family of functions can be used instead of the conversion operator.
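The general pattern of a non-throwing conversion, which reports failure through its return value, can be sketched with built-in types. The function checked_to_int() below is a hypothetical stand-in that operates on double. It is not the mp++ get() API, whose exact signature should be looked up in the mp++ reference.

```cpp
#include <cmath>
#include <limits>

// Hypothetical stand-in (not the mp++ get() API): on success, write the
// truncated value of x to out and return true. Return false, leaving out
// untouched, if x is non-finite or does not fit in an int.
inline bool checked_to_int(double x, int &out)
{
    if (!std::isfinite(x)) {
        return false;
    }
    // Truncate towards zero, like the conversions shown above.
    const double t = std::trunc(x);
    if (t < std::numeric_limits<int>::min() || t > std::numeric_limits<int>::max()) {
        return false;
    }
    out = static_cast<int>(t);
    return true;
}
```

The two failure modes mirror the exceptions above: an infinite input and a value as large as 2**124 both make checked_to_int() return false instead of throwing.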
Like the other mp++ classes, real is contextually-convertible to bool. Note that NaN converts to true:
if (real{123}) {
std::cout << "123 is true\n";
}
if (!real{0}) {
std::cout << "0 is false\n";
}
std::cout << "NaN is " << (real{"nan", 32} ? "true" : "false") << '\n';
123 is true
0 is false
NaN is true
User-defined literals
The real class provides a few user-defined literals. The literals, as usual, are defined in the mppp::literals inline namespace, and they support decimal and hexadecimal representations for a few predefined precision values:
std::cout << "The 128-bit approximation of '123.456' is : " << (123.456_r128).to_string() << '\n';
std::cout << "The 256-bit approximation of '42' is : " << (4.2e1_r256).to_string() << '\n';
// Hexadecimal notation is supported too.
std::cout << "The 512-bit approximation of '0x1.12p-1' is: " << (0x1.12p-1_r512).to_string() << '\n';
The 128-bit approximation of '123.456' is : 1.234559999999999999999999999999999999999e+2
The 256-bit approximation of '42' is : 4.200000000000000000000000000000000000000000000000000000000000000000000000000000e+1
The 512-bit approximation of '0x1.12p-1' is: 5.35156250000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000e-1
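The hexadecimal notation reads as follows: 0x1.12 is 1 + 1/16 + 2/256 = 1.0703125, and the suffix p-1 multiplies it by 2**-1, giving 0.53515625. This value is exact in binary, unlike 123.456. The same notation is understood by std::strtod(), which gives a quick way to cross-check a literal. The wrapper parse_double() is a hypothetical convenience name.

```cpp
#include <cstdlib>

// Hypothetical wrapper: parse a decimal or hexadecimal floating-point
// string into a double.
inline double parse_double(const char *s)
{
    return std::strtod(s, nullptr);
}
```

For example, parse_double("0x1.12p-1") returns exactly 0.53515625, matching the 512-bit value printed above.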