JIT compilation and caching#

heyoka.py makes extensive use of just-in-time (JIT) compilation techniques, implemented via the LLVM compiler infrastructure. JIT compilation is used not only in the adaptive integrator, but also in compiled functions and in the implementation of dense/continuous output.

JIT compilation can provide a noticeable performance boost over the usual ahead-of-time (AOT) compilation, because it can take advantage of all the features available on the target CPU. The downside is that JIT compilation is computationally expensive, so in some cases the compilation overhead can end up dominating the total runtime of the program.

Starting from version 2.0.0, heyoka.py implements an in-memory cache that alleviates the JIT compilation overhead by avoiding re-compilation of code that has already been compiled during the program execution. Version 7.10.0 adds an on-disk cache that persists across program executions.

In-memory cache#

Added in version 2.0.0.

Let us see the in-memory cache in action. We begin by disabling the on-disk cache via set_diskcache_enabled(), so that the timings are not affected by it:

import heyoka as hy

hy.llvm_state.set_diskcache_enabled(False)

We can now proceed to timing the construction of an adaptive integrator:

%time ta = hy.taylor_adaptive(hy.model.pendulum(), [0., 1.])
CPU times: user 55.5 ms, sys: 4.07 ms, total: 59.5 ms
Wall time: 59.2 ms

Now we construct the same integrator again, timing it once more:

%time ta = hy.taylor_adaptive(hy.model.pendulum(), [0., 1.])
CPU times: user 2.93 ms, sys: 136 μs, total: 3.07 ms
Wall time: 2.78 ms

We can see that the construction runtime has drastically decreased, because heyoka.py cached the result of the compilation of the first integrator.

Let us see another example, this time involving continuous output. We propagate the system for a very short timespan, and we ask for the continuous output function object via the c_output=True flag:

%time ta.propagate_until(0.01, c_output=True)
CPU times: user 11.1 ms, sys: 992 μs, total: 12.1 ms
Wall time: 11.8 ms
(<taylor_outcome.time_limit: -4294967299>,
 inf,
 0.0,
 1,
 C++ datatype: double
 Direction   : forward
 Time range  : [0, 0.01)
 N of steps  : 1,
 None)

We can see that such a short integration took several milliseconds. Indeed, most of the time was spent compiling the function for the evaluation of the continuous output, rather than in the numerical integration itself.

Let us now repeat the same computation:

# Reset time and state.
ta.time = 0.0
ta.state[:] = [0.0, 1.0]

%time ta.propagate_until(0.01, c_output=True)
CPU times: user 1.13 ms, sys: 18 μs, total: 1.14 ms
Wall time: 844 μs
(<taylor_outcome.time_limit: -4294967299>,
 inf,
 0.0,
 1,
 C++ datatype: double
 Direction   : forward
 Time range  : [0, 0.01)
 N of steps  : 1,
 None)

We can see that the runtime has again drastically decreased, thanks to the fact that the code for the evaluation of the continuous output had already been compiled earlier.

Several static methods are available in the llvm_state class to query and interact with the in-memory cache. For instance, we can fetch the current cache size:

f"Current cache size: {hy.llvm_state.get_memcache_size()} bytes"
'Current cache size: 159126 bytes'

By default, the maximum cache size is set to 2GB:

f"Current cache limit: {hy.llvm_state.get_memcache_limit()} bytes"
'Current cache limit: 2147483648 bytes'

If the cache size exceeds the limit, items in the cache are removed following a least-recently-used (LRU) policy. The cache limit can be changed at will:

# Set the maximum cache size to 1MB.
hy.llvm_state.set_memcache_limit(1024 * 1024)

f"New cache limit: {hy.llvm_state.get_memcache_limit()} bytes"
'New cache limit: 1048576 bytes'
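The size-bounded LRU behaviour described above can be illustrated with a minimal pure-Python sketch. This is a toy model for exposition only, not heyoka.py's actual implementation: keys stand in for compiled-code identifiers and byte strings stand in for the compiled artifacts.

```python
from collections import OrderedDict


class LRUByteCache:
    """Toy size-bounded cache with least-recently-used eviction."""

    def __init__(self, limit):
        self._limit = limit
        self._data = OrderedDict()  # key -> bytes, oldest first
        self._size = 0

    def get(self, key):
        if key not in self._data:
            return None
        # A hit marks the entry as most recently used.
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key, blob):
        if key in self._data:
            self._size -= len(self._data.pop(key))
        self._data[key] = blob
        self._size += len(blob)
        # Evict least-recently-used entries until within the limit.
        while self._size > self._limit:
            _, evicted = self._data.popitem(last=False)
            self._size -= len(evicted)


cache = LRUByteCache(limit=10)
cache.put("a", b"12345")
cache.put("b", b"12345")
cache.get("a")          # "a" becomes most recently used
cache.put("c", b"123")  # limit exceeded -> "b" (least recent) is evicted
print(sorted(cache._data))  # ['a', 'c']
```

Note how the get() on "a" protects it from eviction: without it, "a" would have been the least-recently-used entry and would have been evicted instead of "b".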

The cache can be cleared:

# Clear the cache.
hy.llvm_state.clear_memcache()

f"Current cache size: {hy.llvm_state.get_memcache_size()} bytes"
'Current cache size: 0 bytes'

All the static methods to query and interact with the in-memory cache are thread-safe.

Note that in multi-processing scenarios (e.g., in process-based ensemble propagations) each process gets its own in-memory cache, and thus any custom cache setup (e.g., changing the default cache limit) needs to be performed in each and every process.
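One common way to perform such per-process setup is a pool initializer, which runs once in every worker. The sketch below illustrates the pattern without requiring heyoka.py: a plain module-level global stands in for the real call to hy.llvm_state.set_memcache_limit(), and a fork-capable platform (e.g., Linux) is assumed.

```python
import multiprocessing as mp


def init_worker(limit):
    # In a real program, this is where each worker would call
    # hy.llvm_state.set_memcache_limit(limit): the in-memory cache
    # is per-process, so the setup must run in every worker.
    global CACHE_LIMIT
    CACHE_LIMIT = limit


def get_limit(_):
    # Report the limit as seen from within the worker process.
    return CACHE_LIMIT


if __name__ == "__main__":
    # Assumption: the "fork" start method is available.
    ctx = mp.get_context("fork")
    with ctx.Pool(2, initializer=init_worker, initargs=(1024 * 1024,)) as pool:
        # Every worker sees the limit that its own initializer set.
        print(pool.map(get_limit, range(2)))  # [1048576, 1048576]
```

With the "spawn" start method (the default on Windows and macOS) the same pattern applies, but the initializer and worker functions must be importable from a module.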

On-disk cache#

Added in version 7.10.0.

An on-disk cache, persisting across program executions, is also available and enabled by default. It is implemented as an SQLite database.

The default location of the cache directory on disk varies depending on the platform:

  • on Windows: %LOCALAPPDATA%\heyoka\cache\;

  • on macOS: $HOME/Library/Caches/heyoka/;

  • on Linux/Unix: $XDG_CACHE_HOME/heyoka/ if XDG_CACHE_HOME is defined, otherwise $HOME/.cache/heyoka/.

The default location can be overridden on program startup by defining the HEYOKA_CACHE_DIR environment variable. The cache directory can also be changed after program startup via set_diskcache_path().
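For instance, the environment variable can be set from Python itself, as long as this happens before heyoka.py is first imported (the cache location is read on startup). The path below is purely hypothetical:

```python
import os

# Hypothetical override path: HEYOKA_CACHE_DIR must be set *before*
# heyoka.py is first imported, since the cache location is read on startup.
os.environ["HEYOKA_CACHE_DIR"] = "/tmp/my_heyoka_cache"

# import heyoka as hy  # the on-disk cache would now live in /tmp/my_heyoka_cache
```

Setting the variable in the shell before launching the program (e.g., `HEYOKA_CACHE_DIR=/tmp/my_heyoka_cache python my_script.py`) achieves the same effect.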

Let us begin by re-enabling the on-disk cache (remember that we disabled it earlier in order to showcase the in-memory cache):

hy.llvm_state.set_diskcache_enabled(True)

The API of the on-disk cache largely mirrors the API of the in-memory cache. For instance, we can query the size limit (which by default is 20GB):

f"Current cache limit: {hy.llvm_state.get_diskcache_limit()} bytes"
'Current cache limit: 21474836480 bytes'

And we can ask for the current size:

f"Current cache size: {hy.llvm_state.get_diskcache_size()} bytes"
'Current cache size: 241392748 bytes'

Let us see an example in which we temporarily switch to another cache dir (so that we do not interfere with the default one):

import tempfile

# Clear the in-memory cache in order to enforce lookups from
# the on-disk cache.
hy.llvm_state.clear_memcache()

# Save the original cache path.
orig_path = hy.llvm_state.get_diskcache_path()

# Create a temp dir.
with tempfile.TemporaryDirectory() as tmp:
    # Use it as the cache dir.
    hy.llvm_state.set_diskcache_path(tmp)
    print(f"The new on-disk cache dir path is: {hy.llvm_state.get_diskcache_path()}")
    print(
        f"The initial on-disk cache size is: {hy.llvm_state.get_diskcache_size()} bytes"
    )

    # Construct an integrator.
    hy.taylor_adaptive(hy.model.pendulum(), [0.0, 1.0])

    print(
        f"After compilation, the on-disk cache size is: {hy.llvm_state.get_diskcache_size()} bytes"
    )

    # Restore the original cache dir path.
    hy.llvm_state.set_diskcache_path(orig_path)

print(
    f"After restoring the original cache path, the on-disk cache size is: {hy.llvm_state.get_diskcache_size()} bytes"
)
The new on-disk cache dir path is: /tmp/tmpx9dqf6iu
The initial on-disk cache size is: 0 bytes
After compilation, the on-disk cache size is: 134702 bytes
After restoring the original cache path, the on-disk cache size is: 241392748 bytes

It is important to remember that the on-disk cache is shared with other processes and that interacting with the cache from one process will affect other processes as well.

For instance, clearing the cache with clear_diskcache() will clear the cache for all processes, not just for the one in which clear_diskcache() was invoked. Similarly, setting the cache limit via set_diskcache_limit() will also affect any other process which may be using the cache at the same time, as the size limit is stored on disk within the cache.

By contrast, setting the enabled/disabled state of the cache via set_diskcache_enabled() and setting a custom cache dir path via set_diskcache_path() are process-local operations which do not affect other processes.

Using the on-disk cache concurrently from multiple processes is safe, as long as the cache directory resides on a local file system. Network file systems (e.g., NFS, SMB/CIFS) are not supported, because the on-disk cache relies on file-locking semantics that network file systems do not reliably provide. If the default cache directory happens to reside on a network file system, a local alternative can be selected via set_diskcache_path() or via the HEYOKA_CACHE_DIR environment variable, as explained above.

If you suspect that the cache has become corrupted (for instance, due to a power outage or an OS crash), you can simply remove the contents of the cache directory, or even the directory itself: the cache will be automatically re-initialised during the next program startup.
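A manual reset could be sketched as follows. The helper below is hypothetical (it is not part of the heyoka.py API); in real usage the path would come from hy.llvm_state.get_diskcache_path(), while here it is demonstrated on a throwaway directory so that no real cache is touched:

```python
import os
import shutil
import tempfile


def reset_disk_cache(path):
    # Hypothetical helper: remove the on-disk cache directory wholesale.
    # In real usage, pass hy.llvm_state.get_diskcache_path(); heyoka.py
    # re-initialises the cache automatically at the next program startup.
    if os.path.isdir(path):
        shutil.rmtree(path)


# Demonstration on a throwaway directory.
tmp = tempfile.mkdtemp()
reset_disk_cache(tmp)
print(os.path.isdir(tmp))  # False
```

Note that the removal should only be performed while no process is using the cache.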