Reproducibility

What does reproducibility mean?

We define a Criptic run (or a run of any other software) as reproducible if running the executable twice, with identical inputs and with identical numbers of OpenMP threads and MPI ranks, produces identical results. For a code like Criptic that writes checkpoints, we also require that restarting a run from a checkpoint produce the same results as carrying out the same run from the start without restarting.

Reproducibility is generally a desirable trait for testing and debugging. If a code is reproducible and an error occurs, we can run it again to reproduce the error and find the problem. If it is not reproducible, and the error in question occurs only under rare circumstances, it may be extremely difficult to diagnose.

Why is reproducibility an issue for Criptic?

Reproducibility is challenging for threaded codes like Criptic because of race conditions: situations where the results depend on the order in which parallel threads execute. When advancing packets, Criptic by default uses dynamic thread scheduling, meaning that each thread works on the next packet waiting to be advanced. Because the time required to advance a packet fluctuates randomly, there is no guarantee that the same threads will wind up working on the same packets in two successive runs of the code.

The reason this breaks reproducibility is that advancing packets requires generating random numbers. To avoid bottlenecks, each thread maintains its own stream of random numbers. This means that, if dynamic scheduling causes packets to be handled by different threads in two successive runs, they may be advanced using different sets of random numbers in the two cases, leading to different outcomes.
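
The mechanism described above can be illustrated with a small serial sketch. It is not Criptic's implementation: the function advance_packets, the owner array, the seeds, and the use of a single random draw as a packet's "outcome" are all assumptions made for illustration. Each simulated thread owns one random number stream, and the result for a packet depends only on which thread's stream it draws from and how many draws that stream has already made.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Serial stand-in for a threaded packet advance (hypothetical, not Criptic code).
// owner[i] is the index of the simulated thread that advances packet i;
// packets are visited in index order.
std::vector<std::uint64_t> advance_packets(const std::vector<int>& owner,
                                           int n_threads) {
    // One independent random number stream per thread, seeded by thread index.
    // The seed values are arbitrary and chosen only for this sketch.
    std::vector<std::mt19937_64> stream;
    for (int t = 0; t < n_threads; ++t) {
        stream.emplace_back(
            static_cast<std::mt19937_64::result_type>(1000 + t));
    }

    std::vector<std::uint64_t> result(owner.size());
    for (std::size_t i = 0; i < owner.size(); ++i) {
        assert(owner[i] >= 0 && owner[i] < n_threads);
        // The packet's "outcome" is the next draw from its owner's stream.
        result[i] = stream[static_cast<std::size_t>(owner[i])]();
    }
    return result;
}
```

With two threads, the assignments {0, 1, 0, 1} and {0, 1, 1, 0} give the same outcome for packets 0 and 1 but different outcomes for packets 2 and 3, because those two packets draw from different streams. Repeating either assignment unchanged always gives the same outcomes, which is why fixing the packet-to-thread assignment is enough to restore reproducibility.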

Note that this issue only arises for runs using OpenMP threading. If threading is disabled, Criptic is fully reproducible.

Controlling reproducibility

It is possible to avoid this non-reproducible behavior by disabling dynamic scheduling. Doing so worsens load balancing and therefore incurs a performance penalty, so dynamic scheduling remains enabled by default. However, if you wish to enforce reproducibility, you can do so by compiling the code with the REPRODUCIBILITY=TRUE flag set; see Building criptic for details. This flag has no effect if OpenMP threads are not enabled.
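
As a rough illustration of what disabling dynamic scheduling means in OpenMP terms, the sketch below uses a static loop schedule, which divides the loop iterations among threads in a fixed pattern rather than handing them out as threads become free. How the REPRODUCIBILITY=TRUE flag is actually wired into Criptic's source is not shown here; the function advance_static, the helper wrappers, and the seeds are hypothetical. The sketch assumes the number of threads is the same from run to run and that the OpenMP runtime does not adjust the team size dynamically.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Thin wrappers so the sketch also compiles and runs serially without OpenMP.
inline int max_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

inline int this_thread() {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

// Hypothetical packet advance: one random draw per packet from the stream
// owned by whichever thread handles that packet.
std::vector<std::uint64_t> advance_static(int n_packets) {
    assert(n_packets >= 0);

    // One random number stream per thread; seeds are arbitrary for this sketch.
    std::vector<std::mt19937_64> stream;
    const int n_threads = max_threads();
    for (int t = 0; t < n_threads; ++t) {
        stream.emplace_back(
            static_cast<std::mt19937_64::result_type>(1000 + t));
    }

    std::vector<std::uint64_t> result(static_cast<std::size_t>(n_packets));

    // schedule(static) gives each thread a fixed block of iterations, so with
    // the same thread count each packet meets the same stream in the same
    // order on every run. schedule(dynamic) would balance load better but
    // would let the packet-to-thread mapping vary between runs.
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n_packets; ++i) {
        const std::size_t t = static_cast<std::size_t>(this_thread());
        result[static_cast<std::size_t>(i)] = stream[t]();
    }
    return result;
}
```

Calling advance_static twice with the same packet count and thread count should return identical vectors. Swapping the clause for schedule(dynamic) in a build with OpenMP enabled would typically make the two results differ whenever more than one thread is active.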