Tips and Tricks¶
Model Initialisation¶
Base Model Initialisation
When instantiating the model subclass, it is essential that the neworder.Model base class is explicitly initialised. It must be supplied with a Timeline object and (optionally) a seeding function for the Monte-Carlo engine. Failure to do this will result in a runtime error.
Ensure the neworder.Model base class is properly initialised:
class MyModel(neworder.Model):
def __init__(self, args...) -> None:
timeline = ... # initialise an appropriate timeline
super().__init__(timeline) # (1)!
... # now initialise the subclass
- this line is essential
Custom Seeding Strategies¶
Random number generator
neworder random streams use the Mersenne Twister pseudorandom generator, as implemented in the C++ standard library.
neworder provides three basic seeding functions which initialise the model's random stream so that they are either non-reproducible (neworder.MonteCarlo.nondeterministic_stream), or reproducible and either identical (neworder.MonteCarlo.deterministic_identical_stream) or independent across parallel runs (neworder.MonteCarlo.deterministic_independent_stream). Typically, a user would select identical streams (and perturbed inputs) for sensitivity analysis, and independent streams (with identical inputs) for convergence analysis.
Seeder function signature
The seeder function must take no arguments and return an int. The inbuilt deterministic_independent_stream uses the MPI rank
of the process to create different seeds (for serial execution, the rank will always be zero).
If necessary, you can supply your own seeding strategy, for instance if you required half the processes to have identical streams:
Resetting the random streams
model.mc.reset() re-invokes the seeder. For non-deterministic seeders this produces a new seed, so the reset stream will differ from the original.
which returns the same seed for all odd-ranked processes and a different seed for the even-ranked ones. You can define your seeder inline when you instantiate the Model, e.g.
class MyModel(neworder.Model):
def __init__(self, timeline: neworder.Timeline) -> None:
super().__init__(timeline, lambda: (neworder.mpi.RANK % 2) + 12345)
...
If there was a requirement for multiple processes to all have the same nondeterministic stream, you could implement a seeding strategy like so:
def nondeterministic_identical_stream() -> int:
# only process 0 gets a seed
seed = neworder.MonteCarlo.nondeterministic_stream() if neworder.mpi.RANK == 0 else None
# then broadcasts it to the other processes
seed = neworder.mpi.COMM.bcast(seed, root=0)
return seed
Identical Streams¶
Synchronisation
Identically initialised random streams only stay in sync if the same number of samples are taken from each one .
The "option" example relies on parallel processes with identical random streams to reduce noise when computing differences for sensitivity analysis. It implements a check step that compares the internal states of the random stream in each process and fails if any are different (see the example code).
Ultimate Reproducibility¶
The MonteCarlo engine is a sequential stream: every draw advances its internal state, so the value you get depends on how many draws have been taken before it. This couples reproducibility to draw order - if agents are added, removed, or processed in a different sequence, the stream diverges.
SplitMix64 eliminates this coupling. Each variate is computed by hashing a set of integer keys (e.g. person ID, process ID, timestep) together with the seed. There is no state to advance, so:
- the draw for person i is the same whether you compute the full population or just person i in isolation,
- draws can be computed in any order, on any thread, without coordination.
Basic usage¶
Construct within your model class, passing a seeder (the same callables used by MonteCarlo):
Call uarray with any mix of scalar integers (used as context, adding no output dimension) and 1-D integer arrays (each adding one output dimension). Multi-dimensional arrays are not supported and will raise a TypeError.
# 1-D: one variate per person
draws = self.rng.uarray(person_ids, process_id, self.timeline.index)
# 2-D: one variate per (person, draw_index) pair
draws = self.rng.uarray(person_ids, process_id, self.timeline.index, draw_indices)
String keys (e.g. the name of a stochastic process) can be converted to stable integers with SplitMix64.hash64:
Repeated calls with the same arguments¶
Because SplitMix64 has no state, two calls with identical arguments return identical values. To get independent draws across repeated calls, either:
- use a non-deterministic seeder (this is called each time
uarrayis called), or - construct with
use_counter=True, which mixes an auto-incrementing counter into each call. In this case callingreset()rewinds the counter and will then replay the same sequence.
Multithreaded use with use_counter=True
The counter is a plain member variable. If multiple threads call uarray() concurrently on the same SplitMix64 instance, they will race on the increment and the results will be nondeterministic. Either give each thread its own SplitMix64 instance, or avoid use_counter=True in multithreaded contexts and instead incorporate a thread-specific deterministic scalar (e.g. task identifier or loop index) as a key argument to uarray().
When to use SplitMix64 vs MonteCarlo
Use MonteCarlo for general-purpose sampling (non-uniform distributions, arrival times, categorical transitions). Prefer SplitMix64 for uniform draws that must be stable under sub-sampling or reordering - for example when agents enter or leave the population mid-run, or when stochastic processes execute in a non-deterministic order.
External Sources of Randomness¶
Other libraries, such as numpy, contain a much broader selection of random number functionality than neworder does, and it makes no sense to reimplement such functionality. If you are using a specific seeding strategy within neworder, and are also using an external random generator, it is important to ensure they are also following the same strategy, otherwise reproducibility may be compromised.
In your model constructor, you can seed the numpy generator like so
ext_seed = self.mc.raw()
self.nprand = np.random.Generator(np.random.MT19937(ext_seed))
# ...get some values
x = self.nprand.normal(size=5)
If you've chosen a deterministic seeding strategy, then ext_seed will be reproducible, and if you've chosen an independent strategy, then ext_seed will be different for each process, thus propagating your chosen seeding strategy to the external generator.
Seeding external generators
Wherever possible, explicitly seed any external random generators using neworder's MonteCarlo engine. This will effectively propagate your seeding strategy to the external generator.
Using neworder's random generator with numpy¶
It is now possible to use the the neworder model's Monte-Carlo engine as a numpy generator. In this way all of numpy's functionality is available with neworder's MonteCarlo RNG. To achieve this use the adapter function as_np. Similarly to the example above, in your model constructor create the numpy generator, then:
NB as there is only one RNG state, you can safely get independent variates when calling both the RNG directly and via numpy.
Ending the model run¶
Models will continue to run until the end of their timeline is reached, unless explicitly told otherwise (see next section).
Finalisation
The model's finalise method can be optionally implemented as necessary, for example to write results to a file. It is automatically called by the neworder runtime only when the end of the timeline is reached.
Open-ended timelines and Conditional Halting¶
In some models, rather than (or as well as) evolving the population over a fixed timeline, it may make more sense to iterate timesteps until some condition is met. The "Schelling" example illustrates this - it runs until all agents are in a satisfied state. Currently, the inbuilt LinearTimeline and CalendarTimeline classes support both fixed and open-ended timelines. In other cases it may be useful to temporarily exit the model for later resumption.
The model's halt method can be used to stop the model run. In these situations, the step method should have some logic to (conditionally) call the halt method.
Model.halt()
This function does not end execution immediately, it signals to the neworder runtime not to iterate any further timesteps. Calling halt means that:
- the entire body of the
stepmethod (and thecheckmethod, if implemented) will still run for the current timestep, - the
finalisemethod, even if implemented, will not be excuted, - the
neworder.runmethod will then exit.
Overriding the halt method should not be necessary and is not recommended. The finalise method, if needed, must be called explicitly for models with open-ended timelines that have been halted.
Resuming execution
A model that has previously been halted but has not reached the end of its timeline can be resumed by passing it to neworder.run again. Attempting to resume a model that has reached the end of it's timeline will result in a StopIteration exception.
Deadlocks¶
Failure is All-Or-Nothing
If checks fail, or any other error occurs in one process in a parallel run, other processes must be notified, otherwise deadlocks can occur.
Blocking communications between processes will deadlock if, for instance, the receiving process has ended due to an error. This will cause the entire run to hang (and may impact your HPC bill). The option example, as described above, has a check for random stream synchronisation that looks like this:
def check(self) -> bool:
# check the rng streams are still in sync by sampling from each one,
# comparing, and broadcasting the result. If one process fails the
# check and exits without notifying the others, deadlocks can result.
# send the state representation to process 0 (others will get None)
states = neworder.mpi.COMM.gather(self.mc.state(), 0)
# process 0 checks the values
ok = all(s == states[0] for s in states) if states else True
# broadcast process 0's ok to all processes
ok = neworder.mpi.COMM.bcast(ok, root=0)
return ok
The key here is that there is only one result, shared between all processes. In this case only one process is performing the check and broadcasting the result to the others.
Tip
In general, the return value of check() should be the logical "and" of the results from each process.
Time Comparison¶
neworder uses 64-bit floating-point numbers to represent time, and the values -inf, +inf and nan respectively to represent the concepts of the distant past, the far future and never. This allows users to define, or compare against, values that are:
- unequal to any time value, including itself (
neworder.time.NEVER), - before any other (non-never) time value (
neworder.time.DISTANT_PAST) , or - after any other (non-never) time value (
neworder.time.FAR_FUTURE)
NaN comparisons
Due to the rules of IEEE754 floating-point, care must be taken when comparing to NaN/NEVER, since a direct comparison will always be false, i.e.: NEVER != NEVER.
To compare time values with "never", use the supplied function isnever():
import neworder
n = neworder.time.NEVER
neworder.log(n == n) # False!
neworder.log(neworder.time.isnever(n)) # True
Data Types¶
Static typing
Unlike python, C++ is a statically typed language and so neworder is strict about types. We strongly encourage the use of type annotations and a type checker (e.g. mypy) in python.
If an argument to a neworder method or function is not the correct type, it will fail immediately (as opposed to python, which will fail only if an invalid operation for the given type is attempted (a.k.a. "duck typing")). This applies to contained types (numpy's dtype) too. In the example below, the function is expecting an integer, and will complain if you pass it a floating-point argument:
>>> import neworder
>>> neworder.df.unique_index(3.0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unique_index(): incompatible function arguments. The following argument types are supported:
1. (n: int) -> numpy.ndarray[int64]
Invoked with: 3.0
Project Structure¶
Although obvious to many users, in order to promote reusability, it is recommended to separate out functionality into logical units, for example:
- model definition - the actual model implementation
- model data - loading and preprocessing of input data
- model execution - defining the parameters of the model and running it
- result postprocessing and visualisation
This makes life much easier when you want to:
- use the same model with different parameters and/or input data,
- run the model on different plaforms without modification (think desktop vs HPC cluster vs web service).
- have visualisations tailored to the platform you are working on.
- run multiple models from one script.
The examples use canned (i.e. already preprocessed) data but otherwise largely adhere to this pattern.