Overview¶
The Framework¶
The aim of the framework is to be as unrestrictive and flexible as possible, whilst still providing a skeleton on which to implement a model, and a suite of useful tools. Being data agnostic means that this framework can be run standalone or easily integrated with other models and/or into workflows with specific demands on input and output data formats.
It is designed to support both serial and parallel execution modes, with the latter being used to tackle large populations or to perform sensitivity or convergence analysis. neworder runs as happily on a desktop PC as it does on a HPC cluster.
To help users familiarise themselves with the framework, a number of detailed examples covering a variety of use cases are provided. What follows here is a detailed overview of the package functionality.
Provision¶
At at it heart, neworder simply provides a mechanism to iterate over a timeline and perform user-specified operations at each point on the timeline, with the help of a library of statistical and data manipulation functions.
This is provided by:
- a Model base class: providing the skeleton on which users implement their models
- a Timeline: the time horizon over which to run the model
- an optional spatial domain, which can be continuous (
Space
), discrete (StateGrid
), or a graph (GeospatialGraph
). - a MonteCarlo engine: a dedicated random number stream for each model instance, with specific configurations for parallel streams
- a library of Monte-Carlo methods and statistical functions, plus compatibility with
numpy
's statistical functionality through an adapter - data manipulation functions optimised for pandas DataFrames
- support for parallel execution using MPI (via the
mpi4py
package).
neworder explicitly does not provide any tools for things like visualisation, and users can thus use whatever packages they are most comfortable with. The examples, however, do provide various visualisations using matplotlib
.
Model Requirements¶
Timeline¶
neworder's timeline is conceptually a sequence of steps that are iterated over (calling the Model's step
and (optionally) check
methods at each iteration, plus the finalise
method at the last time point, which is commonly used to post-process the raw model data at the end of the model run. Timelines should not be incremented in client code, this happens automatically within the model.
The framework provides four types of timeline, and is extensible:
NoTimeline
: an arbitrary one-step timeline which is designed for continuous-time models in which the model evolution is computed in a single stepLinearTimeline
: a set of equally-spaced intervals in non-calendar timeNumericTimeline
: a fully-customisable non-calendar timeline allowing for unequally-spaced intervalsCalendarTimeline
: a timeline based on calendar dates with with (multiples of) daily, monthly or annual intervals
Calendar Timelines
- Calendar timelines do not provide intraday resolution
- Monthly increments preserve the day of the month (where possible)
- Daylight savings time adjustments are made which affect time intervals where the interval crosses a DST change
- Time intervals are computed in years, on the basis of a year being 365.2475 days
Custom timelines¶
If none of the supplied timelines are suitable, users can implement their own, inheriting from the abstract neworder.Timeline
base class, which also provides an index
property. The following must be implemented in the subclass:
symbol | type | description |
---|---|---|
at_end |
bool property |
whether the timeline has reached it's end point |
dt |
float property |
the size of the current timestep |
end |
Any property |
the end time of the timeline |
_next |
None method |
move to the next timestep (for internal use by model, should not normally be called in client code) |
nsteps |
int property |
the total number of timesteps |
start |
Any property |
the start time of the timeline |
time |
Any property |
the current time of the timeline |
__repr__ |
str method |
(optional) a string representation of the object, defaults to the name of the class |
As an example, this open-ended numeric timeline starts at zero and asymptotically converges to 1.0:
import neworder
class AsymptoticTimeline(neworder.Timeline):
def __init__(self) -> None:
super().__init__()
self.t = 1.0
@property
def start(self) -> float:
return 0.0
@property
def end(self) -> float:
return 1.0
@property
def nsteps(self) -> int:
return -1
@property
def time(self) -> float:
return 1.0 - self.t
@property
def dt(self) -> float:
return self.t / 2
def _next(self) -> None:
self.t /= 2
@property
def at_end(self) -> bool:
return False
class Model(neworder.Model):
def __init__(self) -> None:
super().__init__(AsymptoticTimeline())
def step(self) -> None:
neworder.log(f"{self.timeline.index}: {self.timeline.time}")
if self.timeline.time > 0.99:
self.halt()
neworder.run(Model())
Spatial Domain¶
neworder models do not require a spatial domain, but the following functionality is provided for cases where there is a spatial element to the problem being modelled.
Continuous¶
The Space
class to encapsulate a continuous space with arbirtrary dimensionality. The edges of the space can be unbounded, wrap-around, contrained, or "bounce". The move
method will ensure that entities in the space are assigned the appropriate position and velocity to conform with the edge behaviour. Additionally the methods dist2
and dist
and compute distances in the space taking into account the edge behaviour (i.e. wrap-around). dist2
returns the squared distance and, if appropriate, is a more efficient alternative.
See the boids examples for implementations.
Discrete¶
The StateGrid
class provides a discrete grid of arbitrary states. Wrapped and contrained edges (only) are supported.
See the Conway example for implementations.
Graph¶
The GeospatialGraph
class provides a wrapper around the networkx
and osmnx
packages, and provides methods for computing shortest paths, isochrones, and subgraphs as well as identifying edges connected to nodes and vice versa. Due to its heavy dependencies, it an extra - installed using pip install neworder[geospatial]
.
Model¶
In order to construct a functioning model, the minimal requirements of the model developer are to:
- create a subclass of
neworder.Model
- implement the following class methods:
- a constructor
- the
step
method (which is run at every timestep)
- define a timeline (see above) over which the model runs
- set a seeding policy for the random stream (3 are provided, but you can create your own)
- instantiate an instance of the subclass with the timeline and seeding policy
- then, simply pass your model to the
neworder.run
function.
the following can also be optionally implemented in the model:
- a
modify
method, which is called at the start of the model run and can be used for instance to modify the input data for different processes in a parallel run, for batch processing or sensitivity analysis. - a
check
method, which is run at every timestep, to e.g. perform checks simulation remains plausible.
Additional class methods
There are no restrictions on implementing additional methods in the model class, although bear in mind they won't be available to the neworder runtime (unless called by one of the functions listed above).
Pretty much everything else is entirely up to the model developer. While the module is completely agnostic about the format of data, the library functions accept and return numpy arrays, and it is recommended to use pandas dataframes where appropriate in order to be able to use the fast data manipulation functionality provided.
Like MODGEN, both time-based and case-based models are supported. In the latter, the timeline refers not to absolute time but the age of the cohort. Additionally continuous-time models can be implemented, using a "null NoTimeline
(see above) with only a single transition, and the Monte-Carlo library specifically provides functions for continuous sampling, e.g. from non-homogeneous Poisson processes.
New users should take a look at the examples, which cover a range of applications including implementations of some MODGEN teaching models.
Data and Performance¶
neworder is written in C++ with the python bindings provided by the pybind11 package. As python and C++ have very different memory models, it's generally not advisable to directly share data, i.e. to safely have a python object and a C++ object both referencing (and potentially modifying) the same memory location. Thus neworder class member variables are accessible only via member functions and results are returned by value (i.e. copied). However, there is a crucial exception to this: the numpy ndarray
type. This is fundamental to the operation of the framework, as it enables the C++ module to directly access (and modify) both numpy arrays and pandas data frames, facilitiating very fast implementation of algorithms operating directly on pandas DataFrames.*
Explicit Loops
To get the best performance, avoid using explicit loops in python code where "vectorised" neworder (or e.g. numpy) functions can be used instead.
You should also bear in mind that while python is a dynamically typed language, C++ is statically typed. If an argument to a neworder method is not the correct type, it will fail immediately (as opposed to python, which will fail only if an invalid operation for the given type is attempted). Note also that neworder
's python code has type annotations.
* the neworder.df.transition
function is over 2 or 3 orders of magnitude faster than a (naive) equivalent python implementation depending on the length of the dataset, and still an order of magnitude faster than an optimised python implementation.