Markov Chain¶

This example illustrates how to model a process that consists of probabilistic transitions between discrete states, and showcases how neworder can drastically increase performance on certain operations on dataframes.

Firstly we have 3 arbitrary states: 0, 1 and 2. The initial population starts in state 0, and the following transitions are permitted, as described by this transition matrix:

\[ \begin{pmatrix} 1-p_{01}-p_{02} & p_{01} & p_{02} \\ 0 & 1-p_{12} & p_{12} \\ p_{20} & 0 & 1-p_{20} \end{pmatrix} \]

Each transition is modelled as a Poisson process with different mean arrival times \(\mu_{ij}=1/\lambda_{ij}\), which generate the probabilities above by \(p_{ij}=\lambda_{ij}.\delta t\)

We use a time horizon of 100 (arbitrary units) with 100 steps and a population of 100000. This equates to computing ten million possible transitions during the model run. The sizes of the populations in each state, as the model progresses, is illustrated below. As you can see an equilibrium state is reached. (NB This means balanced transitions rather than no transitions)

Examples Source Code

The code for all the examples can be obtained by either:

pulling the docker image virgesmith/neworder:latest (more here), or
installing neworder, downloading the source code archive from the neworder releases page and extracting the examples subdirectory.

Performance¶

The model also implements a python-only equivalent of the no.df.transition() function, which has been optimised to use the pandas apply() rather than an explicit loop over the datafame.

The model takes about 45s to run (depending on platform). Changing MarkovChain.step() function to use neworder's C++ implementation results in a run time of 4.9s, close to a factor of 10 speedup. Note though that the C++ implementation can only operate on integer state data. If the state is expressed as another type, e.g. a string, consider changing the format, or just use the python implementation.

Input¶

import time
import numpy as np
import neworder as no
from markov_chain import MarkovChain
import visualisation

# Logging and checking options
# no.verbose()
no.checked()

npeople = 100000
tmax = 100
dt = 1.0

# params of poisson process transitions (p=lambda.exp(-lambda.x) where lambda=1/mean)
mu_01 = 13.0
mu_02 = 23.0
mu_12 = 29.0
mu_20 = 17.0

lambda_01 = 1.0 / mu_01
lambda_02 = 1.0 / mu_02
lambda_12 = 1.0 / mu_12
lambda_20 = 1.0 / mu_20

states = np.array([0, 1, 2])
# possible transitions:
# 0 -> 1
#  \    \
#    <-> 2
transition_matrix = np.array([
  [1.0 - lambda_01 * dt - lambda_02 * dt, lambda_01 * dt,       lambda_02 * dt      ],
  [0.0,                                   1.0 - lambda_12 * dt, lambda_12 * dt      ],
  [lambda_20 * dt,                        0.0,                  1.0 - lambda_20 * dt]
])

timeline = no.LinearTimeline(0, tmax, tmax)

model = MarkovChain(timeline, npeople, states, transition_matrix)

start = time.time()
no.run(model)
no.log("run time = %.2fs" % (time.time() - start))

visualisation.show(model)

[file: ./examples/markov_chain/model.py]

Implementation¶

import pandas as pd  # type: ignore
import numpy as np
import neworder as no


class MarkovChain(no.Model):
  def __init__(self, timeline: no.Timeline, npeople: int, states: np.ndarray, transition_matrix: np.ndarray) -> None:

    super().__init__(timeline, no.MonteCarlo.deterministic_identical_stream)
    self.npeople = npeople

    self.pop = pd.DataFrame(data={"state": np.full(npeople, 0),
                                  "t1": no.time.NEVER,
                                  "t2": no.time.NEVER})

    self.states = states
    self.transition_matrix = transition_matrix
    self.summary = pd.DataFrame(columns=states)
    self.summary.loc[0] = self.pop.state.value_counts().transpose()

  # pure python equivalent implementation of no.df.transition, to illustrate the performance gain
  def transition_py(self, colname: str) -> None:
    def _interp(cumprob: np.ndarray, x: float) -> int:
      lbound = 0
      while lbound < len(cumprob) - 1:
        if cumprob[lbound] > x:
          break
        lbound += 1
      return lbound

    def _sample(u: float, tc: np.ndarray, c: np.ndarray) -> float:
      return c[_interp(tc, u)]

    # u = m.mc.ustream(len(df))
    tc = np.cumsum(self.transition_matrix, axis=1)

    # reverse mapping of category label to index
    lookup = {self.states[i]: i for i in range(len(self.states))}

    # for i in range(len(df)):
    #   current = df.loc[i, colname]
    #   df.loc[i, colname] = sample(u[i], tc[lookup[current]], c)
    # this is a much faster equivalent of the loop in the commented code immediately above
    self.pop[colname] = self.pop[colname].apply(lambda current: _sample(self.mc.ustream(1), tc[lookup[current]], self.states))

  def step(self) -> None:
    # self.transition_py("state")
    # comment the above line and uncomment this line to use the faster C++ implementation
    no.df.transition(self, self.states, self.transition_matrix, self.pop, "state")
    self.summary.loc[len(self.summary)] = self.pop.state.value_counts().transpose()

  def finalise(self) -> None:
    self.summary["t"] = np.linspace(self.timeline.start, self.timeline.end, self.timeline.nsteps + 1)
    self.summary.reset_index(drop=True, inplace=True)
    self.summary.fillna(0, inplace=True)

[file: ./examples/markov_chain/markov_chain.py]

Output¶

population evolution