Skip to content

Mortality

We implement the example The Life Table from the second chapter of the book Microsimulation and Population Dynamics [3]. It models mortality in a homogeneous population with an age-specific mortality rate.

This example implements the model in two different ways: firstly a discrete case-based microsimulation, and again using a continuous sampling methodology, showcasing how the latter can be much more efficient. Rather than having a class to represent an individual, as would be standard in a MODGEN implementation, individuals are stored in a pandas Dataframe which provides fast iteration over the population.

Mortality histogram

Examples Source Code

The code for all the examples can be obtained by either:

  • pulling the docker image virgesmith/neworder:latest (more here), or
  • installing neworder, downloading the source code archive from the neworder releases page and extracting the examples subdirectory.

Inputs

The mortality data is derived from the NewETHPOP[1] project and represents the mortality rate for white British males in one of the London Boroughs. For timeline definition a maximum age of 100 is defined. Note though that the age at death may be higher than this, it's just the cutoff point at which the hazard rate is assumed to be constant.

import time

import neworder
# model implementations
from people import PeopleDiscrete, PeopleContinuous
# visualisation code
from plot import plot

# neworder.verbose()
# checks disabled to emphasise performance differences
neworder.checked(False)

# the max value in the timeline
max_age = 100.0
# The mortality rate data
mortality_hazard_file = "examples/mortality/mortality-wbi.csv"
population_size = 100000

neworder.log("Population = %d" % population_size)

# run the discrete model
mortality_discrete = PeopleDiscrete(mortality_hazard_file, population_size, max_age)
start = time.perf_counter()
neworder.run(mortality_discrete)
end = time.perf_counter()
neworder.log("Discrete model life expectancy = %f, exec time = %f" % (mortality_discrete.life_expectancy, end - start))

# run the continuous model
mortality_continuous = PeopleContinuous(mortality_hazard_file, population_size, 1.0)
start = time.perf_counter()
neworder.run(mortality_continuous)
end = time.perf_counter()
neworder.log("Continuous model life expectancy = %f, exec time = %f" % (mortality_continuous.life_expectancy, end - start))

# visualise some results
# hist_file = "docs/examples/img/mortality_%dk.png" % (population_size//1000)
# anim_file = "docs/examples/img/mortality_hist_%dk.gif" % (population_size//1000)
plot(mortality_discrete.population, mortality_continuous.population)
[file: ./examples/mortality/model.py]

Implementation

Discrete

As per the MODGEN implementation, we step through a case-based timeline and sample deaths using the marginal mortality rate as a (homogeneous) Poisson process, basically:

  • each year, sample time of death for alive individuals
  • if year is not at the end of the mortality table
    • if death occurs within the year,
      • record age at death and mark individual as dead
    • otherwise
      • increment age by 1 year and resample
  • otherwise
    • record age at death and mark individual as dead
  • take mean age at death

So the mortality hazard is treated as a series of piecewise homogeneous Poisson processes.

The discrete model is constructed, as usual, by initialising the base class with a timeline of 100 1-year steps and a seed for the Monte-Carlo engine, loads the mortality data, and initialise the population with age=0 and age at death as yet unknown:

class PeopleDiscrete(neworder.Model):
  """ Persons sampled each represented as a row in a data frame """
  def __init__(self, mortality_hazard_file: str, n: int, max_age: float) -> None:
    # This is case-based model the timeline refers to the age of the cohort
    timeline = neworder.LinearTimeline(0.0, max_age, int(max_age))
    super().__init__(timeline, neworder.MonteCarlo.deterministic_identical_stream)

    # initialise cohort
    self.mortality_hazard = pd.read_csv(mortality_hazard_file)

    # store the largest age we have a rate for
    self.max_rate_age = max(self.mortality_hazard.DC1117EW_C_AGE) - 1

    # neworder.log(self.mortality_hazard.head())
    self.population = pd.DataFrame(index=neworder.df.unique_index(n),
                                   data={"alive": True,
                                         "age": 0.0,
                                         "age_at_death": neworder.time.FAR_FUTURE})

    self.max_age = max_age
[file: ./examples/mortality/people.py]

The step method samples deaths according to the age-specific mortality rate (code not shown for brevity), then increments the age of the living by the timestep.

  def step(self) -> None:
    # kill off some people
    self.die()
    # age the living only
    alive = self.population.loc[self.population.alive].index
    self.population.loc[alive, "age"] = self.population.loc[alive, "age"] + self.timeline.dt
[file: ./examples/mortality/people.py]

When the end of the timeline is reached the finalise method is called, which checks that all individuals are now dead and computes the life expectancy:

  def finalise(self) -> None:
    # kill off any survivors
    self.die()
    assert np.sum(self.population.alive) == 0
    # the calc life expectancy
    self.life_expectancy = np.mean(self.population.age_at_death)
[file: ./examples/mortality/people.py]

Continuous

The second implementation samples the term structure of mortality directly using the Lewis-Shedler [4] "thinning" algorithm - this approach doesn't even require a timeline as each individual's age at death can be sampled directly, as a nonhomogeneous Poisson process.

The continuous model doesn't require a max_age argument, as there is no timeline, but it does need to know the time resolution of the mortality data in order to sample it correctly:

class PeopleContinuous(neworder.Model):
  """ Persons sampled each represented as a row in a data frame """
  def __init__(self, mortality_hazard_file: str, n: int, dt: float) -> None:
    # Direct sampling doesnt require a timeline
    super().__init__(neworder.NoTimeline(), neworder.MonteCarlo.deterministic_identical_stream)
    # initialise cohort
    self.mortality_hazard = pd.read_csv(mortality_hazard_file)

    # store the largest age we have a rate for
    self.max_rate_age = max(self.mortality_hazard.DC1117EW_C_AGE) - 1

    # neworder.log(self.mortality_hazard.head())
    self.population = pd.DataFrame(index=neworder.df.unique_index(n),
                                   data={"age_at_death": neworder.time.FAR_FUTURE})

    # the time interval of the mortality data values
    self.dt = dt
[file: ./examples/mortality/people.py]

The step method samples, in a single pass, all the deaths, as arrival times in a nonhomogeneous Poisson process:

  def step(self) -> None:
    self.population.age_at_death = self.mc.first_arrival(self.mortality_hazard.Rate.values, self.dt, len(self.population))
[file: ./examples/mortality/people.py]

The single check ensures that all sampled values are finite:

  def check(self) -> bool:
    # ensure all times of death are finite
    return self.population.age_at_death.isnull().sum() == 0
[file: ./examples/mortality/people.py]

And finally the finalise method computes the life expectancy

  def finalise(self) -> None:
    self.life_expectancy = np.mean(self.population.age_at_death)
[file: ./examples/mortality/people.py]

Output

Running the model script will execute both models

python examples/mortality/model.py

with output like this

[py 0/1] Population = 100000
[py 0/1] Discrete model life expectancy = 77.432356, exec time = 1.321355
[py 0/1] Continuous model life expectancy = 77.388072, exec time = 0.161716

which illustrates how much more efficient the continuous implementation is (about ten times faster).

The visualisations (see examples source code for details) show an animated histogram of the deaths (above), and a comparison of the age to death distributions from the two implementations:

Mortality rate comparison