Mortality¶
We implement the example The Life Table from the second chapter of the book Microsimulation and Population Dynamics [3]. It models mortality in a homogeneous population with an age-specific mortality rate.
This example implements the model in two different ways: firstly a discrete case-based microsimulation, and again using a continuous sampling methodology, showcasing how the latter can be much more efficient. Rather than having a class to represent an individual, as would be standard in a MODGEN implementation, individuals are stored in a pandas Dataframe
which provides fast iteration over the population.
Examples Source Code
The code for all the examples can be obtained by either:
Inputs¶
The mortality data is derived from the NewETHPOP[1] project and represents the mortality rate for white British males in one of the London Boroughs. For timeline definition a maximum age of 100 is defined. Note though that the age at death may be higher than this, it's just the cutoff point at which the hazard rate is assumed to be constant.
import time
import neworder
# model implementations
from people import PeopleDiscrete, PeopleContinuous
# visualisation code
from plot import plot
# neworder.verbose()
# checks disabled to emphasise performance differences
neworder.checked(False)
# the max value in the timeline
max_age = 100.0
# The mortality rate data
mortality_hazard_file = "examples/mortality/mortality-wbi.csv"
population_size = 100000
neworder.log("Population = %d" % population_size)
# run the discrete model
mortality_discrete = PeopleDiscrete(mortality_hazard_file, population_size, max_age)
start = time.perf_counter()
neworder.run(mortality_discrete)
end = time.perf_counter()
neworder.log("Discrete model life expectancy = %f, exec time = %f" % (mortality_discrete.life_expectancy, end - start))
# run the continuous model
mortality_continuous = PeopleContinuous(mortality_hazard_file, population_size, 1.0)
start = time.perf_counter()
neworder.run(mortality_continuous)
end = time.perf_counter()
neworder.log("Continuous model life expectancy = %f, exec time = %f" % (mortality_continuous.life_expectancy, end - start))
# visualise some results
# hist_file = "docs/examples/img/mortality_%dk.png" % (population_size//1000)
# anim_file = "docs/examples/img/mortality_hist_%dk.gif" % (population_size//1000)
plot(mortality_discrete.population, mortality_continuous.population)
Implementation¶
Discrete¶
As per the MODGEN implementation, we step through a case-based timeline and sample deaths using the marginal mortality rate as a (homogeneous) Poisson process, basically:
- each year, sample time of death for alive individuals
- if year is not at the end of the mortality table
- if death occurs within the year,
- record age at death and mark individual as dead
- otherwise
- increment age by 1 year and resample
- if death occurs within the year,
- otherwise
- record age at death and mark individual as dead
- take mean age at death
So the mortality hazard is treated as a series of piecewise homogeneous Poisson processes.
The discrete model is constructed, as usual, by initialising the base class with a timeline of 100 1-year steps and a seed for the Monte-Carlo engine, loads the mortality data, and initialise the population with age=0 and age at death as yet unknown:
class PeopleDiscrete(neworder.Model):
""" Persons sampled each represented as a row in a data frame """
def __init__(self, mortality_hazard_file: str, n: int, max_age: float) -> None:
# This is case-based model the timeline refers to the age of the cohort
timeline = neworder.LinearTimeline(0.0, max_age, int(max_age))
super().__init__(timeline, neworder.MonteCarlo.deterministic_identical_stream)
# initialise cohort
self.mortality_hazard = pd.read_csv(mortality_hazard_file)
# store the largest age we have a rate for
self.max_rate_age = max(self.mortality_hazard.DC1117EW_C_AGE) - 1
# neworder.log(self.mortality_hazard.head())
self.population = pd.DataFrame(index=neworder.df.unique_index(n),
data={"alive": True,
"age": 0.0,
"age_at_death": neworder.time.FAR_FUTURE})
self.max_age = max_age
The step
method samples deaths according to the age-specific mortality rate (code not shown for brevity), then increments the age of the living by the timestep.
def step(self) -> None:
# kill off some people
self.die()
# age the living only
alive = self.population.loc[self.population.alive].index
self.population.loc[alive, "age"] = self.population.loc[alive, "age"] + self.timeline.dt
When the end of the timeline is reached the finalise
method is called, which checks that all individuals are now dead and computes the life expectancy:
def finalise(self) -> None:
# kill off any survivors
self.die()
assert np.sum(self.population.alive) == 0
# the calc life expectancy
self.life_expectancy = np.mean(self.population.age_at_death)
Continuous¶
The second implementation samples the term structure of mortality directly using the Lewis-Shedler [4] "thinning" algorithm - this approach doesn't even require a timeline as each individual's age at death can be sampled directly, as a nonhomogeneous Poisson process.
The continuous model doesn't require a max_age
argument, as there is no timeline, but it does need to know the time resolution of the mortality data in order to sample it correctly:
class PeopleContinuous(neworder.Model):
""" Persons sampled each represented as a row in a data frame """
def __init__(self, mortality_hazard_file: str, n: int, dt: float) -> None:
# Direct sampling doesnt require a timeline
super().__init__(neworder.NoTimeline(), neworder.MonteCarlo.deterministic_identical_stream)
# initialise cohort
self.mortality_hazard = pd.read_csv(mortality_hazard_file)
# store the largest age we have a rate for
self.max_rate_age = max(self.mortality_hazard.DC1117EW_C_AGE) - 1
# neworder.log(self.mortality_hazard.head())
self.population = pd.DataFrame(index=neworder.df.unique_index(n),
data={"age_at_death": neworder.time.FAR_FUTURE})
# the time interval of the mortality data values
self.dt = dt
The step method samples, in a single pass, all the deaths, as arrival times in a nonhomogeneous Poisson process:
def step(self) -> None:
self.population.age_at_death = self.mc.first_arrival(self.mortality_hazard.Rate.values, self.dt, len(self.population))
The single check ensures that all sampled values are finite:
def check(self) -> bool:
# ensure all times of death are finite
return self.population.age_at_death.isnull().sum() == 0
And finally the finalise
method computes the life expectancy
def finalise(self) -> None:
self.life_expectancy = np.mean(self.population.age_at_death)
Output¶
Running the model script will execute both models
python examples/mortality/model.py
with output like this
[py 0/1] Population = 100000
[py 0/1] Discrete model life expectancy = 77.432356, exec time = 1.321355
[py 0/1] Continuous model life expectancy = 77.388072, exec time = 0.161716
which illustrates how much more efficient the continuous implementation is (about ten times faster).
The visualisations (see examples source code for details) show an animated histogram of the deaths (above), and a comparison of the age to death distributions from the two implementations: