Population Microsimulation¶

Population pyramid

Overview¶

In this example, the input data is a csv file containing a microsynthesised 2011 population of Newcastle generated from UK census data, by area (MSOA), age, gender and ethnicity. The transitions modelled are: ageing, births, deaths and migrations, over a period of 40 years to 2051.

Births, deaths and migrations (applied in that order) are modelled using Monte-Carlo simulation (sampling Poisson processes in various ways) using distributions parameterised by age, sex and ethnicity-specific fertility, mortality and migration rates respectively, which are largely fictitious (but inspired by data from the NewETHPOP[1] project).

For the fertility model newborns simply inherit their mother's location and ethnicity, are born aged zero, and have a randomly selected gender (with even probability). The migration model is an 'in-out' model, i.e. it is not a full origin-destination model. Flows are either inward from 'elsewhere' or outward to 'elsewhere'.

People who have died, and outward migrations are simply removed from the population. (In a larger-scale model migrations could be redistributed).

At each timestep the check method computes and displays some summary data:

the time
the size of the population
the mean age of the population
the percentage of the population that are female
the in and out migration numbers

Examples Source Code

The code for all the examples can be obtained by either:

pulling the docker image virgesmith/neworder:latest (more here), or
installing neworder, downloading the source code archive from the neworder releases page and extracting the examples subdirectory.

Setup¶

"""
model.py: Population Microsimulation - births, deaths and migration by age, gender and ethnicity
"""
import time
from datetime import date

import neworder
from population import Population

#neworder.verbose()

# input data
initial_population = "examples/people/E08000021_MSOA11_2011.csv"
# age, gender and ethnicity-specific rates
fertility_rate_data = "examples/people/fertility.csv"
mortality_rate_data = "examples/people/mortality.csv"
in_migration_rate_data = "examples/people/migration-in.csv"
out_migration_rate_data = "examples/people/migration-out.csv"

# define the evolution timeline
timeline = neworder.CalendarTimeline(date(2011, 1, 1), date(2051, 1, 1), 1, "y")

# create the model
population = Population(timeline, initial_population, fertility_rate_data, mortality_rate_data, in_migration_rate_data, out_migration_rate_data)

# run the model
start = time.time()
ok = neworder.run(population)
neworder.log("run time = %.2fs" % (time.time() - start))
assert ok

[file: examples/people/model.py]

Model Implementation¶

Births, deaths and outward migrations are modelled with Bernoulli trials with hazard rates parameterised by age, sex and ethnicity. Inward migrations are modelled by sampling counts from a Poisson process with the intensity parameterised by age, sex and ethnicity.

population.py:

"""
 population.py: Model implementation for population microsimulation
"""

import os
import pandas as pd  # type: ignore
import numpy as np
import neworder

import pyramid

class Population(neworder.Model):
  def __init__(self,
               timeline: neworder.Timeline,
               population_file: str,
               fertility_file: str,
               mortality_file: str,
               in_migration_file: str,
               out_migration_file: str):

    super().__init__(timeline, neworder.MonteCarlo.deterministic_identical_stream)

    # extract the local authority code from the filename
    self.lad = os.path.basename(population_file).split("_")[0]

    self.population = pd.read_csv(population_file)
    self.population.set_index(neworder.df.unique_index(len(self.population)), inplace=True, drop=True)

    # these datasets use a multiindex of age, gender and ethnicity
    # out migration is a hazard rate
    # in migration is the intensity of a Poisson process (not a hazard on existing residents!)
    self.fertility = pd.read_csv(fertility_file, index_col=[0,1,2])
    self.mortality = pd.read_csv(mortality_file, index_col=[0,1,2])
    self.in_migration = pd.read_csv(in_migration_file, index_col=[0,1,2])
    self.out_migration = pd.read_csv(out_migration_file, index_col=[0,1,2])

    # make gender and age categorical
    self.population.DC1117EW_C_AGE = self.population.DC1117EW_C_AGE.astype("category")
    self.population.DC1117EW_C_SEX = self.population.DC1117EW_C_SEX.astype("category")

    # actual age is randomised within the bound of the category (NB category values are age +1)
    self.population["Age"] = self.population.DC1117EW_C_AGE.astype(int) - self.mc.ustream(len(self.population))

    self.fig = None
    self.plot_pyramid()

  def step(self) -> None:
    self.births()
    self.deaths()
    self.migrations()
    self.age()

  def finalise(self) -> None:
    pass

  def age(self) -> None:
    # Increment age by timestep and update census age category (used for ASFR/ASMR lookup)
    # NB census age category max value is 86 (=85 or over)
    self.population.Age = self.population.Age + 1 # NB self.timeline.dt wont be exactly 1 as based on an average length year of 365.2475 days
    # reconstruct census age group
    self.population.DC1117EW_C_AGE = np.clip(np.ceil(self.population.Age), 1, 86).astype(int)

  def births(self) -> None:
    # First consider only females
    females = self.population[self.population.DC1117EW_C_SEX == 2].copy()

    # Now map the appropriate fertility rate to each female
    # might be a more efficient way of generating this array
    rates = females.join(self.fertility, on=["NewEthpop_ETH", "DC1117EW_C_SEX", "DC1117EW_C_AGE"])["Rate"].values
    # Then randomly determine if a birth occurred
    h = self.mc.hazard(rates * self.timeline.dt)

    # The babies are a clone of the new mothers, with with changed PID, reset age and randomised gender (keeping location and ethnicity)
    newborns = females[h == 1].copy()
    newborns.set_index(neworder.df.unique_index(len(newborns)), inplace=True, drop=True)
    newborns.Age = self.mc.ustream(len(newborns)) - 1.0 # born within the *next* 12 months (ageing step has yet to happen)
    newborns.DC1117EW_C_AGE = 1 # this is 0-1 in census category
    newborns.DC1117EW_C_SEX = 1 + self.mc.hazard(0.5, len(newborns)).astype(int) # 1=M, 2=F

    self.population = pd.concat((self.population, newborns))

  def deaths(self) -> None:
    # Map the appropriate mortality rate to each person
    # might be a more efficient way of generating this array
    rates = self.population.join(self.mortality, on=["NewEthpop_ETH", "DC1117EW_C_SEX", "DC1117EW_C_AGE"])["Rate"]

    # Then randomly determine if a death occurred
    h = self.mc.hazard(rates.values * self.timeline.dt)

    # Finally remove deceased from table
    self.population = self.population[h!=1]

  def migrations(self) -> None:

    # immigration:
    # - sample counts of migrants according to intensity
    # - append result to population

    self.in_migration["count"] = self.mc.counts(self.in_migration.Rate.values, self.timeline.dt)
    h_in = self.in_migration.loc[self.in_migration.index.repeat(self.in_migration["count"])].drop(["Rate", "count"], axis=1)
    h_in = h_in.reset_index().set_index(neworder.df.unique_index(len(h_in)))
    h_in["Area"] = self.lad
    # randomly sample exact age according to age group
    h_in["Age"] = h_in.DC1117EW_C_AGE - self.mc.ustream(len(h_in))

    # internal emigration:
    out_rates = self.population.join(self.out_migration, on=["NewEthpop_ETH", "DC1117EW_C_SEX", "DC1117EW_C_AGE"])["Rate"].values
    h_out = self.mc.hazard(out_rates * self.timeline.dt)
    # add incoming & remove outgoing migrants
    self.population = pd.concat((self.population[h_out!=1], h_in))

    # record net migration
    self.in_out = (len(h_in), h_out.sum())

  def mean_age(self) -> float:
    return self.population.Age.mean()

  def gender_split(self) -> float:
    # this is % female
    return self.population.DC1117EW_C_SEX.mean() - 1.0

  def size(self) -> int:
    return len(self.population)

  def check(self) -> bool:
    """ State of the nation """
    # check no duplicated unique indices
    if len(self.population[self.population.index.duplicated(keep=False)]):
      neworder.log("Duplicate indices found")
      return False
    # Valid ETH, SEX, AGE
    if not np.array_equal(sorted(self.population.DC1117EW_C_SEX.unique()), [1,2]):
      neworder.log("invalid gender value")
      return False
    if min(self.population.DC1117EW_C_AGE.unique().astype(int)) < 1 or \
      max(self.population.DC1117EW_C_AGE.unique().astype(int)) > 86:
      neworder.log("invalid categorical age value")
      return False
    # this can go below zero for cat 86+
    if (self.population.DC1117EW_C_AGE - self.population.Age).max() >= 1.0:
      neworder.log("invalid fractional age value")
      return False

    neworder.log("check OK: time={} size={} mean_age={:.2f}, pct_female={:.2f} net_migration={} ({}-{})" \
      .format(self.timeline.time.date(), self.size(), self.mean_age(), 100.0 * self.gender_split(),
      self.in_out[0] - self.in_out[1], self.in_out[0], self.in_out[1]))

    # if all is ok, plot the data
    self.plot_pyramid()

    return True # Faith

  def plot_pyramid(self) -> None:
    a = np.arange(86)
    s = self.population.groupby(by=["DC1117EW_C_SEX", "DC1117EW_C_AGE"])["DC1117EW_C_SEX"].count()
    m = s[s.index.isin([1], level="DC1117EW_C_SEX")].values
    f = s[s.index.isin([2], level="DC1117EW_C_SEX")].values

    if self.fig is None:
      self.fig, self.axes, self.mbar, self.fbar = pyramid.plot(a, m, f)
    else:
      # NB self.timeline.time is now the time at the *end* of the timestep since this is called from check() (as opposed to step())
      self.mbar, self.fbar = pyramid.update(str(self.timeline.time.year), self.fig, self.axes, self.mbar, self.fbar, a, m, f)

[file: examples/people/population.py]

Execution¶

To run the model:

python examples/people/model.py

Output¶

The model displays an animated population pyramid for the entire region being modelled (see above), plus some logging output with various statistics:

...
[py 0/1] check OK: time=2045-01-01 size=325865 mean_age=41.91, pct_female=49.46 net_migration=1202.0 (20320-19118.0)
[py 0/1] check OK: time=2046-01-01 size=326396 mean_age=41.91, pct_female=49.41 net_migration=787.0 (20007-19220.0)
[py 0/1] check OK: time=2047-01-01 size=327006 mean_age=41.88, pct_female=49.37 net_migration=921.0 (20252-19331.0)
[py 0/1] check OK: time=2048-01-01 size=327566 mean_age=41.87, pct_female=49.34 net_migration=780.0 (19924-19144.0)
[py 0/1] check OK: time=2049-01-01 size=328114 mean_age=41.84, pct_female=49.31 net_migration=824.0 (20140-19316.0)
[py 0/1] check OK: time=2050-01-01 size=328740 mean_age=41.81, pct_female=49.26 net_migration=826.0 (20218-19392.0)
[py 0/1] check OK: time=2051-01-01 size=329717 mean_age=41.77, pct_female=49.30 net_migration=1130.0 (20175-19045.0)
[py 0/1] run time = 17.19s

This 40 year simulation of an initial population of about 280,000 runs in under 20s on a single core of a medium-spec machine.