2020 excess mortality & voting patterns in CH

Mortality data preparation (Wohnort based)

Source

Custom file delivered by BfS covering 2021 period and standardizing municipalities to Jul 2021.

Using Wohnort (the municipality where the person lived) as unit of analysis.

w_deaths_2015_2021 <- read_csv("data-raw/BfS-closed/monthly_deaths/13OCT21_Lieferung_Panczak_Wohnort.zip", 
                               col_types = cols(year = col_integer(), 
                                                month = col_integer(), 
                                                COUNT = col_integer()),
                               locale = readr::locale(encoding = "latin1")) %>% 
  mutate(sex = if_else(sex == "F", "Female", "Male")) %>% 
  select(-canton, -community, -nationality) %>% 
  rename(GMDNAME = wohnort,
         canton = wohnkanton) %>% 
  # hyphen in name >> doesnt link to spatial data
  mutate(GMDNAME = if_else(GMDNAME == "La Grande-Béroche", "La Grande Béroche", GMDNAME)) %>% 
  # name change? >> resolved with Cordula
  mutate(GMDNAME = if_else(GMDNAME == "Klosters-Serneus", "Klosters", GMDNAME))

Municipality updates

To reach the state of 2022-01-01.

w_deaths_2015_2021 %<>% 
  # Essertes merge
  mutate(GMDNAME = if_else(GMDNAME == "Essertes", "Oron", GMDNAME)) %>% 
  # Zurzach
  mutate(GMDNAME = if_else(GMDNAME == "Bad Zurzach", "Zurzach", GMDNAME)) %>% 
  mutate(GMDNAME = if_else(GMDNAME == "Baldingen", "Zurzach", GMDNAME)) %>% 
  mutate(GMDNAME = if_else(GMDNAME == "Böbikon", "Zurzach", GMDNAME)) %>% 
  mutate(GMDNAME = if_else(GMDNAME == "Kaiserstuhl", "Zurzach", GMDNAME)) %>%
  mutate(GMDNAME = if_else(GMDNAME == "Rekingen (AG)", "Zurzach", GMDNAME)) %>%
  mutate(GMDNAME = if_else(GMDNAME == "Rietheim", "Zurzach", GMDNAME)) %>%
  mutate(GMDNAME = if_else(GMDNAME == "Rümikon", "Zurzach", GMDNAME)) %>% 
  mutate(GMDNAME = if_else(GMDNAME == "Wislikofen", "Zurzach", GMDNAME)) %>% 
  # Böztal merge
  mutate(GMDNAME = if_else(GMDNAME == "Bözen", "Böztal", GMDNAME)) %>% 
  mutate(GMDNAME = if_else(GMDNAME == "Effingen", "Böztal", GMDNAME)) %>% 
  mutate(GMDNAME = if_else(GMDNAME == "Elfingen", "Böztal", GMDNAME)) %>% 
  mutate(GMDNAME = if_else(GMDNAME == "Hornussen", "Böztal", GMDNAME)) %>% 
  # Blonay - Saint-Légier merge
  mutate(GMDNAME = if_else(GMDNAME == "Blonay", "Blonay - Saint-Légier", GMDNAME)) %>% 
  mutate(GMDNAME = if_else(GMDNAME == "Saint-Légier-La Chiésaz", "Blonay - Saint-Légier", GMDNAME)) %>% 
  # Murten merge
  mutate(GMDNAME = if_else(GMDNAME == "Galmiz", "Murten", GMDNAME)) %>% 
  mutate(GMDNAME = if_else(GMDNAME == "Gempenach", "Murten", GMDNAME)) %>% 
  mutate(GMDNAME = if_else(GMDNAME == "Clavaleyres", "Murten", GMDNAME)) %>% 
  # Schwende-Rüte merge
  mutate(GMDNAME = if_else(GMDNAME == "Schwende", "Schwende-Rüte", GMDNAME)) %>% 
  mutate(GMDNAME = if_else(GMDNAME == "Rüte", "Schwende-Rüte", GMDNAME)) %>% 
  # Val Mara merge
  mutate(GMDNAME = if_else(GMDNAME == "Melano", "Val Mara", GMDNAME)) %>% 
  mutate(GMDNAME = if_else(GMDNAME == "Maroggia", "Val Mara", GMDNAME)) %>% 
  mutate(GMDNAME = if_else(GMDNAME == "Rovio", "Val Mara", GMDNAME))

Monthly aggregation

w_deaths_2015_2021 %<>% 
  group_by(year, month, canton, GMDNAME, agegroup, sex) %>% 
  summarise(deaths = sum(COUNT)) %>% 
  ungroup()

Description of data

The data contains 267397 observations of the following 7 variables:

  - year: n = 267397, Mean = 2017.81, SD = 1.89, Median = 2018.00, MAD = 2.97,
range: [2015, 2021], Skewness = 0.03, Kurtosis = -1.19, 0% missing
  - month: n = 267397, Mean = 6.23, SD = 3.53, Median = 6.00, MAD = 4.45, range:
[1, 12], Skewness = 0.12, Kurtosis = -1.25, 0% missing
  - canton: 26 entries, such as BE (14.36%); ZH (12.24%); AG (9.35%) and 23
others (0 missing)
  - GMDNAME: 2143 entries, such as Zürich (0.46%); Genève (0.39%); Basel (0.38%)
and 2140 others (0 missing)
  - agegroup: 10 entries, such as [80-90) (31.11%); [90-116) (22.49%); [70-80)
(21.04%) and 7 others (0 missing)
  - sex: 2 entries, such as Male (51.58%); Female (48.42%); NA (0 missing)
  - deaths: n = 267397, Mean = 1.66, SD = 2.58, Median = 1.00, MAD = 0.00, range:
[1, 124], Skewness = 13.10, Kurtosis = 266.87, 0% missing

Data summary
Name	w_deaths_2015_2021
Number of rows	267397
Number of columns	7
_______________________
Column type frequency:
character	4
numeric	3
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
canton	1	2	2	26
GMDNAME	1	2	27	2143
agegroup	1	6	8	10
sex	1	4	6	2

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
year	1	2017.81	1.89	2015	2016	2018	2019	2021	▇▅▅▅▆
month	1	6.23	3.53	1	3	6	9	12	▇▅▅▅▇
deaths	1	1.66	2.58	1	1	1	2	124	▇▁▁▁▁

445,199 deaths by year:

# A tibble: 7 × 2
   year Deaths
  <int> <chr> 
1  2015 67,617
2  2016 64,948
3  2017 66,982
4  2018 67,096
5  2019 67,793
6  2020 76,181
7  2021 34,582

Codes of geographical areas

There were 2,145 municipalities on the 2022-01-01 in CH (mark #28 tho!).

2 had no deaths recorded during the study period.

# A tibble: 2 × 2
  GMDNAME   KTNAME
  <chr>     <chr> 
1 Geltwil   AG    
2 Rossenges VD

Age groups

Original age groups:

agegroup <character> 
# total N=267397 valid N=267397 mean=8.25 sd=1.65

Value    |     N | Raw % | Valid % | Cum. %
-------------------------------------------
[0-10)   |  2295 |  0.86 |    0.86 |   0.86
[10-20)  |   824 |  0.31 |    0.31 |   1.17
[20-30)  |  2092 |  0.78 |    0.78 |   1.95
[30-40)  |  3096 |  1.16 |    1.16 |   3.11
[40-50)  |  7299 |  2.73 |    2.73 |   5.84
[50-60)  | 18711 |  7.00 |    7.00 |  12.83
[60-70)  | 33492 | 12.53 |   12.53 |  25.36
[70-80)  | 56264 | 21.04 |   21.04 |  46.40
[80-90)  | 83196 | 31.11 |   31.11 |  77.51
[90-116) | 60128 | 22.49 |   22.49 | 100.00
<NA>     |     0 |  0.00 |    <NA> |   <NA>

Were simplified to <40, 40-59 , 60-69, 70-79 & 80+

w_deaths_2015_2021 %<>% 
  mutate(
    age = case_when(
      agegroup == "[0-10)" ~   "<40",
      agegroup == "[10-20)" ~  "<40",
      agegroup == "[20-30)" ~  "<40",
      agegroup == "[30-40)" ~  "<40",
      
      agegroup == "[40-50)" ~  "40-49",
      agegroup == "[50-60)" ~  "50-59",
      agegroup == "[60-70)" ~  "60-69",
      agegroup == "[70-80)" ~  "70-79",
      
      agegroup == "[80-90)" ~  "80+",
      agegroup == "[90-116)" ~ "80+" ,
      TRUE ~                    agegroup)
  ) %>% 
  select(-agegroup) %>% 
  group_by(GMDNAME, year, month, canton, sex, age) %>% 
  summarise(deaths = sum(deaths)) %>% 
  ungroup()

age <character> 
# total N=235870 valid N=235870 mean=4.95 sd=1.32

Value |      N | Raw % | Valid % | Cum. %
-----------------------------------------
<40   |   7736 |  3.28 |    3.28 |   3.28
40-49 |   7299 |  3.09 |    3.09 |   6.37
50-59 |  18711 |  7.93 |    7.93 |  14.31
60-69 |  33492 | 14.20 |   14.20 |  28.51
70-79 |  56264 | 23.85 |   23.85 |  52.36
80+   | 112368 | 47.64 |   47.64 | 100.00
<NA>  |      0 |  0.00 |    <NA> |   <NA>

Distribution of death counts

Obviously highly skewed.
Note: No zeroes here yet! No pop denominator either!

Filling with zeroes

All possible combinations of year, month, canton, municipality, age & sex are used and then empty strata are filled with zeroes.

This expansion also includes municipalities with zero deaths described above.

w_deaths_2015_2021_exp <- 
  raum %>% 
  select(GMDNAME) %>% 
  distinct() %>% 
  left_join(w_deaths_2015_2021) %>% 
  select(-deaths, -canton) %>% 
  # avoiding NAs in three communities without cases
  mutate(
    year = if_else(is.na(year), as.integer(2015), year),
    month = if_else(is.na(month), as.integer(1), month),
    sex = if_else(is.na(sex), "Male", sex),
    age = if_else(is.na(age), "<40", age)
  ) %>% 
  expand(GMDNAME, year, month, sex, age) %>%
  left_join(w_deaths_2015_2021) %>% 
  replace_na(list(deaths = 0)) %>% 
  mutate(date = ymd(paste(year, month, "1", sep = " "))) %>% 
  relocate(date, .after = month) %>% 
  left_join(raum) %>% 
  select(-canton) %>% 
  relocate(GMDNR, .before = GMDNAME) %>% 
  relocate(KTNR, id_kt, KTNAME, .after = GMDNAME) %>% 
  arrange(GMDNR, year, month, age, sex) %>% 
  filter( !(year == 2021 & month > 6))

That obviously increases the size of the dataset from 235,870 to 2,007,720 (ie. 2,163 municipalities each with 2,016 data points * 7 years * 12 months * 6 age groups * 2 sexes).

Zero cells

Needless to say that creates large amount of zero cells:

deaths == 0 <lgl> 
# total N=2007720 valid N=2007720 mean=0.88 sd=0.32

Value |       N | Raw % | Valid % | Cum. %
------------------------------------------
FALSE |  235870 | 11.75 |   11.75 |  11.75
TRUE  | 1771850 | 88.25 |   88.25 | 100.00
<NA>  |       0 |  0.00 |    <NA> |   <NA>

Situation only marginally improves with exclusion of <40 age group stratum:

deaths == 0 <lgl> 
# total N=1673100 valid N=1673100 mean=0.86 sd=0.34

Value |       N | Raw % | Valid % | Cum. %
------------------------------------------
FALSE |  228134 | 13.64 |   13.64 |  13.64
TRUE  | 1444966 | 86.36 |   86.36 | 100.00
<NA>  |       0 |  0.00 |    <NA> |   <NA>

Examples of time series

Municipalities Bern & Meienried:

All time series

Yearly aggregation

Excluding 2021.

w_deaths_2015_2020_year <- w_deaths_2015_2021_exp %>% 
  filter(year < 2021) %>% 
  group_by(year, KTNAME, KTNR, id_kt, GMDNR, GMDNAME, 
           ARGRNR, ARGRNAME, 
           ARNR, ARNAME, border, 
           sex, age) %>% 
  summarise(deaths = sum(deaths)) %>% 
  ungroup()

Miscelaneous ID variables for INLA

gg <- read_rds("data/BfS/gg.Rds") %>% 
  sf::st_drop_geometry() %>% 
  select(GMDNR, id_space)

w_deaths_2015_2020_year %<>% 
  left_join(gg) %>% 
  mutate(
    id_age = as.integer(as.factor(age)),
    observed = deaths, 
    deaths = if_else(year == 2020, NA_integer_, deaths)) %>% 
  relocate(observed, .after = deaths) %>% 
  relocate(id_space, .after = GMDNR) %>% 
  relocate(id_age, .after = age)

Description of data

The data contains 154440 observations of the following 14 variables:

  - year: n = 154440, Mean = 2017.50, SD = 1.71, Median = 2017.50, MAD = 2.22,
range: [2015, 2020], Skewness = 0.00, Kurtosis = -1.27, 0% missing
  - KTNAME: 26 entries, such as BE (15.76%); VD (13.99%); AG (9.32%) and 23
others (0 missing)
  - KTNR: n = 154440, Mean = 13.61, SD = 8.49, Median = 17.00, MAD = 8.90, range:
[1, 26], Skewness = -0.32, Kurtosis = -1.46, 0% missing
  - GMDNR: n = 154440, Mean = 3289.82, SD = 2133.44, Median = 3296.00, MAD =
3224.65, range: [1, 6810], Skewness = -0.02, Kurtosis = -1.36, 0% missing
  - GMDNAME: 2145 entries, such as Aadorf (0.05%); Aarau (0.05%); Aarberg (0.05%)
and 2142 others (0 missing)
  - ARGRNR: 16 entries, such as 12 (14.17%); 02 (12.26%); 05 (8.58%) and 13
others (0 missing)
  - ARGRNAME: 16 entries, such as Region Zürich (14.17%); Region Lausanne
(12.26%); Region Biel–Jura (8.58%) and 13 others (0 missing)
  - ARNR: 101 entries, such as 10020 (4.38%); 12010 (3.64%); 02030 (3.03%) and 98
others (0 missing)
  - ARNAME: 101 entries, such as Aarau-Olten (4.38%); Baden (3.64%); Solothurn
(3.03%) and 98 others (0 missing)
  - border: n = 154440, Mean = 0.20, SD = 0.40, Median = 0.00, MAD = 0.00, range:
[0, 1], Skewness = 1.53, Kurtosis = 0.35, 0% missing
  - sex: 2 entries, such as Female (50.00%); Male (50.00%); NA (0 missing)
  - age: 6 entries, such as <40 (16.67%); 40-49 (16.67%); 50-59 (16.67%) and 3
others (0 missing)
  - deaths: n = 154440, Mean = 2.60, SD = 15.76, Median = , MAD = 0.00, range:
[0, 1492], Skewness = 45.25, Kurtosis = 3173.33, 16.67% missing
  - observed: n = 154440, Mean = 2.66, SD = 16.03, Median = 0.00, MAD = 0.00,
range: [0, 1493], Skewness = 44.55, Kurtosis = 3087.82, 0% missing

Data summary
Name	w_deaths_2015_2020_year
Number of rows	154440
Number of columns	17
_______________________
Column type frequency:
character	8
numeric	9
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
KTNAME	1	2	2	26
GMDNAME	1	2	27	2145
ARGRNR	1	2	2	16
ARGRNAME	1	8	16	16
ARNR	1	5	5	101
ARNAME	1	3	30	101
sex	1	4	6	2
age	1	3	5	6

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
year	0	1.00	2017.50	1.71	2015	2016	2017.5	2019	2020	▇▃▃▃▃
KTNR	0	1.00	13.61	8.49	1	3	17.0	22	26	▇▃▂▇▇
id_kt	0	1.00	13.61	8.49	1	3	17.0	22	26	▇▃▂▇▇
GMDNR	0	1.00	3289.82	2133.44	1	1067	3296.0	5411	6810	▇▃▅▅▇
id_space	0	1.00	1073.00	619.21	1	537	1073.0	1609	2145	▇▇▇▇▇
border	0	1.00	0.20	0.40	0	0	0.0	0	1	▇▁▁▁▂
id_age	0	1.00	3.50	1.71	1	2	3.5	5	6	▇▃▃▃▃
deaths	25740	0.83	2.60	15.76	0	0	0.0	2	1492	▇▁▁▁▁
observed	0	1.00	2.66	16.03	0	0	0.0	2	1493	▇▁▁▁▁

That obviously decreases the size of the dataset from 2,007,720 to 154,440 (ie. 2,163 municipalities each with 2,016 data points * 6 years * 6 age groups * 2 sexes).

Zero cells

Less but still large amount of zero cells:

deaths == 0 <lgl> 
# total N=154440 valid N=128700 mean=0.54 sd=0.50

Value |     N | Raw % | Valid % | Cum. %
----------------------------------------
FALSE | 58908 | 38.14 |   45.77 |  45.77
TRUE  | 69792 | 45.19 |   54.23 | 100.00
<NA>  | 25740 | 16.67 |    <NA> |   <NA>

Situation only marginally improves with exclusion of <40 age group stratum:

deaths == 0 <lgl> 
# total N=128700 valid N=107250 mean=0.49 sd=0.50

Value |     N | Raw % | Valid % | Cum. %
----------------------------------------
FALSE | 54894 | 42.65 |   51.18 |  51.18
TRUE  | 52356 | 40.68 |   48.82 | 100.00
<NA>  | 21450 | 16.67 |    <NA> |   <NA>

2020 excess mortality & voting patterns in CH

Mortality data preparation (Wohnort based)

Source

Municipality updates

Monthly aggregation

Description of data

Codes of geographical areas

Age groups

Distribution of death counts

Filling with zeroes

Zero cells

Examples of time series

Municipalities Bern & Meienried:

All time series

Yearly aggregation

Miscelaneous ID variables for INLA

Description of data

Zero cells

Examples of Bern & Meienried: