2020 excess mortality & voting patterns in CH
Mortality data preparation (Wohnort based)
Source
Custom file delivered by BfS covering 2021 period and standardizing municipalities to Jul 2021.
Using Wohnort (the municipality where the person lived) as unit of analysis.
<- read_csv("data-raw/BfS-closed/monthly_deaths/13OCT21_Lieferung_Panczak_Wohnort.zip",
w_deaths_2015_2021 col_types = cols(year = col_integer(),
month = col_integer(),
COUNT = col_integer()),
locale = readr::locale(encoding = "latin1")) %>%
mutate(sex = if_else(sex == "F", "Female", "Male")) %>%
select(-canton, -community, -nationality) %>%
rename(GMDNAME = wohnort,
canton = wohnkanton) %>%
# hyphen in name >> doesnt link to spatial data
mutate(GMDNAME = if_else(GMDNAME == "La Grande-Béroche", "La Grande Béroche", GMDNAME)) %>%
# name change? >> resolved with Cordula
mutate(GMDNAME = if_else(GMDNAME == "Klosters-Serneus", "Klosters", GMDNAME))
Municipality updates
To reach the state of 2022-01-01
.
%<>%
w_deaths_2015_2021 # Essertes merge
mutate(GMDNAME = if_else(GMDNAME == "Essertes", "Oron", GMDNAME)) %>%
# Zurzach
mutate(GMDNAME = if_else(GMDNAME == "Bad Zurzach", "Zurzach", GMDNAME)) %>%
mutate(GMDNAME = if_else(GMDNAME == "Baldingen", "Zurzach", GMDNAME)) %>%
mutate(GMDNAME = if_else(GMDNAME == "Böbikon", "Zurzach", GMDNAME)) %>%
mutate(GMDNAME = if_else(GMDNAME == "Kaiserstuhl", "Zurzach", GMDNAME)) %>%
mutate(GMDNAME = if_else(GMDNAME == "Rekingen (AG)", "Zurzach", GMDNAME)) %>%
mutate(GMDNAME = if_else(GMDNAME == "Rietheim", "Zurzach", GMDNAME)) %>%
mutate(GMDNAME = if_else(GMDNAME == "Rümikon", "Zurzach", GMDNAME)) %>%
mutate(GMDNAME = if_else(GMDNAME == "Wislikofen", "Zurzach", GMDNAME)) %>%
# Böztal merge
mutate(GMDNAME = if_else(GMDNAME == "Bözen", "Böztal", GMDNAME)) %>%
mutate(GMDNAME = if_else(GMDNAME == "Effingen", "Böztal", GMDNAME)) %>%
mutate(GMDNAME = if_else(GMDNAME == "Elfingen", "Böztal", GMDNAME)) %>%
mutate(GMDNAME = if_else(GMDNAME == "Hornussen", "Böztal", GMDNAME)) %>%
# Blonay - Saint-Légier merge
mutate(GMDNAME = if_else(GMDNAME == "Blonay", "Blonay - Saint-Légier", GMDNAME)) %>%
mutate(GMDNAME = if_else(GMDNAME == "Saint-Légier-La Chiésaz", "Blonay - Saint-Légier", GMDNAME)) %>%
# Murten merge
mutate(GMDNAME = if_else(GMDNAME == "Galmiz", "Murten", GMDNAME)) %>%
mutate(GMDNAME = if_else(GMDNAME == "Gempenach", "Murten", GMDNAME)) %>%
mutate(GMDNAME = if_else(GMDNAME == "Clavaleyres", "Murten", GMDNAME)) %>%
# Schwende-Rüte merge
mutate(GMDNAME = if_else(GMDNAME == "Schwende", "Schwende-Rüte", GMDNAME)) %>%
mutate(GMDNAME = if_else(GMDNAME == "Rüte", "Schwende-Rüte", GMDNAME)) %>%
# Val Mara merge
mutate(GMDNAME = if_else(GMDNAME == "Melano", "Val Mara", GMDNAME)) %>%
mutate(GMDNAME = if_else(GMDNAME == "Maroggia", "Val Mara", GMDNAME)) %>%
mutate(GMDNAME = if_else(GMDNAME == "Rovio", "Val Mara", GMDNAME))
Monthly aggregation
%<>%
w_deaths_2015_2021 group_by(year, month, canton, GMDNAME, agegroup, sex) %>%
summarise(deaths = sum(COUNT)) %>%
ungroup()
Description of data
The data contains 267397 observations of the following 7 variables:
- year: n = 267397, Mean = 2017.81, SD = 1.89, Median = 2018.00, MAD = 2.97,
range: [2015, 2021], Skewness = 0.03, Kurtosis = -1.19, 0% missing
- month: n = 267397, Mean = 6.23, SD = 3.53, Median = 6.00, MAD = 4.45, range:
[1, 12], Skewness = 0.12, Kurtosis = -1.25, 0% missing
- canton: 26 entries, such as BE (14.36%); ZH (12.24%); AG (9.35%) and 23
others (0 missing)
- GMDNAME: 2143 entries, such as Zürich (0.46%); Genève (0.39%); Basel (0.38%)
and 2140 others (0 missing)
- agegroup: 10 entries, such as [80-90) (31.11%); [90-116) (22.49%); [70-80)
(21.04%) and 7 others (0 missing)
- sex: 2 entries, such as Male (51.58%); Female (48.42%); NA (0 missing)
- deaths: n = 267397, Mean = 1.66, SD = 2.58, Median = 1.00, MAD = 0.00, range:
[1, 124], Skewness = 13.10, Kurtosis = 266.87, 0% missing
Name | w_deaths_2015_2021 |
Number of rows | 267397 |
Number of columns | 7 |
_______________________ | |
Column type frequency: | |
character | 4 |
numeric | 3 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
canton | 0 | 1 | 2 | 2 | 0 | 26 | 0 |
GMDNAME | 0 | 1 | 2 | 27 | 0 | 2143 | 0 |
agegroup | 0 | 1 | 6 | 8 | 0 | 10 | 0 |
sex | 0 | 1 | 4 | 6 | 0 | 2 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
year | 0 | 1 | 2017.81 | 1.89 | 2015 | 2016 | 2018 | 2019 | 2021 | ▇▅▅▅▆ |
month | 0 | 1 | 6.23 | 3.53 | 1 | 3 | 6 | 9 | 12 | ▇▅▅▅▇ |
deaths | 0 | 1 | 1.66 | 2.58 | 1 | 1 | 1 | 2 | 124 | ▇▁▁▁▁ |
445,199 deaths by year:
# A tibble: 7 × 2
year Deaths
<int> <chr>
1 2015 67,617
2 2016 64,948
3 2017 66,982
4 2018 67,096
5 2019 67,793
6 2020 76,181
7 2021 34,582
Codes of geographical areas
There were 2,145 municipalities on the 2022-01-01
in CH
(mark #28 tho!).
2 had no deaths recorded during the study period.
# A tibble: 2 × 2
GMDNAME KTNAME
<chr> <chr>
1 Geltwil AG
2 Rossenges VD
Age groups
Original age groups:
agegroup <character>
# total N=267397 valid N=267397 mean=8.25 sd=1.65
Value | N | Raw % | Valid % | Cum. %
-------------------------------------------
[0-10) | 2295 | 0.86 | 0.86 | 0.86
[10-20) | 824 | 0.31 | 0.31 | 1.17
[20-30) | 2092 | 0.78 | 0.78 | 1.95
[30-40) | 3096 | 1.16 | 1.16 | 3.11
[40-50) | 7299 | 2.73 | 2.73 | 5.84
[50-60) | 18711 | 7.00 | 7.00 | 12.83
[60-70) | 33492 | 12.53 | 12.53 | 25.36
[70-80) | 56264 | 21.04 | 21.04 | 46.40
[80-90) | 83196 | 31.11 | 31.11 | 77.51
[90-116) | 60128 | 22.49 | 22.49 | 100.00
<NA> | 0 | 0.00 | <NA> | <NA>
Were simplified to <40
, 40-59
,
60-69
, 70-79
& 80+
%<>%
w_deaths_2015_2021 mutate(
age = case_when(
== "[0-10)" ~ "<40",
agegroup == "[10-20)" ~ "<40",
agegroup == "[20-30)" ~ "<40",
agegroup == "[30-40)" ~ "<40",
agegroup
== "[40-50)" ~ "40-49",
agegroup == "[50-60)" ~ "50-59",
agegroup == "[60-70)" ~ "60-69",
agegroup == "[70-80)" ~ "70-79",
agegroup
== "[80-90)" ~ "80+",
agegroup == "[90-116)" ~ "80+" ,
agegroup TRUE ~ agegroup)
%>%
) select(-agegroup) %>%
group_by(GMDNAME, year, month, canton, sex, age) %>%
summarise(deaths = sum(deaths)) %>%
ungroup()
age <character>
# total N=235870 valid N=235870 mean=4.95 sd=1.32
Value | N | Raw % | Valid % | Cum. %
-----------------------------------------
<40 | 7736 | 3.28 | 3.28 | 3.28
40-49 | 7299 | 3.09 | 3.09 | 6.37
50-59 | 18711 | 7.93 | 7.93 | 14.31
60-69 | 33492 | 14.20 | 14.20 | 28.51
70-79 | 56264 | 23.85 | 23.85 | 52.36
80+ | 112368 | 47.64 | 47.64 | 100.00
<NA> | 0 | 0.00 | <NA> | <NA>
Distribution of death counts
Obviously highly skewed.
Note: No zeroes here yet! No pop denominator
either!
Filling with zeroes
All possible combinations of
year, month, canton, municipality, age & sex
are used
and then empty strata are filled with zeroes.
This expansion also includes municipalities with zero deaths described above.
<-
w_deaths_2015_2021_exp %>%
raum select(GMDNAME) %>%
distinct() %>%
left_join(w_deaths_2015_2021) %>%
select(-deaths, -canton) %>%
# avoiding NAs in three communities without cases
mutate(
year = if_else(is.na(year), as.integer(2015), year),
month = if_else(is.na(month), as.integer(1), month),
sex = if_else(is.na(sex), "Male", sex),
age = if_else(is.na(age), "<40", age)
%>%
) expand(GMDNAME, year, month, sex, age) %>%
left_join(w_deaths_2015_2021) %>%
replace_na(list(deaths = 0)) %>%
mutate(date = ymd(paste(year, month, "1", sep = " "))) %>%
relocate(date, .after = month) %>%
left_join(raum) %>%
select(-canton) %>%
relocate(GMDNR, .before = GMDNAME) %>%
relocate(KTNR, id_kt, KTNAME, .after = GMDNAME) %>%
arrange(GMDNR, year, month, age, sex) %>%
filter( !(year == 2021 & month > 6))
That obviously increases the size of the dataset from 235,870 to 2,007,720 (ie. 2,163 municipalities each with 2,016 data points * 7 years * 12 months * 6 age groups * 2 sexes).
Zero cells
Needless to say that creates large amount of zero cells:
deaths == 0 <lgl>
# total N=2007720 valid N=2007720 mean=0.88 sd=0.32
Value | N | Raw % | Valid % | Cum. %
------------------------------------------
FALSE | 235870 | 11.75 | 11.75 | 11.75
TRUE | 1771850 | 88.25 | 88.25 | 100.00
<NA> | 0 | 0.00 | <NA> | <NA>
Situation only marginally improves with exclusion of
<40
age group stratum:
deaths == 0 <lgl>
# total N=1673100 valid N=1673100 mean=0.86 sd=0.34
Value | N | Raw % | Valid % | Cum. %
------------------------------------------
FALSE | 228134 | 13.64 | 13.64 | 13.64
TRUE | 1444966 | 86.36 | 86.36 | 100.00
<NA> | 0 | 0.00 | <NA> | <NA>
Examples of time series
Municipalities Bern & Meienried:
All time series
Yearly aggregation
Excluding 2021.
<- w_deaths_2015_2021_exp %>%
w_deaths_2015_2020_year filter(year < 2021) %>%
group_by(year, KTNAME, KTNR, id_kt, GMDNR, GMDNAME,
ARGRNR, ARGRNAME,
ARNR, ARNAME, border, %>%
sex, age) summarise(deaths = sum(deaths)) %>%
ungroup()
Miscelaneous ID variables for INLA
<- read_rds("data/BfS/gg.Rds") %>%
gg ::st_drop_geometry() %>%
sfselect(GMDNR, id_space)
%<>%
w_deaths_2015_2020_year left_join(gg) %>%
mutate(
id_age = as.integer(as.factor(age)),
observed = deaths,
deaths = if_else(year == 2020, NA_integer_, deaths)) %>%
relocate(observed, .after = deaths) %>%
relocate(id_space, .after = GMDNR) %>%
relocate(id_age, .after = age)
Description of data
The data contains 154440 observations of the following 14 variables:
- year: n = 154440, Mean = 2017.50, SD = 1.71, Median = 2017.50, MAD = 2.22,
range: [2015, 2020], Skewness = 0.00, Kurtosis = -1.27, 0% missing
- KTNAME: 26 entries, such as BE (15.76%); VD (13.99%); AG (9.32%) and 23
others (0 missing)
- KTNR: n = 154440, Mean = 13.61, SD = 8.49, Median = 17.00, MAD = 8.90, range:
[1, 26], Skewness = -0.32, Kurtosis = -1.46, 0% missing
- GMDNR: n = 154440, Mean = 3289.82, SD = 2133.44, Median = 3296.00, MAD =
3224.65, range: [1, 6810], Skewness = -0.02, Kurtosis = -1.36, 0% missing
- GMDNAME: 2145 entries, such as Aadorf (0.05%); Aarau (0.05%); Aarberg (0.05%)
and 2142 others (0 missing)
- ARGRNR: 16 entries, such as 12 (14.17%); 02 (12.26%); 05 (8.58%) and 13
others (0 missing)
- ARGRNAME: 16 entries, such as Region Zürich (14.17%); Region Lausanne
(12.26%); Region Biel–Jura (8.58%) and 13 others (0 missing)
- ARNR: 101 entries, such as 10020 (4.38%); 12010 (3.64%); 02030 (3.03%) and 98
others (0 missing)
- ARNAME: 101 entries, such as Aarau-Olten (4.38%); Baden (3.64%); Solothurn
(3.03%) and 98 others (0 missing)
- border: n = 154440, Mean = 0.20, SD = 0.40, Median = 0.00, MAD = 0.00, range:
[0, 1], Skewness = 1.53, Kurtosis = 0.35, 0% missing
- sex: 2 entries, such as Female (50.00%); Male (50.00%); NA (0 missing)
- age: 6 entries, such as <40 (16.67%); 40-49 (16.67%); 50-59 (16.67%) and 3
others (0 missing)
- deaths: n = 154440, Mean = 2.60, SD = 15.76, Median = , MAD = 0.00, range:
[0, 1492], Skewness = 45.25, Kurtosis = 3173.33, 16.67% missing
- observed: n = 154440, Mean = 2.66, SD = 16.03, Median = 0.00, MAD = 0.00,
range: [0, 1493], Skewness = 44.55, Kurtosis = 3087.82, 0% missing
Name | w_deaths_2015_2020_year |
Number of rows | 154440 |
Number of columns | 17 |
_______________________ | |
Column type frequency: | |
character | 8 |
numeric | 9 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
KTNAME | 0 | 1 | 2 | 2 | 0 | 26 | 0 |
GMDNAME | 0 | 1 | 2 | 27 | 0 | 2145 | 0 |
ARGRNR | 0 | 1 | 2 | 2 | 0 | 16 | 0 |
ARGRNAME | 0 | 1 | 8 | 16 | 0 | 16 | 0 |
ARNR | 0 | 1 | 5 | 5 | 0 | 101 | 0 |
ARNAME | 0 | 1 | 3 | 30 | 0 | 101 | 0 |
sex | 0 | 1 | 4 | 6 | 0 | 2 | 0 |
age | 0 | 1 | 3 | 5 | 0 | 6 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
year | 0 | 1.00 | 2017.50 | 1.71 | 2015 | 2016 | 2017.5 | 2019 | 2020 | ▇▃▃▃▃ |
KTNR | 0 | 1.00 | 13.61 | 8.49 | 1 | 3 | 17.0 | 22 | 26 | ▇▃▂▇▇ |
id_kt | 0 | 1.00 | 13.61 | 8.49 | 1 | 3 | 17.0 | 22 | 26 | ▇▃▂▇▇ |
GMDNR | 0 | 1.00 | 3289.82 | 2133.44 | 1 | 1067 | 3296.0 | 5411 | 6810 | ▇▃▅▅▇ |
id_space | 0 | 1.00 | 1073.00 | 619.21 | 1 | 537 | 1073.0 | 1609 | 2145 | ▇▇▇▇▇ |
border | 0 | 1.00 | 0.20 | 0.40 | 0 | 0 | 0.0 | 0 | 1 | ▇▁▁▁▂ |
id_age | 0 | 1.00 | 3.50 | 1.71 | 1 | 2 | 3.5 | 5 | 6 | ▇▃▃▃▃ |
deaths | 25740 | 0.83 | 2.60 | 15.76 | 0 | 0 | 0.0 | 2 | 1492 | ▇▁▁▁▁ |
observed | 0 | 1.00 | 2.66 | 16.03 | 0 | 0 | 0.0 | 2 | 1493 | ▇▁▁▁▁ |
That obviously decreases the size of the dataset from 2,007,720 to 154,440 (ie. 2,163 municipalities each with 2,016 data points * 6 years * 6 age groups * 2 sexes).
Zero cells
Less but still large amount of zero cells:
deaths == 0 <lgl>
# total N=154440 valid N=128700 mean=0.54 sd=0.50
Value | N | Raw % | Valid % | Cum. %
----------------------------------------
FALSE | 58908 | 38.14 | 45.77 | 45.77
TRUE | 69792 | 45.19 | 54.23 | 100.00
<NA> | 25740 | 16.67 | <NA> | <NA>
Situation only marginally improves with exclusion of
<40
age group stratum:
deaths == 0 <lgl>
# total N=128700 valid N=107250 mean=0.49 sd=0.50
Value | N | Raw % | Valid % | Cum. %
----------------------------------------
FALSE | 54894 | 42.65 | 51.18 | 51.18
TRUE | 52356 | 40.68 | 48.82 | 100.00
<NA> | 21450 | 16.67 | <NA> | <NA>