Analyses of Swisscom data
Grid extras
Swisscom grid coordinates & IDs
Tile definitions were pulled from the API using query_swisscom_heatmaps_api.py.
read_tiles <- function(filename) {
  data <- jsonlite::fromJSON(filename)
  data <- jsonlite::flatten(data$tiles) %>%
    dplyr::as_tibble()
  data$plz <- gsub("grid_|.json", "", filename)
  data$plz <- gsub("data/swisscom/", "", data$plz)
  return(data)
}
doFuture::registerDoFuture()
future::plan("multisession", workers = 8)

grid <- plyr::ldply(.data = fs::dir_ls("data/swisscom/",
                                       regexp = "[0-9][.]json$"),
                    .fun = read_tiles,
                    .id = NULL,
                    .parallel = TRUE) %>%
  as_tibble() %>%
  distinct()
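The PLZ is recovered from the file name with two gsub() calls. The pattern "grid_|.json" contains an unescaped dot, which matches any character. As a minimal sketch of a stricter alternative, assuming the files follow a "grid_<PLZ>.json" naming pattern (the two paths below are hypothetical), the PLZ can be captured with a single anchored expression:

```r
# Hypothetical file paths following the "grid_<PLZ>.json" pattern assumed above
files <- c("data/swisscom/grid_3005.json", "data/swisscom/grid_3202.json")

# Drop the directory with basename(), then capture the digits
# between the "grid_" prefix and the ".json" suffix
plz <- sub("^grid_([0-9]+)[.]json$", "\\1", basename(files))

print(plz)
```

For file names of this shape the result should match what the two gsub() calls produce. The anchored pattern leaves any file name that does not fit the expected shape unchanged instead of partially stripping it.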
The analysis focuses on a test area covering Bern city centre and selected suburbs, which includes the following postal codes (PLZ):
plz <character>
# total N=9916 valid N=9916 mean=3063.57 sd=56.61
Value | N | Raw % | Valid % | Cum. %
---------------------------------------
3005 | 198 | 2.00 | 2.00 | 2.00
3006 | 604 | 6.09 | 6.09 | 8.09
3007 | 254 | 2.56 | 2.56 | 10.65
3008 | 445 | 4.49 | 4.49 | 15.14
3010 | 28 | 0.28 | 0.28 | 15.42
3011 | 138 | 1.39 | 1.39 | 16.81
3012 | 581 | 5.86 | 5.86 | 22.67
3013 | 176 | 1.77 | 1.77 | 24.45
3014 | 366 | 3.69 | 3.69 | 28.14
3018 | 590 | 5.95 | 5.95 | 34.09
3027 | 720 | 7.26 | 7.26 | 41.35
3037 | 226 | 2.28 | 2.28 | 43.63
3047 | 237 | 2.39 | 2.39 | 46.02
3066 | 795 | 8.02 | 8.02 | 54.03
3072 | 696 | 7.02 | 7.02 | 61.05
3073 | 509 | 5.13 | 5.13 | 66.19
3074 | 389 | 3.92 | 3.92 | 70.11
3084 | 526 | 5.30 | 5.30 | 75.41
3095 | 152 | 1.53 | 1.53 | 76.95
3097 | 182 | 1.84 | 1.84 | 78.78
3098 | 1107 | 11.16 | 11.16 | 89.95
3202 | 997 | 10.05 | 10.05 | 100.00
<NA> | 0 | 0.00 | <NA> | <NA>
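Grid points were defined using the lower-left corner coordinates of each tile. After reprojection they were shifted east and north by about 50 m (51 m in x and 50 m in y in the code below), towards the tile centre, to better align with the country grid.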
grid_sf <- grid %>%
  st_as_sf(coords = c("ll.x", "ll.y"),
           crs = 4326,
           remove = TRUE) %>%
  st_transform(21781) %>%
  mutate(x = st_coordinates(.)[, 1],
         y = st_coordinates(.)[, 2]) %>%
  select(-ur.x, -ur.y)

# shifting by 50m to the centre
grid_sf_50 <- grid_sf %>%
  st_drop_geometry() %>%
  mutate(x = as.integer(as.integer(x) + 51), # why on earth 1?
         y = as.integer(as.integer(y) + 50)) %>%
  st_as_sf(coords = c("x", "y"),
           crs = 21781,
           remove = FALSE)
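The code comment above asks why the x shift is 51 m rather than 50 m, and this document does not answer that. One thing worth checking is that as.integer() truncates toward zero, so truncating before adding the offset can move a point by up to 1 m compared with rounding. The sketch below illustrates this. The coordinates are hypothetical, and whether truncation contributes to the extra metre in this data is an unverified assumption.

```r
# Hypothetical LV03 easting values of lower-left tile corners after reprojection;
# reprojected coordinates are rarely exact integers
x <- c(599999.6, 600000.2)

# as.integer() drops the fractional part (truncation toward zero)
shift_trunc <- as.integer(x) + 50L

# round() goes to the nearest integer before the 50 m shift is applied
shift_round <- as.integer(round(x)) + 50L

print(data.frame(x, shift_trunc, shift_round))
```

Comparing the two columns on the real coordinates would show whether the grid alignment is sensitive to the truncation step.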
Grid derived with Swisscom offset
Swisscom points were linked to the country grid derived in 01.Rmd. This gives access to the tile ID variable, which is needed to link to the Heatmap API outputs.
bern_plz <-
  read_rds("data/grid/country.Rds") %>%
  st_join(grid_sf_50,
          left = FALSE)

write_rds(bern_plz, "data/grid/bern_plz.Rds")
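The join above uses left = FALSE, so only country grid cells that contain a shifted Swisscom point are kept. The sketch below reproduces that behaviour on toy data. The objects cells and pts and the columns cell_id and tileId are hypothetical stand-ins, not the actual variables in country.Rds or the API output.

```r
library(sf)

# Helper building a 100 m square polygon from its lower-left corner (toy data)
make_cell <- function(x0, y0) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + 100, y0), c(x0 + 100, y0 + 100),
                        c(x0, y0 + 100), c(x0, y0))))
}

# Two hypothetical grid cells in LV03 (EPSG:21781)
cells <- st_sf(
  cell_id = c("a", "b"),
  geometry = st_sfc(make_cell(600000, 200000),
                    make_cell(600100, 200000),
                    crs = 21781)
)

# One hypothetical tile centre that falls inside cell "a"
pts <- st_as_sf(data.frame(tileId = 1L, x = 600050, y = 200050),
                coords = c("x", "y"), crs = 21781, remove = FALSE)

# Inner spatial join: cell "b" has no matching point and is dropped
joined <- st_join(cells, pts, left = FALSE)
print(joined)
```

If several points fall inside one cell, st_join() returns one row per match, so it is worth checking the row count after the join.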
Study area coverage
Duplicate cells
Some grid cells appear more than once because they overlap two (or possibly more) PLZs and were returned for each of them.
x <lgl>
# total N=9916 valid N=9916 mean=0.10 sd=0.29
Value | N | Raw % | Valid % | Cum. %
---------------------------------------
FALSE | 8968 | 90.44 | 90.44 | 90.44
TRUE | 948 | 9.56 | 9.56 | 100.00
<NA> | 0 | 0.00 | <NA> | <NA>
Example:
Each cell has a unique ID, so duplicates can easily be excluded to create correct visualizations (see issue #8). However, analyses based on PLZs, particularly data aggregation, would need to determine the correct assignment of grid cells to PLZs, perhaps using a (population-weighted?) centroid or something similar.
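As a minimal sketch of how such duplicates can be flagged and dropped for mapping, assuming the unique ID sits in a column called tileId (a hypothetical name, as are the toy values below):

```r
# Toy grid: tile 102 overlaps two PLZs and therefore appears twice
grid <- data.frame(
  tileId = c(101L, 102L, 102L, 103L),
  plz    = c("3005", "3005", "3006", "3006")
)

# Flag every row whose tile ID occurs more than once
grid$dupe <- duplicated(grid$tileId) | duplicated(grid$tileId, fromLast = TRUE)

# Keep the first occurrence of each tile ID, e.g. for visualization
grid_unique <- grid[!duplicated(grid$tileId), ]

print(table(grid$dupe))
print(grid_unique)
```

Keeping the first occurrence assigns the cell to whichever PLZ happens to come first. That is adequate for drawing maps but not for PLZ-level aggregation, which needs an explicit assignment rule as discussed above.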
Environment
Analyses were conducted using the R Statistical language (version 4.2.0; R Core Team, 2022) on Windows 10 x64 (build 18363), using the packages ggplot2 (version 3.3.6; Wickham, 2016), jsonlite (version 1.8.0; Ooms, 2014), sf (version 1.0.7; Pebesma, 2018), pacman (version 0.5.1; Rinker et al., 2017), tidyverse (version 1.3.1; Wickham et al., 2019), dplyr (version 1.0.9), forcats (version 0.5.1), magrittr (version 2.0.3), purrr (version 0.3.4), readr (version 2.1.2), scales (version 1.2.0), stringr (version 1.4.0), tibble (version 3.1.7), tidyr (version 1.2.0) and tmap (version 3.3.3).
References
- H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York, 2016.
- Jeroen Ooms (2014). The jsonlite Package: A Practical and Consistent Mapping Between JSON Data and R Objects. arXiv:1403.2805 [stat.CO] URL https://arxiv.org/abs/1403.2805.
- Pebesma, E., 2018. Simple Features for R: Standardized Support for Spatial Vector Data. The R Journal 10 (1), 439-446, https://doi.org/10.32614/RJ-2018-009
- R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
- Rinker, T. W. & Kurkiewicz, D. (2017). pacman: Package Management for R. version 0.5.0. Buffalo, New York. http://github.com/trinker/pacman
- Wickham et al., (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686, https://doi.org/10.21105/joss.01686