Analyses of Swisscom data

Grid extras

Swisscom grid coordinates & IDs

Tile definitions were pulled from the API using query_swisscom_heatmaps_api.py.

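# Read one tile-definition JSON file and tag its rows with the PLZ taken from the file name
read_tiles <- function(filename) {
  
  data <- jsonlite::fromJSON(filename)
  data <- jsonlite::flatten(data$tiles) %>% 
    dplyr::as_tibble()
  
  # derive the PLZ from the file name; the dot is escaped so that it matches literally
  data$plz <- gsub("grid_|[.]json", "", filename)
  data$plz <- gsub("data/swisscom/", "", data$plz, fixed = TRUE)
  
  return(data)
}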
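
The PLZ extraction can be checked in isolation. The following is a minimal sketch, assuming files are named following the "grid_<PLZ>.json" pattern that the gsub() calls imply; the path used here is hypothetical, and basename() is swapped in as a variant that does not depend on the directory name.

```r
# Hypothetical file path following the "grid_<PLZ>.json" pattern (an assumption)
filename <- "data/swisscom/grid_3005.json"

# Drop the directory, then the "grid_" prefix and the ".json" suffix.
# "[.]" matches a literal dot; a bare "." would match any character.
plz <- gsub("^grid_|[.]json$", "", basename(filename))

plz
# [1] "3005"
```

Anchoring the pattern with "^" and "$" keeps the substitution from touching matching substrings elsewhere in the name.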

doFuture::registerDoFuture()
future::plan("multisession", workers = 8)

grid <- plyr::ldply(.data = fs::dir_ls("data/swisscom/", 
                                       regexp = "[0-9][.]json$"),
                    .fun = read_tiles,
                    .id = NULL,
                    .parallel = TRUE) %>% 
  as_tibble() %>% 
  distinct()

The analysis focuses on a test area covering Bern city centre and selected suburbs, comprising the following postal codes (PLZ):

plz <character> 
# total N=9916 valid N=9916 mean=3063.57 sd=56.61

Value |    N | Raw % | Valid % | Cum. %
---------------------------------------
 3005 |  198 |  2.00 |    2.00 |   2.00
 3006 |  604 |  6.09 |    6.09 |   8.09
 3007 |  254 |  2.56 |    2.56 |  10.65
 3008 |  445 |  4.49 |    4.49 |  15.14
 3010 |   28 |  0.28 |    0.28 |  15.42
 3011 |  138 |  1.39 |    1.39 |  16.81
 3012 |  581 |  5.86 |    5.86 |  22.67
 3013 |  176 |  1.77 |    1.77 |  24.45
 3014 |  366 |  3.69 |    3.69 |  28.14
 3018 |  590 |  5.95 |    5.95 |  34.09
 3027 |  720 |  7.26 |    7.26 |  41.35
 3037 |  226 |  2.28 |    2.28 |  43.63
 3047 |  237 |  2.39 |    2.39 |  46.02
 3066 |  795 |  8.02 |    8.02 |  54.03
 3072 |  696 |  7.02 |    7.02 |  61.05
 3073 |  509 |  5.13 |    5.13 |  66.19
 3074 |  389 |  3.92 |    3.92 |  70.11
 3084 |  526 |  5.30 |    5.30 |  75.41
 3095 |  152 |  1.53 |    1.53 |  76.95
 3097 |  182 |  1.84 |    1.84 |  78.78
 3098 | 1107 | 11.16 |   11.16 |  89.95
 3202 |  997 | 10.05 |   10.05 | 100.00
 <NA> |    0 |  0.00 |    <NA> |   <NA>

Grid points were defined using the lower-left corner coordinates of each tile. They were then shifted east and north by about 50 m (51 m in x and 50 m in y) towards the cell centre to better align with the grid they are linked to.

grid_sf <- grid %>% 
  st_as_sf(coords = c("ll.x", "ll.y"), 
           crs = 4326,
           remove = TRUE) %>% 
  st_transform(21781) %>% 
  mutate(x = st_coordinates(.)[, 1],
         y = st_coordinates(.)[, 2]) %>% 
  select(-ur.x, -ur.y)

# shift by 50 m from the lower-left corner towards the cell centre
grid_sf_50 <- grid_sf %>% 
  st_drop_geometry() %>% 
  mutate(x = as.integer(as.integer(x) + 51), # 51 rather than 50: reason for the extra 1 m is unclear
         y = as.integer(as.integer(y) + 50)) %>% 
  st_as_sf(coords = c("x", "y"), 
           crs = 21781,
           remove = FALSE) 
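
The arithmetic behind the shift can be sketched with base R alone. This is an illustration only: the 100 m cell size is an assumption implied by the 50 m shift rather than stated in the text, the coordinates are made up, and the extra 1 m applied to x above is not reproduced.

```r
# Assumed cell size in metres (not stated in the text; implied by the 50 m shift)
cell_size <- 100

# Hypothetical lower-left corners in LV03 (EPSG:21781) coordinates
ll <- data.frame(x = c(600000, 600100),
                 y = c(200000, 200000))

# Move each point from the lower-left corner to the cell centre
centre <- data.frame(x = as.integer(ll$x + cell_size / 2),
                     y = as.integer(ll$y + cell_size / 2))

centre
#        x      y
# 1 600050 200050
# 2 600150 200050
```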

Grid derived with Swisscom offset

Swisscom points were linked to the country grid derived in 01.Rmd, which provides the tile ID variable needed to link to the Heatmap API outputs.

bern_plz <- 
  read_rds("data/grid/country.Rds") %>% 
  st_join(grid_sf_50,
          left = FALSE)

write_rds(bern_plz, "data/grid/bern_plz.Rds")
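
The behaviour of st_join() with left = FALSE can be illustrated on toy data. The sketch below assumes that the country grid consists of square polygon cells, which the text does not state; the cell IDs, coordinates and 100 m cell size are all hypothetical. It also shows why a point at a cell corner is ambiguous under the default st_intersects predicate, whereas a point at the cell centre is not.

```r
library(sf)

# Helper (hypothetical): square cell with lower-left corner (x0, y0)
make_cell <- function(x0, y0, size = 100) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + size, y0),
                        c(x0 + size, y0 + size), c(x0, y0 + size),
                        c(x0, y0))))
}

# Two adjacent cells in LV03 (EPSG:21781)
cells <- st_sf(cell_id = c("a", "b"),
               geometry = st_sfc(make_cell(600000, 200000),
                                 make_cell(600100, 200000),
                                 crs = 21781))

# A point at the centre of cell "a" and a point on the corner shared by both cells
centre_pt <- st_sf(plz = "3005",
                   geometry = st_sfc(st_point(c(600050, 200050)), crs = 21781))
corner_pt <- st_sf(plz = "3005",
                   geometry = st_sfc(st_point(c(600100, 200000)), crs = 21781))

# left = FALSE gives an inner join: cells without a matching point are dropped
joined        <- st_join(cells, centre_pt, left = FALSE)  # one row: cell "a"
corner_joined <- st_join(cells, corner_pt, left = FALSE)  # two rows: cells "a" and "b"
```

With left = FALSE only cells that contain a Swisscom point are kept, which restricts the country grid to the study area in a single step.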

Study area coverage

Duplicate cells

Some cells in the grid are duplicated because they overlap two (or possibly more) PLZs and were therefore returned once for each of them.

x <lgl> 
# total N=9916 valid N=9916 mean=0.10 sd=0.29

Value |    N | Raw % | Valid % | Cum. %
---------------------------------------
FALSE | 8968 | 90.44 |   90.44 |  90.44
TRUE  |  948 |  9.56 |    9.56 | 100.00
<NA>  |    0 |  0.00 |    <NA> |   <NA>

Example:

The duplicated cells have a unique ID, so they can easily be excluded to produce correct visualizations (see issue #8). However, analyses based on PLZs, particularly those that aggregate data, would first have to assign each grid cell to a single PLZ, perhaps by using the (population-weighted?) centroid or a similar rule.
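
Flagging and dropping duplicated cells by ID might look as follows. This is a minimal sketch on toy data: the column name tile_id and all values are hypothetical, and keeping the first occurrence is an arbitrary choice that is adequate for mapping but not for PLZ-level aggregation.

```r
# Toy data: cell "c2" overlaps two PLZs and is therefore listed twice
cells <- data.frame(
  tile_id = c("c1", "c2", "c2", "c3"),          # hypothetical ID column
  plz     = c("3005", "3005", "3006", "3006")
)

# Flag every copy of a cell that occurs more than once
cells$dupl <- duplicated(cells$tile_id) |
  duplicated(cells$tile_id, fromLast = TRUE)

# Keep one row per cell (the first PLZ wins, which is arbitrary)
cells_unique <- cells[!duplicated(cells$tile_id), ]

table(cells$dupl)
nrow(cells_unique)
```

Using fromLast = TRUE in the second call ensures that the first copy of each duplicated cell is flagged as well, not only the later ones.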

Environment

Analyses were conducted using the R statistical language (version 4.2.0; R Core Team, 2022) on Windows 10 x64 (build 18363), using the packages ggplot2 (version 3.3.6; Wickham, 2016), jsonlite (version 1.8.0; Ooms, 2014), sf (version 1.0.7; Pebesma, 2018), pacman (version 0.5.1; Rinker & Kurkiewicz, 2017), tidyverse (version 1.3.1; Wickham et al., 2019), dplyr (version 1.0.9), forcats (version 0.5.1), magrittr (version 2.0.3), purrr (version 0.3.4), readr (version 2.1.2), scales (version 1.2.0), stringr (version 1.4.0), tibble (version 3.1.7), tidyr (version 1.2.0) and tmap (version 3.3.3).

References

  • Ooms, J. (2014). The jsonlite Package: A Practical and Consistent Mapping Between JSON Data and R Objects. arXiv:1403.2805 [stat.CO]. https://arxiv.org/abs/1403.2805
  • Pebesma, E. (2018). Simple Features for R: Standardized Support for Spatial Vector Data. The R Journal, 10(1), 439-446. https://doi.org/10.32614/RJ-2018-009
  • R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
  • Rinker, T. W. & Kurkiewicz, D. (2017). pacman: Package Management for R. version 0.5.0. Buffalo, New York. http://github.com/trinker/pacman
  • Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
  • Wickham et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686