Analyses of Swisscom data

API data

Data

Grid

Testing on several postcodes (PLZ) in Bern and its surroundings.

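library(tidyverse)  # read_rds(), distinct(), rename(), %>%
library(sf)         # st_centroid()

bern_plz <- read_rds("data/grid/bern_plz.Rds") %>% 
  # Drop duplicated tiles (first row per tileId is kept) so that each tile
  # is drawn only once in visualizations.
  # Revisit this if a more precise tile-to-PLZ assignment is needed!
  distinct(tileId, .keep_all = TRUE) %>% 
  rename(tile_id = tileId)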

bern_plz_ct <- st_centroid(bern_plz)

Tiles per PLZ:

x <character> 
# total N=8968 valid N=8968 mean=3064.65 sd=57.75

Value |    N | Raw % | Valid % | Cum. %
---------------------------------------
 3005 |  198 |  2.21 |    2.21 |   2.21
 3006 |  583 |  6.50 |    6.50 |   8.71
 3007 |  227 |  2.53 |    2.53 |  11.24
 3008 |  406 |  4.53 |    4.53 |  15.77
 3010 |    6 |  0.07 |    0.07 |  15.83
 3011 |   82 |  0.91 |    0.91 |  16.75
 3012 |  535 |  5.97 |    5.97 |  22.71
 3013 |  130 |  1.45 |    1.45 |  24.16
 3014 |  291 |  3.24 |    3.24 |  27.41
 3018 |  562 |  6.27 |    6.27 |  33.68
 3027 |  655 |  7.30 |    7.30 |  40.98
 3037 |  196 |  2.19 |    2.19 |  43.16
 3047 |  199 |  2.22 |    2.22 |  45.38
 3066 |  795 |  8.86 |    8.86 |  54.25
 3072 |  617 |  6.88 |    6.88 |  61.13
 3073 |  467 |  5.21 |    5.21 |  66.34
 3074 |  327 |  3.65 |    3.65 |  69.98
 3084 |  473 |  5.27 |    5.27 |  75.26
 3095 |  122 |  1.36 |    1.36 |  76.62
 3097 |  123 |  1.37 |    1.37 |  77.99
 3098 | 1003 | 11.18 |   11.18 |  89.17
 3202 |  971 | 10.83 |   10.83 | 100.00
 <NA> |    0 |  0.00 |    <NA> |   <NA>
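A table like the one above boils down to a count per postcode plus percentages. The following is a minimal sketch on toy data, not the code that produced the output; the column name `plz` and the toy values are assumptions, since the column layout of bern_plz is not shown here.

```r
library(dplyr)

# Toy stand-in for bern_plz (hypothetical values; `plz` is an assumed column name)
tiles <- tibble(
  tile_id = c(1, 2, 3, 4, 5),
  plz     = c("3005", "3005", "3006", "3006", "3006")
)

# Number of tiles per PLZ with raw and cumulative percentages
tiles_per_plz <- tiles %>%
  count(plz, name = "n") %>%
  mutate(pct     = 100 * n / sum(n),
         cum_pct = cumsum(pct))

print(tiles_per_plz)
```

With the real grid, the same pipeline would be applied to whichever column of bern_plz holds the postcode.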

Dwell density

Data is from http://mip.swisscom.ch, which Swisscom describes as:

Our new API platform offering 3 endpoints focusing on density, dwell times and origin destination

Important note: free data is limited to 2020-01-27 only!

We are using the Heatmaps API to retrieve daily and hourly dwell times for one postcode. The code to retrieve the data, kindly provided by Yann Steimer from Swisscom, is in example_notebook_SC_heatmaps_API_UNIBE.ipynb.

Daily dwell density

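read_day <- function(filename) {
  
  # Read the semicolon-delimited file and keep the columns of interest
  data <- readr::read_delim(filename, 
                            delim = ";", escape_double = FALSE, trim_ws = TRUE,
                            show_col_types = FALSE) %>% 
    dplyr::select(tile_id, time, score) %>% 
    dplyr::as_tibble()
  
  # Derive the PLZ from the file name; the dot is escaped so that it matches
  # a literal "." instead of any character
  data$plz <- gsub("_day|\\.csv$", "", filename)
  data$plz <- gsub("data/swisscom/", "", data$plz)
  
  return(data)
}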

doFuture::registerDoFuture()
future::plan("multisession", workers = 8)

data_day <- plyr::ldply(.data = fs::dir_ls("data/swisscom/", 
                                           regexp = "[0-9]_day[.]csv$"),
                        .fun = read_day,
                        .id = NULL,
                        .parallel = TRUE) %>% 
  as_tibble() %>% 
  distinct(tile_id, time, .keep_all = TRUE)

Hourly dwell density

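read_hour <- function(filename) {
  
  # Read the semicolon-delimited file and keep the columns of interest
  data <- readr::read_delim(filename, 
                            delim = ";", escape_double = FALSE, trim_ws = TRUE,
                            show_col_types = FALSE) %>% 
    dplyr::select(tile_id, time, score) %>% 
    dplyr::as_tibble()
  
  # Derive the PLZ from the file name; the dot is escaped so that it matches
  # a literal "." instead of any character
  data$plz <- gsub("_hour|\\.csv$", "", filename)
  data$plz <- gsub("data/swisscom/", "", data$plz)
  
  return(data)
}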

data_hour <- plyr::ldply(.data = fs::dir_ls("data/swisscom/", 
                                            regexp = "[0-9]_hour[.]csv$"),
                         .fun = read_hour,
                         .id = NULL,
                         .parallel = TRUE) %>% 
  as_tibble() %>% 
  distinct(tile_id, time, .keep_all = TRUE)
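The daily and hourly readers differ only in the file suffix they strip, so they could be folded into one function. The following is a minimal sketch under that assumption: `read_dwell()` and its `resolution` argument are hypothetical names, and the temporary file with made-up values only stands in for a real export, whose exact `time` format is not shown in this report.

```r
library(readr)
library(dplyr)

# Hypothetical helper covering both resolutions; not part of the original code
read_dwell <- function(filename, resolution = c("day", "hour")) {
  resolution <- match.arg(resolution)
  suffix <- paste0("_", resolution, "\\.csv$")

  read_delim(filename, delim = ";", trim_ws = TRUE, show_col_types = FALSE) %>%
    select(tile_id, time, score) %>%
    # e.g. "data/swisscom/3005_day.csv" -> "3005"
    mutate(plz = sub(suffix, "", basename(filename)))
}

# Demonstration on a temporary file with invented values
tmp <- file.path(tempdir(), "3005_day.csv")
writeLines(c("tile_id;time;score",
             "1;2020-01-27;0.5",
             "2;2020-01-27;0.7"), tmp)

demo_day <- read_dwell(tmp, "day")
print(demo_day)
```

Using `basename()` avoids hard-coding the "data/swisscom/" prefix, at the cost of assuming that the postcode is always the leading part of the file name.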

EDA

Daily

bern_plz_day <- bern_plz %>% 
  left_join(data_day %>% select(-time))

Hourly

bern_plz_hour <- bern_plz %>% 
  left_join(data_hour)

Averages per postcode:
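The code behind this figure is not shown in the report. A per-postcode average of the hourly scores could be computed along the following lines; this is a sketch on toy data, and the values, the UTC time zone and the name `mean_score` are assumptions.

```r
library(dplyr)

# Toy stand-in for data_hour: two tiles in one postcode, two time points
toy_hour <- tibble(
  tile_id = c(1, 2, 1, 2),
  plz     = "3005",
  time    = as.POSIXct(c("2020-01-27 04:00:00", "2020-01-27 04:00:00",
                         "2020-01-27 15:00:00", "2020-01-27 15:00:00"),
                       tz = "UTC"),
  score   = c(0.2, 0.4, 0.6, 1.0)
)

# Mean score per postcode and time point
avg_by_plz <- toy_hour %>%
  group_by(plz, time) %>%
  summarise(mean_score = mean(score), .groups = "drop")

print(avg_by_plz)
```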

Detailed, individual cell lines by postcode:

Spatial distribution comparing 4AM and 3PM:
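Selecting the two snapshots for such a comparison amounts to filtering on the hour of day. The sketch below uses toy data and assumes that `time` is stored as a date-time in a known time zone (UTC here); the `snapshot` label is a hypothetical column added for faceting.

```r
library(dplyr)
library(lubridate)

# Toy hourly scores for two tiles at three time points (invented values)
toy_hour <- tibble(
  tile_id = rep(c(1, 2), times = 3),
  time    = rep(as.POSIXct(c("2020-01-27 04:00:00", "2020-01-27 12:00:00",
                             "2020-01-27 15:00:00"), tz = "UTC"), each = 2),
  score   = c(0.2, 0.4, 0.5, 0.5, 0.6, 1.0)
)

# Keep only the 04:00 and 15:00 observations and label them
snapshots <- toy_hour %>%
  filter(hour(time) %in% c(4, 15)) %>%
  mutate(snapshot = if_else(hour(time) == 4, "4AM", "3PM"))

print(snapshots)
```

The filtered rows could then be joined to the grid by tile_id and mapped with one panel per snapshot.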

Environment

Analyses were conducted using the R statistical language (version 4.2.0; R Core Team, 2022) on Windows 10 x64 (build 18363), using the packages lubridate (version 1.8.0; Grolemund & Wickham, 2011), ggplot2 (version 3.3.6; Wickham, 2016), sf (version 1.0.7; Pebesma, 2018), pacman (version 0.5.1; Rinker & Kurkiewicz, 2017), tidyverse (version 1.3.1; Wickham et al., 2019), dplyr (version 1.0.9), forcats (version 0.5.1), magrittr (version 2.0.3), purrr (version 0.3.4), readr (version 2.1.2), scales (version 1.2.0), stringr (version 1.4.0), tibble (version 3.1.7), tidyr (version 1.2.0) and tmap (version 3.3.3).

References

  • Grolemund, G., & Wickham, H. (2011). Dates and Times Made Easy with lubridate. Journal of Statistical Software, 40(3), 1-25. https://www.jstatsoft.org/v40/i03/
  • Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
  • Pebesma, E. (2018). Simple Features for R: Standardized Support for Spatial Vector Data. The R Journal, 10(1), 439-446. https://doi.org/10.32614/RJ-2018-009
  • R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
  • Rinker, T. W., & Kurkiewicz, D. (2017). pacman: Package Management for R. Version 0.5.0. Buffalo, New York. http://github.com/trinker/pacman
  • Wickham et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686