Analyses of swisscom data
API data
Data
Grid
Testing on several PLZs of Bern & surroundings.
bern_plz <- read_rds("data/grid/bern_plz.Rds") %>%
  # duplicates are out for visualizations!
  # fix if PLZ is needed more precisely!
  distinct(tileId, .keep_all = TRUE) %>%
  rename(tile_id = tileId)

bern_plz_ct <- st_centroid(bern_plz)
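To illustrate what the `distinct()` step does: a tile that overlaps a postcode boundary can appear once per PLZ in the grid, and `distinct(tileId, .keep_all = TRUE)` keeps only the first of those rows. The following is a minimal sketch on made-up data; the tile IDs and the `PLZ` column name are assumptions, not taken from the real grid file.

```r
library(dplyr)

# Hypothetical grid: tile 102 overlaps two postcodes and so appears twice
grid_toy <- tibble::tibble(
  tileId = c(101, 102, 102, 103),
  PLZ    = c("3005", "3005", "3006", "3006")
)

grid_unique <- grid_toy %>%
  # keep the first row per tile; the second PLZ of tile 102 is dropped
  distinct(tileId, .keep_all = TRUE) %>%
  rename(tile_id = tileId)

grid_unique
```

This is adequate for maps, but any per-postcode count computed afterwards will under-represent the PLZ that loses the shared tile.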
Tiles per PLZ:
x <character>
# total N=8968 valid N=8968 mean=3064.65 sd=57.75
Value | N | Raw % | Valid % | Cum. %
---------------------------------------
3005 | 198 | 2.21 | 2.21 | 2.21
3006 | 583 | 6.50 | 6.50 | 8.71
3007 | 227 | 2.53 | 2.53 | 11.24
3008 | 406 | 4.53 | 4.53 | 15.77
3010 | 6 | 0.07 | 0.07 | 15.83
3011 | 82 | 0.91 | 0.91 | 16.75
3012 | 535 | 5.97 | 5.97 | 22.71
3013 | 130 | 1.45 | 1.45 | 24.16
3014 | 291 | 3.24 | 3.24 | 27.41
3018 | 562 | 6.27 | 6.27 | 33.68
3027 | 655 | 7.30 | 7.30 | 40.98
3037 | 196 | 2.19 | 2.19 | 43.16
3047 | 199 | 2.22 | 2.22 | 45.38
3066 | 795 | 8.86 | 8.86 | 54.25
3072 | 617 | 6.88 | 6.88 | 61.13
3073 | 467 | 5.21 | 5.21 | 66.34
3074 | 327 | 3.65 | 3.65 | 69.98
3084 | 473 | 5.27 | 5.27 | 75.26
3095 | 122 | 1.36 | 1.36 | 76.62
3097 | 123 | 1.37 | 1.37 | 77.99
3098 | 1003 | 11.18 | 11.18 | 89.17
3202 | 971 | 10.83 | 10.83 | 100.00
<NA> | 0 | 0.00 | <NA> | <NA>
Dwell density
Data is from http://mip.swisscom.ch, which Swisscom describes as:
Our new API platform offering 3 endpoints focusing on density, dwell times and origin destination
Important note: free data is limited to 2020-01-27 only!
We are using the Heatmaps API to retrieve daily and hourly dwell times for one postcode. Code to retrieve the data, kindly provided by Yann Steimer from Swisscom, is in example_notebook_SC_heatmaps_API_UNIBE.ipynb.
Daily dwell density
read_day <- function(filename) {

  # read one semicolon-delimited file and keep the columns of interest
  data <- readr::read_delim(filename,
                            delim = ";", escape_double = FALSE, trim_ws = TRUE,
                            show_col_types = FALSE) %>%
    dplyr::select(tile_id, time, score) %>%
    dplyr::as_tibble()

  # derive the postcode from the file name (dot escaped to match it literally)
  data$plz <- gsub("_day|[.]csv", "", filename)
  data$plz <- gsub("data/swisscom/", "", data$plz)

  return(data)
}

# read the per-postcode files in parallel
doFuture::registerDoFuture()
future::plan("multisession", workers = 8)

data_day <- plyr::ldply(.data = fs::dir_ls("data/swisscom/",
                                           regexp = "[0-9]_day[.]csv$"),
                        .fun = read_day,
                        .id = NULL,
                        .parallel = TRUE) %>%
  as_tibble() %>%
  distinct(tile_id, time, .keep_all = TRUE)
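The two `gsub()` calls in `read_day()` turn a file path into a postcode. A small sketch of that step in isolation follows. The paths are invented to match the `[0-9]_day[.]csv$` pattern, and the `basename()` variant is an alternative shown for comparison, not the method used in this analysis.

```r
# Hypothetical paths following the data/swisscom/<PLZ>_day.csv pattern
files <- c("data/swisscom/3005_day.csv", "data/swisscom/3012_day.csv")

# Two-step approach mirroring read_day(): strip the suffix, then the directory
plz_two_step <- gsub("data/swisscom/", "", gsub("_day|[.]csv", "", files))

# Alternative: drop the directory with basename(), then the suffix
plz_basename <- sub("_(day|hour)[.]csv$", "", basename(files))

# Both approaches should agree on these paths
stopifnot(identical(plz_two_step, plz_basename))
plz_basename
```

The `basename()` variant does not depend on the directory prefix, so it would keep working if the files were moved.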
Hourly dwell density
read_hour <- function(filename) {

  # read one semicolon-delimited file and keep the columns of interest
  data <- readr::read_delim(filename,
                            delim = ";", escape_double = FALSE, trim_ws = TRUE,
                            show_col_types = FALSE) %>%
    dplyr::select(tile_id, time, score) %>%
    dplyr::as_tibble()

  # derive the postcode from the file name (dot escaped to match it literally)
  data$plz <- gsub("_hour|[.]csv", "", filename)
  data$plz <- gsub("data/swisscom/", "", data$plz)

  return(data)
}

data_hour <- plyr::ldply(.data = fs::dir_ls("data/swisscom/",
                                            regexp = "[0-9]_hour[.]csv$"),
                         .fun = read_hour,
                         .id = NULL,
                         .parallel = TRUE) %>%
  as_tibble() %>%
  distinct(tile_id, time, .keep_all = TRUE)
EDA
Daily
bern_plz_day <- bern_plz %>%
  left_join(data_day %>% select(-time))
Hourly
bern_plz_hour <- bern_plz %>%
  left_join(data_hour)
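Both joins above call `left_join()` without a `by` argument, so dplyr matches on every column name the two tables share. A sketch on toy data of the same kind of join with the key named explicitly is given below. That `tile_id` is the intended key is an assumption, as the full column sets of the real tables are not shown.

```r
library(dplyr)

# Toy stand-ins for the grid and the daily scores
grid_toy <- tibble::tibble(tile_id = c(1, 2, 3), name = c("a", "b", "c"))
day_toy  <- tibble::tibble(tile_id = c(1, 2), score = c(0.4, 0.9))

grid_day <- grid_toy %>%
  left_join(day_toy, by = "tile_id")   # tiles without data get NA scores

grid_day
```

Naming the key also silences the message dplyr prints when it has to guess the join columns.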
Averages per postcode:
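The code behind this figure is not shown. As a rough sketch of the kind of summary it could rest on, hourly scores can be averaged per postcode and hour of day. The toy values and the ISO 8601 format of `time` are assumptions; the real `data_hour` may store time differently.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for data_hour: two tiles in one postcode, two hours
hour_toy <- tibble::tibble(
  tile_id = c(1, 2, 1, 2),
  plz     = "3005",
  time    = c("2020-01-27T04:00:00", "2020-01-27T04:00:00",
              "2020-01-27T15:00:00", "2020-01-27T15:00:00"),
  score   = c(10, 20, 30, 50)
)

avg_by_plz_hour <- hour_toy %>%
  # parse the timestamp and extract the hour of day
  mutate(hour = lubridate::hour(lubridate::ymd_hms(time))) %>%
  group_by(plz, hour) %>%
  summarise(mean_score = mean(score), .groups = "drop")

avg_by_plz_hour
```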
Detailed, individual cell lines by postcode:
Spatial distribution comparing 4AM and 3PM:
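One way to prepare such a two-hour comparison is to keep only the two hours of interest and put them side by side per tile. This is a sketch under assumptions: the `hour` column is taken to be already extracted as a number, and the toy scores are invented.

```r
library(dplyr)
library(tidyr)

# Toy hourly scores for two tiles at three hours of the day
hour_toy <- tibble::tibble(
  tile_id = c(1, 1, 1, 2, 2, 2),
  hour    = c(4, 12, 15, 4, 12, 15),
  score   = c(5, 8, 12, 7, 9, 6)
)

night_vs_day <- hour_toy %>%
  filter(hour %in% c(4, 15)) %>%                       # 4AM and 3PM only
  pivot_wider(names_from = hour, values_from = score,
              names_prefix = "h") %>%                  # columns h4 and h15
  mutate(diff = h15 - h4)                              # afternoon minus night

night_vs_day
```

Joining the result back to the grid by `tile_id` would then attach the geometry needed for mapping.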
Environment
Analyses were conducted using the R statistical language (version 4.2.0; R Core Team, 2022) on Windows 10 x64 (build 18363), using the packages lubridate (version 1.8.0; Grolemund & Wickham, 2011), ggplot2 (version 3.3.6; Wickham, 2016), sf (version 1.0.7; Pebesma, 2018), pacman (version 0.5.1; Rinker & Kurkiewicz, 2017), tidyverse (version 1.3.1; Wickham et al., 2019), dplyr (version 1.0.9), forcats (version 0.5.1), magrittr (version 2.0.3), purrr (version 0.3.4), readr (version 2.1.2), scales (version 1.2.0), stringr (version 1.4.0), tibble (version 3.1.7), tidyr (version 1.2.0) and tmap (version 3.3.3).
References
- Garrett Grolemund, Hadley Wickham (2011). Dates and Times Made Easy with lubridate. Journal of Statistical Software, 40(3), 1-25. URL https://www.jstatsoft.org/v40/i03/.
- Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York.
- Pebesma, E., 2018. Simple Features for R: Standardized Support for Spatial Vector Data. The R Journal 10 (1), 439-446, https://doi.org/10.32614/RJ-2018-009
- R Core Team (2022). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
- Rinker, T. W. & Kurkiewicz, D. (2017). pacman: Package Management for R. version 0.5.0. Buffalo, New York. http://github.com/trinker/pacman
- Wickham et al., (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686, https://doi.org/10.21105/joss.01686