1 Ferry passengers

1.1 Time series

Ferry passengers, one full week, aggregated hourly to match google places

1.2 Distribution of counts

Frequency of counts from one week

Note: not all hours are represented; missing values are filled with zeroes below.

2 Google places

Popular times, all values from campus with ferry terminal highlighted

3 EDA

3.1 Stocks at the terminal

3.1.1 Data

3.1.2 Distribution of counts

Frequency of counts from one week

ferry_poi <- fitdist(uq_ferry_places_agg$count, "pois")
ferry_nb <- fitdist(uq_ferry_places_agg$count, "nbinom")

par(mfrow = c(1,2))
denscomp(list(ferry_poi, ferry_nb), legendtext = c("Poisson", "negative binomial"), fitlty = 1)
cdfcomp(list(ferry_poi, ferry_nb), legendtext = c("Poisson", "negative binomial"), fitlty = 1)

gofstat(list(ferry_poi, ferry_nb), fitnames = c("Poisson", "negative binomial"))
Chi-squared statistic:  Inf 70.82962 
Degree of freedom of the Chi-squared distribution:  9 8 
Chi-squared p-value:  0 0.000000000003359121 
   the p-value may be wrong with some theoretical counts < 5  
Chi-squared table:
       obscounts                     theo Poisson theo negative binomial
<= 0          41   0.0000000000000000000002641408              27.381911
<= 9          13   0.0000000000038717541059815836              41.302870
<= 14         12   0.0000000089103846541210810503              10.448500
<= 26         12   0.0019670000701348363050802170              17.076772
<= 45         12  17.0636425431348008885379385902              17.060045
<= 63         12 130.4951840071383628583134850487              10.894257
<= 81         13  20.3784755604183658306283177808               8.050143
<= 97         12   0.0607150499597270965068673831               5.585067
<= 109        13   0.0000158242374750372505332052               3.477690
<= 139        12   0.0000000061268750073395494837               6.769476
> 139         16   0.0000000000000000000000000000              19.953269

Goodness-of-fit criteria
                                Poisson negative binomial
Akaike's Information Criterion 11566.73          1580.991
Bayesian Information Criterion 11569.85          1587.239

3.2 Departures

3.2.1 Data

3.2.2 Distribution of counts

Frequency of counts from one week

# Best distribution:
ferry_poi <- fitdist(uq_ferry_places_dep$count, "pois")
ferry_nb <- fitdist(uq_ferry_places_dep$count, "nbinom")

par(mfrow = c(1,2))
denscomp(list(ferry_poi, ferry_nb), legendtext = c("Poisson", "negative binomial"), fitlty = 1)
cdfcomp(list(ferry_poi, ferry_nb), legendtext = c("Poisson", "negative binomial"), fitlty = 1)

gofstat(list(ferry_poi, ferry_nb), fitnames = c("Poisson", "negative binomial"))
Chi-squared statistic:  Inf 17.58816 
Degree of freedom of the Chi-squared distribution:  9 8 
Chi-squared p-value:  0 0.02453515 
   the p-value may be wrong with some theoretical counts < 5  
Chi-squared table:
       obscounts         theo Poisson theo negative binomial
<= 0          44  0.00000000001590908              36.240451
<= 3          16  0.00000007913613795              26.656444
<= 8          13  0.00034672032032836              20.244097
<= 14         13  0.15535790125945512              14.729387
<= 20         12  5.79898663386964941              10.396261
<= 27         12 50.09845701895704906               9.156764
<= 35         12 85.59695394811666347               8.039414
<= 55         12 26.34752645359362688              13.641054
<= 74         12  0.00237124410908862               8.162125
<= 101        12  0.00000000062209171               7.354514
> 101         10  0.00000000000000000              13.379490

Goodness-of-fit criteria
                                Poisson negative binomial
Akaike's Information Criterion 8440.666          1372.212
Bayesian Information Criterion 8443.790          1378.460

3.3 Arrivals

3.3.1 Data

3.3.2 Distribution of counts

Frequency of counts from one week

# Best distribution:
ferry_poi <- fitdist(uq_ferry_places_arr$count, "pois")
ferry_nb <- fitdist(uq_ferry_places_arr$count, "nbinom")

par(mfrow = c(1,2))
denscomp(list(ferry_poi, ferry_nb), legendtext = c("Poisson", "negative binomial"), fitlty = 1)
cdfcomp(list(ferry_poi, ferry_nb), legendtext = c("Poisson", "negative binomial"), fitlty = 1)

gofstat(list(ferry_poi, ferry_nb), fitnames = c("Poisson", "negative binomial"))
Chi-squared statistic:  Inf 25.19206 
Degree of freedom of the Chi-squared distribution:  9 8 
Chi-squared p-value:  0 0.001442216 
   the p-value may be wrong with some theoretical counts < 5  
Chi-squared table:
       obscounts       theo Poisson theo negative binomial
<= 0          47  0.000000002789329              35.952304
<= 5          13  0.000271157958291              38.661476
<= 8          13  0.014086265061311              11.449308
<= 12         14  0.568177072496587              11.263440
<= 18         12 15.851226877138243              12.322156
<= 24         13 65.490918145396066               9.157113
<= 32         12 74.907731966976058               9.236482
<= 42         12 11.070122067171795               8.517029
<= 63         12  0.097466438442975              11.593794
<= 109        12  0.000000006569350              11.814118
> 109          8  0.000000000000000               8.032780

Goodness-of-fit criteria
                                Poisson negative binomial
Akaike's Information Criterion 6960.914          1331.073
Bayesian Information Criterion 6964.038          1337.321

4 GAMs

4.1 Stocks at the terminal

4.1.1 Time only

mod1 <- gam(count ~ 
              s(time, k = 24, bs = "cr"), 
            family = nb(), 
            data = uq_ferry_places_agg)

4.1.2 Two time smoothers

mod2 <- gam(count ~ 
              s(time, k = 24, bs = "cr") +
              s(hod, k = 12, bs = "cc"), 
            family = nb(), 
            data = uq_ferry_places_agg)

4.1.3 Time smoother & popularity

mod3 <- gam(count ~ popularity + 
              s(time, k = 24, bs = "cr"), 
            family = nb(), 
            data = uq_ferry_places_agg)

4.1.4 Two time smoothers & popularity

mod4 <- gam(count ~ s(popularity, k = 6, bs = "cr") + 
              s(time, k = 24, bs = "cr") +
              s(hod, k = 12, bs = "cc"), 
            family = nb(), 
            data = uq_ferry_places_agg)

4.1.5 Comparing

4.1.6 mod4

Summary


Family: Negative Binomial(9.351) 
Link function: log 

Formula:
count ~ s(popularity, k = 6, bs = "cr") + s(time, k = 24, bs = "cr") + 
    s(hod, k = 12, bs = "cc")

Parametric coefficients:
            Estimate Std. Error z value            Pr(>|z|)    
(Intercept)  3.83229    0.03568   107.4 <0.0000000000000002 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                edf Ref.df Chi.sq              p-value    
s(popularity) 3.543  4.191  27.50            0.0000227 ***
s(time)       3.302  4.100  55.35 < 0.0000000000000002 ***
s(hod)        9.031 10.000 196.98 < 0.0000000000000002 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.784   Deviance explained =   86%
-REML = 573.02  Scale est. = 1         n = 127

Fitted values

4.2 Departures

4.2.1 Time only

mod1 <- gam(count ~ 
              s(time, k = 24, bs = "cr"), 
            family = nb(), 
            data = uq_ferry_places_dep)

4.2.2 Two time smoothers

mod2 <- gam(count ~ 
              s(time, k = 24, bs = "cr") +
              s(hod, k = 12, bs = "cc"), 
            family = nb(), 
            data = uq_ferry_places_dep)

4.2.3 Time & popularity

mod3 <- gam(count ~ popularity + 
              s(time, k = 24, bs = "cr"), 
            family = nb(), 
            data = uq_ferry_places_dep)

4.2.4 Two time smoothers & popularity

mod4 <- gam(count ~ s(popularity, k = 6, bs = "cr") + 
              s(time, k = 24, bs = "cr") +
              s(hod, k = 12, bs = "cc"),
            family = nb(), 
            data = uq_ferry_places_dep)

4.2.5 Comparing

4.2.6 mod4

Summary


Family: Negative Binomial(10.376) 
Link function: log 

Formula:
count ~ s(popularity, k = 6, bs = "cr") + s(time, k = 24, bs = "cr") + 
    s(hod, k = 12, bs = "cc")

Parametric coefficients:
            Estimate Std. Error z value            Pr(>|z|)    
(Intercept)   3.0260     0.0417   72.57 <0.0000000000000002 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                edf Ref.df Chi.sq              p-value    
s(popularity) 1.001  1.001  33.53 < 0.0000000000000002 ***
s(time)       3.033  3.772  30.01           0.00000536 ***
s(hod)        7.853 10.000 267.89 < 0.0000000000000002 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.826   Deviance explained = 90.7%
-REML = 464.16  Scale est. = 1         n = 127

Fitted values

4.3 Arrivals

4.3.1 Time only

mod1 <- gam(count ~ 
              s(time, k = 24, bs = "cr"), 
            family = nb(), 
            data = uq_ferry_places_arr)

4.3.2 Two time smoothers

mod2 <- gam(count ~ 
              s(time, k = 24, bs = "cr") +
              s(hod, k = 12, bs = "cc"), 
            family = nb(), 
            data = uq_ferry_places_arr)

4.3.3 Time & popularity

mod3 <- gam(count ~ popularity + 
              s(time, k = 24, bs = "cr"), 
            family = nb(), 
            data = uq_ferry_places_arr)

4.3.4 Two time smoothers & popularity

mod4 <- gam(count ~ s(popularity, k = 6, bs = "cr") + 
              s(time, k = 24, bs = "cr") +
              s(hod, k = 12, bs = "cc"),
            family = nb(), 
            data = uq_ferry_places_arr)

4.3.5 Comparing

4.3.6 mod4

Summary


Family: Negative Binomial(11.496) 
Link function: log 

Formula:
count ~ s(popularity, k = 6, bs = "cr") + s(time, k = 24, bs = "cr") + 
    s(hod, k = 12, bs = "cc")

Parametric coefficients:
            Estimate Std. Error z value            Pr(>|z|)    
(Intercept)  2.93214    0.03875   75.68 <0.0000000000000002 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                 edf Ref.df Chi.sq             p-value    
s(popularity)  3.327  3.981  15.85              0.0033 ** 
s(time)       17.132 19.284 138.31 <0.0000000000000002 ***
s(hod)         9.228 10.000 148.80 <0.0000000000000002 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.833   Deviance explained = 89.5%
-REML =  500.1  Scale est. = 1         n = 127

Fitted values

5 INLA

5.1 Setup

# ####
# Jeffreys prior 
a1 <- 5e-5
b1 <- 5e-5
lgprior1 <- list(prec = list(param = c(a1, b1)))

# Gelman prior
a2 <- -0.5
b2 <- 5e-5
lgprior2 <- list(prec = list(param = c(a2, b2)))

# iid prior 
# Schrödle & Held 2010 & Blangiardo et al 2013
a0 <- 1
b0 <- 0.1
prior.nu <- list(prec = list(param = c(a0, b0)))

# intercept & fixed
inla.set.control.fixed.default() 

# intercept ~ N(0,0) 
# other fixed effects ~ N(0, 0.001) 
# 
# where the format is N(mean, precision) 
# precision = inverse of the variance. 

# PC prior
U <- 1
hyper.prec = list(theta = list(
  prior = "pc.prec",
  param = c(U, 0.01)
))

# scaling
inla.setOption(scale.model.default = TRUE)

5.2 Stocks at the terminal

5.2.1 Two time smoothers (rw1)

mod1 <- inla(count ~ 
               f(time, model = "rw1", hyper = hyper.prec) + 
               f(hod, model = "rw1", hyper = hyper.prec), 
             family = "nbinomial", 
             control.compute = list(dic= TRUE, waic = TRUE),  
             data = uq_ferry_places_agg)

5.2.2 Two time smoothers (rw2)

mod2 <- inla(count ~ 
               f(time, model = "rw2", hyper = hyper.prec) + 
               f(hod, model = "rw2", hyper = hyper.prec), 
             family = "nbinomial", 
             control.compute = list(dic= TRUE, waic = TRUE),  
             data = uq_ferry_places_agg)

5.2.3 Two time smoothers + smooth covariate (rw1)

mod3 <- inla(count ~ 
               f(popularity, model = "rw1", hyper = hyper.prec) + 
               f(time, model = "rw1", hyper = hyper.prec) + 
               f(hod, model = "rw1", hyper = hyper.prec), 
             family = "nbinomial", 
             control.compute = list(dic= TRUE, waic = TRUE),  
             data = uq_ferry_places_agg)

5.2.4 Two time smoothers + smooth covariate (rw2)

mod4 <- inla(count ~ 
               f(popularity, model = "rw2", hyper = hyper.prec) + 
               f(time, model = "rw2", hyper = hyper.prec) + 
               f(hod, model = "rw2", hyper = hyper.prec), 
             family = "nbinomial", 
             control.compute = list(dic= TRUE, waic = TRUE),  
             data = uq_ferry_places_agg)

5.2.5 Comparing

5.2.6 mod3

Summary


Call:
   c("inla.core(formula = formula, family = family, contrasts = contrasts, 
   ", " data = data, quantiles = quantiles, E = E, offset = offset, ", " 
   scale = scale, weights = weights, Ntrials = Ntrials, strata = strata, 
   ", " lp.scale = lp.scale, link.covariates = link.covariates, verbose = 
   verbose, ", " lincomb = lincomb, selection = selection, control.compute 
   = control.compute, ", " control.predictor = control.predictor, 
   control.family = control.family, ", " control.inla = control.inla, 
   control.fixed = control.fixed, ", " control.mode = control.mode, 
   control.expert = control.expert, ", " control.hazard = control.hazard, 
   control.lincomb = control.lincomb, ", " control.update = 
   control.update, control.lp.scale = control.lp.scale, ", " 
   control.pardiso = control.pardiso, only.hyperparam = only.hyperparam, 
   ", " inla.call = inla.call, inla.arg = inla.arg, num.threads = 
   num.threads, ", " blas.num.threads = blas.num.threads, keep = keep, 
   working.directory = working.directory, ", " silent = silent, inla.mode 
   = inla.mode, safe = FALSE, debug = debug, ", " .parent.frame = 
   .parent.frame)") 
Time used:
    Pre = 0.599, Running = 0.914, Post = 0.209, Total = 1.72 
Fixed effects:
             mean    sd 0.025quant 0.5quant 0.975quant  mode kld
(Intercept) 2.479 0.094      2.276    2.485      2.646 2.497   0

Random effects:
  Name    Model
    popularity RW1 model
   time RW1 model
   hod RW1 model

Model hyperparameters:
                                                           mean        sd
size for the nbinomial observations (1/overdispersion) 3413.935 22058.583
Precision for popularity                               2490.046 23009.094
Precision for time                                        0.444     0.091
Precision for hod                                         0.349     0.105
                                                       0.025quant 0.5quant
size for the nbinomial observations (1/overdispersion)     77.434  678.134
Precision for popularity                                   32.643  365.712
Precision for time                                          0.288    0.436
Precision for hod                                           0.185    0.335
                                                       0.975quant    mode
size for the nbinomial observations (1/overdispersion)  22277.593 163.599
Precision for popularity                                16304.439  71.686
Precision for time                                          0.646   0.421
Precision for hod                                           0.594   0.309

Deviance Information Criterion (DIC) ...............: 1021.64
Deviance Information Criterion (DIC, saturated) ....: 1050.28
Effective number of parameters .....................: 100.75

Watanabe-Akaike information criterion (WAIC) ...: 1007.81
Effective number of parameters .................: 65.35

Marginal log-Likelihood:  -949.90 
 is computed 
Posterior summaries for the linear predictor and the fitted values are computed
(Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')

Fitted values

5.3 Departures

5.3.1 Two time smoothers (rw1)

mod1 <- inla(count ~ 
               f(time, model = "rw1", hyper = hyper.prec) + 
               f(hod, model = "rw1", hyper = hyper.prec), 
             family = "nbinomial", 
             control.compute = list(dic= TRUE, waic = TRUE),  
             data = uq_ferry_places_dep)

5.3.2 Two time smoothers (rw2)

mod2 <- inla(count ~ 
               f(time, model = "rw2", hyper = hyper.prec) + 
               f(hod, model = "rw2", hyper = hyper.prec), 
             family = "nbinomial", 
             control.compute = list(dic= TRUE, waic = TRUE),  
             data = uq_ferry_places_dep)

5.3.3 Two time smoothers + smooth covariate (rw1)

mod3 <- inla(count ~ 
               f(popularity, model = "rw1", hyper = hyper.prec) + 
               f(time, model = "rw1", hyper = hyper.prec) + 
               f(hod, model = "rw1", hyper = hyper.prec), 
             family = "nbinomial", 
             control.compute = list(dic= TRUE, waic = TRUE),  
             data = uq_ferry_places_dep)

5.3.4 Two time smoothers + smooth covariate (rw2)

mod4 <- inla(count ~ 
               f(popularity, model = "rw2", hyper = hyper.prec) + 
               f(time, model = "rw2", hyper = hyper.prec) + 
               f(hod, model = "rw2", hyper = hyper.prec), 
             family = "nbinomial", 
             control.compute = list(dic= TRUE, waic = TRUE),  
             data = uq_ferry_places_dep)

5.3.5 Comparing

5.3.6 mod3

Summary


Call:
   c("inla.core(formula = formula, family = family, contrasts = contrasts, 
   ", " data = data, quantiles = quantiles, E = E, offset = offset, ", " 
   scale = scale, weights = weights, Ntrials = Ntrials, strata = strata, 
   ", " lp.scale = lp.scale, link.covariates = link.covariates, verbose = 
   verbose, ", " lincomb = lincomb, selection = selection, control.compute 
   = control.compute, ", " control.predictor = control.predictor, 
   control.family = control.family, ", " control.inla = control.inla, 
   control.fixed = control.fixed, ", " control.mode = control.mode, 
   control.expert = control.expert, ", " control.hazard = control.hazard, 
   control.lincomb = control.lincomb, ", " control.update = 
   control.update, control.lp.scale = control.lp.scale, ", " 
   control.pardiso = control.pardiso, only.hyperparam = only.hyperparam, 
   ", " inla.call = inla.call, inla.arg = inla.arg, num.threads = 
   num.threads, ", " blas.num.threads = blas.num.threads, keep = keep, 
   working.directory = working.directory, ", " silent = silent, inla.mode 
   = inla.mode, safe = FALSE, debug = debug, ", " .parent.frame = 
   .parent.frame)") 
Time used:
    Pre = 0.551, Running = 0.793, Post = 0.203, Total = 1.55 
Fixed effects:
             mean    sd 0.025quant 0.5quant 0.975quant  mode kld
(Intercept) 1.775 0.117      1.524    1.783      1.984 1.798   0

Random effects:
  Name    Model
    popularity RW1 model
   time RW1 model
   hod RW1 model

Model hyperparameters:
                                                         mean     sd 0.025quant
size for the nbinomial observations (1/overdispersion) 28.850 12.445     11.163
Precision for popularity                               36.372 29.556      7.429
Precision for time                                      3.650  2.235      1.089
Precision for hod                                       0.412  0.129      0.215
                                                       0.5quant 0.975quant
size for the nbinomial observations (1/overdispersion)   26.678     59.293
Precision for popularity                                 28.118    114.281
Precision for time                                        3.089      9.523
Precision for hod                                         0.393      0.718
                                                         mode
size for the nbinomial observations (1/overdispersion) 22.622
Precision for popularity                               17.434
Precision for time                                      2.281
Precision for hod                                       0.357

Deviance Information Criterion (DIC) ...............: 897.12
Deviance Information Criterion (DIC, saturated) ....: 651.05
Effective number of parameters .....................: 48.14

Watanabe-Akaike information criterion (WAIC) ...: 899.27
Effective number of parameters .................: 40.68

Marginal log-Likelihood:  -825.66 
 is computed 
Posterior summaries for the linear predictor and the fitted values are computed
(Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')

Fitted values

5.4 Arrivals

5.4.1 Two time smoothers (rw1)

mod1 <- inla(count ~ 
               f(time, model = "rw1", hyper = hyper.prec) + 
               f(hod, model = "rw1", hyper = hyper.prec), 
             family = "nbinomial", 
             control.compute = list(dic= TRUE, waic = TRUE),  
             data = uq_ferry_places_arr)

5.4.2 Two time smoothers (rw2)

mod2 <- inla(count ~ 
               f(time, model = "rw2", hyper = hyper.prec) + 
               f(hod, model = "rw2", hyper = hyper.prec), 
             family = "nbinomial", 
             control.compute = list(dic= TRUE, waic = TRUE),  
             data = uq_ferry_places_arr)

5.4.3 Two time smoothers + smooth covariate (rw1)

5.4.4 Two time smoothers + smooth covariate (rw2)

mod4 <- inla(count ~ 
               f(popularity, model = "rw2", hyper = hyper.prec) + 
               f(time, model = "rw2", hyper = hyper.prec) + 
               f(hod, model = "rw2", hyper = hyper.prec), 
             family = "nbinomial", 
             control.compute = list(dic= TRUE, waic = TRUE),  
             data = uq_ferry_places_arr)

5.4.5 Comparing

5.4.6 mod3

Summary

Fitted values

P3_fit <- mod3$summary.fitted.values

P3_fit$x <- uq_ferry_places_arr$time
P3_fit$y <- uq_ferry_places_arr$count

ggplot(P3_fit, aes(x = x)) +
  geom_ribbon(aes(ymin = `0.025quant`, ymax = `0.975quant`), alpha=0.3) +
  geom_line(aes(y = `0.5quant`), color = "purple") +
  geom_point(aes(y = y), alpha = 0.5) +
  theme_minimal() + ylab("Counts") + xlab("Time")