Ferry passengers, one full week, aggregated hourly to match google places
Frequency of counts from one week
Note: not all hours are represented; missing values are filled with zeroes below.
Popular times, all values from campus with ferry terminal highlighted
Frequency of counts from one week
<- fitdist(uq_ferry_places_agg$count, "pois")
ferry_poi <- fitdist(uq_ferry_places_agg$count, "nbinom")
ferry_nb
par(mfrow = c(1,2))
denscomp(list(ferry_poi, ferry_nb), legendtext = c("Poisson", "negative binomial"), fitlty = 1)
cdfcomp(list(ferry_poi, ferry_nb), legendtext = c("Poisson", "negative binomial"), fitlty = 1)
gofstat(list(ferry_poi, ferry_nb), fitnames = c("Poisson", "negative binomial"))
Chi-squared statistic: Inf 70.82962
Degree of freedom of the Chi-squared distribution: 9 8
Chi-squared p-value: 0 0.000000000003359121
the p-value may be wrong with some theoretical counts < 5
Chi-squared table:
obscounts theo Poisson theo negative binomial
<= 0 41 0.0000000000000000000002641408 27.381911
<= 9 13 0.0000000000038717541059815836 41.302870
<= 14 12 0.0000000089103846541210810503 10.448500
<= 26 12 0.0019670000701348363050802170 17.076772
<= 45 12 17.0636425431348008885379385902 17.060045
<= 63 12 130.4951840071383628583134850487 10.894257
<= 81 13 20.3784755604183658306283177808 8.050143
<= 97 12 0.0607150499597270965068673831 5.585067
<= 109 13 0.0000158242374750372505332052 3.477690
<= 139 12 0.0000000061268750073395494837 6.769476
> 139 16 0.0000000000000000000000000000 19.953269
Goodness-of-fit criteria
Poisson negative binomial
Akaike's Information Criterion 11566.73 1580.991
Bayesian Information Criterion 11569.85 1587.239
Frequency of counts from one week
# Best distribution:
<- fitdist(uq_ferry_places_dep$count, "pois")
ferry_poi <- fitdist(uq_ferry_places_dep$count, "nbinom")
ferry_nb
par(mfrow = c(1,2))
denscomp(list(ferry_poi, ferry_nb), legendtext = c("Poisson", "negative binomial"), fitlty = 1)
cdfcomp(list(ferry_poi, ferry_nb), legendtext = c("Poisson", "negative binomial"), fitlty = 1)
gofstat(list(ferry_poi, ferry_nb), fitnames = c("Poisson", "negative binomial"))
Chi-squared statistic: Inf 17.58816
Degree of freedom of the Chi-squared distribution: 9 8
Chi-squared p-value: 0 0.02453515
the p-value may be wrong with some theoretical counts < 5
Chi-squared table:
obscounts theo Poisson theo negative binomial
<= 0 44 0.00000000001590908 36.240451
<= 3 16 0.00000007913613795 26.656444
<= 8 13 0.00034672032032836 20.244097
<= 14 13 0.15535790125945512 14.729387
<= 20 12 5.79898663386964941 10.396261
<= 27 12 50.09845701895704906 9.156764
<= 35 12 85.59695394811666347 8.039414
<= 55 12 26.34752645359362688 13.641054
<= 74 12 0.00237124410908862 8.162125
<= 101 12 0.00000000062209171 7.354514
> 101 10 0.00000000000000000 13.379490
Goodness-of-fit criteria
Poisson negative binomial
Akaike's Information Criterion 8440.666 1372.212
Bayesian Information Criterion 8443.790 1378.460
Frequency of counts from one week
# Best distribution:
<- fitdist(uq_ferry_places_arr$count, "pois")
ferry_poi <- fitdist(uq_ferry_places_arr$count, "nbinom")
ferry_nb
par(mfrow = c(1,2))
denscomp(list(ferry_poi, ferry_nb), legendtext = c("Poisson", "negative binomial"), fitlty = 1)
cdfcomp(list(ferry_poi, ferry_nb), legendtext = c("Poisson", "negative binomial"), fitlty = 1)
gofstat(list(ferry_poi, ferry_nb), fitnames = c("Poisson", "negative binomial"))
Chi-squared statistic: Inf 25.19206
Degree of freedom of the Chi-squared distribution: 9 8
Chi-squared p-value: 0 0.001442216
the p-value may be wrong with some theoretical counts < 5
Chi-squared table:
obscounts theo Poisson theo negative binomial
<= 0 47 0.000000002789329 35.952304
<= 5 13 0.000271157958291 38.661476
<= 8 13 0.014086265061311 11.449308
<= 12 14 0.568177072496587 11.263440
<= 18 12 15.851226877138243 12.322156
<= 24 13 65.490918145396066 9.157113
<= 32 12 74.907731966976058 9.236482
<= 42 12 11.070122067171795 8.517029
<= 63 12 0.097466438442975 11.593794
<= 109 12 0.000000006569350 11.814118
> 109 8 0.000000000000000 8.032780
Goodness-of-fit criteria
Poisson negative binomial
Akaike's Information Criterion 6960.914 1331.073
Bayesian Information Criterion 6964.038 1337.321
<- gam(count ~
mod1 s(time, k = 24, bs = "cr"),
family = nb(),
data = uq_ferry_places_agg)
<- gam(count ~
mod2 s(time, k = 24, bs = "cr") +
s(hod, k = 12, bs = "cc"),
family = nb(),
data = uq_ferry_places_agg)
<- gam(count ~ popularity +
mod3 s(time, k = 24, bs = "cr"),
family = nb(),
data = uq_ferry_places_agg)
<- gam(count ~ s(popularity, k = 6, bs = "cr") +
mod4 s(time, k = 24, bs = "cr") +
s(hod, k = 12, bs = "cc"),
family = nb(),
data = uq_ferry_places_agg)
mod4
Summary
Family: Negative Binomial(9.351)
Link function: log
Formula:
count ~ s(popularity, k = 6, bs = "cr") + s(time, k = 24, bs = "cr") +
s(hod, k = 12, bs = "cc")
Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.83229 0.03568 107.4 <0.0000000000000002 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(popularity) 3.543 4.191 27.50 0.0000227 ***
s(time) 3.302 4.100 55.35 < 0.0000000000000002 ***
s(hod) 9.031 10.000 196.98 < 0.0000000000000002 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.784 Deviance explained = 86%
-REML = 573.02 Scale est. = 1 n = 127
Fitted values
<- gam(count ~
mod1 s(time, k = 24, bs = "cr"),
family = nb(),
data = uq_ferry_places_dep)
<- gam(count ~
mod2 s(time, k = 24, bs = "cr") +
s(hod, k = 12, bs = "cc"),
family = nb(),
data = uq_ferry_places_dep)
<- gam(count ~ popularity +
mod3 s(time, k = 24, bs = "cr"),
family = nb(),
data = uq_ferry_places_dep)
<- gam(count ~ s(popularity, k = 6, bs = "cr") +
mod4 s(time, k = 24, bs = "cr") +
s(hod, k = 12, bs = "cc"),
family = nb(),
data = uq_ferry_places_dep)
mod4
Summary
Family: Negative Binomial(10.376)
Link function: log
Formula:
count ~ s(popularity, k = 6, bs = "cr") + s(time, k = 24, bs = "cr") +
s(hod, k = 12, bs = "cc")
Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.0260 0.0417 72.57 <0.0000000000000002 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(popularity) 1.001 1.001 33.53 < 0.0000000000000002 ***
s(time) 3.033 3.772 30.01 0.00000536 ***
s(hod) 7.853 10.000 267.89 < 0.0000000000000002 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.826 Deviance explained = 90.7%
-REML = 464.16 Scale est. = 1 n = 127
Fitted values
<- gam(count ~
mod1 s(time, k = 24, bs = "cr"),
family = nb(),
data = uq_ferry_places_arr)
<- gam(count ~
mod2 s(time, k = 24, bs = "cr") +
s(hod, k = 12, bs = "cc"),
family = nb(),
data = uq_ferry_places_arr)
<- gam(count ~ popularity +
mod3 s(time, k = 24, bs = "cr"),
family = nb(),
data = uq_ferry_places_arr)
<- gam(count ~ s(popularity, k = 6, bs = "cr") +
mod4 s(time, k = 24, bs = "cr") +
s(hod, k = 12, bs = "cc"),
family = nb(),
data = uq_ferry_places_arr)
mod4
Summary
Family: Negative Binomial(11.496)
Link function: log
Formula:
count ~ s(popularity, k = 6, bs = "cr") + s(time, k = 24, bs = "cr") +
s(hod, k = 12, bs = "cc")
Parametric coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.93214 0.03875 75.68 <0.0000000000000002 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df Chi.sq p-value
s(popularity) 3.327 3.981 15.85 0.0033 **
s(time) 17.132 19.284 138.31 <0.0000000000000002 ***
s(hod) 9.228 10.000 148.80 <0.0000000000000002 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.833 Deviance explained = 89.5%
-REML = 500.1 Scale est. = 1 n = 127
Fitted values
# ####
# Jeffreys prior
<- 5e-5
a1 <- 5e-5
b1 <- list(prec = list(param = c(a1, b1)))
lgprior1
# Gelman prior
<- -0.5
a2 <- 5e-5
b2 <- list(prec = list(param = c(a2, b2)))
lgprior2
# iid prior
# Schrödle & Held 2010 & Blangiardo et al 2013
<- 1
a0 <- 0.1
b0 <- list(prec = list(param = c(a0, b0)))
prior.nu
# intercept & fixed
inla.set.control.fixed.default()
# intercept ~ N(0,0)
# other fixed effects ~ N(0, 0.001)
#
# where the format is N(mean, precision)
# precision = inverse of the variance.
# PC prior
<- 1
U = list(theta = list(
hyper.prec prior = "pc.prec",
param = c(U, 0.01)
))
# scaling
inla.setOption(scale.model.default = TRUE)
<- inla(count ~
mod1 f(time, model = "rw1", hyper = hyper.prec) +
f(hod, model = "rw1", hyper = hyper.prec),
family = "nbinomial",
control.compute = list(dic= TRUE, waic = TRUE),
data = uq_ferry_places_agg)
<- inla(count ~
mod2 f(time, model = "rw2", hyper = hyper.prec) +
f(hod, model = "rw2", hyper = hyper.prec),
family = "nbinomial",
control.compute = list(dic= TRUE, waic = TRUE),
data = uq_ferry_places_agg)
<- inla(count ~
mod3 f(popularity, model = "rw1", hyper = hyper.prec) +
f(time, model = "rw1", hyper = hyper.prec) +
f(hod, model = "rw1", hyper = hyper.prec),
family = "nbinomial",
control.compute = list(dic= TRUE, waic = TRUE),
data = uq_ferry_places_agg)
<- inla(count ~
mod4 f(popularity, model = "rw2", hyper = hyper.prec) +
f(time, model = "rw2", hyper = hyper.prec) +
f(hod, model = "rw2", hyper = hyper.prec),
family = "nbinomial",
control.compute = list(dic= TRUE, waic = TRUE),
data = uq_ferry_places_agg)
mod3
Summary
Call:
c("inla.core(formula = formula, family = family, contrasts = contrasts,
", " data = data, quantiles = quantiles, E = E, offset = offset, ", "
scale = scale, weights = weights, Ntrials = Ntrials, strata = strata,
", " lp.scale = lp.scale, link.covariates = link.covariates, verbose =
verbose, ", " lincomb = lincomb, selection = selection, control.compute
= control.compute, ", " control.predictor = control.predictor,
control.family = control.family, ", " control.inla = control.inla,
control.fixed = control.fixed, ", " control.mode = control.mode,
control.expert = control.expert, ", " control.hazard = control.hazard,
control.lincomb = control.lincomb, ", " control.update =
control.update, control.lp.scale = control.lp.scale, ", "
control.pardiso = control.pardiso, only.hyperparam = only.hyperparam,
", " inla.call = inla.call, inla.arg = inla.arg, num.threads =
num.threads, ", " blas.num.threads = blas.num.threads, keep = keep,
working.directory = working.directory, ", " silent = silent, inla.mode
= inla.mode, safe = FALSE, debug = debug, ", " .parent.frame =
.parent.frame)")
Time used:
Pre = 0.599, Running = 0.914, Post = 0.209, Total = 1.72
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) 2.479 0.094 2.276 2.485 2.646 2.497 0
Random effects:
Name Model
popularity RW1 model
time RW1 model
hod RW1 model
Model hyperparameters:
mean sd
size for the nbinomial observations (1/overdispersion) 3413.935 22058.583
Precision for popularity 2490.046 23009.094
Precision for time 0.444 0.091
Precision for hod 0.349 0.105
0.025quant 0.5quant
size for the nbinomial observations (1/overdispersion) 77.434 678.134
Precision for popularity 32.643 365.712
Precision for time 0.288 0.436
Precision for hod 0.185 0.335
0.975quant mode
size for the nbinomial observations (1/overdispersion) 22277.593 163.599
Precision for popularity 16304.439 71.686
Precision for time 0.646 0.421
Precision for hod 0.594 0.309
Deviance Information Criterion (DIC) ...............: 1021.64
Deviance Information Criterion (DIC, saturated) ....: 1050.28
Effective number of parameters .....................: 100.75
Watanabe-Akaike information criterion (WAIC) ...: 1007.81
Effective number of parameters .................: 65.35
Marginal log-Likelihood: -949.90
is computed
Posterior summaries for the linear predictor and the fitted values are computed
(Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
Fitted values
<- inla(count ~
mod1 f(time, model = "rw1", hyper = hyper.prec) +
f(hod, model = "rw1", hyper = hyper.prec),
family = "nbinomial",
control.compute = list(dic= TRUE, waic = TRUE),
data = uq_ferry_places_dep)
<- inla(count ~
mod2 f(time, model = "rw2", hyper = hyper.prec) +
f(hod, model = "rw2", hyper = hyper.prec),
family = "nbinomial",
control.compute = list(dic= TRUE, waic = TRUE),
data = uq_ferry_places_dep)
<- inla(count ~
mod3 f(popularity, model = "rw1", hyper = hyper.prec) +
f(time, model = "rw1", hyper = hyper.prec) +
f(hod, model = "rw1", hyper = hyper.prec),
family = "nbinomial",
control.compute = list(dic= TRUE, waic = TRUE),
data = uq_ferry_places_dep)
<- inla(count ~
mod4 f(popularity, model = "rw2", hyper = hyper.prec) +
f(time, model = "rw2", hyper = hyper.prec) +
f(hod, model = "rw2", hyper = hyper.prec),
family = "nbinomial",
control.compute = list(dic= TRUE, waic = TRUE),
data = uq_ferry_places_dep)
mod3
Summary
Call:
c("inla.core(formula = formula, family = family, contrasts = contrasts,
", " data = data, quantiles = quantiles, E = E, offset = offset, ", "
scale = scale, weights = weights, Ntrials = Ntrials, strata = strata,
", " lp.scale = lp.scale, link.covariates = link.covariates, verbose =
verbose, ", " lincomb = lincomb, selection = selection, control.compute
= control.compute, ", " control.predictor = control.predictor,
control.family = control.family, ", " control.inla = control.inla,
control.fixed = control.fixed, ", " control.mode = control.mode,
control.expert = control.expert, ", " control.hazard = control.hazard,
control.lincomb = control.lincomb, ", " control.update =
control.update, control.lp.scale = control.lp.scale, ", "
control.pardiso = control.pardiso, only.hyperparam = only.hyperparam,
", " inla.call = inla.call, inla.arg = inla.arg, num.threads =
num.threads, ", " blas.num.threads = blas.num.threads, keep = keep,
working.directory = working.directory, ", " silent = silent, inla.mode
= inla.mode, safe = FALSE, debug = debug, ", " .parent.frame =
.parent.frame)")
Time used:
Pre = 0.551, Running = 0.793, Post = 0.203, Total = 1.55
Fixed effects:
mean sd 0.025quant 0.5quant 0.975quant mode kld
(Intercept) 1.775 0.117 1.524 1.783 1.984 1.798 0
Random effects:
Name Model
popularity RW1 model
time RW1 model
hod RW1 model
Model hyperparameters:
mean sd 0.025quant
size for the nbinomial observations (1/overdispersion) 28.850 12.445 11.163
Precision for popularity 36.372 29.556 7.429
Precision for time 3.650 2.235 1.089
Precision for hod 0.412 0.129 0.215
0.5quant 0.975quant
size for the nbinomial observations (1/overdispersion) 26.678 59.293
Precision for popularity 28.118 114.281
Precision for time 3.089 9.523
Precision for hod 0.393 0.718
mode
size for the nbinomial observations (1/overdispersion) 22.622
Precision for popularity 17.434
Precision for time 2.281
Precision for hod 0.357
Deviance Information Criterion (DIC) ...............: 897.12
Deviance Information Criterion (DIC, saturated) ....: 651.05
Effective number of parameters .....................: 48.14
Watanabe-Akaike information criterion (WAIC) ...: 899.27
Effective number of parameters .................: 40.68
Marginal log-Likelihood: -825.66
is computed
Posterior summaries for the linear predictor and the fitted values are computed
(Posterior marginals needs also 'control.compute=list(return.marginals.predictor=TRUE)')
Fitted values
<- inla(count ~
mod1 f(time, model = "rw1", hyper = hyper.prec) +
f(hod, model = "rw1", hyper = hyper.prec),
family = "nbinomial",
control.compute = list(dic= TRUE, waic = TRUE),
data = uq_ferry_places_arr)
<- inla(count ~
mod2 f(time, model = "rw2", hyper = hyper.prec) +
f(hod, model = "rw2", hyper = hyper.prec),
family = "nbinomial",
control.compute = list(dic= TRUE, waic = TRUE),
data = uq_ferry_places_arr)
<- inla(count ~
mod4 f(popularity, model = "rw2", hyper = hyper.prec) +
f(time, model = "rw2", hyper = hyper.prec) +
f(hod, model = "rw2", hyper = hyper.prec),
family = "nbinomial",
control.compute = list(dic= TRUE, waic = TRUE),
data = uq_ferry_places_arr)
mod3
Summary
Fitted values
<- mod3$summary.fitted.values
P3_fit
$x <- uq_ferry_places_arr$time
P3_fit$y <- uq_ferry_places_arr$count
P3_fit
ggplot(P3_fit, aes(x = x)) +
geom_ribbon(aes(ymin = `0.025quant`, ymax = `0.975quant`), alpha=0.3) +
geom_line(aes(y = `0.5quant`), color = "purple") +
geom_point(aes(y = y), alpha = 0.5) +
theme_minimal() + ylab("Counts") + xlab("Time")