Title: Spatially and Temporally Varying Coefficient Models Using Generalized Additive Models
Description: A framework for specifying spatially, temporally and spatially-and-temporally varying coefficient models using Generalized Additive Models with Gaussian Process smooths. The smooths are parameterised with location and/or time attributes. Importantly, the framework supports the investigation of the presence and nature of any space-time dependencies in the data, allows the user to evaluate different model forms (specifications), and to pick the most probable model or to combine multiple varying coefficient models using Bayesian Model Averaging. For more details see: Brunsdon et al (2023) <doi:10.4230/LIPIcs.GIScience.2023.17>, Comber et al (2023) <doi:10.4230/LIPIcs.GIScience.2023.22>, Comber et al (2024) <doi:10.1080/13658816.2023.2270285> and Comber et al (2024) <doi:10.3390/ijgi13120459>.
Authors: Lex Comber [aut, cre], Paul Harris [ctb], Chris Brunsdon [ctb]
Maintainer: Lex Comber <[email protected]>
License: MIT + file LICENSE
Version: 0.0.1.3
Built: 2025-03-04 06:22:10 UTC
Source: https://github.com/lexcomber/stgam
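In outline, the functions documented below chain together: evaluate competing model specifications, convert their BICs to model probabilities, and either adopt the most probable model or average the coefficients. A condensed sketch, assembled from the examples in this manual:

library(stgam)
library(dplyr)
library(mgcv)
data(productivity)
# a single year of data, with an explicit Intercept column
input_data <- productivity |>
  filter(year == "1970") |>
  mutate(Intercept = 1)
res <- evaluate_models(input_data, STVC = FALSE)  # fit competing SVC forms
probs <- gam_model_probs(res)                     # rank them by probability
bma <- do_bma(model_table = probs,                # average the coefficients
              terms = c("Intercept", "unemp", "pubC"),
              thresh = 0.1, relative = FALSE,
              input_data = input_data)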
Extracts varying coefficient estimates (for SVC, TVC and STVC models).
calculate_vcs(input_data, model, terms)
input_data: the data used to create the GAM model
model: a GAM model with smooths created using the `mgcv` package
terms: a vector of names starting with "Intercept" plus the names of the covariates used in the GAM model (these are the names of the variables in `input_data`)
A `data.frame` of the input data and the coefficient and standard error estimates for each covariate.
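For intuition, the following minimal sketch (an assumption about the approach, not the package's implementation) shows how a varying coefficient can be recovered from an `mgcv` GAM with `by =` smooths: setting a covariate to 1 makes the sum of its parametric and smooth term contributions equal the coefficient surface. Standard errors, which `calculate_vcs` also returns, are omitted here.

# hypothetical helper: recover beta(x, y) for one term from a fitted GAM
# with terms of the form `unemp + s(X, Y, bs = "gp", by = unemp)`
vc_for_term <- function(model, input_data, term) {
  nd <- input_data
  nd[[term]] <- 1                   # beta(x, y) * 1 = beta(x, y)
  tm <- predict(model, newdata = nd, type = "terms")
  # sum the parametric column (e.g. "unemp") and its smooth ("s(X,Y):unemp")
  cols <- grepl(term, colnames(tm), fixed = TRUE)
  rowSums(tm[, cols, drop = FALSE])
}
# e.g. vc_for_term(gam.svc.mod, input_data, "unemp")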
library(dplyr)
library(mgcv)
# SVC: a single year (1970) of the productivity data
data(productivity)
input_data = productivity |>
  dplyr::filter(year == "1970") |>
  mutate(Intercept = 1)
gam.svc.mod = gam(privC ~ 0 +
                    Intercept + s(X, Y, bs = "gp", by = Intercept) +
                    unemp + s(X, Y, bs = "gp", by = unemp) +
                    pubC + s(X, Y, bs = "gp", by = pubC),
                  data = input_data)
terms = c("Intercept", "unemp", "pubC")
svcs = calculate_vcs(input_data, gam.svc.mod, terms)
head(svcs)
Undertakes coefficient averaging using Bayesian Model Averaging (BMA), weighting competing models by their probabilities.
do_bma(model_table, terms, thresh = 0.1, relative = FALSE, input_data)
model_table: a table of competing models generated by the `gam_model_probs` function
terms: a vector of names starting with "Intercept" plus the names of the covariates used in the GAM model (these are the names of the variables in `input_data`)
thresh: a probability threshold value above which to combine competing models
relative: a logical value indicating whether the probabilities in `model_table` are relative (`Pr(M)`) rather than model (`Pr(M|D)`) probabilities
input_data: the input data with a named Intercept term, as used to create the models
A `tibble` of the input data plus the probability-weighted averaged coefficient estimates from multiple models, with names starting with `b_` followed by the covariate name (e.g. `b_pubC`).
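The averaging step itself reduces to a probability-weighted sum. A minimal sketch, under the assumption that models below `thresh` are dropped and the remaining weights renormalised:

# beta_mat: one column of coefficient estimates per competing model
# probs: the corresponding model probabilities from gam_model_probs
bma_average <- function(beta_mat, probs, thresh = 0.1) {
  keep <- probs > thresh
  w <- probs[keep] / sum(probs[keep])   # renormalise the retained weights
  as.vector(beta_mat[, keep, drop = FALSE] %*% w)
}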
library(cols4all)
library(dplyr)
library(sf)
library(glue)
library(purrr)
library(mgcv)
library(ggplot2)
# data
data(productivity)
input_data = productivity |>
  filter(year == "1970") |>
  mutate(Intercept = 1)
# create and evaluate multiple models
svc_res_gam = evaluate_models(input_data, STVC = FALSE)
# determine their probabilities
mod_comp_svc <- gam_model_probs(svc_res_gam)
# combine the model coefficients
svc_bma <- do_bma(model_table = mod_comp_svc,
                  terms = c("Intercept", "unemp", "pubC"),
                  thresh = 0.1,
                  relative = FALSE,
                  input_data = input_data)
head(svc_bma)
# join back to the spatial layer
data(us_data)
svc_bma_sf <- us_data |>
  select(GEOID) |>
  left_join(svc_bma)
# and map
tit = expression(paste(""*beta[`Public Capital`]*" "))
ggplot(data = svc_bma_sf, aes(fill = b_pubC)) +
  geom_sf() +
  scale_fill_continuous_c4a_div(palette = "brewer.yl_or_rd", name = tit) +
  coord_sf() +
  theme_linedraw()
Creates and evaluates multiple varying coefficient GAM GP smooth models (SVC or STVC)
evaluate_models(
  input_data,
  target_var = "privC",
  covariates = c("unemp", "pubC"),
  coords_x = "X",
  coords_y = "Y",
  STVC = FALSE,
  time_var = NULL,
  ncores = 2
)
input_data: a `data.frame` or `tibble` containing the target variable, covariates, coordinates and (for STVC evaluations) a time variable
target_var: the name of the target variable in `input_data`
covariates: the names of the covariates (predictor variables) in `input_data`
coords_x: the name of the X, Easting or Longitude variable in `input_data`
coords_y: the name of the Y, Northing or Latitude variable in `input_data`
STVC: a logical value indicating whether to evaluate Space-Time (`TRUE`) or just Space (`FALSE`) models
time_var: the name of the time variable if undertaking STVC model evaluations
ncores: the number of cores to use in parallelised approaches (the default is 2 to pass CRAN package checks); the number available on your machine can be determined by running `parallel::detectCores()`
A data table in `data.frame` format of all possible model combinations, with each covariate specified in all possible ways, together with each model's BIC and formula.
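To illustrate what "each covariate specified in all possible ways" means for an SVC search, here is a simplified sketch (an assumption about the enumeration, not the package's exact code) in which each covariate enters either as a fixed effect only or with an additional GP smooth over location:

library(mgcv)
# assumes input_data with privC, unemp, pubC, X, Y and an Intercept column
specs <- expand.grid(svc_unemp = c(FALSE, TRUE), svc_pubC = c(FALSE, TRUE))
forms <- apply(specs, 1, function(s) {
  rhs <- c("0 + Intercept + s(X, Y, bs = 'gp', by = Intercept)",
           if (s["svc_unemp"]) "unemp + s(X, Y, bs = 'gp', by = unemp)" else "unemp",
           if (s["svc_pubC"]) "pubC + s(X, Y, bs = 'gp', by = pubC)" else "pubC")
  paste("privC ~", paste(rhs, collapse = " + "))
})
# fit each candidate and record its BIC
bics <- sapply(forms, function(f) BIC(gam(as.formula(f), data = input_data)))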
library(dplyr)
library(glue)
library(purrr)
library(doParallel)
library(mgcv)
data("productivity")
input_data = productivity |> filter(year == "1970")
svc_res_gam = evaluate_models(input_data = input_data,
                              target_var = "privC",
                              covariates = c("unemp", "pubC"),
                              coords_x = "X",
                              coords_y = "Y",
                              STVC = FALSE)
head(svc_res_gam)
Calculates the model probabilities of the different GAM models generated by `evaluate_models`.
gam_model_probs(res_tab, n = 10)
res_tab: a table of models and BIC values generated by `evaluate_models`
n: the number of models to retain and generate probabilities for
A ranked data table in `tibble` format of the top `n` models, their form and BIC, and their model (`Pr(M|D)`) or relative (`Pr(M)`) probability values. The model probability indicates the probability of each model being the correct model; the relative probabilities provide a measure of the doubt about the differences in model specification when compared to the best (highest ranked) model. The relative probabilities are needed when large BIC values generate near-zero probability values.
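The two probability values follow from the standard BIC approximation to model probabilities. A minimal sketch of the assumed calculation, showing why working with BIC differences avoids the underflow noted above:

# sketch, assuming the standard exp(-BIC/2) weight; `shift = TRUE` works
# with BIC differences from the best model to avoid numerical underflow
bic_to_prob <- function(bic, shift = TRUE) {
  d <- if (shift) bic - min(bic) else bic
  w <- exp(-0.5 * d)
  w / sum(w)
}
# bic_to_prob(c(1000, 1002, 1010))         # well-behaved
# bic_to_prob(c(1000, 1002, 1010), FALSE)  # underflows to NaN (0 / 0)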
library(dplyr)
library(purrr)
library(glue)
library(mgcv)
data(productivity)
input_data = productivity |> filter(year == "1970")
svc_res_gam = evaluate_models(input_data, STVC = FALSE)
mod_comp_svc <- gam_model_probs(svc_res_gam, n = 10)
# print out the terms and probabilities
mod_comp_svc |> select(-f)
Plots a 1-Dimensional GAM smooth
plot_1d_smooth(mod, ncol = NULL, nrow = NULL, fills = "lightblue")
mod: a GAM model with smooths created using the `mgcv` package
ncol: the number of columns in the compound plot
nrow: the number of rows in the compound plot
fills: the fill colours (a single value or a vector)
A compound plot of the GAM 1-dimensional smooths (rendered using `cowplot::plot_grid()`).
library(mgcv)
library(ggplot2)
library(dplyr)
library(cowplot)
# 1. from the `mgcv` `gam` function help
set.seed(2)
## simulate some data...
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)
b <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)
plot_1d_smooth(b, ncol = 2, fills = c("lightblue", "lightblue3"))
dev.off()
# 2. using a TVC
data(productivity)
data = productivity |> mutate(Intercept = 1)
gam.tvc.mod = gam(privC ~ 0 +
                    Intercept + s(year, bs = "gp", by = Intercept) +
                    unemp + s(year, bs = "gp", by = unemp) +
                    pubC + s(year, bs = "gp", by = pubC),
                  data = data)
plot_1d_smooth(gam.tvc.mod, fills = "lightblue")
Plots a 2-Dimensional GAM smooth
plot_2d_smooth(mod, filled = FALSE, outline = NULL, ncol = NULL, nrow = NULL)
mod: a GAM model with smooths created using the `mgcv` package
filled: a logical value indicating whether a filled contour plot is generated (default `FALSE`)
outline: the name of an `sf` object whose boundary is plotted over the smooth (optional; default `NULL`)
ncol: the number of columns for the compound plot
nrow: the number of rows for the compound plot
A compound plot of the 2-dimensional smooths (rendered using `cowplot::plot_grid()`).
library(mgcv)
library(ggplot2)
library(dplyr)
library(metR)
library(cowplot)
set.seed(2)
## simulate some data...
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)
# use x0 and x1 as the coordinates
b <- gam(y ~ s(x0, x1, bs = "gp", by = x2), data = dat)
plot_2d_smooth(b, filled = TRUE)
A dataset of annual economic productivity data for the 48 contiguous US states (with Washington DC merged into Maryland), from 1970 to 1986 (17 years), in long format. The productivity data table was extracted from the `plm` package.
productivity
A tibble with 816 rows and 14 columns.
The name of the state
The state code
The year of observation
The region
Public capital, which is composed of highways and streets (hwy), water and sewer facilities (water), and other public buildings and structures (util)
Highway and streets assets
Water utility assets
Other public buildings and structures
Private capital stock
Gross state product
Labour input measured by employment in non-agricultural payrolls
State unemployment rate, capturing elements of the business cycle
Easting in metres from USA Contiguous Equidistant Conic projection (ESRI:102005)
Northing in metres from USA Contiguous Equidistant Conic projection (ESRI:102005)
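Because X and Y are projected coordinates, the table can be converted to an `sf` point layer for mapping; a brief sketch (the CRS string follows the ESRI code given above):

library(sf)
data(productivity)
# build point geometries from the projected X/Y columns, keeping the originals
prod_sf <- st_as_sf(productivity, coords = c("X", "Y"),
                    crs = "ESRI:102005", remove = FALSE)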
Croissant, Yves, Giovanni Millo, and Kevin Tappe. 2022. plm: Linear Models for Panel Data. R package.
data(productivity)
A dataset of the boundaries of the 48 contiguous US states (with Washington DC merged into Maryland), extracted from the `spData` package.
us_data
An `sf` polygon dataset with 48 rows and 6 fields.
The state code
The name of the state
The US region
The area in km^2
Population in 2010
Population in 2015
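A quick sketch to draw the boundaries, using the same `ggplot2`/`geom_sf` approach as the `do_bma` example above:

library(sf)
library(ggplot2)
data(us_data)
# plot the state boundaries
ggplot(us_data) + geom_sf() + theme_linedraw()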
Bivand, Roger, Jakub Nowosad, and Robin Lovelace. 2019. spData: Datasets for Spatial Analysis. R package.
data(us_data)