| Title: | Spatially and Temporally Varying Coefficient Models Using Generalized Additive Models |
|---|---|
| Description: | A framework for undertaking space and time varying coefficient models (varying parameter models) using a Generalized Additive Model (GAM) with smooths approach. The framework suggests the need to investigate for the presence and nature of any space-time dependencies in the data. It proposes a workflow that creates and refines an initial space-time GAM and includes tools to create and evaluate multiple model forms. The workflow sequence is to: i) Prepare the data by lengthening it to have a single location and time variables for each observation. ii) Create all possible space and/or time models in which each predictor is specified in different ways in smooths. iii) Evaluate each model via their AIC value and pick the best one. iv) Create the final model. v) Calculate the varying coefficient estimates to quantify how the relationships between the target and predictor variables vary over space, time or space-time. vi) Create maps, time series plots etc. The number of knots used in each smooth can be specified directly or iteratively increased. This is illustrated with a climate point dataset of the dry rain forest in South America. This builds on work in Comber et al (2024) <doi:10.1080/13658816.2023.2270285> and Comber et al (2004) <doi:10.3390/ijgi13120459>. |
| Authors: | Lex Comber [aut, cre], Paul Harris [ctb], Gonzalo Irisarri [ctb], Chris Brunsdon [ctb] |
| Maintainer: | Lex Comber <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.2.1 |
| Built: | 2026-05-18 11:19:30 UTC |
| Source: | https://github.com/lexcomber/stgam |
Extracts varying coefficient estimates (for SVC, TVC and STVC models).
calculate_vcs(input_data, mgcv_model, terms = NULL)calculate_vcs(input_data, mgcv_model, terms = NULL)
input_data |
the data used to create the GAM model in |
mgcv_model |
a GAM model with smooths created using the |
terms |
a vector of names starting with "Intercept" plus the names of the covariates used in the GAM model (these are the names of the variables in the |
A data.frame of the input data, the coefficient estimates, the standard errors and the t-values estimates for each covariate. It can be used to generate coefficient estimates for specific time slices and over gridded surfaces as described in the package vignette.
require(dplyr) require(doParallel) # define input data data("chaco") input_data <- chaco |> # create Intercept as an addressable term mutate(Intercept = 1) # create a model for example as result of running `evaluate_models` gam.m = gam(ndvi ~ 0 + s(X, Y, by = Intercept) + s(X, Y, by = tmax) + s(X, Y, by = pr), data = input_data) # calculate the Varying Coefficients terms = c("Intercept", "tmax", "pr") vcs = calculate_vcs(input_data, gam.m, terms) vcs |> select(ndvi, X, Y, starts_with(c("b_", "se_", "t_")), yhat)require(dplyr) require(doParallel) # define input data data("chaco") input_data <- chaco |> # create Intercept as an addressable term mutate(Intercept = 1) # create a model for example as result of running `evaluate_models` gam.m = gam(ndvi ~ 0 + s(X, Y, by = Intercept) + s(X, Y, by = tmax) + s(X, Y, by = pr), data = input_data) # calculate the Varying Coefficients terms = c("Intercept", "tmax", "pr") vcs = calculate_vcs(input_data, gam.m, terms) vcs |> select(ndvi, X, Y, starts_with(c("b_", "se_", "t_")), yhat)
A point dataset of NDVI and climate data. The data are sample of 2000 observations of Normalised Difference Vegetation Index (NDVI) (2012-2022) of the Chaco dry rainforest in South America with some climate data. These are found via Google Earth Engine (Gorelick et al., 2017). The NDVI data is sourced from the PKU GIMMS NDVI v1.2 dataset, which provides NDVI observations at 1/12° spatial resolution at bi-monthly intervals from 1982 to 2022 (Li et al., 2023). The climate data was derived from the TerraClimate dataset (IDAHO_EPSCOR/TERRACLIMATE). Maximum temperature (tmax) and Precipitation (pr) were selected and means calculated for each monthly image across all pixels.
chacochaco
A sf POINT dataset with 2000 observations and 12 fields.
An observation identifier
Normalised Difference Vegetation Index (NDVI)
Maximum temperature (°C)
Preciptation
A continous integer variable from 1 to 120
The year of observation
Longitude in degrees (WGS84)
Latitude in degrees (WGS84)
Easting in metres from the SIRGAS 2000 / Brazil Mercator projection (EPSG:5641)
Northing in metres from the SIRGAS 2000 / Brazil Mercator projection (EPSG:5641)
The spatial geometry of the observation in the SIRGAS 2000 / Brazil Mercator projection (EPSG:5641)
Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., & Moore, R. (2017). Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment, 202, 18–27. https://doi.org/10.1016/J.RSE.2017.06.031
Li, M., Cao, S., Zhu, Z., Wang, Z., Myneni, R. B., & Piao, S. (2023). Spatiotemporally consistent global dataset of the GIMMS Normalized Difference Vegetation Index (PKU GIMMS NDVI) from 1982 to 2022. Earth System Science Data, 15(9), 4181–4203. https://doi.org/10.5194/ESSD-15-4181-2023
library(sf) data("chaco")library(sf) data("chaco")
mgcv GAMQuantifies the relative effect sizes of each component of mgcv GAM
effect_size(mgcv_model, digits = 3)effect_size(mgcv_model, digits = 3)
mgcv_model |
a GAM model created by the |
digits |
the number of significant figures of uysed to report effect size |
a matrix of the model terms, the size of the effect (range) ad standard deviation (sd)
require(dplyr) require(stringr) require(purrr) require(doParallel) # define input data data("chaco") m <- gam( ndvi ~ te(X,Y, by = tmax) + s(X,Y, by = pr), data = chaco, method = "REML", family = gaussian() ) # examine the effect size effect_size(m, 3)require(dplyr) require(stringr) require(purrr) require(doParallel) # define input data data("chaco") m <- gam( ndvi ~ te(X,Y, by = tmax) + s(X,Y, by = pr), data = chaco, method = "REML", family = gaussian() ) # examine the effect size effect_size(m, 3)
Evaluates multiple models with each predictor variable specified in different ways in order to determining model form
evaluate_models( input_data, target_var, vars, model_family = "gaussian()", bam = FALSE, coords_x = "X", coords_y = "Y", VC_type = "SVC", time_var = NULL, k_set = FALSE, spatial_k = 50, temporal_k = 10, k_increase = FALSE, k2edf_ratio = 1.5, k_multiplier = 2, max_iter = 10, ncores = 2 )evaluate_models( input_data, target_var, vars, model_family = "gaussian()", bam = FALSE, coords_x = "X", coords_y = "Y", VC_type = "SVC", time_var = NULL, k_set = FALSE, spatial_k = 50, temporal_k = 10, k_increase = FALSE, k2edf_ratio = 1.5, k_multiplier = 2, max_iter = 10, ncores = 2 )
input_data |
he data to be used used to create the GAM model in ( |
target_var |
the name of the target variable. |
vars |
a vector of the predictor variable names (without the Intercept). |
model_family |
the model family, defaults to Guassian |
bam |
a logical parameter indicating whether to use the |
coords_x |
the name of the X, Easting or Longitude variable in |
coords_y |
the name of the Y, Northing or Latitude variable in |
VC_type |
the type of varying coefficient model: options are "TVC" for temporally varying, "SVC" for spatially varying and "STVC" for space-time. |
time_var |
the name of the time variable if undertaking STVC model evaluations. |
k_set |
a logical value for user defined |
spatial_k |
the value of |
temporal_k |
the value of |
k_increase |
a logical value of whether to check and increase the number of knots in each smooth. The default is |
k2edf_ratio |
a threshold of the ratio of the number of knots, |
k_multiplier |
a multiplier by which the knots are increased on each iteration. The default is 2. |
max_iter |
the maximum number of iterations that |
ncores |
the number of cores to use in parallelised approaches (default is 2 to overcome CRAN package checks). This can be determined for your computer by running parallel::detectCores()-1. Parallel approaches are only undertaken if the number of models to evaluate is greater than 30. |
a data.frame with indices for each predictor variable, the knots specified in each smooth (ks), a AIC score (aic) for each model and the associated formula (f). The output should be passed to the gam_model_rank function.
## Not run: require(dplyr) require(doParallel) require(sf) # define input data data("chaco") input_data <- chaco |> # create Intercept as an addressable term mutate(Intercept = 1) |> # remove the geometry st_drop_geometry() # Evaluate different model forms # example 1 with 6 models and no `k` adjustment svc_mods <- evaluate_models( input_data = input_data, target_var = "ndvi", model_family = "gaussian()", vars = c("tmax"), coords_x = "X", coords_y = "Y", VC_type = "SVC" ) # have a look! svc_mods # example 2 with 6 models and `k` adjustment svc_k2_mods <- evaluate_models( input_data = input_data, target_var = "ndvi", vars = c("tmax"), model_family = "gaussian()", coords_x = "X", coords_y = "Y", VC_type = "SVC", k_increase = TRUE, k2edf_ratio = 1.5, k_multiplier = 2, max_iter = 10 ) # have a look! svc_k2_mods # example 3 with 6 models and `k` set by user svc_k3_mods <- evaluate_models( input_data = input_data, model_family = "gaussian()", target_var = "ndvi", vars = c("tmax"), coords_x = "X", coords_y = "Y", VC_type = "SVC", time_var = NULL, k_set = TRUE, spatial_k = 20, ) # have a look! svc_k3_mods # example comparing `gam` (slower) and `bam` (faster) t1 <- Sys.time() svc_GAM_mods <- evaluate_models( input_data = input_data, target_var = "ndvi", vars = c("tmax", "pr"), model_family = "gaussian()", bam = FALSE, coords_x = "X", coords_y = "Y", time_var = "month", VC_type = "STVC", ncores = detectCores()-1 ) t_gam <- Sys.time() - t1 t1 <- Sys.time() svc_BAM_mods <- evaluate_models( input_data = input_data, target_var = "ndvi", vars = c("tmax", "pr"), model_family = "gaussian()", bam = TRUE, coords_x = "X", coords_y = "Y", time_var = "month", VC_type = "STVC", ncores = detectCores()-1 ) t_bam <- Sys.time() - t1 t_gam - t_bam ## End(Not run)## Not run: require(dplyr) require(doParallel) require(sf) # define input data data("chaco") input_data <- chaco |> # create Intercept as an addressable term mutate(Intercept = 1) |> # remove the geometry st_drop_geometry() # Evaluate different model forms # example 1 with 6 models and no `k` adjustment svc_mods <- evaluate_models( input_data = input_data, target_var = "ndvi", model_family = "gaussian()", vars = c("tmax"), coords_x = "X", coords_y = "Y", VC_type = "SVC" ) # have a look! svc_mods # example 2 with 6 models and `k` adjustment svc_k2_mods <- evaluate_models( input_data = input_data, target_var = "ndvi", vars = c("tmax"), model_family = "gaussian()", coords_x = "X", coords_y = "Y", VC_type = "SVC", k_increase = TRUE, k2edf_ratio = 1.5, k_multiplier = 2, max_iter = 10 ) # have a look! svc_k2_mods # example 3 with 6 models and `k` set by user svc_k3_mods <- evaluate_models( input_data = input_data, model_family = "gaussian()", target_var = "ndvi", vars = c("tmax"), coords_x = "X", coords_y = "Y", VC_type = "SVC", time_var = NULL, k_set = TRUE, spatial_k = 20, ) # have a look! svc_k3_mods # example comparing `gam` (slower) and `bam` (faster) t1 <- Sys.time() svc_GAM_mods <- evaluate_models( input_data = input_data, target_var = "ndvi", vars = c("tmax", "pr"), model_family = "gaussian()", bam = FALSE, coords_x = "X", coords_y = "Y", time_var = "month", VC_type = "STVC", ncores = detectCores()-1 ) t_gam <- Sys.time() - t1 t1 <- Sys.time() svc_BAM_mods <- evaluate_models( input_data = input_data, target_var = "ndvi", vars = c("tmax", "pr"), model_family = "gaussian()", bam = TRUE, coords_x = "X", coords_y = "Y", time_var = "month", VC_type = "STVC", ncores = detectCores()-1 ) t_bam <- Sys.time() - t1 t_gam - t_bam ## End(Not run)
Ranks models by AIC, giving the model form for each predictor variable.
gam_model_rank(res_tab, n = 10, show_k = FALSE)gam_model_rank(res_tab, n = 10, show_k = FALSE)
res_tab |
a |
n |
the number of ranked models to return. |
show_k |
whether to include the smooth knots in the output (default is |
a tibble of the 'n' best models, ranked by AIC, with the form of each predictor variable where '—' indicates the absence of a predictor, 'Fixed' that a parametric form was specified, 's(S)' a spatial smooth, 's(T)' a temporal smooth and 'te(ST)' a combined space-time smooth. Model AIC is reported as are the knots in each smooth (ks) and the formula of each model (f).
require(dplyr) require(stringr) require(purrr) require(doParallel) require(sf) # define input data data("chaco") input_data <- chaco |> # create Intercept as an addressable term mutate(Intercept = 1) |> # remove the geometry st_drop_geometry() # evaluate different model forms # example 1 with 6 models and no `k` adjustment svc_mods <- evaluate_models( input_data = input_data, target_var = "ndvi", vars = c("tmax"), coords_x = "X", coords_y = "Y", VC_type = "SVC" ) # rank the models gam_model_rank(svc_mods, show_k = TRUE)require(dplyr) require(stringr) require(purrr) require(doParallel) require(sf) # define input data data("chaco") input_data <- chaco |> # create Intercept as an addressable term mutate(Intercept = 1) |> # remove the geometry st_drop_geometry() # evaluate different model forms # example 1 with 6 models and no `k` adjustment svc_mods <- evaluate_models( input_data = input_data, target_var = "ndvi", vars = c("tmax"), coords_x = "X", coords_y = "Y", VC_type = "SVC" ) # rank the models gam_model_rank(svc_mods, show_k = TRUE)