1. Converting sf objects to spm

library(smile)
library(ggplot2)
library(sf)

Spatial data often need to be analyzed at different scales or resolutions. This package helps you align such disparate spatial data using two approaches.

This vignette focuses on the model-based approach for areal data (for instance, census tract data). This method requires converting sf (Pebesma 2018) objects, a common format for spatial data in R, to the spm format used by this package. We’ll use the ggplot2 (Wickham 2011) package for visualization.

To demonstrate this conversion, we’ll use life expectancy at birth (LEB) and index of multiple deprivation (IMD) data for Liverpool from Johnson, Diggle, and Giorgi (2020). These data are available at two spatial resolutions: MSOA (Middle Layer Super Output Area) and the finer LSOA (Lower Layer Super Output Area). The first figure below shows life expectancy at the MSOA level.


data(liv_lsoa) # loading the LSOA data
data(liv_msoa) # loading the MSOA data

## workaround for compatibility with different PROJ versions
st_crs(liv_msoa) <-
    st_crs(liv_msoa)$input
st_crs(liv_lsoa) <-
    st_crs(liv_lsoa)$input
ggplot(data = liv_msoa,
       aes(fill = leb_est)) +
    geom_sf(color = "black",
            lwd   = .1) +
    scale_fill_viridis_b(option = "H") +
    theme_minimal()
LEB in Liverpool at the MSOA level.

To analyze the relationship between life expectancy (LEB) and deprivation at the LSOA level, we first need to estimate LEB at this finer resolution. We assume LEB follows a continuous spatial pattern represented by a Gaussian process.

Mathematically, we treat the observed LEB in an areal unit (e.g., an MSOA) as the average of a continuous underlying process over that area. If we knew the exact parameters of this process, we could calculate the LEB for any location. However, these calculations involve integrals that are difficult to solve analytically. For more details on the theory behind this, see this vignette.
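In symbols, the averaging idea can be sketched as follows. The notation is introduced here only for illustration and is not taken from the package documentation: \(Y(\mathbf{s})\) denotes the latent LEB surface at location \(\mathbf{s}\), \(A\) is an areal unit with area \(|A|\), and \(\mathbf{s}_1, \ldots, \mathbf{s}_n\) are points placed inside \(A\).

```latex
Y(A) = \frac{1}{|A|} \int_{A} Y(\mathbf{s}) \, d\mathbf{s}
\approx \frac{1}{n} \sum_{i = 1}^{n} Y(\mathbf{s}_i),
\qquad \mathbf{s}_1, \ldots, \mathbf{s}_n \in A .
```

The right-hand side replaces the integral with a finite average over points in the area, which is the kind of approximation discussed next.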

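The sf_to_spm function controls how the model-based approach approximates these integrals. It supports either numerical or Monte Carlo integration. Two of its parameters determine the integration method: type sets the kind of grid (a regular grid, as in type = "regular" used below, corresponds to numerical integration, while randomly sampled points correspond to Monte Carlo integration), and by_polygon decides whether the n_pts points are generated over the whole study region and then assigned to areas (by_polygon = FALSE, as used below) or generated separately within each polygon.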

This approach allows us to estimate LEB at the LSOA level while accounting for the underlying spatial structure of the data.
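To make the difference between the two integration schemes concrete, here is a minimal sketch that uses only sf::st_sample() on a toy polygon. It does not call sf_to_spm or reproduce its internals. The unit square and the function f are hypothetical stand-ins, chosen so that the exact areal average is known to be 1.

```r
library(sf)

# Toy polygon: the unit square (area 1), a stand-in for one areal unit
sq <- st_sfc(st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1),
                                   c(0, 1), c(0, 0)))))

# Toy surface; its exact average over the unit square is 0.5 + 0.5 = 1
f <- function(xy) xy[, 1] + xy[, 2]

set.seed(1)
# Regular grid: a numerical-integration style approximation
pts_reg <- st_sample(sq, size = 400, type = "regular")
# Uniform random points: a Monte Carlo approximation
pts_rnd <- st_sample(sq, size = 400, type = "random")

# Approximate the areal average by averaging f over the points
avg_reg <- mean(f(st_coordinates(pts_reg)))
avg_rnd <- mean(f(st_coordinates(pts_rnd)))
```

Both avg_reg and avg_rnd should land close to 1. The regular grid gives a deterministic layout up to a random offset, whereas the Monte Carlo estimate varies from seed to seed.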

The code below converts the liv_msoa object (in sf format) to an spm object. We generate a grid of 1000 points across Liverpool and assign each point to its corresponding area.

msoa_spm <-
    sf_to_spm(sf_obj = liv_msoa, n_pts = 1000,
              type = "regular", by_polygon = FALSE,
              poly_ids = "msoa11cd", var_ids = "leb_est")
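The grid-and-assign step described above can be mimicked with sf alone. The sketch below is an illustration under stated assumptions, not the code sf_to_spm actually runs: it uses the North Carolina shapefile shipped with sf instead of the Liverpool data, and the projection (EPSG:32119) and the NAME identifier column are choices made for this example only.

```r
library(sf)

# Example polygons bundled with sf (a stand-in for liv_msoa)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc <- st_transform(nc, 32119)  # projected CRS, so the grid is planar

set.seed(1)
# Roughly 1000 regularly spaced points over the whole study region
grid <- st_sample(st_union(nc), size = 1000, type = "regular")
grid_sf <- st_sf(geometry = grid)

# Assign each point to the polygon that contains it
grid_sf <- st_join(grid_sf, nc["NAME"], join = st_within)

# Number of grid points falling in each area
pts_per_area <- table(grid_sf$NAME)
```

With a single grid over the whole region, larger areas receive more points than smaller ones, and very small areas may receive few or none.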

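The remaining arguments tell sf_to_spm which columns of the sf object to use: poly_ids = "msoa11cd" names the column holding the unique identifier of each area, and var_ids = "leb_est" names the variable to be analyzed (here, the LEB estimate).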

This conversion prepares the data for spatial analysis using the smile package.

For the sake of comparison, the figure below displays the grids associated with each of the possible combinations of the parameters type and by_polygon when calling the sf_to_spm function.

Panel illustrating the grids generated for different approaches to approximate numerical integrals in the model-based approach.
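As a rough way to see how the two by_polygon settings differ, the sketch below contrasts sampling over the whole region with sampling inside each polygon separately, again using only sf::st_sample(). It assumes that by_polygon = TRUE corresponds to generating points polygon by polygon. The example data (five polygons from the shapefile bundled with sf) and the point counts are hypothetical choices for illustration.

```r
library(sf)

# Five example polygons bundled with sf, in a projected CRS
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
nc <- st_transform(nc[1:5, ], 32119)

set.seed(1)
# One set of points for the whole region (analogous to by_polygon = FALSE)
whole <- st_sample(st_union(nc), size = 50, type = "random")

# A separate set of points per polygon (analogous to by_polygon = TRUE)
per_poly <- lapply(seq_len(nrow(nc)), function(i) {
    st_sample(nc[i, ], size = 10, type = "random")
})

# Points obtained in each polygon under the per-polygon scheme
n_per_poly <- vapply(per_poly, length, integer(1))
```

Sampling per polygon guarantees that every area gets its own set of points regardless of its size, at the cost of uneven point density across the region.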

For details on fitting models and making predictions, see this vignette.

References

Gelfand, Alan E, Li Zhu, and Bradley P Carlin. 2001. “On the Change of Support Problem for Spatio-Temporal Data.” Biostatistics 2 (1): 31–45.
Johnson, Olatunji, Peter Diggle, and Emanuele Giorgi. 2020. “Dealing with Spatial Misalignment to Model the Relationship Between Deprivation and Life Expectancy: A Model-Based Geostatistical Approach.” International Journal of Health Geographics 19 (1): 1–13.
Moraga, Paula, Susanna M Cramb, Kerrie L Mengersen, and Marcello Pagano. 2017. “A Geostatistical Model for Combined Analysis of Point-Level and Area-Level Data Using INLA and SPDE.” Spatial Statistics 21: 27–41.
Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10 (1): 439–46. https://doi.org/10.32614/RJ-2018-009.
Wickham, Hadley. 2011. “Ggplot2.” Wiley Interdisciplinary Reviews: Computational Statistics 3 (2): 180–85.
Wilson, Katie, and Jon Wakefield. 2020. “Pointless Spatial Modeling.” Biostatistics 21 (2): e17–32.