Let us set up the notation first. Suppose there exists a partition of a region \(\mathrm{D} \subset \mathcal{R}^2\) (e.g., a city). This partition is denoted by \(A_i\), \(i = 1, \ldots, n\). Moreover, there exists another partition of the same city, denoted \(B_j\), where \(j = 1, \ldots, m\). These partitions can be seen as two different administrative divisions within the same city. It is common for different government agencies to release data for different divisions of the same city, county, or state.
Assume, for the moment, that \(Y(\mathbf{s})\) is a random field over \(\mathrm{D}\) with constant mean \(\mu\) and an isotropic covariance function, and define the areal average \(Y(A_i) = \lvert A_i \rvert^{-1} \int_{A_i} Y(\mathbf{s}) \, \mathrm{d} \mathbf{s}\). Consequently,
\[\mathrm{E}[Y(A_i)] = \frac{1}{\lvert A_i \rvert} \int_{A_i} \mathrm{E}[Y(\mathbf{s})] \, \mathrm{d} \mathbf{s} = \frac{1}{\lvert A_i \rvert} \int_{A_i} \mu \, \mathrm{d} \mathbf{s} = \mu\]
and
\[\begin{align*} \mathrm{Cov}[Y(A_i), Y(A_j)] & = \frac{1}{\lvert A_i \rvert \lvert A_j \rvert} \int_{A_i \times A_j} \mathrm{Cov}[Y(\mathbf{s}), Y(\mathbf{s}')] \, \mathrm{d} \mathbf{s} \, \mathrm{d} \mathbf{s}' \\ & = \frac{1}{\lvert A_i \rvert \lvert A_j \rvert} \int_{A_i \times A_j} \mathrm{C}( \lVert \mathbf{s} - \mathbf{s}' \rVert ; \boldsymbol{\theta}) \, \mathrm{d} \mathbf{s} \, \mathrm{d} \mathbf{s}', \end{align*}\]
where \(\lVert \mathbf{s} - \mathbf{s}' \rVert\) is the Euclidean distance between the coordinates \(\mathbf{s}\) and \(\mathbf{s}'\), and \(\mathrm{C}( \lVert \mathbf{s} - \mathbf{s}' \rVert ; \boldsymbol{\theta})\) is an isotropic covariance function depending on the parameter \(\boldsymbol{\theta}\).
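Since dividing the double integral by \(\lvert A_i \rvert \lvert A_j \rvert\) makes it the average of \(\mathrm{C}(\lVert \mathbf{s} - \mathbf{s}' \rVert; \boldsymbol{\theta})\) over \(\mathbf{s} \in A_i\) and \(\mathbf{s}' \in A_j\), it can be approximated by Monte Carlo. Below is a minimal sketch; the exponential covariance and the rectangular regions are illustrative assumptions, not part of the setup above:

```python
import numpy as np

rng = np.random.default_rng(42)

def exp_cov(h, sigma2=1.0, phi=0.5):
    # Isotropic exponential covariance, an assumed example of C(.; theta).
    return sigma2 * np.exp(-h / phi)

def sample_rect(rect, n):
    # Uniform draws from rect = (xmin, ymin, xmax, ymax).
    xmin, ymin, xmax, ymax = rect
    return np.column_stack([rng.uniform(xmin, xmax, n),
                            rng.uniform(ymin, ymax, n)])

def block_cov(rect_i, rect_j, n=100_000):
    # Monte Carlo estimate of Cov[Y(A_i), Y(A_j)]: the mean of
    # C(||s - s'||) over s uniform on A_i and s' uniform on A_j.
    s = sample_rect(rect_i, n)
    s_prime = sample_rect(rect_j, n)
    h = np.linalg.norm(s - s_prime, axis=1)
    return exp_cov(h).mean()

# Two adjacent unit squares: averaging over the blocks gives a
# covariance smaller than the point-level C(0) = sigma2.
print(block_cov((0, 0, 1, 1), (1, 0, 2, 1)))
```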
Assume we observe a random variable \(Y(\cdot)\) at each region \(A_i\), and we are interested in predicting this variable in each of the regions \(B_j\). Now suppose the random variable \(Y(\cdot)\) varies continuously over \(\mathrm{D}\) and is defined as follows: \[Y(\mathbf{s}) = \mu + S(\mathbf{s}) + \varepsilon(\mathbf{s}), \quad \mathbf{s} \in \mathrm{D} \subset \mathcal{R}^2,\]
where \[ S(\cdot) \sim \mathrm{GP}(0, \sigma^2 \rho(\cdot; \, \phi, \kappa)) \; \text{ and } \; \varepsilon(\cdot) \overset{\mathrm{i.i.d.}}{\sim} \mathrm{N}(0, \tau), \] with \(S\) and \(\varepsilon\) independent. For now, let’s make the unrealistic assumption that all these parameters are known.
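For intuition, here is a minimal point-level simulation of this model, assuming an exponential correlation \(\exp(-h / \phi)\) as a stand-in for \(\rho(\cdot; \, \phi, \kappa)\) and arbitrary illustrative parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2, phi, tau = 10.0, 2.0, 0.3, 0.5   # illustrative values only

# Point locations on a regular grid over D = [0, 1]^2.
g = np.linspace(0.0, 1.0, 20)
xx, yy = np.meshgrid(g, g)
coords = np.column_stack([xx.ravel(), yy.ravel()])

# Covariance of S at the grid locations (exponential correlation).
h = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
K = sigma2 * np.exp(-h / phi)

# S ~ GP(0, sigma2 * rho), eps ~ iid N(0, tau), S and eps independent.
L = np.linalg.cholesky(K + 1e-10 * np.eye(len(coords)))  # jitter for stability
S = L @ rng.standard_normal(len(coords))
eps = rng.normal(0.0, np.sqrt(tau), size=len(coords))
Y = mu + S + eps
```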
Then, our assumption is that the observed data are the block averages \[\begin{align*} Y(A_i) & = \frac{1}{\lvert A_i \rvert} \int_{A_i} Y(\mathbf{s}) \, \mathrm{d} \mathbf{s} \\ & = \frac{1}{\lvert A_i \rvert} \int_{A_i} [\mu + S(\mathbf{s}) + \varepsilon(\mathbf{s})] \, \mathrm{d} \mathbf{s} \\ & = \mu + \frac{1}{\lvert A_i \rvert} \int_{A_i} S(\mathbf{s}) \, \mathrm{d} \mathbf{s} + \frac{1}{\lvert A_i \rvert} \int_{A_i} \varepsilon(\mathbf{s}) \, \mathrm{d} \mathbf{s}, \end{align*}\]
where \(\lvert \cdot \rvert\) returns the area of a polygon. Furthermore, it can be shown (using Fubini’s theorem and some algebraic manipulation) that \[ \mathrm{Cov}(Y(A_i), Y(A_j)) = \frac{\sigma^2}{\lvert A_i \rvert \lvert A_j \rvert} \int_{A_i \times A_j} \rho( \lVert \mathbf{s} - \mathbf{s}' \rVert; \, \phi, \kappa ) \, \mathrm{d} \mathbf{s} \, \mathrm{d} \mathbf{s}' + \mathbf{I}(i = j) \frac{\tau}{\lvert A_i \rvert}, \] where \(\rho(\cdot ; \, \phi, \kappa)\) is a positive definite correlation function. Now, let \(\mathrm{R}_{\kappa}(\phi)\) be a correlation matrix such that \[ \mathrm{R}_{\kappa}(\phi)_{ij} = \frac{1}{\lvert A_i \rvert \lvert A_j \rvert} \int_{A_i \times A_j} \rho( \lVert \mathbf{s} - \mathbf{s}' \rVert; \, \phi, \kappa ) \, \mathrm{d} \mathbf{s} \, \mathrm{d} \mathbf{s}', \] thus, \[ Y(A_1, \ldots, A_n) \sim \mathrm{N}( \mu \mathbf{1}_n, \sigma^2 \mathrm{R}_{\kappa}(\phi) + \tau \, \mathrm{diag}(\lvert A_1 \rvert^{-1}, \ldots, \lvert A_n \rvert^{-1})). \] Then, if we assume \((Y^{\top}(A_1, \ldots, A_n), Y^{\top}(B_1, \ldots, B_m))^{\top}\) to be jointly normal, we can use the conditional mean of \(Y(B_1, \ldots, B_m)\) given \(Y(A_1, \ldots, A_n)\) to predict the random variable over the partition \(B_1, \ldots, B_m\).
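Under joint normality, the predictor is the usual conditional mean, \(\mu \mathbf{1}_m + \Sigma_{BA} \Sigma_{AA}^{-1} (\mathbf{y}_A - \mu \mathbf{1}_n)\), with conditional covariance \(\Sigma_{BB} - \Sigma_{BA} \Sigma_{AA}^{-1} \Sigma_{AB}\). Below is a minimal sketch, assuming the block covariance matrices (built from the integrated correlations above) have already been computed:

```python
import numpy as np

def conditional_normal(y_A, mu, Sigma_AA, Sigma_BA, Sigma_BB):
    # E[Y(B) | Y(A) = y_A] and Var[Y(B) | Y(A) = y_A] under joint normality.
    n, m = Sigma_AA.shape[0], Sigma_BB.shape[0]
    # Solve linear systems instead of forming Sigma_AA^{-1} explicitly.
    resid = np.linalg.solve(Sigma_AA, y_A - mu * np.ones(n))
    cond_mean = mu * np.ones(m) + Sigma_BA @ resid
    cond_cov = Sigma_BB - Sigma_BA @ np.linalg.solve(Sigma_AA, Sigma_BA.T)
    return cond_mean, cond_cov
```

Solving the linear systems rather than inverting \(\Sigma_{AA}\) is the standard, numerically stabler choice.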
Now, suppose the parameters \(\boldsymbol{\theta} = (\mu, \sigma^2, \phi, \tau)\) are unknown. The likelihood of \(Y(A_1, \ldots, A_n)\) can still be computed.
In particular, if we use the parametrization \(\alpha = \tau / \sigma^2\), the maximum likelihood estimators of both \(\mu\) and \(\sigma^2\) have closed forms. Thus, we can optimize the profile likelihood for \(\phi\) and \(\alpha\) numerically, as sketched below. Then, we resort to the conditional normal properties again to compute the predictions on a new, different set of regions.
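A minimal sketch of that profile likelihood, assuming a hypothetical `R_of_phi(phi)` that returns the integrated correlation matrix \(\mathrm{R}_{\kappa}(\phi)\) (with \(\kappa\) held fixed) and `D` \(= \mathrm{diag}(\lvert A_1 \rvert^{-1}, \ldots, \lvert A_n \rvert^{-1})\):

```python
import numpy as np
from scipy.optimize import minimize

def neg_profile_loglik(log_params, y, R_of_phi, D):
    # Negative profile log-likelihood in (phi, alpha), optimized on the
    # log scale; alpha = tau / sigma2, so Cov = sigma2 * (R(phi) + alpha * D).
    phi, alpha = np.exp(log_params)
    n = len(y)
    V = R_of_phi(phi) + alpha * D
    ones = np.ones(n)
    V_inv_y = np.linalg.solve(V, y)
    V_inv_1 = np.linalg.solve(V, ones)
    mu_hat = (ones @ V_inv_y) / (ones @ V_inv_1)      # closed-form MLE of mu
    r = y - mu_hat * ones
    sigma2_hat = (r @ np.linalg.solve(V, r)) / n      # closed-form MLE of sigma2
    _, logdet = np.linalg.slogdet(V)
    return 0.5 * (n * np.log(sigma2_hat) + logdet)    # up to an additive constant

# Hypothetical usage, given data y and the inputs above:
# res = minimize(neg_profile_loglik, x0=np.log([1.0, 0.1]),
#                args=(y, R_of_phi, D), method="Nelder-Mead")
# phi_hat, alpha_hat = np.exp(res.x)
```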
Areal interpolation is a nonparametric approach that interpolates the \(Y(A_j)\)’s to construct the \(Y(B_i)\)’s. Define an \(m \times n\) matrix \(\mathbf{W} = \{ w_{ij} \}\), where \(w_{ij}\) is the weight associated with the polygon \(A_j\) in constructing \(Y(B_i)\). The weights are \(w_{ij} = \lvert A_j \cap B_i \rvert / \lvert B_i \rvert\) (Goodchild and Lam 1980; Gotway and Young 2002). The interpolation for \(\hat Y(B_1, \ldots, B_m)\) is constructed as \[\begin{equation} \label{eq:np-est} \hat{Y}(B_1, \ldots, B_m) = \mathbf{W} Y(A_1, \ldots, A_n). \end{equation}\] The expectation and variance of the predictor are, respectively, \[ \mathrm{E}[\hat{Y}(B_1, \ldots, B_m)] = \mathbf{W} \mathrm{E}[Y(A_1, \ldots, A_n)] \] and \[\begin{equation} \label{eq:np-matcov} \textrm{Var}[\hat{Y}(B_1, \ldots, B_m)] = \mathbf{W} \textrm{Var}[Y(A_1, \ldots, A_n)] \mathbf{W}^{\top}. \end{equation}\] In practice, the covariance matrix \(\textrm{Var}[Y(A_1, \ldots, A_n)]\) is unknown and, consequently, needs to be estimated.
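Before turning to that estimation, here is a minimal sketch of the weight computation in \(\eqref{eq:np-est}\), using shapely with two hypothetical rectangular partitions:

```python
import numpy as np
from shapely.geometry import box

# Hypothetical partitions: two unit squares A_1, A_2 (source) and one
# target rectangle B_1 straddling them.
A = [box(0.0, 0.0, 1.0, 1.0), box(1.0, 0.0, 2.0, 1.0)]
B = [box(0.5, 0.0, 1.5, 1.0)]

# w_ij = |A_j intersect B_i| / |B_i|; W has one row per target region.
W = np.array([[b.intersection(a).area / b.area for a in A] for b in B])
print(W)   # [[0.5, 0.5]]: each square contributes half of B_1
# Y_B_hat = W @ Y_A
```

Each row of \(\mathbf{W}\) sums to 1 whenever the corresponding \(B_i\) is fully covered by the \(A_j\)’s.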
The variance of each predictor, \(\text{Var}[\hat Y(B_i)]\), is needed as an uncertainty measure. It relies on both the variances of the \(Y(A_j)\)’s and their covariances: \[\begin{align} \label{eq:np-single-var} \textrm{Var}[\hat{Y}(B_i)] = \sum_{j = 1}^n w^2_{ij} \textrm{Var} \left [ Y(A_j) \right ] + 2 \sum_{j < l} w_{ij} w_{il} \textrm{Cov} \left[ Y(A_j), Y(A_l) \right]. \end{align}\] The variances are often observed in survey data, but the covariances are not. For practical purposes, we propose an approximation for \(\textrm{Cov}[ Y(A_j), Y(A_l)]\) based on Moran’s I, a global measure of spatial autocorrelation. Specifically, let \(\rho_I\) be the Moran’s I calculated with a weight matrix constructed with first-degree neighbors. That is, \(\rho_I\) is the average of the pairwise correlations for all neighboring pairs. For regions \(A_j\) and \(A_l\), if they are neighbors of each other, our approximation is \[\begin{align} \label{eq:cova} \textrm{Cov} \left[ Y(A_j), Y(A_l) \right] = \rho_I \sqrt{\text{Var}[Y(A_j)] \text{Var}[Y(A_l)]}. \end{align}\] The covariance between non-neighboring \(A_j\) and \(A_l\) is set to zero, so, under positive spatial autocorrelation, the final approximation for \(\textrm{Var}[\hat{Y}(B_i)]\) will be an underestimate. Alternatively, we can derive, at least, an upper bound for the variance of the estimates by a simple application of the Cauchy–Schwarz inequality, in which case \(\rho_I\) is replaced with 1.
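A minimal numeric sketch of this approximation, taking the survey variances \(\text{Var}[Y(A_j)]\), a first-degree neighbor indicator matrix, and \(\rho_I\) as given inputs:

```python
import numpy as np

def predictor_variances(W, var_A, adj, rho_I):
    # Approximate Var[Y_hat(B_i)] for all i, using
    # Cov[Y(A_j), Y(A_l)] = rho_I * sqrt(var_A[j] * var_A[l]) for
    # first-degree neighbors (adj[j, l] = 1) and 0 otherwise;
    # setting rho_I = 1 gives the Cauchy-Schwarz upper-bound version.
    sd = np.sqrt(var_A)
    Cov = rho_I * np.outer(sd, sd) * adj     # off-diagonal neighbor covariances
    np.fill_diagonal(Cov, var_A)             # diagonal: the observed variances
    return np.einsum("ij,jl,il->i", W, Cov, W)   # diag(W Cov W^T)
```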