Pierre Goovaerts on
Kriging Variance, Stationarity, and Cross-Validation

From an email to AI-Geostats:
Date: Fri, 4 Dec 1998 09:50:50 -0500 (EST)
From: Pierre Goovaerts 
Kriging provides not only a least-squares estimate of the attribute but also the attached error variance. Unfortunately, this so-called kriging variance is often misused as a measure of the reliability of the kriging estimate, as several authors have pointed out (Journel, 1993; Armstrong, 1994; Goovaerts, 1997). Doing so assumes that the variance of the errors is independent of the actual data values and depends only on the data configuration, a situation referred to as "homoscedasticity". Homoscedasticity is rarely met in practice because the local variance of the data usually changes across the study area (non-stationarity).
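To see why the kriging variance ignores the data values, here is a minimal sketch of simple kriging from two data on a line. The exponential covariance, its unit sill, its practical range of 20 and the function name are assumptions made purely for illustration; the point is that no measurement z(u_i) appears anywhere in the computation.

```python
import math

def sk_variance_two_data(x1, x2, x0, sill=1.0, practical_range=20.0):
    """Simple kriging variance at x0 from two data located at x1 and x2 (1-D).

    Assumes an exponential covariance C(h) = sill * exp(-3h / practical_range).
    Only coordinates enter the calculation, never the measured values.
    """
    def cov(h):
        return sill * math.exp(-3.0 * abs(h) / practical_range)

    c11 = cov(0.0)       # variance of each datum
    c12 = cov(x1 - x2)   # covariance between the two data
    c10 = cov(x1 - x0)   # covariance between datum 1 and the target
    c20 = cov(x2 - x0)   # covariance between datum 2 and the target

    # Solve the 2x2 simple kriging system by Cramer's rule.
    det = c11 * c11 - c12 * c12
    if det == 0.0:
        raise ValueError("the two data locations coincide")
    lam1 = (c11 * c10 - c12 * c20) / det
    lam2 = (c11 * c20 - c12 * c10) / det

    # Kriging variance: C(0) minus the weighted data-to-target covariances.
    return c11 - lam1 * c10 - lam2 * c20

v_at_datum = sk_variance_two_data(0.0, 10.0, 0.0)     # target on a datum
v_between = sk_variance_two_data(0.0, 10.0, 5.0)      # target midway
v_far = sk_variance_two_data(0.0, 10.0, 1000.0)       # target far from both data
```

Whatever values were measured at the two locations, the three variances above stay the same: they change only when the coordinates or the covariance model change.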

Stationarity is a property of the random function model that is needed for statistical inference, and in most situations the "stationarity decision" is taken by the user simply because he cannot afford to split the data into smaller subsets! An interesting approach consists of locally rescaling a relative semivariogram computed from all the data so that its sill equals the variance of the data within the kriging search neighborhood (Isaaks and Srivastava, 1989, pp. 516--524). Although this local rescaling changes neither the kriging weights nor the estimated value, it might provide kriging variances that are more informative about the actual estimation errors.
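A rough sketch of that rescaling, assuming the kriging variance was first computed with a semivariogram standardized to a unit sill and that all structures (nugget included) are rescaled proportionally, so that the variance simply scales with the local sill. The function name and the numbers are hypothetical.

```python
from statistics import pvariance

def rescale_kriging_variance(unit_sill_variance, neighborhood_values):
    """Rescale a unit-sill kriging variance by the local data variance.

    unit_sill_variance: kriging variance obtained with a semivariogram
        whose sill is 1 (assumption of this sketch).
    neighborhood_values: data retained in the kriging search neighborhood.
    """
    if len(neighborhood_values) < 2:
        raise ValueError("need at least two data in the search neighborhood")
    local_variance = pvariance(neighborhood_values)
    return unit_sill_variance * local_variance

# Same data configuration (same unit-sill variance), different local variability.
quiet_zone = [10.0, 11.0, 9.0, 10.0]
erratic_zone = [2.0, 30.0, 8.0, 20.0]
var_quiet = rescale_kriging_variance(0.4, quiet_zone)
var_erratic = rescale_kriging_variance(0.4, erratic_zone)
```

With identical sampling geometry, the rescaled variance is now much larger in the erratic zone than in the quiet one, which is the behavior the unscaled kriging variance cannot show.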

Another way to correct for the lack of homoscedasticity is to transform the data to stabilize the variance, typically when the sample histogram is asymmetric and the local variance of data is related to their local mean (proportional effect). A frequent transform consists of taking the logarithm of strictly positive measurements, y(u_i) = ln z(u_i). The problem lies in the back-transform of the estimated value y^*(u) to retrieve the estimate of the original variable z at u. For example, the unbiased back-transform of the simple lognormal kriging estimate is: z^*(u) = exp [ y^*(u) + s^2_{SK}(u)/2 ] where s^2_{SK}(u) is the simple lognormal kriging variance. Because of the exponentiation of both kriging estimate and variance, final results are very sensitive to the semivariogram model, in particular the kriging variance!
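The sensitivity of the back-transform is easy to check numerically. The sketch below applies the formula above to one log-space estimate and a few kriging variances; the input values are made up for illustration.

```python
import math

def lognormal_sk_backtransform(y_est, sk_variance):
    """Unbiased back-transform of a simple lognormal kriging estimate:
    z*(u) = exp[y*(u) + s^2_SK(u) / 2]."""
    if sk_variance < 0.0:
        raise ValueError("kriging variance must be non-negative")
    return math.exp(y_est + sk_variance / 2.0)

# Same log-space estimate, three plausible kriging variances
# (e.g. obtained from three competing semivariogram models).
y_est = 2.0
back_transformed = {
    s2: lognormal_sk_backtransform(y_est, s2) for s2 in (0.2, 0.4, 0.8)
}
for s2, z_est in back_transformed.items():
    print(f"s2_SK = {s2:.1f} -> z* = {z_est:.2f}")
```

Each increase of 0.4 in the kriging variance multiplies the back-transformed estimate by exp(0.2), i.e. by about 22%, even though the log-space estimate itself has not moved.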

Thus, to answer your question, the kriging variance is informative about prediction error and sampling strategy only under very specific conditions. Otherwise it will just tell you to take additional samples where the sampling density is low. You don't need geostatistics for that!

As Tom rightly stated, cross-validation is a useful way to assess the magnitude and spatial distribution of prediction errors. Several authors, such as Wackernagel (1995, p. 80), have also proposed comparing the squared prediction errors obtained through cross-validation with the kriging variance to assess the adequacy of the model. In other words, for n re-estimated data:

(1/n) \sum_{i=1}^{n} [z(u_i) - z^*(u_i)]^2 / s^2_{SK}(u_i)

should be close to 1. Cross-validation can also be used to select a subset of monitoring stations if one cannot afford to take measurements over the whole network in the future. Just drop the sites that can be easily predicted from the other ones (smaller prediction errors).
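As a sketch, the check amounts to averaging, over the n re-estimated data, the squared cross-validation error divided by the kriging variance at the same location. The function name and the three-station example are hypothetical.

```python
def mean_standardized_squared_error(observed, predicted, kriging_variances):
    """Average of [z(u_i) - z*(u_i)]^2 / s^2(u_i) over the re-estimated data."""
    n = len(observed)
    if n == 0 or len(predicted) != n or len(kriging_variances) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for z, z_hat, s2 in zip(observed, predicted, kriging_variances):
        if s2 <= 0.0:
            raise ValueError("kriging variances must be strictly positive")
        total += (z - z_hat) ** 2 / s2
    return total / n

# Three stations re-estimated by leaving each one out in turn.
observed = [1.0, 2.0, 3.0]
predicted = [1.5, 1.0, 3.5]
kriging_variances = [0.25, 1.0, 0.25]
ratio = mean_standardized_squared_error(observed, predicted, kriging_variances)
```

A ratio well above 1 suggests the model understates the actual errors, and a ratio well below 1 suggests it overstates them.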

Cross-validation also has its limitations. For example, it cannot be used to derive the location of additional monitoring sites. For that type of application, I would use a simulation approach whereby a set of realizations, which reproduce the distribution and pattern of variability of the data, is generated conditional on the measurements. For example, the generation of 100 realizations would provide, at each grid node, a distribution of 100 simulated values, the spread of which could be used as a measure of uncertainty. In other words, locate your new station where the uncertainty is the largest (widest distribution of simulated values).
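The last step can be sketched as follows, assuming the conditional realizations have already been generated by whatever simulation algorithm you use and are stored as one list of node values per realization. Using the standard deviation as the measure of spread is one choice among several (an interquantile range would do as well); the function name and the toy values are hypothetical.

```python
from statistics import pstdev

def most_uncertain_node(realizations):
    """Return (node index with the widest spread, list of spreads per node).

    realizations: list of realizations, each a list holding the simulated
        value at every grid node (same node order in all realizations).
    """
    if len(realizations) < 2:
        raise ValueError("need at least two realizations")
    n_nodes = len(realizations[0])
    if any(len(r) != n_nodes for r in realizations):
        raise ValueError("all realizations must cover the same grid nodes")
    # Spread of the simulated values at each node, across realizations.
    spreads = [pstdev([r[k] for r in realizations]) for k in range(n_nodes)]
    best = max(range(n_nodes), key=lambda k: spreads[k])
    return best, spreads

# Toy case: 3 realizations over 3 grid nodes (real studies would use ~100).
realizations = [
    [1.0, 5.0, 2.0],
    [1.1, 9.0, 2.2],
    [0.9, 1.0, 1.8],
]
best_node, spreads = most_uncertain_node(realizations)
```

Here the second node (index 1) shows by far the widest distribution of simulated values, so it would be the first candidate for an additional station.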

To conclude, let me quote the last paragraph of my book: "As a last remark, beware that uncertainty is not intrinsic to the phenomenon under study: rather it arises from our imperfect knowledge of that phenomenon, it is data-dependent and most importantly model-dependent, that model specifying our prior concept (decisions) about the phenomenon. No model, hence no uncertainty measure, can ever be objective: the point is to accept that limitation and document clearly all aspects of the model."

REFERENCES:
 
Armstrong, M., 1994. Is research in mining geostats as dead
as a dodo? In: Dimitrakopoulos, R. (Ed.), Geostatistics for The Next
Century. Kluwer Academic Publishers, Dordrecht, pp. 303--312.
 
Goovaerts, P., 1997. Geostatistics for Natural Resources Evaluation.
Oxford Univ. Press, New York, 512 pp.
 
Isaaks, E.H., Srivastava, R.M., 1989. An Introduction to
Applied Geostatistics. Oxford Univ. Press, New York, 561 pp.
 
Journel, A.G., 1993. Geostatistics: roadblocks and challenges.
In: Soares, A. (Ed.), Geostatistics Troia '92. Kluwer Academic
Publishers, Dordrecht, pp. 213--224.
