Noel Cressie (left) and Matthias Katzfuss. Credit: Amy Braverman

Sharon Ray, NASA Jet Propulsion Laboratory

The use of satellite measurements in climate studies holds the promise of new insights if those data can be efficiently exploited. However, satellite data sets typically contain on the order of 100,000 retrievals per day, which makes it almost impossible to apply traditional spatio-temporal statistical methods. At the same time, each daily data set is sparse and leaves spatial gaps, so interpolation is essential. To overcome these challenges, researchers Matthias Katzfuss and Noel Cressie use a spatio-temporal mixed-effects statistical model.
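To put that daily data volume in perspective, here is a rough back-of-the-envelope calculation. It assumes n = 100,000 retrievals, a dense double-precision covariance matrix, and a Cholesky factorization as the "traditional" computation. The reduced rank r = 400 is a hypothetical value chosen for illustration, not a number taken from the paper.

```python
# Back-of-the-envelope cost for one day of data, assuming n = 100,000
# retrievals, a dense double-precision covariance matrix, and a Cholesky
# factorization as the "traditional" computation.
n = 100_000
bytes_per_double = 8

cov_entries = n * n                              # entries of the n x n covariance matrix
cov_gigabytes = cov_entries * bytes_per_double / 1e9
cholesky_flops = n ** 3 / 3                      # leading-order Cholesky cost

# Hypothetical reduced-rank alternative: r basis functions, so the heavy
# linear algebra involves r x r matrices and the cost scales like n * r**2.
r = 400                                          # illustrative value, not from the paper
reduced_flops = n * r ** 2

print(f"dense covariance storage: {cov_gigabytes:.0f} GB")
print(f"dense Cholesky: {cholesky_flops:.1e} flops")
print(f"reduced-rank (r = {r}): {reduced_flops:.1e} flops")
print(f"speed-up factor: {cholesky_flops / reduced_flops:.0f}")
```

Under these assumptions, storing the covariance matrix alone takes about 80 GB, and the reduced-rank formulation is cheaper by a factor of roughly 20,000 for this choice of r. That is before counting the repeated solves that parameter estimation requires.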

In a nutshell, what is your paper about?

The article describes a statistical approach to estimating mid-tropospheric carbon dioxide at any location on the globe and for any given day, based on the sparse retrievals provided by the Atmospheric Infrared Sounder (AIRS) Project carbon dioxide team. The advantage of using statistics to smooth and fill gaps in sparse data is that for each daily map of estimated carbon dioxide, we can provide an accompanying map that quantifies the uncertainty in those estimated values. Underlying both of these maps is a statistical methodology that not only takes full advantage of the spatial and temporal dependence in carbon dioxide and its measurements, but also allows extremely fast computations.
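The idea of filling gaps while also reporting uncertainty can be illustrated with a deliberately tiny spatial random-effects model. The field is written as a linear combination of a few basis functions, Y(s) = S(s)'η. The data are Z_i = Y(s_i) + ε_i, and only the small r × r posterior covariance of η ever has to be inverted. This is a minimal sketch under stated assumptions, not the authors' code: one spatial dimension, r = 2 bisquare basis functions with hypothetical centers and width, known K and σ², zero mean, and no fine-scale variation term.

```python
# Minimal sketch of reduced-rank spatial prediction with uncertainty.
# Illustrative assumptions (not from the paper): one spatial dimension,
# r = 2 bisquare basis functions, known K and sigma2, zero mean, and
# no fine-scale variation term.
CENTERS = (0.25, 0.75)  # hypothetical basis-function centers
WIDTH = 0.6             # hypothetical basis-function radius


def bisquare(s, center, width):
    """Smooth bump equal to 1 at `center` and exactly 0 beyond `width`."""
    d = abs(s - center) / width
    return (1.0 - d * d) ** 2 if d < 1.0 else 0.0


def basis(s):
    """Vector S(s) of basis-function values at location s."""
    return [bisquare(s, c, WIDTH) for c in CENTERS]


def inv2(m):
    """Inverse of a 2 x 2 matrix stored as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def posterior(locs, z, K, sigma2):
    """Posterior mean and covariance of eta given z_i = S(s_i)'eta + eps_i.

    Only 2 x 2 (r x r) matrices are inverted, so the cost grows linearly in len(z).
    """
    prec = inv2(K)          # starts as K^{-1}; becomes K^{-1} + S'S / sigma2
    rhs = [0.0, 0.0]        # accumulates S'z / sigma2
    for s, zi in zip(locs, z):
        b = basis(s)
        for i in range(2):
            rhs[i] += b[i] * zi / sigma2
            for j in range(2):
                prec[i][j] += b[i] * b[j] / sigma2
    cov = inv2(prec)
    mean = [sum(cov[i][j] * rhs[j] for j in range(2)) for i in range(2)]
    return mean, cov


def predict(s0, mean, cov):
    """Predictor S(s0)'mean and prediction variance S(s0)' cov S(s0)."""
    b = basis(s0)
    pred = sum(b[i] * mean[i] for i in range(2))
    var = sum(b[i] * cov[i][j] * b[j] for i in range(2) for j in range(2))
    return pred, var


# Toy "retrievals" clustered in the left part of the domain.
locs = [0.05, 0.1, 0.2, 0.3, 0.35]
z = [1.1, 0.9, 1.2, 1.0, 0.8]
K = [[1.0, 0.3], [0.3, 1.0]]
mean, cov = posterior(locs, z, K, sigma2=0.1)
for s0 in (0.2, 0.9):
    pred, var = predict(s0, mean, cov)
    print(f"s0 = {s0}: prediction {pred:.3f}, prediction variance {var:.3f}")
```

The prediction variance is small near the cluster of retrievals and much larger at s0 = 0.9, where there are no data. Evaluating that variance at every location is what turns a single estimate into an accompanying uncertainty map.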

What did you find out?

The AIRS Project carbon dioxide team is very interested in sequences of these maps as a way to understand carbon-cycle features more precisely. The team already has a sequence of maps based on a spatial random-effects statistical model, which exploits the spatial dependence in the data. Our new maps can be more precise because they also account for temporal dependence; they are based on a spatio-temporal random-effects statistical model.
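An idealized calculation shows why borrowing strength from adjacent days can help. Suppose, purely for illustration, that a single random effect follows a stationary first-order autoregression with propagator a and innovation variance q, and that day t has no retrievals at all. A spatial-only analysis of that day can do no better than the marginal variance. If the neighboring days were known exactly, η_t would enter only the two transition terms (η_t − aη_{t−1})²/q and (η_{t+1} − aη_t)²/q, which gives a conditional precision of (1 + a²)/q:

```latex
\begin{gathered}
\eta_t = a\,\eta_{t-1} + \zeta_t, \qquad \zeta_t \sim N(0, q), \qquad |a| < 1, \\
\operatorname{var}(\eta_t) = \frac{q}{1-a^2}, \qquad
\operatorname{var}(\eta_t \mid \eta_{t-1}, \eta_{t+1}) = \frac{q}{1+a^2}, \\
\frac{\operatorname{var}(\eta_t \mid \eta_{t-1}, \eta_{t+1})}{\operatorname{var}(\eta_t)}
  = \frac{1-a^2}{1+a^2} \approx 0.105 \quad \text{for } a = 0.9 .
\end{gathered}
```

Real neighboring days are only partially observed, so q/(1 + a²) is a lower bound on what the data can actually achieve. The direction of the effect still holds: a day with no retrievals gets a usable estimate when adjacent days are informative.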

The article focuses on estimating the statistical model's unknown parameters by maximum likelihood. We develop a new algorithm based on Expectation-Maximization (EM), and we show that EM estimation results in better maps than the current approach, which is based on method-of-moments estimation.
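The EM idea can be sketched for the simplest possible case. Take one random effect (r = 1) evolving as η_t = a η_{t−1} + ζ_t with ζ_t ~ N(0, q), observed each day through zero or more noisy retrievals with known variance σ². The E-step runs a Kalman filter and a Rauch-Tung-Striebel smoother to obtain expected sufficient statistics. The M-step then updates the propagator a and the innovation variance q in closed form. This is a scalar analogue, not the paper's matrix algorithm. The model, the fixed initial distribution (m0, p0), the starting values and the simulated data are all assumptions made for illustration.

```python
import random


def smoother(days, a, q, sigma2, m0, p0):
    """Scalar Kalman filter followed by a Rauch-Tung-Striebel smoother.

    Assumed model: eta_t = a * eta_{t-1} + zeta_t, zeta_t ~ N(0, q);
    eta_0 ~ N(m0, p0); each retrieval on day t is eta_t + N(0, sigma2) noise.
    Returns smoothed means, variances, and lag-one covariances
    Cov(eta_t, eta_{t-1} | all data) (entry 0 is unused).
    """
    n = len(days)
    m_pred, p_pred, m_filt, p_filt = [], [], [], []
    m, p = m0, p0
    for t, z in enumerate(days):
        if t > 0:                                # forecast step
            m, p = a * m, a * a * p + q
        m_pred.append(m)
        p_pred.append(p)
        if z:                                    # update step; skipped on empty days
            prec = 1.0 / p + len(z) / sigma2
            m = (m / p + sum(z) / sigma2) / prec
            p = 1.0 / prec
        m_filt.append(m)
        p_filt.append(p)
    m_s, p_s, c_s = m_filt[:], p_filt[:], [0.0] * n
    for t in range(n - 2, -1, -1):               # backward pass
        j = p_filt[t] * a / p_pred[t + 1]        # smoother gain
        m_s[t] = m_filt[t] + j * (m_s[t + 1] - m_pred[t + 1])
        p_s[t] = p_filt[t] + j * j * (p_s[t + 1] - p_pred[t + 1])
        c_s[t + 1] = j * p_s[t + 1]
    return m_s, p_s, c_s


def em(days, sigma2, a=0.5, q=1.0, m0=0.0, p0=10.0, iters=300):
    """EM estimates of the propagator a and innovation variance q (sigma2 known)."""
    n = len(days)
    for _ in range(iters):
        m, p, c = smoother(days, a, q, sigma2, m0, p0)      # E-step
        s11 = sum(p[t] + m[t] ** 2 for t in range(1, n))
        s00 = sum(p[t] + m[t] ** 2 for t in range(n - 1))
        s10 = sum(c[t] + m[t] * m[t - 1] for t in range(1, n))
        a = s10 / s00                                      # M-step
        q = (s11 - a * s10) / (n - 1)
    return a, q


# Simulated sparse data: on some days there are no retrievals at all.
random.seed(1)
true_a, true_q, sigma2 = 0.8, 0.5, 1.0
eta, days = 0.0, []
for t in range(500):
    eta = true_a * eta + random.gauss(0.0, true_q ** 0.5)
    k = random.choice([0, 0, 1, 3, 5])
    days.append([eta + random.gauss(0.0, sigma2 ** 0.5) for _ in range(k)])

a_hat, q_hat = em(days, sigma2)
print(f"a_hat = {a_hat:.3f}, q_hat = {q_hat:.3f}")
```

Each EM iteration never decreases the likelihood, which makes the updates stable. The variance update can also never go negative, because s10² ≤ s11·s00 by the Cauchy-Schwarz inequality.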

In summary, we were able to optimally smooth the sparse daily carbon dioxide maps obtained by AIRS. Using EM-estimated parameters, we obtain a complete sequence of the daily global carbon dioxide fields and a quantification of their associated prediction uncertainties. More generally, our methodology is well suited to the analysis of many types of remote-sensing data, crucially allowing for very fast computation.

What aspect of your research might the general public find interesting?

We produce a sequence of complete maps of mid-tropospheric carbon dioxide on the globe, which can be much more visually insightful than the original sparse retrievals of carbon dioxide. Because we account for temporal as well as spatial dependence, our maps are more precise than previously produced maps. Moreover, our methodology allows us to quantify this improvement in precision.

Abstract, "Spatio-temporal smoothing and EM estimation for massive remote-sensing data sets"

The use of satellite measurements in climate studies promises many new scientific insights if those data can be efficiently exploited. Because daily data sets are sparse, there is a need to fill spatial gaps and to borrow strength from adjacent days. Nonetheless, these satellites are typically capable of conducting on the order of 100,000 retrievals per day, which makes it impossible to apply traditional spatio-temporal statistical methods, even in supercomputing environments. To overcome these challenges, we make use of a spatio-temporal mixed-effects model. For each massive daily data set, dimension reduction is achieved by essentially modelling the underlying process as a linear combination of spatial basis functions on the globe. The application of a dynamical autoregressive model in time, over the reduced space, allows rapid sequential computation of optimal smoothing predictions via the Kalman smoother; this is known as Fixed Rank Smoothing (FRS). The dimension-reduced mixed-effects model contains a number of unknown parameters, including covariance and propagator matrices, which describe the spatial and temporal dependence structure in the reduced-dimensional process. We take an empirical-Bayes approach to inference, which involves estimating the parameters and substituting them into the optimal predictors. Method-of-moments (MM) parameter estimation (currently used in FRS) is typically inefficient compared to maximum likelihood (ML) estimation and can result in large sampling variability. Here, we develop ML estimation via an expectation-maximization (EM) algorithm, which offers stable computation of valid estimators and makes efficient use of spatial and temporal dependence in the data. The two parameter-estimation approaches, MM and ML, are compared in a simulation study.
We also apply our methodology to global satellite CO2 measurements: We optimally smooth the sparse daily CO2 maps obtained by the Atmospheric InfraRed Sounder (AIRS) instrument on the Aqua satellite; then, using FRS with EM-estimated parameters, a complete sequence of the daily global CO2 fields can be obtained, together with their associated prediction uncertainties.
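For readers who want to see the structure the abstract describes, a schematic spatio-temporal mixed-effects model of this type is written out below. The notation is assumed here for illustration and may differ in detail from the paper's. Z_t are the data on day t, Y_t is the underlying process, μ_t is its mean, and S_t(s) is a vector of r spatial basis functions. The r-dimensional random effects are η_t, with propagator matrix H_t, innovation covariance U_t and initial covariance K_1. ξ_t is fine-scale variation and ε_t is measurement error.

```latex
\begin{aligned}
Z_t(\mathbf{s}) &= Y_t(\mathbf{s}) + \varepsilon_t(\mathbf{s}), \\
Y_t(\mathbf{s}) &= \mu_t(\mathbf{s}) + \mathbf{S}_t(\mathbf{s})^{\top} \boldsymbol{\eta}_t + \xi_t(\mathbf{s}), \\
\boldsymbol{\eta}_t &= \mathbf{H}_t\, \boldsymbol{\eta}_{t-1} + \boldsymbol{\zeta}_t, \qquad \boldsymbol{\zeta}_t \sim N(\mathbf{0}, \mathbf{U}_t), \\
\boldsymbol{\eta}_1 &\sim N(\mathbf{0}, \mathbf{K}_1).
\end{aligned}
```

Because r is small compared with the roughly 100,000 daily retrievals, the expensive matrix operations in the Kalman smoother involve r × r matrices rather than n × n ones. This is what makes the sequential computation fast.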