News | July 14, 2011

Your paper, three questions: Spatio-temporal smoothing and EM estimation for massive remote-sensing data sets

Noel Cressie (left) and Matthias Katzfuss. Credit: Amy Braverman

Sharon Ray, NASA Jet Propulsion Laboratory

The use of satellite measurements in climate studies holds the promise of new insights if those data can be efficiently exploited. However, satellite data sets typically consist of over 100,000 retrievals per day, making it almost impossible to apply traditional spatio-temporal statistical methods. In addition, daily data sets are often sparse, so interpolation becomes a must. To overcome these challenges, researchers Matthias Katzfuss and Noel Cressie make use of a spatio-temporal mixed-effects statistical model.

In a nutshell, what is your paper about?

The article describes a statistical approach to estimating mid-tropospheric carbon dioxide at any location on the globe and for any given day, based on the sparse retrievals provided by the Atmospheric Infrared Sounder (AIRS) Project carbon dioxide team. The advantage of using statistics to smooth and fill gaps in sparse data is that for each daily map of estimated carbon dioxide, we can provide an accompanying map that quantifies the uncertainty in those estimated values. Underlying both of these maps is a statistical methodology that not only takes full advantage of the spatial and temporal dependence in carbon dioxide and its measurements, but also allows extremely fast computations.

What did you find out?

We know that the AIRS Project carbon dioxide team is very interested in looking at sequences of these maps in order to understand carbon-cycle features in a more precise manner. The team already has a sequence of maps based on a spatial random effects statistical model that exploits the spatial dependence in the data. These new maps can be more precise, since they also include temporal dependence; they are based on a spatio-temporal random effects statistical model.

The article revolves around obtaining estimates of the statistical model's unknown parameters through maximum likelihood estimation. We develop a new algorithm based on Expectation-Maximization (EM) estimation, and we show that using EM estimation results in better maps than the current technology based on method-of-moments estimation.

In summary, we were able to optimally smooth the sparse daily carbon dioxide maps obtained by AIRS. Using EM-estimated parameters, we obtain a complete sequence of the daily global carbon dioxide fields and a quantification of their associated prediction uncertainties. More generally, our methodology is well suited to the analysis of many types of remote-sensing data, crucially allowing for very fast computation.

What aspect of your research might the general public find interesting?

We produce a sequence of complete maps of mid-tropospheric carbon dioxide on the globe, which can be much more visually insightful than the original sparse retrievals of carbon dioxide. By introducing the temporal dependence as well as the spatial dependence, our maps are more precise than previously produced maps. Moreover, our methodology allows us to quantify this improvement in precision.

Abstract, "Spatio-temporal smoothing and EM estimation for massive remote-sensing data sets"

The use of satellite measurements in climate studies promises many new scientific insights if those data can be efficiently exploited. Due to sparseness of daily data sets, there is a need to fill spatial gaps and to borrow strength from adjacent days. Nonetheless, these satellites are typically capable of conducting on the order of 100,000 retrievals per day, which makes it impossible to apply traditional spatio-temporal statistical methods, even in supercomputing environments. To overcome these challenges, we make use of a spatio-temporal mixed-effects model. For each massive daily data set, dimension reduction is achieved by essentially modelling the underlying process as a linear combination of spatial basis functions on the globe. The application of a dynamical autoregressive model in time, over the reduced space, allows rapid sequential computation of optimal smoothing predictions via the Kalman smoother; this is known as Fixed Rank Smoothing (FRS). The dimension-reduced mixed-effects model contains a number of unknown parameters, including covariance and propagator matrices, which describe the spatial and temporal dependence structure in the reduced-dimensional process. We take an empirical-Bayes approach to inference, which involves estimating the parameters and substituting them into the optimal predictors. Method-of-moments (MM) parameter estimation (currently used in FRS) is typically inefficient compared to maximum likelihood (ML) estimation and can result in large sampling variability. Here, we develop ML estimation via an expectation-maximization (EM) algorithm, which offers stable computation of valid estimators and makes efficient use of spatial and temporal dependence in the data. The two parameter-estimation approaches, MM and ML, are compared in a simulation study. We also apply our methodology to global satellite CO2 measurements: We optimally smooth the sparse daily CO2 maps obtained by the Atmospheric InfraRed Sounder (AIRS) instrument on the Aqua satellite; then, using FRS with EM-estimated parameters, a complete sequence of the daily global CO2 fields can be obtained, together with their associated prediction uncertainties.
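For readers who want a feel for the mechanics described in the abstract, the following is a minimal Python sketch of the general idea, not the authors' implementation. It assumes a toy one-dimensional domain instead of the globe, Gaussian-bump basis functions, a single noise variance `sigma2` that lumps together measurement error and fine-scale variation, and a propagator matrix `H` and innovation covariance `U` that are treated as known rather than estimated by EM. All function and variable names (`basis_matrix`, `fixed_rank_smoother`, and so on) are hypothetical. The point it illustrates is that the filter and smoother only ever solve r x r systems, where r is the number of basis functions, no matter how many retrievals arrive on a given day.

```python
import numpy as np


def basis_matrix(locs, centers, width):
    """Evaluate r Gaussian-bump basis functions at 1-D locations (toy choice)."""
    return np.exp(-0.5 * ((locs[:, None] - centers[None, :]) / width) ** 2)


def fixed_rank_smoother(ys, Ss, H, U, sigma2, m0, P0):
    """Kalman filter plus Rauch-Tung-Striebel smoother on r basis coefficients.

    ys[t] : (n_t,) observations on day t
    Ss[t] : (n_t, r) basis matrix at that day's observed locations
    m0, P0: assumed prior mean and covariance of the coefficients on day 0
    """
    T, r = len(ys), H.shape[0]
    m_pred, P_pred = np.zeros((T, r)), np.zeros((T, r, r))
    m_filt, P_filt = np.zeros((T, r)), np.zeros((T, r, r))

    for t in range(T):
        if t == 0:
            m_pred[t], P_pred[t] = m0, P0
        else:
            # Propagate yesterday's coefficients through the autoregressive model.
            m_pred[t] = H @ m_filt[t - 1]
            P_pred[t] = H @ P_filt[t - 1] @ H.T + U
        # Information-form update: only r x r matrices are inverted,
        # even when the day has many observations.
        prec = np.linalg.inv(P_pred[t]) + Ss[t].T @ Ss[t] / sigma2
        P_filt[t] = np.linalg.inv(prec)
        rhs = np.linalg.solve(P_pred[t], m_pred[t]) + Ss[t].T @ ys[t] / sigma2
        m_filt[t] = P_filt[t] @ rhs

    # Backward pass: borrow strength from later days as well as earlier ones.
    m_sm, P_sm = m_filt.copy(), P_filt.copy()
    for t in range(T - 2, -1, -1):
        J = np.linalg.solve(P_pred[t + 1], H @ P_filt[t]).T
        m_sm[t] = m_filt[t] + J @ (m_sm[t + 1] - m_pred[t + 1])
        P_sm[t] = P_filt[t] + J @ (P_sm[t + 1] - P_pred[t + 1]) @ J.T
    return m_filt, P_filt, m_sm, P_sm


# Synthetic demonstration with made-up parameter values.
rng = np.random.default_rng(0)
r, T, sigma2 = 5, 10, 0.1
centers = np.linspace(0.0, 1.0, r)
H, U = 0.9 * np.eye(r), 0.2 * np.eye(r)
m0, P0 = np.zeros(r), np.eye(r)

eta = rng.multivariate_normal(m0, P0)
ys, Ss = [], []
for t in range(T):
    if t > 0:
        eta = H @ eta + rng.multivariate_normal(np.zeros(r), U)
    locs = rng.uniform(0.0, 1.0, size=30)  # sparse, irregular daily coverage
    S = basis_matrix(locs, centers, width=0.15)
    ys.append(S @ eta + rng.normal(0.0, np.sqrt(sigma2), size=30))
    Ss.append(S)

m_filt, P_filt, m_sm, P_sm = fixed_rank_smoother(ys, Ss, H, U, sigma2, m0, P0)

# Gap-filled prediction and its standard deviation on a regular grid for one day.
grid = np.linspace(0.0, 1.0, 50)
S_grid = basis_matrix(grid, centers, width=0.15)
day = 4
pred_mean = S_grid @ m_sm[day]
pred_sd = np.sqrt(np.einsum("ij,jk,ik->i", S_grid, P_sm[day], S_grid))
```

In this sketch, `pred_mean` plays the role of the daily gap-filled map and `pred_sd` the role of the accompanying uncertainty map, here for the smooth basis-function component only. In the paper's setting the parameters that are fixed above would instead be estimated from the data and substituted into the smoother.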

About the authors

Matthias Katzfuss completed his PhD thesis in Statistics at The Ohio State University in June 2011. Together with his former advisor, Dr. Noel Cressie, he worked on hierarchical spatio-temporal modeling with application to global mapping of carbon dioxide. He is now a researcher at the University of Heidelberg in Germany.

Noel Cressie received the Bachelor of Science degree with first class honors in Mathematics from the University of Western Australia and both MA and PhD degrees in Statistics from Princeton University. Previously a Senior Lecturer at The Flinders University of South Australia and Professor of Statistics and Distinguished Professor in Liberal Arts and Sciences at Iowa State University, he is now Professor of Statistics, Distinguished Professor of Mathematical and Physical Sciences, and Director of the Program in Spatial Statistics and Environmental Statistics at The Ohio State University. Dr. Cressie is the author of around 250 refereed articles and of three books, including "Statistics for Spatio-Temporal Data," with Christopher K. Wikle, 2011 (Wiley). His research interests are in the statistical modeling and analysis of spatial and spatio-temporal data, in Bayesian and empirical-Bayesian methods, and in environmental sciences. He was awarded the Distinguished Achievement Medal of ASA's Section on Statistics and the Environment, the Twentieth Century Distinguished Service Award in Environmental Statistics, the Distinguished Scholar Award of The Ohio State University, and the 2009 Fisher Award and Lectureship from COPSS.