Evaluating nonlinear maximum likelihood optimal estimation uncertainty in cloud and aerosol remote sensing

Uncertainty estimates are important when retrieving properties of clouds and aerosols from satellites measurements. These measurements must be interpreted using a form of inverse theory, such as optimal estimation. In atmospheric remote sensing these inverse methods often assume that the forward model is linear in the region of uncertainty. This assumption is not necessarily valid. This paper presents an exact confidence procedure in contrast to the linear approximation using a maximum likelihood estimator. Two simple examples of retrieving the effective radius and optical depth of a volcanic ash cloud and water cloud show a discrepancy between the linear approximation and the exact procedure. The exact procedure is especially useful for inference where the entire parameter space has been forward modelled prior to or during the retrieval, such as using look up tables. When the inference method calculates the likelihood over the whole parameter space, it is less computationally expensive than a linear approximation.


| INTRODUCTION
Statistical methods make it possible to detect and quantify parameters of interest describing clouds and aerosols in the atmosphere from observations (e.g., Mackie et al., 2010;Poulsen et al., 2012;Pavolonis et al., 2013;Western et al., 2018). An example of quantifying a parameter could be inferring the abundance of an aerosol from measurements of infrared brightness temperature or reflectance of solar radiance using inverse methods. The method relies on a forward model that uses input parameters to simulate the observation (see for example, Rodgers, 2000). The inverse method finds the inputs for which the simulated observations most closely match the actual observations according to some metric and ideally some uncertainty in this estimate. A simple metric of closeness is the root-mean-squared error, but could similarly be found by using a statistical model and finding the maximum likelihood estimate. One approach forwardmodels the entire parameter space before any retrieval takes place, which is sometimes called a look-up table; see, for example, Nakajima and King (1990) on stratiform cloud layers, which is used routinely (e.g., Benas et al., 2019;Peers et al., 2019). Uncertainties in the maximum likelihood estimate are often ignored; where they are not, they are usually approximated using a linearization of the forward model, which gives rise to elliptical uncertainty regions. This paper shows that for maximum likelihood estimation there exists an exact confidence procedure to quantify uncertainty that does not necessarily produce elliptical regions. We compare the exact confidence region to the elliptical approximation in two simple examples: the retrieval of the effective radius and optical depth of a volcanic ash cloud and water cloud. For the given examples, if the exact confidence region is not elliptical this shows that a linear assumption is insufficient in describing the region of uncertainty.

| UNCERTAINTY REGIONS
Nonlinear optimal estimation in aerosol and cloud remote sensing, in many cases, relies on a cost function to retrieve n parameters x from m observations y. This needs a forward model F(Á) to simulate the observations y = F(x) + ϵ, where ϵ is the model-measurement error.
We treat this as a statistical model by providing a statistical description of the distribution of ϵ given x. The standard statistical model is that ϵ is Normal with expectation zero and (specified) variance S ϵ , independently of x. Under this statistical model the likelihood function L(Á) for x, where we maximize the probability associated with observations y, has the form where c is a constant that does not depend on x. The likelihood function evaluates the probability of x when the true observation is y. In a "less statistical" approach, the right-hand side is interpreted as a cost function to be minimized in x. Minimizing the cost function is equivalent to finding the maximum likelihood estimate,x. The advantage of a statistical approach is that it provides an assessment of uncertainty about x, as we now describe. A set estimator for x is a function y ↦ C(y) where y are the observations, and C(y) is a set of possible values for x. C is a "level (1 − α) confidence procedure" exactly when for all x, where y is treated as a vector of random quantities, parameterized by x. The probability on the left-hand side is termed the "coverage" of C at x. C is "exact" if its coverage is (1 − α) for all x; otherwise it is "conservative", meaning the coverage may be larger than (1 − α) for some x. Confidence sets (or confidence intervals for scalar quantities) are a standard approach to quantifying uncertainty. For more on confidence procedures, see Casella and Berger (2002, ch. 9); for a critical appraisal, see Morey et al. (2016). Exact confidence procedures are rare, and do not exist for most statistical models. However, in our case it is straightforward to derive an exact confidence procedure for x. The statistical model implies that where χ 2 m is the chi-squared distribution with m degrees of freedom, the number of observations (see for example, Mardia et al., 1980, Ch. 2). This converts directly into an exact level (1 − α) confidence procedure for x, where χ − 2 m is the quantile function of the chi-squared distribution with m degrees of freedom. The confidence procedure C is a level set of the likelihood function, and thus has various attractive properties, including being transformation-invariant. Transformation-invariance means that the transformed set retains its properties as a confidence procedure. For example, if x i was cloud top height in metres, and we computed a 95% confidence set for x i , and then transformed each element of the set into atmospheric pressure, then the resulting set would be a 95% confidence set for atmospheric pressure.
We are often interested in individual components of x, and therefore confidence procedure needs to be marginalized to x i . To find a (1 − α) confidence interval for x i based on y, just take the convex hull of all of the x i values in a (1 − α) confidence set C(y). We denote this confidence interval for x i as C i is typically conservative even if the original confidence procedure is exact. As we prefer exact confidence procedures to conservative ones, it is interesting that there is a special case which gives exact confidence intervals for x i . Assume that that F(Á) is an affine function of x, which we write as where a is the axis intercept of size m and K is an m × n Jacobian matrix. Treating a and K as given, this has the form of a standard regression problem, which has a maximum likelihood estimatê where The sampling distribution ofx iŝ This implies that there is an exact confidence procedure for the affine case, We can prove that Equation (7) is the same confidence procedure as Equation (3) under Equation (5), by completing the square. This confidence procedure produces elliptical confidence sets. Additionally, Equation (5) implies that there is exact confidence interval for each x i . The marginalization property of the Normal distribution implies that is an exact confidence interval, where z α/2 is the quantile function of the Normal distribution.
To summarize this section, we have a variety of approaches. Under the statistical model alone, we have an exact confidence procedure for x, given in Equation (3), but only a conservative confidence procedure for x i . But if F(Á) has the affine form in Equation (5), then we also have an exact confidence procedure for x i , Equation (8). If F(Á) is not affine, we still have the option of computing the Jacobian matrix atx, and using the affine form as a tangential approximation to F(Á). In this case, Equation (8) is an "approximately exact" confidence procedure for x i : its coverage would be exactly (1 − α) were the tangential approximation correct, but its coverage deviates from (1 − α) as the tangential approximation deviates from F(Á). The inappropriate use of the affine approximation to F(Á) creates a coverage error, in which the nominal coverage of at least (1 − α) everywhere is not guaranteed, and may well be lower for some values of x.

| COMPARISON OF UNCERTAINTY REGIONS
To compare the confidence procedures and intervals we use simple examples of retrieving two parameters in x. The SEVIRI sensor on board the Meteosat Second Generation satellite provides the observations for both examples. SEVIRI has a 3 km nadir pixel resolution at the wavelengths of interest, which increase in size with zenith angle, and has a baseline repeat cycle of fifteen minutes.
The first example retrieves the effective radius of a log-normal distribution of particles (with a fixed geometric standard deviation of 2.0) and optical depth at 10.8 μm, from an observation of a volcanic ash cloud (with an assumed fixed altitude of 5.5 km) at 1800 UTC on 6 May during the 2010 eruption of Eyjafjallajökull located at 58.09 N 10.92 W. The observations are two channels centred at 10.8 μm and 12.0 μm, and a simple forward model for volcanic ash radiative transfer (see Francis et al., 2012) using refractive indices for andesite (Pollack et al., 1973). The radiative transfer program RTTOV (Hocking et al., 2013) simulates the radiances using meteorology from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-Interim Global Reanalysis data set (Dee et al., 2011) on 60 model levels. The radiances are simulated with effective radius values between 0.1 and 10.0 μm and 10.8 μm optical depths between 0.1 and 1.0. Francis et al. (2012) provide the values for S ϵ .
The second example retrieves the effective radius and optical depth, valid at a reference wavelength of 0.55 μm, of a water cloud at 1300 UTC 31 October 2019 located at 12.95 S 1.86 E. The observations are the measured reflectance at 0.8 and 1.6 μm and the Discrete Ordinates Radiative Transfer Program for a Multi-Layered Plane-Parallel Medium (DISORT, Stamnes et al., 1988) simulates the reflectance values given the cloud properties and the relevant angles for the geometry of the problem. The reflectance values are simulated with effective radius values between 4 and 28 μm and at a reference 0.55 μm optical depth between 1 and 665. The combined standard deviation for the error in the measured and simulated observations is assumed to be 5% of the measured value. Figure 1 shows the confidence region at level (1 − α) = 0.95 using Equation (3), and under the assumption of an affine function using Equation (7) for the volcanic ash cloud and Figure 2 shows the same for the water cloud. Figures 1 and 2 show that a locally linear, or affine, assumption using Equation (7) has a level set error when describing the uncertainty in nonlinear optimal estimation. Instead it is preferable to use the exact conservative procedure Equation (3). For the given examples, the level set error using Equation (7) is greater for the volcanic ash cloud than the water cloud, although this will vary for each specific case. The level set error using Equation (7) will not be universal between retrievals of cloud type, nor between distinct sounding of the same cloud type. The level set error using Equation (7) instead relies on the suitability of assuming the simulated model is affine over the confidence interval. Transformation of variables, such as inferring the logarithm of variable, may reduce the coverage error; however the confidence region using Equation (7) is not transformation-invariant like the exact conservative procedure (3). This confidence procedure requires knowledge of the likelihood (or "cost") to the point that Equation (3) can be evaluated. This may be computationally expensive for iterative procedures with a large parameter space. However, the exact confidence procedure (3) may prove especially useful where the entire parameter space has been forward modelled prior to or during the retrieval, such as the use of a look-up table (e.g., Nakajima and King, 1990;Wen and Rose, 1994) or problems with a small parameter space (e.g., Eyre and Menzel, 1989;Taylor et al., 2019). In this case the exact confidence procedure negates the need to compute the Hessian matrix in Equation (7) under the assumption in Equation (5), decreasing the cost of computation, with the added benefit of a lower coverage error and transformationinvariant confidence regions.

| CONCLUSIONS
Those who quantify nonlinear optimal estimation uncertainty in cloud and aerosol remote sensing may wish to use an alternate approach to linear approximations if there is a desire to better quantify the region of uncertainty. For procedures where the computational cost of forward modelling the parameter space has already been done, and the maximum likelihood is found by calculating the likelihood for the whole parameter space, then it is both computationally favourable and has a lower coverage error to use the confidence procedure (3). The confidence set (3) for x is exact, and does not rely on a linearization approximation such as y ≈ a + Kx, or possibly y ≈ a + Kg(x), where g(Á) is a transform used to linearize the problem. The coverage of a confidence set for x derived using a linear approximation will not equal the nominal coverage. A simple way to find the error in the approximation, or to find an appropriate transform g(x), F I G U R E 1 The "cost" for the effective radius and 10.8 μm optical depth of a volcanic ash cloud from radiances measured by the SEVIRI sensor. The figure shows the 95% exact confidence region using Equation (3) (solid) and the confidence region under the assumption of an affine function using Equation (7) (dashed). The black cross indicates the maximum likelihood estimate F I G U R E 2 The "cost" for the effective radius and 0.55 μm optical depth of a water cloud from radiances measured by the SEVIRI sensor. The figure shows the 95% exact confidence region using Equation (3) (solid) and the confidence region under the assumption of an affine function using Equation (7) (dashed). The black cross indicates the maximum likelihood estimate would be to compare it with an exact confidence set, such as that in Equation (3). If accuracy of coverage is required, the linearization approximations are redundant.
This procedure requires forward-simulating observations for sets of input variables, either on a regular or variable grid. The input variables which most closely map the simulated observations to the measured observations are the maximum likelihood estimates for each parameter. The confidence interval is evaluated at a level according to the confidence required using the chisquared distribution with the degrees of freedom equal to the number of observations. For example, we may wish to produce a confidence region at the 95% level, where the retrieval consists of two observations. In this case, the input variables that evaluate to less than 5.991 (a constant following the chi-squared distribution with two degrees of freedom) using Equation (3) fall inside the confidence region. This is equivalent to taking all input variables where the "cost" of the retrieval evaluates to less than 5.991 in a less statistical approach. The minimum and maximum simulated value for each parameter that falls within this confidence region forms the confidence interval for this parameter at the required confidence level.