Assimilation of soil moisture and temperature in the GRAPES_Meso model using an ensemble Kalman filter

Soil moisture and temperature are significant variables in numerical weather prediction systems and land surface models, controlling the partitioning of moisture and energy fluxes at the surface. The ensemble Kalman filter (EnKF) is an approximation to the Kalman filter in that background error covariances are estimated from a finite ensemble of forecasts. The EnKF technique is now widely applied in data assimilation of the atmosphere, ocean and land surface. In the current GRAPES_Meso model version V4.0, the land surface soil assimilation method has not been integrated for land surface assimilation. Therefore, in this work, an EnKF has been introduced in the GRAPES_Meso model using air temperature at 2 m and the relative humidity at 2 m and its performance has been evaluated in land surface assimilation. The results show that the land surface assimilation method can effectively improve the performance skill of air temperature at 2 m and it has little effect on precipitation.

Soil moisture and temperature are significant variables in numerical weather prediction systems and land surface models, controlling the partitioning of moisture and energy fluxes at the surface. The ensemble Kalman filter (EnKF) is an approximation to the Kalman filter in that background error covariances are estimated from a finite ensemble of forecasts. The EnKF technique is now widely applied in data assimilation of the atmosphere, ocean and land surface. In the current GRAPE-S_Meso model version V4.0, the land surface soil assimilation method has not been integrated for land surface assimilation. Therefore, in this work, an EnKF has been introduced in the GRAPES_Meso model using air temperature at 2 m and the relative humidity at 2 m and its performance has been evaluated in land surface assimilation. The results show that the land surface assimilation method can effectively improve the performance skill of air temperature at 2 m and it has little effect on precipitation. Soil moisture and temperature are significant variables in numerical weather prediction (NWP) systems and land surface models, controlling the partitioning of moisture and energy fluxes at the surface (Paegle et al., 1996;Reichle et al., 2002;Drusch and Viterbo, 2007;Tian and Xie, 2008;Douville, 2010;Carrera et al., 2015;Wang et al., 2017aWang et al., , 2017b. The optimum interpolation (OI) method proposed by Mahfouf (1991) takes into account observation and forecast error statistics in an objective way, in order to correct soil prognostic variables (soil temperature and water content) using short-range forecast errors of screen level parameters. Giard and Bazile (2000) show that the impact of the analysis scheme for soil moisture and temperature is a clear improvement in the forecast fields; it is based on an optimum interpolation using 2 m observations of temperature and relative humidity. The co-efficients of the OI method can be calculated from a set of single column Monte Carlo experiments.
The analytical expressions of Douville et al., (2010) show that the soil analysis should be driven by new observations. Although the OI method has been widely used operationally in NWP models, the major drawbacks are that it is not flexible enough to account for new observation types easily and the screen level forecast errors are not always caused by initial errors in soil variables (Mahfouf et al., 2009). Evensen (1994) first introduced the ensemble Kalman filter (EnKF) using an ensemble of model trajectories. The EnKF is based on ensemble prediction, which derives a background error covariance matrix through a set of ensemble forecasts using the Monte Carlo method. That is, the covariance between state variables and observed variables is estimated from ensemble prediction and the observed data and covariances are used to update the ensemble prediction results through the Kalman filtering equation. The applications of the EnKF are numerous (Evensen, 1994(Evensen, , 2003Mitchell and Houtekamer, 2000;Keppenne and Rienecker, 2003), because it overcomes the limitations of the standard Kalman filter and extended Kalman filter (EKF), such as no derivation of a tangent linear operator and adjoint equations and no integration backward in time. The key feature of the EnKF is that it does not require linearized model operators or observation operators but has large computation requirements. Most of the land surface models are single column models (Chen and Dudhia, 2001) so they contain fewer variables than atmospheric models. The EnKF can achieve high-efficiency computation in land surface assimilation. The EnKF technique is now widely applied in data assimilation for the atmosphere, ocean and land surface (Belair et al., 2009;Carrera et al., 2015Carrera et al., , 2016. Many scientists have compared the EnKF with other methods of assimilation and further proved its effectiveness and superiority (Houtekamer and Mitchell, 1998;Lermusiaux, 1999;Hamill et al., 2001;Madsen et al., 2015). Miller et al. (1999) compared the EnKF with nonlinear filters and the EKF and found that it performed well. Madsen and Canizares (1999) found good agreement in implementation between the EnKF and the EKF. Keppenne (2000) applied the EnKF in a two-layer ISBA model to assimilate synthetic altimetry data and found that the modest ensemble size could be used to get realistic solutions. Bishop et al. (2001) implemented the EnKF in an observation system, did a set of experiments and found that the deployment could be optimized to minimize the forecast errors in a selected region.
In the China Meteorological Administration, the Global/ Regional Assimilation and Prediction System (GRAPES) (Chen and Shen, 2006;Xue and Chen, 2008) is an important operational NWP system. A preliminary study (Wu et al., 2005) showed that application of the GRAPES model meets the requirement for sustainable development of the numerical prediction system of China. Nowadays, the GRAPES model has been expanded to applications in various fields such as GRAPES_Meso for mesoscale weather prediction (Liu et al., 2015), GRAPES_GFS (Wang et al., 2017a(Wang et al., , 2017b for global weather prediction, GRAPES_TCM (Yao, 2013) for typhoon prediction, GRAPES_SDM (Duan et al., 2016) for sandstorm forecast, and so on. The 3D variational data assimilation system in GRAPES_Meso only does atmospheric data assimilation, not including land surface data assimilation. In the current GRAPES_Meso model, the initial field of soil moisture and temperature is driven by the National Centers for Environmental Prediction (NCEP) model or GRAPES_GFS, and the land surface assimilation method has not been introduced. Furthermore, there is still a distance between the soil temperature and water content data of the NCEP model or model T639 and the observation data. Also, soil observation is used in the GRAPES model, and the demand of the land surface model cannot be satisfied in time and space. Therefore, the performance of EnKF for analysing the soil variables is evaluated in order to study the effect of the improved soil variables on air temperature at 2 m and precipitation.

| BRIEF DESCRIPTION OF THE NOAH LAND SURFACE MODEL
The land surface model in GRAPES_Meso is Noah (Chen and Dudhia, 2001) in which there are four soil layers. The soil depths from the first soil layer to the fourth soil layer are 0.1, 0.3, 0.6 and 1 m, respectively. The ground heat flux is controlled by the usual diffusion equation: where K t is the thermal conductivity, Θ is the fraction of unit soil volume occupied by water, C is the volumetric heat capacity, T is the soil temperature, t is time and z is the depth of the soil layer.
The volumetric soil water content in the Noah model is as follows: where D is the soil water diffusivity, K is the hydraulic conductivity and F Θ represents sources and sinks in the soil, such as precipitation, evaporation. From Equation 2, the soil water content of each layer can be obtained: where d i is the ith soil layer thickness, P d is the precipitation, R is the surface runoff, E dir is direct evaporation from the top shallow soil layer and E i is the evaporation of the ith soil layer.

| Instruction of EnKF equation
In this study, the control vector x (dimension N x ) represents the prognostic equation of the land surface model Noah M that evolves with time as: Therefore, N x = 6 and x = (w i , t i ). They are the soil water content and temperature in the upper three levels. The model error term φ and the time step Δt are introduced in Section 3.2.1.
For this study, the control vector x contains the six main prognostic variables of the Noah model: soil water content (w i = 1, 2, 3) and temperature (w i = 1, 2, 3) in the upper three soil levels. The observation vector contains the screen level air temperature at 2 m (T2m) and the relative humidity at 2 m (RH2m). In the land surface analysis system, the increment of air temperature and relative humidity at 2 m is used to analyse the variation of the soil temperature and water content in the root zone, which is the upper three layers in the Noah model.
An observation operator H allows the model counterpart of the observations to be obtained: In this study, H can be an interpolation scheme for T2m and RH2m.
An advantage of this approach is that there is no need to obtain the various matrices entering into the computation of the Kalman gain explicitly. BH T and HBH T are obtained from an ensemble of predictions (x f ) i with i varying between 1 and N, written as follows: where B is the background error covariance matrix.

| Ensemble spread 3.2.1 | Model error
The model errors could account for uncertainties, most of which come from forcing data of the land surface model. In this study, the aim was to analyse soil moisture and temperature with the EnKF using the forecast and observed errors. The model errors are represented as unknown fluxes in the near-surface soil moisture and temperature equation. The prognostic variables x can be modified to add a model error term φ (Evensen, 2003), defined by: where ε w is a sequence of white noise drawn from a distribution of smooth pseudorandom fields with mean equal to 0 and variance equal to 1. v is the co-efficient determining the time decorrelation of the stochastic forcing, which should be related to the time step and a specified time decorrelation length τ.

| Inflation factor
To avoid an underestimation of the analysis error covariance matrix, Anderson and Anderson (1999) introduced an inflation factor to evaluate the impact of the ensemble size on noise. Hamill et al. (2001) evaluated the impact of the inflation factor on localizing the background covariance. After each analysis cycle an inflation factor α leads to a redefinition of the various analysed members according to: with α slightly greater than 1. The purpose is to account for a slight under-representation of variance due to the use of a small ensemble.

| EXPERIMENTS AND ANALYSIS
Experiments were run from June 21 to June 28, 2013, 0000 UTC, in order to compare the results of the analysis and the original data. The predictability time is 24 hr for the GRAPES_Meso model within the domain 15-64.5 N, 70-145.3 E at 0000 UTC daily. The T2m and RH2m observation data in a grid are from the China Meteorological Administration using the inverse distance weighting method and observation data from 2,500 stations. The default physical process parameterization schemes of the GRAPES_Meso model (WSM6 microphysical process parameterization schemes, RRTM long wave radiation scheme and Dudhia short wave radiation scheme, Monin-Obukhov surface layer scheme, Noah land surface process scheme, MRF planetary boundary layer scheme and Kain-Fritsch eta cumulus convection scheme) were used in the experiments. The experiments were conducted with the initial fields and lateral boundaries provided with NCEP forecast fields with 1 × 1 resolution driven by the GRAPES_Meso model with 10 × 10 km horizontal resolution. The GRAPES_Meso model parameter setup follows Wang (2010) and the Noah parameter setup is from Chen and Dudhia (2001). The observation data of nearly 2,500 stations were used to verify the root mean square error of T2m.
To compare the results, the experiments were divided into two types: 1. the EnKF was the control experiment, using the EnKF to analyse soil water content and temperature;

OL was an open-loop experiment, not using a soil analysis method.
The experiments were to couple the land surface assimilation system with the GRAPES_Meso model and do the land surface assimilation every 6 hr which is time for the soil variables to feed back information to the atmosphere (Mahfouf, 1991), in order to calculate the analysed increments of soil temperature and soil water content and return them to the land surface model (Figure 1).

| Initial perturbed distribution
The initial data are randomly generated and follow a Gaussian distribution. The number of members determines the scale and distribution of the ensemble data. As Figure 2 shows, the horizontal co-ordinate is the distribution of the member and the vertical co-ordinate is the number of points. This disturbance selects from 10 to 100 members. The distributions of 50 members and 100 members are similar, and the distribution of 20 members is better than that of 10 members. However, considering the time and space cost, 20 members were selected in this work.

| Inflation factor
The soil moisture ensemble filtering problem has certain distinctive aspects in that the perturbations in the initial conditions tend to die out after a certain time. So, the criterion of the inflation factor is that the perturbations cannot die out and the value of the inflation factor is selected to be as small as possible. The inflation factor can keep a suitable spread scale of the ensemble members. The natures are different between the first soil layer and the other soil layers; the memory of the first soil level is only a few hours and its initial conditions have little impact on longer-term estimates. In this study, inflation factor experiments were carried out for soil temperature and water content in different soil layers.
The effect of different inflation factors on the variables in each layer of soil (soil water content and temperature) are discussed in order to find a reasonable inflation factor for each soil layer. Mahfouf (1991) through 48 hr experiments found that, when the relationship between soil and atmosphere is relatively weak, the error of the 2 m air temperature and relative humidity do not always contain all the information of the soil. So, meteorological conditions are selected during the experimental time in order to eliminate the conditions when the relationship is weak between land surface and atmosphere, such as heavy precipitation, snow, strong wind, and low air temperature at 2 m. The results for the deep soil, i.e. the second and the third soil layers, are similar, and so only the results of the first and second layers are shown. Figure 3 shows the temporal variations of the standard deviation in the upper two levels when the initial time is June 21, 2013, 0000 UTC. Figure 3a,b shows the results for the soil water content in the upper two soil layers. The first soil layer is more sensitive to the atmosphere. As shown in Figure 3a, when the inflation factor is less than 10.4, the standard deviation of the soil water content decreases with time. Figure 3c shows that the soil temperature standard deviation of the first soil layer has a diurnal variation, which is due to the diurnal variation of the surface air temperature. Therefore, a suitable standard deviation can be obtained if the inflation factor is 1.01. As the second layer is affected less by the near-surface atmosphere, the result of the second soil layer shows an increase every 6 hr (see Figure 3b,d). From the graph it can be seen that the result is reasonable when the inflation factor is 1.01.

| Air temperature and relative humidity in 2 m
From Figures 4 and 5, the temporal variations of the root mean square error of T2m and RH2m per hour, it can be seen that the root mean square error in every experimental result has a diurnal variation. The daytime error of T2m is bigger than the error at night, and the mean square error of T2m in land surface assimilation analysis is smaller than that of the OL experiment. For RH2m, the daytime error is smaller than that at night which is opposite to the distribution of T2m. As the statistical results of Figure 4 and Table 1 show, because the soil analysis is started in the first 6 hr of the forecasting time, each experiment has the same results. The mean result for 24 hr is less than the mean result of the sixth hour. At the beginning of the second period of land surface analysis (12 hr), the T2m results of the experiments are distinctly different. From the statistical results of Table 1, it can be seen that the error of T2m analysis from 6 to 24 hr decreases gradually compared with the error without using land surface analysis (OL experiment), because the land surface model could moderate the forecasting results by itself.
The result of the EnKF is better than the OL method, and the root mean square error of T2m at 24 hr reaches 2.54 K, which improves the 0.38 K result. This means that after the   third period of land surface analysis the impact of the land surface data assimilation becomes more significant. For RH2m, the improvement effect is not obvious ( Figure 5).

| Precipitation
There are three periods of heavy precipitation in this experimental time: June 21-26 from North China to the Huai River region; June 22-26 in the southern part of Shaanxi province, the middle and northern part of Hubei province and the middle and lower reaches of the Yangtze River region; and June 27-28 from the southwest region to the south of the Yangtze River region and the northern part of China. In this paper, the threat scores (TS) method was employed to evaluate the performance of the precipitation forecasts. TS ranges from 0 to 1, with 0 indicating no skill and 1 perfect. From the TS score and bias (omitted) it can be seen that the results of the two experiments are basically the same. The land surface analysis experiment is based on the relationship between the error of soil water content and temperature and the error of T2m and RH2m, to modify the soil water content and temperature. In the NWP model, it is a complicated process from soil water content to precipitation. In this experimental time, there are three precipitation processes in most parts of the domain. Therefore, it is difficult to determine the effect on the precipitation in the current results. More experiments should be done for comparative analysis.

| CONCLUSIONS
The land surface analysis method can effectively improve the accuracy of the model for air temperature at 2 m which has been verified by many scientists (Mahfouf, 1991(Mahfouf, , 2011Reichle et al., 2002;Mahfouf and Bilodeau, 2010;Draper et al., 2011). In order to compare the results of the ensemble Kalman filter (EnKF) method better with the open-loop method, an 8 day experiment was carried out. From the analysis results, it is found that the land surface assimilation method can effectively improve the precision of T2m and has little effect on precipitation. In future studies the influence on precipitation of the EnKF land surface assimilation method should be validated for more case studies. Many experiments (Seuffert et al., 2003(Seuffert et al., , 2004Balsamo et al., 2006;Carrera et al., 2015) have proved that the EnKF land surface assimilation method has a better result on T2m simulation if it can make use of more observation data. The Canadian Land Data Assimilation System (Carrera et al., 2015) uses the EnKF to assimilate L-band microwave brightness temperature by coupling the land surface with a microwave radiative transfer model. The next step is to study the effect of the EnKF by combining the brightness temperature data with the nearsurface observation data and dew point temperature.