Bias correction to improve the skill of summer precipitation forecasts over the contiguous United States by the North American multi‐model ensemble system

Improvements in skill of summer forecasted precipitation as produced by the North American multi‐model ensemble (NMME) system over the contiguous United States (CONUS) are examined by applying a new bias correction method. The uncorrected precipitation produced by NMME hindcasts exhibits good prediction skill in fall and winter, while the spring and summer forecasts are marked with extremely poor skill. We propose a new method to correct the forecasted precipitation distribution based on skillfully predicted 2‐m air temperature (T2m) forecasts to fully exploit the stronger co‐variability that exists between precipitation and T2m in nature. The occurrence of enhanced recycled precipitation over CONUS provides an ideal situation to hone precipitation forecast skills using the T2m forecasts. The proposed bias correction is shown to successfully reduce the root mean square error in precipitation hindcasts in summer and can easily be extended to real‐time forecasts, thus providing a framework to dynamically link precipitation with other predictors besides T2m. Process understanding of the observed T2m‐precipitation relation will offer a framework for diagnosing poor model skill.

KEYWORDS
bias correction to improve precipitation skill in NMME seasonal precipitation forecasts, drought skill improvement over CONUS, surface air temperature-precipitation relationship in seasonal forecasts

| INTRODUCTION
The current advances in dynamical modeling and data-assimilation techniques make it possible to produce multi-model coupled general circulation model (CGCM) seasonal forecasts, which are an excellent resource to explore the representation of regional precipitation trends. The North American multi-model ensemble (NMME) project is an effort to produce seasonal forecasts by several state-of-the-art CGCMs from the United States and Canada to provide precipitation, 2-m air-temperature (T2m), and sea-surface temperature forecasts (Kirtman et al., 2014). The skill exhibited by multi-model mean forecasts is consistently higher than that of any participating CGCM forecast, underscoring the importance of multi-model forecasts (Kirtman et al., 2014). The improved performance of the multi-model ensemble system is a consequence of the conglomeration of physics and numerics across the CGCMs that better span the complex spectrum of the solution space (Fritsch et al., 2000; Doblas-Reyes et al., 2005). Note that this success is predicated on two factors: (a) each of the participating CGCMs produces skillful forecasts and (b) a single best performing model under all circumstances and lead times cannot be readily identified.
While the NMME-produced precipitation forecasts are not skillful beyond a month over land, the T2m forecasts exhibit higher skill levels beyond a season (Krakauer, 2017). Over the contiguous United States (CONUS), the precipitation skill of the NMME participating CGCMs varies widely across regions and in general the skill rapidly decreases beyond a month (Slater et al., 2016). However, as drought conditions refer to the onset of a precipitation deficit accumulated over a season and beyond, skillful seasonal accumulated precipitation forecasts are crucial for adaptation and mitigation purposes. Indeed, our analysis shows that the 3-month accumulated precipitation forecasts show higher skill beyond 3-month lead times in fall and winter seasons, although the precipitation forecasts targeting the summer season show very poor skill beyond a month (Figure 1). The focus here is to improve the precipitation skill in summer season forecasts by performing a statistical correction that can be seamlessly extended to real-time forecasts. Ultimately we would prefer to understand the processes and improve the models, but an interim statistical bias correction is valuable nonetheless.
Earlier studies have shown, with some success, that the skill of precipitation forecasts over the CONUS improves when sea surface temperature (SST) indices related to the Pacific decadal oscillation, El Niño-Southern Oscillation (ENSO), and Atlantic multidecadal oscillation are used (Madadgar et al., 2016; Zimmerman et al., 2016). However, in our study we exploit the correlation between concurrent T2m and precipitation in the summer season over CONUS to effectively improve the precipitation skill. This approach results in dynamical consistency and skill improvement at much smaller spatial scales compared to relying on remote ocean-atmosphere teleconnections, which generally improve the forecasts over larger spatial scales.

| DATA AND METHODS
The observed precipitation data for 1982-2010 used in this study is obtained from the Climate Prediction Center (CPC) unified gauge-based optimally interpolated objective analysis, which includes over 30,000 land-based stations from all over the world (Xie et al., 2007; Chen et al., 2008). The CONUS network has a higher gauge density than other station networks and comprises more than 15,000 stations (Higgins et al., 2000). The original data available at 0.25° spatial resolution is interpolated to a 1° resolution to be coherent with the NMME forecast resolution. The observed T2m data for the same time period is obtained from the Global Historical Climatology Network (Fan and van den Dool, 2008), which is also interpolated to a 1° resolution from the original 0.5° spatial resolution.
This study uses the seasonal forecast data of nine lead months produced by initializing at around the beginning of each month for the 1982-2010 period using eight of the NMME participating CGCMs originating from Canadian and US national modeling centers and universities. The number of ensemble forecasts produced by the NMME models varies widely, ranging from 10 to 28, and a total of 109 ensembles are used for the analysis. NMME models report monthly global SST, precipitation, T2m, 200 hPa geopotential heights and soil moisture as part of phase I (Kirtman et al., 2014), and this study uses precipitation and T2m forecasts over CONUS. The main motivation for precipitation bias correction stems from two key observations: (a) the T2m forecasts possess significant correlation skill in the JJA season, while the precipitation forecasts exhibit very poor skill beyond one lead month and (b) T2m and precipitation are significantly correlated in observations over much of CONUS in JJA. The bias correction in precipitation has been performed by quantile mapping between observed and forecasted T2m and subsequently correcting the forecast precipitation distribution based on the observed T2m-precipitation relationship.
Figure 1 (caption, in part). In each subplot, the vertical axis shows the lead month and the horizontal axis shows the target forecasts for 3-month aggregated precipitation. For example, "6" on the vertical axis corresponding to "JFM" on the horizontal axis refers to the anomaly correlation as calculated when the model is initialized 6 months before January, that is, in July of the previous year. In all the subplots, values significant at the 10% level based on a two-tailed Student's t-distribution are shown with solid contours and values significant at the 20% level are shown with dashed contours.
The bias correction is performed in two steps: (a) at every grid point, the observed precipitation in summer is ranked based on the T2m percentiles; (b) the forecasted precipitation is replaced with the observed precipitation associated with the observed T2m quantile that exactly corresponds to that of the forecasted T2m. This can simply be expressed as:
$$P^{\mathrm{modified}}_{y,m,F} = P_{O}\,\Big|_{\,\%T_{O} \,=\, \%T_{y,m,F}}$$
where T and P represent the T2m and the precipitation, the subscripts y, m, F, and O represent year, month, forecast, and observation, respectively; % represents percentile value, and "modified" refers to the bias-corrected precipitation.
The key feature of the proposed bias correction is to replicate the observed distribution mapping found between T2m and precipitation in NMME mean forecasts.
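The two steps described above can be illustrated for a single grid point with a minimal sketch. It is not the authors' implementation: the function names, the use of empirical ranks as percentiles, and the nearest-percentile matching (no interpolation between neighbouring ranks) are all assumptions made for illustration.

```python
from bisect import bisect_right


def percentile_rank(sample, value):
    """Empirical percentile (0-100] of `value` within `sample`."""
    ordered = sorted(sample)
    return 100.0 * bisect_right(ordered, value) / len(ordered)


def t2m_based_precip_correction(t2m_obs, precip_obs, t2m_fcst):
    """Hypothetical helper: for each forecast year, return the observed
    precipitation whose observed-T2m percentile best matches the
    percentile of the forecast T2m within the forecast record."""
    # Step (a): order observed precipitation by the observed T2m.
    pairs = sorted(zip(t2m_obs, precip_obs))
    obs_pct = [100.0 * (i + 1) / len(pairs) for i in range(len(pairs))]

    corrected = []
    for t in t2m_fcst:
        # Step (b): percentile of the forecast T2m in its own distribution.
        pct = percentile_rank(t2m_fcst, t)
        # Assumption: take the nearest observed percentile.
        idx = min(range(len(pairs)), key=lambda i: abs(obs_pct[i] - pct))
        corrected.append(pairs[idx][1])
    return corrected


# Toy example with four "years" at one grid point (made-up numbers).
t2m_obs = [20.0, 22.0, 24.0, 26.0]
precip_obs = [4.0, 3.0, 2.0, 1.0]
t2m_fcst = [25.0, 21.0, 23.0, 27.0]
corrected = t2m_based_precip_correction(t2m_obs, precip_obs, t2m_fcst)
```

Note that the forecast precipitation itself never enters the calculation in this sketch; only the forecast T2m percentile selects which observed precipitation value is used.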

| RESULTS
Though the precipitation forecast for any individual month is not skillful beyond 2 months, the 3-month aggregated precipitation is more skillfully predicted well beyond 6 lead months for fall and winter forecasts (Figure 1). While the precipitation forecasts for individual months in spring exhibit no skill, the 3-month aggregated precipitation forecasts that include the spring months show significant skill. Another encouraging part of this evaluation is that the multi-model mean exhibits a higher skill compared to any individual model (Figure 1i); the aggregated precipitation in the winter season is predictable up to 6 lead months, which is as far back as models initialized in the summer season. The root mean square error (RMSE) is consistently smaller over a larger part of the CONUS for the multi-model ensemble mean than for any individual model forecast (not shown). The striking feature of Figure 1 is that the 3-month aggregated forecasts involving the May to September forecasted months have virtually no skill over much of CONUS, which would obviously impact the skill in capturing the severity and extent of drought events extending into the summer.
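The skill measure used in this evaluation, the anomaly correlation of 3-month aggregated precipitation, can be sketched as follows. This is an illustrative calculation only: the function names are hypothetical, anomalies are taken about each series' own hindcast mean, and the t statistic is the standard one for a Pearson correlation with n - 2 degrees of freedom, which is assumed to be what underlies the Student's t significance levels.

```python
import math


def three_month_total(monthly, start):
    """Sum three consecutive monthly values starting at index `start`."""
    return sum(monthly[start:start + 3])


def anomaly_correlation(forecast, observed):
    """Pearson correlation between forecast and observed anomalies,
    each computed about its own mean over the hindcast years."""
    n = len(forecast)
    f_mean = sum(forecast) / n
    o_mean = sum(observed) / n
    f_anom = [f - f_mean for f in forecast]
    o_anom = [o - o_mean for o in observed]
    cov = sum(f * o for f, o in zip(f_anom, o_anom))
    norm = math.sqrt(sum(f * f for f in f_anom) * sum(o * o for o in o_anom))
    return cov / norm


def t_statistic(r, n):
    """Student's t statistic for a correlation r from n samples (|r| < 1)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Toy usage: a correlation of 0.5 over a 29-year hindcast (1982-2010).
t_value = t_statistic(0.5, 29)
```

The resulting t value would then be compared against the two-tailed critical value for the chosen significance level.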
To understand the spatial distribution of NMME ensemble mean forecasted precipitation skill in summer, the JJA forecasts produced by initializing at different months are analyzed. The skill of JJA-aggregated precipitation forecast is mostly confined to parts of northwestern CONUS (Figure 2). The JJA forecasts initialized in the winter months (December-March) show some skill over central United States and Texas mainly attributable to the ENSO impact. In the June-initialized forecasts the skill extends over to the Ohio Valley and Southeast regions. The prediction skill does not exist in the JJA precipitation forecasts over much of the swath extending from south-southeast to the northeast regions when the models are initialized earlier than June.
The 3-month aggregated T2m forecasts on the other hand show a much higher skill in the spring and summer seasons beyond 3 lead months for every participating model except the GFDL_FLOR model, which shows weak yet significant correlations (Figure 3). The multi-model ensemble shows much higher correlations for the summer season compared to individual models. It is interesting to note that T2m summer forecasts exhibit statistically significant skill even beyond 6 lead months while the summer precipitation forecasts show virtually no skill (compare Figures 1i and 3i).
Over the CONUS, the correlations between the 3-month aggregated observed T2m and precipitation show a rich variability and smooth transition from positive to negative across seasons (Figure 4). In the eastern half, they are positively correlated over the Ohio Valley and Appalachians in late fall and winter seasons with positive correlations extending to the northern tip of the northeast. Positive correlations are also seen to the west of the northern Rockies and plains covering much of northwest CONUS and northern California in the winter. At the beginning of spring, positive correlations turn to negative between T2m and precipitation as the increased solar irradiance leads to dry conditions over much of the CONUS (Zhao and Khalil, 1993; Trenberth and Shea, 2005) and this negative relationship strengthens through the middle of fall (Figure 4i) with a peak in summer, where the negative correlations prevail all over the CONUS except in the southwest and upper Midwest regions (Figure 4f). In particular, the negative co-variability between precipitation and T2m is strongest over the middle of CONUS covering Northern Rockies and Plains down to the upper part of the southern region. Stronger negative correlations over the CONUS in summer combined with the significantly higher T2m forecast skills are the basis of the precipitation bias correction proposed in this novel approach.
The difference in the RMSE between the uncorrected and bias-corrected JJA forecasts, produced by initializing the NMME models in forecast months from December onward, shows that the RMSE is reduced significantly by the bias correction (Figure 5). The reduction in RMSE varies widely across the CONUS, particularly with the larger negative correlations between precipitation and T2m (Figure 4). The larger spatial extent of RMSE reduction in the Great Plains suggests that precipitation not produced by the models, likely due to the poor representation of enhanced land-atmosphere interactions in summer, is much better captured by the correction scheme. The hotspots of enhanced land-atmosphere interactions over central CONUS are extensively discussed by Koster et al. (2004), and the proposed methodology corrects the model precipitation biases in those regions for all the initialized forecasts. The bias correction scheme strongly reduces the RMSEs, especially in the earlier initializations (e.g., December- to April-initialized forecasts compared to May- and June-initialized forecasts), and this provides a base for enhancing the drought prediction skill in the long-lead forecasts extending into summer. We also note that there are areas that are negatively impacted by the bias correction; for example, over Texas, the bias correction results in slightly higher RMSEs for December- to March-initialized forecasts (Figure 5). However, the RMSE increases in this and other regions (e.g., the Ohio Valley) are relatively small compared to the RMSE reductions over the rest of the CONUS and are statistically insignificant.
Figure 5. Differences in precipitation root mean square error (mm/day) between the uncorrected and bias-corrected forecasts for the target forecast season JJA, as produced by initializing the NMME participating models in the months from December to June, depicted in subplots (a)-(g), respectively. Significant mean square error difference between the uncorrected and corrected forecasts at the 5% significance level based on a t-test is indicated by stippling.
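The quantity mapped in Figure 5, the RMSE difference between uncorrected and bias-corrected forecasts, can be sketched for one grid point as below. The function names and the toy numbers are hypothetical; the sign convention (positive means the correction reduced the error) is an assumption chosen for readability.

```python
import math


def rmse(forecast, observed):
    """Root mean square error over the hindcast years (e.g. in mm/day)."""
    squared = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_reduction(uncorrected, corrected, observed):
    """RMSE(uncorrected) - RMSE(corrected); positive values mean the
    bias correction lowered the error at this grid point."""
    return rmse(uncorrected, observed) - rmse(corrected, observed)


# Toy example with three "years" of JJA precipitation (made-up numbers).
observed = [1.0, 2.0, 3.0]
uncorrected = [3.0, 4.0, 5.0]
corrected = [2.0, 3.0, 4.0]
reduction = rmse_reduction(uncorrected, corrected, observed)
```

Applying this at every grid point and for every initialization month would give one difference map per initialization, analogous to the subplots of Figure 5.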

| DISCUSSIONS AND CONCLUSIONS
This study proposes a bias correction for boreal summer precipitation hindcasts produced by NMME over CONUS. Bias correction is applied to the hindcasts of aggregated JJA precipitation, which are produced by initializing the NMME system in December to June of 1982-2010. The proposed bias correction methodology is easily extendable to real-time forecasts, as the RMSE improvements in the corrected forecasts differ by only 3% when the method is applied in simple and cross-validated modes.
The main motivation for this bias correction stems from two key observations: (a) the seasonal precipitation forecasts produced by the NMME system have very poor skill in JJA but the T2m forecasts display a significant correlation skill, and (b) T2m and precipitation are significantly correlated in nature over much of the CONUS in JJA. To this end, the central idea of the precipitation bias correction is to hone the skill of precipitation forecasts by replicating the observed T2m-precipitation distribution co-variability in the NMME forecasts. The bias correction can simply be implemented a posteriori in the forecasts by first quantile mapping between observed and forecasted T2m and subsequently correcting the forecasted precipitation distribution based on the observed T2m-precipitation relationship.
The advantages of bias correction based on quantile mapping over simple mean correction are well known. Though bias correction based on adjusting the mean bias is effective in reducing systematic mean bias for continuous variables, such as SST (Narapusetty et al., 2014), it does not correct the other statistical moments, which limits its effectiveness in reducing biases for stochastic variables, such as precipitation. This bias correction is in principle very similar to the statistical bias correction based on the bias correction and spatial disaggregation method (BCSD; Wood et al., 2004; Maurer et al., 2014) except for a key difference. Unlike the BCSD, the bias correction method proposed here does not directly quantile map the forecasted precipitation to observations. As the precipitation forecasts in JJA possess an extremely low skill (Figure 1) over the CONUS, a direct mapping of the forecasted precipitation distribution will not generally lead to skill improvement. We compared the RMSE improvements imparted using the method proposed in this study with the BCSD-produced correction (not shown) and found that both methods produce comparable results in the western half of the CONUS. However, our method is more skillful in the eastern half of the CONUS where the precipitation forecast skill is low because this region is controlled by an enhanced precipitation recycling in summer (Anderson et al., 2009) and thus the T2m-based correction yields more realistic precipitation. Note that this study shows the RMSE reduction by employing the correction method. We found that the improvement in the skill of corrected precipitation based on anomaly correlation is marginal. In principle, the anomaly correlation between observed and corrected precipitation is expected to improve. However, as the bias correction relies on the existence of T2m-precipitation co-variability in observations and the NMME forecasts also replicate the T2m-precipitation co-variability (not shown), the resulting T2m-based precipitation correction does not yield further improvements in anomaly correlation. We note, however, that if this correction is applied over a geographical region where the NMME forecasts do not produce the expected co-variability between T2m and precipitation, this scheme could improve the anomaly correlations as well as the magnitude, as found over CONUS. Such an application is beyond the scope of this manuscript; the analysis is underway and its results will be published elsewhere.
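For contrast, direct quantile mapping of forecast precipitation onto the observed precipitation distribution can be sketched as follows. This illustrates only a generic empirical quantile-mapping step of the kind used in BCSD-style bias correction; it is not the BCSD method itself (no spatial disaggregation), and the function name and rank-matching rule are assumptions.

```python
from bisect import bisect_right


def direct_quantile_map(precip_fcst, precip_obs):
    """Hypothetical helper: replace each forecast value with the observed
    value at the same empirical rank in the observed distribution."""
    fc_sorted = sorted(precip_fcst)
    ob_sorted = sorted(precip_obs)
    n_f, n_o = len(fc_sorted), len(ob_sorted)

    mapped = []
    for p in precip_fcst:
        rank = bisect_right(fc_sorted, p)      # 1..n_f
        # Integer ceiling of rank * n_o / n_f, shifted to a 0-based index.
        idx = -(-rank * n_o // n_f) - 1
        mapped.append(ob_sorted[idx])
    return mapped


# Toy example (made-up numbers): the wettest forecast year receives the
# wettest observed value, the driest receives the driest, and so on.
mapped = direct_quantile_map([5.0, 1.0, 3.0], [10.0, 30.0, 20.0])
```

Because such a mapping is monotone, it preserves the year-to-year ordering of the forecast precipitation, so a forecast that ranks the years poorly still ranks them poorly after mapping. The T2m-based approach instead takes the ordering from the forecast T2m.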
The efficacy of the bias correction method illustrated in our study is not based on the assumption that T2m drives the precipitation. The objective of the study is not to find the drivers of the precipitation; rather we want to maximize precipitation forecast skills based on the variables which are skillfully produced by CGCM forecasts. The bias correction proposed in this study relies on the skillfully forecasted T2m as predictor and corrects the precipitation distribution based on the strong T2m-precipitation co-variabilities that exist in nature. Note that the trend of T2m in nature has very minimal effect, if any, on the trend of corrected seasonal precipitation forecast since we are only correcting precipitation for one season at a time.
This method also suggests that a framework to hone the precipitation forecast skill is possible based on the precipitation co-variability with any other dynamical field besides T2m, provided that the dynamical field is skillfully forecasted by NMME. For example, the more tropical nature of the precipitation over CONUS regions such as the Southeast may yield a vertical velocity or outgoing longwave radiation-based bias correction, while additional soil-moisture-based skill enhancement may be possible over the CONUS regions where recycled precipitation is significant. In this study, the bias correction is performed on the ensemble-mean forecasts; therefore this analysis cannot be used to diagnose the improvements in ensemble forecasts by employing statistics such as reliability and ranked histograms. This study also does not distinguish the contributions of precipitation skill improvements from ENSO and ENSO-neutral years individually. Such an in-depth analysis is underway and will be reported elsewhere in the future. It is clear, however, that the model biases can be diagnosed in the context of missing observed correlations of model precipitation with other appropriate model variables that are captured more skillfully.