Application of a convection‐permitting ensemble prediction system to quantitative precipitation forecasts over southern China: Preliminary results during SCMREX



As a preliminary attempt to cope with the low predictability of heavy rainfall over southern China in the pre-summer rainy season, an experimental convection-permitting ensemble prediction system (GM-CPEPS) based on the Global/Regional Assimilation and Prediction System (GRAPES) is developed. GM-CPEPS produces 12 h forecasts at 0.03° horizontal resolution based on 16 perturbed members. Perturbations from downscaling, ensemble of data assimilation, time-lagged scheme and topography are combined to generate the initial perturbations. Sea-surface temperature is perturbed, and a combination of downscaling and balanced random perturbations is used to perturb the lateral boundary conditions. Stochastically perturbed parametrization tendencies, multi-physics, and perturbed parameters are all implemented. In this study, GM-CPEPS was verified over a 15-day period during the Southern China Monsoon Rainfall Experiment (SCMREX) in May 2014. The results indicate that GM-CPEPS provided estimates of forecast uncertainty comparable to those of some international peers. Compared with the control forecasts (DET), some deterministic guidance, including the 90th-percentile forecast distribution, the probability-matched mean, and a linear combination of both (NPM), showed advantages in forecasting moderate and heavy rainfall, and the optimal-member technique was superior in reducing bias. Probabilistic guidance demonstrated its value over DET in detecting potential threats of severe weather, with both the optimal-probability and neighbourhood-probability techniques leading to improvements in predicting lighter rainfall. Two cases were used to display the deterministic and probabilistic guidance intuitively and to illustrate the corresponding advantages and drawbacks.

convection-permitting, ensemble prediction system, QPF, SCMREX

INTRODUCTION
In the pre-summer rainy season (April-June) in southern China, heavy rainfall occurs frequently, accounting for roughly half the annual precipitation. The maximum seasonal rainfall accumulation is over 800 mm in several areas, often causing severe flooding ([...]; hereinafter L17). Precipitation during the pre-summer rainy season is mostly convective, with mesoscale organizational characteristics (Luo et al., 2013; Xia et al., 2015). Mesoscale convective systems (MCSs), and their interactions with larger-scale atmospheric systems and the complex underlying surface, are both closely related to the formation and evolution of heavy rainfall (Wu and Luo, 2016; L17). Unfortunately, numerical weather prediction (NWP) models are still very poor at producing quantitative precipitation forecasts (QPFs) of heavy rainfall, especially of the warm-sector extreme rainfall over southern China during its pre-summer rainy season (L17).
Compared with NWP models with parametrized convection, convection-permitting (CP) NWP models, whose grid spacing (∼4 km or less) is fine enough to remove convective parametrization, have been shown to be beneficial for QPFs, especially for convective processes (Done et al., 2004; Weisman et al., 2008). For this reason, both operational and experimental CP NWP models for warm-season QPFs have been developed over the past decade. A CP regional model based on the Global/Regional Assimilation and Prediction System (GRAPES: Xue and Liu, 2007; Chen et al., 2008), named GRAPES-MARS3KM (Zhang et al., 2016; hereinafter Z16), has been developed recently for QPFs over southern China and has been in operational use since July 2014.
However, convective-scale forecast uncertainties due to highly nonlinear flows and rapid error growth (Lorenz, 1969; Hohenegger and Schär, 2007) lead inevitably to the low intrinsic predictability of atmospheric flows at CP scales (Zhang et al., 2003). The practical predictability is also seriously limited by deficiencies in the current data assimilation (DA) techniques and observation systems (Dong et al., 2011), and by imperfections in the present dynamics and physics configuration of NWP models (Morrison et al., 2009; Schwartz et al., 2010). In particular, the large sensitivities of CP-scale QPFs over southern China to initial conditions (ICs) and physics parametrization schemes are very pronounced in the pre-summer rainy season (Wu et al., 2013; Z16; Bao et al., 2017; L17) and thus reduce the predictability.
To cope with the poor CP-scale predictability, CP ensemble prediction systems (EPSs) are currently being developed by several NWP centres (Raynaud and Bouttier, 2016), considering the advantage of EPSs in estimating forecast uncertainties (Leith, 1974;Wilks, 2006). Some of these EPSs are in operation, for example, COSMO-DE-EPS (Peralta et al., 2012), AROME-EPS (Bouttier et al., 2012), MOGREPS-UK (Hagelin et al., 2017), the Storm-Scale Ensemble Forecast system (Clark et al., 2011), and the 3 km NCAR ARW EPS (Schwartz et al., 2015). However, there is still no operational or experimental CP EPS that is well designed and maturely applied to QPFs over southern China during the pre-summer rainy season. To fill this void, an experimental CP EPS based on GRAPES-MARS3KM (hereinafter GM-CPEPS) was developed recently by the Institute of Tropical and Marine Meteorology (ITMM) of the China Meteorological Administration.
The Southern China Monsoon Rainfall Experiment (SCMREX; http://exps.camscma.cn/scmrex) is a research and development project of the World Weather Research Programme of the World Meteorological Organization. One of its objectives is to improve QPFs over southern China in the pre-summer rainy season by conducting CP ensemble-forecasting experiments (L17).
In support of the SCMREX programme, the present study attempts (for the first time in the literature) to evaluate the QPF performance of GM-CPEPS during SCMREX. The aim is to examine the current ability of GM-CPEPS to predict the pre-summer precipitation and thereby to prompt the development of an operational CP EPS in southern China. Moreover, the ways of generating both deterministic and probabilistic guidance are investigated to explore how to provide better guidance to forecasters. Considering the limited computing resources and the good representation of precipitation events, the evaluation, based on a batch experiment and two case-studies, was implemented only for 8-23 May 2014. The GM-CPEPS configuration is described in section 2. Sensitivity tests to illustrate the rationale behind the design of GM-CPEPS are shown in section 3. Sections 4 and 5 present the results of the batch experiment and case-studies, respectively. Section 6 concludes the article and provides further discussion.

METHODOLOGY
GM-CPEPS comprises one deterministic or control member (DET) and 16 perturbed members (PERs), all nested within a mesoscale EPS, namely GM-MSEPS. A summary of the acronyms for the components of GM-CPEPS can be found in Table 1. Aimed at improving the short-term (0-12 h) QPF skill, both GM-CPEPS and GM-MSEPS issue 12 h perturbed forecasts twice a day at 0000/1200 UTC.

The GM-MSEPS ensemble
GM-MSEPS is based on GRAPES-MARS, which is a regional NWP model (Figure 1a) with a horizontal resolution of 0.09° × 0.09° and 55 vertical levels. GM-MSEPS also has 1 DET and 16 PERs. For DET, the 6 h forecasts of the 0.5° × 0.5° NCEP Global Forecast System (GFS) are used as background fields for the cold start of DA. The system used for DA and the observations assimilated in GM-MSEPS are the same as those in GM-CPEPS and are introduced below. After a 6 h DA cycling, which starts at 1800/0600 UTC each day, 24 h DET forecasts are issued with ICs from DA analyses and lateral boundary conditions (LBCs) from GFS forecasts. Balanced random (BR: Barker, 2005; Meng and Zhang, 2008) [...] h DA cycling and 12 h forecasts. In MP, different parametrization schemes for microphysics and planetary boundary layer (PBL) processes are combined. During SCMREX, the ensemble mean (EM) of GM-MSEPS generally predicted a pattern and evolution of rain bands, which were mainly influenced by large-scale forcing, similar to those observed, and the ensemble spread represented most of the forecast uncertainties of these rain bands (not shown).

Deterministic forecasts
The NWP model used in GM-CPEPS DET is GRAPES-MARS3KM, which is a nonhydrostatic model and adopts a semi-implicit, semi-Lagrangian scheme for temporal integration. Its horizontal grid is set on a longitude-latitude mesh with the Arakawa C-grid staggering. The GRAPES-MARS3KM domain covers southern China and the northern South China Sea (SCS) (Figure 1b), [...] (Hong et al., 2006), the five-layer thermal diffusion (SLAB) land surface model, the medium-range forecast (MRF) PBL scheme (Hong and Pan, 1996), the rapid radiative transfer model (RRTM) long-wave radiation scheme, and the Dudhia short-wave radiation scheme are used in the physics parametrization.
The DA system used is GRAPES-CHAF3KM (Z16), which is based on the three-dimensional variational (3D-Var) analysis system of GRAPES along with the multigrid technique (Xie et al., 2011) and the partial cycling strategy (Hsiao et al., 2012). Data from six types of observation (radiosonde, surface station, ship, Doppler radar, wind-profiling radar and aircraft) are assimilated in GRAPES-CHAF3KM every 3 h.
As shown in Figure 2a, the flow of GM-CPEPS DET is the same as that of GM-MSEPS DET except that the LBCs of the former come from forecasts of the latter rather than from GFS. See Z16 for more details of DET.

Perturbed forecasts
The uncertainties in IC, surface, LBCs, and model physics are all considered in generating the PERs of GM-CPEPS. However, there is still no generally accepted method for perturbing these four components of a CP EPS. Recently, blending or combining approaches have been developed to exploit the different benefits of various perturbation methods, not only for IC (Lang et al., 2012; Caron, 2013) but also for model physics (Hacker et al., 2011; Duda et al., 2014; Berner et al., 2015). Given the positive impacts of these approaches on forecasts, as illustrated in the references cited above, GM-CPEPS was constructed by combining perturbations from multiple uncertainty sources using some leading perturbation methods (Table 1), so as to account for as many sources of uncertainty as possible. Each PER is defined by adding perturbations to the IC, surface, LBCs, and model physics of the unperturbed member, i.e. DET.

IC perturbations
IC perturbations are generated by a linear combination of perturbations from the downscaling (IDSC), ensemble of DA (EDA), time-lagged scheme (TLA), and topography (TO), each multiplied by its own weighting factor. The TO weighting factor is set to 1, whereas the IDSC, EDA and TLA weighting factors need to be determined so as to avoid excessive perturbations causing numerical instability, because IDSC, EDA and TLA perturbations all show significantly larger magnitudes than TO perturbations (not shown).
For IC perturbations of a CP EPS, downscaling the initial perturbations from a coarser EPS is a common and standard technique (Kühnlein et al., 2014; Tennant, 2015). Here, IDSC perturbations are calculated by subtracting the EM of the 16 initial PERs of GM-MSEPS from each initial PER and then interpolating the result to the GRAPES-MARS3KM domain.
EDA is designed to estimate the uncertainty of analyses (Belo-Pereira and Berre, 2006; Vié et al., 2011) and is also used in GM-CPEPS. In the 6 h DA cycling of the GM-MSEPS PERs, the 3 h forecasts valid at 2100/0900 UTC are used as the perturbed background fields to create the first perturbed analyses with GRAPES-CHAF3KM. The 3 h integration of GRAPES-MARS3KM with MP and SPPT, which is described in section 2.2.6, then provides the next perturbed background fields valid at 0000/1200 UTC to create the final perturbed analyses. The EM of these 16 perturbed analyses is subtracted from each perturbed analysis to calculate EDA perturbations.
Based on TLA, a set of successive forecasts with different lead times but valid at the same time can be used to reflect the IC uncertainty with time-evolving (flow-dependent) information (Mittermaier, 2007; Wang et al., 2016). The ready-made forecasts of GM-CPEPS DET are used to construct a TLA ensemble, which is also an integral part of GM-CPEPS. For TLA members with an initial time of 0000 UTC (Figure 2b), the forecasts used are the 21, 22, 23 and 24 h ones initialized at 0000 UTC on the previous day, the 9, 10, 11, 12, 13, 14 and 15 h ones initialized at 1200 UTC on the previous day, and the 1, 2 and 3 h ones initialized at 0000 UTC; the forecasts valid at 2300 UTC on the previous day and at 0000 UTC, which are initialized at 2100 UTC based on the analyses derived from the 6 h DA cycling, are also used. The order of these 16 forecasts used in constructing the 16 TLA members is random. To compute TLA perturbations, the EM of the 16 TLA members is subtracted from each TLA member.
The initiation and evolution of heavy rainfall over southern China are often influenced by the terrain. Because NWP models represent real topography imperfectly, the uncertainty of QPFs is associated with the uncertainty in representing topography. However, few references propose methods to perturb the topography. Here, the terrain height on land is perturbed by adding Gaussian random numbers with zero mean and a standard deviation of 300 m to the original value (Figure 1b). The perturbation amplitude is restricted to be less than 450 m to avoid unrealistic perturbations, and the terrain height is left unperturbed if the perturbed height is negative. The values of the parameters determining the perturbation amplitudes are set based on an analysis of differences in terrain height between the current topography data and those with higher resolution (not shown). The 16 perturbed ICs are then generated by interpolating the IC of GM-CPEPS DET valid at 0000/1200 UTC based on the perturbed topographies, and TO perturbations are computed by subtracting the EM of these 16 ICs from each of them.
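The topography perturbation and the "member minus ensemble mean" step described above can be illustrated with a minimal sketch. It assumes NumPy, a 2-D terrain array in metres and a boolean land mask; the function names are hypothetical, and enforcing the 450 m limit by clipping is an assumption, since the text does not say how the restriction is applied.

```python
import numpy as np

def perturb_topography(height, land_mask, sigma=300.0, max_amp=450.0, rng=None):
    """Return a perturbed copy of a terrain-height field (m).

    Gaussian noise (zero mean, standard deviation `sigma`) is added on land
    points only. Assumption: the amplitude limit is enforced by clipping.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, sigma, size=height.shape)
    noise = np.clip(noise, -max_amp, max_amp)
    perturbed = np.where(land_mask, height + noise, height)
    # Leave the terrain unperturbed wherever the perturbed height is negative.
    return np.where(perturbed < 0.0, height, perturbed)

def ensemble_perturbations(members):
    """Subtract the ensemble mean from each member (axis 0 is the member axis)."""
    members = np.asarray(members, dtype=float)
    return members - members.mean(axis=0, keepdims=True)
```

In this sketch the same `ensemble_perturbations` helper would apply equally to the IDSC, EDA and TLA ensembles, since each is defined as a deviation from its own ensemble mean.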
Perturbations of U, V, Π, and Q are calculated in the above four types of perturbation. [...] and Q are left unperturbed if the perturbed Q of the IC reaches an unphysical value or exceeds a critical supersaturation value. Several further steps are taken to determine the weighting factors for IDSC, EDA and TLA perturbations, as follows.

1. Multivariate Empirical Orthogonal Function (MV-EOF) analysis is applied to the combination of the perturbed variables (i.e. U, V, Π, and Q). For the combined data at a given vertical level, the temporal dimension used in traditional MV-EOF is replaced by the dimension of the ensemble member.
2. Eigenmodes corresponding to the leading eigenvalues, whose contributions to the total variance are above 90%, are calculated at every vertical level.
3. The eigenvalues at level k for IDSC, EDA and TLA perturbations are defined as $E^k_1$, $E^k_2$ and $E^k_3$ ($k = 1, 2, \ldots, 55$), respectively. A first estimate of the three weighting factors at level k is defined in terms of $E^k_{\max}$, $E^k_{\min}$ and $E^k_{\mathrm{tot}}$, which represent the maximum, minimum and sum, respectively, of the three eigenvalues (i.e. $E^k_1$, $E^k_2$ and $E^k_3$).
4. The mean eigenvalues for IDSC, EDA and TLA perturbations are defined as $E^m_1$, $E^m_2$ and $E^m_3$, respectively, by averaging $E^k_1$, $E^k_2$ and $E^k_3$ over all the vertical levels. The weighting factors at level k are then defined using $E^m_{\max}$, the maximum of the three mean eigenvalues (i.e. $E^m_1$, $E^m_2$ and $E^m_3$), and a tuning factor for each perturbation type, which is used to maintain the three perturbations' mutual balance in contribution to the total IC perturbations. Specifically, to increase the contribution of IDSC perturbations, the tuning factor for IDSC is set to 1.2 and those for EDA and TLA are both set to 0.3.
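The eigenvalue step of the MV-EOF analysis can be sketched as follows. This is only an illustration under stated assumptions: the perturbation variables at one level are flattened and stacked into a matrix of shape (members, features), the "above 90%" criterion is read as a cumulative variance fraction, and any normalization of the variables before stacking is left to the caller. The function name is hypothetical, and the weighting formulas themselves are not reproduced here.

```python
import numpy as np

def leading_eigenvalues(perts, var_fraction=0.9):
    """Eigenvalues of the leading EOF modes at one vertical level.

    perts: array of shape (n_members, n_features); each row holds one
           member's stacked perturbation variables, so the member axis
           plays the role that time plays in a traditional EOF analysis.
    Returns the smallest set of leading eigenvalues whose cumulative
    contribution to the total variance reaches `var_fraction`.
    """
    x = np.asarray(perts, dtype=float)
    x = x - x.mean(axis=0, keepdims=True)      # remove the ensemble mean
    s = np.linalg.svd(x, compute_uv=False)     # singular values, descending
    eig = s ** 2 / (x.shape[0] - 1)            # covariance eigenvalues
    frac = np.cumsum(eig) / eig.sum()
    n_modes = int(np.searchsorted(frac, var_fraction) + 1)
    return eig[:n_modes]
```

Working from the singular values of the member-by-feature matrix avoids forming the full feature covariance matrix, which would be very large for a 0.03° grid.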

Surface perturbations
As a crucial aspect of monsoon precipitation, sea-surface temperature (SST) influences precipitation (Mo and Juang, 2003; Roxy, 2014), especially the pre-summer precipitation over southern China, owing to the significant warm-moist air coming from the SCS (Huang and Mao, 2015; Wu and Luo, 2016). Thus, the uncertainty of SST is considered here by adding random numbers drawn from a Gaussian distribution with zero mean and a standard deviation of 2.0 K to SST at the grid scale to generate SST perturbations. SST is kept unchanged during the model integration, resulting in constant forcing from the surface boundary to the atmosphere. SST perturbations are bounded within the range of ±2 (±1) standard deviations for PERs initialized at 0000 (1200) UTC, i.e. 0800 (2000) local solar time, to limit excessive perturbations. SST perturbations are allowed to be larger in daytime than at night-time, given the larger variability in the former (Gentemann et al., 2003).
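A minimal sketch of the SST perturbation follows, assuming NumPy. The function name is hypothetical, and implementing the bound by clipping (rather than, say, redrawing out-of-range values) is an assumption not stated in the text.

```python
import numpy as np

def sst_perturbation(shape, init_hour_utc, sigma=2.0, rng=None):
    """Grid-scale SST perturbation field (K) for one perturbed member.

    Draws Gaussian noise with zero mean and standard deviation `sigma`,
    bounded within +/-2 sigma for 0000 UTC runs and +/-1 sigma for
    1200 UTC runs. Assumption: the bound is applied by clipping.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_sd = {0: 2.0, 12: 1.0}[init_hour_utc]   # bound in standard deviations
    limit = n_sd * sigma
    noise = rng.normal(0.0, sigma, size=shape)
    return np.clip(noise, -limit, limit)
```

Because SST is held fixed during the integration, a field like this would be added once to the initial SST of each member.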

LBCs perturbations
Downscaling LBCs (LDSC) perturbations are blended with BR perturbations to generate LBCs perturbations for GM-CPEPS. Introducing LDSC perturbations from a mesoscale EPS by nesting is a common approach for CP EPSs (Gebhardt et al., 2011; Schwartz et al., 2015); hence, the forecasts of the GM-MSEPS PERs are interpolated to form the first LBCs of the GM-CPEPS PERs.
BR perturbations, which are produced by taking Gaussian random draws with zero mean and covariances from the regional background error covariances provided by the 3D-Var analysis system of GRAPES, are inflated multiplicatively as the forecast progresses (Torn, 2010; Schwartz et al., 2015). Specifically, the averaged inflation factor is roughly 1.04 (1.2) within (beyond) the first 4 h of the forecast. BR perturbations are then added to the first LBCs to form the final LBCs.

Model physics perturbations
A combination of SPPT, MP and Perturbed Parameters (PP: Gebhardt et al., 2011; Duda et al., 2014) schemes is used to generate model physics perturbations.
In the SPPT scheme, random perturbations are added to the total parametrized tendency of physical processes, which includes the radiation, microphysics and PBL processes, for the variables U, V, [...] and Q. In this implementation, a random field r drawn from a Gaussian distribution with zero mean, a standard deviation of 0.5, a spatial decorrelation scale of 50 km, and a temporal decorrelation scale of 1 h is produced during model integration. Here, r is bounded within the range of ±2 standard deviations to avoid unrealistic perturbations causing numerical instability. Next, the physical tendency is multiplied at each time step by f = 1 + r to form a perturbed tendency. No tendency perturbations are applied near the surface (below about 500 m above ground) or near the model top (<50 hPa), and the perturbations in the transition layers (500-1,500 m and 100-50 hPa) are ramped up smoothly to full amplitude. Additionally, perturbations of Q are checked in a manner similar to that used for the initial perturbations, to determine whether to apply tendency perturbations to both [...] and Q.
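The multiplier f = 1 + r and the vertical tapering can be sketched for a single grid point as below. This is a simplified illustration, not the model code: the function names are hypothetical, the ramps are assumed to be linear (the text says only "smoothly"), and the generation of the spatially and temporally correlated field r is omitted.

```python
def bounded_r(r, sigma=0.5, n_sd=2.0):
    """Clamp the random number r to +/- n_sd standard deviations."""
    bound = n_sd * sigma
    return max(-bound, min(bound, r))

def height_taper(z_agl_m, z_zero=500.0, z_full=1500.0):
    """0 below z_zero, 1 above z_full, linear in between (assumed ramp shape)."""
    if z_agl_m <= z_zero:
        return 0.0
    if z_agl_m >= z_full:
        return 1.0
    return (z_agl_m - z_zero) / (z_full - z_zero)

def pressure_taper(p_hpa, p_full=100.0, p_zero=50.0):
    """1 at pressures above p_full, 0 at pressures below p_zero (near the model top)."""
    if p_hpa >= p_full:
        return 1.0
    if p_hpa <= p_zero:
        return 0.0
    return (p_hpa - p_zero) / (p_full - p_zero)

def perturbed_tendency(tendency, r, z_agl_m, p_hpa):
    """Apply f = 1 + w * r, where w tapers the perturbation near surface and top."""
    w = height_taper(z_agl_m) * pressure_taper(p_hpa)
    return tendency * (1.0 + w * bounded_r(r))
```

With a standard deviation of 0.5 and a bound of ±2 standard deviations, f stays within [0, 2], so a perturbed tendency never changes sign in this sketch.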
WSM6 and the WRF Single-Moment 5-class (WSM5) microphysics scheme (Hong et al., 2004), as well as MRF and the Yonsei University (YSU: Hong et al., 2006) scheme, are used to construct four combinations of physics packages for parametrizing the microphysics and PBL processes in the MP scheme. Among the 16 perturbed ensemble members, members 1-4 use WSM6/MRF schemes, as is the case in DET, members 5-8 use WSM6/YSU schemes, members 9-12 use WSM5/MRF schemes, and the remaining members use WSM5/YSU schemes.
Both the rain intercept parameter ($N_{0r}$) in the microphysics scheme and the critical Richardson number ($Ri_c$) in the PBL scheme are perturbed in the PP scheme, considering the significant sensitivities of the parametrization schemes to these parameters (Hacker et al., 2011; Baker et al., 2014). In the WSM6/MRF, WSM6/YSU, WSM5/MRF and WSM5/YSU schemes, the default setting of $N_{0r}$/$Ri_c$ is $8 \times 10^6$/0.5, which is used in members 1, 5, 9 and 13; for the remaining three members of each combined scheme, $N_{0r}$/$Ri_c$ is set to $8 \times 10^5$/0.5, $8 \times 10^7$/0.5 and $8 \times 10^6$/1.0, respectively.
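The MP and PP assignments for the 16 perturbed members can be tabulated with a short sketch. The dictionary keys and the function name are hypothetical, and the order in which the three non-default parameter pairs are assigned within each group of four is taken to follow the member number, which the text implies but does not state explicitly.

```python
# Physics package for each group of four members (members 1-4, 5-8, 9-12, 13-16).
PACKAGES = [("WSM6", "MRF"), ("WSM6", "YSU"), ("WSM5", "MRF"), ("WSM5", "YSU")]

# (N0r, Ric) for the 1st to 4th member of each group; the first pair is the default.
PARAMETERS = [(8e6, 0.5), (8e5, 0.5), (8e7, 0.5), (8e6, 1.0)]

def member_config(member):
    """Return the physics configuration of perturbed member 1..16."""
    if not 1 <= member <= 16:
        raise ValueError("member must be in 1..16")
    group, slot = divmod(member - 1, 4)
    microphysics, pbl = PACKAGES[group]
    n0r, ric = PARAMETERS[slot]
    return {"member": member, "microphysics": microphysics,
            "pbl": pbl, "n0r": n0r, "ric": ric}

if __name__ == "__main__":
    for m in range(1, 17):
        print(member_config(m))
```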

Experimental design
To assess the performance of GM-CPEPS, retrospective GM-CPEPS forecasts were conducted during 8-23 May 2014, yielding 32 forecasts of 12 h each. The experimental period was dominated by several heavy rainfall events, with significant extreme daily rainfall (>100 mm/day) observed. As described in L17, two significant extremely heavy rainfall events, which were respectively characterized by local and large-scale MCSs and were representative of the pre-summer rainy season, occurred on 8 and 22-23 May (hereinafter Cases 1 and 2, respectively). Among these events, those forecasts with initial times of 0000 UTC 8 May and 1200 UTC 10 May were selected to test the rationale behind the design of GM-CPEPS and to demonstrate the GM-CPEPS performance intuitively. Detailed descriptions of these events can be found in L17 and Huang and Luo (2017).
This study focuses on the performance of both deterministic and probabilistic guidance of GM-CPEPS for QPFs (Table 1). Specifically, the deterministic guidance provided by GM-CPEPS includes rainfall from the EM, the probability-matched mean (PM: Ebert, 2001; Schwartz et al., 2014), the ensemble median (MED), the ensemble 10th, 25th, 75th and 90th percentiles ($Q_{10}$, $Q_{25}$, $Q_{75}$ and $Q_{90}$, respectively), the ensemble maximum (MAX) and minimum (MIN), and the rainfall from each PER. However, guidance based on ensemble percentiles below the 50th is seldom used in operational warnings for severe weather, especially in southern China in the pre-summer rainy season. Therefore, the performances of MIN, $Q_{10}$ and $Q_{25}$ are not illustrated here. Rainfall at the q-th ensemble percentile, $Q_q$, is calculated as follows:

$$Q_q = (1 - b)\,V_a + b\,V_{a+1},$$

where a and b represent the integer and fractional portions of $(M + 1)q/100$, respectively, and M represents the ensemble size, i.e. 16. $V_{a+1}$ and $V_a$ denote rainfall from the members ranked (a + 1)th and ath, respectively.
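As an illustration of the percentile definition, the sketch below assumes that the percentile is a linear interpolation between the members ranked ath and (a + 1)th in ascending order, weighted by the fractional part b. The function name is hypothetical, and clamping to the lowest or highest member when a falls outside 1..M-1 is an added assumption.

```python
def ensemble_percentile(values, q):
    """q-th percentile of an ensemble at one grid point.

    a and b are the integer and fractional parts of (M + 1) * q / 100,
    and the result interpolates linearly between the members ranked
    a-th and (a + 1)-th (ascending). Assumption: ranks outside 1..M-1
    are clamped to the minimum or maximum member.
    """
    ranked = sorted(values)
    m = len(ranked)
    pos = (m + 1) * q / 100.0
    a = int(pos)
    b = pos - a
    if a < 1:
        return ranked[0]
    if a >= m:
        return ranked[-1]
    return (1.0 - b) * ranked[a - 1] + b * ranked[a]
```

For M = 16 and q = 90, (M + 1)q/100 = 15.3, so the 90th percentile lies 30% of the way from the 15th to the 16th ranked member.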
The probabilistic guidance includes the probability of rainfall at different thresholds.
To investigate the advantages of GM-CPEPS in QPFs, its deterministic guidance is directly compared with DET, and its probabilistic guidance is compared with the binary rainfall fields of DET following Zhang (2018) (hereinafter Z18). Specifically, DET was transformed into binary fields by setting grid points exceeding a certain threshold to a value of 1, while all other points were given a value of 0. All comparisons were based on the skill scores illustrated below, which were averaged over all 32 forecasts.
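The binary transformation of DET, and a simple ensemble exceedance probability to compare it with, can be sketched as follows. NumPy is assumed, the function names are hypothetical, and estimating the probability as the fraction of members exceeding the threshold is a common choice assumed here rather than a method stated in the text.

```python
import numpy as np

def binary_field(rain, threshold):
    """1 where rainfall exceeds the threshold, 0 elsewhere."""
    return (np.asarray(rain, dtype=float) > threshold).astype(float)

def exceedance_probability(members, threshold):
    """Fraction of members exceeding the threshold at each grid point.

    members: array with the member axis first, e.g. (n_members, ny, nx).
    """
    exceed = np.asarray(members, dtype=float) > threshold
    return exceed.mean(axis=0)
```

Treating the binary DET field as a forecast probability of either 0 or 1 lets the same probabilistic scores be applied to DET and to the ensemble guidance.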

Forecast verification
Verification of the forecasts of upper-level variables was performed using 12-hourly radiosonde observations over the model domain (fig. 1a of Z16). Here, the forecasts were bilinearly interpolated onto the locations of the radiosondes. Forecasts of precipitation and surface variables were verified against the surface meteorological stations and automatic weather stations (AWSs) over the verification domain (Figure 1b,c). These surface observations were interpolated onto the NWP model grids using Cressman interpolation to calculate the skill scores. The root-mean-square error (RMSE) of the forecasts was calculated for variables at the upper levels and surface. To verify the rainfall for GM-CPEPS deterministic guidance as well as DET, the fractions skill score (FSS: Roberts and Lean, 2008) was used. FSS was computed for the 1 h accumulated rainfall in consideration of the more significant "double penalties" due to the displacement error in the verification of high-resolution forecasts (Ebert, 2008). In calculating FSS, "neighbourhood length" was defined as the side length of the area over which the fractions were computed. Probabilistic guidance verification was achieved by computing the continuous ranked probability score (CRPS: Hersbach, 2000), the Brier score (BS) and its reliability component (Wilks, 2006), and the area under the relative operating characteristic curve (AROC: Mason and Graham, 2002).
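A compact sketch of the FSS calculation follows, using NumPy only. The function names are hypothetical, the neighbourhood is a square of n × n grid points with n odd, and treating points outside the domain as zeros is an assumption about edge handling.

```python
import numpy as np

def neighbourhood_fraction(binary, n):
    """Fraction of 1s in the n x n window centred on each point (n odd).

    Uses a summed-area table; points outside the domain count as zero.
    """
    ny, nx = binary.shape
    padded = np.pad(binary.astype(float), n // 2, mode="constant")
    csum = np.cumsum(np.cumsum(padded, axis=0), axis=1)
    csum = np.pad(csum, ((1, 0), (1, 0)), mode="constant")
    total = (csum[n:n + ny, n:n + nx] - csum[:ny, n:n + nx]
             - csum[n:n + ny, :nx] + csum[:ny, :nx])
    return total / float(n * n)

def fss(forecast, observed, threshold, n):
    """Fractions skill score for one threshold and neighbourhood size."""
    pf = neighbourhood_fraction(np.asarray(forecast) > threshold, n)
    po = neighbourhood_fraction(np.asarray(observed) > threshold, n)
    mse = np.mean((pf - po) ** 2)
    mse_ref = np.mean(pf ** 2) + np.mean(po ** 2)
    return 1.0 - mse / mse_ref if mse_ref > 0.0 else float("nan")
```

In this sketch a forecast displaced by one grid point scores zero at n = 1 but gains skill as n grows, which is how the neighbourhood approach softens the double penalty.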
For the comparison of skill scores (including RMSE, FSS, CRPS, BS, reliability and AROC), the statistical significances of the differences between different forecast sources were assessed using bootstrap resampling. Specifically, random samples of skill scores were generated with replacement, and the differences of skill scores were then calculated. This procedure was repeated 1,000 times, and bootstrapping was performed on the score differences between the paired samples. The rank at which the resampled score differences crossed zero was used as the significance level to represent the probability that two skill scores were distinct. In section 4, only situations above the 85% or 90% significance level (indicating an 85 or 90% probability that two skill scores differed) are labelled.
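One way to realize the paired bootstrap test is sketched below with the standard library only. It is an interpretation, not the authors' code: the function name is hypothetical, paired per-forecast scores are assumed as input, and the significance level is taken as the larger of the fractions of resampled mean differences falling on either side of zero.

```python
import random

def bootstrap_significance(scores_a, scores_b, n_boot=1000, seed=None):
    """Estimate the probability that two paired sets of skill scores differ.

    The paired differences are resampled with replacement n_boot times;
    the returned value is the fraction of resamples whose summed
    difference has the dominant sign (between 0.5 and 1.0).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    n_positive = 0
    for _ in range(n_boot):
        resample = [diffs[rng.randrange(n)] for _ in range(n)]
        if sum(resample) > 0.0:
            n_positive += 1
    frac_positive = n_positive / n_boot
    return max(frac_positive, 1.0 - frac_positive)
```

With 32 forecasts, a returned value of at least 0.90 would correspond to the 90% level used for labelling in section 4.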

RATIONALE FOR GM-CPEPS DESIGN
So far, the best practice in designing a CP EPS remains unclear. Insufficient ensemble spread (or dispersion), which often causes inaccurate representation of forecast uncertainties in the ensemble forecasts, is one of the most critical problems to be addressed in current CP EPSs (Hohenegger et al., 2008; Gebhardt et al., 2011; Vié et al., 2011; Schwartz et al., 2014). However, methods used to improve the ensemble spread do not necessarily improve the deterministic guidance (e.g. EM) and may degrade the individual-member forecasts, thereby limiting the operational use of CP EPSs, especially in forecasting high-impact events. Given these two issues, the rationale for the GM-CPEPS design is not only to increase the ensemble spread but also to increase both deterministic and probabilistic skill. For this purpose, GM-CPEPS is first constructed by simply incorporating perturbations from all uncertainty sources using some leading perturbation methods. The sensitivities of GM-CPEPS performance to the different perturbation methods are then tested to determine which perturbations should be included. This idea was also used in Z18 to construct a mesoscale EPS, where it appeared to work.
In addition to the retrospective forecasts (hereinafter ALL) described in section 2.3, 10 additional comparison experiments were carried out: noIDSC, noEDA, noTLA, noTO, noTS, noLDSC, noBR, noSPPT, noMP and noPP, with IDSC, EDA, TLA, TO, TS, LDSC, BR, SPPT, MP and PP perturbations, respectively, removed from ALL. Moreover, an experiment named CTL with the simplest perturbations (i.e. IDSC and LDSC perturbations) was designed as a benchmark in the comparison. The above ensemble forecast experiments and their respective identifiers are all listed in Table 2. These experiments were performed for Cases 1 and 2, but only the results of Case 1 are shown here because the conclusions are similar in both cases.
Throughout the 12 h forecasts, the EM RMSE of the 1 h accumulated rainfall in ALL was the smallest (Figure 3). The differences in RMSE among the experiments were more evident for the forecasts beyond 8 h, when most heavy rainfall occurred, than for the whole 12 h forecasts. Thus, including all the perturbations mentioned [...] 9a3 of Huang and Luo, 2017), large EM RMSEs (≥10 mm) were present in the regions with rainfall above 10 mm/h for all the experiments (not shown). In these regions, both the area and magnitude of large ensemble spreads (≥10 mm) in ALL were the most consistent with those of large EM RMSEs among all the experiments (Figure 4c), whereas the opposite was true for noIDSC, noEDA, noTLA and CTL, all of which had small spreads (Figure 4a,b). In the regions where large RMSEs were absent, all the experiments except ALL produced large spreads, especially noMP and noPP (Figure 4d). Therefore, CTL and the experiments removing IC or model physics perturbations from ALL produced obvious biases in ensemble dispersion for QPFs, and these biases were reduced by introducing all the perturbations. Overall, the advantage of ALL over noLDSC or noBR in ensemble dispersion was small, albeit present (Figure 4c).
To summarize, the current design of GM-CPEPS proved effective in representative case-studies and was thus used during SCMREX, since the experiment with all the perturbations evidently outperformed the benchmark experiment, and most of the perturbations, i.e. the IC, surface, and model physics perturbations, seemed to be indispensable to the performance of GM-CPEPS. LBCs perturbations, however, contributed only marginally to the performance of GM-CPEPS, not only for Cases 1 and 2 but also for cases during the entire experimental period (not shown). As shown in some previous studies (e.g. Vié et al., 2011), LBCs perturbations often have non-negligible impacts on the performance of CP EPSs, especially in the later period of forecasts. This was apparently not the case here, probably because the forecast range of GM-CPEPS is short (i.e. 12 h) and its domain is relatively large. Besides LBCs perturbations, not all the other perturbations led to markedly improved forecasts. Thus, the current design of GM-CPEPS can be optimized by excluding or improving some perturbation components, which will be essential in future work.
Figure 5 shows the vertical profiles of RMSE for the 12 h forecasts. By and large, EM provided higher skill than DET for the non-precipitation variables, especially horizontal wind (Figure 5a). For temperature and humidity, the improvements of EM over DET were mainly at lower levels (Figure 5b,c). When ensemble spread is compared with RMSE, GM-CPEPS is no exception to the common underdispersion of CP EPSs. Specifically, the underdispersion was more serious for temperature and humidity (Figure 5b,c) than for wind (Figure 5a). However, observation error should be included in the verification as a non-negligible error source (Saetra et al., 2004). Thus, the total spread, which includes both ensemble spread and observation error, was also [...] Wang et al., 2008) and the 3D-Var analysis system of GRAPES (e.g. about 1.7 g/kg for specific humidity at lower levels) were used here. The ratio of EM RMSE to total spread was close to 1, especially for variables at lower levels (1,000-850 hPa), indicating good consistency between spread and RMSE. However, GM-CPEPS remained underdispersive at middle levels (700-300 hPa) for temperature and humidity (Figure 5b,c) but became overdispersive at those levels for wind (Figure 5a). The ratio generally varied between 0.85 and 1.1 at lower levels, where dynamic and thermodynamic variables are very important for convective processes; moreover, the ratio was comparable to that for some mesoscale EPSs elsewhere (Hacker et al., 2011; Schwartz et al., 2014).

Deterministic guidance for non-precipitation forecasts
For both 10 m wind speed and 2 m specific humidity, both RMSE and ensemble spread decreased with lead time (Figure 6a,c), which is caused by the poor spin-up behaviour due to the imperfect coupling between the diagnosed and prognostic variables at the initial time in GRAPES-MARS3KM (Z16). EM showed significant improvements over DET in predicting the surface variables ( Figure 6), although some degradations appeared in the first 2 h forecasts of 10 m wind speed (Figure 6a). The estimations of observation error from NCEP for 10 m wind speed (1.4 m/s), 2 m temperature (1.2 K), and 2 m specific humidity (1.7 g/kg) were used here to calculate the total spread. After considering the impact of observation error on the verification, the consistency of spread and RMSE was significantly improved. Even so, underdispersion was still present for all the surface variables, especially for 10 m wind speed (Figure 6a). The ratio of EM RMSE to total spread was closest to 1 for 2 m specific humidity (Figure 6c), followed by 2 m temperature ( Figure 6b). Overall, the ratios for surface variables, e.g. 10 m wind speed, 2 m temperature and 2 m specific humidity were roughly 1.5, 1.2 and 1.0, respectively, which are very comparable to those for some operational CP EPSs (Beck et al., 2016).
Thus, GM-CPEPS exhibited reasonable performances of EM and some capabilities for estimating forecast uncertainty, especially for variables at lower levels and the surface.
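The spread-skill ratio discussed above can be illustrated with a minimal sketch. It assumes that the total spread combines the ensemble spread and the observation error in quadrature, which is a common convention but is not stated explicitly in the text; the function name and array layout are hypothetical.

```python
import numpy as np

def spread_skill_ratio(members, obs, obs_error_std):
    """Return (RMSE / ensemble spread, RMSE / total spread).

    members: array of shape (n_members, n_points)
    obs: array of shape (n_points,)
    obs_error_std: assumed observation error standard deviation
    """
    members = np.asarray(members, dtype=float)
    obs = np.asarray(obs, dtype=float)
    ens_mean = members.mean(axis=0)
    rmse = np.sqrt(np.mean((ens_mean - obs) ** 2))
    # Ensemble spread: square root of the mean ensemble variance.
    spread = np.sqrt(np.mean(members.var(axis=0, ddof=1)))
    # Assumption: observation error is added in quadrature.
    total_spread = np.sqrt(spread ** 2 + obs_error_std ** 2)
    return rmse / spread, rmse / total_spread
```

A ratio above 1 indicates underdispersion; including the observation error enlarges the denominator and therefore lowers the ratio.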

Deterministic guidance for precipitation forecasts
Compared with DET, EM improved the forecasts of precipitation by significantly reducing the RMSE (Figure 6d). In calculating the total spread, the observation error was estimated as in Bouttier et al. (2012). Generally, the ratio of EM RMSE to total spread was around 1.2, indicating that GM-CPEPS's underdispersion of precipitation was not very serious.

[Figure caption fragment: The dots indicate the vertical levels for which the significance level of the RMSE differences is larger than 90% for comparing EM with DET. "oberr" ("noberr") represents the total (ensemble) spread including (not including) observation error. Numbers on the right-hand axes represent the ratio of EM RMSE to ensemble spread, separated by "/" to indicate that the former (latter) are related to total (ensemble) spread including (not including) observation error]

[Figure caption fragment: The dots indicate the lead times for which the significance level of the RMSE differences is larger than 90% for comparing EM with DET. "oberr" ("noberr") represents the total (ensemble) spread including (not including) observation error. Numbers on the top axes represent the ratio of EM RMSE to ensemble spread, where the grey italic (black normal) ones indicate that the former (latter) are related to total (ensemble) spread including (not including) observation error]

Forecast distributions of EPS (such as MED, EM, Q75, Q90 and MAX) can be directly used to generate deterministic guidance. Therefore, the performance of these products for different types of rainfall was assessed here by comparing the 50 km length FSS for the 1 h accumulated rainfall between various forecast distributions (Figure 7). The comparison of FSS with other neighbourhood lengths is not shown here because the conclusions are generally similar. FSS ranges from zero to one, with higher values corresponding to higher skill. In this study, rainfall with precipitation rates greater than 0.1, 10, 20 and 40 mm/h is defined as light, moderate, heavy and extremely heavy rainfall, respectively.
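For readers unfamiliar with the FSS, a minimal sketch of the standard fractions skill score follows. It assumes a square neighbourhood specified as a half-width in grid points and zero padding at the domain edges; the conversion of the 50 km neighbourhood length into grid points, and the function names, are assumptions rather than details given in the text.

```python
import numpy as np

def neighbourhood_fraction(binary, half_width):
    """Fraction of points exceeding the threshold in a square window."""
    n = 2 * half_width + 1
    padded = np.pad(binary.astype(float), half_width, mode="constant")
    # Integral image with a leading row and column of zeros.
    integral = np.pad(padded, ((1, 0), (1, 0)), mode="constant")
    integral = integral.cumsum(0).cumsum(1)
    window_sum = (integral[n:, n:] - integral[:-n, n:]
                  - integral[n:, :-n] + integral[:-n, :-n])
    return window_sum / n ** 2

def fss(forecast, observed, threshold, half_width):
    """Fractions skill score for rainfall greater than `threshold`."""
    pf = neighbourhood_fraction(np.asarray(forecast) > threshold, half_width)
    po = neighbourhood_fraction(np.asarray(observed) > threshold, half_width)
    denom = np.sum(pf ** 2) + np.sum(po ** 2)
    if denom == 0:
        return np.nan  # no event in either field: score undefined
    return 1.0 - np.sum((pf - po) ** 2) / denom
```

With a one-point displacement between forecast and observation, the score is 0 at grid scale and rises as the neighbourhood widens, which is why the FSS tolerates small location errors.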
Overall, QPF skill increased with the percentile of forecast distribution, and MED showed the poorest performance. EM, Q90 and MAX showed the best performance for light rainfall (Figure 7a,b). For moderate and heavy rainfall, the highest FSS was produced by Q90, followed by Q75 and MAX, especially for forecasts beyond 6 h (Figure 7b). Compared with other forecast distributions of EPS, both Q90 and MAX showed significant advantage in predicting extremely heavy rainfall, especially the former. Actually, it was often very difficult for forecasts with a smaller percentile of forecast distribution to produce heavier rainfall. In particular, the magnitude of extremely heavy rainfall was overestimated (underestimated) for forecasts above (below) ensemble 90th percentiles (not shown). Thus, among the products from forecast distributions, Q90 performed the best for almost all types of rainfall. Compared with DET, Q90 was significantly more skilful in predicting light rainfall, and was slightly better or generally comparable in predicting heavier rainfall (Figure 7c,d). As shown in Figure 8a, there was underestimation (overestimation) for the area of light rainfall in DET (Q90). Thus, the missing rate of DET in predicting light rainfall can be corrected by using Q90, which partially contributed to higher FSS in Q90 than in DET. The areal coverages of moderate and heavy rainfall were overestimated more evidently for Q90 than for DET (Figure 8b,c), which explains the insignificant improvement in FSS of Q90 over DET. Q90 also yielded less but more accurate areal coverage of extremely heavy rainfall than DET, especially during the later experimental period (Figure 8d).

[Figure caption fragment: (c) identifies the observed and predicted fractional grid coverage shown in these panels. The cyan, black, green, blue and red numbers represent the ratios of predicted fractional grid coverage from DET, Q90, PM, NPM and OPT to observed fractional grid coverage respectively]
Aimed at improving the rainfall estimation from the EM, the PM blends the spatial pattern of the EM with the frequency distribution of the whole EPS (Ebert, 2001). The PM field at a given time is produced in three steps. First, the frequency distribution is formed by pooling the rainfall amounts from all 16 PERs at each grid point in the verification domain, ranking them from largest to smallest and keeping every 16th value. Second, the EM rainfall amounts in the verification domain are also ranked from largest to smallest. Third, the grid point corresponding to the largest EM rainfall amount is assigned the largest value in the frequency distribution, and so on. Compared with DET, PM yielded significant degradation in predicting light rainfall throughout the forecast period (Figure 7c,d), which mainly resulted from more serious underestimation of areal coverage in PM than in DET (Figure 8a). For moderate and heavy rainfall, PM significantly outperformed DET for forecasts beyond 6 h (Figure 7d); in particular, there was smaller overestimation of areal coverage in PM than in DET (Figure 8b,c). Because there were only a few observations or forecast samples for extremely heavy rainfall, the forecast differences between PM and DET were not very significant. For areal coverage of extremely heavy rainfall, PM showed evident underestimation, but the bias was much smaller than that of DET (Figure 8d).
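The three-step PM construction described above can be sketched as follows. This is an illustration rather than the GM-CPEPS implementation; in particular, which value of each group of 16 is kept (here the last one) is an assumption, and the function name is hypothetical.

```python
import numpy as np

def probability_matched_mean(members):
    """Probability-matched mean of an ensemble of 2D rainfall fields.

    members: array of shape (n_members, ny, nx)
    """
    members = np.asarray(members, dtype=float)
    n_members = members.shape[0]
    grid_shape = members.shape[1:]
    # Step 1: pool all member values, rank from largest to smallest and
    # keep every n_members-th value (starting offset is an assumption).
    pooled = np.sort(members.ravel())[::-1]
    distribution = pooled[n_members - 1::n_members]
    # Step 2: rank the ensemble-mean values from largest to smallest.
    ens_mean = members.mean(axis=0).ravel()
    order = np.argsort(ens_mean)[::-1]
    # Step 3: the point with the largest EM value receives the largest
    # value of the pooled distribution, and so on down the ranking.
    pm = np.empty_like(ens_mean)
    pm[order] = distribution
    return pm.reshape(grid_shape)
```

The result keeps the spatial pattern of the EM while restoring amplitudes closer to those of the individual members.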
In summary, PM (Q90) was significantly superior to DET in the forecasts of moderate and heavy (light and extremely heavy) rainfall. This suggested that combining PM with Q90 may produce new PM forecasts (hereinafter NPM) that perform better than either individually. The superiority of Q90 over PM was mainly for light and extremely heavy rainfall; therefore, more weight is given to Q90 than to PM in constructing NPM for these types of rainfall. Note that, although bias was evidently present in Q90 for most of the rainfall thresholds, the characteristics of the bias differed among thresholds (Figure 8). Actually, the high bias of Q90 was chiefly present in predicting rainfall of moderate amount (e.g. above 5 mm and below 35 mm), but was not evident in predicting rainfall of smaller (e.g. below 5 mm) or larger (e.g. above 35 mm) amount (not shown). In order to avoid serious bias caused by Q90, NPM should be constructed based on rainfall of different amounts. Specifically, NPM is generated by a linear combination of PM and Q90, each multiplied by its own weighting factor.
The weighting factor for PM (Q90) is set to 0.45 (0.55) when the predicted rainfall amount from PM (Q90) is below 5 mm or above 35 mm, and to 0.65 (0.35) when the predicted rainfall amount from PM (Q90) is above 5 mm and below 35 mm. According to the definition of NPM, there may be unphysical discontinuities in the NPM field where the corresponding fields in PM or Q90 are around 5 or 35 mm and characterized by small spatial variations. However, this discontinuity is generally not evident here, probably due to the evident spatial variations of rainfall in both PM and Q90 in the experimental period. Further work is required to fix this discontinuity in order to improve the general applicability of NPM.
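A minimal sketch of this amount-dependent blending is given below. It follows a literal reading of the rule, in which each field's weight depends on that field's own rainfall amount; the function name, the array-based interface, and the treatment of amounts exactly equal to 5 or 35 mm (assigned to the outer class here) are assumptions.

```python
import numpy as np

def npm_blend(pm, q90, low=5.0, high=35.0):
    """Blend PM and Q90 fields with amount-dependent weights."""
    pm = np.asarray(pm, dtype=float)
    q90 = np.asarray(q90, dtype=float)
    # Moderate amounts: strictly between the two bounds.
    pm_moderate = (pm > low) & (pm < high)
    q90_moderate = (q90 > low) & (q90 < high)
    # PM weight: 0.65 for moderate amounts, 0.45 otherwise.
    w_pm = np.where(pm_moderate, 0.65, 0.45)
    # Q90 weight: 0.35 for moderate amounts, 0.55 otherwise.
    w_q90 = np.where(q90_moderate, 0.35, 0.55)
    return w_pm * pm + w_q90 * q90
```

Under this literal reading, the two weights sum to one only when PM and Q90 fall in the same amount class at a grid point; how the mixed case is handled is not specified in the text.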
Generally, the NPM forecasts were better than or comparable to the PM and Q90 ones, confirming the effectiveness of the NPM technique. In particular, the inferiority of PM (Q90) compared with DET in forecasting light (moderate) rainfall was largely eliminated (Figure 7c,d). Moreover, the serious overestimation of the area of moderate and heavy rainfall in Q90 was greatly reduced in NPM (Figure 8b,c), while the noticeable area underestimation of extremely heavy rainfall in PM was considerably corrected (Figure 8d). Overall, NPM outperformed DET in forecasting nearly all types of rainfall. The most significant improvements of NPM over DET were found in the entire 12 h forecasts of light rainfall and the 7-12 h forecasts of moderate and heavy rainfall. The evident overestimation of the area of extremely heavy rainfall by DET in the later experimental period was also well corrected in NPM (Figure 8d).
Note that NPM can be considered as a calibration method based on ad hoc tuning, since it can effectively calibrate the biases of both PM and Q90 for some thresholds of rainfall. The construction of NPM is highly dependent on the performance characteristics (e.g. bias and FSS) of its two components, i.e. PM and Q90. This suggests that the construction or tuning strategy of NPM will probably differ in another experimental period in which the performance characteristics of PM and Q90 differ from those in the current experimental period, i.e. SCMREX. As a calibration method, NPM should in principle be evaluated over a period that is independent of the one used to construct it. Therefore, the performance of NPM reported in this study represents an upper bound on what can be expected from using NPM.
Recently, a technique known as "best member" was proposed to cope with issues of physically unrealistic EM during model integration caused by nonlinearity at both synoptic and convective scales (Ancell, 2013; Hollan and Ancell, 2015). Ancell (2013) defined the following two best-member techniques: one that uses the single member closest to the EM over the whole forecast, and another that uses a forecast patched together from the members closest to the EM at each forecast time. The latter method was found to produce better forecasts of sea-level pressure and was also used by Schwartz et al. (2014) to forecast convective precipitation. However, the latter method did not yield significant improvements relative to the PM and sometimes produced unrealistic convective evolution. Z18 proposed an "optimal-member" method in which the best member was selected as the one closest to both the EM and the latest observations. This optimal-member method was used to predict precipitation related to tropical cyclones and was significantly superior to the PM for light rainfall. Therefore, there is still no widely accepted best-member technique for CP precipitation forecasting.
In this study, the optimal or best member (hereinafter OPT) was selected based on its closeness to the NPM field, since the NPM forecasts generally behaved better than the PM, DET and Q90 ones. Following Schwartz et al. (2014), the FSS with a 50 km neighbourhood length was calculated over the verification domain for precipitation rates greater than 0.1 and 20 mm/h for each PER, using the corresponding NPM field as the truth. Following Z18, OPT was determined by maximizing the cost function in Equation (4), where FSS_i^j(20) and FSS_i^j(0.1) denote the FSS, calculated in the verification domain, for member i at lead time j for precipitation rates above 20 and 0.1 mm/h, respectively. Here, only those forecasts beyond 3 h were measured for heavy rainfall because NPM was less skilful than DET during the first 3 h of forecasts (not shown). The member with the highest cost function, i.e. the one closest to NPM, was selected as OPT. Similar to Z18, if more than one member had the maximum cost function, the OPT forecasts were produced by averaging over these members.
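The member selection can be sketched as follows. Because the exact form of the cost function is not reproduced in the text, the sketch assumes an unweighted sum of the light-rain FSS over all lead times and the heavy-rain FSS over lead times beyond 3 h; this form, the function name and the array layout are all assumptions.

```python
import numpy as np

def select_optimal_members(fss_light, fss_heavy, lead_hours, min_heavy_lead=3):
    """Return (cost per member, indices of members with maximum cost).

    fss_light, fss_heavy: arrays of shape (n_members, n_leads) holding the
    FSS of each member against the NPM field for the 0.1 and 20 mm/h
    thresholds, respectively.
    """
    fss_light = np.asarray(fss_light, dtype=float)
    fss_heavy = np.asarray(fss_heavy, dtype=float)
    lead_hours = np.asarray(lead_hours)
    # Heavy rainfall is only scored beyond the first `min_heavy_lead` hours.
    heavy_mask = lead_hours > min_heavy_lead
    # Assumed cost: unweighted sum of the two FSS contributions.
    cost = fss_light.sum(axis=1) + fss_heavy[:, heavy_mask].sum(axis=1)
    # Several members may share the maximum; all of them are returned so
    # that their forecasts can be averaged.
    best = np.flatnonzero(np.isclose(cost, cost.max()))
    return cost, best
```

When `best` contains more than one index, the OPT field would be the average of the corresponding member fields.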
Although OPT produced higher or comparable FSS for forecasts of light rainfall compared with PM and DET, it performed worse than Q90 and NPM (Figure 7c,d). OPT showed no significant superiority over the other forecasts in predicting rainfall with larger amounts, although it seemed to perform best in the 7-12 h forecasts of extremely heavy rainfall. The conclusion that OPT did not significantly outperform DET in predicting any type of rainfall except light rainfall, as well as the reason behind the variation of OPT's advantages over PM in the present study, were both similar to those in Z18. However, the bias of OPT was clearly smaller than that of DET in predicting all types of rainfall (Figure 8). On average, OPT yielded the smallest bias among all the deterministic guidance.

Probabilistic guidance
It is now becoming increasingly accepted that a probabilistic forecasting approach is necessary for CP NWP, considering the chaotic behaviour and large uncertainties of high-impact weather at convective scales (Stensrud et al., 2009; Snook et al., 2012). The original probabilistic fields (hereinafter PRO) discussed in this study were simply the number of ensemble members exceeding a given threshold divided by the ensemble size, namely 16. The optimal-probability technique used in Z18 was also investigated here. Specifically, only the ensemble members whose cost function in Equation (4) accounted for 5% or more of the total cost function for all the PERs were included in the calculation of the optimal probability (OPTP).
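The two probability definitions can be sketched as follows. The per-member cost values are taken as given inputs, and the function names and array layout are hypothetical.

```python
import numpy as np

def exceedance_probability(members, threshold, keep=None):
    """Fraction of (optionally selected) members exceeding `threshold`.

    members: array of shape (n_members, ...); keep: boolean mask or index
    array selecting the members used (all members if None).
    """
    members = np.asarray(members, dtype=float)
    if keep is not None:
        members = members[np.asarray(keep)]
    return (members > threshold).mean(axis=0)

def optp_member_mask(cost, min_share=0.05):
    """Mask of members whose cost is at least `min_share` of the total."""
    cost = np.asarray(cost, dtype=float)
    return cost / cost.sum() >= min_share
```

With 16 members of similar cost, each holds about 6% of the total, so the 5% criterion excludes a member only when its cost is clearly below average.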
Recently, "neighbourhood approaches" were used to post-process high-resolution ensemble output to produce the probabilistic guidance (Theis et al., 2005). Introduced by Schwartz et al. (2010), the "neighbourhood ensemble probability" (NEP) applied a neighbourhood approach to generate grid-point probabilities. Here, NEP fields with a neighbourhood length of 50 km were also verified.
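A minimal sketch of the NEP calculation follows: the grid-point ensemble probability is averaged over a neighbourhood around each point. A square window given as a half-width in grid points is assumed for simplicity, with points outside the domain ignored; the neighbourhood shape, the edge treatment and the function names are assumptions.

```python
import numpy as np

def box_mean(field, half_width):
    """Mean over a square window, using only points inside the domain."""
    n = 2 * half_width + 1

    def window_sum(a):
        padded = np.pad(a, half_width, mode="constant")
        integral = np.pad(padded, ((1, 0), (1, 0)), mode="constant")
        integral = integral.cumsum(0).cumsum(1)
        return (integral[n:, n:] - integral[:-n, n:]
                - integral[n:, :-n] + integral[:-n, :-n])

    field = np.asarray(field, dtype=float)
    # Dividing by the count of in-domain points avoids an edge bias.
    return window_sum(field) / window_sum(np.ones_like(field))

def neighbourhood_ensemble_probability(members, threshold, half_width):
    """Neighbourhood average of the grid-point ensemble probability.

    members: array of shape (n_members, ny, nx)
    """
    point_prob = (np.asarray(members, dtype=float) > threshold).mean(axis=0)
    return box_mean(point_prob, half_width)
```

The averaging spreads non-zero probability to nearby points while lowering the peak values, which is the smoothing effect referred to when NEP is compared with grid-point probabilities.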
The skills for various probabilistic guidance (PRO, OPTP and NEP) were assessed and compared with those for the binary rainfall fields of DET. Several skill scores for the forecasts during SCMREX were calculated here (Figure 9). Considering that the comparison of these scores among different forecasts was generally similar at different lead times, only the scores averaged over the whole 12 h forecasts are shown in Figure 9. BS measures the accuracy of EPS for predicting probability. Reliability is one of the three components of BS (Candille and Talagrand, 2005) and measures the agreement between predicted probability and mean observed frequency. AROC measures the ability of a forecast to discriminate between two alternative outcomes and thus measures the resolution, which is another component of BS. A better (smaller) BS is associated with better (smaller) reliability and better (larger) resolution.
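The relation between BS, reliability and resolution can be made concrete with the standard decomposition BS = reliability - resolution + uncertainty. The sketch below bins forecasts by their distinct probability values, which makes the decomposition exact for a finite ensemble; it illustrates the definitions only and is not the verification code used in the study.

```python
import numpy as np

def brier_decomposition(prob, obs):
    """Return (BS, reliability, resolution, uncertainty).

    prob: forecast probabilities; obs: 1 where the event was observed, else 0.
    """
    prob = np.asarray(prob, dtype=float).ravel()
    obs = np.asarray(obs, dtype=float).ravel()
    bs = np.mean((prob - obs) ** 2)
    base_rate = obs.mean()
    reliability = 0.0
    resolution = 0.0
    # One bin per distinct forecast probability (e.g. k/16 for 16 members).
    for p in np.unique(prob):
        in_bin = prob == p
        obs_freq = obs[in_bin].mean()
        weight = in_bin.mean()
        reliability += weight * (p - obs_freq) ** 2
        resolution += weight * (obs_freq - base_rate) ** 2
    uncertainty = base_rate * (1.0 - base_rate)
    return bs, reliability, resolution, uncertainty
```

A deterministic forecast such as DET enters the same calculation with probabilities restricted to 0 and 1.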
Compared with DET, all the probabilistic guidance produced significantly smaller BS for all thresholds (Figure 9a), indicating that the probabilistic guidance was more accurate. For light rainfall, both OPTP and NEP outperformed PRO, with smaller BS for OPTP than for NEP. In terms of moderate rainfall, BS for NEP was significantly smaller than that for the other probabilistic guidance, while OPTP did not show significant superiority over PRO (Figure 9a). Regarding heavy and extremely heavy rainfall, there was no significant difference in accuracy among the probabilistic guidance.
The results of comparing reliability between DET and the probabilistic guidance were similar to those for BS (Figure 9b), except that the improvement in reliability of the probabilistic guidance over DET was remarkably larger than that in accuracy (Figure 9a). In short, compared with PRO, OPTP was significantly advantageous only for light rainfall, whereas NEP was more reliable for almost all thresholds except for heavy and extremely heavy rainfall (Figure 9b). For DET, some areal biases were found across all lead times and thresholds (figure 5c,d in Z16; Figure 8 herein). Meanwhile, for the probabilistic guidance, underforecasting (overforecasting) for light (heavy) rainfall is present (not shown). As a result, the poorer reliabilities for both DET and PERs were largely attributed to their higher biases.
For AROC, the superiority of probabilistic guidance over DET was not as significant as that for reliability. Specifically, improvements of the probabilistic guidance in AROC over DET were statistically significant for moderate and heavy rainfall (Figure 9c). There were no significant differences in discrimination between PRO and OPTP for all types of rainfall. Overall, evident improvements (degradations) of NEP over PRO in discrimination were found in the forecasts of moderate (heavy) rainfall.
OPTP seemed to be more accurate than PRO in forecasting lighter rainfall, mainly due to the more reliable performance in the former. Additionally, the superiority of OPTP in discrimination was not significant, which resulted from the optimal-probability technique excluding fewer members in general. Since most of the PERs showed similar cost functions in Equation (4), especially the last few ones, the principle used to select PERs in OPTP must be relaxed to include enough ensemble members. NEP significantly outperformed PRO in accuracy, reliability and discrimination for rainfall with smaller thresholds, which was consistent with the conclusion of Schwartz and Sobash (2017) that NEP typically does not possess good reliability or resolution for rare events because of sharpness loss.

[Figure caption fragment: Grey crosses indicate the thresholds for which the significance level of the BS, reliability and AROC differences between DET and other forecasts is larger than 90%, while green pluses indicate the thresholds for which the significance level of the BS, reliability and AROC differences between PRO and OPTP or NEP is larger than 90%. Considering large differences in BS (reliability) between rainfall with different thresholds, BS (reliability) for 1 h accumulated precipitation with thresholds of 10, 20 and 40 mm was inflated by a factor of 2.5, 5 and 10, respectively]

RESULTS OF CASE-STUDIES
To further illustrate the performance of both deterministic and probabilistic guidance of GM-CPEPS, as well as to improve our understanding of the underlying physical causes of the performance, this section presents both forecasts initiated at 0000 UTC 8 May and 1200 UTC 10 May.

Deterministic guidance
From 0900 to 1200 UTC 8 May, a local MCS, which formed in southwestern Guangdong (GD) Province, moved eastward and merged with some convective clusters on the southwest coast of GD, resulting in a quasi-linear MCS. This MCS caused severe rainfall (hereafter R1) at 1200 UTC, with 1 h accumulated rainfall over 60 mm (Figure 10a). DET yielded evident area overestimations in Guangxi (GX) Province throughout the forecast as well as an obvious westward displacement of R1 (Figure 10b). Although Q90 obviously overestimated the area of light rainfall, it did predict R1 well with a similar location to the observation (Figure 10c). Compared with Q90, PM evidently reduced the area overestimation of light rainfall (Figure 10d) but at the expense of more underestimations, which caused the lower FSS (Figure 7d). Additionally, PM predicted R1 more accurately than DET. Compared with both Q90 and PM, NPM showed better performance for R1, not only in location but also in size, despite still underestimating the intensity (Figure 10e). Generally, OPT performed the best for this local MCS (Figure 10f). In particular, OPT captured the observed characteristics of R1 well at 1200 UTC, with the predicted locations of 1 h accumulated rainfall exceeding 60 mm close to the observed ones (Figure 10a). OPT was actually member 14, which was one of the 16 PERs. Thus, the excessive smoothing of averaging, which was illustrated in Z18 and occurs when there are too many members with the same maximum cost function, was absent in the present case.

During the period from 1800 to 2100 UTC 10 May, a linear-shaped MCS formed in GX and moved southeastward, accompanied by a severe rain band R2 (Figure 10g). DET exhibited poor performance for this case, namely spurious precipitation at the border between GX and GD, and a northward displacement error for R2 (Figure 10h). Q90 reduced the intensity of the spurious rainfall at the border between GX and GD in DET, but it also yielded evident area overestimations over western GD (Figure 10i). R2 was captured by Q90 despite the displacement error and intensity underestimation. PM underestimated the intensity of R2 seriously (Figure 10j). Overall, Q90 outperformed PM in forecasting R2 but produced more spurious rainfall, and the performance of NPM fell between those of Q90 and PM (Figure 10k). Although OPT indeed showed some superiority over DET in predicting R2, especially for location, it still fell behind Q90 in predicting R2 intensity (Figure 10i,l). For this large-scale MCS, both Q90 and NPM intuitively generated better guidance than DET in the forecasts of heavy rainfall. The evident displacement error for larger-scale MCSs present in single forecasts (i.e. DET and OPT) was partially corrected by including all the outcomes of PERs, which also inevitably resulted in false alarms.

Probabilistic guidance
For the forecasts initialized at 0000 UTC 8 May, no PER was excluded in calculating OPTP, according to the principle used in the optimal-probability technique (Figure 11a). For R1 at 1200 UTC, whose intensity was seriously underestimated in DET (Figure 10b), PRO highlighted the occurrence of moderate (extremely heavy) rainfall with predicted probabilities of roughly 20% (5%) (Figure 11a), indicating the ability of GM-CPEPS to capture potential threats of severe weather. In addition, the predicted probabilities of PRO were below 20% in GX (Figure 11a), these being obviously smaller than those where R1 was observed. Therefore, the risk of excessive warning due to DET (Figure 10b) can be diminished to some extent with the help of GM-CPEPS. Obviously, the areas with non-zero probability in NEP were larger than those in PRO, well covering the areas where moderate rainfall of R1 was observed (Figure 11b). This aspect was absent in PRO (Figure 11a), indicating the better capacity for detection in NEP than in PRO. However, the probability of NEP was generally smaller than that of PRO, resulting directly from smoothing in the neighbourhood technique. Thus, severe weather with higher threshold (e.g. extremely heavy rainfall) cannot be identified well by NEP (Figure 11b).
In the second case-study, 12 members of the 16 PERs were used to calculate OPTP. Both PRO and OPTP identified the potential threat of R2, with the probability of moderate rainfall reaching 10-40% (Figure 11c) and 20-50% (Figure 11d), respectively. Additionally, partial maxima for extremely heavy rainfall were also captured by PRO and OPTP. Obviously, the probabilistic guidance provided us with additional information about potential severe weather, which was absent in DET. Similar to Z18, OPTP in this study also enhanced the probability of detection effectively but produced misleading guidance in the case of spurious rainfall. The absences of R2 in some areas in both PRO and OPTP were corrected evidently by NEP (Figure 11e). NEP also exhibited the advantage over PRO and OPTP in reducing the probability of spurious rainfall.

FIGURE 10 One-hour accumulated rainfall (unit: mm) of AWS observation (a, g), DET (b, h) and various forecasts from deterministic guidance (c-f, i-l) valid at different lead times. R1 and R2 indicate the maximum observed 1 h accumulated rainfall valid at 1200 UTC 8 May and 2100 UTC 10 May respectively, and their outlines are contoured at the 10 mm threshold in orange lines. The regions of (a)-(f) are the same as the thick dashed range shown in Figure 1b, while the rectangles of (g)-(l) are the same as the thin dashed rectangle shown in Figure 1b

FIGURE 11 Probability (%) of the 1 h accumulated rainfall exceeding 10 mm (shaded) and exceeding 40 mm (white contour for values greater than 5) for various forecasts from probabilistic guidance valid at different lead times. The outlines of the maximum observed 1 h accumulated rainfall marked as R1 and R2 are contoured at the 10 mm threshold in cyan lines. The regions of (a)-(b) are the same as the thick dashed range shown in Figure 1b, while the rectangles of (c)-(e) are the same as the thin dashed rectangle shown in Figure 1b

CONCLUSIONS AND DISCUSSION
Poor CP-scale predictability of heavy rainfall over southern China in the pre-summer rainy season has severely limited QPF skill, thereby strongly calling for the development of a CP EPS. As part of this effort, an experimental GRAPES-based CP EPS, namely GM-CPEPS, has been established recently to provide forecasters with high-resolution deterministic and probabilistic products that are extremely scarce in current operational use.
GM-CPEPS was described fully herein, and its performance for QPF during SCMREX was also examined. GM-CPEPS covers southern China and the northern SCS with a horizontal resolution of 0.03°. It consists of one unperturbed and 16 perturbed members, and it issues 12 h forecasts twice a day at 0000/1200 UTC. Several approaches are combined to generate perturbations of IC, LBCs and model physics, aiming to exploit the different benefits of various perturbation methods. Perturbations from downscaling, EDA, a time-lagged scheme and topography are combined with different weights to generate IC perturbations. Sea-surface temperature is also perturbed. LBC perturbations are generated by blending downscaling perturbations with balanced random perturbations. Model physics perturbations are generated based on the combination of MP, SPPT and PP schemes.
The sensitivities of GM-CPEPS performance to the different perturbation methods were tested in two cases, to validate the rationale of GM-CPEPS design. The current construction of GM-CPEPS, which is based on simply incorporating perturbations from all uncertainty sources using some leading perturbation methods, was proved to be effective in providing the best deterministic and probabilistic QPFs as well as ensemble dispersion for QPFs. Moreover, all the perturbation methods contributed to improved QPF performance but in different ways. Initial downscaling perturbations made the largest contributions followed by MP, whereas perturbations from LBCs made the smallest contributions.
The half-month performance of GM-CPEPS in the period of 8-23 May 2014 was evaluated by implementing objective verifications for both deterministic and probabilistic guidance. Two heavy rainfall cases, which were initialized at 0000 UTC 8 May and 1200 UTC 10 May, respectively, were used to investigate the GM-CPEPS performance for rainfall forecasting more intuitively and to illustrate the factors affecting its performance. The assessment was focused on the advantages of GM-CPEPS over DET and on comparisons among various post-process techniques.
Verification of the EM for both vertical and surface variables revealed the advantage of EM over DET in deterministic skill as well as the comparable performance in estimating forecast uncertainty compared with some existing operational or experimental regional EPSs. However, there was still underdispersion for all the surface variables, which requires further improvements in designing the perturbations.
For precipitation, various forecast distributions of GM-CPEPS with different percentiles were verified and compared with EM and DET. Generally, Q90 (i.e. the 90th percentile of the forecast distribution) showed the highest deterministic skill, with significant improvements in light (moderate and heavy) rainfall over DET (EM). Although PM underperformed DET for light rainfall, it improved QPFs over DET for both moderate and heavy rainfall. NPM, which was a linear combination of Q90 and PM based on their respective superiority in QPFs, was confirmed to be effective, showing better or comparable performance compared with Q90 and PM. NPM was superior to DET for almost all types of rainfall, with the most significant improvements in the forecasts of light rainfall at all lead times and the forecasts of moderate and heavy rainfall beyond the lead time of 6 h. Note that the parameters used in constructing NPM, e.g. the rainfall thresholds used to classify rainfall and the weighting factors for Q90 and PM, were currently selected without tuning. Thus, the performance of NPM could probably be improved by tuning these parameters, which will be explored in following work. An optimal-member technique, namely selecting the member closest to the NPM fields, was proposed here to improve the deterministic guidance for rainfall. The superiority of this technique over DET was evident in reducing the area overestimation of heavier rainfall.
Both the strengths and weaknesses of the various deterministic guidance products were demonstrated intuitively in the case-studies. GM-CPEPS showed evident advantages over DET in predicting precipitation related to local or large-scale MCSs. PM was inferior in forecasting light rainfall because the areal coverage of this type of rainfall was underestimated. Q90 was apt to show spurious rainfall that caused serious area overestimation for most types of rainfall, whereas NPM could effectively reduce the area underestimation (overestimation) of PM (Q90) for light and extremely heavy rainfall (moderate and heavy rainfall). The superiority over DET for the forecasts of heavy rainfall seemed to differ between NPM and OPT, especially for different MCS cases. Thus, exploring a new post-process technique that includes the advantages of both NPM and OPT merits further investigation. It cannot be ignored that both DET and PERs showed some biases in the forecasts, which caused GM-CPEPS's unsatisfactory performance for reliability. Although NPM can reduce biases for some types of rainfall, it also added some biases compared with PM, especially for moderate and heavy rainfall. Thus, calibration based on some traditional methods (Hopson and Webster, 2010; Zhu and Luo, 2015) is required in following work to address this issue. Using calibrated NPM is expected to improve the performance of OPT and will also be investigated.
From the perspective of probability, the original probabilistic guidance performed significantly better than DET in not only accuracy but also reliability for all types of rainfall and in discrimination for moderate and heavy rainfall. Besides this guidance including all the 16 PERs, the probabilistic guidance based on the optimal-probability technique proposed in Z18 was also verified herein. Although this technique improved the probability of detection effectively, it caused misleading guidance in the case of spurious rainfall. However, this technique did not bring too much improvement except for light rainfall because there was too little variation between most PERs in the cost function used to determine the optimal member. The neighbourhood probability (i.e. NEP) with a neighbourhood length of 50 km was found to be effective in improving the accuracy, reliability and discrimination of probabilistic guidance for rainfall with smaller threshold. Because NEP includes additional information about neighbourhood locations, it can reduce the missing rates in the forecasts. Owing to the effect of smoothing caused by averaging the probability over neighbourhood grids, NEP can also reduce the probability of spurious rainfall and thereby improve the reliability. However, the averaging also causes sharpness loss, which was responsible for the limited superiority of NEP in heavier rainfall.
The case-studies demonstrated the significant advantage of GM-CPEPS probabilistic guidance over DET in detecting potential threats of severe weather and lessening excessive warning. The benefits of NEP were more remarkable for reliability than for discrimination. It was noteworthy that the post-process techniques investigated in this study, i.e. OPTP and NEP, did lead to some improvements in the forecasts of cases with localized or large-scale forcing, but these improvements were still limited. Consequently, it is necessary to develop some new post-process techniques to further improve the probabilistic guidance for heavy rainfall forecasting. Even so, the discrimination, especially the probability of detection, may not necessarily be improved through post-process techniques.
Developing perturbation methods for IC, surface and model physics that are more appropriate than those used in designing the current version of GM-CPEPS is probably the most effective way to improve the discrimination and will be explored in future work. In particular, based on the results of recent sensitivity tests in case-studies, TS perturbations can potentially be improved by implementing random perturbations with a spatial scale, and modifying the perturbations in the convectively unstable regions with convergent wind in SPPT also shows some positive impacts on the forecasts of heavy rainfall.
Although both deterministic and probabilistic guidance of GM-CPEPS showed some advantages over DET in QPF during SCMREX, the corresponding verification was only conducted over a 15-day period, which is still not long enough to validate the generality of these results in the pre-summer rainy season. Consequently, more experiments covering more cases are needed to draw general conclusions about the performance of GM-CPEPS in QPF during the pre-summer rainy season, which is necessary for promoting the application of GM-CPEPS in operations.