A GRAPES‐based mesoscale ensemble prediction system for tropical cyclone forecasting: configuration and performance

To improve short-range tropical cyclone (TC) forecasting in China from both deterministic and probabilistic standpoints, a mesoscale ensemble prediction system (TREPS) based on the Global/Regional Assimilation and Prediction System (GRAPES) has been constructed as a planned operational system. TREPS comprises one unperturbed and 30 perturbed forecasts out to 60 h at 0.09° horizontal resolution. Downscaled perturbations from the ensemble prediction system (ENS) of the European Centre for Medium-Range Weather Forecasts (ECMWF) are used both to generate initial perturbations, by blending them with balanced random perturbations applied around the TC, and to generate perturbations of the lateral boundary conditions. The surface temperature around the TC is also perturbed, and a combination of multi-physics and stochastically perturbed parametrization tendencies is used to represent model physics uncertainties. These perturbation methods complement one another across the full domain, multiple scales and multiple variables in increasing the forecast perturbations.


Introduction
Improvements in numerical weather prediction (NWP) models, observing systems and data assimilation techniques have significantly improved the forecasting of tropical cyclone (TC) tracks over the last two decades (Rappaport et al., 2009; Goldenberg et al., 2015; Gopalakrishnan et al., 2016). However, the forecasting of TC intensity has improved only slowly and remains a challenge (Yu et al., 2013; DeMaria et al., 2014; Cangialosi and Franklin, 2016). Because the hazards associated with TCs, such as strong wind, heavy rainfall and storm surge, are usually related to TC intensity (Nishijima et al., 2012; Chen et al., 2013), improving forecasts of TC intensity and of the associated wind and rainfall is of particular importance and urgency.
In recent years, much research has focused on developing fine-resolution operational NWP models, new techniques for assimilating inner-core observations, and optimized ensemble forecasts of TC intensity (Rogers et al., 2013). Because TC intensity is primarily determined by small-scale dynamics and moist processes, increased horizontal resolution has been shown to improve both intensity prediction and the realism of simulated TC structure (Davis et al., 2010; Gopalakrishnan et al., 2012; Goldenberg et al., 2015).
The chaotic nature of TCs limits their intrinsic predictability and, therefore, the accuracy with which they can be forecast. In addition, deficiencies in current data assimilation methods and observing systems, together with imperfections in NWP model dynamics and physics, lead to inaccurate initial conditions (ICs) and imperfect representations of TC evolution in NWP models. These factors underlie the inevitable uncertainties in TC forecasting (Lorenz, 1963; Tribbia and Baumhefner, 1988) and limit the information that a single (deterministic) forecast can provide (Wilks, 2006). Ensemble forecasting is an important and useful approach for coping with such forecast uncertainties (Epstein, 1969; Leith, 1974; Wilks, 2006). The mean of a properly designed ensemble prediction system (EPS) is often superior to the deterministic forecast from any individual member of the EPS (Leith, 1974; Toth and Kalnay, 1997); furthermore, an EPS can provide useful quantitative information (e.g. ensemble spread and probabilities) for estimating forecast uncertainty (confidence) and the likelihood of different evolutions of the weather system (Murphy, 1990; Tracton and Kalnay, 1993).
To date, the application of EPSs has produced encouraging improvements in TC forecasting. Ensemble-mean track forecasts of TCs significantly outperform deterministic forecasts and forecasts from the best individual member of the EPS (Goerss et al., 2004; Rappaport et al., 2009; Yamaguchi et al., 2009). The advantage of ensemble-mean forecasts of TC intensity over those of individual members has also been demonstrated (Weber, 2005; Sampson et al., 2008; Krishnamurti et al., 2011; Yu et al., 2013), although it is less evident than for TC tracks. Given these promising benefits for TC forecasting, the development of EPSs, especially at high resolution, has become increasingly popular and important (Hamill et al., 2012; Gall et al., 2013).
TCs are among the most serious natural hazards in China, causing severe damage every year (Xu et al., 2015). However, TC forecasting in China using NWP models lags behind that of advanced operational centres such as the European Centre for Medium-Range Weather Forecasts (ECMWF) (Wang, 2016). In particular, research on ensemble forecasting of TCs in China is still in its infancy (Tan and Liang, 2012; Tu et al., 2014; Wang, 2016), and operational ensemble forecasting of TCs has not been fully implemented (Tan and Liang, 2012). The Numerical Weather Prediction Center of the China Meteorological Administration (CMA) did implement a regional EPS in 2014. This EPS, named GRAPES-REPS, is based on a regional version of the Global/Regional Assimilation and Prediction System (GRAPES_Meso: Xue and Liu, 2007; Chen et al., 2008) and primarily covers the land areas of China with a horizontal resolution of 0.15°. Nevertheless, this EPS still cannot meet the requirements of refined operational weather services for TC prediction, which necessitates the development of a high-resolution regional EPS for TCs.
In 2006, the Institute of Tropical and Marine Meteorology (ITMM) of CMA established a new tropical regional atmosphere model for the South China Sea (TRAMS) based on the GRAPES framework. During the past few years, TRAMS has produced TC forecasts comparable to those of some advanced operational global NWP models (Lei et al., 2016), especially for TC track. At present, TRAMS is one of the main operational TC forecasting models used in China (Yu et al., 2013; Chen et al., 2014). To further improve short-range forecasts of TC track and intensity, and particularly of the strong wind and heavy rainfall associated with TC landfall, a high-resolution version of TRAMS (TRAMS9KM), with horizontal resolution increased from 0.36° to 0.09°, and a corresponding mesoscale EPS (TREPS) have recently been developed. Based on its performance during its experimental phase (2014-2016) and the need for an operational weather service, ITMM plans to begin operating TREPS by the end of 2017.
To the best of our knowledge, the development of a high-resolution (∼10 km) mesoscale regional EPS for operational TC forecasting in China has not been well investigated. This study assesses the performance of TREPS in TC forecasting in China over the past 3 years, with particular attention to whether its performance is competitive with or superior to that of ECMWF, whose products are those most frequently used by forecasters in operational TC forecasting. To this end, a batch experiment covering 2014-2016 and a super-typhoon case-study are carried out. The TREPS configuration and an overview of the 19 TCs studied here are described in section 2. Section 3 describes some characteristics of the TREPS perturbations. Results of the batch experiment and the case-study are presented in sections 4 and 5, respectively. Section 6 concludes the article with further discussion.

Forecast model
The NWP model used in TREPS is TRAMS9KM, a non-hydrostatic regional model that uses a semi-implicit, semi-Lagrangian time-integration scheme. Its horizontal grid is a longitude-latitude mesh with Arakawa C-grid staggering, comprising 385 × 305 grid points at a horizontal resolution of 0.09° × 0.09°. The model domain is centred on the TC centre given in the official real-time warning information from CMA (WARNING, hereafter). The vertical coordinate is terrain-following with Charney-Phillips vertical staggering (Charney and Phillips, 1953); 55 vertical layers up to 35 km are used. The main forecast variables are zonal and meridional velocity (U, V), Exner pressure, potential temperature (θ) and specific humidity (Q). The physics parametrization schemes include the Weather Research and Forecasting (WRF) Single-Moment 6-class (WSM6) microphysics scheme (Hong and Lim, 2006), the five-layer thermal diffusion (SLAB) land surface model (Dudhia, 1996), the rapid radiative transfer model (RRTM) long-wave radiation scheme (Mlawer et al., 1997), and the Dudhia short-wave radiation scheme (Dudhia, 1989).

TREPS configurations: basic set-up
Because the objective of TREPS is to improve the skill of short-range TC forecasts during landfall, and computing resources are limited, TREPS is started only when a TC approaches mainland China and is suspended after the TC makes landfall. TREPS issues 60 h forecasts for 30 perturbed ensemble members twice per day, at 0000 and 1200 UTC. Given the remarkable performance of the ECMWF high-resolution (HRES) and ensemble (ENS: Buizza, 2014) forecasts (http://apps.ecmwf.int/archive-catalogue) in operational TC forecasting (Richardson et al., 2013; Haiden et al., 2015), both are used as the large-scale forcing for TREPS. Operationally, ECMWF forecast products are often received with a 'delay time' of 6 h owing to data communication and computational requirements (Qi et al., 2014). Thus, WARNINGs issued at both 0000/1200 and 0600/1800 UTC can be used to construct TREPS.
For a TC forming over the South China Sea (SCS), TREPS is initiated when TC genesis is reported in WARNING or when the TC minimum central pressure, Pmin (maximum sustained wind speed, Vmax), falls below 1000 hPa (rises above 10 m s⁻¹). For a TC forming over the western North Pacific (WNP), TREPS is initiated when the following two conditions are satisfied simultaneously: (i) the TC is reported to have crossed the '48 h warning line prior to landfall in China' (0°N, 105°E; 0°N, 120°E; 15°N, 132°E; 34°N, 132°E) and is predicted by CMA to make landfall on mainland China within 60 h, and (ii) the Pmin (Vmax) reported in WARNING is less (greater) than 1000 hPa (10 m s⁻¹). For condition (ii), it is sufficient for either the Pmin or the Vmax criterion to be met.
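As an illustration only, the initiation rules above can be written as a small decision function. All names in this Python sketch (`inside_48h_line`, `should_initiate_treps`, etc.) are hypothetical, not TREPS code. It also assumes that "having crossed the 48 h warning line" can be approximated by testing whether the current WARNING position lies inside the polygon obtained by closing the warning line at (34°N, 105°E), whereas the real western boundary is the coastline.

```python
# Hypothetical sketch of the TREPS initiation rules (not operational code).

# Vertices of the 48 h warning line as (lat, lon) in degrees, as listed in the text.
WARNING_LINE_48H = [(0.0, 105.0), (0.0, 120.0), (15.0, 132.0), (34.0, 132.0)]


def inside_48h_line(lat: float, lon: float) -> bool:
    """Ray-casting point-in-polygon test on a lat/lon plane.

    ASSUMPTION: the warning line is closed into a polygon by adding the
    vertex (34N, 105E); in reality the western boundary is the coastline.
    """
    poly = WARNING_LINE_48H + [(34.0, 105.0)]
    inside = False
    for k in range(len(poly)):
        lat1, lon1 = poly[k]
        lat2, lon2 = poly[(k + 1) % len(poly)]
        # Count polygon edges that straddle the point's latitude east of it.
        if (lat1 > lat) != (lat2 > lat):
            lon_cross = lon1 + (lat - lat1) * (lon2 - lon1) / (lat2 - lat1)
            if lon < lon_cross:
                inside = not inside
    return inside


def intensity_criterion(pmin_hpa: float, vmax_ms: float) -> bool:
    """Either Pmin < 1000 hPa or Vmax > 10 m/s is sufficient."""
    return pmin_hpa < 1000.0 or vmax_ms > 10.0


def should_initiate_treps(basin: str, genesis_reported: bool,
                          pmin_hpa: float, vmax_ms: float,
                          lat: float = 0.0, lon: float = 0.0,
                          landfall_within_60h: bool = False) -> bool:
    if basin == "SCS":
        return genesis_reported or intensity_criterion(pmin_hpa, vmax_ms)
    if basin == "WNP":
        return (inside_48h_line(lat, lon) and landfall_within_60h
                and intensity_criterion(pmin_hpa, vmax_ms))
    raise ValueError(f"unknown basin: {basin}")
```

For example, a WNP storm at (20°N, 125°E) with Vmax = 12 m s⁻¹ and a predicted landfall within 60 h would trigger TREPS even if its Pmin were still above 1000 hPa.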
The control (deterministic or unperturbed) forecasts of TREPS (DETER, hereafter) are cold-started, with the 0.125° × 0.125° HRES analyses and forecasts used as ICs and lateral boundary conditions (LBCs), respectively. The simplified Arakawa-Schubert (SAS) cumulus parametrization (CU) scheme (Pan and Wu, 1995; Han and Pan, 2011) and the medium-range forecast model (MRF) planetary boundary layer (PBL) scheme (Hong and Pan, 1996) are used in DETER.

TREPS configurations: perturbation generation
The uncertainties in ICs, LBCs and model physics processes are all considered in generating the perturbed ensemble members of TREPS.
Downscaling perturbations are calculated by subtracting the ensemble mean of the first 30 perturbed members of the 0.5° × 0.5° ENS forecasts from each of these perturbed members, and are then interpolated to the TRAMS9KM domain.
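The downscaling step amounts to removing the ensemble mean from each member and regridding. The following Python sketch uses hypothetical function names and assumes bilinear horizontal interpolation, which the text does not specify; the real TREPS code may interpolate differently (for example, also in the vertical).

```python
import numpy as np


def downscaling_perturbations(members: np.ndarray) -> np.ndarray:
    """Member-minus-ensemble-mean perturbations.

    members: array (n_members, ny, nx) holding the perturbed ENS members
    (the text uses the first 30) on the 0.5-degree grid.
    """
    return members - members.mean(axis=0, keepdims=True)


def regrid_linear(field, src_lat, src_lon, dst_lat, dst_lon):
    """Separable (bilinear) interpolation built from np.interp.

    ASSUMPTION: bilinear interpolation; src_lat and src_lon must be increasing.
    Target points outside the source grid take the nearest edge value.
    """
    # Interpolate each source row along longitude: shape (ny_src, nx_dst).
    along_lon = np.array([np.interp(dst_lon, src_lon, row) for row in field])
    # Interpolate each resulting column along latitude: shape (nx_dst, ny_dst).
    out = np.array([np.interp(dst_lat, src_lat, along_lon[:, j])
                    for j in range(along_lon.shape[1])])
    return out.T  # shape (ny_dst, nx_dst)
```

By construction the 30 perturbations sum to zero at every grid point, so they add spread without shifting the ensemble mean away from the unperturbed state.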
The initial perturbations are generated as a linear combination of downscaling perturbations (ICDp; weighting factor α) and balanced random perturbations (BRp; weighting factor β), which represent the large-scale and mesoscale uncertainties of the TC, respectively. These perturbations are added to the DETER ICs to construct the perturbed ICs. U, V, Exner pressure, θ and Q of the ICs are all perturbed in this manner; however, θ and Q are left unperturbed wherever the perturbed Q takes an unphysical value or exceeds a critical supersaturation value. BRp (Barker, 2005; Zhang, 2005; Schwartz et al., 2014) are produced by taking Gaussian random draws with zero mean and with covariances given by the regional background error covariances (B) of the GRAPES three-dimensional variational (3D-Var) system. The variances, horizontal correlation scales and vertical correlation function parameters of B are all tuned to represent the mesoscale uncertainties of the TC properly. ICDp are applied over the entire domain and at all vertical levels, whereas BRp are applied only around the TC centre and in the troposphere. Specifically, BRp are switched on at grid points where two conditions are satisfied simultaneously: (i) the distance between the grid point and the TC centre reported in WARNING is smaller than a length γ, and (ii) the difference in equivalent potential temperature between 1000 and 500 hPa, calculated from the unperturbed ICs as a measure of convective instability, exceeds 10 K. In addition, BRp are switched off near the model top (<50 hPa) and multiplied by a factor that equals 0 at 50 hPa and increases linearly to 1 at 100 hPa.
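The masking and blending rules for the initial perturbations can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not TREPS code: the BRp field itself (drawn from the GRAPES 3D-Var B) is taken as given, the 1000-500 hPa equivalent potential temperature difference is supplied precomputed with the sign convention of the text, and `crit_ratio` is a hypothetical stand-in for the unspecified critical supersaturation value.

```python
import numpy as np


def brp_horizontal_mask(dist_km, dtheta_e_k, gamma_km):
    """True where BRp are switched on: within gamma of the WARNING TC centre
    and where the 1000-500 hPa equivalent potential temperature difference > 10 K."""
    return (dist_km < gamma_km) & (dtheta_e_k > 10.0)


def brp_vertical_factor(p_hpa):
    """0 at and above 50 hPa, rising linearly to 1 at 100 hPa, 1 below that."""
    return np.clip((p_hpa - 50.0) / (100.0 - 50.0), 0.0, 1.0)


def blend_initial_perturbation(icdp, brp, alpha, beta, mask2d, p_hpa):
    """alpha * ICDp everywhere plus beta * BRp where the masks allow.

    icdp, brp: (nz, ny, nx); mask2d: (ny, nx) bool; p_hpa: (nz,) level pressures.
    """
    vfac = brp_vertical_factor(np.asarray(p_hpa, dtype=float))[:, None, None]
    return alpha * icdp + beta * brp * mask2d[None, :, :] * vfac


def guard_theta_q(theta, q, dtheta, dq, qsat, crit_ratio=1.0):
    """Leave theta and Q unperturbed where the perturbed Q is negative or
    exceeds crit_ratio * qsat (crit_ratio is a hypothetical threshold)."""
    q_new = q + dq
    bad = (q_new < 0.0) | (q_new > crit_ratio * qsat)
    return np.where(bad, theta, theta + dtheta), np.where(bad, q, q_new)
```

With the single-TC weights (α = 0.8, β = 0.6), a grid point inside the BRp mask in the lower troposphere receives 0.8 ICDp + 0.6 BRp, while a point outside it receives 0.8 ICDp only.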
BRp initialize mesoscale perturbations in the markedly convectively unstable region around the TC and become flow-dependent after several hours of integration under the constraints of the model's atmospheric dynamics (Zhang, 2005). In TREPS, BRp generally become coherent and organized after about 6 h of integration and reflect the large wind uncertainty where the background wind has strong horizontal gradients (not shown), thereby representing, to some extent, the mesoscale uncertainty around the TC inner core.

Downscaling perturbations (multiplied by a tuning factor ε) are added to the DETER LBCs at 6 h intervals to generate the LBCs of the perturbed forecasts (LBCDp).

The surface temperature is also perturbed to account for uncertainties in the sea-surface temperature, which is used as the surface boundary condition. Specifically, a perturbed surface temperature is produced by adding Gaussian random numbers with mean 0 and standard deviation δ to the DETER surface temperature. The surface temperature perturbations (TSp) are switched off wherever the distance between the grid point and the TC centre reported in the WARNING valid at the initial time is not smaller than γ. TSp implementation involves several further steps: (i) the perturbations are multiplied by a factor that equals 0 at distance γ and increases linearly to 1 at the radius of 17.2 m s⁻¹ winds reported in WARNING (gale_radius, hereafter); (ii) the TSp amplitude is restricted to the range [−λδ, λδ] to limit excessive perturbations; and (iii) these steps are applied only at the initial time, and the perturbations are then held fixed during model integration, providing a constant forcing from the surface boundary to the atmosphere.
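A sketch of the TSp recipe (Gaussian noise, radial ramp, amplitude clipping) and of the LBCDp update is given below. It assumes that distances from the WARNING TC centre are precomputed in km and that γ exceeds gale_radius (as with γ = 2 × gale_radius); the function names are hypothetical.

```python
import numpy as np


def surface_temperature_perturbation(dist_km, gale_radius_km, gamma_km,
                                     delta, lam, rng):
    """One realization of TSp, generated once at the initial time.

    dist_km: distance of each grid point from the WARNING TC centre.
    Assumes gamma_km > gale_radius_km.
    """
    noise = rng.normal(0.0, delta, size=np.shape(dist_km))
    # Radial factor: 1 inside gale_radius, linear decrease to 0 at gamma, 0 beyond.
    ramp = np.clip((gamma_km - dist_km) / (gamma_km - gale_radius_km), 0.0, 1.0)
    # Restrict the amplitude to [-lam * delta, lam * delta].
    return np.clip(noise * ramp, -lam * delta, lam * delta)


def perturbed_lbc(deter_lbc, downscaled_pert, epsilon):
    """LBCDp: DETER LBC plus epsilon times the downscaling perturbation
    (applied at each 6 h boundary update)."""
    return deter_lbc + epsilon * downscaled_pert
```

Because TSp is held fixed during integration, the TSp function would be called once per member at the initial time, each member using its own random generator.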
Based on previous tests of operational performance and numerical integration stability, two CU parametrization schemes, Kain-Fritsch (KF: Kain and Fritsch, 1990) and SAS, and two PBL parametrization schemes, MRF and Yonsei University (YSU), are used to construct four combinations of CU and PBL physics packages in the multi-physics (MP) schemes. Of the 30 perturbed ensemble members, 8 are selected at random to use the SAS/MRF schemes, as in DETER; of the remaining members, 8, 7 and 7 are selected at random to use the SAS/YSU, KF/MRF and KF/YSU schemes, respectively.
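The random allocation of physics packages can be illustrated in a few lines of Python. The function name and the use of a seeded `random.Random` are illustrative choices, not taken from TREPS.

```python
import random
from collections import Counter

# Number of perturbed members per CU/PBL package, as stated in the text.
PHYSICS_COUNTS = {"SAS/MRF": 8, "SAS/YSU": 8, "KF/MRF": 7, "KF/YSU": 7}


def assign_physics(seed=None):
    """Randomly assign the four CU/PBL packages to perturbed members 1-30."""
    packages = [name for name, n in PHYSICS_COUNTS.items() for _ in range(n)]
    random.Random(seed).shuffle(packages)
    return {member: pkg for member, pkg in enumerate(packages, start=1)}


# Usage: Counter(assign_physics(seed=2017).values()) reproduces PHYSICS_COUNTS.
```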
In the stochastically perturbed parametrization tendencies (SPPT) scheme used here, perturbations with a random mesoscale pattern are applied to the total parametrized physics tendencies of U, V, θ and Q. A random field r, drawn from a Gaussian distribution with zero mean, standard deviation 0.5, spatial correlation scale κ and temporal correlation scale τ, is generated during model integration. The field r is bounded within ±2 standard deviations to avoid unrealistic perturbations that could cause numerical instability. At each time step, the physics tendency is multiplied by f = 1 + r to form the perturbed tendency.
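One common way to build a random pattern with prescribed spatial and temporal correlations is to smooth white noise spectrally and evolve it as a first-order autoregressive (AR(1)) process whose lag-1 autocorrelation over a time step Δt is exp(−Δt/τ). The sketch below uses that construction as an assumption; it is not the TREPS implementation (global schemes such as ECMWF's generate the pattern in spherical-harmonic space). It treats the domain as doubly periodic and interprets κ as the standard deviation of a Gaussian smoothing kernel; class and function names are hypothetical.

```python
import numpy as np


def gaussian_correlated_noise(shape, dx_km, kappa_km, rng):
    """Unit-variance 2-D noise with Gaussian spatial correlation.

    White noise is filtered with exp(-0.5 * k^2 * kappa^2), i.e. convolved
    with a Gaussian kernel of standard deviation kappa (doubly periodic).
    """
    ny, nx = shape
    white = rng.standard_normal(shape)
    ky = 2.0 * np.pi * np.fft.fftfreq(ny, d=dx_km)
    kx = 2.0 * np.pi * np.fft.fftfreq(nx, d=dx_km)
    k2 = ky[:, None] ** 2 + kx[None, :] ** 2
    spectrum = np.fft.fft2(white) * np.exp(-0.5 * k2 * kappa_km ** 2)
    smooth = np.real(np.fft.ifft2(spectrum))
    return smooth / smooth.std()


class SPPTPattern:
    """AR(1) random pattern r with standard deviation sigma (0.5 in the text)."""

    def __init__(self, shape, dx_km, kappa_km, tau_h, dt_h, sigma=0.5, seed=None):
        self.shape, self.dx_km, self.kappa_km = shape, dx_km, kappa_km
        self.sigma = sigma
        self.phi = np.exp(-dt_h / tau_h)  # lag-1 autocorrelation over one step
        self.rng = np.random.default_rng(seed)
        self.r = sigma * gaussian_correlated_noise(shape, dx_km, kappa_km, self.rng)

    def advance(self):
        """Move r forward one time step while keeping its variance stationary."""
        eps = gaussian_correlated_noise(self.shape, self.dx_km, self.kappa_km, self.rng)
        self.r = self.phi * self.r + np.sqrt(1.0 - self.phi ** 2) * self.sigma * eps

    def factor(self):
        """f = 1 + r, with r bounded to +/- 2 standard deviations."""
        return 1.0 + np.clip(self.r, -2.0 * self.sigma, 2.0 * self.sigma)
```

In use, the pattern would be advanced every model step and the perturbed tendency formed as `pattern.factor() * physics_tendency`. Clipping only the applied value, not the stored state, is a design choice that keeps the AR(1) statistics intact.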
To avoid the problems near the surface and the model top reported by Palmer et al. (2009), no tendency perturbations are applied near the surface (below about 100 m above ground) or near the model top (<50 hPa), and the perturbations in the transition layers (100-1200 m and 100-50 hPa) are smoothly ramped up to full amplitude. In addition, the perturbed Q is checked in the same way as for the initial perturbations to determine whether tendency perturbations are applied to θ and Q.
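The vertical tapering can be folded into the multiplicative factor as f = 1 + μr, with a weight μ that ramps from 0 to 1 across the transition layers. The sketch below assumes linear ramps, since the text says only that the perturbations are "smoothly" ramped up; the Q check is omitted and the function names are hypothetical.

```python
import numpy as np


def sppt_vertical_taper(z_agl_m, p_hpa):
    """Weight mu in [0, 1] applied to r, so that f = 1 + mu * r.

    0 below 100 m above ground, linear ramp to 1 at 1200 m; 0 above 50 hPa,
    linear ramp to 1 at 100 hPa. ASSUMPTION: linear ramps.
    """
    low = np.clip((np.asarray(z_agl_m, float) - 100.0) / (1200.0 - 100.0), 0.0, 1.0)
    top = np.clip((np.asarray(p_hpa, float) - 50.0) / (100.0 - 50.0), 0.0, 1.0)
    return low * top


def perturbed_tendency(tendency, r, z_agl_m, p_hpa):
    """Apply the tapered SPPT factor to a physics tendency."""
    return (1.0 + sppt_vertical_taper(z_agl_m, p_hpa) * r) * tendency
```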

TREPS configurations: parameter settings
With the procedure above in place, several key parameters used in generating the perturbations remain to be determined. However, there is currently no clear theoretical or universally applicable guidance for setting them. For example, settings of the three critical parameters that primarily affect SPPT performance (i.e. the standard deviation, κ and τ) differ markedly between NWP models with different dynamical cores and resolutions (Palmer et al., 2009; Bouttier et al., 2012; Lang et al., 2012; Romine et al., 2014; Berner et al., 2015). Furthermore, previous sensitivity tests (appendix) revealed that TC forecasts are considerably sensitive to the settings of nearly all of the parameters mentioned above (i.e. α, β, γ, ε, δ, λ, κ and τ). As a preliminary solution, a pragmatic approach based on WARNING, combined with tuning experiments based on performance assessments of several case-studies (appendix), is proposed (Figure 1).
WARNING provides the TC information, at the initial time and over the first 6 h of the forecast, on which the parameter settings are based. Specifically, WARNING reports the number of TCs over the SCS and WNP, their centre positions (longitude and latitude), their intensities (Pmin and Vmax) and their scales (gale_radius). The initial gale_radius averaged over the 48 forecasts (about 300 km) was used to classify TC scale, with a gale_radius above (below) 300 km defining a large (small) TC. A threshold of 980 hPa was selected to classify TC intensity with reference to the national standard of China (see table 2 of Yu et al. (2013)), with Pmin below (above) 980 hPa defining a strong (weak) TC. A threshold of 40 km, chosen empirically, was used to classify TC extension, with a change in gale_radius from the initial time to 6 h later above (below) 40 km defining an extended (unextended) TC.
If a single TC is reported to influence the target area, the weighting factors for ICDp and BRp (α and β) are set to 0.8 and 0.6, respectively, the maximum range of BRp (γ) is set to twice the gale_radius, and no tuning is applied to LBCDp (ε = 1). The standard deviation of TSp (δ) is set to 2.0 (4.0) for a strong (weak) TC. The clipping parameter for TSp (λ) is set to 0.75 (1.0) for a strong extended (unextended) TC to restrict potentially excessive perturbations, and to 1.5 (2.0) for a weak extended (unextended) TC. For large (small) TCs, the spatial correlation scale κ and temporal correlation scale τ are set to 500 (300) km and 5 (3) h, respectively.
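The classification thresholds and the single-TC settings can be collected into one lookup function. In this sketch the names are hypothetical; how values lying exactly on a threshold are classified, and the reading of "extended" as an increase in gale_radius, are assumptions.

```python
from dataclasses import dataclass


@dataclass
class WarningTC:
    pmin_hpa: float            # WARNING minimum central pressure at t = 0
    gale_radius_km: float      # WARNING gale_radius at t = 0
    gale_radius_6h_km: float   # WARNING gale_radius 6 h later


def classify(tc: WarningTC):
    """Return (large, strong, extended) using the thresholds in the text.

    ASSUMPTIONS: values exactly on a threshold fall into the second class,
    and 'extended' means gale_radius increases by more than 40 km.
    """
    large = tc.gale_radius_km > 300.0
    strong = tc.pmin_hpa < 980.0
    extended = (tc.gale_radius_6h_km - tc.gale_radius_km) > 40.0
    return large, strong, extended


def single_tc_parameters(tc: WarningTC) -> dict:
    """Perturbation parameters for the single-TC case."""
    large, strong, extended = classify(tc)
    if strong:
        delta, lam = 2.0, (0.75 if extended else 1.0)
    else:
        delta, lam = 4.0, (1.5 if extended else 2.0)
    return {
        "alpha": 0.8, "beta": 0.6, "epsilon": 1.0,
        "gamma_km": 2.0 * tc.gale_radius_km,
        "delta": delta, "lam": lam,
        "kappa_km": 500.0 if large else 300.0,
        "tau_h": 5.0 if large else 3.0,
    }
```

For example, a strong, large, extended TC (Pmin = 960 hPa, gale_radius growing from 350 to 400 km) gets δ = 2.0, λ = 0.75, κ = 500 km, τ = 5 h and γ = 700 km.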
If multiple TCs are reported in WARNING, the TC that CMA predicts will make landfall on mainland China soonest is selected as the 'target' TC, and its centre position is used as the centre of the model domain. The distance between the 'target' TC centre and the centre of each other TC is calculated and compared with an 'impacting' length, defined as the sum of that TC's gale_radius and the half-width of the model domain. If this distance does not exceed the 'impacting' length, the TC is identified as an 'impact' TC; if there is more than one 'impact' TC, only the strongest is selected. Of the parameters to be determined in the multiple-TC case, α, β and ε do not depend on TC characteristics (e.g. intensity) and are set to 2.0, 0.8 and 2.0, respectively, because previous sensitivity tests (appendix) showed that including more large-scale uncertainty from ENS improves forecast performance in this situation. BRp and TSp are applied to both the 'target' and 'impact' TCs, with γ set to three times the gale_radius of the corresponding TC, δ set to 0.5, and λ set to 2.0 (1.0) for extended (unextended) TCs. The two key SPPT parameters, κ and τ, are set based only on the 'target' TC, in the same way as for a single TC.
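The selection of 'target' and 'impact' TCs can be sketched as follows. The sketch assumes great-circle (haversine) distances, takes "strongest" to mean the lowest Pmin, and leaves the half-width of the model domain as an input; all names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class TC:
    name: str
    lat: float
    lon: float
    pmin_hpa: float
    gale_radius_km: float
    hours_to_landfall: float  # CMA-predicted; use math.inf if no landfall predicted


def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance on a sphere."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2.0 * radius_km * math.asin(math.sqrt(a))


def select_target_and_impact(tcs, half_width_km):
    """Target: earliest predicted landfall. Impact: strongest (lowest Pmin)
    other TC whose distance from the target <= its gale_radius + half-width."""
    target = min(tcs, key=lambda t: t.hours_to_landfall)
    candidates = [
        t for t in tcs
        if t is not target
        and great_circle_km(target.lat, target.lon, t.lat, t.lon)
        <= t.gale_radius_km + half_width_km
    ]
    impact = min(candidates, key=lambda t: t.pmin_hpa) if candidates else None
    return target, impact


# Case-independent settings for the multiple-TC case, as listed in the text.
MULTI_TC_FIXED = {"alpha": 2.0, "beta": 0.8, "epsilon": 2.0, "delta": 0.5}


def multi_tc_gamma_km(tc: TC) -> float:
    """BRp/TSp range for a 'target' or 'impact' TC: three times its gale_radius."""
    return 3.0 * tc.gale_radius_km
```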

Experimental design and overview of cases
To evaluate the performance of TREPS, retrospective TREPS forecasts were conducted for 2014-2016, yielding 48 forecasts for 19 TC cases (Figure 2).
Most of these TCs formed over the WNP and moved northwestward to make landfall on the southeast coast of China; several formed over the SCS and made landfall mainly in Guangxi (GX), Guangdong (GD) and Hainan (HN) provinces. Some TCs coexisted during the forecasts, including Linfa/Chan-hom, Chan-hom/Nangka, Meranti/Malakas and Haima/Sarika. Super-typhoon Rammasun (2014), the strongest TC to make landfall on mainland China during the past 3 years, caused great damage in HN, GD and GX (Zhang et al., 2017). After its genesis, Rammasun moved from the WNP into the SCS, which it entered at 0000 UTC 16 July as a typhoon. Owing to favourable environmental conditions over the SCS, Rammasun intensified rapidly into a super-typhoon, with a peak maximum sustained wind speed of 72 m s⁻¹ at 0600 UTC 18 July. In view of the severity of the resulting disaster and its rapid intensification (RI), which remains difficult to predict skilfully (Rogers et al., 2013), Rammasun was selected for the case-study.

Forecast verification
This study focuses on the performance of both the deterministic and the probabilistic guidance of TREPS. The deterministic guidance provided by TREPS includes the ensemble-mean (EM) track (TC centre position) and intensity (Pmin and Vmax) and the probability-matched mean (PM: Ebert, 2001; Schwartz et al., 2014) rainfall and 10 m wind; the probabilistic guidance includes the probabilities of rainfall and 10 m wind exceeding different thresholds. To investigate the advantages of TREPS in TC forecasting, its deterministic guidance is compared with that of ENS, HRES and DETER, while its probabilistic guidance is compared with that of ENS. For an equitable comparison, only the first 30 perturbed ensemble members of ENS are used. All comparisons are based on the skill scores described below.

Figure 2. Tracks of the 19 TCs examined in this study from the CMA best-track analysis. The TC position at the first forecast initialization time is used as the beginning of each track. Symbol size represents TC intensity, with larger symbols corresponding to stronger TCs; the number in parentheses is the first forecast initialization time (UTC) for the TC; the number labelled at the initial location of each track is the total number of forecasts for the TC; the dashed rectangle indicates the verification area (VA) for TC Rammasun.
In this study, the TC centre is defined as the location of the minimum 850 hPa geopotential height, and Pmin (Vmax) is defined as the lowest sea-level pressure (highest 10 m wind speed) within 100 (250) km of the TC centre (Torn, 2010). At each forecast time, the EM track and intensity are computed as the averages of the TC centre positions and of Pmin (Vmax), respectively, over the perturbed forecasts; they are computed only if at least two-thirds of the perturbed ensemble members predict Vmax greater than 10 m s⁻¹ (Majumdar and Finocchio, 2010). The CMA best-track dataset (Ying et al., 2014; http://tcdata.typhoon.org.cn) was used to calculate the mean absolute track error (Hamill et al., 2011) and intensity error over the set of 48 forecasts. To ensure a fair comparison, a homogeneous sample of TC cases with observed or predicted Vmax greater than 10 m s⁻¹ was used.
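The EM computation with the two-thirds rule, and the great-circle track error against the best track, might look like the sketch below. Averaging only over the members that pass the Vmax check is an assumption, since the text does not say how members whose TC has dissipated are treated; the function names are hypothetical.

```python
import math
import numpy as np


def ensemble_mean_tc(lats, lons, pmins, vmaxs, vmax_min=10.0):
    """EM position and intensity, or None if fewer than two-thirds of the
    members have Vmax > vmax_min.

    ASSUMPTION: the averages use only members that pass the Vmax check.
    """
    lats, lons, pmins, vmaxs = map(np.asarray, (lats, lons, pmins, vmaxs))
    ok = vmaxs > vmax_min
    if 3 * np.count_nonzero(ok) < 2 * vmaxs.size:
        return None
    return {"lat": lats[ok].mean(), "lon": lons[ok].mean(),
            "pmin": pmins[ok].mean(), "vmax": vmaxs[ok].mean()}


def track_error_km(lat_f, lon_f, lat_o, lon_o, radius_km=6371.0):
    """Great-circle (haversine) distance between forecast and best-track centres."""
    p1, p2 = math.radians(lat_f), math.radians(lat_o)
    dp, dl = p2 - p1, math.radians(lon_o - lon_f)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2.0 * radius_km * math.asin(math.sqrt(a))
```

The mean absolute track error is then the average of `track_error_km` over all verifying forecasts at a given lead time.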
Rainfall and 10 m wind forecasts were verified against hourly rainfall and 10 m wind observations, respectively, from the dense network of automatic weather stations (AWSs) over mainland China. Observed 6 h accumulated rainfall was calculated by summing the hourly rainfall, similarly to Wang et al. (2016). Both rainfall and 10 m wind observations were interpolated to the NWP model grid using Cressman interpolation before skill scores were calculated. In this article, the verification area (VA, hereafter) and verification period (VP, hereafter) for rainfall and 10 m wind are defined as the domain and time period directly influenced by the TC, respectively. Specifically, the area (period) in which heavy rainfall and strong wind related to the TC circulation occurred was selected subjectively as the VA (VP) based on AWS-observed rainfall and wind and on radar reflectivity. An example of the VA, for Rammasun, is shown in Figure 2; the VP for the forecasts initialized at 0000 UTC 17 July is from 30 to 60 h. Both PM rainfall and PM 10 m wind were calculated within the VA.
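Two gridding steps appear in this subsection: Cressman interpolation of station data and the probability-matched mean. The sketch below shows textbook single-pass versions of both. The influence radius, the use of planar coordinates and the choice of every n-th pooled value for PM are assumptions; function names are hypothetical.

```python
import numpy as np


def cressman(obs_x, obs_y, obs_val, grid_x, grid_y, radius):
    """Single-pass Cressman analysis of station values onto grid points.

    Weights w = (R^2 - d^2) / (R^2 + d^2) for d < R, else 0. Coordinates are
    planar (e.g. km); grid points with no station within R get NaN.
    ASSUMPTION: one pass with a single radius (not stated in the text).
    """
    gx = np.ravel(grid_x)[:, None]
    gy = np.ravel(grid_y)[:, None]
    d2 = (gx - np.asarray(obs_x)[None, :]) ** 2 + (gy - np.asarray(obs_y)[None, :]) ** 2
    r2 = radius ** 2
    w = np.where(d2 < r2, (r2 - d2) / (r2 + d2), 0.0)
    wsum = w.sum(axis=1)
    with np.errstate(invalid="ignore", divide="ignore"):
        val = (w @ np.asarray(obs_val, dtype=float)) / wsum
    val[wsum == 0.0] = np.nan
    return val.reshape(np.shape(grid_x))


def probability_matched_mean(members):
    """Probability-matched mean (after Ebert, 2001) of members shaped (n, ...).

    The spatial pattern comes from the ensemble mean; the values are every
    n-th value of all members' values pooled and sorted in descending order.
    """
    members = np.asarray(members, dtype=float)
    n = members.shape[0]
    mean = members.mean(axis=0)
    pooled = np.sort(members.ravel())[::-1][::n]   # one value per grid point
    order = np.argsort(mean.ravel())[::-1]         # grid points by descending mean
    pm = np.empty(mean.size)
    pm[order] = pooled
    return pm.reshape(mean.shape)
```

The PM field keeps the location of the ensemble-mean rain area but restores the amplitude distribution of the individual members, which the plain mean tends to smooth out.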
To verify the rainfall and 10 m wind of the EPS deterministic guidance and of the unperturbed forecasts, both the threat score (TS: Gilbert, 1884) and the fractions skill score (FSS: Roberts and Lean, 2008) were used. TS was computed for 6 h accumulated rainfall and FSS for 1 h accumulated rainfall, because the 'double penalty' caused by displacement errors becomes more severe when verifying high-resolution forecasts of heavy rainfall accumulated over shorter periods (Ebert, 2008), and FSS, as a neighbourhood method, is less sensitive to such errors. In calculating FSS, the 'neighbourhood length' was defined as the side length of the square area over which the fractions were computed. Probabilistic guidance was verified using the Brier score (BS) and its reliability component (Wilks, 2006), and the area under the relative operating characteristic curve (AROC: Mason and Graham, 2002).
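For reference, the scores named above can be computed as in the following NumPy sketch. It follows the standard definitions: TS as hits over hits plus misses plus false alarms; FSS after Roberts and Lean (2008), here with zero padding at the domain edge; the reliability term binned on the distinct ensemble probabilities (k/30 for 30 members); and AROC by the trapezoidal rule over all probability thresholds. Edge handling and binning are assumptions and may differ from the choices made in this study.

```python
import numpy as np


def threat_score(fcst, obs, thr):
    f, o = np.asarray(fcst) >= thr, np.asarray(obs) >= thr
    hits = np.sum(f & o)
    denom = hits + np.sum(~f & o) + np.sum(f & ~o)  # hits + misses + false alarms
    return hits / denom if denom else np.nan


def _box_fraction(binary, n):
    """Fraction of 1s in the n x n (odd n) neighbourhood of each point, zero-padded."""
    pad = n // 2
    a = np.pad(binary.astype(float), pad)
    c = np.pad(a.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)))
    ny, nx = binary.shape
    s = c[n:n + ny, n:n + nx] - c[:ny, n:n + nx] - c[n:n + ny, :nx] + c[:ny, :nx]
    return s / (n * n)


def fss(fcst, obs, thr, n):
    pf = _box_fraction(np.asarray(fcst) >= thr, n)
    po = _box_fraction(np.asarray(obs) >= thr, n)
    ref = np.mean(pf ** 2) + np.mean(po ** 2)
    return 1.0 - np.mean((pf - po) ** 2) / ref if ref > 0 else np.nan


def brier_score(prob, event):
    prob, event = np.asarray(prob, float), np.asarray(event, float)
    return np.mean((prob - event) ** 2)


def reliability(prob, event):
    """Reliability term of the Brier score decomposition."""
    prob, event = np.asarray(prob, float), np.asarray(event, float)
    total = 0.0
    for p in np.unique(prob):
        sel = prob == p
        total += sel.sum() * (p - event[sel].mean()) ** 2
    return total / prob.size


def aroc(prob, event):
    """Area under the ROC curve, using every distinct probability as a threshold."""
    prob, event = np.asarray(prob, float), np.asarray(event, bool)
    n_yes, n_no = event.sum(), (~event).sum()
    if n_yes == 0 or n_no == 0:
        return np.nan
    hr, far = [0.0], [0.0]
    for t in np.unique(prob)[::-1]:
        warn = prob >= t
        hr.append(np.sum(warn & event) / n_yes)
        far.append(np.sum(warn & ~event) / n_no)
    hr, far = np.array(hr), np.array(far)
    return float(np.sum(np.diff(far) * (hr[1:] + hr[:-1]) / 2.0))
```

A single rain cell displaced by one grid point scores FSS = 0 with a 1-point neighbourhood but a positive FSS with a 3-point neighbourhood, which is the tolerance to displacement errors that motivates using FSS for 1 h rainfall.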
For comparisons of skill scores, including absolute track/intensity errors, TS, FSS, BS, reliability and AROC, the statistical significance of differences between forecast sources was assessed using a bootstrap resampling procedure. Specifically, paired samples of skill scores were drawn at random with replacement, and the difference in skill scores between the two sources was calculated. This procedure was repeated 1000 times, and the rank at which the resampled score differences crossed zero was used as the significance level (Davis et al., 2010), representing the probability that the two skill scores differ. For example, a 90% significance level indicates a 90% probability that two skill scores differ.
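A minimal sketch of a paired bootstrap in this spirit is shown below. It assumes that per-forecast scores for the two sources are available as aligned arrays and returns the fraction of resampled mean differences lying on the dominant side of zero; the function name is hypothetical.

```python
import numpy as np


def paired_bootstrap_significance(score_a, score_b, n_boot=1000, seed=0):
    """Resample paired per-forecast scores with replacement and return the
    fraction of resampled mean differences on the dominant side of zero.

    A return value of 0.9 corresponds to a '90% significance level'.
    """
    d = np.asarray(score_a, float) - np.asarray(score_b, float)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, d.size, size=(n_boot, d.size))  # paired resamples
    mean_diff = d[idx].mean(axis=1)
    return max(np.mean(mean_diff > 0.0), np.mean(mean_diff < 0.0))
```

Resampling the per-forecast differences, rather than each source separately, preserves the pairing of forecasts verified against the same TC and time.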

Perturbation characteristics
Understanding the characteristics of the perturbations produced by different perturbation methods for different uncertainty sources is important for EPS design. Recent studies of the time evolution of the spatial perturbation structure produced by different perturbation methods revealed significant differences among these methods for TC cases (Yamaguchi and Majumdar, 2010; Lang et al., 2012). Moreover, Lang et al. (2012) emphasized the importance, in EPS design, of deciding how much variance each perturbation method should introduce.

Design of parallel comparison experiments
As a preliminary attempt to establish an operational mesoscale EPS based on TRAMS9KM, the ensemble design of TREPS simply incorporates perturbations from all uncertainty sources using several leading perturbation methods. Therefore, to confirm the suitability of these methods and improve the future design of TREPS, their impacts on the total forecast perturbations around the TC were investigated. In addition to the retrospective forecasts (ALL, hereafter) described in section 2.5, six parallel comparison experiments, named noICD, noBR, noLBCD, noTS, noSPPT and noMP, in which ICDp, BRp, LBCDp, TSp, SPPT and MP, respectively, were removed from ALL, were carried out.
We used the same analysis method as Lang et al. (2012). Specifically, the perturbations were defined as the differences between perturbed and unperturbed forecasts of U, V, temperature (T) and Q. The kinetic energy (KE), internal energy (IE) and latent heat energy (LHE) of the perturbations were then expressed as

$$\mathrm{KE} = \frac{1}{2}\int_{p_1}^{p_0}\int_S \left(U'^2 + V'^2\right)\,\mathrm{d}S\,\mathrm{d}p,$$

$$\mathrm{IE} = \frac{1}{2}\int_{p_1}^{p_0}\int_S \frac{C_p}{T_r}\,T'^2\,\mathrm{d}S\,\mathrm{d}p,$$

and

$$\mathrm{LHE} = \frac{1}{2}\int_{p_1}^{p_0}\int_S \frac{L^2}{C_p T_r}\,Q'^2\,\mathrm{d}S\,\mathrm{d}p,$$

where $U'$, $V'$, $T'$ and $Q'$ denote the perturbations of U, V, T and Q, respectively; $C_p = 1006.0$ J kg⁻¹ K⁻¹ is the specific heat capacity at constant pressure; $T_r = 287$ K is the reference temperature; $L = 2.5 \times 10^6$ J kg⁻¹ is the latent heat of vaporization; and $\int_S$ and $\int_{p_1}^{p_0}$ denote horizontal and vertical integration, respectively. These perturbation energies were all calculated in the 'target' domain within twice the initial gale_radius of the DETER TC centre. The differences in perturbation energy between ALL and each of the six additional experiments (noICD, noBR, noLBCD, noTS, noSPPT and noMP) were used to represent the relative contributions of ICDp, BRp, LBCDp, TSp, SPPT and MP, respectively, to the total forecast perturbations.
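On the model grid, the KE, IE and LHE integrals become weighted sums of ½(U′² + V′²), ½(C_p/T_r)T′² and ½(L²/(C_p T_r))Q′². In this sketch the weights are normalized so that the results are domain averages (consistent with the units quoted for Figure 3) rather than raw integrals; this normalization and the function names are assumptions.

```python
import numpy as np

CP = 1006.0   # specific heat at constant pressure, J kg-1 K-1
TR = 287.0    # reference temperature, K
LV = 2.5e6    # latent heat of vaporization, J kg-1


def perturbation_energies(du, dv, dt, dq, weights=None):
    """Weighted domain averages of the KE, IE and LHE densities.

    du, dv, dt, dq: perturbations (perturbed minus unperturbed forecast).
    weights: optional dS*dp weights of the same shape (uniform if None).
    Returns (KE [m2 s-2], IE [J kg-1], LHE [J kg-1]).
    """
    du, dv, dt, dq = (np.asarray(x, float) for x in (du, dv, dt, dq))
    w = np.ones_like(du) if weights is None else np.asarray(weights, float)
    w = w / w.sum()
    ke = 0.5 * np.sum(w * (du ** 2 + dv ** 2))
    ie = 0.5 * CP / TR * np.sum(w * dt ** 2)
    lhe = 0.5 * LV ** 2 / (CP * TR) * np.sum(w * dq ** 2)
    return ke, ie, lhe


def contribution(energies_all, energies_without):
    """Contribution of one method: energy in ALL minus energy in the 'no' run."""
    return tuple(a - b for a, b in zip(energies_all, energies_without))
```

For instance, a uniform 1 K temperature perturbation gives IE = ½ × 1006/287 ≈ 1.75 J kg⁻¹.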

Contribution to total forecast perturbations
The differences in perturbation energies, averaged over all ensemble members, all 48 forecasts, the domain within twice the initial gale_radius, and the vertical levels, are shown in Figure 3. Overall, ICDp contributed the largest portion of the total forecast perturbation around the TC, followed by MP and LBCDp. However, the contribution of MP to perturbation IE increased quickly during model integration and eventually became the largest. For BRp, the contribution to perturbation KE at the initial time was significantly higher than that to IE or LHE (Figure 3(b)), revealing deficiencies in BRp that are likely caused by unsatisfactory balance constraints in the B used to generate them; this contribution then decreased significantly during model integration, probably because random perturbations that do not project onto dynamically unstable modes dissipate. Furthermore, TSp generally contributed the least to all three perturbation energies, especially to perturbation LHE (Figure 3(f)), suggesting that implementing TSp as a constant forcing around the initial TC centre may be undesirable. The contribution of SPPT was smaller than that reported by Lang et al. (2012), which might be attributable in part to suboptimal parameter settings, e.g. the standard deviation of r.
Composites of the azimuthally integrated perturbation energies were calculated and averaged over all ensemble members and all 48 forecasts. An azimuthal Fourier analysis of these perturbation energies was then performed to investigate the scales and spatial distribution of the relative contributions of the different perturbation methods.
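An azimuthal decomposition of this kind can be sketched by sampling a field on rings around the TC centre and taking a real FFT along azimuth. Nearest-neighbour sampling, ring spacing and the function names are assumptions, since the text does not state how the polar grid was built.

```python
import numpy as np


def sample_polar_nearest(field, ic, jc, radii_pts, n_az=64):
    """Nearest-neighbour samples of a 2-D field on rings around (ic, jc).

    radii_pts: ring radii in grid points. Returns shape (n_radii, n_az).
    ASSUMPTION: nearest-neighbour sampling; indices are clipped to the grid.
    """
    theta = 2.0 * np.pi * np.arange(n_az) / n_az
    r = np.asarray(radii_pts, float)[:, None]
    i = np.clip(np.rint(ic + r * np.sin(theta)).astype(int), 0, field.shape[0] - 1)
    j = np.clip(np.rint(jc + r * np.cos(theta)).astype(int), 0, field.shape[1] - 1)
    return field[i, j]


def azimuthal_amplitudes(polar):
    """Amplitude of each azimuthal wavenumber k on every ring.

    polar: (n_radii, n_az). Column k holds the amplitude of the k-th
    cos/sin(k * theta) component; k < 3 counts as 'large scale' and
    k >= 3 as 'small scale' in the sense of the text.
    """
    n_az = polar.shape[1]
    amp = np.abs(np.fft.rfft(polar, axis=1)) / n_az
    upper = n_az // 2 if n_az % 2 == 0 else n_az // 2 + 1
    amp[:, 1:upper] *= 2.0  # one-sided spectrum: double all but k = 0 (and Nyquist)
    return amp
```

For a ring with values 3 + 2 cos 2θ, the function returns an amplitude of 3 at k = 0 and 2 at k = 2, with zero at other wavenumbers.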
At the initial time, the contribution of ICDp was generally greater than that of BRp. ICDp produced a positive contribution in most regions around the TC and at nearly all scales, most evidently at large scales (wavenumber <3) (Figure 3(a)). Compared with ICDp, BRp produced a substantially larger positive contribution to perturbation KE at small scales (wavenumber ≥3) outside the TC core (at radii >40% of gale_radius) (Figure 3(b)). For perturbation IE and LHE, BRp contributed mainly at large scales, especially near the TC core (at radii <40% of gale_radius). Thus, besides information partly similar to that of ICDp, BRp added information to the initial total perturbations, particularly at small scales.
For the total perturbations at 36 h, the main positive contributions from both ICDp and BRp were located outside the TC core at most scales and near the TC core at small scales (Figures 3(c) and (d)). Around the TC eyewall (at 20-40% of gale_radius), BRp contributed more small-scale perturbation IE than ICDp. LBCDp contributed chiefly to large-scale perturbations outside the TC core (Figure 3(e)); TSp and SPPT both contributed significantly to the small-scale perturbation LHE near the TC core (Figures 3(f) and (g)), a contribution that was missing from LBCDp. SPPT also contributed evidently to the large-scale perturbation KE near the TC core (Figure 3(g)). MP exhibited a broader distribution of positive contributions than the other perturbation methods, contributing most notably to large-scale perturbations of KE near the TC core, IE around the TC eyewall, and LHE outside the TC core (Figure 3(h)). Overall, the perturbation methods contributed differently to the total forecast perturbations in different regions and at different scales, and in some cases these contributions complemented one another.

Results of batch experiment
The performance of both the deterministic and probabilistic guidance of TREPS in retrospective forecasts of 2014-2016 TC cases was verified and compared with that of ENS, HRES and DETER. Because ECMWF forecast data are received at 6 h intervals in operational use, TREPS was compared with ECMWF only at 6 h intervals, whereas the TREPS unperturbed and perturbed forecasts were also compared at 1 h intervals to investigate the impact of TREPS on high-resolution TC forecasting.

Deterministic guidance
The TREPS EM track forecasts were slightly better than those of HRES during the first 48 h of forecasts and comparable to those of ENS EM, from which they showed no statistically significant differences over the whole forecast period (Figure 4(a)). The slight, insignificant degradation of the TREPS EM track forecasts relative to ENS is likely attributable to the larger track bias of TRAMS9KM compared with ECMWF (not shown). Furthermore, track forecasting was more skilful in TREPS EM than in DETER at nearly all valid times.
Figure 3. Differences of the decomposition for perturbation KE (shaded; unit: m² s⁻²) and LHE (thin black contour, solid for positive values, dashed for negative, and dotted for zero values; intervals are 0.5 and 2 for (b) and others, respectively; unit: J kg⁻¹) at 850 hPa, and for perturbation IE (thick green contour, solid for positive values, dashed for negative, and dotted for zero values; intervals are 1 and 2 for (b) and others, respectively; unit: J kg⁻¹) at 500 hPa, between ALL and noICD, noBR, noLBCD, noTS, noSPPT and noMP, denoted by (a, c) 'ICDp', (b, d) 'BRp', (e) 'LBCDp', (f) 'TSp', (g) 'SPPT' and (h) 'MP', respectively. '00' (a-b) and '36' (c-h) indicate the differences at the initial time and the 36 h forecast time, respectively. The differences of the decomposition amplitudes are depicted as a function of wavenumber and radius, with radius measured in units of gale_radius; the differences are scaled by factors of 10⁵, 10² and 10³ for perturbation KE, IE and LHE, respectively. Italic numbers in parentheses following the title represent the differences of perturbation KE (red; unit: m² s⁻²), IE (green; unit: J kg⁻¹) and LHE (black; unit: J kg⁻¹) averaged over the domain within twice the initial gale_radius and over the levels from 1000 to 100 hPa.
In this study, the lead time at which the TC made landfall usually lay between 24 and 54 h, given the conditions that must be satisfied to issue TREPS. Neither TRAMS9KM nor ECMWF could reproduce the intensities of intense TCs well because of their insufficient resolution (Gentry and Lackmann, 2010). Thus, during the first 36 h of forecasts, when the TC was making landfall, the large intensity errors of both ENS EM and TREPS EM were dominated by serious weak biases (Figures 5(b) and (c)), so intensity errors decreased with increasing lead time (Figures 4(b) and (c)). For P min forecasts, TREPS EM showed smaller (larger) errors than the others in the first 36 h (beyond 36 h) of forecasts, and these differences were statistically significant (not significant) (Figure 4(b)). The larger errors beyond 36 h may be partly explained by the larger bias in P min forecasts for TRAMS9KM than for ECMWF (Figure 5(b)), and partly by the different bias characteristics of the physics schemes used in MP. The most noticeable differences among the forecasts were in V max, with TREPS EM exhibiting the smallest errors throughout the forecast period (Figure 4(c)).
Given the bias in TC intensity forecasting, calibrated forecasts were computed simply by subtracting the bias from the raw forecasts at each lead time. After calibration, intensity errors before TC landfall decreased, with a larger reduction in ENS than in TREPS (Figures 5(b) and (c)). However, calibration degraded the ENS EM P min forecasts after TC landfall, when biases no longer dominated the intensity errors (Figure 5(b)). In general, intensity errors after calibration were smaller in TREPS than in ENS.
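The lead-time-dependent bias correction can be sketched as follows. This is a minimal illustration assuming that the bias is the mean forecast-minus-observation difference over all cases at each lead time, estimated in-sample; the helper names are hypothetical, and an operational version would have to estimate the bias from past cases only.

```python
import numpy as np

def lead_time_bias(forecasts, observations):
    """Mean (forecast - observation) at each lead time.

    forecasts, observations: arrays (n_cases, n_leads); NaN marks a missing
    value (e.g. a TC that has already dissipated).
    """
    return np.nanmean(forecasts - observations, axis=0)

def calibrate(forecasts, bias):
    """Remove the lead-time-dependent bias from the raw forecasts."""
    return forecasts - bias[np.newaxis, :]

def mean_abs_error(forecasts, observations):
    """Mean absolute error at each lead time, ignoring missing values."""
    return np.nanmean(np.abs(forecasts - observations), axis=0)
```

When the errors are almost entirely systematic, as for the weak intensity bias before landfall, subtracting the bias removes most of the error; when they are not, the correction can increase the error, which is consistent with the post-landfall degradation described above.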
Ensemble spreads of both ENS and TREPS were calculated following the approach of Hamill et al. (2011) and averaged over the 48 forecasts. Both ENS and TREPS were overdispersive for track forecasting, the former generally more so (Figure 5(a)). In intensity forecasting, TREPS exhibited greater spread and less underdispersion than ENS, particularly for V max (Figures 5(b) and (c)). Although the underdispersion of ENS decreased after calibration, it remained more serious than that of TREPS. Thus, despite biases in the forecasts, TREPS provided a better estimate of the uncertainty in TC intensity forecasting.
EM generally underestimates the peak of heavy rainfall and overestimates the area of light rainfall because it simply averages over all ensemble members (Du et al., 1997); the same is true for TC wind forecasting. PM was proposed by Ebert (2001) to improve the rainfall estimate from EM by combining the spatial pattern of EM with the frequency distribution of the whole EPS. Accordingly, PM rainfall and wind forecasts are also important TREPS products. The PM field at a given time is produced in three steps. First, the frequency distribution is formed by pooling the rainfall (wind) amounts from all 30 ensemble members at each grid point in VA, ranking them from largest to smallest, and keeping every 30th value. Second, the EM rainfall (wind) amounts in VA are also ranked from largest to smallest. Third, the grid point with the largest EM rainfall (wind) amount is assigned the largest value in the frequency distribution, the grid point with the second-largest EM amount the second-largest value, and so on.
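The three PM steps can be written compactly in NumPy. This is a sketch, not the TREPS code: it assumes the VA grid points have been flattened into one dimension, and it starts the "every 30th value" subsampling at the largest pooled value, which is one of several reasonable choices.

```python
import numpy as np

def probability_matched_mean(members):
    """Probability-matched mean over a verification area.

    members: array (n_members, n_points) of rainfall (or wind) at the grid
    points of the area, flattened to one dimension.
    """
    n_members, n_points = members.shape
    ensemble_mean = members.mean(axis=0)
    # Step 1: pool all member values, rank largest to smallest, and keep every
    # n_members-th value (starting from the largest is an assumption here).
    pooled = np.sort(members.ravel())[::-1]
    distribution = pooled[::n_members]            # exactly n_points values
    # Step 2: rank the EM values, largest first.
    order = np.argsort(ensemble_mean, kind="stable")[::-1]
    # Step 3: the point with the largest EM value receives the largest pooled
    # value, the second-largest EM point the second-largest value, and so on.
    pm = np.empty(n_points)
    pm[order] = distribution
    return pm
```

The result keeps the spatial ranking of EM while restoring amplitudes drawn from the pooled ensemble distribution, so the PM maximum equals the largest value forecast by any member rather than a smoothed average.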
The accuracy of PM is thus determined by the accuracy of both the EM spatial pattern and the EPS frequency distribution. However, EM has been shown to become physically unrealistic during model integration because of nonlinearity at both synoptic and convective scales (Ancell, 2013; Hollan and Ancell, 2015). A 'best member' technique was therefore proposed to address these issues with EM and PM. Ancell (2013) defined two best-member techniques: one uses the single member closest to EM over the whole forecast, and the other uses a forecast patched together from the members closest to EM at each forecast time. The latter produced better forecasts of sea-level pressure; it was also used by Schwartz et al. (2014) for convective precipitation forecasting, but did not yield significant improvements relative to PM and sometimes produced unrealistic convective evolution. It therefore remains unclear how best to design a best-member technique for precipitation forecasting. In addition, short-lead-time errors relative to the latest observations, such as the warning position, have been used to select ensemble members for determining the track consensus at long lead times (Goerss et al., 2004; Sampson et al., 2006; Qi et al., 2014), an approach that has proven effective in improving TC track forecasting. Building on these studies, an 'optimal member' method for both rainfall and wind forecasting was tested in this study to explore how best to derive deterministic guidance from an EPS.
Specifically, a single member was selected as the optimal member based on the closeness of the perturbed members' forecasts to both the EM forecasts (of track and intensity) and the latest WARNING observations. Following Torn (2010), the optimal member was determined by minimizing the cost function J_i of Eq. (4). The first term on the right-hand side of Eq. (4) measures the closeness to the latest observations, where T_i^j, P_i^j and W_i^j denote the absolute errors of the track, P min and V max, respectively, against WARNING observations for member i at lead time j, and μ_j = 1000.0/P_obs^j is a weighting factor depending on the observed P min, P_obs^j, at lead time j. Only the forecasts at the initial and 6 h forecast times are included in this term because of the 6 h delay in operational use. The factor μ_j lowers the weight of this term for weaker TCs, whose WARNING observations are more uncertain. The second term on the right-hand side measures the closeness to the EM forecasts, where T_i^j, P_i^j and W_i^j denote the absolute differences of the track, P min and V max, respectively, from the EM forecasts for member i at lead time j. The track, P min and V max forecasts were measured from lead times t1, p1 and w1 to t2, p2 and w2, respectively, at 6 h intervals, with the lead times chosen according to the evaluation results presented above. The skill of both TREPS and ENS EM forecasts of TC track and intensity clearly varied with lead time (Figure 4). Assuming that better performance indicates greater similarity between the EM forecasts and the observations, only the lead times at which the EM forecasts performed better were chosen. For TREPS, t1, t2, p1, p2, w1 and w2 were set to 6, 54, 6, 36, 6 and 60, respectively.
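Because Eq. (4) itself is not reproduced here, the following NumPy sketch uses an assumed form built only from the ingredients described in the text: a μ-weighted sum of squared, σ-normalized errors against WARNING at the 0 and 6 h times, plus squared, σ-normalized differences from EM averaged over each variable's lead-time window (averaging stands in for division by t, p and w). The squaring, the function names and the dictionary layout are assumptions; σ_T = 15 km, σ_P = 4 hPa and σ_W = 1 m s⁻¹ are the values given in the text.

```python
import numpy as np

# Normalizing standard deviations quoted in the text: km, hPa, m/s.
SIGMA = {"track": 15.0, "pmin": 4.0, "vmax": 1.0}

def cost_function(obs_err, p_obs, em_diff, lead_times, windows):
    """Hypothetical cost J_i for every member (assumed squared, normalized form).

    obs_err:    var -> (n_members, n_obs) absolute errors against WARNING at
                the initial and 6 h forecast times.
    p_obs:      (n_obs,) observed Pmin (hPa) at those times.
    em_diff:    var -> (n_members, n_leads) absolute differences from EM.
    lead_times: (n_leads,) lead times in hours (6-hourly).
    windows:    var -> (start, end) lead-time window, e.g. {"track": (6, 54)};
                omit a variable (such as "vmax" for ENS) to skip its EM term.
    """
    mu = 1000.0 / np.asarray(p_obs, dtype=float)   # smaller weight for weaker TCs
    n_members = next(iter(obs_err.values())).shape[0]
    j_obs = np.zeros(n_members)
    j_em = np.zeros(n_members)
    for var, sigma in SIGMA.items():
        if var in obs_err:
            j_obs += np.sum(mu * (obs_err[var] / sigma) ** 2, axis=1)
        if var in windows:
            start, end = windows[var]
            sel = (lead_times >= start) & (lead_times <= end)
            # Averaging over the window plays the role of dividing by t, p or w.
            j_em += np.mean((em_diff[var][:, sel] / sigma) ** 2, axis=1)
    return j_obs + j_em

def optimal_members(cost):
    """Indices of all members attaining the minimum cost (OPT averages them)."""
    return np.flatnonzero(cost == cost.min())
```

The exact-equality test in `optimal_members` mirrors the rule that OPT averages over all members sharing the minimum cost; ties of this kind become common when errors are quantized, which is when the averaging-induced smoothing discussed in the case-study would appear.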
For ENS, the closeness to the EM V max forecasts was not included in the cost function, given the poor performance of ENS EM for V max (Figure 4(c)); t1, t2, p1 and p2 were set to 6, 60, 54 and 60, respectively. In Eq. (4), t, p and w denote the total numbers of lead times used in the cost function, and σ_T, σ_P and σ_W denote the standard deviations of TC track (15 km), P min (4 hPa) and V max (1 m s⁻¹), respectively, which were determined empirically and used to normalize the variables so that they remain mutually balanced. If more than one member attained the minimum cost function, the optimal-member (OPT, hereafter) forecasts were produced by averaging over these members.
Figure 6 shows TS for 6 h accumulated rainfall and 10 m wind, where larger TS values indicate higher skill. Unsurprisingly, during the first 24 h of forecasts, TS was nearly 0, or the differences among the forecasts were not significant, for both rainfall and wind at the larger thresholds, because there were few observed or forecast events in this period, when the TC was making landfall.
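TS is the standard threat score (critical success index), the ratio of hits to the sum of hits, misses and false alarms. A minimal sketch, assuming forecasts and observations have already been matched point by point (for example forecasts interpolated to the AWS stations) and that an event means a value strictly above the threshold:

```python
import numpy as np

def threat_score(forecast, observed, threshold):
    """TS = hits / (hits + misses + false alarms) for events above threshold.

    forecast, observed: matched arrays; events are values strictly greater
    than the threshold. Returns NaN when there are no forecast or observed
    events at all.
    """
    f = np.asarray(forecast) > threshold
    o = np.asarray(observed) > threshold
    hits = np.sum(f & o)
    misses = np.sum(~f & o)
    false_alarms = np.sum(f & ~o)
    denom = hits + misses + false_alarms
    return float("nan") if denom == 0 else hits / denom
```

The NaN branch corresponds to the situation noted for the first 24 h at high thresholds: with almost no events, the score is undefined or dominated by a handful of samples.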
For light rainfall (>0.1 mm per 6 h), TREPS PM had only limited superiority, which was significant only in comparison with ENS PM (Figure 6(a)). ENS OPT yielded the highest TS, followed by TREPS OPT. The superiority of TREPS OPT over TREPS PM was most evident beyond 24 h, while its significant superiority over HRES, ENS PM and DETER held at nearly all lead times. For heavy rainfall (>25 mm per 6 h), the highest TS was produced by TREPS PM, followed by TREPS OPT, although the difference between them was not significant (Figure 6(b)). Both significantly outperformed the other forecasts (except ENS OPT) at most lead times. The most noticeable improvements of TREPS PM over ENS OPT were found beyond 36 h. Again, ENS OPT exhibited the highest TS among the ECMWF forecasts, indicating the effectiveness of the optimal-member method proposed here. For extremely heavy rainfall (>100 mm per 6 h), the TREPS forecasts performed significantly better than the ECMWF forecasts (Figure 6(c)), probably owing to their higher resolution; TREPS PM was comparable to TREPS OPT and better than DETER.
Figure 6. TS for the forecasts of 6 h accumulated precipitation with thresholds of (a) 0.1, (b) 25, and (c) 100 mm, as well as forecasts of 10 m wind with thresholds of (d) 17.2, (e) 24.5, and (f) 32.7 m s⁻¹, averaged over different lead times (hours), from various forecast sources. Grey (black) asterisks, diamonds, squares, and circles indicate the lead times for which the significance level of the TS differences is larger than 90% for the comparison of TREPS PM (TREPS OPT) with HRES, ENS PM, ENS OPT and DETER, respectively. Orange crosses indicate the lead times for which the significance level of the TS differences between TREPS PM and TREPS OPT is larger than 90%.
The forecasts of 10 m wind exceeding 17.2 m s⁻¹ (modest wind, hereafter) were also verified (Figure 6(d)). Although TREPS PM performed worse than ENS OPT from 30 to 60 h lead time, it was comparable to TREPS OPT. However, TREPS PM still displayed significant superiority over DETER, HRES and ENS PM at some lead times. ENS OPT performed best, and its improvement over TREPS OPT was marginally significant. For strong wind (>24.5 m s⁻¹), ENS OPT again outperformed both HRES and ENS PM, but TREPS OPT was clearly less skilful than both TREPS PM and DETER, which ranked as the best two forecasts with no significant difference between them (Figure 6(e)). As with extremely heavy rainfall, ECMWF showed poor capability in forecasting extremely strong wind (>32.7 m s⁻¹), even with the optimal-member technique (Figure 6(f)). Note that TREPS OPT exhibited no skill in forecasting extremely strong wind, whereas TREPS PM remained skilful, as, to a lesser extent, did DETER.
Figure 7. FSS for the forecasts of 1 h accumulated precipitation with a threshold of 15 mm and a neighbourhood length of 50 km as a function of lead time (hours) from various forecast sources. Percentages on the top axis represent the relative improvements of FSS over DETER averaged over the same lead-time ranges as in Figure 6; values separated by '/' refer to TREPS PM (former) and TREPS OPT (latter). Pluses on the markers of TREPS PM (TREPS OPT) indicate the lead times for which the significance level of the FSS differences between TREPS PM (TREPS OPT) and DETER is larger than 90%. Crosses on the top axis indicate the lead times for which the significance level of the FSS differences between TREPS PM and TREPS OPT is larger than 90%.
To examine the performance of TREPS for short-time heavy rainfall (>15 mm h⁻¹) in more detail, FSS with a 50 km neighbourhood length for 1 h accumulated rainfall was compared among the TREPS forecasts (Figure 7). Comparisons with other neighbourhood lengths are not shown, as the conclusions are nearly the same. FSS ranges from 0 to 1, with higher values indicating higher skill. For short-time heavy rainfall, FSS was much higher for TREPS PM than for the others during the first 18 h of forecasts; however, no sufficiently credible conclusion can be drawn from the limited observation samples available in this period. Overall, TREPS PM performed best, with its most significant superiority in the middle stage of the forecasts and improvements over DETER of 10-20%. TREPS OPT also outperformed DETER, especially at 12-24 h, with improvements of about 20%.
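For reference, FSS compares neighbourhood event fractions of forecast and observation: FSS = 1 − MSE(P_f, P_o) / (mean(P_f²) + mean(P_o²)). The sketch below assumes gridded forecast and observation fields, a square n × n neighbourhood, and zero padding outside the domain. At the 0.09° TREPS resolution (roughly 10 km), a 50 km neighbourhood would correspond to about n = 5 grid points; that conversion is an assumption of the sketch.

```python
import numpy as np

def neighbourhood_fraction(binary, n):
    """Fraction of event points in the n x n window centred on each point.

    n must be odd; points outside the domain count as non-events (zero padding).
    """
    if n % 2 != 1:
        raise ValueError("neighbourhood width n must be odd")
    ny, nx = binary.shape
    b = np.pad(binary.astype(float), n // 2, mode="constant")
    # Summed-area table with a leading row and column of zeros.
    c = np.zeros((b.shape[0] + 1, b.shape[1] + 1))
    c[1:, 1:] = b.cumsum(axis=0).cumsum(axis=1)
    window_sum = (c[n:n + ny, n:n + nx] - c[:ny, n:n + nx]
                  - c[n:n + ny, :nx] + c[:ny, :nx])
    return window_sum / (n * n)

def fss(forecast, observed, threshold, n):
    """Fractions skill score for events above threshold in n x n windows."""
    pf = neighbourhood_fraction(forecast > threshold, n)
    po = neighbourhood_fraction(observed > threshold, n)
    mse = np.mean((pf - po) ** 2)
    mse_ref = np.mean(pf ** 2) + np.mean(po ** 2)
    return float("nan") if mse_ref == 0 else 1.0 - mse / mse_ref
```

A forecast cell displaced by one grid point from the observed cell scores 0 with n = 1 but about 2/3 with n = 3, which shows how the neighbourhood rewards forecasts that are close but not exactly co-located.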

Probabilistic guidance
The probabilistic fields discussed in this study are simply the number of ensemble members exceeding a given threshold divided by the ensemble size (30). Since the optimal-member technique proved effective in some respects, its feasibility for improving probabilistic forecasting was also investigated, and an 'optimal probability' technique was proposed. First, the cost function J_i in Eq. (4) is calculated for each ensemble member; then, the total cost function J_t is obtained by summing J_i over all members; finally, a member is included in the probability calculation if its ratio J_i/J_t is smaller than a threshold, THj. As a preliminary attempt, THj was set to 0.045. In some forecasts, however, fewer than ten members had a ratio below 0.045; in such cases, THj was increased in increments of 0.005 until at least ten members were included.
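The member selection and the resulting probability can be sketched as follows. This NumPy version reads the rule as "raise THj from 0.045 in steps of 0.005 until at least ten members qualify"; the function names are hypothetical, and the per-member costs J_i are taken as given (for example from a cost function such as Eq. (4)).

```python
import numpy as np

def select_members(cost, th0=0.045, step=0.005, min_members=10):
    """Members whose share of the total cost J_i / J_t is below THj.

    THj starts at th0 and is raised by `step` until at least `min_members`
    members qualify (a reading of the rule described in the text).
    """
    cost = np.asarray(cost, dtype=float)
    total = cost.sum()
    if total == 0.0:
        return np.arange(cost.size), th0      # all members equally good
    ratio = cost / total
    need = min(min_members, cost.size)
    k = 0
    selected = np.flatnonzero(ratio < th0)
    while selected.size < need:
        k += 1
        selected = np.flatnonzero(ratio < th0 + k * step)
    return selected, th0 + k * step

def exceedance_probability(members, threshold, selected=None):
    """Fraction of (selected) members exceeding the threshold at each point.

    members: (n_members, ...) forecasts.
    """
    members = np.asarray(members, dtype=float)
    if selected is not None:
        members = members[selected]
    return (members > threshold).mean(axis=0)
```

Because the ratios sum to 1, at most 22 of the 30 members can have a ratio of 0.045 or more, so the initial threshold always admits at least eight members and the increments only need to add a few more.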
The skill of the probabilistic forecasts of both TREPS and ENS with (OPT, hereafter) and without the optimal-probability technique was then assessed. BS measures the accuracy of the EPS probability forecasts, with smaller values indicating better probabilistic forecasts. BS can be decomposed into three terms: reliability, resolution and uncertainty (Candille and Talagrand, 2005). Reliability measures the agreement between the forecast probability and the mean observed frequency, with smaller values indicating more reliable forecasts, while resolution measures the ability of a forecast to distinguish situations with different frequencies of occurrence. Because AROC measures the ability of a forecast to discriminate between two alternative outcomes, and hence reflects resolution, the resolution term was not verified separately here. AROC values exceeding 0.5 indicate skilful discrimination, with higher values indicating better discrimination.
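The decomposition can be made concrete with the standard Murphy-type partition, BS = REL − RES + UNC. A minimal NumPy sketch, assuming forecasts are grouped by their exact probability value; this grouping is natural for a 30-member EPS, whose probabilities are multiples of 1/30, and it makes the identity hold exactly.

```python
import numpy as np

def brier_decomposition(prob, obs):
    """Brier score and its reliability/resolution/uncertainty decomposition.

    prob: forecast probabilities (e.g. multiples of 1/30 for a 30-member EPS);
    obs: binary observations (1 = event). Forecasts are grouped by their exact
    probability value, for which BS = REL - RES + UNC holds exactly.
    """
    prob = np.asarray(prob, dtype=float).ravel()
    obs = np.asarray(obs, dtype=float).ravel()
    n = prob.size
    bs = np.mean((prob - obs) ** 2)
    obar = obs.mean()                       # climatological event frequency
    rel = res = 0.0
    for p in np.unique(prob):
        in_bin = prob == p
        n_k = in_bin.sum()
        o_k = obs[in_bin].mean()            # observed frequency in this bin
        rel += n_k * (p - o_k) ** 2
        res += n_k * (o_k - obar) ** 2
    unc = obar * (1.0 - obar)
    return bs, rel / n, res / n, unc
```

With few events in a probability bin (as for ENS at the strong-wind thresholds), o_k is estimated from only a handful of samples, which is one way the 'jumpiness' in reliability can arise.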
For light rainfall, BS for ENS was significantly smaller than that for TREPS (Figure 8(a)). The optimal-probability technique greatly improved the accuracy of TREPS during nearly the entire forecast period but degraded that of ENS. For heavy rainfall, TREPS originally exhibited higher BS than ENS, especially beyond 30 h (Figure 8(b)); this inferiority was largely eliminated by the optimal-probability technique, and beyond 24 h TREPS OPT was more accurate than ENS. However, the optimal-probability technique slightly degraded ENS. For extremely heavy rainfall, there were few differences between TREPS and ENS (Figure 8(c)), the only significant difference being that TREPS was more skilful than ENS at 60 h; the optimal-probability technique did not change the performance of either TREPS or ENS.
ENS was significantly superior to TREPS in probabilistic forecasts of modest wind (Figure 8(d)); however, TREPS performed markedly better than ECMWF during the first 18 h of strong-wind forecasts (Figure 8(e)). The optimal-probability technique degraded both ENS and TREPS, particularly the former (Figures 8(d) and (e)). For extremely strong wind, TREPS was clearly superior to ENS at half of the lead times (Figure 8(f)). The optimal-probability technique benefited ENS, although significantly only at some lead times.
Some biases were found in both ENS and TREPS; specifically, light rainfall was underforecast, more seriously in TREPS than in ENS (not shown), so TREPS yielded less reliable forecasts than ENS (Figure 9(a)). For heavier rainfall, both ENS and TREPS underforecast (overforecast) before (after) TC landfall (not shown). Overall, TREPS forecasts with (without) the optimal-probability technique had the best reliability for heavy (extremely heavy) rainfall among the forecasts (Figures 9(b) and (c)), although their superiority over ENS was significant only at a few lead times.
For modest wind, both ENS and TREPS overforecast (not shown). For (extremely) strong wind, ENS seriously underforecast, leaving few samples for calculating reliability and thus producing 'jumpiness' in the ENS reliability (Figure 9(f)). The differences in reliability among the forecasts were generally statistically significant only for modest wind (Figure 9(d)); overall, ENS clearly outperformed TREPS in reliability, and the optimal-probability technique degraded both to some extent.
Figure 8. BS for the forecasts of 6 h accumulated precipitation with thresholds of (a) 0.1, (b) 25, and (c) 100 mm, as well as forecasts of 10 m wind with thresholds of (d) 17.2, (e) 24.5, and (f) 32.7 m s⁻¹, as a function of lead time from various forecast sources. Grey (black) diamonds and asterisks on the bottom (top) axis indicate the lead times for which the significance level of the BS differences is larger than 90% for the comparison of TREPS (TREPS OPT) with ENS and ENS OPT, respectively. Crosses (pluses) on the bottom (top) axis indicate the lead times for which the significance level of the BS differences between TREPS (ENS) and TREPS OPT (ENS OPT) is larger than 90%.
The AROC comparison indicated that ENS discriminated significantly better than TREPS for light rainfall (Figure 10(a)). However, the optimal-probability technique consistently improved the discrimination of TREPS, and no significant differences between TREPS OPT and ENS were seen at later forecast times. The impact of this technique on ENS was neutral. For heavy rainfall, the optimal-probability technique did not significantly change AROC for ENS but raised TREPS from the lowest to the highest AROC, especially at later lead times (Figure 10(b)). ENS discrimination was poor for extremely heavy rainfall; in stark contrast, TREPS displayed good discriminating ability (Figure 10(c)). Overall, the optimal-probability technique worked well for both EPSs, especially for TREPS.
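AROC can be computed by treating each distinct forecast probability (k/30 for a 30-member EPS) as a warning threshold, collecting the hit rate and false-alarm rate at each threshold, and integrating the resulting curve with the trapezoidal rule. A minimal NumPy sketch under that assumption:

```python
import numpy as np

def aroc(prob, obs):
    """Area under the ROC curve from probabilistic forecasts.

    Each distinct forecast probability is used as a warning threshold; hit
    and false-alarm rates are joined to (0, 0) and integrated with the
    trapezoidal rule. Requires at least one event and one non-event.
    """
    prob = np.asarray(prob, dtype=float).ravel()
    obs = np.asarray(obs).ravel().astype(bool)
    n_event, n_non = obs.sum(), (~obs).sum()
    hit_rate, false_alarm_rate = [0.0], [0.0]
    for t in np.unique(prob)[::-1]:          # strictest threshold first
        warn = prob >= t
        hit_rate.append((warn & obs).sum() / n_event)
        false_alarm_rate.append((warn & ~obs).sum() / n_non)
    h = np.array(hit_rate)
    f = np.array(false_alarm_rate)
    return float(np.sum((f[1:] - f[:-1]) * (h[1:] + h[:-1]) / 2.0))
```

A binary 0/1 field, such as DETER converted to event/non-event, contributes only a single interior point to the curve, so its AROC reduces to one triangle-and-trapezoid area; this illustrates why an EPS with many probability levels can discriminate more finely.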
The optimal-probability technique significantly reversed the inferiority of TREPS relative to ENS in discrimination for modest wind (Figure 10(d)). However, the degradation in reliability outweighed the improvement in discrimination, resulting in a partial deterioration in BS (Figure 8(d)). The technique had no significant impact on ENS discrimination. For strong wind, TREPS discrimination was comparable to or slightly better than that of ENS (Figure 10(e)). The optimal-probability technique played a positive role for both TREPS and ENS, with more significant improvements for the former. Again, the poorer reliability associated with the technique affected accuracy more than did the marked improvement in discrimination (Figure 8(e)). ENS had difficulty producing probabilistic forecasts of extremely strong wind with sufficient discrimination (Figure 10(f)); this poor performance was significantly improved in the TREPS forecasts, especially those using the optimal-probability technique.
To further examine the advantage of TREPS in probabilistic forecasts of short-time heavy rainfall, DETER forecasts were compared with the probabilistic forecasts from the perturbed members. For this purpose, DETER was converted into binary fields by setting grid points exceeding a given threshold to 1 and all other points to 0. DETER, with larger BS, was less accurate than the TREPS perturbed forecasts (Figure 11(a)), and the optimal-probability technique did not significantly change the accuracy. Comparisons based on reliability and AROC revealed the advantages of the TREPS probabilistic forecasts over DETER, with the most significant improvement in reliability during TC landfall (Figures 11(b) and (c)). The impact of the optimal-probability technique on reliability was ambiguous, with slight improvements (degradations) in the late (middle) period (Figure 11(b)); however, its impact on discrimination was consistently positive and significant (Figure 11(c)).

Results of case-study
Forecasts of Rammasun initialised at 0000 UTC 17 July 2014 are presented in this section to further illustrate the performance of both deterministic and probabilistic guidance of TREPS.

Deterministic guidance
Among the forecasts, TREPS EM had nearly the smallest southwestward track bias, and it accurately predicted the landfall locations in both GD and GX. Although TREPS EM showed a significant bias after landfall in GX, some perturbed members predicted tracks similar to the best track. Thus, by properly predicting the potential spread of the track, TREPS captured the possibility of the best track. In addition, the track spread increased as the track error of TREPS EM grew after landfall in GX, suggesting that TREPS successfully predicted the forecast uncertainty. Although clear RI was observed, nearly all forecasts failed to capture it (Figures 12(b) and (c)). However, TREPS EM gave the most accurate prediction of both the intensification rate and the peak intensity. Furthermore, TREPS EM accurately predicted the rapid weakening after landfall. Notably, some perturbed members predicted a peak intensity close to the best track despite the overall low bias. Thus, TREPS could capture the potential threat, demonstrating its value for disaster prevention. The intensity spread also increased during the RI period and peaked at the time of peak intensity, again confirming the ability of TREPS to predict forecast uncertainty.
Figure 11. (a) BS, (b) reliability, and (c) AROC for forecasts of 1 h accumulated precipitation with a threshold of 15 mm as a function of lead time from various forecast sources. Pluses on the markers of TREPS (TREPS OPT) indicate the lead times for which the significance level of the score differences between TREPS (TREPS OPT) and DETER is smaller than 90%. Crosses on the top axis indicate the lead times for which the significance level of the score differences between TREPS and TREPS OPT is smaller than 90%.
Overall, TREPS predicted the rainfall amount better than ECMWF (Figure 13). For heavy rainfall in HN and southwestern GD, TREPS PM exhibited the highest skill (Figure 13(b)), followed by TREPS OPT (Figure 13(c)). Note that TRAMS9KM and ECMWF differed clearly in the extent of light rainfall, with more underestimation in the former. Short-time heavy rainfall observed in both the strait and HN, with several spots exceeding 50 mm h⁻¹ (Figure 13(h)), was predicted well by TREPS PM in both location and amount but was significantly underestimated by both DETER and TREPS OPT. Strong-wind forecasts showed no remarkable differences among the TREPS forecasts in this case (not shown).
In this case-study, TREPS OPT showed no improvement and even slight degradation relative to TREPS PM, reflecting in miniature the overall performance of the optimal-member technique for TREPS. To investigate why the technique was only partly effective, the four members used to calculate the TREPS OPT forecasts for this case were compared with TREPS EM for track and intensity (Figure 12). No member matched TREPS EM consistently for both track and intensity over the entire period. In fact, a given selected member often matched TREPS EM much better than the others at some forecast times for track or intensity but deviated from it more seriously than the others at other times. Thus, simply averaging over these members at a given forecast time inevitably leads to excessive smoothing and hence unrealistic results. This smoothing was more serious for rainfall (wind) with larger amounts and smaller scales, so the optimal-member technique performed worse for heavier rainfall (stronger wind). Moreover, because wind and rainfall differ in scale and distribution, (extremely) strong wind appeared to be more localized than (extremely) heavy rainfall (Figures 13(g) and (i)). Therefore, the optimal-member technique worked better for rainfall than for wind (Figure 6).
The principles used to select the optimal member, introduced in section 4.1, might be responsible for this deficiency, because excessive smoothing by averaging often occurred when too many members shared the same minimum cost function. By contrast, such excessive smoothing was relatively rare for ECMWF ENS owing to the large differences in cost function among its members, which were likely attributable to calculating the cost function over fewer forecast times.

Probabilistic guidance
TREPS predicted higher probabilities of extremely heavy rainfall over areas that matched the observations better than those of ENS (Figure 14). This increased the probability of detection, consistent with the higher AROC in Figure 10(c). The optimal-probability technique increased the high probabilities and reduced the area of low probability (Figures 14(b) and (d)), a common consequence of reducing the ensemble size. Consistent with its aim of selecting members closer to the true state, the optimal-probability approach proved effective by increasing realistic high probabilities while reducing false low probabilities.
For short-time heavy rainfall, DETER missed the event around the peninsula of GD (Figure 13(h)), whereas the TREPS probabilistic forecasts highlighted it with probabilities above 40% (Figure 14(e)). Although the optimal-probability technique increased the forecast probability over northeastern HN from 80 to 85% (Figure 14(f)), providing better guidance for forecasters, it reduced the area of non-zero probability elsewhere, with mixed impacts: in southern HN and the coastal area of southwestern GD, the reduction weakened the identification of the potential threat, whereas in other areas it alleviated false alarms due to overestimation. Heavy rainfall exceeding 50 mm h⁻¹ in the strait and northern HN was well detected by TREPS; however, the higher probabilities produced by the optimal-probability technique could give misleading guidance when the forecast rainfall location was biased (Figure 14(f)).
Figure 13. 6 h (shaded) and 1 h (contour) accumulated rainfall of (a-f) 36 h forecasts and (g-h) AWS observation, as well as (i) 10 m wind of AWS observation, at 1200 UTC 18 July 2014 (unit: mm). The range of (a-g) and (i) is the same as VA shown in Figure 2, while the range of (h) is the same as the dashed rectangle shown in (g). Shaded colour settings in (a-f) are the same as those in (g), while contour colour settings in (a-c) are the same as those in (h).
Compared with ENS, TREPS indicated a higher probability of strong wind around the peninsula of GD (Figure 14(a)) and predicted larger areas of non-zero probability, especially over southeastern HN, where strong wind was observed (Figure 13(i)). TREPS thus showed a considerable advantage over ENS in identifying areas of severe weather. Nevertheless, false alarms were more notable in TREPS than in ENS, indicating the need to improve the reliability of TC wind forecasts. The impact of the optimal-probability technique on strong wind was broadly similar to that for heavy rainfall described above.
The optimal-probability technique brought more improvement than degradation to TREPS but had only slight impacts on ENS. The degradation may stem from the reduced ensemble size, as the excluded members could still provide useful information on the probability. Moreover, because the ENS members varied more in terms of cost function, THj usually had to be set larger to include more members; the technique therefore excluded fewer members overall, resulting in a small influence on ENS.

Conclusions and discussion
To fill the current gap in the development and application of high-resolution regional EPSs for operational TC forecasting in China, TREPS, a mesoscale EPS based on GRAPES, was established for planned operational running by the end of 2017. In this article, TREPS was fully described and its performance in forecasting landfalling TCs in China was examined for 2014-2016.
TREPS is composed of one unperturbed and 30 perturbed members at a horizontal resolution of 0.09°. It issues 60 h forecasts twice per day and focuses solely on TCs in the WNP and SCS. Initial perturbations are generated by blending ICDp from ENS with BRp implemented around TCs from GRAPES 3D-Var. They are added to the ICs of the unperturbed member to produce the ICs of the perturbed members. Perturbations for the LBCs are downscaled from ENS, and additional perturbations in surface temperature around the TC are also applied. Uncertainties in the model physics are represented by combining MP and SPPT. The key parameters are determined by a pragmatic tuning approach based on WARNING.
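In its commonly used form, SPPT multiplies the net physics tendency by (1 + r). Here r is a random pattern that is correlated in space and evolves in time as a first-order autoregressive process. The sketch below illustrates that generic form only. The standard deviation, decorrelation time, length scale and clipping bounds are hypothetical values, not the TREPS settings. It assumes a doubly periodic domain for the spectral smoothing, and the MP component is not shown.

```python
import numpy as np


def correlated_noise(rng, shape, length_scale):
    """Spatially correlated Gaussian noise with zero mean and unit variance.

    White noise is filtered in spectral space with a Gaussian kernel that
    corresponds to a physical-space Gaussian with standard deviation
    `length_scale` grid points (doubly periodic domain assumed).
    """
    ny, nx = shape
    white = rng.standard_normal(shape)
    ky = np.fft.fftfreq(ny)[:, None]  # cycles per grid point
    kx = np.fft.fftfreq(nx)[None, :]
    kernel = np.exp(-2.0 * (np.pi * length_scale) ** 2 * (kx**2 + ky**2))
    field = np.fft.ifft2(np.fft.fft2(white) * kernel).real
    return (field - field.mean()) / field.std()


class SPPTPattern:
    """First-order autoregressive random pattern r(x, y, t) for SPPT.

    All default parameter values are hypothetical, not the TREPS settings.
    """

    def __init__(self, shape, sigma=0.5, tau_hours=6.0, dt_hours=1.0,
                 length_scale=10.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.shape = shape
        self.sigma = sigma
        self.length_scale = length_scale
        self.phi = np.exp(-dt_hours / tau_hours)  # lag-one autocorrelation
        self.r = sigma * correlated_noise(self.rng, shape, length_scale)

    def step(self):
        """Advance the pattern by one time step while preserving its variance."""
        eps = correlated_noise(self.rng, self.shape, self.length_scale)
        self.r = self.phi * self.r + np.sqrt(1.0 - self.phi**2) * self.sigma * eps
        return self.r

    def perturb(self, tendency):
        """Multiply the net physics tendency by (1 + r), with r clipped to [-1, 1]."""
        r = np.clip(self.r, -1.0, 1.0)
        return (1.0 + r) * tendency


pattern = SPPTPattern((64, 64))
physics_tendency = np.full((64, 64), 2.0e-4)  # e.g. a temperature tendency in K/s
for _ in range(6):
    pattern.step()
perturbed = pattern.perturb(physics_tendency)
```

Clipping r to [-1, 1] keeps the multiplier within [0, 2]. The perturbed tendency therefore never changes sign, which is one common design choice for multiplicative tendency perturbations.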
The contributions of the different perturbation methods to the total forecast perturbation around a TC were first investigated to assess the rationality of the TREPS design. The methods contributed differently to the total forecast perturbation in different regions, at different scales and for different variables. Nevertheless, they complemented one another in increasing the total forecast perturbation over the whole region, at every scale and for all variables. Note that some of the perturbation methods still have deficiencies that limit their benefit. Further improvements are required in three areas: the balance constraints used to generate BRp, the range over which TSp is implemented, and the key parameters, which are currently determined from only a few case-studies.
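One way to quantify how much each perturbation source contributes at different scales is to split the variance of a perturbation field into wavelength bands. The paper's own scale-separation method is not described in this section, so the sketch below uses a plain 2-D FFT instead. It assumes a doubly periodic domain, whereas a real limited-area field would first need detrending or a discrete cosine transform. The band limits and the grid spacing are illustrative.

```python
import numpy as np


def band_variance(perturbation, dx_km, bands_km):
    """Split the variance of a 2-D perturbation field into wavelength bands.

    perturbation: 2-D array (ny, nx), e.g. perturbed minus unperturbed 850 hPa wind.
    dx_km:        grid spacing in km (0.09 degrees is roughly 10 km).
    bands_km:     list of (min, max) wavelength bands in km; use np.inf for max.
    """
    field = np.asarray(perturbation, dtype=float)
    field = field - field.mean()
    ny, nx = field.shape
    power = np.abs(np.fft.fft2(field)) ** 2 / (nx * ny) ** 2  # Parseval: sums to variance
    ky = np.fft.fftfreq(ny, d=dx_km)[:, None]  # cycles per km
    kx = np.fft.fftfreq(nx, d=dx_km)[None, :]
    k = np.sqrt(kx**2 + ky**2)
    with np.errstate(divide="ignore"):
        wavelength = np.where(k > 0, 1.0 / k, np.inf)  # the k = 0 mode is excluded below
    return [power[(wavelength >= lo) & (wavelength < hi)].sum() for lo, hi in bands_km]


x = np.arange(64)
wave = np.tile(np.sin(2.0 * np.pi * x / 8.0), (64, 1))  # 8-point (80 km) wavelength
variances = band_variance(wave, 10.0, [(0.0, 50.0), (50.0, 100.0), (100.0, np.inf)])
# variances is approximately [0.0, 0.5, 0.0]
```

Applying the same function to perturbations from each source (ICDp, BRp, TSp, MP or SPPT) would give a scale-by-scale picture of how the sources complement one another.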
Objective verification of both deterministic and probabilistic guidance was carried out to evaluate the performance of TREPS and its advantages over ECMWF during 2014-2016. Additionally, the case of Rammasun initialized at 0000 UTC 17 July 2014 was studied to examine the performance of TREPS more intuitively and to better understand the physical causes underlying it.
TREPS EM provided better track forecasts than either HRES or DETER and was comparable to ENS EM. For intensity, TREPS EM significantly outperformed the other forecasts for P_min in the first 36 h and yielded the highest skill for V_max. TREPS produced an ensemble spread more consistent with the EM error than ENS did, effectively reducing the underdispersion in intensity forecasting. The case-study showed that TREPS EM could indeed predict the track, intensification rate and peak intensity more accurately than the other forecasts. Furthermore, the TREPS ensemble spread estimated the forecast uncertainty properly and also captured the potential threat from high-impact weather well, for both track and intensity. The track forecasts were overdispersive and the intensity forecasts underdispersive. This suggests that the perturbations describing uncertainties in the large-scale circulation controlling TC motion are adequate. By contrast, the perturbations representing mesoscale uncertainties in the TC structure related to its intensity may still be insufficient. Accordingly, the additional small-scale perturbations introduced in TREPS may be responsible for its more skilful and reliable intensity forecasts compared with ENS. However, the TREPS intensity forecasts were still underdispersive, indicating the need to increase the ensemble spread. Further improvements in the perturbation methods mentioned above could likely address this issue in part.
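The spread-error comparison behind the terms "underdispersive" and "overdispersive" can be written compactly for a scalar intensity measure such as P_min or V_max. The sketch below uses a common formulation: the RMSE of the EM is compared with the ensemble spread, where the spread is the square root of the mean ensemble variance with a finite-ensemble factor (M + 1)/M. The paper's exact definition may differ. The input layout (cases by members) and the verifying best-track values are assumptions.

```python
import numpy as np


def spread_error(members, observed):
    """Ensemble spread versus ensemble-mean (EM) error for a scalar quantity.

    members:  array (n_cases, n_members), e.g. forecast V_max at one lead time.
    observed: array (n_cases,), the verifying best-track values.
    Returns (spread, rmse, ratio); ratio < 1 suggests underdispersion and
    ratio > 1 suggests overdispersion.
    """
    members = np.asarray(members, dtype=float)
    observed = np.asarray(observed, dtype=float)
    n_members = members.shape[1]
    em = members.mean(axis=1)
    rmse = np.sqrt(np.mean((em - observed) ** 2))
    # Finite-ensemble factor (M + 1) / M: for a statistically consistent
    # ensemble, the corrected spread matches the EM RMSE on average.
    mean_var = np.mean(members.var(axis=1, ddof=1))
    spread = np.sqrt((n_members + 1) / n_members * mean_var)
    return spread, rmse, spread / rmse
```

As a usage example, `spread_error(vmax_members, vmax_best_track)` would be evaluated separately at each lead time (both array names are hypothetical). A ratio that stays below 1 throughout the forecast indicates the persistent underdispersion described for intensity.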
For 6 h accumulated rainfall and 10 m wind, TREPS PM was considerably superior to both HRES and ENS PM for almost all thresholds. However, TREPS PM significantly outperformed DETER only for heavy and extremely heavy rainfall, and it underperformed DETER for light rainfall. For wind, TREPS PM was generally better than DETER, although its advantage for strong wind was less significant than that for heavy rainfall. In addition, TREPS PM showed clear improvements over DETER in forecasting short-time heavy rainfall. The case-study clearly demonstrated the advantages of TREPS PM over ECMWF and DETER for heavy rainfall.
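Assuming PM here denotes the probability-matched mean, the standard technique works as follows. It keeps the spatial pattern of the EM but reassigns amplitudes drawn from the pooled distribution of all member values, so heavy-rain peaks are not averaged away. The minimal sketch below follows that generic recipe. Taking the middle value of each sorted block of M pooled values is one choice among several (every M-th value is another), and the function name is illustrative.

```python
import numpy as np


def probability_matched_mean(members):
    """Probability-matched ensemble mean.

    members: array (n_members, ny, nx), e.g. 6 h accumulated rainfall.
    The spatial ranking comes from the ensemble mean; the amplitudes come
    from the pooled distribution of all member values.
    """
    members = np.asarray(members, dtype=float)
    n_members = members.shape[0]
    em = members.mean(axis=0)
    n_points = em.size
    # Pool every member value, sort from largest to smallest, and keep one
    # value per block of n_members (here the middle value of each block).
    pooled = np.sort(members.ravel())[::-1]
    sampled = pooled.reshape(n_points, n_members)[:, n_members // 2]
    # The largest sampled value goes to the grid point with the largest EM, etc.
    order = np.argsort(em.ravel())[::-1]
    pm = np.empty(n_points)
    pm[order] = sampled
    return pm.reshape(em.shape)


members = np.array([[[0.0, 1.0], [2.0, 3.0]], [[4.0, 5.0], [6.0, 7.0]]])
pm = probability_matched_mean(members)  # [[0.0, 2.0], [4.0, 6.0]]; the EM is [[2.0, 3.0], [4.0, 5.0]]
```

In the toy example, the PM field keeps the EM ranking of grid points but spans a wider range of values than the EM. This property is why PM tends to be favoured over the plain EM for heavy-rainfall guidance.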
To improve deterministic rainfall and wind guidance from the EPS, an 'optimal member' technique was proposed and will be used in the operational version. It selects the members closest to both the EM and the latest WARNING observations. Its effectiveness was well confirmed in ECMWF forecasts for both rainfall and wind. In TREPS forecasts, however, it produced mixed results: significant improvements over PM for light rainfall but degradations for strong and extremely strong wind. As illustrated in the case-study, simply averaging the ensemble members with nearly the same minimum cost function resulted in excessive smoothing. This may partially explain the unsatisfactory performance of the technique. The principles used to select the optimal member therefore need to be improved to address this issue. The differing impacts of the technique on TREPS and ENS probably arise from differences between the two EPSs in how the optimal member is selected, and they warrant further investigation to improve the technique. Because the performance of individual ensemble members varies with forecast time, the patched-together forecasting approach proposed by Ancell (2013) may be useful and will be tested in the future.
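The optimal-member idea can be sketched as a weighted cost function with two terms. One measures each member's distance from the EM, and the other measures its error against the latest WARNING observation. The member with the smallest cost is then taken as the optimal member. The actual cost function used in TREPS is not given in this section, so this sketch rests on several hypothetical choices. These are the RMS distance to the EM, an externally supplied per-member TC error, normalisation of each term by its ensemble average, and a weight of 0.5. The optional `tie_tol` averaging illustrates the smoothing problem described in the text.

```python
import numpy as np


def cost_function(fields, tc_errors, weight=0.5, eps=1e-12):
    """Hypothetical cost combining closeness to the EM and to WARNING.

    fields:    array (n_members, ny, nx), e.g. 6 h rainfall or 10 m wind speed.
    tc_errors: array (n_members,), each member's error against the latest
               WARNING observation (e.g. TC position error in km); assumed input.
    weight:    relative weight of the EM term (hypothetical).
    Each term is divided by its ensemble average so the two are dimensionless.
    """
    fields = np.asarray(fields, dtype=float)
    tc_errors = np.abs(np.asarray(tc_errors, dtype=float))
    em = fields.mean(axis=0)
    dist_em = np.sqrt(((fields - em) ** 2).mean(axis=(1, 2)))  # RMS distance to EM
    term_em = dist_em / (dist_em.mean() + eps)
    term_obs = tc_errors / (tc_errors.mean() + eps)
    return weight * term_em + (1.0 - weight) * term_obs


def optimal_member(fields, tc_errors, weight=0.5, tie_tol=0.0):
    """Return the optimal-member field and the indices of the members used.

    Members whose cost lies within `tie_tol` of the minimum are averaged;
    a large `tie_tol` smooths the result towards the ensemble mean.
    """
    cost = cost_function(fields, tc_errors, weight)
    chosen = np.flatnonzero(cost <= cost.min() + tie_tol)
    return np.asarray(fields, dtype=float)[chosen].mean(axis=0), chosen


fields = np.array([[[0.0, 0.0]], [[1.0, 1.0]], [[5.0, 5.0]]])
best, idx = optimal_member(fields, [10.0, 10.0, 10.0])  # member 1 is selected
```

With equal WARNING errors, the member closest to the EM wins. Raising `tie_tol` until all members qualify simply returns the EM, which shows how averaging near-tied members leads to excessive smoothing.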
ENS yielded significantly more accurate probabilistic forecasts than TREPS for light rainfall, heavy rainfall and modest wind. Fundamentally, the poor accuracy of TREPS for these variables resulted from its poor reliability and discrimination. TREPS was more accurate than ECMWF in forecasts of extremely heavy rainfall, strong wind and extremely strong wind, although this advantage had limited significance in all three cases. It stemmed primarily from better discrimination, as the superiority of TREPS in reliability was not highly significant. Underforecasting of light rainfall was more serious in TREPS than in ENS, making TREPS less reliable than ENS for this variable. One suspected cause is the difference between TRAMS9KM and ECMWF in representing stratiform precipitation, which may mainly influence light-rainfall forecasting.
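The separation of probabilistic accuracy into reliability and discrimination can be illustrated with two standard tools: the Murphy decomposition of the Brier score, BS = REL - RES + UNC, and the area under the ROC curve. This section does not state whether TREPS was verified with exactly these scores, so the sketch below is a generic illustration, not the paper's verification code. Because an M-member ensemble issues probabilities k/M, binning by unique probability value makes the decomposition exact.

```python
import numpy as np


def brier_decomposition(prob, obs):
    """Brier score and its Murphy decomposition BS = REL - RES + UNC.

    prob: forecast probabilities (for an M-member ensemble these are k/M).
    obs:  binary outcomes, 1 if the event (e.g. rainfall above a threshold) occurred.
    """
    prob = np.asarray(prob, dtype=float).ravel()
    obs = np.asarray(obs, dtype=float).ravel()
    n = prob.size
    base_rate = obs.mean()
    rel = res = 0.0
    for p in np.unique(prob):
        in_bin = prob == p
        freq = obs[in_bin].mean()                      # observed frequency in this bin
        rel += in_bin.sum() * (p - freq) ** 2          # reliability (lower is better)
        res += in_bin.sum() * (freq - base_rate) ** 2  # resolution (higher is better)
    bs = np.mean((prob - obs) ** 2)
    return bs, rel / n, res / n, base_rate * (1.0 - base_rate)


def roc_area(prob, obs):
    """Area under the ROC curve via the Mann-Whitney statistic (discrimination)."""
    prob = np.asarray(prob, dtype=float).ravel()
    occurred = np.asarray(obs).ravel().astype(bool)
    events, non_events = prob[occurred], prob[~occurred]
    greater = (events[:, None] > non_events[None, :]).mean()
    ties = (events[:, None] == non_events[None, :]).mean()
    return greater + 0.5 * ties


prob = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
obs = [0, 1, 0, 1, 1, 1]
bs, rel, res, unc = brier_decomposition(prob, obs)  # bs = 0.25
auc = roc_area(prob, obs)                           # 0.75
```

In this framing, systematic underforecasting or overforecasting shows up as a larger REL term, while better discrimination shows up as a larger ROC area.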
Similarly, the poor reliability of TREPS in predicting both heavy rainfall and modest wind resulted from serious overforecasting. From a high-temporal-resolution probabilistic standpoint, the TREPS probabilistic forecasts were markedly more accurate than DETER during TC landfall, mainly owing to improved reliability. This reveals a clear advantage of ensemble forecasting over a single deterministic forecast in predicting the threat of short-time heavy rainfall: some ensemble members can capture a threat signal that the deterministic forecast misses. The case-study intuitively confirmed the capacity of TREPS to highlight severe weather.
An 'optimal probability' technique based on the optimal-member technique was developed to improve probabilistic guidance. In this technique, the ensemble members close to the optimal member within a certain threshold were selected to construct a new ensemble, from which a new forecast probability was derived. The technique consistently improved discrimination for both rainfall and wind in TREPS and substantially improved TREPS reliability for rainfall. As a result, the accuracy of rainfall forecasts generally improved. By contrast, its impact on ENS was mixed, with major deterioration in reliability and partial improvement in discrimination. For short-time heavy rainfall, the technique contributed the largest improvement in discrimination. Both the effectiveness and the limitations of the technique were demonstrated in the case-study. Its mixed performance was partially related to the risk of reducing the ensemble size: excluding members distant from the optimal member does not guarantee that all useful information in the original EPS is preserved. Hence, further work is needed to investigate this risk and to explore more appropriate strategies for improving the optimal-probability technique. Note that the bias was still considerable in TREPS forecasts of both light rainfall and modest wind, even with the optimal-probability technique. Consequently, calibration of TREPS will be essential in future work.