Evaluation of tropical cyclones over the South China Sea simulated by the 12 km MetUM regional climate model

The ability of a newly developed non‐hydrostatic regional climate model (RCM) based on the Unified Model of the UK Met Office (MetUM) at a resolution of 12 km is examined for the simulation of tropical cyclone (TC) activity affecting the South China Sea and is compared with the current version of the MetUM RCM at the resolution of 25 km. The results show that both the 25 and 12 km models can reasonably simulate the TC‐associated large‐scale environments, while the 12 km model has a better ability to simulate the South Asian monsoon. Compared with the 25 km model, the 12 km model generally improves the simulation of track density and the radial wind structure of TCs. However, the annual cycle of simulated TCs shows that both models tend to overestimate the TC frequency in May and November–January while underestimating the frequency in June–September. Compared with the 25 km model, the 12 km model produces fewer intense TCs with 10 m maximum wind speeds >30 m s−1. It is also found that both the 12 and 25 km models reproduce the observed modulation of TC activity associated with different phases of the El Niño/Southern Oscillation (ENSO), such as the reduced track density and accumulated cyclone energy during El Niño events, while the 12 km model better captures the TC–ENSO response, including the track density and the large‐scale environments, than the 25 km model.


Introduction
The South China Sea (SCS) is a semi-enclosed basin located to the west of the western North Pacific (WNP) and is one of the most active tropical cyclone (TC) regions on the planet. Countries adjacent to the SCS, including China, the Philippines and Vietnam, are particularly vulnerable to TC activity (Kossin et al., 2016). The TC activity in the SCS is closely related to the South Asian monsoon system (Lee et al., 2006; Yumul et al., 2012; Wu et al., 2013), with the start of the main TC season in the SCS usually associated with the onset of the monsoon in May. The low-level convergence associated with the South Asian monsoon and the west Pacific trade winds forms the intertropical convergence zone (ITCZ) or monsoon trough (MT) from the north SCS to the warm sea surface east of the Philippines. The Mei-Yu front is also formed in May-June by the convergence between the South Asian monsoon and the midlatitude northeasterlies. The MT and Mei-Yu front provide tropical disturbances and moisture convergence for the cyclogenesis and maintenance of TCs. The main TC season is usually terminated by the southward retreat of the South Asian monsoon system in September-October.
The natural climate variability of TC properties is found to be closely associated with the El Niño-Southern Oscillation (ENSO) over the WNP (Chan, 1985). For example, during El Niño (La Niña) events, TC activity tends to be enhanced (weakened) to the east of the Philippines and weakened (enhanced) over the SCS (Camargo and Sobel, 2005; Camargo et al., 2007). One of the most important drivers of such responses is the change in the atmospheric Walker circulation, which can affect large-scale conditions such as vertical wind shear and relative humidity (Camargo et al., 2007; Bell et al., 2013).
For the period 1992-2012, the observed TC frequency over the WNP has been shown to have significantly decreased by 0.45 TC per year, while the averaged TC intensity (measured by maximum near-surface wind speed) has increased by 0.28 m s−1 per year (Lin and Chan, 2015). However, there is some uncertainty in these findings due to differences in the methods used to determine intensity in the observations by the different TC reporting agencies (Landsea et al., 2006; Barcikowska et al., 2012). In recent years, it has been found that the socio-economic losses associated with TCs have increased over the WNP (Welker and Faust, 2013). For example, in 2012 the super-typhoon Bopha made landfall in the Philippines and left over 1000 people dead and caused losses of 1.04 billion US dollars (Daniell et al., 2013). In 2013, super-typhoon Haiyan made landfall in the Philippines and left over 6000 people dead and caused losses of 2.02 billion US dollars (Daniell et al., 2013). These super-typhoons are recorded as the two most destructive TCs over the WNP in recorded history. Moreover, other extreme TCs in recent years such as typhoon Nesat (2011) and Rammasun (2014) caused combined losses of more than five billion US dollars (NDRRMC, 2012; Mühr et al., 2014). However, such increases in the TC-associated socio-economic losses can also be the result of increased vulnerability to TCs caused by population growth in coastal regions (Welker and Faust, 2013).
With the increasing socio-economic impact of TCs across the SCS, it is important to improve our understanding of their climatology in this region and what factors control their variability. The most suitable tools for such studies are numerical models of sufficient resolution. However, it is important to understand the physical mechanisms of how models simulate TCs and the associated uncertainties, such as the impact of different model configurations. Moreover, the ability of numerical models to simulate TCs in the current climate needs to be tested in order to provide confidence in their use for future projections of TCs. In particular, models need to demonstrate their ability to simulate the climatologies of TCs, including extreme TCs. Studies based on global climate models (GCMs) with resolutions <100 km have demonstrated some ability to reproduce realistic TC climatologies for the current climate (Bengtsson et al., 2007; Murakami and Sugi, 2010; Manganello et al., 2012; Bell et al., 2013). However, these GCMs are commonly found to underestimate the intensity of TCs, which is generally considered to be due to limited spatial resolution leading to poorly resolved pressure gradients and deep convection inside the TCs (McBride, 1984; Emanuel, 1991).
To improve the simulation of TCs, GCMs with comparatively high resolution (<50 km) have been applied in recent studies. These studies have indicated improvements in the simulation of organized deep convection in TCs (Chauvin et al., 2006;Oouchi et al., 2006;Zhao et al., 2009;Reed and Jablonowski, 2011;Manganello et al., 2012;Strachan et al., 2013;Wehner et al., 2014;Roberts et al., 2015;Walsh et al., 2015) and the simulated interannual variability of the TC track distribution (Vitart and Stockdale, 2001;Manganello et al., 2014). However, increasing the model resolution in GCMs is computationally very expensive. One possible solution to this problem is to use variable-resolution GCMs with enhanced horizontal resolution over a particular region to simulate TCs with less computational cost (e.g. Zarzycki and Jablonowski, 2014;Harris et al., 2016;Hashimoto et al., 2016). Another approach is to use dynamical downscaling that nests a limited area or regional climate model (RCM) within a GCM. Such techniques simulate TCs within a focused downscaling domain of interest and require less computational resources than GCMs of the same resolution so that higher spatial resolutions can be used. Such methods have been applied in many studies of TCs and have indicated a good ability to simulate TCs and their properties (Walsh et al., 2004;Landman et al., 2005;Knutson et al., 2007;Stowasser et al., 2007;Kanada et al., 2013;Kim et al., 2014;Wu et al., 2014;Zhou et al., 2016). However, RCMs can still underestimate the intensity of TCs compared to observations (Walsh et al., 2004;Landman et al., 2005;Knutson et al., 2007;Stowasser et al., 2007) and the choice of downscaling domain can influence the simulated development and life cycle of TCs (Landman et al., 2005). 
Horizontal resolutions ranging from 1 to 5 km are required to correctly resolve the inner cores of TCs so that intensities can be correctly produced (Davis et al., 2008;Kanada et al., 2013); however, the use of such resolutions is often restricted to case-studies or a limited number of TCs (Feser and Storch, 2008;Hill and Lackmann, 2011) because of the computational cost, even for RCMs.
Previous studies using ensemble methods have revealed a considerable model uncertainty in simulating the distributions of TCs using ensembles of coarse-resolution GCMs (Camargo, 2013). RCM model uncertainty has also been discussed using ensembles driven by reanalysis data (Jin et al., 2016). In addition, ensembles using a single RCM driven by perturbed initial conditions have shown the considerable internal variability of TCs (Wu et al., 2012). Although the ensemble method can account for model uncertainty and internal variability, its application is strongly subject to the balance between the horizontal resolution needed to resolve the physical processes inside TCs and the ensemble size (Done et al., 2013). Large ensembles are required when the signal-to-noise ratio is low. Moreover, with limited computational resources, it is computationally prohibitive to use large ensembles at the spatial resolutions sufficient to simulate TC properties close to reality, and it is difficult to robustly quantify model uncertainty in simulating TCs when only a limited number of ensemble members are possible. Hence, it is still important to use single-model experiments to better understand the physics of model biases and the response of simulated TCs to different model configurations, including the effect of the spatial resolution (Davis et al., 2008), dynamical core, lateral boundary scheme (Wang et al., 2010), physical parametrizations (Kepert, 2012; Green and Zhang, 2013; Jin et al., 2014; Lim et al., 2015), the size of domain (Landman et al., 2005; Goswami and Mohapatra, 2014) and spectral nudging (Choi and Lee, 2016). To make sure that TCs are simulated realistically in RCMs, the resolution and the choice of domain need to be carefully selected.
The RCM system known as 'PRECIS' (Providing Regional Climates to Impact Studies), developed from the Met Office Unified Model (MetUM version 4.7), has been widely used for climate change impact studies in developing countries (Xu et al., 2006;McSweeney et al., 2012;Rao et al., 2014;Van Khiem et al., 2014). A recent study by Redmond et al. (2015) used this model, at 25 km resolution, to project the TC activity affecting the SCS and Vietnam under climate change. Recently, the Met Office has developed a new non-hydrostatic RCM based on the MetUM version 8.2 with an improved model resolution (Wang et al., 2013). It is important to evaluate the reliability of this model in simulating TCs and its potential improvement compared with the previous PRECIS model if it is to be used in the future for impact studies of TCs over the SCS. It will also provide useful information on the model performance in simulating TCs in the SCS for future users. Understanding the bias in the simulation of TCs also helps improve the model. Therefore, the main aim of this study is to assess the simulation of TCs affecting the SCS for the current climate by this new version of the RCM using a horizontal resolution of 12 km.
The first objective of the study is to evaluate the ability of the new RCM to simulate the large-scale environments associated with TCs in the SCS. This will examine a series of diagnostics of the large-scale environment that are important to TC activity. The second objective is to assess the ability of the new RCM to simulate the climatology of TCs including their distribution, intensity and composite structure and to relate these to the large-scale environment in order to understand their biases. The final aim of the study is to present the ability of the RCM to simulate the response of TCs to different phases of ENSO. These aims will be addressed by downscaling both a reanalysis dataset and data from a simulation of the HadGEM2-ES GCM for the current climate. This will allow a determination of the importance of the biases in the driving data and how these affect the downscaled simulations. The diagnostics of the TC-associated environmental variables and TCs from the downscaled reanalysis simulations are compared with the driving reanalysis data (to verify the TC-associated environmental variables) and the observed TCs to identify systematic biases in how the RCM represents TCs. The environmental variables from the downscaled GCM simulations are compared with those from the GCM driving data along with those from both the reanalysis and the downscaled reanalysis simulations to further indicate how the RCM performs. Throughout the study, a comparison is performed between the new 12 km MetUM RCM and existing simulations using the older PRECIS version with horizontal resolution of 25 km, to further assess the potential improvement of the latest non-hydrostatic MetUM RCM.
The study is expected to help in understanding the credibility of applying the latest non-hydrostatic MetUM RCM to the future projection of TCs under greenhouse gas (GHG) emission scenarios, reported in a following publication, and to provide information to potential users of output from the new system as to its performance for TCs in the SCS.
The article continues with a description of the models and methods in section 2, results presented in section 3 and discussion and conclusions in section 4.

RCM models
The latest high-resolution RCM based on the MetUM version 8.2 used in this study is run with a horizontal resolution of 12 km and 63 vertical levels. The configuration of this version of the model is similar to that described by Wang et al. (2013) and Walters et al. (2011). It uses an atmospheric dynamical core based on the compressible, non-hydrostatic Navier-Stokes equations (Davies et al., 2005). The advection scheme applies the Eulerian scheme for density and the semi-Lagrangian discretization for prognostic variables. The semi-implicit method is used for time integration. The model is run on a rotated-pole coordinate (with the North Pole at latitude 77.61° and longitude 295.22°) with an Arakawa C staggered grid (Arakawa and Lamb, 1977). The vertical model coordinate uses the terrain-following hybrid-height coordinate with Charney-Phillips staggering. The prognostic variables in the dynamical core include wind components (u, v, w), potential temperature, Exner pressure, dry density and mixing ratios of moist quantities. The model includes a set of physical parametrizations for clouds (Wilson and Ballard, 1999), convection (Gregory and Rowntree, 1990), radiation (Edwards and Slingo, 1996), the boundary layer (Lock et al., 2000) and gravity wave drag (Palmer et al., 1986). The Met Office Surface Exchange Scheme version 2 (MOSES-II: Essery and Clark, 2003) is used as the land surface component in the model.
To determine the potential improvements in the simulation of TCs and their associated large-scale environments using the latest 12 km model, the results are compared with those from existing simulations with the PRECIS system produced for a previous study (Wang et al., 2017) for Southeast Asia, which have demonstrated some ability to reproduce TCs in the current climate over the SCS. PRECIS uses a hydrostatic model based on the older MetUM version 4.7 with a horizontal resolution of 25 km and 19 vertical levels, and shares its dynamical core with the Met Office HadRM3P RCM (Massey et al., 2015). The atmospheric component of the model is similar to that of the HadCM3 GCM (Gordon et al., 2000). The vertical coordinate uses a terrain-following hybrid-pressure scheme (Simmons and Burridge, 1981) and a rotated latitude/longitude horizontal coordinate with the North Pole at latitude 75° and longitude 289°. The prognostic variables in the dynamical core include surface pressure, wind components (u, v), water content (vapour plus liquid), cloud ice and potential temperature adjusted to consider latent heat. The physical parametrizations of the 25 km model are similar to those in the 12 km model.

RCM boundary conditions
The ECMWF ERA-Interim reanalysis is used to provide lateral boundary conditions for both the 25 km (ERAI_25) and 12 km (ERAI_12) models. Due to the restricted simulation length for the previously produced ERAI_25 simulation that only covers the 1990-2005 period, ERAI_12 was also run for the same period for comparison. ERA-Interim is a global atmospheric reanalysis produced with a spectral model at a spectral resolution of TL255 (∼80 km) with 60 vertical levels (Dee et al., 2011). It should be noted that both RCM models are land surface-atmosphere coupled models and are not coupled to an ocean or sea-ice model so that prescribed SSTs and sea-ice are required. For both ERAI_25 and ERAI_12, the prescribed SSTs and sea-ice fraction are from Reynolds et al. (2007) which have been interpolated to each RCM model grid using bilinear interpolation.
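The bilinear interpolation of the prescribed SST and sea-ice fields mentioned above can be illustrated with a minimal sketch. It assumes a regular, non-rotated latitude/longitude source grid and a single target point; the function name `bilinear_interp` and its arguments are hypothetical and are not taken from the MetUM or PRECIS software, and the handling of coastlines and the rotated-pole grid is deliberately left out.

```python
import numpy as np

def bilinear_interp(src_lat, src_lon, field, lat, lon):
    """Bilinearly interpolate `field` (shape [nlat, nlon]) to one point.

    src_lat and src_lon are 1-D, strictly increasing coordinate arrays.
    The target point is assumed to lie inside the source grid.
    """
    # Lower-left corner of the enclosing cell, clipped so i+1 and j+1 exist.
    i = int(np.clip(np.searchsorted(src_lat, lat) - 1, 0, len(src_lat) - 2))
    j = int(np.clip(np.searchsorted(src_lon, lon) - 1, 0, len(src_lon) - 2))
    # Fractional position of the target inside the cell (0..1 in each direction).
    ty = (lat - src_lat[i]) / (src_lat[i + 1] - src_lat[i])
    tx = (lon - src_lon[j]) / (src_lon[j + 1] - src_lon[j])
    # Weighted sum of the four surrounding grid values.
    return ((1 - ty) * (1 - tx) * field[i, j]
            + (1 - ty) * tx * field[i, j + 1]
            + ty * (1 - tx) * field[i + 1, j]
            + ty * tx * field[i + 1, j + 1])

# Illustrative use with a synthetic field that is linear in lat and lon.
src_lat = np.array([0.0, 1.0, 2.0])
src_lon = np.array([0.0, 1.0, 2.0, 3.0])
LAT, LON = np.meshgrid(src_lat, src_lon, indexing="ij")
sst = 2.0 * LAT + 3.0 * LON
value = bilinear_interp(src_lat, src_lon, sst, 0.25, 1.5)
```

Applying the function to every RCM grid point would give the regridded field; a production regridding would normally also mask land points before interpolating.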
For downscaling the GCM, the HadGEM2-ES model output during 1961-2005 is used to drive both the 25 km model (Had2_25) and 12 km model (Had2_12). HadGEM2-ES is a coupled atmosphere-ocean model from the Hadley Centre of the Met Office (Jones et al., 2011), and is one of the models used in the Coupled Model Intercomparison Project phase 5 (CMIP5). The atmospheric dynamical core of HadGEM2-ES is similar to that of the MetUM version 8.2. The atmospheric component is coupled to an ocean, sea-ice and land surface ecosystem model as prescribed for CMIP5. The resolution of the atmospheric component is 1.
The domain of the previous 25 km model run is similar to that of Redmond et al. (2015). This covers the coastal area over the WNP including Vietnam, the Philippines and the southeastern coast of China (Figure 1). This allows the RCM to simulate the South Asian monsoon in the main TC season. The model domain also covers the whole SCS to allow the simulation of TC genesis over the SCS. The 12 km model domain also covers the coastal areas over the WNP. However, to optimize the use of available computational resources, a smaller domain is used for the 12 km RCM simulations. This domain extends further east of the Philippines, to include more of the possible TC genesis regions, whilst some other areas covered by the 25 km model are not included, such as east of the Bay of Bengal, the southern Malay Peninsula and southern Borneo. Although differences in the domains may make interpreting the differences between the simulations more difficult, we believe the comparison between the 25 and 12 km models is still relevant.

TC identification
To identify TCs in the various simulations used in this study, an objective tracking algorithm is used (Hodges, 1994, 1995, 1999). This system has previously been used in similar studies of TCs (Bengtsson et al., 2007; Manganello et al., 2012, 2014; Bell et al., 2013; Strachan et al., 2013; Roberts et al., 2015). The algorithm first spectrally filters the 850 hPa relative vorticity (RV) fields to the resolution of T42. This acts to reduce the noise in the vorticity at high resolutions so that more reliable and coherent tracks can be obtained. Secondly, the algorithm tracks vortices with positive vorticity centres in the filtered 850 hPa RV fields with values greater than 5 × 10−6 s−1 by minimising a cost function for track smoothness subject to adaptive constraints on the track smoothness. All tracks that last longer than 2 days are retained for further analysis. The RV fields between 850 and 200 hPa (850, 700, 500 and 200 hPa for the 25 km model, and a fuller set of levels including 500, 400, 300 and 200 hPa for the 12 km model) are spectrally filtered to T63 resolution and then the vorticity maxima at each level are iteratively added to the tracks by searching within a 5° radius of the storm centre at the previous level using a B-spline and steepest ascent approach (Bengtsson et al., 2007). The full-resolution maximum 10 m wind speed within the same geodesic radius is also added to the tracks using a direct search. The criteria for detecting the TCs amongst all the tracked vortices are as follows:
(a) the T63 RV maxima at 850 hPa should be >5 × 10−5 s−1;
(b) the difference in the T63 filtered vorticity maxima between 850 and 250 hPa should be greater than 3.0 × 10−5 s−1, for the identification of a warm core structure;
(c) vorticity maxima must exist at each level between 200 and 850 hPa;
(d) the 10 m wind speed maxima should be >17 m s−1 in the vicinity of the TC centre;
(e) criteria (a)-(d) must be valid for 1.5 days and only over the ocean.
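The identification criteria (a)-(e) can be sketched as a simple filter over an already tracked vortex. This is an illustrative sketch only and not the tracking code of Hodges: the `TrackPoint` container, the function names, and the choice of six consecutive 6-hourly points to represent 1.5 days are all assumptions. The level set and the upper level used for the warm-core test are passed in by the caller, since the text quotes 250 hPa for criterion (b) and a 200-850 hPa range for criterion (c).

```python
from dataclasses import dataclass

VORT850_MIN = 5e-5      # criterion (a): T63 RV maximum at 850 hPa (s^-1)
WARM_CORE_MIN = 3.0e-5  # criterion (b): lower minus upper RV difference (s^-1)
WIND_MIN = 17.0         # criterion (d): 10 m wind speed (m s^-1)

@dataclass
class TrackPoint:
    # T63-filtered RV maxima keyed by pressure level in hPa; a missing key
    # means no vorticity maximum was found at that level.
    vort: dict
    wind10m: float      # maximum 10 m wind speed near the centre (m s^-1)
    over_ocean: bool

def point_passes(p, levels, top_level):
    """Check criteria (a)-(d) and the ocean condition at one track point."""
    # (c) a vorticity maximum must exist at every requested level
    if any(lev not in p.vort for lev in levels):
        return False
    if 850 not in p.vort or top_level not in p.vort:
        return False
    strong_vortex = p.vort[850] > VORT850_MIN                        # (a)
    warm_core = p.vort[850] - p.vort[top_level] > WARM_CORE_MIN      # (b)
    windy = p.wind10m > WIND_MIN                                     # (d)
    return strong_vortex and warm_core and windy and p.over_ocean

def is_tc(track, levels, top_level, min_steps=6):
    """(e): criteria must hold for `min_steps` consecutive points.

    min_steps=6 assumes 6-hourly sampling and is an illustrative choice.
    """
    run = 0
    for p in track:
        run = run + 1 if point_passes(p, levels, top_level) else 0
        if run >= min_steps:
            return True
    return False
```

A track would be kept as a TC when `is_tc(track, levels, top_level)` returns `True`; a weak or land-bound point resets the count of consecutive qualifying points.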
As noted above, the diagnosis of the RV in the 25 km model has a different set of pressure levels from that in the 12 km model. For the 12 km model, it is found that there is a negligible difference between the TCs identified using the whole set of pressure levels and those identified using only the four levels (i.e. 850, 700, 500 and 200 hPa). Therefore, all of the pressure levels of RV from both the models have been used for the TC identification in this study. Following TC identification, the minimum pressure within the same 5° radius of the storm centre is added to the tracks using a combination of B-spline interpolation and steepest descent minimisation.

Verification data
The ERA-Interim data are used to verify the TC-associated environmental variables that come from the downscaled simulations. The TCs identified in the various simulations from the two RCM versions are compared to those from the World Meteorological Organization version of the International Best Track Archive for Climate Stewardship (IBTrACS: Knapp et al., 2010). This dataset provides synthesized global TC data by combining best-track data for TCs over the WNP from four international forecast agencies. This helps to address the issue of inhomogeneities in TC data due to different technical procedures used by the different agencies, which cause uncertainty in the interannual variation of TCs with relatively weak intensities (Barcikowska et al., 2012).

Genesis Potential Index
The Genesis Potential Index (GPI) is an empirical index of TC genesis developed by Emanuel and Nolan (2005), which synthesizes the influence of various large-scale factors important for TC development. This index is useful to evaluate the realism of the combined simulated large-scale environments associated with TCs in models. The GPI is defined as

$$\mathrm{GPI} = \left|10^{5}\,\eta\right|^{3/2} \left(\frac{H}{50}\right)^{3} \left(\frac{\mathrm{MPI}}{70}\right)^{3} \left(1 + 0.1\,V_{\mathrm{shear}}\right)^{-2} \qquad (1)$$

where η is the absolute vorticity at 850 hPa (s−1), H is the 700 hPa relative humidity (%), and V_shear is the 250-850 hPa vertical wind shear (VWS, m s−1). The maximum potential intensity (MPI, m s−1) is computed in the same way as Emanuel (1995) and Strachan et al. (2013):

$$\mathrm{MPI}^{2} = \frac{T_{s}}{T_{0}}\,\frac{C_{k}}{C_{D}}\left(\mathrm{CAPE}^{*} - \mathrm{CAPE}_{b}\right)$$

where T_s is the sea-surface temperature (SST), T_0 the mean outflow temperature at the level of neutral buoyancy, C_k the exchange coefficient for enthalpy, and C_D the drag coefficient. CAPE* is the convective available potential energy (CAPE) value for an air parcel lifted from saturation at sea level in reference to the environmental sounding, while CAPE_b is the CAPE value for the boundary layer. Both CAPE* and CAPE_b are evaluated at the radius of maximum winds.
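As a worked illustration of how the index combines its ingredients, the sketch below evaluates the GPI and MPI from their inputs. It assumes the commonly quoted Emanuel and Nolan (2005) form, GPI = |10^5 η|^(3/2) (H/50)^3 (MPI/70)^3 (1 + 0.1 V_shear)^(−2), and MPI^2 = (T_s/T_0)(C_k/C_D)(CAPE* − CAPE_b); the function names and argument names are hypothetical, and the numbers in the example are illustrative rather than values from the simulations.

```python
import numpy as np

def mpi(t_s, t_0, c_k, c_d, cape_star, cape_b):
    """Maximum potential intensity (m s^-1) from the assumed MPI^2 relation.

    Temperatures in K, CAPE values in J kg^-1. Negative radicands are
    clipped to zero so that unfavourable soundings return an MPI of 0.
    """
    radicand = (t_s / t_0) * (c_k / c_d) * (cape_star - cape_b)
    return np.sqrt(np.maximum(radicand, 0.0))

def gpi(eta, rh, v_pot, v_shear):
    """Genesis Potential Index in the assumed Emanuel and Nolan (2005) form.

    eta: 850 hPa absolute vorticity (s^-1); rh: relative humidity (%);
    v_pot: MPI (m s^-1); v_shear: 250-850 hPa wind shear (m s^-1).
    """
    return (np.abs(1.0e5 * eta) ** 1.5
            * (rh / 50.0) ** 3
            * (v_pot / 70.0) ** 3
            * (1.0 + 0.1 * v_shear) ** -2)

# Illustrative values only.
v_pot = mpi(t_s=300.0, t_0=200.0, c_k=1.0, c_d=1.0, cape_star=3000.0, cape_b=600.0)
reference = gpi(eta=1.0e-5, rh=50.0, v_pot=70.0, v_shear=0.0)
sheared = gpi(eta=1.0e-5, rh=50.0, v_pot=70.0, v_shear=10.0)
```

The cubic dependence on humidity and MPI and the inverse-square dependence on shear mean that modest biases in any one environmental field can change the index substantially, which is relevant when comparing GPI between the 25 and 12 km simulations.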

The diagnosis of ENSO phases
To investigate the ability of the downscaled HadGEM2-ES to simulate the response of TC activity to ENSO over the 45-year simulation period (1961-2005), the El Niño/La Niña phases as represented in the driving GCM are first determined by calculating the Niño 3.4 index (Gergis and Fowler, 2005). This index uses the monthly mean SST regionally averaged over 5°S to 5°N and 120-170°W. A 5-month moving average is used to remove the high-frequency variability of the monthly regional means. Similar to Bell et al. (2013), the El Niño/La Niña years are determined through the mean Niño 3.4 index during December to February (DJF) of the following year. For TCs in the IBTrACS observations, the observed monthly SSTs from the HadiSST 2.0 dataset (Rayner et al., 2006) (Collins et al., 2008).
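The ENSO diagnosis described above can be sketched in a few lines. The sketch assumes that the input is a monthly series of Niño 3.4 SST anomalies (regional mean minus a climatology) starting in January, and the classification threshold is a free parameter because the text does not state one; the function names are hypothetical.

```python
import numpy as np

def running_mean(x, window=5):
    """Centred moving average; edge points without a full window are NaN."""
    x = np.asarray(x, dtype=float)
    half = window // 2
    out = np.full(x.shape, np.nan)
    out[half:len(x) - half] = np.convolve(x, np.ones(window) / window, mode="valid")
    return out

def djf_index(smoothed, k):
    """Mean over December of year k and January-February of year k+1.

    Index 0 of `smoothed` is assumed to be January of year 0.
    """
    start = 12 * k + 11
    return float(np.mean(smoothed[start:start + 3]))

def classify_enso(index_value, threshold):
    """Label a DJF-mean index; `threshold` (K) is an assumed parameter."""
    if index_value >= threshold:
        return "El Niño"
    if index_value <= -threshold:
        return "La Niña"
    return "neutral"

# Synthetic example: a linear ramp is unchanged by the centred average.
anomalies = np.arange(36, dtype=float)
smoothed = running_mean(anomalies)
first_winter = djf_index(smoothed, 0)
```

TCs from the main season of year k would then be grouped by the label returned for the DJF that follows, in line with the "DJF of the following year" convention in the text.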

The large-scale environment associated with TCs
The ability to simulate the large-scale environment is an important aspect of the model's ability to reproduce TC activity. Therefore, for each model of this study, this section examines the TC-associated dynamical and thermal environments averaged over the main TC season (June to November, JJASON). Considering the limited period covered by the ERAI_25 simulation, the intercomparison of the different downscaled simulations is restricted to the period of 1990-2005. However, the full period (1961-2005) of the downscaled HadGEM2-ES is used to compare the simulated TC-ENSO response in both the RCMs with that of the observed TCs. The dynamical environment variables include the vertical wind shear (VWS), mid-tropospheric winds (MTW), RV and vertical velocity (omega). Previous studies have shown that TC activity is inhibited by the VWS associated with the tilting of the vortex (Gray, 1968; DeMaria, 1996). Here, the VWS is defined as the magnitude of the difference in the vector wind between 250 and 850 hPa. Considering the close association between the low-level monsoon circulation and the TC activity over the WNP (Wu et al., 2013), the 850 hPa wind fields from all the models are analysed to show the ability of the models to simulate the monsoon flow. Moreover, the cyclonic structure and deep convection of TCs require an environment with positive RV at low levels and large-scale ascending motion throughout the main troposphere. These are analysed here through computing the 850 hPa RV and the vertical velocity averaged over the model levels between 250 and 850 hPa (i.e. 850, 700, 600, 500, 400, 300 and 250 hPa; mass-weighted average is not used as it makes little difference).
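The two dynamical diagnostics defined above, the VWS as the magnitude of the 250 minus 850 hPa vector wind difference and the unweighted mean of omega over the listed pressure levels, can be written as a short sketch. The function names and the dictionary layout for the vertical profile are hypothetical choices made for illustration.

```python
import numpy as np

OMEGA_LEVELS = (850, 700, 600, 500, 400, 300, 250)  # hPa, as listed in the text

def vertical_wind_shear(u250, v250, u850, v850):
    """Magnitude of the vector wind difference between 250 and 850 hPa."""
    du = np.asarray(u250, dtype=float) - np.asarray(u850, dtype=float)
    dv = np.asarray(v250, dtype=float) - np.asarray(v850, dtype=float)
    return np.hypot(du, dv)

def layer_mean(field_by_level, levels=OMEGA_LEVELS):
    """Unweighted (not mass-weighted) average over the given pressure levels.

    field_by_level maps a pressure level in hPa to a scalar or array.
    """
    return sum(np.asarray(field_by_level[lev], dtype=float) for lev in levels) / len(levels)

# Illustrative values: westerlies aloft over weak low-level flow.
shear = vertical_wind_shear(u250=3.0, v250=4.0, u850=0.0, v850=0.0)
omega_profile = {lev: -0.07 for lev in OMEGA_LEVELS}  # Pa s^-1, uniform ascent
mean_omega = layer_mean(omega_profile)
```

Both functions accept arrays as well as scalars, so they can be applied to seasonal-mean fields on the model grid.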
The analysis of the thermal environment includes the use of the moist static stability (MSS), convective available potential energy (CAPE) and the mid-tropospheric humidity. The MSS is closely associated with the intensification of winds in a developing TC (Lahaye and Zeitlin, 2016). This is investigated by calculating the vertical difference of equivalent potential temperature between 250 and 850 hPa. The atmospheric instability in each model is also measured by the atmospheric CAPE that is found to be associated with the development of deep convection leading to TC genesis (Zhang et al., 2011) and the intensification of landfalling TCs (Tuluri et al., 2010), although the numerical experiment by Persing et al. (2005) suggests no significant influence of CAPE on the maximum TC intensities. In addition, the development of deep convection within a TC is associated with the latent heat release in the mid-troposphere (Cheung, 2004). To consider this process, the averaged relative humidity for levels between 500 and 700 hPa (i.e. 700, 600 and 500 hPa) is computed. The results for these large-scale environmental variables are presented in Figure 2 for the downscaled ERA-Interim and Figure (k)) compared with ERA-Interim (Figure 2(a)). However, both ERAI_25 and Had2_25 overestimate the VWS across 10°N in JJASON. It can also be seen that both Had2_25 and Had2_12 overestimate the VWS across 10°N by about 5.0 and 3.0 m s−1 respectively, which may be related to the overestimation of VWS in HadGEM2-ES (Figure 3(a)).
Compared with ERA-Interim (Figure 2(b)), both ERAI_25 and ERAI_12 (Figures 2(g) and (l)) demonstrate a good ability to capture the location of the MT extending from the SCS to the east of the Philippines, while the MT is less well simulated by HadGEM2-ES and both Had2_25 and Had2_12 (Figures 3(b), (g) and (l)) due to the underestimated tropical easterlies north of 15°N. The wind speed of the southwesterly monsoon is overestimated by about 2.4-4.8 m s−1. It is worth noting that the 12 km model (i.e. ERAI_12 and Had2_12, Figures 2(l) and 3(l)) agrees better with the reanalysis for the wind speed of the monsoon than the 25 km model (ERAI_25 and Had2_25, Figures 2(g) and 3(g)).
The ERA-Interim reanalysis (Figure 2(c)) shows that the area across 15-20°N is dominated by positive RV at 850 hPa and negative vertical velocity, which corresponds to the location of the MT. The location of these characteristics is well captured by both ERAI_25 and ERAI_12 (Figures 2(h) and (m)) and both Had2_25 and Had2_12 (Figures 3(h) and (m)). However, over the SCS, the simulated RV is over-predicted by about 4 × 10−6 s−1 in ERAI_25 and 7 × 10−6 s−1 in Had2_25 compared with the ERA-Interim reanalysis. ERAI_12 over-predicts the RV by 2 × 10−6 s−1 while Had2_12 over-predicts the RV by about 4 × 10−6 s−1, which implies about 40% less bias relative to the 25 km model in terms of the magnitude of the over-prediction of RV. In addition, the 250-850 hPa averaged vertical velocity is found to be underpredicted by 0-8 × 10−2 Pa s−1 in both the ERAI_25 and ERAI_12 (Figures 2(g) and (l)) and by about 4-8 × 10−2 Pa s−1 in both the Had2_25 and Had2_12 (Figures 3(g) and (l)). These results imply a stronger MT simulated by the 25 km model (ERAI_25 and Had2_25) compared to the 12 km model (ERAI_12 and Had2_12).
For the thermal environments, the MSS in both ERA-Interim (Figure 2(d)) and HadGEM2-ES (Figure 3(d)) shows a relatively unstable environment south of 20°N from the Indochina peninsula to the Philippines. The MSS across this area is found to be underestimated by about 2-4 K by ERAI_25 and ERAI_12 (Figures 2(i) and (n)) and Had2_25 (Figure 3(i)) compared to the ERA-Interim reanalysis (Figure 2(d)), while Had2_12 has the best performance in reproducing the distribution of MSS (Figure 3(n)). Moreover, the MSS for both ERAI_12 and Had2_12 is generally higher than that from both ERAI_25 and Had2_25. For the analysis of CAPE, both ERAI_25 and Had2_25 are found to generally overestimate the CAPE south of 20°N, while it is found to be underestimated by both ERAI_12 and Had2_12.
All the models, including ERAI_25 (Figure 2(j)), ERAI_12 (Figure 2(o)), Had2_25 (Figure 3(j)) and Had2_12 (Figure 3(o)), simulate well the location of the moist belt across 10-15°N induced by the moisture transport of the monsoon flow. However, the humidity across 20°N is overestimated by about 8% for ERAI_25 and underestimated by 3-4% for ERAI_12 compared with the reanalysis. It is also found to be overestimated by 2.5-5.0% for Had2_25.
The seasonal mean MPI and GPI for JJASON during the 1990-2005 period for each model are shown in Figure 4. This shows that all models are able to reproduce the main location of the relatively high MPI and GPI values around 15-20°N, in agreement with the reanalyses (Figure 4(a)). This also indicates the potential of the models to correctly capture the observed local maximum of TC genesis/track densities across 15-20°N (see Figure 6 discussed in section 3.3). However, both ERAI_25 (Figure 4(b)) and Had2_25 (Figure 4(e)) generally overestimate the GPI. In addition, Had2_25 generally has double the GPI of HadGEM2-ES (Figure 4(d)). This is possibly due to the higher relative vorticity and relative humidity as indicated for Had2_25 (Figures 3(h) and (j)) and HadGEM2-ES (Figures 3(c) and (e)), while there is no obvious difference between the MPI values. Meanwhile, both ERAI_12 (Figure 4(c)) and Had2_12 (Figure 4(f)) present smaller GPI compared with the reanalyses as well as those from ERAI_25 (Figure 4(b)) and Had2_25 (Figure 4(e)). Overall, from the large-scale analysis above, it is noted that the 12 km model (ERAI_12 and Had2_12) tends to produce relatively less favourable environmental conditions for TCs compared with the 25 km model (ERAI_25 and Had2_25), which is likely to impact the representation of TCs in both the RCMs.

TC annual cycle
The full annual cycle of the TC frequency for 1990-2005 from IBTrACS and the RCMs is shown in Figure 5. The RCMs simulate the main TC season during June-November reasonably well. The main TC season over the WNP has been correctly reproduced as shown by the relatively high frequency during June-September and low frequency during January-April. However, it is notable that the RCMs generally underestimate the frequency in JJAS, especially for the downscaled HadGEM2-ES. This can be explained by the stronger VWS (Figures 3(a), (f) and (k)) induced by the overestimated strength of the southwesterly monsoon (Figures 3(b), (g) and (l)). Moreover, Had2_12 tends to underestimate the JJAS frequency more than Had2_25, which may be due to the JJAS mean MSS over the SCS being overestimated by 1 K in Had2_12 (figure not shown). For both the downscaled ERA-Interim and HadGEM2-ES, the RCMs overestimate the frequency in November-January, especially for the 12 km model. This may be associated with the mean 850 hPa RV in November-January over the SCS, which is overestimated by 1-2 × 10−6 s−1 in the 12 km model (not shown).
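An annual cycle of TC frequency of the kind discussed above can be built by counting storms per calendar month and dividing by the number of years. The sketch below assumes that each TC is assigned to the month of its first track point, which is an illustrative convention and not necessarily the one used for Figure 5; the function name and the sample months are hypothetical.

```python
from collections import Counter

def monthly_frequency(genesis_months, n_years):
    """Mean number of TCs per year for each calendar month (January first).

    genesis_months: iterable of integers 1-12, one entry per TC.
    n_years: number of years covered by the sample (e.g. 16 for 1990-2005).
    """
    counts = Counter(genesis_months)
    return [counts.get(month, 0) / n_years for month in range(1, 13)]

# Illustrative sample: four storms over two years.
cycle = monthly_frequency([6, 6, 7, 11], n_years=2)
jjas_total = sum(cycle[5:9])  # June to September
```

Computing the same list for the observations and for each simulation allows the monthly biases (for example the JJAS underestimate and the November-January overestimate) to be read off as differences between lists.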

Spatial distribution of TCs
This section discusses and compares the spatial distribution of TCs for the 1990-2005 period for the different RCMs and the observations. The TC seasonal genesis and track density for JJASON are displayed in Figure 6. [...] the smaller GPI during JJASON in ERAI_12 and Had2_12 as compared with ERAI_25 and Had2_25 (Figure 4). For the TC track density over the SCS, both the downscaled ERA-Interim and downscaled HadGEM2-ES experiments underestimate the track density by 5-12 per unit area per season (Figures 6(g)-(j)) compared with IBTrACS (Figure 6(f)). In addition, east of 120°E, IBTrACS shows a maximum track density that occurs across 20°N. The track densities simulated by ERAI_25 and Had2_25 tend to be shifted northward with the maximum track density located near 25°N, while ERAI_12 and Had2_12 simulate a better track density with the maximum density occurring across 22°N.
To investigate further the simulated track densities discussed above, the TCs are categorized into different groups according to their average propagation direction during their lifetime. Figure 7 shows the track densities for tracks with different mean motion directions, including the northwestward direction (NW, Figures 7(a)-(e)) associated with the steering of tropical easterlies and the northeastward direction (NE, Figures 7(f)-(j)) corresponding to the steering of the monsoon flow. IBTrACS shows that the maximum track density of the NW tracks is around 20°N east of the Luzon Strait. The density of the observed NW tracks (Figure 7(a)) is generally greater than that of the NE tracks (Figure 7(f)). However, for ERAI_25, the maximum track density of the NW tracks (Figure 7(b)) is underestimated and less than that of the simulated NE tracks (Figure 7(g)). For ERAI_12, the model shows a well-reproduced maximum NW track density near 20°N (Figure 7(c)). The simulated maximum NW track density is greater than the maximum NE track density (Figure 7(h)), though the model tends to underestimate the NE track density compared with IBTrACS. For Had2_25 and Had2_12, both models under-predict the NW track density (Figures 7(d) and (e)) and over-predict the NE track density (Figures 7(i) and (j)). This is likely related to the environmental steering induced by the overestimated strength of the monsoon flow in both Had2_25 (Figure 3(g)) and Had2_12 (Figure 3(l)). However, due to the better simulated southwesterly monsoon, Had2_12 overestimates the NE track density less than Had2_25. In general, for both downscaled ERA-Interim and downscaled HadGEM2-ES, the 12 km model (ERAI_12 and Had2_12) performs better than the 25 km model (ERAI_25 and Had2_25) in simulating the track distributions categorized by different propagation directions.
Figure 8 shows the distribution of the mean propagation speed of TCs compared with the TCs in the observations. All the RCMs demonstrate similarities in simulating relatively low speeds over the ocean and higher speeds after recurving poleward and making landfall. For the NW propagation (Figures 8(a)-(e)), all the RCMs simulate the relatively high speed over the Philippines, Indochina peninsula and southern China. However, the magnitudes of propagation speeds are generally under-predicted. For the NE propagation (Figures 8(f)-(j)), all the RCMs can simulate the observed speeds of TCs after making landfall over China. However, the propagation speeds are also found to be slower than in the observations. It is difficult to explain such under-predicted speeds of TC motion. Although the propagation of TCs is highly dependent on the environmental steering flow, it also depends on the beta-drift, which varies with the intensity of TCs and latitude (Chan and Williams, 1987). These factors will be investigated in future work to understand the mechanisms leading to the biases in the simulated TC propagation.
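The grouping of tracks by average propagation direction used for the NW and NE track densities can be sketched as follows. This is a minimal illustration, assuming each track is a time-ordered list of (longitude, latitude) points and taking the mean motion from the net displacement between the first and last points (which has the same direction as the average of the step-by-step displacements). The function name and the simple quadrant rule are hypothetical; the exact definition used for Figure 7 may differ.

```python
import math


def classify_track(points):
    """Label a track 'NW', 'NE' or 'other' from its net displacement.

    points: sequence of (lon_deg, lat_deg) positions in time order.
    """
    (lon0, lat0), (lon1, lat1) = points[0], points[-1]
    mean_lat = math.radians(0.5 * (lat0 + lat1))
    dx = (lon1 - lon0) * math.cos(mean_lat)  # eastward displacement, scaled by latitude
    dy = lat1 - lat0                          # northward displacement
    if dy > 0 and dx < 0:
        return "NW"
    if dy > 0 and dx > 0:
        return "NE"
    return "other"


# Hypothetical tracks for illustration only.
westward_track = [(130.0, 15.0), (125.0, 17.0), (118.0, 20.0)]
recurving_track = [(118.0, 18.0), (120.0, 22.0), (125.0, 27.0)]
labels = [classify_track(westward_track), classify_track(recurving_track)]
```

Track densities for each group would then be accumulated separately from the tracks carrying each label.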

TC intensity
The simulated maximum 10 m wind speeds and the central mean sea-level pressure (MSLP) of the TCs for the 1990-2005 period were investigated to examine the ability of the RCMs to simulate the TC intensity. These parameters are analysed when TCs reach their maximum intensity in terms of the 10 m maximum wind speed. Figure 9 shows the frequencies of TCs categorized by 10 m maximum wind speed, which is calculated within the SCS (105-123°E). It is found that ERAI_12 and Had2_12 generally produce less intense TCs, with fewer TCs of intensity >30 m s−1, compared with ERAI_25 and Had2_25. ERAI_12 is also found to overestimate the frequency of TCs <25 m s−1.
To understand how the underestimated TC intensity in the 12 km model (ERAI_12 and Had2_12) may be associated with the large-scale environmental conditions, the TC-associated environmental variables in JJASON at the mean location of TC peak intensity are analysed. For the downscaled ERA-Interim, the 250-850 hPa VWS is 2.5-5.0 m s−1 in both ERAI_25 and ERAI_12. However, the 500-700 hPa relative humidity is between 52.5 and 55% in ERAI_12, in contrast to 60-62.5% in ERAI_25. Meanwhile, the large-scale ascending motion between 250 and 850 hPa is about −6 × 10−2 to −2 × 10−2 Pa s−1 in ERAI_12, in contrast to −10 × 10−2 to −6 × 10−2 Pa s−1 in ERAI_25. Similar results are also found in the downscaled HadGEM2-ES. Moreover, the mean location of the TC peak intensities in ERAI_25 and Had2_25 tends to be near 19°N, 117°E. However, the TC peak intensities in the 12 km model (ERAI_12 and Had2_12) tend to occur near the Luzon Strait, where the TCs are influenced by the blocking/channelling effect between the high topography, i.e. Taiwan and Luzon (Hsu et al., 2013; Wu et al., 2015; Liu et al., 2016). This implies that the 12 km model (ERAI_12 and Had2_12) tends to simulate a less favourable environment for TC development than the 25 km model (ERAI_25 and Had2_25). In general, the simulated TC-associated environments can help to explain the under-predicted frequency of very intense TCs in the 12 km model, although the simulation of TC intensity also depends on multiple factors such as the model dynamical core, the size of domain (Landman et al., 2005; Goswami and Mohapatra, 2014) and convection parametrization (Knutson and Tuleya, 2004), etc.
Both the simulated near-surface wind speeds and the central MSLP inside TCs are sensitive to the model physical parametrizations such as for the boundary layer (Kepert, 2012; Zhang and Marks, 2015) and cloud microphysics (Wang, 2002), which can influence the simulated quadratic wind-pressure relation in the gradient wind balance (Knaff and Zehr, 2007). Hence, the wind-pressure relationship of TCs is often tested in numerical models (e.g. Knutson et al., 2007; Manganello et al., 2012). The scatter diagram of 10 m maximum wind versus the central MSLP at the same time shows that all the RCMs can simulate reasonably well the quadratic wind-pressure relationship (Figure 10). This also implies that the varying sizes of TCs in gradient wind balance are simulated reasonably well. For IBTrACS, the lowest value of central MSLP over the model domain is 904 hPa. However, the strongest storms in all the RCMs are relatively weak in intensity measured by the lowest central MSLP, which is overestimated by around 20-50 hPa compared to IBTrACS.

TC structure
The mean composite structure for the most intense TCs (i.e. category 1-5 with 10 m maximum wind speed >32 m s−1; about 70-100 TCs for each RCM) during the 1990-2005 period is shown in Figures 11 and 12. These composites are computed in a similar way to Bengtsson et al. (2007). The TC propagation velocity is removed for each TC before compositing to present the wind composite structure in the system-relative frame of reference. This also removes the influence of the propagation direction on the horizontal asymmetries of wind structure as discussed by Frank (1977) and Rogers et al. (2012). Compared to the observed TC structure based on rawinsonde data (figures 5 and 10 in Frank (1977)), all the RCMs have correctly simulated the low central pressure (Figures 11(a), (c) and 12(a), (c)). The azimuthal average of wind illustrates that the highest tangential wind speed occurs at 850 hPa (Figures 11(b), (d) and 12(b), (d)), which agrees well with the study of Frank (1977). The models have also simulated the main low-level inflow below 850 hPa. The outflow above 250 hPa is also correctly reproduced by ERAI_12 and Had2_12 (not shown for ERAI_25 and Had2_25 due to lack of archived data). For both ERAI_25 and Had2_25, the simulated outflow below 250 hPa occurs above 700 hPa. This is lower than that in the observed radial wind structure from Rogers et al. (2012), which suggests that the main TC outflow occurs at a height above 7.5 km (300-400 hPa). This implies that ERAI_25 and Had2_25 cannot capture the layer of non-divergence between 300 and 800 hPa, which is possibly related to the coarse vertical resolution (19 levels) of the 25 km model. The 12 km model (63 levels; ERAI_12 and Had2_12), however, simulates a better vertical radial wind profile with main inflow below 850 hPa and outflow above 200 hPa. The non-divergence layer with weak radial wind between 300 and 800 hPa is also reasonably well simulated by ERAI_12 and Had2_12.
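The step of removing the propagation velocity and then viewing the wind as radial and tangential components can be sketched as below. This is an illustrative calculation for a single point, not the compositing code used in the study; the function name, argument layout and sample values are hypothetical.

```python
import math


def storm_relative_components(u, v, x, y, storm_u, storm_v):
    """Return (radial, tangential) wind at offset (x, y) from the TC centre.

    u, v             : earth-relative wind components (m s^-1)
    x, y             : eastward and northward offsets from the centre (any length unit)
    storm_u, storm_v : TC propagation velocity (m s^-1)
    Radial is positive outward; tangential is positive anticlockwise
    (cyclonic in the Northern Hemisphere).
    """
    r = math.hypot(x, y)
    if r == 0.0:
        raise ValueError("radial direction is undefined at the centre")
    # Subtract the storm motion to move into the system-relative frame.
    u_rel, v_rel = u - storm_u, v - storm_v
    radial = (u_rel * x + v_rel * y) / r
    tangential = (v_rel * x - u_rel * y) / r
    return radial, tangential


# Hypothetical point due east of the centre, storm moving westward at 5 m s^-1.
radial, tangential = storm_relative_components(-8.0, 30.0, 100.0, 0.0, -5.0, 0.0)
```

In this example the system-relative flow has a 3 m s^-1 inflow (negative radial wind) and a 30 m s^-1 cyclonic tangential wind. Averaging such components around circles of constant radius gives the azimuthal means discussed above.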

TC-ENSO teleconnection
This section discusses the response of the TC activity to different phases of ENSO from the downscaling of HadGEM2-ES during 1961-2005, in terms of TC track density and Accumulated Cyclone Energy (ACE). The ACE is calculated in the same way as in Bell et al. (2000), and is a measurement of TC activity that combines the frequency, lifetime and wind intensity of TCs. The large-scale conditions associated with the TC-ENSO teleconnection are discussed at the end of this section.
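As a rough illustration of how ACE combines frequency, lifetime and intensity, the sketch below uses the common convention of summing the squared 6-hourly maximum sustained wind (in knots) over all times at which a system is at least of tropical storm strength, scaled by 10^-4. The 34 kt threshold, the conversion from m s^-1 and the function name are assumptions; the study follows Bell et al. (2000), whose exact threshold and wind definition may differ.

```python
MS_TO_KT = 3600.0 / 1852.0  # one knot is 1852 m per hour


def accumulated_cyclone_energy(winds_ms, threshold_kt=34.0):
    """ACE in units of 1e4 kt^2 from 6-hourly maximum winds given in m s^-1."""
    total = 0.0
    for wind in winds_ms:
        wind_kt = wind * MS_TO_KT
        if wind_kt >= threshold_kt:  # count only tropical-storm strength or above
            total += wind_kt ** 2
    return total * 1.0e-4


# Hypothetical 6-hourly maximum winds (m s^-1) for one storm.
storm_winds = [12.0, 18.0, 25.0, 33.0, 28.0, 16.0]
ace = accumulated_cyclone_energy(storm_winds)
```

A seasonal or annual total would be the sum of this quantity over all TCs in the region, so more storms, longer-lived storms and stronger storms all raise the value.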
The simulated annual mean TC track densities in El Niño years relative to La Niña years during 1961-2005 are shown in Figure 13. For the observed TCs from IBTrACS (Figure 13(a)), a significantly inhibited (enhanced) TC activity is seen in El Niño (La Niña) years over the main SCS (p value <0.05, using the Wilcoxon rank-sum test). East of 130°E, the track density is significantly enhanced (reduced) during El Niño (La Niña) events. This result is consistent with the findings of Wang and Chan (2002). For Had2_25 and Had2_12, the inhibited TC activity in El Niño years is correctly captured by the models when compared with the observed TCs (Figures 13(b) and (c)). However, Had2_25 under-predicts the response of the track density over the coast of Vietnam, while Had2_12 simulates a better response in the same area. Both Had2_25 and Had2_12 overestimate the track density south of 10°N. This might be due to the overestimated TC frequency in winter (Figure 5). However, the enhancement of track density seen east of 130°E in the observations is not reproduced by either Had2_25 or Had2_12, although both of them exhibit slight increases in track density east of the Philippines. This can be explained by the underestimate in track density east of the Philippines (Figure 6) due to the overestimated wind speed of the monsoon flow in both Had2_25 and Had2_12. Consequently, the response of the westward track east of the Philippines is under-predicted.
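The significance test applied to the track density differences can be illustrated with a small, self-contained version of the Wilcoxon rank-sum test. The sketch uses the large-sample normal approximation without a tie correction, and the sample values are hypothetical; it is not the code used for Figure 13, and in practice a library routine such as `scipy.stats.ranksums` would normally be used.

```python
import math


def rank_sum_test(sample_a, sample_b):
    """Two-sided Wilcoxon rank-sum test using the normal approximation.

    Returns (z, p_value). Ties receive average ranks; no tie correction
    is applied to the variance, so this is only a rough sketch.
    """
    values = list(sample_a) + list(sample_b)
    pooled = sorted((value, idx) for idx, value in enumerate(values))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = 0.5 * (i + j) + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = average_rank
        i = j + 1
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[:n_a])
    mean = n_a * (n_a + n_b + 1) / 2.0
    std = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (rank_sum_a - mean) / std
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical track densities at one grid point (per unit area per year).
el_nino_years = [1.0, 2.0, 1.5, 0.5, 2.5, 1.2]
la_nina_years = [3.0, 4.0, 3.5, 2.8, 4.5, 3.2]
z, p_value = rank_sum_test(el_nino_years, la_nina_years)
```

A negative z with p below 0.05 would correspond to significantly lower track density in El Niño years at that grid point. Because the test uses ranks only, it makes no assumption that the densities are normally distributed.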
The positive correlation between ACE over the WNP in the main TC season and the Niño 3.4 index during July to October (JASO) has been explored by Camargo and Sobel (2005). Figure 14 presents the scatter plot of annual total ACE vs. JASO Niño 3.4 index. For the observed TCs from IBTrACS, the relationship between the JASO Niño 3.4 index based on the HadISST data and the observed ACE in the SCS shows a negative correlation (−0.22), but this is not significant (p = 0.15). This result implies a weakened (enhanced) ACE in El Niño (La Niña) years, which is consistent with Camargo and Sobel (2005). For the downscaled HadGEM2-ES, both Had2_25 and Had2_12 simulate the weakened ACE in El Niño years reasonably well. The correlation between ACE and JASO Niño 3.4 index is −0.33 (p = 0.03) and −0.24 (p = 0.11) in Had2_25 and Had2_12, respectively.
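The link between a correlation coefficient and its significance can be sketched as follows. The example assumes a Pearson correlation and 45 annual values (1961-2005); neither the type of correlation nor the sample size used for the p values is stated in this section, so both are assumptions, as are the function names.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Reported Had2_25 correlation with an assumed sample size of 45 years.
t_had2_25 = t_statistic(-0.33, 45)
```

With these assumptions the statistic is about −2.3, and the two-sided p value follows from Student's t distribution with 43 degrees of freedom. The weaker correlations give smaller statistics and therefore larger p values for the same number of years.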
Under the different phases of ENSO, it is found that the response of the TC-associated large-scale factors strongly affects the response of the TC activity (Camargo et al., 2007). Here, the responses of different large-scale environments during MJJASON to ENSO are analysed and shown in Figure 15. For the dynamical environments in El Niño years relative to La Niña years, the [...] (Figure 15(a)). This implies a less favourable environment for TCs in El Niño years. The response of the 850 hPa wind field shows that the main SCS is dominated by northwesterlies that lead to a significant slowing down of the easterly trade winds across 10-20°N (Figure 15(b)). This response leads to an eastward shift of the MT, and fewer TCs are steered into the SCS from east of the Philippines. For the vertical velocity between 250 and 850 hPa (Figure 15(c)), there is a significant increase in descent over the SCS and increased ascent over southern China. The 850 hPa RV also shows a significant increase over southern China but an insignificant decrease over the SCS. The resulting weakened large-scale ascent and cyclonic environment act to damp the TC activity in El Niño years. For the thermal environments, the ERA-Interim reanalysis presents an increased MSS between 250 and 850 hPa in El Niño years relative to La Niña years (Figure 15(d)). However, this response is not significant, with p value >0.05. The midtropospheric humidity between 500 and 700 hPa also shows a significant decrease (Figure 15(e)). The change of the thermal environments in El Niño years relative to La Niña years also causes a less favourable environment for TCs. In general, the large-scale responses shown by the reanalysis are consistent with the observed inhibited (enhanced) TC activity in El Niño (La Niña) years as presented in Figures 13 and 14.
Figures 15(f)-(j) and (k)-(o) depict the simulated large-scale responses associated with ENSO for Had2_25 and Had2_12 respectively. Compared with the ERA-Interim reanalysis, both Had2_25 and Had2_12 have reproduced the increased VWS across 20-30°N in El Niño years relative to La Niña years. However, the models have not reproduced the increased VWS east of the Philippines. Furthermore, the change of VWS in El Niño years relative to La Niña years is not significant in either Had2_25 or Had2_12. For the 850 hPa wind field, both Had2_25 and Had2_12 have reasonably simulated the increased westerlies over southern China and the northwesterlies east of the Philippines. However, Had2_25 simulates a smaller increase in the northwesterlies over the SCS and Had2_12 shows an increase in northeasterlies. The simulated change in 850 hPa wind speed exhibits certain similarities when compared with the ERA-Interim reanalysis, such as the accelerated westerlies over southern China and deceleration across 10-20°N. However, in contrast with the ERA-Interim reanalysis, these simulated changes in wind speed are not significant. The simulated response of 250-850 hPa vertical velocity in Had2_25 shows a slight increase in ascent which is different from the ERA-Interim reanalysis, although the increase in ascent over southern China is correctly simulated. Had2_12, however, simulates a better change in the 250-850 hPa vertical velocity with increased descent over the SCS. Had2_12 also shows a better simulation of the decrease in RV compared with Had2_25.
For the thermal environments, the increased MSS over the SCS in El Niño years relative to La Niña years is correctly simulated by Had2_25 and Had2_12, while Had2_12 exhibits a better magnitude of increased MSS. Both models also show the decrease in 500-700 hPa relative humidity over the SCS, but it is not significant (p > 0.05) in contrast to the ERA-Interim reanalysis.
In general, both Had2_25 and Had2_12 have some ability to simulate the change of the large-scale environments in El Niño years relative to La Niña years. However, the simulated large-scale response is generally less significant as compared with ERA-Interim, though these biases may also be related to the difference of sample size between ERA-Interim and the downscaled HadGEM2-ES simulations. The Had2_12 shows a better ability to simulate the large-scale responses, especially for the increased descent and decrease in 850 hPa RV over the SCS. The Had2_25 underestimates the decreased 850 hPa RV and ascent over the SCS, which explains the underestimated response in TC track density over the SCS and Vietnam. It is difficult to explain such biases in the annual cycle of TCs through the biases in the large-scale environments, although it is found that both ERAI_25 and Had2_25 tend to overestimate the VWS in JJAS and the 850 hPa relative vorticity is seen to be overestimated during OND in all the RCMs (figure not shown). Further studies are needed to understand the apparent overestimated genesis in May in the models. This includes the analysis of the simulated Mei-Yu front and its association with the genesis in May-June (Lee et al., 2006).

Discussion and conclusions
The distributions of TC genesis, track densities and the mean propagation speed are realistically captured by all of the RCMs. Compared to the 25 km model, the 12 km model shows a generally better simulation of track density distribution. The 12 km RCM also presents a better simulation of the vertical structure of the radial wind. However, for the analysis of intensity in terms of the 10 m maximum wind speed, the very intense TCs in the 12 km model are less intense than those from the 25 km model. This may be explained by the less favourable environment for TCs in the 12 km model. It is worth noting that the relatively favourable environment in the 25 km model may offset the underestimated frequency of intense TCs, which can be found in models at the horizontal resolution of 25 km (Chen and Lin, 2013; Redmond et al., 2015). The intensities in the models can also be influenced by the thermodynamic structure of TCs (Wang and Wu, 2004). However, due to the lack of required variables, this is not investigated in this study.
The downscaled ERA-Interim simulations generally demonstrate a better representation of TCs and the associated large-scale environmental variables compared to the downscaled HadGEM2-ES experiments. This may be explained by the biases of the simulated large-scale environments in the global HadGEM2-ES, which may be related to the poorly resolved synoptic scales in the driving model due to the coarse model resolution (1.875° × 1.25°), whereas the ERA-Interim reanalysis (with resolution of ∼80 km) provides relatively well resolved synoptic scales (Dee et al., 2011). However, the influence of the driving data resolution on the simulated TCs in RCMs is still not adequately understood and requires further study.
This study compares the simulation of TCs using RCMs with different model resolutions. However, it is difficult to attribute differences between the simulations to the differences in model resolution alone, as the results can also be affected by the different choices of model dynamical core, physical parametrizations and domain size and location used in the two RCMs. Due to limited computational resources, the two RCMs are not run at the same resolutions to assess more directly the impact of resolution or to assess the different choices of model dynamics and parametrizations. In addition, the limited period of 1990-2005 for the model intercomparison used in this article should ideally be extended to provide greater confidence in the results, although the length of period used is common for several studies using a high-resolution model (Oouchi et al., 2006; Done et al., 2013; Haarsma et al., 2013).
The results from the downscaled HadGEM2-ES show that both RCMs can capture the observed responses of TCs to the different phases of ENSO. These include the reduced track density and the weakened ACE in the El Niño phase, which are in agreement with the results from Wang and Chan (2002) and Camargo et al. (2007). Compared to Had2_25, Had2_12 shows a better simulated response of the track density. This may be explained by the better simulated response of the 250-850 hPa vertical velocity and the 850 hPa RV in Had2_12. However, most of the simulated responses of the TC-associated large-scale environments are not statistically significant in either RCM, in contrast to the ERA-Interim reanalysis. The biases in the simulated TC-ENSO responses are not only influenced by the atmospheric responses from the lateral boundary conditions, but are also affected by the simulated SST responses in the driving model. The results from Collins et al. (2008) show that the HadGEM2 model produces a wider but weaker warm SST anomaly for El Niño phases over the tropical east Pacific. Such biases require further study using the ensemble approach (e.g. Vitart and Anderson, 2001). However, it is still computationally challenging to extract the TC-ENSO signal from the sampling noise due to internal variability, and there are issues concerning the number of ensemble members and differences in driving models. Also, this study does not consider the different types of ENSO and their impacts on the TC activity over the WNP (Kim et al., 2011).
Finally, the evaluation of simulated TCs from different versions of the MetUM RCM shows a reasonable simulation of TCs and their modulation by ENSO in both versions of the model, but further experimentation is required to understand the simulated TCs and associated large-scale biases in these models and the differences between them. The study also indicates some credibility in the use of these models for the future projection of TC activity under global warming. Follow-up work will use both the 25 km model and the 12 km model to study the projection of TC climatologies associated with the changes of the large-scale environment under two different warming scenarios, i.e. representative concentration pathway 4.5 (RCP4.5) and RCP8.5 (Moss et al., 2010). The model evaluation in this article will be helpful to interpret the future projection of TC activity using these models. The results may also be of use to existing and future users of the MetUM RCM in the SCS region in assessing its usefulness in simulating TCs in this vulnerable region. Whilst such single-model studies can have large uncertainties, as demonstrated here they can still be useful in understanding the relationship between TCs and their environment, and they fall between the very high-resolution case-studies of TCs and studies using lower-resolution large ensembles.