Modelling the moistening of the free troposphere during the northwestward progression of Indian monsoon onset

Understanding and prediction of Indian monsoon onset is of paramount importance for agricultural planning, which affects food production and the gross domestic product of the country. A recent observational study suggests that the progression of the Indian monsoon onset in a northwestward direction, which is perpendicular to the mean flow, is reinforced by the moistening of the free troposphere by pre‐monsoon showers and wetting of the land surface. As the onset progresses, the mid‐tropospheric dry layer is thought to be constantly moistened from below by detrainment from shallow cumulus and congestus clouds from the southeast. The dry layer becomes much shallower towards southeast India, making the profile closer to moist adiabatic, providing favourable conditions for deep cumulus convection. Increased moistening of the free troposphere thereby pushes the northern limit of moist convection northwestwards. Here, we examine the representation of this process in hindcast simulations from the fully coupled atmosphere–ocean seasonal forecast system of the UK Met Office, GloSea5. The model effectively captures the mid‐level dry‐air intrusion from the northwest that suppresses convection over the northwestern parts of India. We also show that detrainment from shallow convection, measured by moisture tendencies around the freezing level, acts to saturate the free troposphere ahead of the monsoon onset, eroding the dry layer from the southeast. This work suggests that initialized coupled models are capable of simulating dynamic and thermodynamic processes inherent in monsoon progression during the onset.


INTRODUCTION
India receives more than 80% of its annual rainfall during the monsoon season, which spans June to September. The Indian summer monsoon (ISM) is characterized by a sudden onset and a gradual withdrawal. Rainfall over several regions of India rises sharply from below 5 mm/day to above 15 mm/day following monsoon onset (Soman and Kumar, 1993). Prediction of the monsoon onset well in advance is crucial for agricultural planning, since the sector contributes 22% of the country's GDP (Subash and Gangwar, 2014). Using a state-of-the-art Statistical Ensemble Forecasting System (SEFS), the India Meteorological Department (IMD) provides a forecast of the monsoon onset over Kerala, a small state on the southwest coast of peninsular India, 21 days in advance with a reported accuracy of ±4 days. Since 2012, however, IMD has also been issuing forecasts based on a dynamical model, a version of the Climate Forecasting System (CFS), from the National Centers for Environmental Prediction.
The ISM onset is a rather complex phenomenon governed by several processes at different spatial and temporal scales (Li and Yanai, 1996; Wang et al., 2009; Lau and Nath, 2012), which makes its simulation and skilful prediction difficult. The Indian monsoon onset can be considered as a result of the northward propagation of the tropical convergence zone beyond 10°N (Gadgil, 2003). Li and Yanai (1996) suggest that the large-scale land-sea contrast arising due to sensible heating over the Tibetan Plateau region in spring results in a reversal of the meridional temperature gradient through significant depth in the troposphere, leading to the monsoon onset. Later, Dai et al. (2013) used thermal wind equations to suggest that the upper tropospheric temperature plays a bigger role than surface temperature in triggering the onset.
Traditionally, the IMD declares onset based on sustained rainfall over Kerala. During the monsoon onset, widespread rainfall occurs over Kerala for a few days, along with stronger lower tropospheric westerly wind and higher relative humidity reaching from the surface to at least 500 hPa. If 60% of the available 14 stations in Kerala receive rainfall of about 2.5 mm or more for two consecutive days, with a co-occurrence of strong westerly winds and outgoing long-wave radiation (OLR) values below 200 W/m² in a region confined over the north equatorial Indian Ocean, then the onset is officially declared to have occurred on the second day. The climatological onset date is June 1, with a standard deviation of 8 days. However, even this traditional method has a large variability, with earliest and latest onsets differing by as much as 46 days (Ananthakrishnan and Soman, 1988). The ISM onset and progression can also be considered as a result of the first episode of the northward-propagating intraseasonal oscillation (Lau and Waliser, 2011). However, Lee et al. (2013) suggest that monsoon onset is associated more with the biweekly mode of intraseasonal variability than with the northward-propagating 30-60 day mode.
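As an illustration only, the station-based criterion described above can be sketched in a few lines of Python. This is not the operational IMD procedure: the function name, the argument layout and the decision to require the wind and OLR conditions on both days are assumptions made for the example, and the westerly-wind condition (which the text does not quantify) is passed in as a ready-made true/false flag per day.

```python
def imd_style_onset_day(station_rain, westerly_ok, olr,
                        min_fraction=0.6, rain_mm=2.5, olr_max=200.0):
    """Return the index of the second of two consecutive qualifying days.

    station_rain[d] is a list of daily rainfall totals (mm), one per station.
    westerly_ok[d] is a boolean flag for the wind criterion (assumed given).
    olr[d] is the area-mean OLR (W/m^2) over the reference region.
    Returns None if no pair of consecutive days qualifies.
    """
    def day_qualifies(d):
        wet = sum(1 for r in station_rain[d] if r >= rain_mm)
        enough_stations = wet / len(station_rain[d]) >= min_fraction
        return enough_stations and westerly_ok[d] and olr[d] < olr_max

    for d in range(1, len(station_rain)):
        if day_qualifies(d - 1) and day_qualifies(d):
            return d  # onset is declared on the second day
    return None


def make_day(n_wet, n_stations=14):
    """Build a synthetic day with n_wet stations at 5 mm and the rest dry."""
    return [5.0] * n_wet + [0.0] * (n_stations - n_wet)


rain = [make_day(5), make_day(9), make_day(10)]
onset = imd_style_onset_day(rain, [True, True, True], [250.0, 190.0, 180.0])
```

With 14 stations, 9 wet stations (about 64%) pass the 60% test whereas 8 (about 57%) do not, so in this synthetic case the second and third days qualify and the onset falls on the third day (index 2).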
Apart from the onset definition method followed by IMD, there are several other objective methods to define the start of the Indian monsoon. Xavier et al. (2007) defined an onset index based on the tropospheric thermal gradient and determined that the ISM onset is in phase with a reversal of the meridional tropospheric thermal gradient south of the Tibetan Plateau. Fasullo and Webster (2003) introduced a hydrological index for the onset and withdrawal of the monsoon based on the vertically integrated moisture flux being advected towards South Asia. Wang and Fan (1999) and Wang et al. (2001) defined monsoon onset based on the 850 hPa zonal winds over the southern Arabian Sea. Alternatively, a monsoon intensity index based on summer vertical wind shear over a large region was defined by Webster and Yang (1992), corresponding in their view to the integrated diabatic heating over the broad Asian monsoon region. While analysing model simulations, it may be better to use large-scale dynamic indices, as models simulate the large-scale circulation and winds better compared with local precipitation patterns (Alessandri et al., 2015). Moreover, onset definitions based on rainfall over small regions are more susceptible to "bogus onsets" (Flatau et al., 2001). Finally, however, Wang and LinHo (2002) defined a measure of onset at each grid point based on a rainfall threshold, but by considering the seasonality of rainfall relative to the winter rain. This allows the metric to be used objectively for different datasets, particularly those from models that may suffer biases in their mean state. Unlike other onset indices, the Wang and LinHo index is defined for each grid point and hence is a two-dimensional index rather than a one-dimensional definition of the monsoon onset for the country as a whole.
Unlike the mean monsoon winds, which are southwesterly or westerly across the Indian peninsula, the monsoon onset proceeds in a northwestward direction, perpendicular to the direction of the mean flow. This northwestward progression of the rainfall is synchronized with the wider northward and northeastward progression over East Asia and the Western North Pacific and forms part of a "grand onset pattern" starting from a common point over the southeast Bay of Bengal in late April (Wang and LinHo, 2002). Krishnamurti et al. (2012) have shown the role of soil moisture, stratiform cloud and divergent circulations for the progression of the onset isochrones from Kerala to New Delhi using observed datasets and a high-resolution mesoscale model. They found that the motion of the isochrones is very sensitive to the parametrization of soil moisture and non-convective anvil rains immediately north of the onset isochrones. Subsequently, Krishnamurti et al. (2017) noted that an initial enhancement of planetary boundary-layer moisture near the Bay of Bengal leads to moist rivers of moisture; an inflowing stream of buoyancy follows the atmospheric moist river that extends from the Bay of Bengal towards the region of extreme orographic rains. As described by Parker et al. (2016), the spatial progression of Indian monsoon rainfall following the initial onset is affected by extratropical dry-air intrusions over the northwestern part of India. The presence of such dry-air intrusions has also been noted during intraseasonal dry (break) events: Bhat (2006) noticed the presence of dry air from over the deserts around the eastern Arabian Sea instead of marine air from the equatorial region during the 2002 drought; Krishnamurti et al. (2010) showed how dry spells in 2009 are associated with these mid-level dry-air incursions from the West Asian desert regions. 
There are two ways by which the dry-air intrusions suppress local convection: firstly by thermal inversion, which will suppress convective activity, and secondly by the entrainment of dry air into an ascending parcel, which will reduce its buoyancy (Mapes and Zuidema, 1996;Sherwood, 1999;Parsons et al., 2000). Using ERA-Interim reanalysis and observations, Parker et al. (2016) found that, as the monsoon onset progresses, the mid-level northwesterlies carry dry air over India. These northwesterlies are deeper in the far northwest of India and shoal towards the southeast, since the initial monsoon onset takes place there and it is much easier to advect further supplies of moisture from the nearby ocean. As the onset begins, the dry layer over the southeast is moistened from below by shallow cumulus and congestus clouds (Parker et al., 2016), making the dry layer shoal further towards southeast India. As the onset advances, there will be stronger advection of moisture from the Arabian Sea and also as the flow turns back around from the Bay of Bengal against the east coast, which increasingly moistens the mid-level dry air. In addition, further low-level moistening occurs from recycling of moisture from the newly wetted land surface. Hence, as the onset progresses across India, the mid-level northwesterlies weaken and the free troposphere is moistened ahead of the existing deep convection, helping in the northwestward advancement of the monsoon rainfall.
Despite the importance of the monsoon onset, the modelling community finds it a challenge to simulate the progression of the rains. Sperber et al. (2013) found that the CMIP5 coupled climate models could only capture the Indian monsoon rainfall onset 10 days too late over much of India and the northwestward rainfall progression was not well captured by these models. The inability of the coupled models to capture various aspects of the monsoon can be attributed mainly to an incorrect representation of physical processes such as convection (Bollasina and Ming, 2013) and common model biases, such as in sea-surface temperature (SST), that develop elsewhere in the climate system. The CMIP5 models consistently show development of cold SST biases in the northern Arabian Sea in winter, persisting into spring and summer, resulting in weak monsoon rainfall and a delayed onset (Levine et al., 2013). A similar relationship between Arabian Sea cold SST biases and monsoon rainfall was found (Levine and Turner, 2012) in an early HadGEM3 model configuration (Hewitt et al., 2011). An experiment using the SINTEX-F2 coupled model shows that the monsoon onset estimated from both rainfall and dynamic indices shows a delay of about 13 days, but it occurs 6 days early in the atmosphere-only component of the coupled model (Prodhomme et al., 2015). This suggests that coupling and associated SST biases play a crucial role in the delayed mean onset in coupled models. However, ocean-atmosphere coupling also plays a major role in a realistic simulation of the key factors governing the interannual variability of the monsoon and its onset, such as El Niño (Xavier et al., 2007), and thus is essential for any seasonal prediction system (Krishna Kumar et al., 2005).
In this study, we use the Met Office GloSea5-GC2 fully coupled seasonal prediction model simulations to analyse the Indian monsoon onset and the associated composite evolution of dynamic and thermodynamic fields. We will also validate the effect of mid-level dry-air incursions in the progression of monsoon onset in this coupled seasonal forecasting system. To the best of our knowledge, no qualitative assessment of the effect of dry-air incursions on monsoon onset propagation in global coupled models has been made so far. The GloSea5-GC2 system used in this study is a fully coupled system, initialized in late spring, and hence the cold SST biases that normally develop in the Arabian Sea during winter and spring are smaller in these hindcasts. Thus the system provides an ideal test-bed to determine whether realistic physical mechanisms associated with the monsoon onset are present. Rather than focusing on the statistical skill metrics of the model in simulating the Indian monsoon onset (a companion article, Chevuturi et al., 2018, will address these aspects), we will use GloSea5-GC2 to understand whether a model can simulate the physics and dynamics of monsoon onset and progression when systematic biases are likely to be quite low. We will follow some of the methods from Parker et al. (2016), looking at atmospheric cross-sections parallel to the direction of monsoon onset progression (the cross-section axis is as shown in Figure 1), to understand the evolution of atmospheric structure. Section 2 gives a description of the data and methods used in the study. The onset and progression of the monsoon as simulated by the model are given in section 3. Section 3.1 explains the composite evolution of dynamic fields and section 3.2 explains the thermodynamic evolution of atmospheric structure and mid-level dry-air incursions following the onset. The results are concluded in section 4.
FIGURE 1 Map of the study region with June-September rainfall climatology (mm/day) from GloSea5-GC2 for May 9 start date as contours. Red dashed lines represent the monsoon progression isochrones from May 20-July 15. The black solid line represents the axis for the creation of vertical cross-sections from northwest to southeast, used later in this study, and the extended blue dashed line shows the extension of the axis used for the vertical cross-sections of cloud fraction in Figure 9. The three red circles represent the locations close to Jodhpur, Nagpur and Visakhapattanam (from northwest to southeast, respectively). These three locations are used for generating the tephigrams in Figure 7

The GloSea5-GC2 hindcasts
In this study, we analyse the dynamic and thermodynamic features associated with the Indian monsoon onset in the hindcast simulations of the Met Office Global Seasonal Forecast System 5 (GloSea5-GC2: MacLachlan et al., 2015; Williams et al., 2015). The GloSea5-GC2 seasonal forecast system has a horizontal resolution of 0.8° × 0.5° in the atmosphere and land-surface components and 0.25° in the ocean and sea-ice models. The coupled GC2 model consists of the following: the Met Office Unified Model (MetUM) Global Atmosphere 6.0 (GA6.0: Brown et al., 2012; Walters et al., 2017) with a dynamical core known as Even Newer Dynamics for General Atmospheric modelling of the Environment (ENDGame: Wood et al., 2014); the land-surface model Joint UK Land Environment Simulator (JULES) Global Land 6.0 (GL6.0: Best et al., 2011; Walters et al., 2017); the ocean model Global Ocean 5.0 (GO5.0) based on NEMO (Madec, 2008; Megann et al., 2013); and the sea-ice model, the Los Alamos Sea Ice Model (CICE) Global Sea Ice 6.0 (GSI6.0: Hunke and Lipscomb, 2010; Rae et al., 2015). The coupled model has a vertical resolution of 85 levels in the atmosphere (with model top at 85 km and ≈ 50 levels below 18 km), four soil levels, 75 levels in the ocean and five sea-ice thickness categories. The MetUM and JULES are initialized from daily ERA-Interim reanalysis (Dee et al., 2011) and NEMO and CICE are initialized from GloSea5 Ocean and Sea Ice analysis. The GA6.0 (atmosphere) and GL6.0 (land) models run as part of the same model executable and are "tightly coupled" on the same grid. Similarly, GO5.0 (ocean) and GSI6.0 (sea-ice) models are compiled into the same model executable and operate on the same grid. The ocean-atmosphere coupling of GA6.0/GL6.0 to GO5.0/GSI6.0 is performed every 3 hr using the OASIS3 coupler (Valcke, 2013). This results in the exchange and interpolation of model fields such as fluxes between the models.
The momentum, fresh water, heat fluxes and wind stress are passed via OASIS from the atmosphere to the ocean, whereas coupled fields such as SST, surface velocities, ice fraction, ice and snow thickness are passed from the ocean to the atmosphere. The coupling frequency of 3 hr allows the diurnal cycle to be better resolved in both atmosphere and ocean boundary layers. Model uncertainties are represented by a stochastic physics scheme in order to generate ensemble spread: Stochastic Kinetic Energy Backscatter v2 (SKEB2: Bowler et al., 2009). This scheme adds random vorticity perturbations during the model integration by a physical parametrization to replace kinetic energy that has been dissipated. This scheme is described in detail in Bowler et al. (2009). GloSea5-GC2 uses the lagged-start ensemble generation technique (MacLachlan et al., 2015) to represent the initial-condition uncertainty. In GloSea5-GC2, physical parametrizations have improved compared with the earlier versions. This includes a modified microphysics scheme, better representation of sedimentation of small droplets, realistic treatment of in-column evaporation, etc. For more details, please refer to Walters et al. (2017). The hindcasts relevant for study of the Indian monsoon are available for three start dates: April 25, May 1 and May 9, each integrated over 140 days for a period of 20 years spanning 1992-2011. Climate forcings are set to observed values until 2005 and follow RCP4.5 afterwards. GloSea5-GC2 hindcasts consist of eight ensemble members at each start date for years 1992-1995, 2010 and 2011 and five ensemble members for years 1996-2009. The ensemble mean is used in this study.

Reanalysis and observational datasets
Total precipitation, winds and relative humidity from the ERA-Interim re-analysis dataset (Dee et al., 2011) for the period 1992-2011 are used for comparison with GloSea5-GC2 hindcasts. ERA-Interim has a horizontal resolution of 0.75° × 0.75°. ERA-Interim relative humidity and winds are regridded to the model grid (using bilinear interpolation) for better comparison of the evolution of the atmospheric structure. ERA-Interim precipitation over land is better than that over ocean, yet it retains a positive bias compared with the Global Precipitation Climatology Project (GPCP: Adler et al., 2003; Huffman et al., 2009) precipitation (Dee et al., 2011). Hence we also use daily precipitation from the APHRODITE gridded rain-gauge dataset (Yatagai et al., 2012) in this study for comparison. APHRODITE data have a spatial resolution of 0.5° × 0.5° and comprise station data of high temporal and spatial coverage, especially over the Himalayas. Prakash et al. (2015) suggest that the APHRODITE dataset performs well for the Indian region compared with most other gridded gauge datasets.
To calculate the Wang and LinHo (2002) index (see next section for a description), we have also used pentad mean rainfall estimates for the period 1992-2011 at 2.5° × 2.5° resolution from the GPCP. The model data are regridded to the GPCP grid using a distance-weighted average for better comparison.
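For readers unfamiliar with the bilinear regridding mentioned above for the ERA-Interim fields, the following is a minimal sketch of bilinear interpolation to a single target point on a regular latitude-longitude grid. It is a generic textbook formulation, not the regridding tool used in the study; the function name, the nested-list field layout and the sample coordinates are all hypothetical.

```python
from bisect import bisect_right


def bilinear(lons, lats, field, lon, lat):
    """Bilinearly interpolate field[j][i], defined at (lats[j], lons[i]),
    to the point (lon, lat). lons and lats must be ascending and the target
    point is assumed to lie inside the grid."""
    # Index of the grid cell (lower-left corner) containing the target point.
    i = min(max(bisect_right(lons, lon) - 1, 0), len(lons) - 2)
    j = min(max(bisect_right(lats, lat) - 1, 0), len(lats) - 2)
    # Fractional position of the target point within that cell.
    tx = (lon - lons[i]) / (lons[i + 1] - lons[i])
    ty = (lat - lats[j]) / (lats[j + 1] - lats[j])
    # Interpolate along longitude on both latitude rows, then along latitude.
    south = (1 - tx) * field[j][i] + tx * field[j][i + 1]
    north = (1 - tx) * field[j + 1][i] + tx * field[j + 1][i + 1]
    return (1 - ty) * south + ty * north


# Synthetic 0.75-degree source grid carrying a field that is linear in
# longitude and latitude, which bilinear interpolation reproduces exactly.
src_lons = [70.0, 70.75, 71.5]
src_lats = [10.0, 10.75, 11.5]
src_field = [[x + 2.0 * y for x in src_lons] for y in src_lats]
value = bilinear(src_lons, src_lats, src_field, 70.8, 10.5)
```

Regridding a whole field amounts to repeating this calculation for every target grid point. The distance-weighted averaging used for the GPCP grid is a different scheme and is not shown here.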

Onset definition methods
In this study, we have used a monsoon circulation index (Wang and Fan, 1999; Wang et al., 2001) described in section 1. We calculate the Wang-Fan index climatology separately for both reanalysis and model for the years 1992-2011 and use the index value corresponding to June 1 (climatological onset date in Kerala) as the threshold. For each year, the onset date is then defined as the first day on which the index exceeds the threshold value, with the provision that it stays above the threshold for at least seven consecutive days. The condition of persistence for 7 consecutive days is to ensure that the index value is not induced by a synoptic event, but reflects a strong establishment of the southwest monsoon (Wang et al., 2009). In order to validate the model further, we have used the Wang-LinHo index, which shows the spatial progression of the onset at each grid point (Figure 2). As mentioned in section 2.2, the climatological pentad rainfall from GloSea5-GC2 is regridded to the GPCP grid. The pentad time series is then smoothed with a five-pentad running mean following Sperber et al. (2013) and the relative pentad mean rainfall rate at each grid point is calculated as
$$RR_i = R_i - R_{\mathrm{JAN}},$$
where $R_i$ is the pentad mean precipitation rate for the $i$th pentad and $R_{\mathrm{JAN}}$ is the January mean precipitation rate. Here $i = 24$ is the earliest pentad for which we could obtain data for each year from the GloSea5-GC2 hindcasts with a start date of April 25. The Julian pentad in which the relative pentad-mean rainfall rate exceeds 5 mm/day is defined as the onset pentad. For more detail, please refer to Wang and LinHo (2002).
FIGURE 2 Spatial distribution of onset pentads obtained by applying the Wang and LinHo (2002) index to (a) GPCP pentad rainfall data set and (b) GloSea5-GC2 hindcasts using the April 25 start date ensemble only, both for the period 1992-2011. Pentads are defined relative to January 1. Hence the climatological onset date June 1 falls in pentad 31
Note that, by using the climatological pentad rainfall, we are examining the pattern of onset of the mean rainfall (rather than the average of the onset from each ensemble member/year of the hindcast), in a similar manner to Sperber et al. (2013). Wang and LinHo found that the spatial pattern of the onset pentads identified using this index shows a good agreement with the pattern of onset based on rain-gauge observations.
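The grid-point onset calculation described in this section can be sketched as follows for a single grid point. This is an illustrative reading of the procedure, not the authors' code: the relative rainfall rate is taken as the smoothed pentad rate minus the January mean, as we understand the Wang and LinHo (2002) definition, and the function names, the shrinking-window treatment at the ends of the series and the synthetic rainfall values are assumptions.

```python
def pentad_of_day_of_year(day_of_year):
    """Pentad number (1-based) relative to January 1, for a non-leap year."""
    return (day_of_year - 1) // 5 + 1


def running_mean(values, window=5):
    """Centred running mean; the window shrinks at the ends of the series
    (an assumption, since the end treatment is not described in the text)."""
    half = window // 2
    smoothed = []
    for k in range(len(values)):
        lo, hi = max(0, k - half), min(len(values), k + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def onset_pentad(pentad_rain, january_mean, first_pentad=24, threshold=5.0):
    """First pentad whose smoothed rainfall rate, relative to the January
    mean, exceeds the threshold (mm/day). pentad_rain[0] corresponds to
    first_pentad. Returns None if the threshold is never exceeded."""
    for offset, rate in enumerate(running_mean(pentad_rain)):
        if rate - january_mean > threshold:
            return first_pentad + offset
    return None


# Synthetic climatological pentad rainfall (mm/day) starting at pentad 24.
rain = [1.0, 1.0, 1.0, 2.0, 4.0, 8.0, 12.0, 14.0, 15.0, 15.0]
onset = onset_pentad(rain, january_mean=0.5)
june_1_pentad = pentad_of_day_of_year(152)  # June 1 is day 152 in a non-leap year
```

A grid point where the relative rate never exceeds 5 mm/day returns no onset pentad at all under this definition.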

Application of onset metrics in GloSea5-GC2 and model validation

Mean monsoon behaviour in GloSea5-GC2
GloSea5-GC2 has moderate skill at predicting the all-India monsoon rainfall; however, the skill is much higher at predicting the large-scale circulation. In terms of several metrics, the GloSea5-GC2 seasonal forecast system has skill similar to that of other state-of-the-art seasonal forecast systems; however, coupled biases exist. The GloSea5-GC2 system has a dry bias over India, with an all-India rainfall deficit of 0.72 mm/day owing to the climatologically late onset of the monsoon in this model, which reduces the precipitation in May and June. This bias is far smaller than the typical bias in the uninitialized coupled models of CMIP5, which is around 5 mm/day in the multi-model mean (Sperber et al., 2013). A coupled mean-state bias in the Indian Ocean has been found to lead to erroneous SSTs resembling a positive Indian Ocean Dipole (IOD) in the mean state, leading to a weak relationship between IOD and the all-India rainfall. However, the ENSO-monsoon teleconnection is well captured in GloSea5-GC2. For more details of GloSea5-GC2 hindcasts and their monsoon biases, please refer to Johnson et al. (2017). Jayakumar et al. (2017) subsequently found that air-sea interactions in the central equatorial Indian Ocean in the GloSea5-GC2 hindcasts do not support the initialization and northward propagation of monsoon intraseasonal oscillations, reinforcing the equatorial rainfall mean state bias in GloSea5-GC2. As we shall show, this appears to have little bearing on the evolution of the atmospheric structure during the monsoon onset and its northwestward progression.
FIGURE 3 Wang and Fan (1999) index (m/s) from ERA-Interim reanalysis (red line) and GloSea5-GC2 seasonal hindcasts for the three start dates shown separately (black/grey lines) for the period 1992-2011. The first time point in Figure 3b corresponds to May 9, which is the earliest day common to all three start date ensembles

Onset metrics applied to GloSea5-GC2
Before applying the onset metrics to GloSea5-GC2, we evaluated how well GloSea5-GC2 simulates the all-India summer (June-September, JJAS) monsoon mean rainfall (Figure 3a). The all-India (averaged over the country as a whole) JJAS mean rainfall (AIR) simulated by GloSea5-GC2 (mean = 8.49 mm/day) is similar in magnitude to the GPCP AIR (mean = 8.24 mm/day) for the period 1992-2011 and falls within the standard deviation of the GPCP rain (standard deviation = 0.53 mm/day). GloSea5-GC2 slightly overestimates the rainfall in most years; however, in 1997, 2004 and 2007, GloSea5-GC2 underestimates the rainfall; these are strong El Niño years and GloSea5-GC2 is too effective at capturing ENSO-monsoon teleconnections. The climatological evolution of the Wang-Fan index calculated from the model hindcasts for different start dates is comparable to that calculated from the ERA-Interim reanalysis (Figure 3b). During the first few days starting from May 9, only the index calculated from the May 9 start date lies close to the reanalysis. As time progresses, this initialization drifts towards the values calculated from the other two start dates of GloSea5-GC2 and these three values are slightly underestimated compared with ERA-Interim. However, by mid-June, the index values from the model hindcasts are in close agreement with those from the reanalysis. In an alternative definition, the onset circulation index of Wang et al. (2009), which uses the wind over the southern Arabian Sea region, also shows similar results (not shown). In this article, we have used the Wang-Fan monsoon circulation index as the main onset definition; we will construct composites of the evolution of atmospheric fields relative to this date.
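The threshold-and-persistence rule behind the circulation onset date (first day above the threshold, provided the index then stays above it for at least seven consecutive days) can be sketched as below. The function name and the synthetic index values are hypothetical; in the study the threshold is the climatological index value on June 1, which is simply passed in as a number here.

```python
def circulation_onset_day(index, threshold, persistence=7):
    """Return the first day (0-based position in the daily series) on which
    the index exceeds the threshold and remains above it for at least
    `persistence` consecutive days, or None if that never happens."""
    run = 0  # length of the current unbroken run above the threshold
    for day, value in enumerate(index):
        run = run + 1 if value > threshold else 0
        if run == persistence:
            return day - persistence + 1  # first day of the qualifying run
    return None


# Synthetic daily index (m/s): a one-day spike on day 1 is rejected because
# it does not persist, and the sustained run beginning on day 3 is accepted.
daily_index = [1.0, 6.0, 2.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 6.0, 3.0]
onset_day = circulation_onset_day(daily_index, threshold=5.0)
```

The short spike on day 1 plays the role of a synoptic event that the persistence condition is meant to filter out.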
The spatial pattern of the onset in GloSea5-GC2 is validated using the Wang-LinHo index (Figure 2). The Wang-LinHo index calculated from GPCP (Figure 2a) suggests that, during pentad 31 (June 1-5), rainfall onset occurs over the southwest coast of peninsular India on the westward side of the Western Ghats and the northeastern parts of India near the northern Bay of Bengal. By pentad 34, the onset hits the east coast of peninsular India and then progresses in a northwestward direction over mainland India, covering the whole of the country by pentad 38 (July 5-10). The Wang-LinHo index calculated from GloSea5-GC2 (Figure 2b) shows that the onset occurs on the southwest coast of India during pentad 30. However, the onset hits the northeastern parts of India only by pentad 34, before it progresses in a northwestward direction and covers the whole of India by mid-July. Figures 2a,b show that the onset of the monsoon rainy season occurs progressively northwestwards from the western side of the Bay of Bengal towards inland regions over India; therefore GloSea5-GC2 is very effective at capturing this progression of the onset.
However, the onset pentads on the eastern side of the Western Ghats using the Wang-LinHo index are not well captured in the GloSea5-GC2 hindcasts (Figure 2b). This may be attributed to the fact that the relative pentad mean rainfall rate never exceeds the 5 mm/day threshold on the eastern side of the Western Ghats, which is in turn due to a negative mean rainfall bias in the GloSea5-GC2 hindcasts over this region (not shown). Johnson et al. (2017) show that there is a negative mean rainfall bias of about 1-5 mm/day over the Western Ghats and to the east of the Ghats. However, the precipitation details on a fine scale are not perfectly represented in the model, due to its coarse spatial resolution. Figure 4 shows the spatial pattern of the rainfall as it progresses following the onset as defined by the circulation index. We show this at the point of circulation onset (averaged across 3 days), a week after circulation onset (averaged for 3 days from a week after onset), 2 weeks after circulation onset (averaged for the 14th, 15th and 16th days after onset) and a month after circulation onset (3 day average from a month onwards). The circulation onset date for each year is identified based on the Wang-Fan monsoon circulation index as described above. The onset dates based on the ERA-Interim wind data are used as the onset dates for APHRODITE too.
FIGURE 4 Spatial pattern of rainfall (mm/day) relative to the circulation onset at various lags: during circulation onset (mean rainfall for 3 days from the day of onset; first column), 1 week after circulation onset (mean rainfall for 3 days from a week after onset; second column), 2 weeks after onset (mean rainfall for 3 days starting 2 weeks after the onset; third column) and a month after onset (mean rainfall for 3 days from a month after onset; fourth column) from the multi-year, ensemble mean of (a-d) GloSea5-GC2 hindcasts, (e-h) ERA-Interim reanalysis and (i-l) APHRODITE observational dataset. For GloSea5-GC2 hindcasts, the onset days are determined using the Wang-Fan Index for the simulations produced with May 9 start date. For APHRODITE and ERA-Interim, the onset dates are determined using the Wang-Fan index calculated from the ERA-Interim reanalysis
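The onset-relative compositing used for the rainfall maps (3-day means starting at the onset, one week, two weeks and one month later, averaged over years) can be sketched for a single grid point as follows. This is an assumed reading of the procedure rather than the authors' code: the function name and data layout are hypothetical, "a month" is taken as 30 days, and years whose series is too short for a given lag are simply skipped.

```python
def lagged_composite(series_by_year, onset_by_year,
                     lags=(0, 7, 14, 30), window=3):
    """For each lag (days after onset), average `window` consecutive days
    starting at onset + lag in every year, then average across years.

    series_by_year maps year -> list of daily values (e.g. rainfall, mm/day).
    onset_by_year maps year -> onset position in that year's daily series.
    """
    composite = {}
    for lag in lags:
        yearly_means = []
        for year, series in series_by_year.items():
            start = onset_by_year[year] + lag
            chunk = series[start:start + window]
            if len(chunk) == window:  # skip years that run out of data
                yearly_means.append(sum(chunk) / window)
        composite[lag] = (sum(yearly_means) / len(yearly_means)
                          if yearly_means else None)
    return composite


# Two synthetic "years" of daily values with different onset positions.
series = {
    1992: [float(d) for d in range(40)],
    1993: [2.0 * d for d in range(40)],
}
onsets = {1992: 2, 1993: 4}
result = lagged_composite(series, onsets)
```

Applying the same function to every grid point, with onset dates taken from the circulation index, would give one map per lag.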

ONSET AND PROGRESSION OF RAINFALL
The spatial pattern of rainfall from GloSea5-GC2 shows that circulation onset is characterized by heavy rainfall over the southwest coast of peninsular India, northeastern parts of India and Bangladesh (Figure 4a). As the rainfall progresses following the circulation onset, rainfall extends towards the central and northern parts of India (Figure 4b-d). A week after the circulation onset (Figure 4b), the southeast Arabian Sea near the southwest coast of India and northeastern regions of India receive rainfall greater than 20 mm/day. The rains progress in a northwestward direction over the Indian land region and a month later the monsoon is established over the whole of India (Figure 4d). The progression of rainfall following the circulation onset in the ERA-Interim reanalysis (Figure 4e-h) shows a similar pattern, suggesting that GloSea5-GC2 is simulating the climatological progression of rainfall well. However, as the onset proceeds, the rainfall over the southeast Arabian Sea is more intense and confined to the southwest coast of India in GloSea5-GC2, compared with ERA-Interim. APHRODITE gridded gauge data also show heavy precipitation on the western side of the Western Ghats and northeastern parts of India following the onset, with the rainfall progressing in a northwestwards direction towards central India (Figure 4i-l). Hence it is evident that the simulated spatial pattern of the rainfall progression by GloSea5-GC2 compares well with those from observations and reanalysis.
FIGURE 5 Spatial pattern of relative humidity (%) and winds (vectors, m/s) averaged over 3 days from onset (first column), 7 days after onset (second column), 14 days after onset (third column) and 1 month after onset (fourth column) from the ensemble mean of GloSea5-GC2 simulations at different pressure levels, averaged over the years 1992-2011. The reference vector is 5 m/s. The white areas represent lack of data due to high orography
We are now in a position to assess the evolution of dynamic and thermodynamic fields surrounding the monsoon onset and to determine whether they correspond to those found in observations.

Evolution of dynamic fields during monsoon advance
We now analyse the atmospheric dynamics in GloSea5-GC2 during the advance of the monsoon. The composite evolution of relative humidity and winds at different levels from the mid-troposphere down to the surface following the circulation onset in GloSea5-GC2 is shown in Figure 5, following a similar analysis for ERA-Interim by Parker et al. (2016). Immediately following the circulation onset, at 850 hPa ( Figure  5i), strong westerlies prevail over southern peninsular India and northwesterlies prevail over northern and central parts of India in the GloSea5-GC2 simulations. Similar characteristics are observed at 925 hPa. The westerlies over southern peninsular India and the equatorial Indian Ocean are confined to the lower levels during the circulation onset (Figure 5i). At upper levels, these winds weaken and become easterlies at 600 hPa over southern peninsular India (Figure 5a). However, the northwesterlies over northern and central India are deep, extending up to 600 hPa (Figures 5a,e,i,m). At 600 hPa, the strong northwesterly flow (≈ 5-10 m/s) extends from Afghanistan across the subcontinent to the southeast coast of peninsular India (Figure 5a). The spatial pattern of relative humidity at 850 hPa suggests that, during the onset, south peninsular India and northeastern parts of India are very humid with 80-85% relative humidity (RH), as the strong winds carry moisture from over the ocean into these regions, whereas the central and northwestern parts of India are drier (RH < 40%) from the pre-monsoon surface conditions and also due to the prevailing northwesterly winds there, which carry dry air from the desert regions in northwest India and Pakistan.
As the monsoon progresses, the lower-tropospheric westerlies become stronger, with maximum intensity towards the south (Figure 5j-l). At mid-levels (600 and 700 hPa), the westerlies gradually extend towards northern peninsular India as the monsoon advances. About one month after circulation onset, a deep region of westerlies covers most areas of the peninsula (Figure 5d,h,l,p). However, the northwesterlies shoal and retreat and change their direction to become northerlies at 600 hPa, a month after the initial onset (Figure 5d). The northerlies join the westerlies over the north Arabian Sea and advect dry desert air into the north and eastern Arabian Sea (Figure 5d). As mentioned in section 1, the presence of such advection was noticed by Bhat (2006) during the 2002 drought. Over the Bay of Bengal, at 925 hPa, strong southwesterlies prevail even a month after the circulation onset, whereas at 850 hPa the winds become westerlies over the west Bay of Bengal as the monsoon progresses. Similar results were observed by Parker et al. (2016) using ERA-Interim reanalysis data. As the monsoon progresses, the northern plains have much moister air. The central and northwestern parts of India become more humid as a result of the withdrawal of the northwesterly flow. Parker et al. (2016) suggest that these northwesterlies carry dry air from the mid-troposphere that marks the northern limit of the monsoon rainfall progression.
In order to compare the dynamic evolution of the atmosphere from GloSea5-GC2 with that from the ERA-Interim as shown in Parker et al. (2016), we show the difference in the evolution of relative humidity and winds between GloSea5-GC2 and ERA-Interim (Figure 6). The ERA-Interim dataset is regridded to the GloSea5-GC2 grid using bilinear interpolation. At 925 hPa, the model is much drier above the land, whereas above the ocean the model has a moist bias, except over the west Bay of Bengal (Figure 6m-p). At 850 and 700 hPa (Figure 6e-l), the model is generally drier than the reanalysis. However, as the onset progresses, the equatorial Indian Ocean and southern Bay of Bengal become more humid in the model compared with ERA-Interim at 700 and 600 hPa (Figure 6a-h). Generally, in the mid-troposphere the model is much drier (difference in RH < −15%) and at lower levels the model is warmer (not shown) and drier compared with ERA-Interim over the regions of dry-air incursion. However, the pattern evolution of the relative humidity and winds following circulation onset is well captured by GloSea5-GC2 compared with the reanalysis data.
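The regridding step that precedes the model-minus-reanalysis difference can be illustrated with a minimal sketch of bilinear interpolation on a regular latitude-longitude grid. This is not the code used in the study: the function names `bilinear` and `regrid` are hypothetical, and the sketch assumes strictly increasing coordinates and ignores longitude wrap-around, land masks and missing data.

```python
from bisect import bisect_right


def bilinear(lats, lons, field, lat, lon):
    """Interpolate field[i][j], defined at (lats[i], lons[j]), to (lat, lon).

    Assumes lats and lons are strictly increasing and that the target
    point lies inside the source grid (illustrative sketch only).
    """
    if not (lats[0] <= lat <= lats[-1] and lons[0] <= lon <= lons[-1]):
        raise ValueError("target point lies outside the source grid")
    # Index of the lower-left corner of the enclosing grid cell.
    i = min(bisect_right(lats, lat) - 1, len(lats) - 2)
    j = min(bisect_right(lons, lon) - 1, len(lons) - 2)
    # Fractional position of the target point inside that cell (0 to 1).
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    # Interpolate along longitude on both rows, then along latitude.
    south = (1 - tx) * field[i][j] + tx * field[i][j + 1]
    north = (1 - tx) * field[i + 1][j] + tx * field[i + 1][j + 1]
    return (1 - ty) * south + ty * north


def regrid(lats, lons, field, new_lats, new_lons):
    """Regrid a 2-D field onto the target coordinates point by point."""
    return [[bilinear(lats, lons, field, la, lo) for lo in new_lons]
            for la in new_lats]
```

Once the reanalysis field is on the model grid, the bias is simply the point-by-point difference between the model field and the regridded field.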

Evolution of thermodynamic fields during monsoon advance
To study the evolution of the thermodynamic structure of the atmosphere as the monsoon advances over India, tephigrams for three locations (as marked in Figure 1) along the path of the climatological monsoon rainfall progression are analysed during and following the circulation onset (Figure 7). The three regions selected for the study are grid points near to Jodhpur, Nagpur and Visakhapattanam, following the upper-air station observations shown at these locations in Parker et al. (2016). All three locations consistently show that, as the monsoon progresses, low-level temperature decreases and relative and absolute humidity at lower levels increase, consistent with the observations by Parker et al. (2016). At all locations, the atmospheric column moistens as the monsoon progresses. The profile is moister towards the southeast. At Nagpur (Figure 7e-h) and Visakhapattanam (Figure 7i-l), higher levels are nearly pseudo-adiabatic, as observed by Parker et al. (2016). The lifting condensation level (LCL) and the level of free convection (LFC) become lower in altitude at each location as the monsoon progresses in time. Using radiosonde data, Parker et al. (2016) found that, as the onset progresses, the vertical profiles of the selected locations show an elevated mixed layer separating low-level and upper-level profiles that are nearly pseudoadiabatic. The elevated mixed layer and a nearly pseudoadiabatic layer near the surface are evident at Visakhapattanam in Figure 7.
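The link between low-level moistening and a lower LCL can be made concrete with a short sketch. It uses the widely quoted Bolton (1980) approximation for the temperature at the LCL and follows a dry adiabat to obtain the LCL pressure. The text does not say how the LCL was diagnosed in the hindcasts, so the formula choice, the function names and the constant `KAPPA` are assumptions made for illustration.

```python
import math

KAPPA = 0.2854  # R_d / c_p for dry air (assumed constant)


def lcl_temperature(temp_k, rh_percent):
    """Approximate temperature (K) at the LCL from parcel temperature and RH.

    Bolton (1980) approximation; rh_percent must be in (0, 100].
    """
    return 1.0 / (1.0 / (temp_k - 55.0)
                  - math.log(rh_percent / 100.0) / 2840.0) + 55.0


def lcl_pressure(pres_hpa, temp_k, rh_percent):
    """Pressure (hPa) at the LCL, lifting the parcel along a dry adiabat."""
    t_lcl = lcl_temperature(temp_k, rh_percent)
    return pres_hpa * (t_lcl / temp_k) ** (1.0 / KAPPA)
```

For a fixed starting pressure, a cooler and more humid near-surface parcel gives a higher LCL pressure, i.e. a lower cloud base, which is the sense of the change described for all three locations.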
FIGURE 6
Composite difference (GloSea5 minus ERA-Interim) of relative humidity (%) and winds (vectors, m/s) at different pressure levels averaged over 3 days from onset (first column), 7 days after onset (second column), 14 days after onset (third column) and 1 month after onset (fourth column), averaged over the years 1992-2011. The reference vector is 5 m/s.

To examine the thermodynamic atmospheric structure more carefully along the direction of monsoon advance, Figure 8 shows the vertical sections of different thermodynamic parameters from northwest to southeast India along the axis shown in Figure 1 (black solid line). Vertical profiles of water-vapour mixing ratio can give a fairly good idea regarding the moisture distribution at each level. Around the time of the circulation onset (Figure 8a), the layer of moister air, which is the monsoon layer supplied by moisture to the west over the Arabian Sea, is shallow with mixing-ratio values greater than 10 g/kg confined below the 850 hPa level. Around June 15, the moist monsoon layer deepens compared with the initial onset period. Deepening of the moist layer occurs more rapidly at the southeast end of the section owing to moisture convergence near the Bay of Bengal, the northwest of the section being proximal to the source of dry-air advection. By July 15, the monsoon is well established along the southeast to northwest section and the mixing-ratio values are as high as 10-14 g/kg below 700 hPa in the southeast and 750 hPa in the northwest. Deepening of the layer of high water-vapour mixing ratio is a sign of an increase in vertically integrated moisture flux. Hence, as the monsoon progresses, we find that the layer of westerlies deepens (Figure 6), carrying more moisture from the Arabian Sea towards the northwest regions, and the northwesterlies weaken, resulting in a reduction in dry-air incursions from northwest desert regions. Figure 8d-f shows the vertical cross-section of relative humidity.
During the onset, there is a layer of high relative humidity in the southeast at around 600 hPa, i.e. near the freezing level. Johnson et al. (1996) suggest that deep convection penetrating the stable layers near the 0 °C level (which themselves occur due to melting of convective-cloud ice and precipitation) may detrain significantly, thereby resulting in the formation of mid-level cloud layers. Hence the high relative humidity wedge near 600 hPa during the onset might be a result of detrainment from cumulus congestus clouds as suggested by Parker et al. (2016). As the onset progresses, the detrainment of moisture from the cumulus clouds increases the relative humidity and the high relative humidity layer deepens towards the southeast. By around July 15, the southeast region of the section has relative humidity greater than 50% throughout the troposphere. Vertical profiles of relative humidity from observations as shown in Parker et al. (2016) indicate similar results.

FIGURE 7
Tephigrams for the three approximate station locations of (a-d) Jodhpur, (e-h) Nagpur and (i-l) Visakhapattanam, along the transect from northwest to southeast, during onset (first column), 1 week after onset (second column), 2 weeks after onset (third column) and a month after onset (fourth column) from GloSea5-GC2 hindcasts averaged for the years 1992-2011. The thick black lines represent the freezing level.
To look in more detail at the dry-air incursions, we next analyse the vertical profile of equivalent potential temperature (θe) in Figure 8g-i, as the dry air is characterized by low values of θe. The layer of lowest θe around 600 hPa defines the layer of mid-level dry-air incursions. A relatively lower θe layer extends from about 300 hPa to the surface along the northwest to southeast section. As the onset progresses, the low θe layer becomes shallower as the northwesterlies retreat. By around July 15, the lowest θe is confined to around 600 hPa and does not extend so far to the southeast. This layer is deeper than the layer indicated in observational data shown in Parker et al. (2016). By July 15, the lower troposphere features higher θe values depicting the moistening of the lower layers by the monsoon layer.
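Why dry air shows up as low θe can be seen from a simplified textbook expression, θe ≈ θ exp(Lv r / (cp T)), where θ is the potential temperature and r the water-vapour mixing ratio. The sketch below implements this approximation together with its saturated counterpart θes, in which r is replaced by the saturation mixing ratio. The text does not state which formulation was used for Figure 8, so the approximation, the constants and the function names are assumptions for illustration only.

```python
import math

L_V = 2.5e6     # latent heat of vaporisation (J/kg), assumed constant
C_P = 1004.0    # specific heat of dry air at constant pressure (J/(kg K))
KAPPA = 0.2854  # R_d / c_p
EPS = 0.622     # ratio of gas constants R_d / R_v


def potential_temperature(temp_k, pres_hpa):
    """Dry potential temperature (K) referenced to 1000 hPa."""
    return temp_k * (1000.0 / pres_hpa) ** KAPPA


def saturation_mixing_ratio(temp_k, pres_hpa):
    """Saturation mixing ratio (kg/kg) from Bolton's vapour-pressure fit."""
    t_c = temp_k - 273.15
    e_s = 6.112 * math.exp(17.67 * t_c / (t_c + 243.5))  # hPa
    return EPS * e_s / (pres_hpa - e_s)


def theta_e(temp_k, pres_hpa, mixing_ratio):
    """Simplified equivalent potential temperature (K); r in kg/kg."""
    theta = potential_temperature(temp_k, pres_hpa)
    return theta * math.exp(L_V * mixing_ratio / (C_P * temp_k))


def theta_es(temp_k, pres_hpa):
    """Saturated equivalent potential temperature (K): theta_e at r = r_s."""
    return theta_e(temp_k, pres_hpa, saturation_mixing_ratio(temp_k, pres_hpa))
```

In this approximation θes depends only on temperature and pressure, whereas θe also depends on the moisture content, so at the same temperature and pressure a dry layer has lower θe than a moist one and lies further below θes.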
FIGURE 8
Northwest to southeast vertical sections of (a-c) water-vapour mixing ratio (g/kg), (d-f) relative humidity (%), (g-i) θe (K) and (j-l) θes (K) around June 1 (first column), June 15 (second column) and July 15 (third column), averaged over the period 1992-2011. The axis of the cross-section is given in Figure 1.

In order to gain a more detailed understanding regarding the moist convection, we analyse the saturated equivalent potential temperature (θes) field. The vertical profile of θes (Figure 8j-l) shows that the θes values are highest near the surface during the beginning of monsoon. As the monsoon advances, the layer of highest θes becomes shallow and by July 15 the highest θes values are confined near the surface towards the northwest. During the onset, θes is almost constant with height above 600 hPa and near the surface, which represents a near-pseudoadiabatic profile. However, between 700 and 800 hPa there is a strong vertical gradient in θes representing the layer of dry-air incursion with low dry static stability as observed by Parker et al. (2016). As θes is homomorphic with temperature, lower values of θes near the surface are likely related to surface cooling, due to an increase in soil moisture and a lowering of the LCL as the monsoon advances.

The vertical section of relative humidity in Figure 8 showed a layer of increased relative humidity near the freezing level. Hence, to better understand the processes involved, we have looked at the vertical cross-section of International Satellite Cloud Climatology Project (ISCCP) cloud fraction with cloud optical thickness (τ) > 0.3 output from the GloSea5-GC2 simulations (Figure 9). This "satellite-simulator" parameter represents the fractional area covered by clouds (τ > 0.3 accounts for all clouds except the very thin ones) at each level, as would be observed from above by satellites. To get a wider perspective, we have extended the northwest-southeast section further into the Bay of Bengal (the extended portion is shown by the dashed blue line in Figure 1). Around May 15 (which is the centre of a pentad-average value from May 13-17), the cloud fraction is largest near the tropopause over the ocean (Figure 9a). As the monsoon progresses, the fraction of cloud at 100 hPa increases and extends northwestwards. Around June 1, a layer of cloud with cloud fraction > 0.7 forms towards the southeast at around 500 hPa, very close to the freezing level. We also plot the tendencies in specific humidity over the cloud fraction diagnostic shown in Figure 9. The increase in specific humidity (the specific humidity on June 3 minus that on May 30) is largest near the surface.
However, there is increased specific humidity near the cloud layer too. Together, this supports the theory proposed by Parker et al. (2016) regarding the detrainment of moisture from mid-level clouds near the freezing level. The theory suggests that, during monsoon onset, shallow clouds develop near the stable freezing layer and moisten the free troposphere as the monsoon onset progresses. These mid-level clouds extend upwards as the onset proceeds and by around mid-July (Figure 9e) clouds cover the whole of the upper troposphere above 600 hPa. A layer of large cloud fraction develops at lower levels in the northwest after June 1, and the fraction increases as the onset proceeds. From Figure 5, it is clear that, at lower levels, there are strong westerlies flowing from over the northern Arabian Sea into northwestern parts of India during the onset. This could be indicating that cloud formation is fed by winds that carry moisture from over the northern Arabian Sea into this region. Around June 1, the vertical development of this cloud fraction is capped by the dry-air incursion above it. However, as the onset proceeds, the dry-air incursion is eroded from the southeast, as explained above, and hence results in the vertical development of these low-level clouds and the moistening of the atmosphere in the northwest.
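The specific-humidity tendency referred to above is a simple difference between two dates (the June 3 field minus the May 30 field). A minimal sketch of that diagnostic at a single column is given below; the function names and the threshold are hypothetical and are not taken from the study.

```python
def humidity_tendency(q_before, q_after):
    """Level-by-level change in specific humidity (same units as the inputs)."""
    if len(q_before) != len(q_after):
        raise ValueError("profiles must be defined on the same levels")
    return [after - before for before, after in zip(q_before, q_after)]


def levels_with_moistening(pressures_hpa, tendency, threshold):
    """Pressure levels where the tendency exceeds a chosen threshold."""
    return [p for p, dq in zip(pressures_hpa, tendency) if dq > threshold]
```

Applied to a profile with moistening both near the surface and around the freezing level, the second function would return two separate groups of levels, which is the signature discussed in the text.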
FIGURE 10
Time-pressure section of (a) relative humidity (%), (b) θe (K) and (c) θes (K) with virtual potential temperature θv overplotted (white contours, K) at a grid point close to Nagpur averaged over the years 1992-2011. In all panels, the black solid line represents the LCL and the black dashed line represents the freezing level (T = 0 °C). The profiles are computed for days relative to the local circulation onset at Nagpur.

In order to understand changes in the atmospheric profile caused by the arrival of the onset, time-pressure cross-sections around Nagpur of several atmospheric parameters such as relative humidity, θe, θes and virtual potential temperature (θv) are shown in Figure 10. Nagpur is the location chosen for this analysis, as it lies in central peninsular India and in the middle of the three locations used in our earlier analysis. Before the onset of the monsoon circulation, the layer of highest relative humidity lies at around 600 hPa, just below the freezing level (T = 0 °C) (Figure 10a). About 20 days prior to the circulation onset, the moist layer with relative humidity greater than 50% starts developing downwards. The moistening is due mainly to shallow convection at lower levels above the LCL and moistening of the sub-LCL layer by evaporation and advection, lowering the LCL and allowing lower levels of the free troposphere to be increasingly moistened by convection with lower cloud bases. Soon after the onset, the upper limit of the layer of high relative humidity extends upward above the freezing level and by around 10 days after the circulation onset RH is greater than 60% from the surface to 400 hPa. These features are consistent with the theory put forth by Parker et al. (2016) and the results from Figure 9, i.e., as the onset approaches, shallow clouds start forming at the altocumulus layer near the freezing level and these clouds penetrate to higher levels, moistening the tropospheric profile over time. θe shows the lowest values between 500 and 700 hPa before the monsoon onset (Figure 10b). The lowest values of θe represent the layer of dry-air incursions. As time progresses, the dry layer becomes shallower and weaker from both the upper troposphere and the surface. During the pre-monsoon period, CAPE is too low and CIN is too high in the boundary layer to sustain convection, due to the low values of θe in the boundary layer. After the onset of monsoon circulation, θe values are higher near the surface. Parker et al. (2016) noticed relatively high values of θe above the LCL from about 8 days prior to onset. However, in GloSea5-GC2 the higher θe values above the LCL are more evident following the onset. The layer of higher θe values in the upper levels also deepens as time progresses. Figure 10c shows the vertical structure of θes at Nagpur. During the pre-onset period, a deep adiabatic layer exists in the lower layer of the atmosphere from the surface to about 600 hPa, characterized by a wide spacing in the 315 and 320 K virtual potential temperature contours as observed by Parker et al. (2016). By around 20-30 days after the circulation onset, θes is nearly constant with height, suggesting a pseudoadiabatic profile. About 30 days prior to the onset, the θes values are higher near the surface, as seen in the vertical sections in Figure 8j,k.
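Because the onset date differs from year to year, a multi-year mean on an onset-relative time axis requires each year to be shifted so that day 0 is its own onset before averaging. The sketch below shows one way such a composite could be built for a single level; the function name, the dictionary layout and the handling of missing lags are assumptions for illustration, not the processing used for Figure 10.

```python
def onset_relative_composite(series_by_year, onset_index_by_year,
                             lag_min, lag_max):
    """Average yearly daily series on a 'days relative to onset' axis.

    series_by_year: {year: [daily values]}
    onset_index_by_year: {year: index of the onset day in that year's list}
    Returns {lag: mean over the years that have data at that lag}.
    """
    composite = {}
    for lag in range(lag_min, lag_max + 1):
        values = []
        for year, series in series_by_year.items():
            idx = onset_index_by_year[year] + lag
            # Skip years whose record does not reach this lag.
            if 0 <= idx < len(series):
                values.append(series[idx])
        if values:
            composite[lag] = sum(values) / len(values)
    return composite
```

Averaging on calendar dates instead would smear a sharp onset-related change over the range of onset dates, which is why the onset-relative axis is the natural choice for this kind of figure.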

CONCLUSIONS
Forecasting the Indian monsoon onset is of paramount importance. Most climate models fail to simulate the Indian monsoon onset reasonably (Sperber et al., 2013; Prodhomme et al., 2015), with substantial delay of the initial onset and progression of the rains. However, many climate models do seem to represent the northwestward progression of the rains to some extent (Zou and Zhou, 2015). We use the initialized fully coupled GloSea5-GC2 hindcast system to understand the evolution of atmospheric structure as the monsoon progresses. GloSea5-GC2 has higher skill in predicting large-scale circulation compared with Indian monsoon rainfall. Hence we use dynamic onset indices based on large-scale circulation to determine the timing of the onset in the GloSea5-GC2 hindcasts before subsequently analysing the physical characteristics of the atmosphere as the onset progresses. The goal of this article was to present the changes in atmospheric structure following the onset, specifically testing whether the model can simulate the processes associated with monsoon advance as in observations, rather than a skill metrics analysis. To this end, we concentrated on two topics: (a) the evolution of rainfall and wind following the onset and (b) the effect of mid-level dry-air incursions in the progression of monsoon rainfall from southeast to northwest India. GloSea5-GC2 is found to be very effective at capturing the physical mechanisms related to the Indian monsoon onset and progression. The onset is marked by rainfall greater than 10 mm/day on the southwest coast of peninsular India and the northeastern regions of India. As the onset progresses, monsoon rainfall propagates in the northwestward direction from the Bay of Bengal coast to the dry northwestern regions of the Indian land through the Indo-Gangetic plains, consistent with observations and reanalysis data.
As the monsoon progresses, the mean winds are westerlies over the southern regions of India, from the surface up to around 600 hPa. The prevailing winds are northwesterlies over the northern regions of India, with the highest speeds at 750-600 hPa. However, the monsoon onset propagates in a northwestward direction, perpendicular to the mean flow over the northern plains. Parker et al. (2016), using observational and reanalysis data sets, suggest that the mid-level humidity plays a major role in the northwestward propagation of the monsoon convection.
We have analysed the dry-air incursion mechanism, suggested by Parker et al. (2016), in the coupled GloSea5-GC2 hindcasts. The novelty of this study is that this mechanism has never been looked at in coupled forecast models. Parker et al. (2016) suggest that the retreat of mid-level dry air to the northwest during the progression of the monsoon must be caused by a net moistening from the southeast as the onset proceeds. This could be caused by a moistening of the free troposphere by shallow clouds or by increased soil moisture and evaporation into the boundary layer, leading to the formation of more active clouds. We find that the GloSea5-GC2 model effectively captures the effect of mid-level dry-air incursions in the progression of monsoon rainfall. During the monsoon onset, strong westerly winds carry moisture from over the ocean towards the dry Indian land. Over southeast India, the southwesterlies from the Bay of Bengal branch of the monsoon winds carry moisture towards the coastal regions. This results in the development of shallow clouds in the southeastern region, as shown in Figure 9. The northwesterly winds carry mid-level dry air from the Afghanistan region towards the southeast coast of India. This layer of drier air suppresses convection over the northwestern parts, but in the southeast the development of shallow cumulus and congestus moistens the dry air from below, resulting in a shallower layer of dry air towards the southeast. As a result, there is a weakening of CIN towards the southeast, which can be seen in the near-pseudo-adiabatic profile near Visakhapattanam. As the onset progresses, the southwesterlies become stronger and the LCL lowers, resulting in a higher rate of shallow-cumulus moistening; the low-level westerlies over the Arabian Sea also become stronger, resulting in a moistening of the mid-level dry air from below. 
The moistening of mid-level dry air, both over the Arabian Sea and over land, and the wetting of land surface by cumulus clouds result in a shoaling or retreat of the dry-air incursions from the northwest. This results in the gradual progression of the monsoon convection from the southeast towards northwestern parts of India. These results are consistent with the results using ERA-Interim reanalysis data and observations by Parker et al. (2016). The fact that the model captures the northwestward propagation of the monsoon and its interaction with the well-observed dry-air incursion mechanism suggested by Parker et al. (2016) gives more confidence in the model predictions associated with the onset. Also, analysis of the ISCCP cloud fraction and specific humidity data from the model, as explained above, supports the mechanism proposed by Parker et al. (2016) in which detrainment from shallow convection near the freezing level acts to saturate the free troposphere ahead of the onset. We have also analysed the effect of soil moisture in controlling the progression of the monsoon to the northwest (not shown). However, the model did not show any evidence of soil moisture increasing ahead of the precipitation front; instead, soil moisture was found to follow the precipitation.
Previous studies have emphasized the role of initialization and air-sea coupling in achieving skilful seasonal forecasts of ISM rainfall and its onset (Vitart and Molteni, 2009;Alessandri et al., 2015). Additionally, Bollasina and Ming (2013) emphasize the importance of coupled air-sea interactions in determining the way in which the Indian monsoon evolves from May to June. Even though the main intention of this study is to understand the physical mechanisms associated with the onset and not to determine the prediction skill of the model, the efficiency of the initialized fully coupled forecast model GloSea5-GC2 in simulating the dynamic and thermodynamic features associated with the Indian monsoon onset suggests that its forecast skill could be useful. To this aim, further analysis dedicated to assessment of the monsoon onset prediction skill in GloSea5-GC2 is being undertaken (Chevuturi et al., 2018).