Analytical study of the performance of the IMERG over the Indian landmass

The paper compares the final run of Integrated Multi‐Satellite Retrieval of Global Precipitation Mission (IMERG) products with India Meteorological Department (IMD) gridded data over the Indian land mass during the southwest monsoon (June–September) of 2014–2017. Spatiotemporal variations of the IMERG are evaluated against IMD rainfall using different statistical techniques, and the detection capability of the IMERG is examined using categorical skill metrics. A region in central India is the focus of a study of the southwest monsoon's static and dynamic characteristics, such as rainfall distribution and monsoon activity, using the IMERG in conjunction with IMD rainfall data sets. The integrated condensation rate (ICR) was estimated from specific humidity profiles, and it was correlated with the IMERG and IMD rainfall. The IMERG is found to reflect adequately the gauge‐based gridded data for categorical rainfall amounts from very light rain (trace–2.4 mm) to very heavy rain (about 115.6–204.4 mm). However, the IMERG does not satisfactorily reflect the extremely heavy rain events (≥ 204.5 mm·day⁻¹) during the study period. The significant correlation between IMERG/IMD rainfall and the ICR suggests that improved adjustment methods are required for the IMERG to depict extremely heavy rainfall events accurately.


| INTRODUCTION
Remote sensing is a cost-effective way to retrieve uninterrupted, continuous rainfall data sets globally. In the last three decades, advancements in space and computational technology have provided many multi-satellite rainfall products at very high spatial and temporal resolution. The Global Precipitation Mission (GPM) is a state-of-the-art precipitation mission that makes use of active as well as passive precipitation radars and a series of infrared (IR) channels to produce a research version of precipitation by merging algorithms for microwave, IR and ground-based inputs (Huffman et al., 2018). The GPM, which is an advanced version of the Tropical Rainfall Measurement Mission (TRMM), carries an advanced 13-channel passive microwave radiometer, the GPM microwave imager (GMI), coupled with a Ka/Ku-band dual-frequency precipitation radar (DPR) to provide advanced precipitation estimates (Draper et al., 2015). Satellite rainfall estimates are known to contain quantitative errors because of cloud effects, limitations in sensor performance and retrieval algorithms (Woldemeskel et al., 2013). Combining gauge and satellite data sources is effective at maintaining high-quality rainfall data from stations while acquiring spatially continuous information from satellite observations (Martens et al., 2013). Merged rainfall products have shown their importance in many applications in weather forecasting, water resource management and identifying the hotspots of hazardous events (Zhou et al., 2008; Tapiador et al., 2012; Lakshmi Kumar et al., 2014; Tao et al., 2016; Paredes-Trejo et al., 2017; Rossi et al., 2017; Thakur et al., 2018). Satellite rainfall is critical in the study of the Indian summer monsoon, as this monsoon, with its different phases (quasi-biweekly oscillations, intra-seasonal oscillations and interannual variability), is not limited to the Indian land mass but also extends over the Arabian Sea and Bay of Bengal.
Hence, the evaluation of satellite data sets over the Indian monsoon region is necessary to establish their use and applicability. For this purpose, many studies have evaluated satellite rainfall against the ground gauge rainfall data sets developed over the Indian land mass, using the TRMM (Nair et al., 2009; Uma et al., 2013; Prakash et al., 2015, 2016a) and the Indian satellite INSAT (Mitra et al., 2018; Singh et al., 2018). The Integrated Multi-Satellite Retrieval of Global Precipitation Mission (IMERG) is a very good precipitation data set for depicting spatiotemporal variations because of its high spatial (0.1° × 0.1°) and temporal (30 min) resolution. Its evaluation against India Meteorological Department (IMD)-gridded data sets of 0.25° × 0.25° resolution, developed from a dense network of about 7,000 rain gauges, shows the actual performance of the satellite product over the Indian land mass.
The main objectives of the study are (a) to evaluate, validate and report the performance of the IMERG research version product with respect to the existing IMD high-resolution gridded rainfall for the first four years of its release (2014–2017) during the summer monsoon over India, both spatially and temporally, using descriptive statistics; and (b) to study the characteristics of the monsoon over the monsoon core region of India using high-resolution gridded rainfall.

| DATA
Multi-satellite-gridded precipitation product IMERG V5 level 3 and gauge-based-gridded IMD rainfall data sets over the Indian land mass were used between 2014 and 2017 during the southwest (SW) monsoon. A brief description of each data set is given below.
The IMERG V5 is a recent multi-satellite rainfall product with high spatiotemporal resolution (0.1° latitude/longitude, half-hourly/daily). The estimates are derived from passive microwave and IR satellite measurements. Monthly data from the Global Precipitation Climatology Center (GPCC) are then used for bias adjustment in the IMERG research version products. For more complete data and algorithm details, see Huffman (2016), Thakur et al. (2018) and Lakshmi Kumar et al. (2019).
As a benchmark, daily gauge-based gridded rainfall data at a spatial resolution of 0.25° latitude/longitude over India are used. The input for this product is rainfall recorded at about 6,995 gauge stations across India, after various quality controls. On average, about 3,100 daily gauge records were used in the development of this product. It captures many features of rainfall, including the spatial gradient in orographic rainfall.

| METHODOLOGY
To validate the satellite-estimated rainfall against the IMD rainfall, the two data sets were placed on an equal spatiotemporal footing at 0.25° latitude/longitude on a daily scale. Descriptive statistics such as bias, standard deviation (SD), coefficient of variation (CV), root mean square error (RMSE) and correlation between the data sets were computed both temporally and spatially. Further, to assess the ability of the multi-satellite IMERG rainfall products to detect rainfall relative to the IMD-gridded rainfall, categorical metrics such as probability of detection (POD), false-alarm ratio (FAR), frequency bias index (FBI) and Peirce skill score (PSS) were calculated from the 2 × 2 contingency table (Table 1) (Wilks, 2006; Hogan et al., 2010).
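The descriptive statistics named above can be sketched for a pair of co-located daily series as follows. This is a minimal illustration and not the authors' code: the function name `descriptive_stats`, the sample values, and the choice of the population standard deviation for the CV are assumptions made here.

```python
import numpy as np

def descriptive_stats(sat, ref):
    """Bias, RMSE, CV (%) and Pearson correlation for paired rainfall series.

    `sat` is the satellite estimate and `ref` the gauge-based reference,
    both sampled on the same days (or the same grid cells).
    """
    sat = np.asarray(sat, dtype=float)
    ref = np.asarray(ref, dtype=float)
    diff = sat - ref
    return {
        "bias": diff.mean(),                       # mean (satellite - reference)
        "rmse": np.sqrt((diff ** 2).mean()),       # root mean square error
        "cv_sat": 100.0 * sat.std() / sat.mean(),  # population SD (assumption)
        "cv_ref": 100.0 * ref.std() / ref.mean(),
        "corr": np.corrcoef(sat, ref)[0, 1],       # Pearson correlation
    }

# Hypothetical daily rainfall values (mm/day), for illustration only.
imerg = [2.0, 4.0, 6.0, 8.0]
imd = [1.0, 3.0, 5.0, 7.0]
stats = descriptive_stats(imerg, imd)
print(stats)
```

Applying the same function along the time axis of each grid cell would give maps of bias, RMSE, CV and correlation, while applying it to area-averaged series would give the temporal comparison.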
In the present study, a "no rain" event is taken as rainfall < 0.5 mm·day⁻¹. The contingency-table entries, namely hits (events estimated by the satellite and reported by the IMD), false alarms (events estimated by the satellite but which did not occur), misses (events not estimated by the satellite but which occurred) and correct negatives (events not estimated by the satellite and which did not occur), are essential for understanding the capability of any satellite-based rainfall estimate (Tian et al., 2009; Tang et al., 2015). The skill metrics used in the present study are given in Table 2.
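As an illustration of how the four contingency counts lead to the categorical metrics, the sketch below uses the standard textbook definitions of POD, FAR, FBI and PSS; the exact forms in Table 2 are not reproduced in this text, so these definitions, the function names and the sample values are assumptions. The 0.5 mm·day⁻¹ threshold is the "no rain" limit stated above.

```python
import numpy as np

RAIN_THRESHOLD = 0.5  # mm/day; rainfall below this counts as "no rain"

def contingency_counts(sat, ref, threshold=RAIN_THRESHOLD):
    """Return (hits, false_alarms, misses, correct_negatives)."""
    sat_rain = np.asarray(sat, dtype=float) >= threshold
    ref_rain = np.asarray(ref, dtype=float) >= threshold
    hits = int(np.sum(sat_rain & ref_rain))
    false_alarms = int(np.sum(sat_rain & ~ref_rain))
    misses = int(np.sum(~sat_rain & ref_rain))
    correct_negatives = int(np.sum(~sat_rain & ~ref_rain))
    return hits, false_alarms, misses, correct_negatives

def skill_scores(hits, false_alarms, misses, correct_negatives):
    """Standard categorical scores; no guard against empty categories."""
    pod = hits / (hits + misses)                    # probability of detection
    far = false_alarms / (hits + false_alarms)      # false-alarm ratio
    fbi = (hits + false_alarms) / (hits + misses)   # frequency bias index
    pofd = false_alarms / (false_alarms + correct_negatives)
    pss = pod - pofd                                # Peirce skill score
    return {"POD": pod, "FAR": far, "FBI": fbi, "PSS": pss}

# Hypothetical paired daily values (mm/day), for illustration only.
imerg = [0.0, 1.2, 3.0, 0.2, 10.0, 0.0, 0.7, 5.0]
imd = [0.0, 0.9, 0.0, 0.6, 8.0, 0.1, 0.0, 4.0]
counts = contingency_counts(imerg, imd)
scores = skill_scores(*counts)
print(counts, scores)
```

A perfect estimate would give POD = 1, FAR = 0, FBI = 1 and PSS = 1; an FBI above 1 indicates that the satellite reports rain more often than the reference does.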
To study the SW monsoon's characteristics over the central Indian region, a 9° × 9° box was selected (Figure 1). Different rainfall intensity categories with the revised thresholds suggested by the IMD in its Forecasting Circular No. 5/2015 (3.7), namely very light rain (R1 = trace–2.4 mm), light rain (R2 = about 2.5–15.5 mm), moderate rain (R3 = about 15.6–64.4 mm), heavy rain (R4 = about 64.5–115.5 mm), very heavy rain (R5 = about 115.6–204.4 mm) and extremely heavy rain (R6 ≥ 204.5 mm), were chosen to examine the behaviour of the IMERG in comparison with the IMD-gridded rainfall over the central Indian region. The characteristic features of the SW monsoon, such as its activity (i.e. weak, normal, active and vigorous), were also derived from the IMERG data and compared with those from the IMD rainfall data. The criteria for monsoon activity were adapted from the IMD (www.Imd.gov.in/section/nhac/termsglossary.pdf) and are given below:
• Weak monsoon: Rainfall (RF) < half of normal daily rainfall (NRF).
• Normal monsoon: 0.5×NRF < RF < 1.5×NRF.
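The thresholds above lend themselves to a simple lookup. The sketch below is illustrative only: the function names, the treatment of zero rainfall as a separate "no rain" label, the assignment of values falling between listed ranges (e.g. 15.55 mm) to the lower category, and the handling of the exact 0.5×NRF boundary are all assumptions. Because only the weak and normal criteria are listed here, anything at or above 1.5×NRF is returned as a single undifferentiated label.

```python
from bisect import bisect_right
from collections import Counter

# Lower bounds (mm) of R2..R6 taken from the category ranges in the text.
LOWER_BOUNDS_MM = [2.5, 15.6, 64.5, 115.6, 204.5]
LABELS = ["R1", "R2", "R3", "R4", "R5", "R6"]

def rain_category(rain_mm, trace_mm=0.0):
    """Map a daily rainfall amount to R1..R6 (or 'no rain' at/below trace)."""
    if rain_mm <= trace_mm:
        return "no rain"  # assumption: zero rainfall is not counted as R1
    return LABELS[bisect_right(LOWER_BOUNDS_MM, rain_mm)]

def monsoon_activity(rf, nrf):
    """Classify a day from rainfall (rf) and normal daily rainfall (nrf)."""
    if rf < 0.5 * nrf:
        return "weak"
    if rf < 1.5 * nrf:
        return "normal"  # assumption: rf == 0.5 * nrf is treated as normal
    return "active or vigorous"  # finer criteria are not listed in the text

# Hypothetical grid-cell rainfall values (mm/day), for illustration only.
grid_rain = [0.0, 1.0, 2.5, 20.0, 70.0, 120.0, 250.0, 3.0]
category_counts = Counter(rain_category(r) for r in grid_rain)
print(category_counts)
print(monsoon_activity(3.0, 8.0), monsoon_activity(8.0, 8.0))
```

Counting grid cells per category in this way, separately for each data set and each day, yields the kind of per-category grid counts that can then be correlated between the two products.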
Finally, the integrated condensation rate (ICR) was estimated from the specific humidity profiles of the European Centre for Medium-Range Weather Forecasts (ECMWF), following the method of O'Gorman and Schneider (2009). The ICR for a specific humidity profile was calculated from the following quantities: ω_s, the vertical velocity (m·s⁻¹); C_p, the specific heat capacity; L_v, the latent heat of vaporization (J·kg⁻¹); T, temperature (K); θ, potential temperature; P, pressure at different pressure levels (mb); dθ, the temperature difference; and dP, the pressure difference. The estimated condensation rate was then correlated with the IMD and IMERG data sets to study their linear association.

TABLE 2 Skill metrics for rainfall occurrence evaluation

Evaluation metric Definition Remark
Probability of detection (POD)

| RESULTS AND DISCUSSION
The spatial distribution of the mean seasonal SW monsoon rainfall from the IMD and IMERG (at 0.25° × 0.25°) over the Indian land mass for the period 2014–2017 is shown in Figure 2. Prominent features of the SW monsoon over India, such as the higher rainfall zones over the Western Ghats and northeast India and the lower rainfall regime in northwest India and portions of southeast India, are well depicted by the IMERG (Figure 2b). However, a mixture of negative and positive bias by the IMERG is observed over the central Indian region, some parts of the Indo-Gangetic plains, and the Western Ghats. The negative bias of satellite rainfall has been discussed by earlier studies, which attributed it to the satellite's inability to detect the cloud parameters. The positive bias of satellite rainfall over other topographical regions has been reported differently in different studies. For example, Uma et al. (2013) reported that the TRMM rainfall did not agree well with the IMD-gridded data on a daily scale over the Indian region, and was comparable only for box regions larger than 5° × 5°. It can be inferred that good agreement of satellite rainfall with ground measurements depends mainly on the spatiotemporal resolution of the satellite rainfall and the density of ground measurements. The categorical metrics for the IMERG with respect to the IMD are presented in Table 3. The POD, FBI and PSS are > 0.96, > 0.97 and > 0.66, respectively, and a minimum FAR (0.01) is observed during the study period of 2014–2017. These values are comparable with those of other studies of IMERG validation against ground gauge networks over different years and regions (Sharifi et al., 2016; Prakash et al., 2016b; Asong et al., 2017). For example, Satyaparakash et al. (2016) reported the POD, FBI and PSS as about 0.78, about 1.00 and about 0.55 for the IMERG V3 data for 2014 over India. Hence, it should be emphasized that the IMERG V5 has the potential to detect rainfall over India.
The time series of the IMERG rainfall along with the IMD rainfall and climatology (Figure 3) shows that the variation in daily rainfall is stronger in the IMERG than in the IMD data. As the mean daily rainfall from the IMERG is higher than that of the IMD, the uncertainty in the IMERG seems to increase with rainfall rate, as shown by an SD that grows with increasing rainfall rate. The correlation between the all-India rainfall of the IMERG and IMD for the study period is 0.64, significant at the 0.01 level. As all-India rainfall comes from diverse regions with different instability processes, the significant correlation shows that the IMERG performs well over India. In addition to the daily rainfall comparison, a weekly comparison was also carried out. The agreement between the two data sets improved substantially when they were summed to weekly totals. The bias of the IMERG with respect to the IMD shows that rainfall was underestimated over part of the country, while over the rest of the area (62%) it was overestimated by the IMERG. A maximum bias was observed over the portions where the IMD rainfall variability was high (e.g. Western Ghats and northeast India) (Figure 4a). However, it is known that the higher bias cannot be reduced even if the data are averaged. As India covers a very large area, these direct bias estimates may be useful when comparing rainfall amounts, as reported by Smith et al. (2006), who mention that indirect bias methods are highly useful for the bias adjustment of satellite precipitation. The correlation map (Figure 4b) shows that the linear association of the IMERG rainfall with the IMD rainfall is good and significant over the monsoon core region of India, where India receives most of its rainfall during the SW monsoon. A larger RMSE (> 10 mm·day⁻¹) is observed over the west coast, the northeast, along the Himalayan foothills, and between 20° and 25° N (Figure 4c). To study the variability in daily rainfall, the CV is computed for both data sets (Figure 4d, e).
It gives the variability in daily rainfall with respect to its seasonal mean rainfall. There is a lower CV along the west coast, Himalayan foothills, and northeast and eastern central India (high rainfall zones), while a higher CV is observed over northwest India and Jammu and Kashmir in both data sets. Figure 5 shows the mean seasonal rainfall of the SW monsoon for the period 2014–2017 over India obtained from the IMERG at spatial resolutions of 0.1° × 0.1° and 0.25° × 0.25° (the latter, obtained by regridding, is used in the present study). The two data sets show similar spatial variations in depicting the large-scale features of the SW monsoon. Furthermore, the IMERG data at 0.25° × 0.25° and 0.1° × 0.1° were examined along with the IMD data at 0.25° × 0.25° by subjecting them to probability density function (PDF) analysis, which can assist the validation at different grid levels. PDF analysis is a traditional approach used by several investigators for satellite data validation (Hossain and Huffman, 2008; Bharti and Singh, 2015). Figure 6 shows the PDFs of the mean daily rainfall over different grid boxes (5° × 5°) selected over different geographical regions of India for the SW monsoon period, 2014–2017. The PDFs reveal a higher probability of occurrence of rainfall in the IMD data, whereas it is lower at the higher rainfall rates over regions 1 and 4. For regions 2 and 4, broad peaks were observed in the scale parameter, which implies that the probability of occurrence of rainfall is in good agreement with the IMERG. Note that the PDFs of the IMERG at 0.25° × 0.25° and 0.1° × 0.1° do not show any variations in shape and scale. From this analysis, it can be inferred that the IMERG at 0.1° and 0.25° grid resolutions does not vary, but that variation is seen with respect to the IMD. The discrepancy between the IMD and IMERG at 0.25° grid resolution is quantified in a later section using two actual events.
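One simple way to compare rainfall distributions of the kind discussed above is a normalized histogram. The sketch below is not the procedure used in the study: it substitutes an empirical histogram for whatever PDF fitting the authors applied, and the synthetic gamma-distributed samples, bin width and function name are assumptions for illustration.

```python
import numpy as np

def empirical_pdf(rain, bin_edges):
    """Histogram-based PDF estimate: returns bin centres and densities."""
    density, edges = np.histogram(
        np.asarray(rain, dtype=float), bins=bin_edges, density=True
    )
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, density

# Synthetic stand-ins for box-averaged daily rainfall (mm/day) over
# four 122-day seasons; these are NOT IMD or IMERG values.
rng = np.random.default_rng(0)
imd_like = rng.gamma(shape=0.8, scale=10.0, size=488)
imerg_like = rng.gamma(shape=0.8, scale=12.0, size=488)

bin_edges = np.arange(0.0, 205.0, 5.0)  # 5 mm bins from 0 to 200 mm
centers, pdf_imd = empirical_pdf(imd_like, bin_edges)
_, pdf_imerg = empirical_pdf(imerg_like, bin_edges)

# Each density integrates to 1 over the binned range.
print(np.sum(pdf_imd * np.diff(bin_edges)), np.sum(pdf_imerg * np.diff(bin_edges)))
```

Plotting `pdf_imd` and `pdf_imerg` against `centers` for each 5° × 5° box would show where one product assigns more probability to high rainfall rates than the other.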
Here a domain over the central Indian region, which is part of the core monsoon zone characterized by traditional monsoon breaks and active spells, was chosen to evaluate quantitatively the monsoon's activity and rainfall distribution. Six categories of rain events were selected based on the IMD criteria: very light rain (R1), light rain (R2), moderate rain (R3), heavy rain (R4), very heavy rain (R5) and extremely heavy rain (R6). The rainfall ranges are given in the Methodology section. The number of grids meeting each criterion over the study region was computed from the IMERG as well as from the IMD. Figure 7a depicts the good association between the two data sets in terms of the number of grids in each rainfall category from R1 to R5, as evidenced by the correlation (R) in each category. As the intensity of the rain increases from light rainfall (R2) to extremely heavy rainfall (R6), the correlation decreases. In the case of extremely heavy rain (≥ 204.5 mm·day⁻¹), the association between the two data sets is insignificant. Figure 7b shows the association in rainfall amounts in all categories between the two data sets in terms of R. From Figure 7b, it can be seen that the IMERG performance is good for very light rain (R1), light rain (R2), moderate rain (R3), heavy rain (R4) and very heavy rain (R5), as evidenced by the significant correlations of 0.65, 0.79, 0.73, 0.66 and 0.58, respectively. Its performance is poor for extremely heavy rain (R6), where the correlation is very low (0.11). Figure 8 shows the frequency distribution (%) of the number of grids falling in the different rain categories, along with the mean rainfall contributed. The IMERG underestimates the percentage of the number of grids in R2 and R3, while it overestimates it in R1 and R4. The magnitude of mean rainfall is underestimated by the IMERG in R2 and R3, whereas it is overestimated in the other categories.
Based on the station daily rainfall data available, a comparative study was made among the station data, the IMERG at 0.1° and 0.25°, and the IMD at 0.25° grid resolution. Figure 9 depicts the one-week rainfall time series (July 3–9, 2016) for the station Satna, located at 24.57° N and 80.83° E, where extremely heavy, heavy and moderate rainfall occurred during this week of the SW monsoon 2016. It can be inferred that the station data exceed the gridded data, showing higher magnitudes during the extreme rainfall conditions. However, this needs to be analysed in detail by establishing how many rain gauges were used to interpolate the rainfall over the respective grid, as these IMD data were developed using the Shepard interpolation technique applied to the multi-rain gauge network. On the very heavy rainy day (July 6), all the data sets coincide. However, the rest of the days show a disparity, which implies quantitative differences among the IMD and IMERG data sets. This preliminary analysis shows the satellite's limitations in estimating accurate rainfall, and for this reason these data sets need to be interpreted carefully. However, these data sets will be highly useful in providing promising areally averaged time series, in particular where the ground-gauge network is inadequate.
It is reported (Iribarne and Godson, 1981) that the precipitation rate from the thermodynamic process can be estimated from the condensation rate. Also, rainfall is a key process that represents the condensation and the amount of heating from condensation (Karaseva et al., 2012). Satellite retrieval of rainfall at an IR wavelength is sensitive to the size distribution of the hydrometeors, and the condensation is captured by IR and visible satellite imagery. Mathur (1995) studied the condensation of convective and non-convective systems with the assimilation of rainfall. However, the rain droplets produced from the cloud face the aerodynamic drag exerted by the surrounding air (Pauluis and Dias, 2012). Hence, it may be of interest to understand the relation between satellite rainfall and ground-reaching rainfall and the condensation rate estimated from the specific humidity profiles. This may be taken as a performance indicator of satellite rainfall for comparison with the ground rainfall, where the problem of the drag shift of rain droplets can be overcome by choosing a wide spatial coverage. Here, the ICR (m·s⁻¹·K·mb⁻¹) was calculated over the study area using the formula proposed by O'Gorman and Schneider (2009). Figure 10 shows the association between the ICR and the rainfall estimated by the IMD and IMERG. Although the IMD rainfall is purely gauge based and the IMERG rainfall is remotely sensed (where the contribution of cloud-top temperature is significant), both data sets (IMD/IMERG) showed good agreement with the ICR estimated from 850 to 500 mb, as evidenced by the correlations of 0.73 and 0.75, respectively. This relation shows the potential of the IMERG to delineate the stratiform rainfall from the low-level cumulus clouds over the central India region. This strong relation also reveals the disparities observed in extreme rain cases, which can be dealt with by adjusting the methods of comparison, such as fixing the thresholds.
As these two data sets are in fair agreement with the condensation rate, a coarse classification of the monsoon with gridded data was made in a manner similar to what the IMD has done with gauge data for a particular region. Based on the rainfall received by the IMD weather stations, the IMD classifies the activity of the Indian summer monsoon into categories.
Here, the monsoon is classified into four types (i.e. weak, normal, active and vigorous) by keeping all the criteria the same, but with the number of grids instead of gauges. Table 4 shows the different types of monsoon observed from 2014 to 2017 from the IMD and IMERG-gridded rainfall data sets. From the tabulated values, it can be inferred that the IMERG performed well when depicting the monsoon activity in the weak and normal categories, as evidenced by a 2% deviation from the IMD data sets. In the case of the active monsoon, the IMERG showed a 19% deviation from the IMD data sets. However, the IMERG's performance is not satisfactory when depicting the vigorous monsoon days over the study area.
Although current techniques for estimating satellite rainfall magnitudes are advanced, accurate spatiotemporal representation of rainfall over India remains a challenge. Direct comparisons of satellite rainfall with ground measurements show several disparities, particularly over land. However, the IMERG reproduces many of the features of rainfall variation over India depicted by the gridded data sets developed from the highly dense observational network. The performance of the IMERG during extremely heavy rain events (≥ 204.5 mm·day⁻¹) remains a challenge, and its comparison with the ground data has to be investigated further before the data are used for hazardous events, such as flash floods and landslides. However, the substantial correlation of the IMERG and IMD with the ICR signifies the accurate quantification of rain estimates by the IMERG, and in the case of extremely heavy rain events, different analytical techniques with different thresholds may help in improving the IMERG data sets for applicability. The present analysis shows that the time series developed from the IMERG data sets can be a viable option for better spatiotemporal characterization. The quantitative representation of the IMERG rainfall can be used for water resource and agriculture management over India.

TABLE 4 Number of days of weak, normal, active and vigorous monsoon activity over the study area during the southwest monsoon, 2014–2017, obtained from the India Meteorological Department (IMD) and Integrated Multi-Satellite Retrieval of Global Precipitation Mission (IMERG) gridded rainfall data sets

Monsoon year   Weak           Normal         Active         Vigorous
               IMD   IMERG    IMD   IMERG    IMD   IMERG    IMD   IMERG
2014           58    56       120   119      16    16       0     3
2015           50    50       117   114      25    26       1     4
2016           25    30       121   117      28    34       0     0
2017           39    40       120   120      15    23       0     0

| CONCLUSIONS
High-resolution gauge-adjusted multi-satellite rainfall estimation has shown remarkable applications, and it is especially important over data-void areas.
The Integrated Multi-Satellite Retrieval of Global Precipitation Mission (IMERG) was evaluated against India Meteorological Department (IMD)-gridded rainfall data during the southwest monsoon in the period 2014–2017, on a daily scale at a spatial resolution of 0.25° latitude/longitude. Several skill and error metrics and verification schemes were calculated in order to assess the performance of the IMERG over the Indian land mass, as well as over the monsoon-dominated core region of India, at a grid level. The analysis showed that the IMERG agrees well with the IMD rainfall over the central Indian region in terms of the quantitative assessment of rainfall, except during extremely heavy rain events (≥ 204.5 mm). As far as the spatial pattern is concerned, the IMERG performs well over India when depicting the large-scale features, but with different biases. In the monsoon core region, the IMERG rainfall data set identifies weak and normal monsoon days better than active monsoon days. However, its performance is not satisfactory at picking vigorous monsoon days. The IMERG is as well correlated with the integrated condensation rate (ICR) as the IMD, which shows the potential of the IMERG for capturing accurate rainfall levels from infrared and microwave imagery. The overall performance of the IMERG over the central Indian region is good, and the rainfall data sets are promising and may be used for different applications such as weather forecasting and hazard management.