Volume 144, Issue S1 p. 292-312
ADVANCES IN REMOTE SENSING OF RAINFALL AND SNOWFALL

Validation of the CHIRPS satellite rainfall estimates over eastern Africa

Tufa Dinku

Corresponding Author

International Research Institute for Climate and Society, The Earth Institute at Columbia University, Palisades, New York

Correspondence

Tufa Dinku, International Research Institute for Climate and Society (IRI), The Earth Institute at Columbia University, 61 Route 9W, Monell Building, Palisades, NY 10964-8000, USA.

Chris Funk

U.S. Geological Survey, Earth Resources Observation and Science Center, Sioux Falls, South Dakota

University of California, Santa Barbara, Climate Hazards Group, Santa Barbara, California

Pete Peterson

University of California, Santa Barbara, Climate Hazards Group, Santa Barbara, California

Ross Maidment

Department of Meteorology, University of Reading, Reading, UK

Tsegaye Tadesse

National Drought Mitigation Center, University of Nebraska-Lincoln, Lincoln, Nebraska

Hussein Gadain

Somalia Water and Land Information Management (SWALIM) Project, Food and Agriculture Organization of the United Nations, Cairo, Egypt

Pietro Ceccato

International Research Institute for Climate and Society, The Earth Institute at Columbia University, Palisades, New York

First published: 01 April 2018
Funding information: National Aeronautics and Space Administration (NASA), grants NNX14AD30G, NNX16AN14G and NNX15AL46G; U.S. Geological Survey's Drivers of Drought project.

Abstract

Long and temporally consistent rainfall time series are essential in climate analyses and applications. Rainfall data from station observations are inadequate over many parts of the world due to sparse or non-existent observation networks, or limited reporting of gauge observations. As a result, satellite rainfall estimates have been used as an alternative or as a supplement to station observations. However, many satellite-based rainfall products with long time series suffer from coarse spatial and temporal resolutions and inhomogeneities caused by variations in satellite inputs. There are some satellite rainfall products with reasonably consistent time series, but they are often limited to specific geographic areas. The Climate Hazards Group Infrared Precipitation (CHIRP) and CHIRP combined with station observations (CHIRPS) are recently produced satellite-based rainfall products with relatively high spatial and temporal resolutions and quasi-global coverage. In this study, CHIRP and CHIRPS were evaluated over East Africa at daily, dekadal (10-day) and monthly time-scales. The evaluation was done by comparing the satellite products with rain-gauge data from about 1,200 stations. The CHIRP and CHIRPS products were also compared with two similar operational satellite rainfall products: the African Rainfall Climatology version 2 (ARC2) and the Tropical Applications of Meteorology using Satellite data (TAMSAT). The results show that both CHIRP and CHIRPS products are significantly better than ARC2 with higher skill and low or no bias. These products were also found to be slightly better than the latest version of the TAMSAT product at dekadal and monthly time-scales, while TAMSAT performed better at the daily time-scale. The performance of the different satellite products exhibits high spatial variability with weak performances over coastal and mountainous regions.

1 INTRODUCTION

Analyses of climate variability and trends require long-term and temporally consistent rainfall time series. Applications that use rainfall data in modelling the impact of climate variability and change on different socio-economic activities also require long-term climate time-series data at high temporal and spatial resolutions. Traditionally, rainfall measurements from conventional ground weather stations are the primary sources of such climate data. However, historical records from station observations are inadequate over many parts of the world due to sparse (and in many cases declining) or non-existent station networks. Thus, satellite-based rainfall products have been increasingly used as complements or in place of station observations. There are now a few satellite-based rainfall products that provide over 30 years of rainfall time series. These include the Global Precipitation Climatology Project (GPCP: Adler et al., 2003), the Climate Prediction Centre (CPC) Merged Analysis of Precipitation (CMAP: Xie and Arkin, 1997), African Rainfall Climatology version 2 (ARC2: Novella and Thiaw, 2013), and the Tropical Applications of Meteorology using SATellite and ground based observations (TAMSAT) rainfall estimate (Grimes et al., 1999; Thorne et al., 2001; Maidment et al., 2014; Tarnavsky et al., 2014).

The longest time series are offered by GPCP and CMAP that go back to 1979. However, these two products suffer from very coarse spatial (2.5° latitude/longitude) and temporal (monthly) resolutions. In addition, the time series of these two products may not be consistent over time as they both combine data from different sources with different weightings for each year depending on data availability. While this approach may provide more accurate estimates for any one year, the interannual variations may be influenced as much by the different mix of inputs as by actual changes in rainfall amounts. Thus, trends and variability statistics based on these products may be inaccurate. The ARC2 and TAMSAT products have relatively high spatial (0.1° and 0.0375°, respectively) and temporal (daily) resolutions. The two products also exclusively use thermal infrared (TIR) data, which makes their time series relatively consistent over time. However, ARC2 uses station data obtained through the Global Telecommunications System (GTS), which might introduce some inconsistencies since the density of these observations can vary substantially over time (e.g. Maidment et al., 2015). The main limitation of these two products is that they are not available outside the African continent.

There are now relatively new satellite-based rainfall products with good spatial (0.05° latitude/longitude) and temporal (daily, pentad and dekadal) resolution, as well as quasi-global coverage (50°S–50°N). These are the Climate Hazards Group (CHG) Infrared Precipitation (CHIRP) and CHIRP combined with station data (CHIRPS) from the University of California at Santa Barbara and U.S. Geological Survey (Funk et al., 2014; 2015a). The CHIRP and CHIRPS (hereafter CHIRP/S) time series go back to 1981. The first of the two products (CHIRP) could be considered reasonably consistent over time as it is based on TIR estimates, with mean bias removed using a satellite-enhanced station-based climatology CHPclim (Funk et al., 2015b). The CHIRPS product may have some inhomogeneity over parts of the world where the availability of station data is not consistent over time. This problem is mitigated, however, by blending the stations with the CHIRP background (Funk et al., 2015a).

Owing to the uncertainties associated with satellite rainfall retrievals, validation of these products under diverse geographic and climate conditions is critical. Validations of many satellite rainfall products have been conducted over different parts of East Africa at different spatial and temporal scales. These include Dinku et al. (2007; 2008; 2011); Hirpa et al. (2010); Romilly and Gebremichael (2011); Worqlul et al. (2014); Young et al. (2014); Maidment et al. (2013; 2014); Diem et al. (2014); Awange et al. (2016); and Maidment et al. (2017). Many of these validation studies, focused on the complex topography of Ethiopia, have demonstrated the challenges of satellite rainfall retrieval over the region. The main message from these studies is that the skill of satellite rainfall estimates over this region varies greatly with climate, topography and seasonal rainfall patterns.

In this study, the CHIRP/S products are evaluated over parts of eastern Africa that include various mountainous, coastal and desert regions. These two products were evaluated by comparing them with reference rain-gauge data as well as with the ARC2 and TAMSAT rainfall estimates. The evaluation was done at regional level (East Africa), as well as country levels for Ethiopia, Kenya and Tanzania. The evaluation period is 2006 to 2010. This period is selected mainly because of the availability of station data. However, the validation period for Rwanda is 2010 to 2014, because there were very few stations reporting during 2006 to 2010.

To our knowledge, this is the first evaluation of the CHIRP/S products over eastern Africa except for Maidment et al. (2017), which did limited validation over Uganda. This is because: (a) these are relatively new products, and (b) the CHIRPS product ingests a good number of stations from the region, which makes it hard to find an independent set of stations for validation. We are fortunate that one of the co-authors of this article has been working with the National Meteorological Services (NMS) in the region as part of the Enhancing National Climate Services (ENACTS) initiative (Dinku et al., 2014a), in which one of the activities has been evaluating the three satellite rainfall products (ARC2, TAMSAT and CHIRP/S) over each country. The objective was to choose the best satellite product suitable for each country. Working with the NMS facilitated access to many more stations, some of which were used for this validation work.

Even though the main focus of this validation work is the CHIRP/S products, ARC2 and TAMSAT have also been evaluated in the process of comparison with CHIRP/S. The two products were selected for comparison because they have similar properties to CHIRP/S (TIR-based, relatively high spatial resolution and long time series) and are widely used in the region. Comparison is also made between the latest version of the TAMSAT product (TAMSAT3) and the earlier version (TAMSAT2) to assess the improvements made for TAMSAT3.

The main strength of the current study, compared to previous studies of the region, is that it covers a large part of the region and uses a larger number of ground observations. Section 2 describes the study region and data. Evaluation of the products is presented in section 3. The results are discussed in section 4, and section 5 presents the summary and conclusion.

2 STUDY REGION AND DATA

2.1 Study region

The study area is located over eastern Africa and covers Ethiopia, Kenya, Somalia, Uganda, Rwanda and Tanzania (Figure 1). The region has the most complex topography on the continent. Elevation varies from below sea level in parts of Ethiopia to 5,895 m at Mount Kilimanjaro in Tanzania. The region is affected by the seasonal north–south migration of the intertropical convergence zone (ITCZ). This movement of the ITCZ results in four different rainfall seasons over the region: December–February, March–May, June–September and October–December. Figure 2 presents seasonal rainfall patterns over different parts of the region. It shows that seasonal patterns differ not only from one country to another, but also from one part of a country to another, as shown for Ethiopia. The climate of the region is influenced by El Niño/Southern Oscillation (ENSO; Ogallo, 1988; Indeje et al., 2000; Anyah and Semazzi, 2006; Otieno and Anyah, 2012) as well as variability of sea-surface temperature over the Indian Ocean (Williams and Funk, 2011). The interactions between the climate of the region and global sea-surface temperatures are complex in that, for instance, different ENSO phases have different impacts during different seasons and over different parts of the region (Clark et al., 2003; Otieno and Anyah, 2012). The complex orography, combined with the myriad synoptic systems that produce rainfall, has resulted in very diverse climates spanning eight climate zones that range from warm deserts to humid highland climate (Peel et al., 2007); Ethiopia alone encompasses seven of the eight zones.

Figure 1. Study region with validation (+) and CHIRPS (*) stations. CHIRPS stations vary over time and the figure shows stations used for July 2006. Background image is elevation in metres
Figure 2. Monthly rainfall climatology over (a) northwestern Ethiopia, (b) southern Ethiopia, (c) central Kenya and (d) central Tanzania. The green and red lines represent the 5th, 50th and 95th percentiles of the rainfall, respectively. Sources. http://www.ethiometmaprooms.gov.et:8082/maproom/Climatology/ http://kmddl.meteo.go.ke:8081/maproom/Climatology/index.html http://maproom.meteo.go.tz/maproom/Climatology/index.html

The complex geography and climate of the region offer an opportunity and, at the same time, pose a challenge to the validation of satellite rainfall products. The opportunity is that one can test the performance of the different products under diverse climate conditions within a relatively limited area. It will be shown later that this complex climate and the associated rainfall types result in high spatial variability in the performance of the satellite products. Generally, TIR-based satellite products underestimate rainfall over coastal and mountainous regions because most of the rain comes from clouds with top temperatures warmer than the threshold used by the satellite algorithms. On the other hand, satellites could overestimate rainfall over desert areas owing to sub-cloud evaporation. The challenge is finding a rain-gauge network that is dense enough to resolve the different climate zones. The next section will show that the station network used for this validation work covers most of those climate zones reasonably well.

2.2 Reference data

2.2.1 Rain-gauge data

Rain-gauge data from six countries in East Africa (Figure 1) were used as a reference for this evaluation. The evaluations over Ethiopia, Kenya, Uganda, Rwanda and Tanzania were part of the ENACTS project implemented in those countries (Dinku et al., 2014a; 2014b). One of the activities for implementing ENACTS has been selecting the best satellite rainfall product from among ARC2, TAMSAT and CHIRP/S. It is results from those individual validations that are combined and presented here. However, ENACTS has not been implemented in Somalia, and the data over Somalia were contributed by one of the co-authors.

Hundreds of stations were available from the ENACTS countries. However, some of those stations were also used in the CHIRPS product. Thus, only those stations located at least 25 km away from those used in CHIRPS were used for the validation (Figure 1). The CHIRPS product has used different numbers of stations over the years (Figure 3) due to declining data availability. Thus, the distribution of CHIRPS stations shown in Figure 1 corresponds to stations used in CHIRPS during 2006 to 2010, which is the validation period. This also means that this validation may not be fair to CHIRPS because it uses fewer stations, particularly over Tanzania and Kenya, during the validation period compared to earlier years. On the other hand, validating CHIRPS during the period when it used the maximum number of stations could also be misleading because the accuracy is expected to decrease with a decline in the number of stations used by the product. Thus, the station density for the 2006–2010 period is representative of the accuracy typical of recent early warning and hydrological extreme applications.
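The 25 km separation rule can be illustrated with a short sketch. The text does not say how distances were computed, so the great-circle (haversine) formula, the Earth radius value and the function names below are assumptions made for illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def independent_stations(candidates, chirps_stations, min_km=25.0):
    """Keep candidate (lat, lon) pairs at least min_km from every CHIRPS station."""
    return [
        (lat, lon)
        for lat, lon in candidates
        if all(
            haversine_km(lat, lon, clat, clon) >= min_km
            for clat, clon in chirps_stations
        )
    ]
```

A candidate gauge a few kilometres from a CHIRPS station is dropped, while one tens of kilometres away is retained for validation.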

Figure 3. Number of stations used in monthly CHIRPS product over Ethiopia, Tanzania and Kenya for the month of July. (Source. ftp://chg-ftpout.geog.ucsb.edu/pub/org/chg/products/CHIRPS-2.0/diagnostics/stations-perMonth-byCountry/pngs/)

Overall, about 1,200 stations were used for validation. Many of these stations are manned by volunteers. As a result, the quality of some of the data might not be as good as that of data collected by professional observers. Rigorous quality checks were performed on the datasets. These include checking for false zeros (for daily data), and performing temporal and spatial checks to detect outliers. The temporal check compares the observation for a given day with observations for the same month in other years. The spatial check compares an observation with observations from nearby stations. There are cases where a spatial check cannot be performed because of the lack of nearby stations. However, the quality check procedures may not remove all the errors. Another source of error could be the recorded locations of some of the volunteer stations. These measurement and location errors may increase the random errors when evaluating the satellite products.

2.2.2 ENACTS data

ENACTS is an initiative to improve the availability, access and use of climate information in Africa (Dinku et al., 2014a). Improving data availability includes combining quality-controlled station data from the national observation network with satellite rainfall estimates. This is done at each of the NMSs in this study. The main strength of the ENACTS approach is that it has access to all data from national weather stations by working directly with the NMS. For example, rainfall data from about 1,400 stations and 1,100 stations were available for Tanzania and Ethiopia, respectively. After rigorous quality checking, about 600 stations and 500 stations were used for Tanzania and Ethiopia, respectively (Figure 4). For comparison, only about 18 stations are made available by each country through GTS. The limitation of ENACTS data is that it belongs to the NMS and can only be shared with outside users according to each NMS's data policy. The merging method involves two steps: (a) all stations with at least 10 years of data are used to remove climatological bias from satellite (TAMSAT) estimates, and (b) the bias-adjusted satellite products are combined with contemporaneous station observations (Dinku et al., 2013).

Figure 4. ENACTS (.) and CHIRPS (+) stations over parts of eastern Africa. CHIRPS stations vary over time, thus this figure shows stations used for July 2006. Background is elevation in metres

2.3 Satellite data

2.3.1 CHIRP/S

A detailed description of the CHIRP/S products has been provided in Funk et al. (2014; 2015a). A brief summary of the description of the CHIRP/S algorithm and process is provided below.

The CHIRP/S algorithm combines three main data sources: (a) the Climate Hazards group Precipitation climatology (CHPclim), a global precipitation climatology at 0.05° latitude/longitude resolution estimated for each month based on station data, averaged satellite observations, elevation, latitude and longitude (Funk et al., 2012; 2015b); (b) TIR-based satellite precipitation estimates (IRP); and (c) in situ rain-gauge measurements. The CHPclim is distinct from other precipitation climatologies in that it uses long-term average satellite rainfall fields as a guide to deriving climatological surfaces. This improves its performance in mountainous countries like Ethiopia (Funk et al., 2015b).

The CHIRP/S algorithm involves the following steps (Funk et al., 2015a): (a) derive TIR precipitation estimates (IRP) from quasi-global geostationary satellite observations, which are generated using local regressions between Tropical Rainfall Measuring Mission multi-satellite precipitation analysis pentads (TMPA 3B42: Huffman et al., 2009; 2011) and cold cloud duration (CCD); (b) convert the IRP to percentage anomalies and multiply by the CHPclim, producing the unbiased precipitation fields. Step (b) results in the CHIRP product (an unbiased IRP), which is a time series that goes back to 1981 at a spatial resolution of 0.05° latitude/longitude. The TIR data have some missing images, particularly in the early 1980s. In such cases, the missing values were filled with unbiased data from the atmospheric model rainfall fields from the National Oceanic and Atmospheric Administration (NOAA) Climate Forecast System, version 2 (CFSv2). This filling procedure only affects a small part of the IRP record.
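Step (b) can be sketched for a single pixel as follows. This is a minimal illustration rather than the operational code: the function name, the argument names and the handling of a zero IRP climatology are all assumptions.

```python
def chirp_from_irp(irp, irp_clim, chpclim, eps=1e-6):
    """Rescale a TIR precipitation estimate (IRP) by a climatology.

    irp      : IRP estimate for one pixel and period (mm)
    irp_clim : long-term mean IRP for the same pixel and period (mm)
    chpclim  : CHPclim value for the same pixel and period (mm)
    eps      : hypothetical guard against division by zero
    """
    if irp_clim < eps:
        # Assumption of this sketch: with no IRP climatology to scale by,
        # return zero rather than divide by zero.
        return 0.0
    fraction_of_normal = irp / irp_clim  # the "percentage anomaly" as a ratio
    return fraction_of_normal * chpclim
```

For example, an IRP value 1.5 times its own long-term mean over a pixel whose CHPclim value is 40 mm yields 60 mm.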

The next step adjusts CHIRP using contemporaneous station observations from around the globe. The station data include the monthly Global Historical Climate Network version 2 archive (Peterson and Vose, 1997), the daily Global Historical Climate Network (Durre et al., 2010) archive, the global summary of the day dataset (GSOD), and the daily GTS archive provided by NOAA's Climate Prediction Center (CPC). Additional data are also used from some regions, e.g. East Africa, the Sahel, Central America and Afghanistan (Funk et al., 2014; 2015a). The procedure for combining CHIRP with station observations uses the expected correlation between precipitation for a given pixel and that from the nearby stations. These correlations are estimated from the CHIRP fields. An additional correlation value, which is supposed to be an estimate of the correlation between “true” precipitation at each pixel and CHIRP values, is also used. A value of 0.5 is assigned to this correlation, which is estimated from correlations between CHIRP pixel values and gridded station observations. Bias ratios are then calculated from the nearest five stations. These ratios are then combined into a single correction factor by a weighted average, where the weights are the squares of the correlation coefficients. These correction factors are multiplied by the CHIRP values to create adjusted-CHIRP. In the final step, the original (unadjusted) CHIRP is combined with the adjusted-CHIRP. The square of the correlation between CHIRP and “true” rainfall, as well as the estimated correlation of the nearest station, is used to determine the proportion of CHIRP and adjusted-CHIRP to be combined. The CHIRPS product is the output from this final step.
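The blending described above can be sketched for one pixel. The description leaves several details open, so the following are assumptions of this sketch and not the published algorithm: the bias ratio is taken as gauge over CHIRP at the gauge location with a small hypothetical offset `eps`, and the final blending proportion is taken as the squared nearest-station correlation divided by the sum of that value and the squared CHIRP correlation.

```python
def chirps_pixel(chirp, stations, r_chirp=0.5, eps=1.0):
    """Blend one CHIRP pixel value with nearby station information.

    chirp    : CHIRP value at the target pixel (mm)
    stations : list of (gauge_mm, chirp_at_gauge_mm, expected_corr) tuples,
               sorted nearest first; only the first five are used
    r_chirp  : correlation between CHIRP and "true" rainfall (0.5 in the text)
    eps      : hypothetical offset (mm) that keeps ratios finite for dry values
    """
    nearest = stations[:5]
    if not nearest:
        return chirp
    # Weights are the squares of the expected correlations.
    weights = [r ** 2 for _, _, r in nearest]
    total = sum(weights)
    if total == 0:
        return chirp
    # Bias ratio at each station (assumed form: gauge / CHIRP, offset by eps).
    ratios = [(g + eps) / (c + eps) for g, c, _ in nearest]
    factor = sum(w * b for w, b in zip(weights, ratios)) / total
    adjusted = factor * chirp
    # Assumed blending rule: trust the adjusted value more when the nearest
    # station is expected to correlate well with the pixel.
    r_near = nearest[0][2]
    alpha = r_near ** 2 / (r_near ** 2 + r_chirp ** 2)
    return alpha * adjusted + (1 - alpha) * chirp
```

Working with ratios rather than absolute station values means that a pixel far from any gauge (low expected correlation) stays close to the unadjusted CHIRP value.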

Merging of station data with CHIRP is done at pentad (5-day) and monthly time-scales with the pentads later rescaled such that the sum of pentads in a calendar month is equal to the monthly values. A daily version is created from the pentads and monthly fields. The daily CHIRPS uses daily cold cloud duration (CCD) percentages to discriminate between rain/no-rain events, and then the corresponding pentad rainfall is partitioned among the daily rain events proportional to percentage of CCD. Two versions of CHIRPS are produced operationally. The preliminary version (ftp://chg-ftpout.geog.ucsb.edu/pub/org/chg/products/CHIRPS-2.0/prelim/) uses just GTS stations, which is then updated to the final version (ftp://chg-ftpout.geog.ucsb.edu/pub/org/chg/products/CHIRPS-2.0/) with more station data. Preliminary CHIRPS is available 2 days after the end of a pentad, while the final version is generated the third week of the following month.
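The pentad rescaling and the daily partitioning can be sketched as below. The function names are hypothetical, and the behaviour for an all-dry month or a pentad with no cold cloud on any day is an assumption, since the text does not cover those cases.

```python
def rescale_pentads(pentads, monthly_total):
    """Scale pentad totals so that they sum to the monthly total."""
    s = sum(pentads)
    if s == 0:
        # Assumption: an all-dry set of pentads is left unchanged.
        return list(pentads)
    return [p * monthly_total / s for p in pentads]


def disaggregate_pentad(pentad_total, daily_ccd_percent):
    """Split a pentad total among days in proportion to daily CCD percentage.

    Days with zero CCD are treated as dry and receive no rainfall.
    """
    total_ccd = sum(daily_ccd_percent)
    if total_ccd == 0:
        # Assumption: with no CCD signal, spread the total evenly so that
        # the pentad amount is conserved.
        n = len(daily_ccd_percent)
        return [pentad_total / n] * n
    return [pentad_total * c / total_ccd for c in daily_ccd_percent]
```

With a pentad total of 20 mm and daily CCD percentages of 0, 10, 30, 0 and 0, the second and third days receive 5 mm and 15 mm and the remaining days stay dry.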

With regard to homogeneity of the time series, CHIRP would be more consistent than CHIRPS. The CHIRPS product ingests different numbers of stations over the years depending on the availability of station data, which has typically been decreasing over time (Figure 3). The ingestion of different numbers of stations may lead to inhomogeneity of the CHIRPS time series. The use of weighted bias ratios in calculating CHIRPS rather than absolute station values, however, may minimize the effect of the varying number of stations over time. The CHIRP product may also have some inhomogeneity due to missing satellite slots in the early 1980s.

2.3.2 African Rainfall Climatology (ARC2)

The Climate Prediction Centre's African Rainfall Climatology version 2 (ARC2: Novella and Thiaw, 2013) dataset goes back to 1983 at a daily time-scale and spatial resolution of 0.1° latitude/longitude. As the name implies, this dataset is created specifically for climate studies in Africa in an attempt to overcome the lack of long-term temporally consistent rainfall time series. The ARC2 algorithm uses three-hourly thermal infrared (TIR) brightness temperature and a threshold of 235 K for discriminating raining clouds from non-raining ones. This temperature threshold is used to compute cold cloud duration (CCD) from satellite TIR images. Then a simple linear relationship is used to convert CCD into rainfall amounts. Rain-gauge data made available through the World Meteorological Organization's GTS are used to adjust the final ARC2 product. This procedure follows the CMAP blending process (Xie and Arkin, 1997): station data are interpolated to produce a continuous surface, then combined with CCD-based rainfall estimates by weights that are inversely proportional to the estimated standard errors.
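The CCD computation described here can be sketched for one pixel and one day. The 235 K threshold and the three-hourly sampling come from the text; the function names and the linear coefficients `a0` and `a1` are hypothetical placeholders, because the actual calibration values are not given.

```python
def cold_cloud_duration(brightness_temps_k, threshold_k=235.0, hours_per_image=3.0):
    """Hours during which the pixel was colder than the threshold."""
    cold_images = sum(1 for t in brightness_temps_k if t < threshold_k)
    return hours_per_image * cold_images


def rainfall_from_ccd(ccd_hours, a0=0.0, a1=1.0):
    """Linear CCD-to-rainfall conversion with hypothetical coefficients (mm)."""
    if ccd_hours <= 0:
        return 0.0  # no cold cloud is treated as no rain
    return max(0.0, a0 + a1 * ccd_hours)
```

Eight three-hourly images with three values below 235 K give a CCD of 9 hours, which the linear relationship then converts to a rainfall amount.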

2.3.3 TAMSAT

The TAMSAT rainfall estimates are produced at the University of Reading in the United Kingdom. The TAMSAT method (Grimes et al., 1999; Thorne et al., 2001; Maidment et al., 2014; Tarnavsky et al., 2014) is based on the assumption that cold cloud-top temperatures of tropical storms identify raining clouds. These temperatures are obtained from Meteosat thermal-infrared images. The length of time that a satellite pixel is colder than a given temperature threshold (i.e. the cold cloud duration) is summed over a given period (historically this has been 10 days) to produce temporally accumulated CCD fields. The methodology assumes that CCD is linearly related to rainfall. The TAMSAT algorithm uses historical rain-gauge observations to calibrate the CCD to produce seasonally and spatially varying climatological calibration parameters that do not change from year to year. These calibration maps are then applied to the TIR record (1983 to the present day) to produce a continually updated rainfall time series. The use of climatological calibration parameters ensures that the estimates are temporally consistent. The main strength of the TAMSAT approach is that the algorithm is locally calibrated using rain-gauge data from many parts of Africa. This ensures that the local rainfall–CCD relationship, which varies depending on many factors such as orography and proximity to lakes and the coast, is well defined where sufficient gauge records exist. TAMSAT estimates are available at daily, pentad (currently latest version only) and dekadal time-scales within a couple of days after the end of each pentad or dekad.

The latest version of the TAMSAT product (TAMSAT3) is used here. While the essence of the TAMSAT method has not changed, the main differences between TAMSAT2 (previous version) and TAMSAT3 (Maidment et al., 2017) are: (a) the estimates are now calibrated and produced at the pentadal time step (as opposed to the dekadal time step for TAMSAT2), with daily and dekadal estimates derived from the pentadal estimates; (b) the use of rectangular calibration zones in TAMSAT2 has been replaced by detailed calibration fields derived from interpolated point values; (c) CCD are calibrated against mean gauge rainfall as opposed to median rainfall to reduce the dry bias associated with TAMSAT2; and (d) the rainfall amount calibration coefficients are adjusted by the CHPclim pentadal fields to reduce mean bias and improve characterization of geographical detail in the rainfall estimates (similar to the mean bias adjustment used to derive CHIRP). Thus, there is similarity between TAMSAT3 and CHIRP. The main differences between TAMSAT3 and CHIRP are that: (a) TAMSAT implements a varying rain/no-rain temperature threshold while CHIRP uses a fixed rain/no-rain temperature threshold, and (b) TAMSAT starts from 1983 while CHIRP starts from 1981. TAMSAT avoided starting from 1981 because of too many missing satellite images, while CHIRP's approach is to fill in the missing data with climate model estimates. This may impact the homogeneity of the CHIRP time series during 1981 and 1982.

3 EVALUATION OF SATELLITE PRODUCTS

The main focus of this validation work is to assess the performance of the CHIRP/S over eastern Africa. The performance of these products is also compared with the performance of other operational satellite rainfall products that have similar characteristics (i.e. ARC2 and TAMSAT). This section describes the approach used and presents validation results at different spatial and temporal scales.

3.1 Approach

3.1.1 Spatial scales

The validation in this study was done at both regional and national levels. The number of available stations varies from country to country (Figure 1). Relatively large numbers of stations are available from Ethiopia, Kenya and Tanzania. As a result, a more detailed validation was done over these countries at a national level. On the other hand, there are fewer stations from Rwanda, Somalia and Uganda. The data from these countries were used only as part of the regional-level validation. The main objective of the regional-level validation is to provide a picture of how the quality of the satellite products varies over the whole region. To this end, the validation at regional level was done at each station location.

As the satellite products are pixel averages of rainfall estimates, the validation data also need to be converted to area averages. Thus, the station data were also gridded. The gridded reference data were generated by combining station data with satellite estimates. The gridded product, produced in the context of the ENACTS project (Dinku et al., 2014a), is similar to CHIRPS except that ENACTS incorporates many more stations (Figure 4). Thus, ENACTS is expected to be closer to the “true” rainfall. However, only ENACTS pixels that contain at least one of the validation rain-gauges were used for evaluating the satellite products. The evaluations were done for area averages over 0.1° × 0.1° latitude/longitude pixels. The 0.1° grid is selected because it is the spatial resolution of ARC2, which is the coarsest among the three satellite products. Thus, CHIRPS, TAMSAT and ENACTS pixels were aggregated to the 0.1° grid.
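For a 0.05° product such as CHIRPS, aggregation to the 0.1° grid amounts to averaging non-overlapping 2 × 2 blocks of pixels. The sketch below assumes the fine grid is aligned with the coarse one and that missing values are stored as `None`; both are assumptions, as is the function name. Products whose native grid does not nest evenly within 0.1° (for example TAMSAT at 0.0375°) would need a different regridding method, which the text does not specify.

```python
def aggregate_2x2(grid):
    """Average non-overlapping 2x2 blocks of a 2-D list, skipping None cells."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for i in range(0, rows - 1, 2):
        row = []
        for j in range(0, cols - 1, 2):
            block = [grid[i + di][j + dj] for di in (0, 1) for dj in (0, 1)]
            valid = [v for v in block if v is not None]
            # A block with no valid cells stays missing.
            row.append(sum(valid) / len(valid) if valid else None)
        out.append(row)
    return out
```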

As all the validation stations in the ENACTS data are at least 25 km away from any of the CHIRPS stations, the validation data may be considered independent. However, this is only partially true as the ENACTS pixels may have contributions from some of the CHIRPS stations, even if they are over 25 km away. This may have an impact on the results. To assess this impact, validation with point-based observations from the validation stations was also included for comparison. Furthermore, CHIRP and TAMSAT3 use stations (by way of CHPclim), which could be more numerous than those used in CHIRPS. The CHPclim, as well as CHIRP and TAMSAT3, benefits from the much greater density of long-term average climate normals: there are many more estimates of long-term mean in situ precipitation than there are observations typically available in standard observing systems for any given month. Thus, although CHIRP and TAMSAT3 use only a station–satellite climatology for removing mean bias, the fact that gauges are used for bias adjustment may still have some impact on the results as compared to products that do not involve bias adjustment.

3.1.2 Temporal scales

The validations were done at daily, dekadal (10-day) and monthly time-scales. Daily data were available only for Ethiopia and Tanzania; as a result, daily validation was done only over those countries. Dekadal validation was done both at regional and national levels, while monthly validation was done at national levels over Ethiopia, Kenya and Tanzania.

3.1.3 Validation statistics

Different validation statistics were used to evaluate the different satellite products. Evaluation at the daily time-scale focused on assessing the skill of the satellite products in detecting the occurrence of rainfall. The validation statistics used here are probability of detection (POD), false alarm ratio (FAR), and the Heidke Skill Score (HSS). The POD is used to assess the skill of the satellite products in detecting the occurrence of rainfall, while FAR assesses false detections. The HSS statistic measures the accuracy of the estimates while accounting for matches due to random chance. The rainfall threshold used for discriminating between rainy and dry days is 1 mm. This threshold is used for both the gauge and the satellite pixels. This may affect the results by increasing the frequency of rainfall occurrence in the satellite estimates relative to the gauge. Linear correlation coefficient (CC) and multiplicative bias (Bias) were also used to offer an insight into the skill of the products in estimating rainfall amounts. Bias, mean error (ME), mean absolute error (MAE), CC, and Efficiency (Eff) are used for evaluation at dekadal and monthly time-scales. The efficiency, also known as coefficient of efficiency, shows the skill of the estimates relative to a reference (in this case the gauge average). The formulae and other descriptions, including optimum values, of the different evaluation statistics are given in Table 1.

Table 1. Descriptions of validation statistics used in the article. A, B, C and D represent hits, false alarms, misses, and correct negatives, respectively
Statistics Formula Range Unit Best value
Probability of detection $\mathrm{POD} = \frac{A}{A + C}$ 0 to 1 None 1
False alarm ratio $\mathrm{FAR} = \frac{B}{A + B}$ 0 to 1 None 0
Heidke Skill Score $\mathrm{HSS} = \frac{2(AD - BC)}{(A + C)(C + D) + (A + B)(B + D)}$ −∞ to 1 None 1
Mean error $\mathrm{ME} = \frac{1}{N}\sum (S - G)$ −∞ to +∞ mm 0
Correlation coefficient $\mathrm{CC} = \frac{\sum (G - \bar{G})(S - \bar{S})}{\sqrt{\sum (G - \bar{G})^2}\,\sqrt{\sum (S - \bar{S})^2}}$ −1 to 1 None −1 or 1
Mean absolute error $\mathrm{MAE} = \frac{1}{N}\sum |S - G|$ 0 to ∞ mm 0
Bias $\mathrm{Bias} = \frac{\sum S}{\sum G}$ 0 to ∞ None 1
Efficiency $\mathrm{Eff} = 1 - \frac{\sum (S - G)^2}{\sum (G - \bar{G})^2}$ −∞ to 1 None 1
  • Note. G = gauge rainfall measurements; $\bar{G}$ = average of the gauge measurements; S = satellite rainfall estimate; $\bar{S}$ = average of the satellite estimates; N = number of data pairs.
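As a hedged illustration of how these statistics could be computed, the sketch below uses the standard contingency-table and error definitions, with A, B, C and D as hits, false alarms, misses and correct negatives. The function names and sample values are hypothetical, and treating a value of exactly 1 mm as rainy (`>=`) is an assumption, since the text states only the 1 mm threshold.

```python
import math


def contingency(gauge, sat, threshold=1.0):
    """Count hits (A), false alarms (B), misses (C) and correct negatives (D)."""
    a = b = c = d = 0
    for g, s in zip(gauge, sat):
        gauge_wet, sat_wet = g >= threshold, s >= threshold  # assumed: >= means rainy
        if gauge_wet and sat_wet:
            a += 1
        elif sat_wet:
            b += 1
        elif gauge_wet:
            c += 1
        else:
            d += 1
    return a, b, c, d


def detection_scores(a, b, c, d):
    """Return (POD, FAR, HSS); no guard against empty categories in this sketch."""
    pod = a / (a + c)
    far = b / (a + b)
    hss = 2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))
    return pod, far, hss


def amount_scores(gauge, sat):
    """Return ME, MAE, Bias, Eff and CC for paired gauge/satellite amounts (mm)."""
    n = len(gauge)
    g_mean, s_mean = sum(gauge) / n, sum(sat) / n
    g_var = sum((g - g_mean) ** 2 for g in gauge)
    s_var = sum((s - s_mean) ** 2 for s in sat)
    cov = sum((g - g_mean) * (s - s_mean) for g, s in zip(gauge, sat))
    return {
        "ME": sum(s - g for g, s in zip(gauge, sat)) / n,
        "MAE": sum(abs(s - g) for g, s in zip(gauge, sat)) / n,
        "Bias": sum(sat) / sum(gauge),
        "Eff": 1 - sum((s - g) ** 2 for g, s in zip(gauge, sat)) / g_var,
        "CC": cov / math.sqrt(g_var * s_var),
    }


# Made-up daily pairs (mm), for illustration only.
gauge = [0.0, 5.0, 12.0, 0.0, 3.0, 0.5, 20.0, 0.0]
sat = [2.0, 4.0, 10.0, 0.0, 0.0, 0.0, 15.0, 1.5]
pod, far, hss = detection_scores(*contingency(gauge, sat))
scores = amount_scores(gauge, sat)
```

For these sample values the table has A = 3, B = 2, C = 1 and D = 2, giving POD = 0.75, FAR = 0.40 and HSS = 0.25.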

It should be noted that the validation results, by construction, will tend to represent accuracy levels in places where station data are available. These station locations tend to follow population and agriculture. Densely populated areas with active agriculture and airports tend to be much more frequently sampled than arid regions where households practice pastoral livelihoods.

3.2 Results

3.2.1 Validation at regional level

In order to explore the spatial variability of the performance of the satellite products, validation statistics were calculated for each station location. The regional level validation was done on a dekadal time-scale, as daily data were not available for some of the countries. The results are presented in Figures 5-7.

Figure 5. Comparison of correlation coefficients for different satellite products over the Greater Horn of Africa for dekadal (10-day) accumulation. The grey scale in the background is elevation in metres
Figure 6. Comparison of bias values for different satellite products over the Greater Horn of Africa for dekadal (10-day) accumulation. The values are given in % for convenience of display. Values above 100 show overestimation, while values below 100 correspond to underestimation. The grey scale in the background is elevation in metres
Figure 7. Comparison of skill (Eff) for different satellite products over the Greater Horn of Africa for dekadal (10-day) accumulation. The grey scale in the background is elevation in metres

Figure 5 shows correlation coefficients (CC) between dekadal station observations and the dekadal satellite pixel values for the different satellite products. All the four satellite products show high CC values over the northern half of the Ethiopian highlands and southern and western parts of Tanzania. Lower CC values are observed mainly over the southern half of Ethiopia, most of Somalia, highlands and coastal regions of Kenya and Tanzania, and most of Uganda and Rwanda. Over Ethiopia, the transition from high to low CC values is very sharp. The main divide is the Rift Valley with higher CC values north of the Rift Valley and lower values on its southern side. It is important to note that both sides are mostly mountainous. One possible explanation for this sharp difference could be the difference in the synoptic systems, and even seasons, that produce the rainfall over the two regions. Northern Ethiopia has a distinct peak in boreal summer (Figure 2a). For Tanzania and Kenya, the lower CC values are mainly over the coastal mountainous regions, while for Uganda they are around Lake Victoria. This could be a result of warm coastal and orographic rainfall processes.

Comparing the different satellite products, ARC2 and TAMSAT3 are somewhat similar. However, TAMSAT3 has higher CC values over Ethiopia and lower values over Rwanda. CHIRP and CHIRPS have CC values that are higher than the other two products, particularly over areas with low CC values such as Kenya and coastal Tanzania. On the other hand, CHIRP and CHIRPS are somewhat similar except over Rwanda and Kenya where CHIRPS has higher CC values. This is a result of CHIRPS incorporating contemporaneous observations from Rwanda and Kenya as opposed to simple mean bias adjustment for CHIRP.

Figure 6 presents the mean bias in percentages (%). Blue colours represent underestimation (bias value <75%) and red and brown colours show overestimation (bias >125%). Among the four products, ARC2 is a typical TIR-based product because its use of ground observations is limited to GTS stations, which are very few over this region. The other products use bias adjustment with climatological data computed from many more stations and satellite data (TAMSAT and CHIRP) and more contemporaneous observations than are available through GTS (CHIRPS). Thus, results for ARC2 offer a better insight into the challenges of TIR-based retrievals over this region. Figure 6 shows that, in general, ARC2 underestimates rainfall amounts over the mountainous and coastal regions where most of the rainfall comes from warm clouds. On the other hand, ARC2 overestimates rainfall amounts over most of the dry and warm regions, probably because of sub-cloud evaporation and other factors. Figure 6 also shows some exceptions where ARC2 underestimates over dry and hot areas (e.g. Somalia) and overestimates over mountainous regions (e.g. parts of Kenya and Tanzania). This could be because of either local climate or the quality of the ground observations.

The performance of the other three products is much better than that of ARC2, mainly because of the climatological bias adjustments. ARC2 does incorporate contemporaneous station data from GTS stations, whenever available. This shows that simple bias correction using climatological data, which are constructed from many more stations than are available through GTS, yields a greater improvement than using a few contemporaneous observations. TAMSAT3 still underestimates over the coastal regions and overestimates over Uganda and parts of Ethiopia and Kenya. CHIRP and CHIRPS look very similar. This is expected, as the CHIRPS algorithm combines the bias-corrected CHIRP with station observations.

Figure 7 assesses the skill of the different satellite-based products using the efficiency (Eff) statistic. The Eff shows a pattern very similar to that of CC in Figure 5: higher skill over the northern half of Ethiopia and western Tanzania, and low skill over the rest of the region. As a result, the discussion of Figure 5 may also apply to Figure 7. Both ARC2 and TAMSAT3 exhibit very low skill over Somalia, Rwanda, the southern highlands of Ethiopia, and highlands and coastal regions of Kenya and Tanzania. The TAMSAT3 Eff values over Rwanda are not shown because these are negative values and the scale starts at zero. The skill over Rwanda is very low even for CHIRP/S, but not as low as ARC2 and TAMSAT3. CHIRPS shows better skill than CHIRP owing to the ingestion of contemporaneous station data. There are no differences between CHIRP and CHIRPS over Somalia because currently CHIRPS does not incorporate station data from Somalia. The CHG has recently obtained daily rainfall data over Somalia from about 50 stations, which will be used in the next version of CHIRPS.

3.2.2 Validations at national level

Validation over Ethiopia

Figure 8 presents a qualitative comparison of the five satellite-based rainfall products (ARC2, TAMSAT2, TAMSAT3 and CHIRP/S) with rain-gauge measurements at the dekadal time-scale. All the products represent the spatial distribution of rainfall reasonably well. There are also differences among the different products. ARC2 and TAMSAT2 displace the high rainfall area westwards compared to the station observations, while TAMSAT3, CHIRP and CHIRPS maps are closer to the station observations. The ENACTS product, which incorporates data from most of these stations, is very close to the station observations as expected. Though the CHIRP/S rainfall fields are closer to the station observation compared to ARC2 and TAMSAT2, they overestimate the areal coverage of the rainfall fields. The grey areas over the northern part of the country represent zero rainfall, while CHIRP/S show some low rainfall values. This is likely due to the relatively strong influence of the CHPclim on CHIRP/S; in the absence of station data to force the CHIRP/S to zero in areas of low rainfall, a reversion to a low, but non-zero estimate is likely.

Figure 8. Sample dekadal rainfall fields over Ethiopia for the second dekad of April 2009, comparing gauge observations and the five satellite-based rainfall products at the original resolutions of the products. The grey area represents zero rainfall, while the colour bar shows rainfall amounts in mm. An elevation map is provided for reference

The scatter plots in Figure 9 compare the satellite products and gridded gauge measurements at dekadal time-scales. There is wide scatter for all the products. Part of this scatter may be attributed to uncertainty in station locations, and uncertainties associated with some gauge observations. ARC2 has wider scatter than the other products as well as systematic underestimation of rainfall amounts. TAMSAT2 exhibits systematic underestimation of moderate and higher rainfall amounts, which is corrected in TAMSAT3. However, TAMSAT3 has wider scatter compared to TAMSAT2, which is a result of the bias correction process. The CHIRP/S products have less scatter and less bias compared to the other products. On the other hand, there is no substantial difference between CHIRP and CHIRPS.

Figure 9. Comparison of different satellite products against area-average gauge over Ethiopia at dekadal time-scale. Rainfall amounts are given in mm

The error statistics CC, Eff, multiplicative bias (Bias) and MAE are presented in Table 2, which has two parts: the left side shows the actual validation using 0.1° grid pixel averages, while the right side compares point station measurements with satellite pixels at the original resolution of the satellite products. The second part is presented to assess the impact of comparing area-average satellite products with point station observations. The validation statistics for all the products are better for the case of pixel-to-pixel comparison. This is expected because point-based observations may not represent pixels (area averages) well. This also means that the results presented in Figures 5-7, which are point-to-pixel comparisons, could underestimate the actual performance of the satellite products. The differences between the two parts of Table 2 (and also Tables 4 and 6) could be used to assess how large that underestimation might be.
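The two comparison modes can be sketched as follows. This is only an illustration under stated assumptions: the 0.05° native grid, the grid origin, the station coordinates and the rainfall values are all hypothetical, and simple block averaging stands in for whatever regridding was actually used.

```python
def block_average(grid, factor):
    """Average non-overlapping factor x factor blocks of a 2-D list (e.g. 0.05 to 0.1 degrees)."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for i in range(0, rows - rows % factor, factor):
        out_row = []
        for j in range(0, cols - cols % factor, factor):
            block = [grid[i + di][j + dj]
                     for di in range(factor) for dj in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def pixel_index(lat, lon, lat_north, lon_west, res):
    """Row and column of the pixel containing a point (row 0 at the northern edge)."""
    return int((lat_north - lat) // res), int((lon - lon_west) // res)


# Hypothetical 4 x 4 satellite field (mm) on a 0.05-degree grid whose
# north-west corner is at 15.0 N, 38.0 E.
fine = [[0.0, 4.0, 8.0, 8.0],
        [0.0, 0.0, 8.0, 12.0],
        [2.0, 2.0, 0.0, 0.0],
        [2.0, 6.0, 0.0, 4.0]]
coarse = block_average(fine, 2)  # the same field averaged to 0.1 degrees

# Point-to-pixel: a hypothetical station is paired with its native-resolution pixel.
row, col = pixel_index(14.93, 38.12, 15.0, 38.0, 0.05)
point_to_pixel_value = fine[row][col]

# Pixel-to-pixel: the 0.1-degree satellite average is paired with the
# gridded gauge value for the same 0.1-degree cell.
crow, ccol = pixel_index(14.93, 38.12, 15.0, 38.0, 0.1)
pixel_to_pixel_value = coarse[crow][ccol]
```

Here the station falls in a native pixel holding 8 mm, while the enclosing 0.1° cell averages 9 mm, which shows how the two pairings can differ for the same location.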

Table 2. Validation statistics for dekadal rainfall products over Ethiopia using pixel-to-pixel (left) and point-to-pixel (right) comparisons
Pixel-to-pixel (0.1° × 0.1° grid) Point-to-pixel (at original resolution of products)
CC Eff Bias MAE CC Eff Bias MAE
ARC2 0.76 0.52 0.71 17.8 0.67 0.39 0.70 21.6
TAMSAT2 0.83 0.61 0.69 15.7 0.74 0.46 0.68 18.8
TAMSAT3 0.84 0.69 1.00 14.7 0.75 0.54 0.99 19.3
CHIRP 0.85 0.73 0.99 14.3 0.76 0.58 0.98 19.0
CHIRPS 0.87 0.75 0.95 13.4 0.77 0.59 0.94 18.2

The comparisons discussed below are based on the pixel-to-pixel comparisons in Table 2. The ARC2 product has the lowest CC value (0.76), while the values for the other products are very similar (ranging from 0.83 to 0.87). ARC2 also has the lowest skill (Eff = 0.52) and the highest random error (MAE = 17.8), which was also shown in the scatter plots (Figure 9). Table 2 also shows that both ARC2 and TAMSAT2 underestimate rainfall amounts. TAMSAT3 exhibits much improvement over TAMSAT2, particularly with respect to bias. As pointed out earlier, this is a result of the mean bias adjustment employed by TAMSAT3. The CHIRP/S products have better skill (Eff of 0.73 and 0.75) compared to all the other products. However, there is no appreciable difference between CHIRP and CHIRPS.

Figure 10 presents the scatter plots for monthly totals of rainfall (satellite products against area-average gauge) over Ethiopia, while the error statistics are presented in Table 3. TAMSAT2 is not included in the comparison here as the improvement from TAMSAT2 to TAMSAT3 has already been demonstrated at the dekadal time-scale. As expected, the monthly aggregation has reduced the scatter considerably. The underestimations by ARC2 stand out in Figure 10. TAMSAT3 and CHIRP/S products have much less scatter. On the other hand, CHIRPS has less scatter relative to CHIRP and TAMSAT3, but shows slight underestimation of higher rainfall amounts. This underestimation of extremes may arise from the CHIRPS blending process, in which the CHIRP is combined with distance-weighted average anomalies from surrounding stations. This spatial averaging of the station inputs may artificially reduce the variability of the precipitation estimates. This tends to be a standard trade-off in estimation settings: high performance “on average” may come at the price of underestimating extremes. The error statistics have also improved for all the products, with higher correlations (CC ≥ 0.86) and skill (Eff ≥ 0.64) compared to the dekadal version shown in Table 2. The performance of TAMSAT3 and the CHIRP/S products is very similar, while ARC2 is the product with the lowest performance.

Figure 10. Comparison of different satellite products against area-average gauge over Ethiopia at monthly time-scale. Rainfall amounts are given in mm
Table 3. Validation statistics for monthly rainfall products over Ethiopia
CC Eff Bias MAE
ARC2 0.86 0.64 0.71 43.7
TAMSAT3 0.91 0.82 1.01 31.9
CHIRP 0.92 0.84 0.99 30.2
CHIRPS 0.93 0.87 0.96 26.5

Validation over Tanzania

Figure 11 compares rainfall fields over Tanzania for the different satellite products with rain-gauge measurements. All the products represent the overall structure of the rainfall field reasonably well. ARC2 shows good agreement with the gauge measurement while TAMSAT2 underestimates the high rainfall amounts. Again, bias adjustment of the TAMSAT product, as well as the other changes made to the calibration methodology, has reduced the error in TAMSAT3. The CHIRP/S products are close to the station observations, but again overestimate the spatial extent of the rainfall field. The CHIRP/S products also underestimate the high rainfall amounts over the central part of Tanzania. ENACTS exhibits the closest agreement with gauge measurements owing to the fact that data from most of those stations were used to create the ENACTS product.

Figure 11. Sample dekadal rainfall fields over Tanzania for the second dekad of January 2007, comparing gauge observations (point data) and the five satellite-based rainfall products at the original resolutions of the products. The grey area represents zero rainfall, while the colour bar shows rainfall amounts in mm. An elevation map is provided for reference

The scatter plots in Figure 12 compare the satellite products averaged over a 0.1° grid with area-average gauge data at the dekadal time-scale. As in the case of Ethiopia, wide scatter is observed for all the products. Again, part of this scatter could be attributed to factors such as uncertainty in station locations and gauge measurement. ARC2 exhibits the widest scatter, while TAMSAT2 underestimates high rainfall, which does not appear in TAMSAT3. ARC2, TAMSAT2 and TAMSAT3 also miss a significant number of rainfall events (satellite values are zero even when gauges report rainfall amounts over 100 mm). This problem is also observed for Ethiopia (Figure 9), but was not as severe. This could be because of orographic (Ethiopia) or coastal (Tanzania) rainfall processes, which are missed because of the cold temperature thresholds used by these algorithms. Both ARC2 and TAMSAT use cloud-top temperature thresholds to distinguish between raining and non-raining clouds. The threshold for ARC2 is fixed (Novella and Thiaw, 2013), while that of TAMSAT is variable (Maidment et al., 2014). These thresholds may not work for orographic and coastal clouds because these clouds can produce rainfall at relatively warmer cloud-top temperatures. The CHIRP/S products exhibit less scatter and bias compared to ARC2 and TAMSAT, and seem to capture the high rainfall values missed by ARC2 and TAMSAT. This is despite the fact that TAMSAT uses different thresholds obtained through calibration with gauges, while CHIRP uses a single threshold for all locations and seasons. One possible explanation for this discrepancy could be that the TAMSAT algorithm may not use many stations from the coast, which may lead to colder thresholds.

Figure 12. Comparison of different satellite products against area-average gauge over Tanzania at dekadal time-scale. Rainfall amounts are given in mm

The validation statistics for dekadal rainfall products over Tanzania are summarized in Table 4. As in the case of Ethiopia (Table 2), the point-to-pixel comparison is presented for reference. The analysis here is based only on the pixel-to-pixel comparison. The CHIRP/S products are better than all the other products with respect to all the validation/error statistics. The strongest statistics for CHIRP/S are skill (Eff = 0.56, 0.57) and bias. Both ARC2 and the TAMSAT products underestimate rainfall amounts, with TAMSAT2 exhibiting more severe underestimation (Figure 12). Again, TAMSAT3 is a clear improvement over TAMSAT2 in terms of bias, but there is no big difference between CHIRP and CHIRPS.

Table 4. Validation statistics for dekadal rainfall products over Tanzania using pixel-to-pixel (left) and point-to-pixel (right) comparisons
Pixel-to-pixel (0.1° × 0.1° grid) Point-to-pixel (at original resolution of products)
CC Eff Bias MAE CC Eff Bias MAE
ARC2 0.68 0.41 0.79 16.5 0.61 0.32 0.75 19.8
TAMSAT2 0.69 0.45 0.76 16.5 0.62 0.34 0.72 19.9
TAMSAT3 0.69 0.43 0.92 16.4 0.62 0.35 0.87 19.8
CHIRP 0.76 0.56 1.00 15.5 0.68 0.45 0.95 19.0
CHIRPS 0.78 0.57 1.03 14.9 0.70 0.47 0.98 18.6

The results for evaluations at monthly time-scales are presented in Figure 13 and Table 5. Figure 13 shows less scatter compared to the dekadal version (Figure 12), but wider scatter and underestimation of rainfall amounts by ARC2 and TAMSAT3. It also shows better performance by CHIRP/S relative to ARC2 and TAMSAT3. Table 5 shows the error statistics for monthly accumulations. The CHIRP/S products perform better than both ARC2 and TAMSAT3, with higher correlations and skill, little or no bias, and smaller random errors.

Figure 13. Comparison of different satellite products against area-average gauge over Tanzania at monthly time-scale. Rainfall amounts are given in mm
Table 5. Validation statistics for monthly rainfall products over Tanzania
CC Eff Bias MAE
ARC2 0.76 0.53 0.80 38.0
TAMSAT3 0.74 0.52 0.92 37.3
CHIRP 0.85 0.71 1.00 32.2
CHIRPS 0.86 0.73 1.03 29.6

Validation over Kenya

Figure 14 compares rainfall fields over Kenya. Overall, the spatial rainfall pattern depicted by the station observations is also represented by the rainfall fields of the different satellite products, except TAMSAT2. Even though areal coverage of the rainfall field in TAMSAT2 is similar to that of the gauge, the spatial structure is not well represented as TAMSAT2 misses all the high rainfall areas. This is of course corrected in TAMSAT3, which is a very good example of the improvement from TAMSAT2 to TAMSAT3. There are similarities between TAMSAT3 and CHIRP, which could be ascribed to the fact that the two products use the same data for bias correction. Both TAMSAT products miss the coastal rainfall over the southeastern part of Kenya. ARC2 also misses the coastal rainfall. The small circles in ARC2 over the east are the result of combining station measurement with zero satellite values. The CHIRPS and ENACTS products are somewhat similar, except that CHIRPS overestimates areal coverage of high rainfall values particularly over western Kenya, which could be a result of radius of interpolation.

Figure 14. Sample dekadal rainfall fields over Kenya for the second dekad of May 2010, comparing gauge observations (point data) and the five satellite-based rainfall products at the original resolutions of the products. The grey area represents zero rainfall, while the colour bar shows rainfall amounts in mm. An elevation map is provided for reference

The scatter plots in Figure 15 compare the satellite products averaged over a 0.1° grid with area-average gauge data, while the error statistics are presented in Table 6. The relative performances of the different products over Kenya are very similar to those over Ethiopia and Tanzania. However, the overall performance of the satellite products over Kenya is much lower than over Ethiopia and Tanzania. For instance, the range of CC values is 0.76 to 0.87 over Ethiopia and 0.68 to 0.78 over Tanzania, but drops to 0.63 to 0.73 over Kenya. The reason for this is unclear, but may be related to warm rain processes over coastal and mountainous parts of Kenya that are not well captured by the TIR data.

Figure 15. Comparison of different satellite products against area-average gauge over Kenya at dekadal time-scale. Rainfall amounts are given in mm
Table 6. Validation statistics for dekadal rainfall products over Kenya using pixel-to-pixel (left) and point-to-pixel (right) comparisons
Pixel-to-pixel (0.1° × 0.1° grid) Point-to-pixel (at original resolution of products)
CC Eff Bias MAE CC Eff Bias MAE
ARC2 0.63 0.33 0.75 16.2 0.55 0.24 0.72 19.0
TAMSAT2 0.65 0.41 0.88 15.6 0.55 0.31 0.85 18.8
TAMSAT3 0.67 0.35 1.09 16.8 0.58 0.27 1.05 20.0
CHIRP 0.69 0.44 1.09 16.4 0.61 0.35 1.07 19.7
CHIRPS 0.73 0.50 1.13 15.4 0.65 0.39 1.10 18.7

The results for evaluations at monthly time-scales are presented in Figure 16 and Table 7. Figure 16 shows that both ARC2 and TAMSAT3 have wider scatter than CHIRP/S and also underestimate some rainfall amounts. The underestimation is more prominent for ARC2. However, there is much less scatter compared to the dekadal version (Figure 15). CHIRPS is the product with smallest random error (less scatter). Table 7 also shows that the CHIRP/S products perform better with slightly better correlation coefficients, much better skill (higher Eff), and smaller random errors.

Figure 16. Comparison of different satellite products against area-average gauge over Kenya at monthly time-scale. Rainfall amounts are given in mm
Table 7. Validation statistics for monthly rainfall products over Kenya
CC Eff Bias MAE
ARC2 0.71 0.43 0.75 37.7
TAMSAT3 0.75 0.49 1.08 36.7
CHIRP 0.78 0.57 1.10 34.9
CHIRPS 0.83 0.65 1.13 31.6

3.2.3 Validation at daily time-scale

The daily validation was done only over Ethiopia and Tanzania because daily rain-gauge data were available only for these countries. Each country is divided into two parts. For Ethiopia, parts of the country north and south of the Rift Valley, which are referred to as northwest (NW) and southeast (SE), were evaluated separately. This delineation is based on the performances of satellite products over the two regions (Figures 5 and 7). Similarly, Tanzania is divided into west (inland) and east (coastal) parts for the same reason as in the Ethiopian case. The satellite products evaluated were only ARC2, TAMSAT3 and CHIRPS. TAMSAT2 was not evaluated because it has now been replaced by TAMSAT3, and CHIRP was not included in this section because CHIRPS is already an improvement over CHIRP. These two products (TAMSAT2 and CHIRP) were included in the previous sections just for reference, i.e. to show the improvement, or lack of improvement, from TAMSAT2 to TAMSAT3 and from CHIRP to CHIRPS.

The validation statistics used to assess the products at daily time-scales mainly focus on assessing the products' skill in detecting the occurrence of rainfall. Thus, POD, FAR and HSS are used. However, CC and Bias are also included. Table 8 presents the statistics for Ethiopia. These are point-to-pixel comparisons to avoid interpolation of daily rainfall values over such mountainous terrain. Correlation is low for all the products over both NW and SE parts, with lower values over the latter. All the rainfall detection statistics (POD, FAR and HSS) show better performance of the satellite products over the NW region. This is in agreement with the results shown in Figures 5-7. Comparing the three satellite products, TAMSAT3 exhibits better performance over both regions in CC, Bias, POD and HSS, although its FAR is slightly higher than that of the other two products. In particular, rainfall detection by TAMSAT3, as measured by POD and HSS, is much better than that of the other two products. This is somewhat different from what was observed from the comparisons at dekadal and monthly time-scales, where CHIRPS showed slightly better performance than TAMSAT3 (Tables 2 and 3). The better detection statistics of TAMSAT3 may be ascribed to local calibration by the TAMSAT algorithm.

Table 8. Validation statistics for daily rainfall products over northwestern (north of the Rift Valley) and southeastern (south of the Rift Valley) Ethiopia using point-to-pixel comparisons
Northwest (NW) Southeast (SE)
CC Bias POD FAR HSS CC Bias POD FAR HSS
ARC2 0.36 0.70 0.55 0.29 0.48 0.27 0.61 0.42 0.34 0.34
TAMSAT3 0.47 0.92 0.77 0.33 0.58 0.34 0.98 0.65 0.39 0.44
CHIRPS 0.37 0.88 0.52 0.29 0.46 0.27 0.90 0.40 0.35 0.32

Daily validation over Tanzania is presented in Table 9. Correlation coefficients are low over both the inland and coastal parts of the country, but slightly lower over the coast. The satellite products also show a better performance over the inland part in all the other validation statistics. This is in agreement with what was observed from dekadal validation at regional (eastern Africa) level (Figures 5-7). The detection skills are very low over the coast, which is depicted by very low POD and very high FAR values. TAMSAT3 exhibits a slightly better performance over both parts of Tanzania. It is interesting to note that CHIRPS is not that much different from ARC2 at the daily time-scale, particularly in detecting rainfall, while it was much better when compared at dekadal time-scales (Table 4).

Table 9. Validation statistics for daily rainfall products over western and eastern Tanzania using point-to-pixel comparisons
West (interior) East (coast)
CC Bias POD FAR HSS CC Bias POD FAR HSS
ARC2 0.31 0.81 0.59 0.42 0.45 0.27 0.52 0.30 0.50 0.26
TAMSAT3 0.35 0.83 0.66 0.43 0.47 0.33 0.70 0.41 0.56 0.28
CHIRPS 0.29 0.91 0.57 0.43 0.43 0.24 0.85 0.27 0.53 0.24

3.3 Discussion

The main focus of this validation work has been to assess the performance of recently released satellite-based rainfall estimates (CHIRP/S) over eastern Africa. The performance of these products was also compared with the performance of other operational satellite rainfall products that have similar characteristics (i.e. ARC2 and TAMSAT). The TAMSAT team has just introduced its latest version (TAMSAT3). There are some substantial differences between the earlier version (TAMSAT2) and the latest version. Thus, comparison of these two products has also been performed at the dekadal time-scale. The comparison of these different products can thus offer an insight into their weaknesses and strengths.

Comparison of some validation statistics (CC, Bias and Eff) over the whole region has shown that the performance of the different satellite products varies considerably from place to place. This may be ascribed to the complex climate of the region described in section 2.1. In many cases, significant spatial differences are observed within a relatively short distance. For instance, stark differences were observed over two parts of Ethiopia, roughly separated by the Great East African Rift Valley (Figures 5-7). Even though both sides are dominated by mountainous terrain, performance of satellite products is much better north of the Rift Valley compared with the southern part. These differences may be ascribed to differences in synoptic systems and associated seasonality. The difference in rainfall seasonality over the two areas is shown in Figure 2. It has also been shown that the two areas are under the influence of different synoptic systems during the main rainy seasons (Segele et al., 2009; Jury, 2010). The southern part exhibits suppressed convective activity relative to the northern part. As a result, the satellite products may miss some of the rainfall over the southern region. This is evident from Table 8, which shows lower rainfall detection skills over the southern part.

The other stark difference is over Tanzania, where there are marked differences between the coastal and mountainous areas on one hand and the western plains on the other. Here the difference may be ascribed to coastal and orographic warm rain processes (e.g. Dinku et al., 2011). This may also apply to most of western Kenya and Rwanda (“a country of a thousand hills”, as the locals call it) as well as parts of Somalia.

The TAMSAT product could be considered to be temporally consistent owing to its use of only TIR data. However, there would be some inconsistency in the early part of the time series due to missing satellite observations. The main weakness of TAMSAT2 used to be consistent underestimation of high rainfall amounts. However, this has been reduced substantially in the recent version through changes in the calibration methodology and mean bias adjustment. The improvement of TAMSAT3 over TAMSAT2 demonstrates the importance of mean bias removal in improving satellite estimates without the need for contemporaneous ground observations. TAMSAT3 was also shown to perform slightly better than both ARC2 and CHIRPS at a daily time-scale (Tables 8 and 9). Maidment et al. (2017) have reported similar results over Niger, Nigeria, Uganda, Mozambique and Zambia. The difference between CHIRPS and TAMSAT3 is more pronounced over Ethiopia. This is interesting given that CHIRPS was shown to be better than TAMSAT3 at dekadal and monthly time-scales. This may be ascribed to how the two products generate daily estimates. CHIRP is trained with 0.25° National Aeronautics and Space Administration (NASA) TMPA rainfall estimates (Huffman et al., 2011). The distribution of these targets will be less extreme than station data. In addition, CHIRP relies on a fixed CCD threshold (235 K), which may lead to weaker daily detection skills. On the other hand, TAMSAT is calibrated with station data and uses local calibration to select CCD thresholds over a given geographical area.

ARC2 exhibits high random errors and lower skill. These may be attributed to two main factors. The first is that ARC2 uses a single rain/no-rain threshold (235 K) for the whole of Africa. As a result, ARC2 may miss rainfall from warm cloud processes such as orographic and coastal rains. This problem is more prominent over Tanzania where ARC2 may miss rainfall values over 100 mm (Figure 12). The other factor is that ARC2 uses three-hourly, as opposed to half-hourly, TIR observations. As a result, it may miss some short-lived rainfall events, which are frequent over the Tropics.
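Both factors can be illustrated with a toy cold cloud duration (CCD) calculation. This is a sketch under stated assumptions rather than the ARC2 algorithm: the temperature series is made up, each image is taken to represent its whole sampling interval, and the warmer 255 K threshold is purely hypothetical.

```python
THRESHOLD_K = 235.0  # rain/no-rain cloud-top temperature threshold cited in the text


def cold_cloud_hours(temps_k, step_hours, threshold_k=THRESHOLD_K):
    """Hours with cloud-top temperature below the threshold (each image = one step)."""
    return sum(step_hours for t in temps_k if t < threshold_k)


# Hypothetical half-hourly cloud-top temperatures (K) for one day: 48 slots.
half_hourly = [280.0] * 48
for slot in (7, 8, 9):        # short-lived cold convective cell, 03:30 to 05:00
    half_hourly[slot] = 220.0
for slot in range(20, 28):    # warm orographic cloud at 250 K, 10:00 to 14:00
    half_hourly[slot] = 250.0

# Three-hourly sampling keeps only the images at 00:00, 03:00, 06:00, ...
three_hourly = half_hourly[::6]

ccd_half_hourly = cold_cloud_hours(half_hourly, 0.5)
ccd_three_hourly = cold_cloud_hours(three_hourly, 3.0)
ccd_warmer_threshold = cold_cloud_hours(half_hourly, 0.5, threshold_k=255.0)
```

In this example the half-hourly series registers 1.5 hours of cold cloud, the three-hourly series registers none because the short-lived cell falls between images, and the warm cloud contributes only when the hypothetical warmer threshold is applied.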

CHIRP is much better than both ARC2 and TAMSAT2 and slightly better than TAMSAT3. It has little or no bias and significantly higher skill. In addition to the algorithm itself, CHIRP has one main advantage over ARC2 and TAMSAT2. This advantage is the use of a carefully generated gauge–satellite climatology, CHPclim (Funk et al., 2012; 2015b), to remove mean biases. However, the difference between CHIRP and TAMSAT3 can only be ascribed to differences in the two algorithms, as both products use the same data (CHPclim) for bias removal.

The main weakness of CHIRP is the overestimation of the rainfall area. This can be seen in Figures 8, 11 and 14. The CHIRP/S rainfall fields show low rainfall values over some areas where ground observations and the other satellite products show zero rainfall. This artefact may be due to the use of TRMM TMPA as “truth” data in the TIR estimation procedure. Since the TMPA is at 0.25° resolution, training to these data may produce drizzle. A drizzle effect could also be an artefact of the linear regression estimation step in the CHIRP–TMPA fitting procedure. This CHIRP/S drizzle problem is limited to very low rainfall amounts. As a result, its impact on the overall performance of the products is not significant. For instance, it does not appear to affect the bias values. Future versions of CHIRP/S might consider a two-step estimation procedure, with the first step estimating rainfall extent (as a binomial process, that is, the probability of a rain event) and the second step estimating rainfall amount for the areas deemed to have received precipitation.
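The drizzle effect of a single linear regression, and how a two-step procedure could suppress it, can be sketched as follows. This is only an illustration of the idea under stated assumptions: the logistic occurrence model, all coefficients and the 0.5 probability cut-off are hypothetical and are not taken from the CHIRP algorithm.

```python
import numpy as np

def one_step(ccd, intercept, slope):
    """Single linear regression: a positive intercept yields drizzle at CCD = 0."""
    return np.maximum(intercept + slope * ccd, 0.0)

def two_step(ccd, intercept, slope, a, b, p_min=0.5):
    """Step 1: rain occurrence from an assumed logistic model.
    Step 2: rainfall amount, kept only where rain is deemed to occur."""
    p_rain = 1.0 / (1.0 + np.exp(-(a + b * ccd)))      # probability of a rain event
    amount = np.maximum(intercept + slope * ccd, 0.0)  # same amount model as one_step
    return np.where(p_rain >= p_min, amount, 0.0)

# Hypothetical CCD values (hours) and hypothetical coefficients.
ccd = np.array([0.0, 1.0, 4.0, 10.0])
one = one_step(ccd, intercept=0.5, slope=1.0)
two = two_step(ccd, intercept=0.5, slope=1.0, a=-2.0, b=1.0)
```

Here the one-step estimate is positive at all four points, including the point with zero CCD, whereas the two-step estimate is zero wherever the assumed occurrence probability falls below 0.5 and unchanged elsewhere.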

The CHIRPS product has not been discussed separately above because it has been shown to be very similar to CHIRP. No substantial difference has been observed between these two products. This is counterintuitive, as the addition of the new station observations for each dekad of each year should have improved CHIRP. There are several possible reasons for this. One cause could be that the mean bias adjustment removes the main differences between actual measurements and the satellite-based products. This would be good because it implies that one can improve satellite rainfall products significantly using simple bias adjustment without the need for contemporaneous ground observations. This has also been demonstrated for the case of TAMSAT2 versus TAMSAT3. Historical observations are available from many sources while the availability of current observations is very limited. This is also good because the product will be more homogeneous as the station input would not change over time. The other possible cause could be the last step of the algorithm that produces CHIRPS. This step combines the original (climatologically adjusted) CHIRP and the same product combined with contemporaneous station observations. The proportion of this combination depends on the square of the correlation between CHIRP and “true” rainfall as well as the estimated correlation with the nearest station (Funk et al., 2014; 2015a). If the expected correlation between the point of interest and nearby stations is very small, CHIRPS and CHIRP will be very close. This could be the case over areas with high spatial variability of rainfall, such as Ethiopia. This may not be good because it means that the algorithm is not able to make use of the new station observations. Future versions of CHIRPS should revisit this question, and consider a blending procedure that gives more influence to the station data.
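The effect of a correlation-based blending weight can be illustrated with a small sketch. The weighting below is an assumed form chosen for illustration (squared correlations normalised to sum to one); it is not the published CHIRPS formula, and the function name `blend` and all numerical values are hypothetical.

```python
def blend(chirp, station_adjusted, r_chirp, r_station):
    """Blend CHIRP with a station-adjusted estimate.

    Assumed weighting (not the published CHIRPS formula): each estimate is
    weighted by its squared correlation, normalised so the weights sum to one.
    """
    denom = r_chirp ** 2 + r_station ** 2
    if denom == 0.0:
        return chirp  # no information either way: fall back to CHIRP
    w_chirp = r_chirp ** 2 / denom
    return w_chirp * chirp + (1.0 - w_chirp) * station_adjusted

# Hypothetical dekadal values (mm): CHIRP says 50, the station-adjusted field says 80.
weak = blend(50.0, 80.0, r_chirp=0.7, r_station=0.1)    # low expected station correlation
strong = blend(50.0, 80.0, r_chirp=0.7, r_station=0.7)  # equal correlations
```

With a low expected station correlation the blended value stays within about 1 mm of CHIRP, which mirrors the behaviour described for areas with high spatial variability of rainfall. With equal correlations the result lies midway between the two inputs.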

While a full review of the validation literature is beyond the scope of this study, the robust performance of CHIRPS does appear to be evident in comparisons with TMPA and CMORPH estimates over Italy (Duan et al., 2016) and Madagascar (Toté et al., 2015). Over West Africa, CHIRPS performance was found to be similar to CMORPH, TMPA and the PERSIANN datasets (Poméon et al., 2017). A gauge-based validation of CHIRPS over Brazil found high levels of skill (Paredes-Trejo et al., 2017).

4 SUMMARY AND CONCLUSIONS

The Climate Hazards Group Infrared Precipitation (CHIRP) and the version combined with contemporaneous station data (CHIRPS) were evaluated over the Greater Horn of Africa. The evaluations were done by comparing CHIRP and CHIRPS with reference rain-gauge data as well as with two similar satellite rainfall products (ARC2 and TAMSAT). In the process, ARC2 and TAMSAT have also been validated. A new version of TAMSAT (TAMSAT3) has just been released and this version was also compared with the previous version.

Validation was done at a regional level for eastern Africa as well as at country level over Ethiopia, Kenya and Tanzania. The regional-level validation was done for dekadal totals while the country validations also included monthly totals. Validation at a daily time-scale was also included for Ethiopia and Tanzania. The regional-level validation has revealed very interesting spatial patterns in the performance of the different satellite products. The main feature of this spatial structure confirms that satellite rainfall products have challenges over mountainous and coastal regions. However, as shown over Ethiopia, it does not mean that all mountainous areas behave the same way; the performance of the rainfall products over central and northern Ethiopia was very encouraging.

The CHIRP and CHIRPS products performed significantly better than ARC2 with higher skill, low or no bias, and lower random errors. These products were also better than TAMSAT3 in terms of skill and random error, but are about the same in terms of bias. However, TAMSAT3 showed slightly better performance than CHIRPS at daily time-scales.

No significant differences were observed between CHIRP and CHIRPS except over Kenya, where CHIRPS shows a slightly better performance. CHIRPS has slightly lower random error over all areas. CHIRPS was expected to be much better than CHIRP because of the addition of the new station observations. A possible reason for the lack of improvement is that the mean bias adjustment removes the main differences between actual measurements and the satellite-based products. This was also demonstrated by comparing TAMSAT3, which is mean bias adjusted, with the previous version. This could be a desirable outcome because it means that satellite rainfall products could be improved significantly just by using simple bias adjustments. As historical observations are readily available from many sources, a mean bias adjustment is relatively easier than using concurrent station observations, whose availability is limited over many parts of the world. The bias adjustments could also help to make the products temporally more homogeneous as the inputs may not change over time. This is in contrast to contemporaneous station inputs, whose number may change over time, making the product less homogeneous. One potential downside, however, could be that in places where the background climatology is very low, a multiplicative correction will make the bias-adjusted precipitation also very low. This means that places that have a zero value in the climatology will never receive rain. For example, within the CHIRP procedure, an area with a climatological mean of zero will always have a CHIRP value of zero. Conversely, it seems that in areas of very low rainfall CHIRP tends to overestimate the number of rain events. Going forward, it might be possible to develop unbiasing procedures that deal separately with the probability of precipitation and with the quantity of precipitation. This might allow for correction of the CHIRP/S tendency to overestimate precipitation area.
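The downside of a multiplicative correction over a zero climatology can be shown with a minimal sketch. It assumes a generic ratio-based adjustment (raw estimate scaled by reference climatology over satellite climatology); the function name `multiplicative_adjust`, the handling of a zero satellite climatology and all values are hypothetical, and this is not the CHIRP implementation.

```python
import numpy as np

def multiplicative_adjust(raw, sat_clim, ref_clim):
    """Scale raw estimates by the ratio of reference to satellite climatology.

    All three inputs must have the same shape. Where the satellite
    climatology is zero the ratio is set to zero (an assumption made here
    to avoid division by zero).
    """
    raw = np.asarray(raw, dtype=float)
    sat_clim = np.asarray(sat_clim, dtype=float)
    ref_clim = np.asarray(ref_clim, dtype=float)
    ratio = np.divide(ref_clim, sat_clim,
                      out=np.zeros_like(ref_clim), where=sat_clim > 0)
    return raw * ratio

# Hypothetical values (mm) at three pixels.
raw = [12.0, 12.0, 0.0]        # raw satellite estimates
sat_clim = [10.0, 10.0, 10.0]  # satellite climatology
ref_clim = [20.0, 0.0, 20.0]   # reference climatology; pixel 1 has a zero mean

adjusted = multiplicative_adjust(raw, sat_clim, ref_clim)
```

The second pixel has a positive raw estimate but a reference climatology of zero, so its adjusted value is zero regardless of what the satellite observed. The third pixel also stays at zero because a multiplicative factor cannot create rain where the raw estimate has none.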

ACKNOWLEDGEMENTS

We would like to thank the National Meteorological Services of Ethiopia, Tanzania and Kenya, as well as the SWALIM project, for providing the station data used in this article. The authors are also very grateful for the financial support provided by NASA grants number NNX14AD30G, NNX16AN14G and NNX15AL46G, and the U.S. Geological Survey's Drivers of Drought project.