Temporal and spatial variability analysis of the solar radiation in a region affected by the intertropical convergence zone

The first aim of this work was to obtain temporal variability patterns for satellite‐derived solar radiation estimations in Zambia. A principal component analysis, in Zambia and the surrounding zones, was performed and from this analysis the physical phenomena associated with these patterns were established. According to the results, two temporal variability patterns stand out: the first is associated with the regional global climate characteristics, including both the deterministic and non‐deterministic components of solar radiation, and the other is strictly associated with the influence of the intertropical convergence zone (ITCZ) responsible for the behaviour of solar radiation during October–March. The second aim of the work was to analyse the spatial variability of the irradiance in the study area. For this aim, a clustering analysis based on the interquartile range of the data was performed. The analysis leads to a spatial distribution of the radiation in agreement with the influence of the ITCZ on the territory. Indeed, those stations less affected by the ITCZ, in the south and east of the territory, show a clear diminution of the radiation around June. However, the stations of the northwest zone, the most affected by the belt of low pressure during November–April, do not present this diminution.


| INTRODUCTION
Knowledge of the temporal and spatial variability of solar radiation in a region is generally needed in solar resource assessment studies. However, the spatial and temporal coverage of solar radiation ground data is frequently not enough to provide accurate results based solely on measurements. In the case of scarcity of ground measurements, estimations from satellites or numerical weather models are the only approximation to reality and thus all studies must be carried out with these tools (Sengupta et al., 2015).
From these considerations, the aim of the work was twofold. The first aim was to obtain temporal variability patterns for satellite-derived radiation estimations in the area of study (Zambia and surrounding zones) and, from this analysis, to establish the physical phenomena associated with these patterns. Once the patterns had been obtained and studied, the second aim was to analyse the spatial variability of the irradiance in the study area. The methodologies applied were principal component analysis (PCA) for the temporal variability and clustering analysis for the spatial variability.
PCA is a powerful statistical technique for studying the variability characteristics of time series and has been widely used in meteorological and climatological studies. It yields a set of spatial structures, called empirical orthogonal functions (Preisendorfer, 1988; Monahan et al., 2009), each associated with a temporal variability pattern (principal component) of the variable. Each of these patterns explains a percentage of the total variance of the data. This percentage is obtained from the eigenvalues of the covariance matrix of the data (Kruskal, 1978; Saporta and Niang, 2009; Wilks, 2011), so the principal components can be sorted in order of importance and only the leading ones retained to describe the process.
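As an illustration of the procedure just described, the following minimal sketch computes principal components and their explained-variance percentages from the eigen-decomposition of the covariance matrix. It is not the code used in this work: the function name `pca_time_patterns`, the synthetic 20-point data set and the noise level are assumptions made only for the example.

```python
import numpy as np

def pca_time_patterns(data):
    """PCA of an array of shape (n_times, n_points), one column per grid point.

    Returns the principal components (n_times, n_points), the empirical
    orthogonal functions (columns of `eofs`) and the explained-variance ratios.
    """
    # Standardize each grid-point series: remove the mean, divide by the SD.
    anomalies = (data - data.mean(axis=0)) / data.std(axis=0)
    # Covariance matrix between grid points (n_points x n_points).
    cov = np.cov(anomalies, rowvar=False)
    # eigh handles symmetric matrices and returns ascending eigenvalues.
    eigvals, eofs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
    eigvals, eofs = eigvals[order], eofs[:, order]
    # Projecting the anomalies onto the EOFs gives the temporal patterns.
    components = anomalies @ eofs
    explained_ratio = eigvals / eigvals.sum()
    return components, eofs, explained_ratio

# Hypothetical data: one shared seasonal cycle plus independent noise.
rng = np.random.default_rng(0)
t = np.arange(365)
seasonal = np.cos(2 * np.pi * t / 365)
amplitude = rng.uniform(1.0, 2.0, size=20)
data = 5.0 + seasonal[:, None] * amplitude[None, :] \
    + 0.3 * rng.standard_normal((365, 20))

components, eofs, explained_ratio = pca_time_patterns(data)
print(np.round(100 * explained_ratio[:3], 1))  # percentages of the leading modes
```

For a grid of many thousands of points the covariance matrix becomes large, and a singular value decomposition of the anomaly matrix gives the same modes more economically.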
The principal components can be associated with physical phenomena, and their correlation with the temporal variability of the variable at each point of the analysed region can be described by correlation maps.
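A correlation map of this kind can be sketched as follows, assuming that the map holds the Pearson correlation between one principal component and the series at each grid point. The function name `correlation_map` and the small 2 x 3 synthetic grid are hypothetical and serve only as an example.

```python
import numpy as np

def correlation_map(component, data, n_rows, n_cols):
    """Pearson correlation of one component with every grid-point series.

    component: shape (n_times,); data: shape (n_times, n_rows * n_cols).
    """
    c = (component - component.mean()) / component.std()
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    # The mean of the product of standardized series is the Pearson correlation.
    r = (c[:, None] * z).mean(axis=0)
    return r.reshape(n_rows, n_cols)

# Hypothetical grid: the first row follows the pattern, the second opposes it.
rng = np.random.default_rng(1)
t = np.arange(365)
pattern = np.sin(2 * np.pi * t / 365)
data = np.empty((365, 6))
data[:, :3] = pattern[:, None] + 0.05 * rng.standard_normal((365, 3))
data[:, 3:] = -pattern[:, None] + 0.05 * rng.standard_normal((365, 3))

corr = correlation_map(pattern, data, n_rows=2, n_cols=3)
print(np.round(corr, 2))
```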
Regarding the clustering analysis applied to solar resource assessment (Zagouras et al., 2013, 2014a, 2014b; Gutiérrez et al., 2017; Rodríguez-Benítez et al., 2018), this was accomplished to study the spatial variability by dividing the territory, with regard to the temporal variability of the global solar radiation, into coherent zones. Global solar radiation estimations over a region from satellites or numerical weather models usually need to be corrected by using scarce measurements and thus measuring sites for solar radiation over this region must be determined (Polo et al., 2016). This is why clustering analysis (Romesburg, 1984; Everitt et al., 2001) is usually performed in order to identify zones where the radiation has a common behaviour and thus where one measurement can be representative of the level of radiation in the whole zone. There are different methods to establish the optimal number of clusters. One of them is the so-called silhouette method, which was proposed by Rousseeuw (1987), where for each data point i, in the cluster C_i, the following value is defined:

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$

where a(i) is the average dissimilarity of the data point i from the rest of the data within the cluster and b(i) is the lowest average dissimilarity of i to any neighbour cluster. From s(i), the average silhouette width (ASW) is defined as follows:

$$\mathrm{ASW} = \frac{1}{n}\sum_{i=1}^{n} s(i)$$

where n is the number of data points. This value measures the quality of the entire cluster structure and thus its maximum provides a criterion for selecting the number of groups.
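The silhouette definitions can be made concrete with a short sketch for scalar data, taking the absolute difference as the dissimilarity (an assumption of this example). The function names and the six sample values are hypothetical, and at least two clusters are required.

```python
def silhouette_values(values, labels):
    """Return s(i) for scalar data, using |x - y| as the dissimilarity."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    result = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            result.append(0.0)  # convention for single-member clusters
            continue
        # a(i): mean dissimilarity to the other members of the same cluster
        # (the point itself contributes zero to the sum).
        a = sum(abs(v - u) for u in own) / (len(own) - 1)
        # b(i): lowest mean dissimilarity to the members of any other cluster.
        b = min(sum(abs(v - u) for u in other) / len(other)
                for key, other in clusters.items() if key != lab)
        denom = max(a, b)
        result.append((b - a) / denom if denom > 0 else 0.0)
    return result

def average_silhouette_width(values, labels):
    """ASW: the mean of s(i) over all data points."""
    s = silhouette_values(values, labels)
    return sum(s) / len(s)

# Hypothetical values forming two well-separated groups.
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
asw_good = average_silhouette_width(values, [0, 0, 0, 1, 1, 1])
asw_bad = average_silhouette_width(values, [0, 1, 0, 1, 0, 1])
print(asw_good, asw_bad)
```

The partition that follows the natural grouping gives the larger ASW, which is the property used when selecting the number of clusters.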
The common behaviour can refer to the radiation value itself, as in Díaz-Torres et al. (2017), or to its temporal variability, as in Polo et al. (2015). In this last case, from radiation measurements obtained by applying a model inspired by the Ångström-Prescott formulation (Prescott, 1940; Nguyen and Pryor, 1997), different zones of Vietnam with similar annual variability were obtained. The interquartile range (IQR) was used as a variability index and clustering analysis was applied to this variable. The results showed a clear correspondence between the zones obtained and the zones corresponding to the Köppen climate classification (Köppen, 1936; Kottek et al., 2006). Therefore, the application of a clustering analysis based on the temporal evolution of the radiation can serve for evaluating, in a first approximation, the quality of the estimations obtained from satellites or numerical models, as this analysis can indicate whether the climatological characteristics of the region, known to a greater or lesser extent, are reproduced by these estimations.
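A minimal sketch of the IQR as a variability index is given below, assuming that the index is the difference between the 75th and 25th percentiles of the radiation series at each point. The helper name `iqr_per_point` and the two toy series are hypothetical.

```python
import numpy as np

def iqr_per_point(data):
    """IQR of each column of an array of shape (n_times, n_points)."""
    q75 = np.percentile(data, 75, axis=0)
    q25 = np.percentile(data, 25, axis=0)
    return q75 - q25

# Two toy series: one with a marked annual range and one that is constant.
variable = np.arange(1.0, 10.0)   # 1, 2, ..., 9
constant = np.full(9, 5.0)
data = np.column_stack([variable, constant])

iqr = iqr_per_point(data)
print(iqr)  # the constant series has zero IQR
```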
Our second aim (the analysis of spatial variability) is similar to the aim of the cited work of Polo et al. (2015) and thus the same technique of clustering can be used in this case. However, unlike in that paper, the previous analysis via principal components provides good information about the variability behaviour and justifies the use of the IQR for the clustering process.
The work was structured in the following terms: first, from the satellite-derived global solar radiation estimations, a PCA was carried out over the territory. This analysis provides variability patterns whose degree of fulfilment at each point is established via correlation maps. From these maps, and the corresponding principal components, the dominant climatological features in the region were analysed.
Next, the region analysed was divided into groups by a clustering process. In line with the preceding PCA, the IQR of the series was used as the clustering variable, and the clusters were computed using a k-means algorithm (Kanungo et al., 2002). The algorithm generates groups so that each element belongs to the group with the nearest mean (Adam and Celebi, 2013). Thus, the objective of the technique is to minimize the total intra-cluster variance.
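The nearest-mean rule can be illustrated with a plain Lloyd-type iteration on scalar values. This is only a sketch: it is not the efficient implementation of Kanungo et al. (2002), and the function name `kmeans_1d`, the random initialization and the sample IQR values are assumptions of the example.

```python
import numpy as np

def kmeans_1d(values, k, n_iter=100, seed=0):
    """Basic k-means for scalar data; returns (labels, centers)."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct data points chosen at random.
    centers = rng.choice(values, size=k, replace=False)
    for _ in range(n_iter):
        # Assignment step: each value goes to the nearest center.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        # Update step: each center becomes the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            values[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical IQR values (Wh/m2) forming a low and a high group.
iqr_values = [900.0, 950.0, 1000.0, 2000.0, 2100.0, 2050.0]
labels, centers = kmeans_1d(iqr_values, k=2)
print(labels, np.sort(centers))
```

On a real grid, the labels would be reshaped to the rows and columns of the map to display the clusters.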
Finally, although the purpose of the work was to analyse the variability of the radiation in areas without the availability of ground measurements, in order to give more robustness to the study the results obtained were also evaluated by using some stations with ground data in the area of study.

| DATA
The study was carried out in southern Africa, specifically between longitude 21.5° E and 35.9° E and latitude 18° S and 8.1° S. This region covers the geography of Malawi, Zambia and small zones of neighbouring countries such as Zimbabwe, Tanzania, Mozambique, Namibia, the Democratic Republic of the Congo and Angola. This region was chosen because of the seasonal variability of the radiation that can be expected as a result of the effect of the intertropical convergence zone (ITCZ) (Sultan and Janicot, 2000; Nicholson, 2009; Suzuki, 2011), although the study could be applied in any other region.
Daily solar radiation data corresponding to the period 2003-2011 (9 years), obtained from Meteosat 7 images, were used. The spatial resolution was 0.1°, in such a way that the grid covered 14,500 points, distributed in 100 rows and 145 columns. The images were computed on the basis of the Heliosat-3 model (Dagestad and Olseth, 2007) by introducing some modification on the albedo estimations (Polo et al., 2011, 2013, 2014). Also, the REST2 model (Gueymard, 2008) was used for clear sky computation together with the MACC reanalysis (Inness et al., 2013) in order to obtain the input of atmospheric components that REST2 requires.
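The grid dimensions quoted above can be checked with a short sketch, assuming that the stated longitude and latitude limits are the coordinates of the outermost grid nodes and that both limits are included. The helper `grid_axis` is hypothetical.

```python
def grid_axis(start, stop, step):
    """Node coordinates from start to stop (both included) with the given step."""
    n = round((stop - start) / step) + 1
    # Rounding to one decimal avoids floating-point drift in the coordinates.
    return [round(start + i * step, 1) for i in range(n)]

lats = grid_axis(-18.0, -8.1, 0.1)   # 18 S to 8.1 S
lons = grid_axis(21.5, 35.9, 0.1)    # 21.5 E to 35.9 E

n_points = len(lats) * len(lons)
print(len(lats), len(lons), n_points)
```

Under this assumption the axes have 100 and 145 nodes, which matches the 14,500 grid points reported.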
Moreover, ground data corresponding to 10 stations distributed throughout the study area were also used in order to verify the results obtained. Data were taken from the World Bank Group (World Bank Group, 2014) and the webpage http://www.opensolardb.org/. This page offers data on average daily global radiation for each month of the year. The database takes data mainly from the World Radiation Data Centre, whose data have been obtained from pyranometers.

| Climate features of the region
According to the Köppen climate classification, most of the area of study can be classified (Figure 1) as "humid subtropical climate" (Cwa). This climate is characterized by dry winters (McKnight and Hess, 2000) and usually occurs on the eastern coasts and eastern sides of the continents sited in mid-latitudes, where warm and moist southern flows coming from subtropical highs generate wetter summers than the winters.
In the eastern and northern zones, the climate belongs to category Aw (tropical wet and dry). This climate (McKnight and Hess, 2000) presents a pronounced dry season (precipitation of the driest month less than 60 mm and less than a quarter of the total annual precipitation).
Within the Cwa zone, several small regions of Dfa climate (hot summer continental without dry season) can be identified. These isolated Dfa regions correspond to the highest altitude zones within the studied domain, as can be observed in Figure 2 which shows an elevation map of the territory. The comparison between Figures 1 and 2 also shows climatic changes related to sudden changes in altitude, as can be clearly observed in the eastern zones.
Finally, some southern regions show climate BSh (hot steppe). This hot semi-arid climate is characterized by very hot summers and mild or warm winters and mostly spreads out along the south of the region covering primarily Botswana and large parts of Namibia.
In addition, the area of study is under the influence of the ITCZ (Figure 3). This zone is the belt where the warm and humid easterly trade winds coming from the northern and southern hemispheres converge, yielding abundant cloudiness and precipitation. The annual position of this belt establishes two clearly different seasons in the region: the dry season, from April to September, and the rainy season, from October to March. The cycle in the precipitation has a clear correlate in the solar radiation that reaches the Earth's surface.
The seasonal variation of the ITCZ is shown in Figure 3, where the red and blue lines represent the belt in austral winter and summer, respectively. The figure confirms the presence of the ITCZ in the study zone during the austral summer and its absence during the winter. However, this presence is more evident in the northern zone, in such a way that, according to the figure, the belt does not reach the southern zone of study. This aspect allows different zones of variability to be distinguished within the considered area according to the seasonal evolution of the ITCZ. Due to the influence of the ITCZ, during October-March the cloudiness is abundant, having a noticeable influence on the seasonal behaviour of solar radiation.
FIGURE 1 Köppen climate classification

| RESULTS AND DISCUSSION
FIGURE 2 Elevation map
FIGURE 3 Seasonal variation of the intertropical convergence zone (ITCZ). The red and blue lines represent the ITCZ in austral winter and summer, respectively. The purple rectangle shows the study area
FIGURE 4 Principal components: (a) first principal component; (b) second principal component
FIGURE 5 Correlation maps corresponding to the first two components: (a) first principal component; (b) second principal component
First, a PCA in the region of study was carried out. For this analysis, the global solar radiation series must be standardized (removing the mean and dividing by the SD), becoming anomalies. From the annual evolution of these anomalies at the 14,500 points of the grid, two evolution patterns (the first two principal components) were obtained (Figure 4), which explain 63.9% and 16.8%, respectively, of the variance of the data (the rest of the components were not considered since they explain a very low percentage of the variance: the third component explains only 2.7% and the fourth 1.7%). The first component (Figure 4a) shows highly variable behaviour during the first 3 months of the year (austral summer). However, during the austral fall, a clear decrease of the anomalies takes place. Next, they strongly increase during the winter (July, August and September) and finally decrease during the spring with high fluctuations. Quantitatively, the percentage of variance explained by this first component is only 24.37% for the trimester January, February and March; however, the percentage increases up to 72.1% for the next 3 months and up to 86.7% for July, August and September. During the period from October to December, the variance explained reduces to 70.5%. Therefore, the pattern able to explain more variability is found for the months corresponding to the austral winter. In this case, the pattern is less irregular and the persistence is higher.
The opposite is the case for January, February and March, where the component is markedly rough and thus little of the variability of the radiation at the grid points can be explained. In this case, the anti-persistence is high, resulting in a less evident pattern, unlike the winter months. The first pattern represents both the deterministic variability of solar radiation over the year (which is directly associated with changes in the declination angle) and also part of the non-deterministic variability related to cloudiness. Indeed, the cloudy periods, closely related to the ITCZ pattern, are reflected by periods with highly fluctuating anomalies of the first component (approximately from October to April). Furthermore, the cloudiness during this period is so intense that the shape associated with the deterministic component vanishes. In contrast, the period from May to September is generally dry, particularly during the austral winter. This behaviour is also observed in the evolution of the first component, where the lower cloudiness is reflected by smaller fluctuations. In this period the shape of the first component follows the expected behaviour of the deterministic component of the solar radiation. Moreover, this behaviour is in agreement with the Köppen classification since most of the region belongs to dry winter climates (denoted by "w" in the Köppen notation).
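The text does not state how the per-trimester percentages of explained variance were computed. One possible reading, given here only as a hedged sketch, is the share of the squared anomalies within a set of time steps that is captured by the leading mode. The function name, the synthetic data and the choice of a noisy first quarter are all assumptions of this example.

```python
import numpy as np

def seasonal_explained_fraction(anomalies, time_mask):
    """Share of squared anomaly in the selected time steps captured by mode 1.

    anomalies: standardized array of shape (n_times, n_points);
    time_mask: boolean array of shape (n_times,) selecting the season.
    """
    # Leading mode from the SVD of the full anomaly matrix.
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    pc1 = u[:, 0] * s[0]                 # first principal component
    # The first EOF (vt[0]) has unit norm, so pc1**2 is the energy captured.
    captured = np.sum(pc1[time_mask] ** 2)
    total = np.sum(anomalies[time_mask] ** 2)
    return captured / total

# Hypothetical data: a smooth annual cycle with strong noise in the first quarter.
rng = np.random.default_rng(2)
t = np.arange(360)
pattern = np.cos(2 * np.pi * t / 360)
noise_scale = np.where(t < 90, 1.0, 0.1)
data = pattern[:, None] + noise_scale[:, None] * rng.standard_normal((360, 10))
anomalies = (data - data.mean(axis=0)) / data.std(axis=0)

noisy = seasonal_explained_fraction(anomalies, t < 90)
quiet = seasonal_explained_fraction(anomalies, (t >= 180) & (t < 270))
print(round(noisy, 2), round(quiet, 2))
```

In this synthetic case the leading mode captures a much smaller share of the anomalies in the noisy quarter than in the quiet one, which is qualitatively the contrast described between the austral summer and winter.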
Regarding the second component (Figure 4b), during the austral fall and winter, positive anomalies can be noticed and there is no trend or pattern associated with the deterministic variability of the solar radiation. Thus, the evolution of this component is clearly influenced by the ITCZ, which disappears from the region during April-September in such a way that the cloudiness is less during this period.
In addition, the correlation between these evolution patterns and the evolution corresponding to each grid point is shown in correlation maps associated with each principal component (Figure 5). From these maps and principal components, the dominant climatological features in the region become evident. Indeed, the correlation map corresponding to the first component (Figure 5a) shows important qualitative similarities to the climate map (Figure 1): the highest correlation appears, in general, in those zones with Cwa climate (most of the territory) and, to a lesser extent, in zones with BSh climate. For the second component, strips running from southwest to northeast are seen in the correlation map, so that in the northwest the correlation with the second component is positive and in the southeast it is negative (which indicates that, during the austral fall and winter, more negative anomalies take place than in the rest of the year). As indicated, this component is clearly associated with the ITCZ, which is responsible for the behaviour of solar radiation during October-March. In fact, the stronger the ITCZ influence, the weaker the deterministic component of solar radiation and, in consequence, the lower the seasonal variability. This observation explains the smaller values of the IQR in the north of the region, just where the ITCZ is more intense, as can be seen in Figure 6, which shows an IQR map. The IQR is an index of the degree of annual variability of the radiation at each point and thus, in those places with lower seasonal variability, a lower IQR is expected.
FIGURE 6 Interquartile range (IQR) map (Wh/m²)
FIGURE 7 Average silhouette width (ASW) versus number of clusters
Once variability patterns had been obtained and analysed, the second aim of the work was to analyse the spatial variability of the irradiance by using a clustering analysis which divides the territory, with regard to the temporal variability of the solar radiation, into coherent zones. Since the aspect of the correlation map corresponding to the second component (Figure 5b) is very similar to the IQR map, this range can be considered as a variability index to be used as the variable of interest in clustering analysis.
The silhouette method was applied in order to obtain the optimal number of clusters (Figure 7). According to this figure, the optimal number of clusters is 2, as the value of ASW is maximum for this number.
The regions obtained, considering two clusters, are shown in Figure 8. The regions are clearly defined by slightly sloping strips, which are explained by the annual oscillation of the ITCZ.
In any case, although the clusters obtained are in agreement with the influence of the ITCZ on the region, they were also checked by using radiation measurements obtained in stations distributed throughout the study area (Table 1). These stations are also included in Figure 8 in order to identify more clearly the cluster they belong to: Kolwezi, Mbeya, Lubumbashi and Misamfu in the northwest zone and the rest in the south and east of the studied region. The annual evolution of the Global Horizontal Irradiance (GHI) in different stations, grouped according to the zone they belong to, is shown in Figure 9. The stations of the south and east zones, less affected by the ITCZ, present their annual minimum of daily radiation around June. This minimum is not observed in the stations of the northwest zone, although they show a slight plateau around this month. These stations are those most affected by the low-pressure belt during November-April, so the radiation is even lower in this period than in the rest of the year. On the other hand, all the stations show a similar pattern during the second half of the year, with a global maximum of daily radiation around September-October. This pattern is also evidenced by the first principal component in the same period (Figure 4a), which is also marked by the high percentage of variance explained by the first principal component. These observations indicate that the regionalization by two clusters is enough to explain the ITCZ effect on solar radiation over the whole region.

| CONCLUSIONS
This study was carried out in a region of southern Africa (Zambia and surrounding zones), where a pronounced seasonal variability of the radiation can be expected owing to the effect of the ITCZ.
The aim of this work was twofold. The first aim was to obtain temporal variability patterns for satellite-derived radiation estimations in the area of study (Zambia and surrounding zones) by using a principal component analysis (PCA), and from this analysis to establish the physical phenomena associated with these patterns. Two temporal variability patterns were obtained from the PCA: the first component is associated with the global climate characteristics of the region (including both the deterministic and nondeterministic components of solar radiation), and the second component is strictly associated with the influence of the ITCZ, responsible for the behaviour of solar radiation during October-March.
The second aim of the work was to analyse the spatial variability of the irradiance in the study area. For this aim, a clustering analysis based on the interquartile range (IQR) of the data was performed, so the territory was divided into coherent zones with regard to the temporal variability of the solar radiation. Indeed, the application of a clustering analysis based on the temporal evolution of the radiation estimations derived from satellites is very useful for evaluating the goodness of these estimations in terms of their spatial variability and seasonal behaviour. On the other hand, this analysis allowed for the identification of zones where the radiation variability had a common behaviour and thus one measurement could be representative of that variability in the whole zone. In short, this type of clustering might serve to establish a regionalization to improve radiation models or the location of measuring ground stations.
The correlation map corresponding to the second component turned out to be very similar to the spatial distribution of the IQR and thus this range could be used as a variable for the clustering analysis. The strips observed from the clustering analysis, in agreement with the IQR strips, are coherent with the different influence of the ITCZ along the study zone, which means that the satellite-derived estimations agree with this aspect of the climatology of the zone.
In addition, although the clusters obtained identify zones of similar behaviour regarding the temporal variability of the radiation, they were also checked by using ground measurements. The pattern of daily radiation measured in the ground stations is coherent with the identification of two clusters as regions of different global variability. Indeed, the stations sited in the south and east, less affected by the ITCZ, present a clear diminution of the radiation around June. This diminution is not observed in the stations of the northwest zone, the most affected by the belt of low pressures during November-April.
FIGURE 9 Annual evolution of the Global Horizontal Irradiance (GHI) measurements
Both PCA and the clustering technique have been shown to be useful in analysing the spatial variability of solar radiation data, which might serve to establish regionalization for model improvement or for measuring ground station distribution.