Extreme precipitation events in the Mediterranean: Spatiotemporal characteristics and connection to large-scale atmospheric flow patterns

The Mediterranean is strongly affected by Extreme Precipitation Events (EPEs), sometimes leading to negative impacts on society, economy, and the environment. Understanding such natural hazards and their drivers is essential to mitigate related risks. Here, EPEs over the Mediterranean between 1979 and 2019 are analyzed, using the ERA5 dataset from ECMWF. EPEs are determined based on the 99th percentile of the daily precipitation distribution (P99). The EPE characteristics are assessed based on seasonality and spatiotemporal dependencies. To better understand the connection to large-scale atmospheric flow patterns, Empirical Orthogonal Function (EOF) analysis and subsequent K-means clustering are used to quantify the importance of weather regimes to EPE frequency. The analysis is performed for three different variables, depicting atmospheric variability in the lower and middle troposphere: sea level pressure (SLP), temperature at 850 hPa (T850), and geopotential height at 500 hPa (Z500). Results show a clear spatial division in EPE occurrence, with winter (autumn) being the season of highest EPE frequency for the eastern (western) Mediterranean. There is a high degree of temporal dependence, with 20% of the EPEs (median value of all studied grid cells) occurring up to 1 week after a preceding P99 event at the same location. Local orography is a key modulator of the spatiotemporal connections and substantially enhances the probability of co-occurrence of EPEs even for distant locations. The clustering clearly demonstrates the prevalence of distinct synoptic-scale atmospheric conditions during the occurrence of EPEs for different locations within the region. Results indicate that clustering based on a combination of SLP and Z500 can increase the conditional probability of EPEs more than threefold (median value for all grid cells) relative to the nominal probability of 1% for the P99 EPEs.


Introduction
The Mediterranean lies at the crossroads of Africa, Asia, and Europe. It is a medium-scale coupled atmosphere-ocean system of unique character, the result of complex topography, orographic influences and interactions with large-scale water and land bodies around the domain. As the Mediterranean extends within the Hadley and Ferrel cells of the northern hemisphere, it is influenced by both midlatitude and tropical climate variability.
One of the most frequent natural hazards affecting the domain is Extreme Precipitation Events (EPEs), which can lead to landslides and floods (pluvial, fluvial, flash). Such events have severe negative consequences for society, the environment and the economy (Jonkman, 2005). EPEs are identified as the meteorological hazard of highest negative impact for many Mediterranean regions, given their frequency and the high vulnerability of the densely populated areas (Llasat et al., 2010, 2013). Moreover, the Mediterranean is a region where ongoing climate change is expected to have high impacts, and it is therefore considered a "hot spot" in this regard (Giorgi, 2006). Some of the projected changes include increasing frequency and magnitude of EPEs (Cardell et al., 2020; Frei et al., 1998; Gao et al., 2006; Toreti et al., 2013). In fact, recent studies have already confirmed such changes using observational data (e.g. Alexander et al., 2006; Kostopoulou and Jones, 2005; Papalexiou and Montanari, 2019; Vautard et al., 2015). These changes, together with the high vulnerability of the domain and high economic assets in many coastal areas, are expected to lead to some of the most dramatic increases globally in average annual losses due to flooding by 2050 (Hallegatte et al., 2013). Thus, there are large efforts and ongoing research to better understand this natural hazard, to identify ways of improving EPE predictability, and to increase the resilience of affected regions and societies (e.g. the Hydrological Mediterranean Experiment, HyMeX; Drobinski et al., 2014). Such advances are of key relevance to mitigate adverse impacts and related risks.
Previous research has identified the spatiotemporal characteristics of EPEs over the Mediterranean, with most events occurring during winter half years (e.g. Grazzini et al., 2020; Houssos and Bartzokas, 2006; Khodayar et al., 2018; Lolis and Türkeş, 2016; Merino et al., 2016; Pavan et al., 2019). These results agree with the accumulated precipitation patterns over the domain, as those months record the highest precipitation amounts (Mariotti et al., 2002). With respect to EPEs, there is a clear seasonal differentiation between the western and eastern parts of the domain, with most events occurring during autumn and winter, respectively. These results are consistent across both reanalysis and observation datasets (Cavicchia et al., 2018; Raveh-Rubin and Wernli, 2015).
Many of the EPEs over the Mediterranean are closely associated with synoptic-scale atmospheric flow patterns, such as cut-off lows, troughs, and warm/cold fronts (Merino et al., 2016; Toreti et al., 2010, 2016), while others are also related to Mesoscale Convective Systems (Rigo et al., 2019). Some of the EPEs are moreover connected to cyclonic formations over the Mediterranean Sea (Lionello et al., 2006), also known as Medicanes (Cavicchia et al., 2014). The quantification of such links is crucial, as Numerical Weather Prediction (NWP) models are more skillful in predicting large-scale circulations than localized EPEs, especially for extended-range forecasts (Lavaysse et al., 2018; Lavers et al., 2017, 2018; Vitart, 2014). One way to statistically identify EPE connections with large-scale patterns is to analyse EPE composites for various atmospheric variables and at different vertical levels. Toreti et al. (2016) presented such connections by analysing the field of potential vorticity, while Merino et al. (2016) used a range of fields describing the dynamics of the atmosphere in the low- and mid-level troposphere, an approach also used by Toreti et al. (2010). extended regions covering for example the Euro-Atlantic region. Detailed reviews of such connections are presented in Alpert et al. (2006), Trigo et al. (2006) and Xoplaki et al. (2012). Vicente-Serrano et al. (2009) analysed relationships between the occurrence and magnitude of EPEs over northeast Spain and the North Atlantic Oscillation (NAO; Walker and Bliss, 1932), Western Mediterranean Oscillation (WeMO; Martin-Vide and Lopez-Bustins, 2006), and Mediterranean Oscillation (MO; Conte et al., 1989). They found that the most extreme daily precipitation during winter is expected for negative WeMO events. The connections are stronger when analysing the impact of such oscillations over aggregated data. NAO, for example, is found to be significantly associated with winter precipitation not only for the west Mediterranean, but also for its eastern part (Quadrelli et al., 2001; Türkeş and Erlat, 2005).
Given the existing research within the area, the aim of this work is two-fold. This study focuses a) on better understanding the spatiotemporal variability of EPEs over the Mediterranean, using ERA5, the latest reanalysis data from ECMWF. Data extending up to the recent period allow the findings to be updated and compared with those of similar studies. Moreover, the physical consistency, fine spatiotemporal resolution, and completeness of the dataset (compared to rain-gauge observations, which frequently have many missing and/or unreliable data, or inconsistencies among different countries/regions) allow EPEs to be analysed at fine spatial scales. This work aims b) at quantifying the connections between EPEs and large-scale atmospheric flow patterns over the Mediterranean domain. The existing literature mainly investigates such connections by analysing EPE composites derived from in-situ measurements, leading to reduced spatial coverage and use of limited temporal information. Thus, by using the entire daily atmospheric variability and the ERA5 dataset, a holistic overview of such connections over the whole domain can be delivered. Finally, this study aims at using the derived information in future research on sub-seasonal predictability of the identified patterns.
The data and methods used for this study are described in sections 2 and 3, respectively. Section 4 presents the results and discusses the findings, and section 5 summarizes the main conclusions and points out limitations and suggestions for future pathways.

Data
Data from ERA5, the latest reanalysis dataset of ECMWF (Hersbach et al., 2020), are used for the years 1979-2019, which provide a complete record. The data are generated at hourly resolution on a horizontal grid of roughly 30 km x 30 km, using 137 vertical levels from the surface up to 80 km height to resolve the atmosphere. Four different variables of the dataset are used in this study: total precipitation, sea level pressure (SLP), temperature at 850 hPa (T850), and geopotential height at 500 hPa (Z500).
Total precipitation is analyzed at a horizontal resolution of 0.25° x 0.25°, to derive statistical information of EPEs at fine spatial scales. This resolution is moreover the closest to the original one. The selected domain covers the area 29°/47°N and -8°/38°E, referred to as Mediterranean from now on. Precipitation data within ERA5 are calculated from short-term forecasts of hourly accumulations. To reduce the influence of possible spin-up errors of the forecast model outputs (Dee et al., 2011), daily precipitation is calculated based on the accumulation of the forecast steps 7-18 for the models initiated at 18:00 UTC of the previous day, and at 06:00 UTC of the target day. This method results in a spin-up time of 6 hours before incorporating any of the model outputs into the analysis.
In order to understand the connection of large-scale atmospheric patterns to the occurrence of EPEs, atmospheric variability in the lower and middle troposphere is analyzed, based on SLP, T850 and Z500. These variables have been selected, as previous work demonstrated their importance in defining synoptic environments and fronts that are connected to EPEs (e.g. Catto et al., 2014; Greco et al., 2020; Hidalgo-Muñoz et al., 2011; Xoplaki et al., 2004). The selected spatial resolution for these variables is 1° x 1°. Mean daily values were calculated by averaging all 24 available hourly data, and the spatial domain used for deriving the data is 26°/50°N and -11°/41°E, so that the analysis captures the influence of the adjacent areas, e.g., the Atlantic Ocean and the Alps. It should be stated that during the exploratory analysis, various domains of larger spatial extent were analyzed (e.g. Euro-Atlantic domain, Mediterranean further extended to the Atlantic), with connections between EPEs and large-scale patterns being weaker. This corroborates previous studies, demonstrating that larger domains are not effective for optimizing the relationship between circulation types and precipitation (Beck et al., 2016).

Methodology
a. Extreme Precipitation Events (EPEs) and spatiotemporal analysis
EPEs are identified based on the 99th percentile (P99) of the daily distribution, considering all available data for each grid cell. It should be stated that EPEs defined on seasonal data percentiles are not considered, as the aim of this work is to analyse EPEs that are closely associated with negative consequences to society. Moreover, because of significant differences in precipitation amounts and number of dry days between the various locations of the domain (especially north vs south and highlands vs lowlands), wet-days-derived EPEs and fixed-threshold-derived EPEs are not given preference. It is worth noting that during the conducted analysis lower percentiles (e.g. P95) were also tested, providing no substantial differences in the conclusions. Thus, the presented results are solely based on the P99 EPEs analysis.
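As an illustration of the P99 definition above, the following minimal Python sketch derives a per-grid-cell threshold from all daily values and flags the days that exceed it. The array layout `(time, lat, lon)`, the function name `p99_events`, the strict `>` comparison, and the synthetic gamma-distributed data are assumptions made for this example and are not taken from the study.

```python
import numpy as np

def p99_events(precip, q=99.0):
    """Return (threshold, is_epe) for daily totals shaped (time, lat, lon).

    The threshold uses all days (wet and dry) of each grid cell, as in the
    text; flagging with a strict '>' is an assumption of this sketch.
    """
    threshold = np.percentile(precip, q, axis=0)  # one value per grid cell
    is_epe = precip > threshold                   # boolean mask of EPE days
    return threshold, is_epe

# Synthetic stand-in for daily precipitation (hypothetical data).
rng = np.random.default_rng(0)
precip = rng.gamma(shape=0.5, scale=4.0, size=(1000, 3, 4))

threshold, is_epe = p99_events(precip)
epe_days_per_cell = is_epe.sum(axis=0)  # roughly 1% of the days in each cell
```

Because the percentile is computed independently for each grid cell, every cell ends up with about the same number of events regardless of how dry or wet its climate is.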
The identified events are analyzed based on their seasonality, in order to quantify the importance of seasons in the occurrence of EPEs. As the data used cover the period January 1979 to December 2019, the available number of months is the same for all seasons (4 x 3 months x 41 years).
The degree of temporal dependence is a crucial factor, as multiple EPEs within short temporal intervals can significantly increase flood and landslide risks. In this work, four different temporal intervals are analyzed. For each grid cell, the percentage of EPEs that occurred within the selected interval of the preceding EPE at the same grid cell is calculated. The intervals are 1, 3, 7, and 15 days, to understand the persistence of EPEs at short, medium and extended range temporal scales. Each of these scales yields different impacts and is associated with different meteorological and climatological drivers. Other intervals were tested and did not lead to different conclusions.
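The temporal-dependence measure described above can be sketched for a single grid cell as follows. This is a minimal illustration, assuming that event dates are given as integer day indices and that the percentage is taken over all EPEs of the cell (including the first one, which has no predecessor); the function name `pct_within` and the example dates are hypothetical.

```python
import numpy as np

def pct_within(event_days, interval):
    """Percentage of EPEs occurring at most `interval` days after the
    preceding EPE at the same grid cell (denominator: all EPEs, assumed)."""
    days = np.sort(np.asarray(event_days))
    if days.size == 0:
        return float("nan")
    gaps = np.diff(days)  # days elapsed since the preceding EPE
    return 100.0 * np.count_nonzero(gaps <= interval) / days.size

# Hypothetical EPE dates (day indices) for one grid cell.
event_days = [3, 4, 10, 40, 47, 48, 200, 215]
results = {k: pct_within(event_days, k) for k in (1, 3, 7, 15)}
# results -> {1: 25.0, 3: 25.0, 7: 50.0, 15: 62.5}
```

Applying the same function to every grid cell and taking the median over cells would give domain-wide summary values of the kind reported in the results.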

b. Large-scale patterns
EOF analysis on the covariance matrix (Wilks, 2011) is performed on the derived daily anomalies, independently for each of the selected variables. These anomalies are from now on simply referred to by the variable name. The square root of the cosine of latitude is used for weighting the data, giving equal-area weighting on the covariance matrix. The climatology of these variables is calculated with a 5-day smoothing window. To derive the anomalies, the corresponding daily climatology was subtracted from each day of the 1979-2019 period. The number of modes (principal components) needed to explain at least 90% of the total variance was retained, to obtain a compressed dataset that provides most of the available information.
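The anomaly and EOF steps described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the calendar has 365 days per year (no leap days), the 5-day smoothing is a circular running mean of the day-of-year average, and the EOFs are obtained from a NumPy singular value decomposition of the weighted, centred data (equivalent to an eigen-decomposition of the covariance matrix). The function names and the synthetic input are hypothetical.

```python
import numpy as np

def daily_anomalies(field, days_per_year=365, window=5):
    """Subtract a smoothed day-of-year climatology from field (time, lat, lon)."""
    n_years = field.shape[0] // days_per_year
    assert n_years * days_per_year == field.shape[0], "whole years expected"
    by_year = field.reshape(n_years, days_per_year, *field.shape[1:])
    clim = by_year.mean(axis=0)  # (day of year, lat, lon)
    half = window // 2
    # Circular running mean over the calendar day (assumed smoothing choice).
    smooth = np.mean([np.roll(clim, s, axis=0) for s in range(-half, half + 1)], axis=0)
    return (by_year - smooth).reshape(field.shape)

def eof_projections(anom, lats, var_fraction=0.90):
    """Return (pcs, eofs, ratio) for the leading modes reaching var_fraction."""
    weights = np.sqrt(np.cos(np.deg2rad(lats)))[None, :, None]
    X = (anom * weights).reshape(anom.shape[0], -1)
    X = X - X.mean(axis=0)
    u, s, vt = np.linalg.svd(X, full_matrices=False)
    ratio = s**2 / np.sum(s**2)  # explained-variance fraction per mode
    n_modes = min(int(np.searchsorted(np.cumsum(ratio), var_fraction)) + 1, ratio.size)
    pcs = u[:, :n_modes] * s[:n_modes]  # projections of each day on the modes
    return pcs, vt[:n_modes], ratio[:n_modes]

# Synthetic stand-in for a daily field on a coarse grid (hypothetical data).
rng = np.random.default_rng(0)
lats = np.linspace(26.0, 50.0, 5)
field = rng.normal(size=(4 * 365, 5, 6))

anom = daily_anomalies(field)
pcs, eofs, ratio = eof_projections(anom, lats)
```

The returned `pcs` array holds one row per day and one column per retained mode, which is the form of input the clustering step operates on.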
Non-hierarchical K-means clustering (Hartigan and Wong, 1979) is implemented on the data projections on the retained principal components, and the Euclidean distance is used as the similarity measure. The analysis is conducted with the Python package scikit-learn (Pedregosa et al., 2011).
Clustering is performed independently for each variable, as well as for all different combinations of the three variables used. For the latter, the projections are pre-processed to show comparable units, and weights relevant to the explained variance. More specifically, standardization was implemented so that all projections have a standard deviation of 1. In the next step, each projection was multiplied with the square root of the percentage of the total variance of the corresponding variable that it explains, so that the projections are weighted based on their relative importance. The 95% uncertainty intervals are computed according to Lee et al. (2019), using cluster persistence and the effective sample size (Wilks, 2011) based on self-transition probability.
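The pre-processing for combined variables can be sketched as follows, using the `KMeans` class of scikit-learn. The projections and explained-variance fractions below are synthetic placeholders, and the helper name `weight_projections`, the choice of 9 clusters, and the `n_init` and `random_state` settings are assumptions of this example rather than settings reported in the study.

```python
import numpy as np
from sklearn.cluster import KMeans

def weight_projections(pcs, explained_ratio):
    """Standardize each projection to unit standard deviation, then scale it
    by the square root of the variance fraction it explains."""
    z = (pcs - pcs.mean(axis=0)) / pcs.std(axis=0)
    return z * np.sqrt(explained_ratio)

# Synthetic projections for two variables with different units (hypothetical).
rng = np.random.default_rng(0)
n_days = 600
slp_pcs = rng.normal(size=(n_days, 3)) * np.array([5.0, 3.0, 1.0])
z500_pcs = rng.normal(size=(n_days, 2)) * np.array([40.0, 20.0])
slp_ratio = np.array([0.55, 0.25, 0.10])   # fraction of SLP variance per mode
z500_ratio = np.array([0.60, 0.30])        # fraction of Z500 variance per mode

combined = np.hstack([
    weight_projections(slp_pcs, slp_ratio),
    weight_projections(z500_pcs, z500_ratio),
])

# Euclidean K-means on the weighted projections.
labels = KMeans(n_clusters=9, n_init=10, random_state=0).fit_predict(combined)
```

After weighting, the spread of each column equals the square root of its explained-variance fraction, so leading modes of either variable contribute more to the Euclidean distance than trailing ones.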
The number of clusters can be constrained and selected based on (semi)objective criteria, but there is also a level of subjectivity introduced (Gong and Richman, 1995). Moreover, the particular use of the derived clusters is crucial for the final selection. Given that this work aims at conditioning EPEs for further use of the results on subseasonal forecasting, a large number of clusters is not recommended, since only a broad indication of the expected synoptic-scale pattern is possible for such timescales (Neal et al., 2016). Therefore, K-means clustering is performed by generating 7 to 12 clusters. Smaller numbers of clusters are not as useful, since connections to EPEs (explained below) are weaker.

c. Connection of large-scale patterns to EPEs
The importance of each cluster in EPE occurrence was assessed based on the conditional probability of EPEs for each cluster. In this context, we refer to the largest conditional probability over all clusters (for each combination of used variables and selected number of clusters) as the Maximum Conditional Probability (MCP). The conditional probability is highly beneficial for this study since it accounts for any differences in the occurrence probability of the identified regimes. Moreover, it can be directly compared with the nominal probability of the selected EPEs (i.e. 1%). Furthermore, to obtain a more holistic understanding of the classification benefits for the different variables and the number of clusters, additional indicators were analyzed, namely the EPE percentage allocated to each cluster (e.g. as in Yiou and Nogaj, 2004), and the percentage of grid cells that exhibit statistically significant connections with at least one cluster. Statistically significant connections between EPEs and clusters are assessed with the two-tailed 95% confidence interval of the binomial distribution (e.g. Olmo et al., 2020). Although the interest of this work lies in positive relationships between EPEs and clusters, a two-tailed test is preferred over a one-tailed test, as the latter has a higher type I error rate. The probability of occurrence that is introduced for the significance testing is the upper bound of the 95% confidence interval, so that strict criteria are used due to the inherent uncertainties associated with clustering.
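The conditional probability, the MCP, and a binomial significance check can be sketched for one grid cell as follows. This is a simplified illustration: the null hypothesis uses the nominal EPE probability of 1% directly, whereas the text applies a stricter criterion based on the upper confidence bound, which is not reproduced here. The function names and the toy data are hypothetical.

```python
import math
import numpy as np

def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k <= 0:
        return 1.0
    below = sum(
        math.exp(math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                 + i * math.log(p) + (n - i) * math.log(1.0 - p))
        for i in range(k)
    )
    return max(0.0, 1.0 - below)

def cluster_epe_stats(is_epe, labels, n_clusters, p0=0.01, alpha=0.05):
    """Conditional EPE probability per cluster and a flag for a significant
    positive association (upper tail of a two-tailed test at level alpha)."""
    cond_prob = np.zeros(n_clusters)
    significant = np.zeros(n_clusters, dtype=bool)
    for k in range(n_clusters):
        in_cluster = labels == k
        n_days = int(in_cluster.sum())
        if n_days == 0:
            continue
        n_epe = int(is_epe[in_cluster].sum())
        cond_prob[k] = n_epe / n_days
        significant[k] = binom_sf(n_epe, n_days, p0) < alpha / 2
    return cond_prob, significant

# Toy example: 1000 days, 10 EPEs (1%), two clusters (hypothetical data).
labels = np.array([0] * 200 + [1] * 800)
is_epe = np.zeros(1000, dtype=bool)
is_epe[:8] = True        # 8 EPE days fall in cluster 0
is_epe[200:202] = True   # 2 EPE days fall in cluster 1

cond_prob, significant = cluster_epe_stats(is_epe, labels, n_clusters=2)
mcp = cond_prob.max()                                       # 0.04, i.e. 4x the nominal 1%
pct_in_mcp = 100.0 * is_epe[labels == cond_prob.argmax()].sum() / is_epe.sum()
```

In this toy case the MCP cluster holds 80% of the events while covering only 20% of the days, which illustrates why both the conditional probability and the EPE percentage are reported.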

Results and Discussion
a. Spatiotemporal characteristics
As expected, results indicate the strong influence of orography (Figure 1a) on precipitation intensity (e.g. Atlas Mountains, Alps, coast of west Balkans), as well as the importance of latitude, with locations closer to/on the sinking air masses of the Hadley cell (north Africa) having significantly smaller thresholds than the locations at northern latitudes. Figure 1b presents the derived P99 EPE thresholds within the domain. The spatial EPE pattern and the magnitudes are very similar to those in the study of Cavicchia et al. (2018), who used the E-OBS dataset (gridded dataset from observational data) with the same spatial resolution, and identified wet-days 99th percentile intensities. The greater intensity differences in the southern Mediterranean compared to the rest of the domain can be attributed to the large number of dry days in that region. Since only the wet days were used in the work of Cavicchia et al. (2018), it is expected that the derived percentile magnitudes over those areas will be larger compared to the current study that uses all daily values. The EPE seasonality demonstrates a west/east divide, with most EPEs occurring during winter (autumn) for the eastern (western) parts of the domain (Figure 2, Figure S1). These findings agree with results from other studies using different datasets (Cavicchia et al., 2018; Raveh-Rubin and Wernli, 2015), and indicate that different synoptic-scale configurations are likely to generate EPEs at the different regions of the domain (Raveh-Rubin and Wernli, 2015).
More than 70% of the EPEs in parts of the southeast Mediterranean occur during winter, whereas for parts of the west Mediterranean, Italy and the west Balkans, over 60% of events occur during autumn. These two seasons alternate between 1st and 2nd place in terms of EPE occurrence for most of the domain. The north Balkans and southeast Europe exhibit a different pattern, with summer and spring being the two seasons of highest EPE occurrence for most of the area. This indicates that the Mediterranean Sea has less influence in those areas, where EPEs are associated with different large-scale patterns. This can also be explained by orography, with the mountain ranges over the south Balkans minimizing direct interactions between the Mediterranean Sea and the north Balkans/southeast Europe. Spring and summer are also important seasons for mountainous locations within the domain (e.g. Alps, Pyrenees). This can be attributed to EPEs of convective nature that occur in such areas (Romero et al., 2001). These events, resulting from thermal low pressure systems over the region (Campins et al., 2000), are further enhanced by orography. Because of the generally small spatial scale of such convective events, this was not explored in the study of Raveh-Rubin and Wernli (2015), who used ERA-Interim with 1° spatial resolution and additionally implemented spatiotemporal smoothing. This shows that recent advances in the resolution of reanalysis data, due to increased computational efficiency and computation power, can bring additional benefits and more realistic information even for finer scales. Finally, spring is also important for north Africa and is the season of highest/2nd highest occurrence of EPEs for most of this area.
Temporal EPE dependence is strong even for the 1-day interval, especially so for the dry parts of the domain in northeast Africa and adjacent parts of the Mediterranean Sea (Figure 3). Orography enhances such dependencies, with locations in the Atlas Mountains, the Alps, and the west Balkans coast differing from their neighboring locations in many of the presented results. This can be attributed to orographic lifting and forced convection that occur on the windward side of the mountains. These processes can trigger EPEs even when large-scale systems are in distant areas, as long as moisture advection is directed towards the mountains (Pfahl, 2014). Thus, for the 1- and 3-day intervals, which are associated with eastward propagation of synoptic-scale weather, these regions strongly differ from surrounding locations. The median percentage of events that occur within 1 day of a preceding P99 EPE is about 11%. This value increases to about 21% for the 7-day interval, and almost 30% for the 15-day interval. Such strong dependencies indicate persistent meteorological (e.g. troughs, cut-off lows, storm tracks, cyclones) and climatological conditions (e.g. weather regimes; Vautard, 1990; Barnes and Hartmann, 2010).

b. Connection to large-scale patterns
Before presenting EPE connections to large-scale patterns, it is worth commenting on the EOFs for Z500, with the first 4 of them explaining more than 77% of the total variance (Figure S2). The first component explains one third of the total variance. It shows that the east and west Mediterranean have opposite behaviour, with the core of the EOF being located over France. This east/west divide, which is also depicted in the EPE analysis, is noticeable in the 2nd EOF as well, while the 3rd EOF exhibits an omega pattern with the central parts of the domain having opposite behaviour from the east and west subdomains. These 4 EOFs are associated with the Mediterranean Oscillation (MO; difference in SLP anomalies between Gibraltar and Lod, Israel) and WeMO (difference in SLP anomalies between San Fernando, Spain and Padova, Italy). It should be stated that EOF analysis based on the winter half and summer half year showed no substantial differences in the patterns and the percentage of explained variance, while the EOF based on SLP has similar patterns, but is influenced by the orography (not shown).
Figure 4 presents the results on the connection of P99 EPEs to the identified weather regimes (clusters) for all studied variables and number of clusters. Figure 4a presents the Maximum Conditional Probability (MCP) and Figure 4b refers to the EPE percentages that occur in the cluster of MCP for each grid cell. Both plots show the median value from all grid cells with statistically significant connections to the generated weather regimes. As can be noticed from Figure 4a, the conditional probability increases for all variables as the number of clusters increases. Clustering based on the combined information from SLP and Z500 outperforms all other clustering results. The other variables perform similarly, except for clustering based on T850, which is substantially worse. This is expected, as many EPEs over the Mediterranean are connected to troughs and cut-off lows, which mainly have a strong signal in geopotential height and surface pressure. Such patterns also show a signal in the temperature fields due to the frequent generation of cold and warm fronts. Yet, as these formations are of smaller spatial extent, clustering based solely on temperature fields for such a large domain is less effective compared to clustering based on the other variables.
The EPE percentage associated with the cluster of MCP decreases with a higher number of clusters. Apart from SLP and T850, which have a weak connection, all other variables perform relatively similarly. This tradeoff between conditional probability and percentage of EPEs is expected, as by increasing the number of clusters, the connection between EPEs and some of the derived clusters becomes stronger. Yet, at the same time, as clusters correspond to a smaller number of days, the associated EPE percentages decrease. It can be noticed that for the SLP-Z500 combination the MCP saturates from 9 to 12 clusters. At the same time, 9 clusters perform very similarly to 8 in terms of EPE percentages associated with the MCP cluster. Thus, the combination of SLP and Z500 with 9-cluster K-means clustering is selected as the preferred classifier to connect EPEs to large-scale patterns. This selection is further justified by results shown in Figure S3, presenting two additional indicators for the clusters and variables studied. These are the percentage of grid cells that have statistically significant connections with at least 1 cluster (subplot a), and the percentage of EPEs (median value from all grid cells with statistically significant connections) that are significantly connected with any of the clusters (subplot b). For both indicators, the order of magnitude of all variables is generally the same and results do not change substantially by changing the number of clusters. This means that the final selection of the preferred combination is not affected by these two indicators. It is worth noting that at least half of the total P99 EPEs are significantly associated with preferential weather regimes for most of the grid cells and for all numbers of clusters and selected variables (except SLP and T850).
Finally, for the selected combination of SLP-Z500 and 9 clusters, the generated composites (cluster centroids, defined by averaging all data corresponding to each cluster) do not change substantially in spatial pattern and magnitude when more clusters are considered (not shown). This indicates that the clusters demonstrate coherent similarities between individual samples. Detailed results about this classifier are presented below.

Figure 4 a) Maximum Conditional Probability (MCP) of P99 EPEs for the studied variables and number of clusters. b) Percentage of P99 EPEs associated with the cluster corresponding to MCP. For both plots, values represent the median value of all grid cells that have statistically significant connection to the generated weather regimes.
Figure 5 presents the composites of the 9-classes K-means clustering based on the combined SLP (colour shading) and Z500 (contours) fields. The derived composites have a naming convention based on the location of negative anomalies (if existing), so that the connections between clusters and EPEs are more intuitive. The composites exhibit noticeable differences in type, magnitude, or location of the large-scale patterns. The importance of the Atlantic, and the storms generated over that area is very clear. It can be noticed that clusters 1 (Atlantic Low), 2 (Biscay Low), and 3 (Iberian Low) are associated with negative anomalies that relate to unstable conditions centred over/near the Atlantic. Cluster 4 (Sicilian Low) has a negative anomaly centred over the central Mediterranean, although of low magnitude, and for cluster 5 (Balkan Low) the negative anomalies are centred over (west) Balkans. Cluster 6 (Black Sea Low) has a dipole structure with positive anomalies over west Mediterranean and negative over east, while cluster 7 (Mediterranean High) corresponds to positive anomalies and stable conditions over the whole domain. Finally, clusters 8 (Minor Low) and 9 (Minor High) correspond to negative and positive anomalies of low magnitude over most of the domain and are associated with days that do not indicate distinct cyclonic or anticyclonic conditions of synoptic scale over the area. They can be therefore considered as the "no-anomaly" clusters in the studied domain.
Seasonal and annual variability in the occurrence of most clusters is very high (Figure S4). Atlantic Low and Sicilian Low have median occurrences of similar magnitude throughout the different seasons, while Minor Low, Minor High, and Mediterranean High show very high seasonal variability. It should be noted that Mediterranean High is very frequent during winter. This is crucial for the occurrence of cold spells (not studied in this work), which can be generated when such conditions persist for many days (Ferranti et al., 2018; Grams et al., 2017). Biscay Low, Iberian Low, Balkan Low, Black Sea Low and Mediterranean High, with negative/positive anomalies over large parts of the domain, barely occur in summer. Atmospheric characteristics associated with these clusters are mainly driven by storm tracks and unstable conditions that prevail during winter and the intermediate seasons of autumn and spring. The high annual variability for all clusters indicates that their occurrence is modulated by climatic variability and larger-scale phenomena, e.g. NAO (e.g. Dünkeloh and Jacobeit, 2003; Xoplaki et al., 2012).
Statistically significant clusters of highest (MCP; a) and 2nd highest (b) conditional probability of P99 EPEs for each grid cell are presented in Figure 6. The top panels present the associated cluster. The legend indicates the percentage of grid cells allocated to each cluster. Subplots c and d present the corresponding conditional probabilities, and subplots e and f the percentage of total EPEs for each grid cell that occur during the associated cluster.
The clusters corresponding to the highest conditional probability for each grid cell show very smooth spatial behaviour, with neighbouring regions allocated mainly to the same cluster. Few discontinuities (e.g. west Italy and west Balkans vs east Italy, west Greece and west Turkey vs east Greece and Aegean) are mainly associated with orography (explained below). Only 0.38% of the total grid cells (located at the edge of the domain in the Middle East) cannot be significantly related with any of the derived weather regimes, while this percentage increases to 22.96% for the second "best" cluster. Results for the 2nd highest conditional probability are patchier, yet closely relate to the results of subplot a. Less than 25% of the total grid cells are significantly associated with more than 2 clusters (not shown), with conditional probabilities for each weather regime at the statistically significant grid cells being presented in Figure S5.
Cluster composites can physically explain connections with EPEs. Negative anomalies of pressure and geopotential height centred over the Atlantic (Atlantic Low) correspond to a high chance of EPEs over the west Iberian Peninsula and southern France. As these anomalies move further east (Biscay Low and Iberian Low), the most impacted locations are the west and west-central Mediterranean. Iberian Low has significant connections over a large area. It is the cluster of the highest/2nd highest conditional probabilities for regions spanning from the Atlantic (west Mediterranean) to the west Balkans (east Mediterranean), and for both high and low latitudes. Moreover, the importance of orography is demonstrated in this cluster. As cyclonic flow and existing moisture approach the mountains of Picos de Europa, Atlas, Alps, Apennines, and west Balkans, the windward locations are preferentially affected by EPEs, and the conditional probability for EPEs over such regions is more than 7 times higher than the nominal probability of 1%. Such results can be further explained by the importance of the moisture advection towards the mountains in the generation of extreme precipitation (Pfahl, 2014). Such orographic influence is also demonstrated in the association of Sicilian Low with EPEs. In this cluster, the central Mediterranean, and locations east of the Apennines and east of the mountain ranges of the west Balkans (Dinaric Alps, Pindos) are affected. Balkan Low and Black Sea Low are highly associated with EPEs over the Balkans and Turkey, respectively, while Mediterranean High, with the anticyclonic flow centred over the domain, generates EPEs over eastern north Africa and the Middle East. Minor Low and Minor High show minor association with EPEs over the domain.
The conditional probabilities associated with the MCP cluster are in general almost 50% higher than those of the 2nd cluster, with the latter still showing twice the nominal probability of 1% for most grid cells. Moreover, it can be noticed from subplots e and f that for most grid cells 30% (20%) of the total EPEs occur during the cluster of MCP (2nd MCP). This indicates that such a classification not only significantly increases the conditional probabilities but can also explain a large share of the total EPEs, even when using only 2 of the 9 clusters.
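The two quantities discussed above, the conditional probability of an EPE given a cluster and the share of all EPEs that fall in that cluster, can be illustrated for a single grid cell with a minimal sketch. The function name `cluster_epe_stats` and its inputs (`is_epe`, a daily P99 exceedance flag, and `labels`, the daily cluster index) are hypothetical and introduced only for illustration; this is not the code used in the study.

```python
import numpy as np

def cluster_epe_stats(is_epe, labels, n_clusters):
    """Per-cluster conditional EPE probability and EPE share for one grid cell.

    is_epe : boolean array (n_days,), True on days exceeding P99 (assumed input)
    labels : integer array (n_days,), cluster index of each day (assumed input)
    """
    is_epe = np.asarray(is_epe, dtype=bool)
    labels = np.asarray(labels)
    days_per_cluster = np.bincount(labels, minlength=n_clusters)
    epes_per_cluster = np.bincount(labels[is_epe], minlength=n_clusters)
    # P(EPE | cluster): EPE days in the cluster over all days in the cluster;
    # clusters with no days are assigned a probability of 0.
    cond_prob = np.divide(epes_per_cluster, days_per_cluster,
                          out=np.zeros(n_clusters), where=days_per_cluster > 0)
    # Share of all EPEs at this grid cell that occur during each cluster.
    share = epes_per_cluster / max(is_epe.sum(), 1)
    return cond_prob, share
```

Sorting `cond_prob` in descending order would then give the cluster of highest and 2nd highest conditional probability for the grid cell; the significance testing applied in the study is not reproduced here.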
Coming back to the results for the clusters of highest conditional probability (Figure 6a), it is worth noting the connection between western Italy and the west Balkan coast. Not only do these regions correspond to the same cluster, but there is also a high degree of temporal overlap (Figure 7). More than 30% of EPEs over parts of west Italy occur on the same day as EPEs over the coast of Croatia and Montenegro (and nearby areas). This is the effect of the Apennines, which block the westerly airflow and force the moisture to precipitate over west Italy. Thus, a strong connection in the co-occurrence of EPEs between west Italy and the west Balkans is established, whereas east Italy and the Adriatic Sea are clearly dissociated from both. It can be concluded that orography is a key modulator of the spatiotemporal connections between locations.
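The same-day overlap described above can be sketched as the fraction of EPE days at a reference grid cell that are also EPE days at each other grid cell. The function `same_day_cooccurrence` and the layout of the boolean array `epe` (days by grid cells) are assumptions made for illustration, not the study's implementation.

```python
import numpy as np

def same_day_cooccurrence(epe, ref):
    """Fraction of EPEs at grid cell `ref` that coincide with an EPE elsewhere.

    epe : boolean array (n_days, n_cells), True where the day is a P99 EPE
    ref : column index of the reference grid cell
    Returns an array (n_cells,); the entry for `ref` itself is 1 by construction.
    """
    epe = np.asarray(epe, dtype=bool)
    ref_days = epe[:, ref]
    n_ref = ref_days.sum()
    if n_ref == 0:
        # No EPEs at the reference cell: the fraction is undefined.
        return np.full(epe.shape[1], np.nan)
    # Count, for every cell, the EPEs that fall on the reference cell's EPE days.
    return epe[ref_days].sum(axis=0) / n_ref
```

Note that this measure is not symmetric: the fraction of west Italy EPEs shared with the west Balkans generally differs from the fraction computed in the opposite direction.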

Conclusions
Spatiotemporal characteristics of daily EPEs (defined based on the 99th percentile, P99) over the Mediterranean are analyzed, using the ERA5 dataset. Their connection to large-scale atmospheric flow patterns is quantified by using EOF analysis and subsequent K-means clustering of the atmospheric variability in the lower and middle troposphere (SLP, T850, and Z500 anomalies).
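As a rough illustration of the EOF analysis followed by K-means clustering, the sketch below computes principal components of an anomaly matrix via singular value decomposition and then groups the days with a basic Lloyd iteration. It is a plain NumPy stand-in under stated assumptions: the function names `eof_pcs` and `kmeans`, the random initialisation, and the absence of any latitude weighting or variable standardisation are illustrative choices, not the configuration used in this work.

```python
import numpy as np

def eof_pcs(anom, n_modes):
    """Leading EOFs and principal components of anomalies (n_days, n_points)."""
    anom = anom - anom.mean(axis=0)          # remove any residual time mean
    u, s, vt = np.linalg.svd(anom, full_matrices=False)
    pcs = u[:, :n_modes] * s[:n_modes]       # daily amplitude of each mode
    explained = s[:n_modes] ** 2 / np.sum(s ** 2)
    return pcs, vt[:n_modes], explained

def kmeans(x, k, n_iter=100, seed=0):
    """Basic Lloyd's algorithm; x has shape (n_days, n_features)."""
    rng = np.random.default_rng(seed)
    # Initialise the centres with k distinct days (illustrative choice).
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute each centre; keep the old one if a cluster is empty.
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```

In such a setup, combining two variables (e.g. SLP and Z500 anomalies) could be done by concatenating their grid points along the second axis of `anom` after suitable scaling, and the resulting daily `labels` would play the role of the weather regimes.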
The results indicate that the Mediterranean region is divided into two domains in terms of EPE seasonality, with autumn being the prevailing season for the western domain, and winter for the eastern one. Spatiotemporal connections between EPEs are very strong and are furthermore modulated by orography and regional climate. For most of the analyzed grid cells, at least 20% of their EPEs occur up to 7 days after a preceding EPE at the same grid cell. This indicates persistent meteorological conditions. Such connections are even stronger at high-altitude locations (e.g., Alps and Atlas Mountains), as well as in regions characterized by dry climatic conditions (north Africa and Middle East). West Italy and the west Balkans demonstrate a remarkable temporal EPE connection, with more than 20% of EPEs occurring on the same day; an effect of the Apennine mountains.
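The temporal dependence summarised above can be illustrated by counting, for one grid cell, the EPEs that follow another EPE within a given window. The function `fraction_preceded` and the 7-day default are illustrative; whether consecutive wet days were treated exactly this way in the study is an assumption.

```python
import numpy as np

def fraction_preceded(is_epe, window=7):
    """Fraction of EPEs that occur within `window` days after a preceding EPE.

    is_epe : boolean array (n_days,) for one grid cell, consecutive daily steps
    """
    is_epe = np.asarray(is_epe, dtype=bool)
    days = np.flatnonzero(is_epe)            # day indices of the EPEs, sorted
    if days.size == 0:
        return np.nan
    # An EPE has a predecessor within the window exactly when the gap to the
    # immediately preceding EPE is at most `window` days.
    gaps = np.diff(days)
    return np.count_nonzero(gaps <= window) / days.size
```

For a cell with EPEs on days 0, 3, 20, 26 and 40, the gaps are 3, 17, 6 and 14 days, so two of the five EPEs follow a preceding EPE within a week and the function returns 0.4.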
Clustering, based on selected atmospheric variables, demonstrates that a combination of SLP and Z500 anomalies leads to the highest association of EPEs with the derived clusters. K-means clustering with 9 classes is the preferred classifier. Clusters correspond to negative or positive anomalies over the west, central, or east Mediterranean, and can be associated with cyclonic and anticyclonic conditions on the synoptic scale. These clusters show a clear connection with the observed EPEs over the vast majority of the studied domain; a relationship that can be explained by the atmospheric dynamics associated with each cluster. The conditional EPE probability increases by more than 3 times for most locations, compared to the nominal probability of 1% for the studied EPEs. Additionally, more than 30% of the P99 EPEs preferentially occur during the cluster of highest conditional probability for most grid cells. Orography further enhances these relationships. Locations in windward parts of the Apennine mountains, Picos de Europa, and Atlas Mountains show MCP of P99 EPEs over 7%, and more than 40% of the EPEs occur during the associated cluster.
Information from this study delivers additional benefits when analysing EPE predictability at extended-range forecasts. It has been demonstrated that many EPEs can be associated with large-scale atmospheric flow patterns. This is important, since NWP models are more skillful in predicting such large-scale patterns over extended-range forecasts. The optimal selection of weather clustering also depends on NWP model performance for the selected lead time. Various combinations of domain extent, number of clusters, and atmospheric variables should be further analyzed to maximize the usefulness of the currently-used NWP models when assessing the performance of extended-range forecasts. This will be studied in future work.
This work uses ERA5 reanalysis data to delineate EPEs. ERA5 is produced using a high-resolution model that incorporates state-of-the-art physics and assimilates multiple observations to calculate the various atmospheric variables. Despite the challenges of reanalysis data (e.g. the limitations mentioned in Hersbach et al., 2020), this provides the opportunity to use a physically-consistent dataset for the analysis performed. As precipitation is a complex phenomenon, highly varying across spatiotemporal scales, it would be useful to perform this analysis using observational data and assess whether finer spatiotemporal resolutions have similar characteristics.
The percentages in parentheses refer to the percentage of grid cells that have statistically significant connections with each weather regime. The boxplots are as in Figure 1, but only considering the statistically significant grid cells.