The regional MiKlip decadal prediction system for Europe: Hindcast skill for extremes and user‐oriented variables

Regional climate predictions for the next decade are gaining importance, as this period falls within the planning horizon of politics, economy, and society. The potential predictability of climate indices or extremes at the regional scale is of particular interest. The German MiKlip project (“mid‐term climate forecast”) developed the first regional decadal prediction system for Europe at 0.44° resolution, based on the regional model COSMO‐CLM using global MPI‐ESM simulations as boundary conditions. We analyse the skill of this regional system focussing on extremes and user‐oriented variables. The considered quantities are related to temperature extremes, heavy precipitation, wind impacts, and the agronomy sector. Variables related to temperature (e.g., frost days, heat wave days) show high predictive skill (anomaly correlation up to 0.9) with very little dependence on lead‐time, and the skill patterns are spatially robust. The skill patterns for precipitation‐related variables (e.g., heavy precipitation days) and wind‐based indices (like storm days) are less skilful and more heterogeneous, particularly for the latter. Quantities related to the agronomy sector (e.g., growing degree days) show high predictive skill, comparable to temperature. Overall, we provide evidence that decadal predictive skill can be generally found at the regional scale also for extremes and user‐oriented variables, demonstrating how the utility of decadal predictions can be substantially enhanced. This is a very promising first step towards impact‐related modelling at the regional scale and the development of individual user‐oriented products for stakeholders.


| INTRODUCTION
Short-term climate predictions for the next 1-10 years are gaining importance in the climate science community (e.g., Meehl et al., 2009). These so-called decadal predictions can also be of high value for impact modellers and decision makers in politics, economy and society (Vera et al., 2010; Meehl et al., 2014). Within the Coupled Model Intercomparison Project Phase 5 (CMIP5; Taylor et al., 2012), a set of globally coordinated climate model experiments has been consolidated, comprising simulations for the recent past, decadal simulations and climate change projections. Based on this experience, decadal predictions also play an important role in the Decadal Climate Prediction Project (DCPP) as part of CMIP6 (Boer et al., 2016; Eyring et al., 2016).
Since 2007, the number of studies assessing the predictive skill of decadal systems has strongly increased (cf. Meehl et al., 2014). While most studies focus on the global scale (e.g., Kim et al., 2012; Müller et al., 2012, 2014; Doblas-Reyes et al., 2013; Bellucci et al., 2015; Kadow et al., 2016), some studies also investigate indices at the regional scale (e.g., Kruschke et al., 2016; Moemken et al., 2016). Overall, the global warming trend enables a certain predictability by itself, especially for temperature-related climate indicators. On decadal timescales, this climate trend is of the same order of magnitude as the climate variability of global mean temperatures (Flato et al., 2013; Hartmann et al., 2013; Chen and Tung, 2018). On the regional scale, the variability might be even stronger and therefore could provide additional predictability. Specifically, the North Atlantic has been identified as a hot spot of decadal predictability (Sutton and Hodson, 2005; Latif and Keenlyside, 2011; Müller et al., 2012; Kadow et al., 2017). This predictability arises from the long-term variability pattern of the Atlantic Meridional Overturning Circulation (AMOC; Matei et al., 2012) and the Atlantic Multidecadal Variability (AMV; Zhang et al., 2019). Most global decadal prediction systems provide a high hindcast skill for the AMV Index as well as for North Atlantic sea surface and 2 m temperatures up to 9 years ahead (Doblas-Reyes et al., 2013; Bellucci et al., 2015). A strong potential for skilful decadal predictions has also been identified for Europe (Reyers et al., 2019). For example, Ghosh et al. (2017) provided evidence that an increased heat flux due to higher sea surface temperatures in the North Atlantic is triggered by the positive phase of the AMV. This mechanism induces a wave-like response in the sea-level pressure field and blocking-like situations downstream over Europe, thus leading to changes in temperature and precipitation.
These mechanisms are regionally and seasonally dependent, leading to different levels of predictive skill for different climate indicators for Europe and inducing a lead-time dependency of skill.
The German research consortium MiKlip ("Mittelfristige Klimaprognosen"; www.fona-miklip.de; Marotzke et al., 2016) developed a global decadal prediction system based on the Max-Planck-Institute Earth System Model (MPI-ESM) and produced several hindcast ensemble generations. In addition, MiKlip provides the first systematic efforts to establish a regional component of the decadal prediction ensemble for Europe, which culminated in the dynamical downscaling of a full global hindcast ensemble. Mieruch et al. (2014) and Reyers et al. (2019) previously analysed sub-samples of this regionalised hindcast set. All three studies show added value by the downscaling approach for basic variables like surface temperature for large parts of Europe. Their results also indicate a reduction of bias and an increase of reliability in the regional system compared to the global hindcasts (cf. Feldmann et al., 2019). Other efforts used statistical-dynamical downscaling for selected quantities like wind energy potentials (Moemken et al., 2016) and wind gusts (Haas et al., 2016). Both studies indicate predictive skill for Europe, especially for the first years (1-4) after initialisation, related to variations in the frequency of westerly weather patterns.
Mean temperature and mean precipitation are the most commonly analysed variables in decadal hindcast studies (e.g., Doblas-Reyes et al., 2013; Bellucci et al., 2015). The MiKlip decadal prediction website (www.fona-miklip.de/decadal-forecast/decadal-forecast-for-2019-2028/) offers operational global decadal predictions of mean temperature. However, such general products are of limited value for potential users outside the science community, like stakeholders in civil services, governments or economy (Lemos et al., 2012; Hackenbruch et al., 2017; Schipper et al., 2019). Recent user-oriented projects (e.g., the EUPORIAS project for seasonal predictions; Buontempo et al., 2018) suggest that user needs are rather specific across different sectors, and that individual prediction products are highly needed to enable applicability in the "real world." Building on these conclusions, some efforts within MiKlip focussed on the needs of private companies and public authorities. While developing individual products for different sectors remains the aim, specific climate indices or extremes related to the basic climate variables temperature, precipitation, and wind at regional resolution are needed as an intermediate step (cf. Copernicus Climate Change Service [C3S]; Buontempo et al., 2019).
In this study, we analyse the predictive skill of the regional MiKlip system focusing on extremes and climate indices related to temperature, precipitation, wind or connected to the agronomy sector. Thus, we aim to identify potential fields of application for regional decadal predictions. Thereby, our work extends previous regional studies like Feldmann et al. (2019) and Reyers et al. (2019). Methods and datasets are described in Section 2. Section 3 focusses on the results, while a summary and discussion concludes this paper in Section 4.
The global hindcasts are dynamically downscaled to the EURO-CORDEX domain (Giorgi et al., 2009; see Figure S1) using the regional climate model COSMO-CLM (CCLM; Rockel et al., 2008) at a spatial resolution of 0.44°. The downscaling is applied to all baseline1 ensemble members for all starting dates in 1960-2017, resulting in the identical number of 5,800 simulation years (58 starting dates × 10 years × 10 ensemble members) as for the global ensemble. A thorough evaluation of the regional decadal prediction system focussing on the basic variables temperature, precipitation, and wind speed can be found in Reyers et al. (2019) and Feldmann et al. (2019).
For the evaluation of temperature- and precipitation-related indices, we use the observational dataset E-OBS (V14; Haylock et al., 2008) with daily temporal resolution on a regular 0.5° grid. For wind-related variables, no gridded observational dataset is available for Europe. Thus, we use a CCLM simulation with ERA40 and ERA-Interim boundary conditions as reference. For this simulation, CCLM is applied in the same setup as for the downscaling of the decadal prediction system (cf. Feldmann et al., 2019).
To estimate the added value of initialisation, an ensemble of seven regional uninitialized simulations (historicals) is used as reference. This ensemble was implemented by applying the same dynamical downscaling approach to a set of global historical runs. The global uninitialized ensemble uses the same MPI-ESM model version and setup as the global hindcasts. The runs are started from a pre-industrial control simulation and consider, among others, aerosol and greenhouse gas concentrations for the period 1850-2005 (e.g., Müller et al., 2012). Note that only the first seven (out of 10) hindcast ensemble members are used to ensure a fair comparison to the available uninitialized simulations.
Based on the collection of potential user needs within MiKlip, a promising first step towards the development of individual user products is the analysis of different extremes and user-oriented variables. Most variables are defined in the "Expert Team of Climate Change Detection Indices" (ETCCDI; Van Engelen et al., 2008; Zhang et al., 2011). The complete set of analysed variables/indices is given in Table 1. We focus here only on a subset, as some indices are closely linked to each other and/or already discussed in other studies (e.g., Moemken et al., 2016; Feldmann et al., 2019).

| Data processing and skill metrics
All datasets used in this study are processed in the same way to enable comparability (Figure 1). First, yearly time series for all indices/variables are derived for hindcasts, historicals, and observations. In the next step, the raw hindcast time series are recalibrated against observations using the Decadal Climate Forecast Recalibration Strategy (DeFoReSt) developed within MiKlip by Pasternack et al. (2018). DeFoReSt accounts for three features characteristic for decadal datasets: lead- and start-year-dependent unconditional and conditional bias, as well as ensemble dispersion. To this end, DeFoReSt combines the parametric drift correction by Kruschke et al. (2016; lead-time dependency) and the non-stationary model drift correction of Kharin et al. (2012; start-year dependency). The method uses first-order polynomials (linear trend) to capture start-year-dependent errors as well as third-order polynomials for lead-year dependency. In addition, Pasternack et al. (2018) implemented a parametric adjustment of the conditional bias and the ensemble spread by using third- and second-order polynomials, respectively. A cross-validation is included in DeFoReSt. More details can be found in Pasternack et al. (2018). Please note that the approach is univariate. Thus, all indices are calculated prior to recalibration.
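To illustrate the idea of a lead-time-dependent drift correction, the following minimal sketch fits a third-order polynomial in lead year to the mean hindcast error and subtracts it. It is not DeFoReSt: the start-year dependence, the conditional bias, the ensemble spread adjustment and the cross-validation are all omitted. The function names, the array layout (start years × lead years) and the synthetic data are assumptions made for this example only.

```python
import numpy as np


def fit_lead_time_drift(hindcast_mean, observations, order=3):
    """Fit a polynomial in lead year to the mean hindcast-minus-observation error.

    Both inputs have shape (n_start_years, n_lead_years); observations[i, j]
    is the observed value valid at lead year j + 1 of start year i.
    This array layout is an assumption of the sketch.
    """
    lead_years = np.arange(1, hindcast_mean.shape[1] + 1)
    mean_error = np.mean(hindcast_mean - observations, axis=0)
    return np.polyfit(lead_years, mean_error, deg=order)


def remove_drift(hindcast_mean, coefficients):
    """Subtract the fitted lead-time-dependent drift from every start year."""
    lead_years = np.arange(1, hindcast_mean.shape[1] + 1)
    drift = np.polyval(coefficients, lead_years)
    return hindcast_mean - drift  # broadcasts over the start-year axis


# Synthetic demonstration: observations plus a purely lead-dependent cubic drift.
rng = np.random.default_rng(42)
observations = rng.normal(size=(40, 10))
lead = np.arange(1, 11)
synthetic_drift = 0.8 - 0.1 * lead + 0.004 * lead**3
hindcasts = observations + synthetic_drift

coefficients = fit_lead_time_drift(hindcasts, observations)
corrected = remove_drift(hindcasts, coefficients)
```

Because the synthetic drift is itself a cubic in lead year, the corrected hindcasts coincide with the observations here; real hindcast errors also depend on the start year, which is why DeFoReSt adds further terms.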
The performance of the regional system is evaluated following the verification methods by Goddard et al. (2013) and Feldmann et al. (2019). Skill scores are calculated for all 4-year periods from lead-time years 1-4 (LT1-4) to LT7-10, to account for the lead-time dependence of the decadal hindcast skill following the DCPP protocol. All lead-times cover the identical analysis period 1967-2016. This is achieved by shifting the respective starting dates for the calculation, that is, from decades 1966-2012 for LT1-4 to decades 1960-2006 for LT7-10 (see also Paxian et al., 2019; their Figure 2). We apply two types of skill scores: the Anomaly Correlation Coefficient (ACC) to estimate the overall skill, and the Ranked Probability Skill Score (RPSS; Wilks, 2011; Ferro, 2014) as a measure of reliability. The ACC is a measure of the relationship between the ensemble mean of the hindcasts/historicals and observations, ranging from −1 (perfect anti-correlation) to 1 (perfect correlation). A positive RPSS (perfect score 1) indicates that the hindcasts have a higher probability to predict an observed category than a reference dataset, and vice versa for negative RPSS. Both skill metrics use … Significance (Goddard et al., 2013) is determined using block-bootstrapping (here 500 times) with a random re-sampling of the time series with replacement.

TABLE 1 List of indices that show decadal predictive skill. Indices marked with asterisks are discussed in detail in this study.
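The verification steps described in this section can be sketched as follows. This is an illustrative sketch under stated assumptions, not the evaluation code used for the paper. It assumes that lead year 1 is the calendar year after the start year (which reproduces the start-year ranges quoted above), tercile categories for the RPSS with the observed climatological frequencies as reference, and a block length of 5 years for the bootstrap. The RPSS shown is the plain score without the ensemble-size adjustment discussed by Ferro (2014). All function names are hypothetical.

```python
import numpy as np


def start_years(lead_first, lead_last, first_target=1967, last_target=2016):
    """Start years whose lead window covers exactly the common analysis period."""
    return list(range(first_target - lead_first, last_target - lead_last + 1))


def anomaly_correlation(forecast, observed):
    """Correlation between forecast and observed anomalies (range -1 to 1)."""
    f = forecast - forecast.mean()
    o = observed - observed.mean()
    return float(np.sum(f * o) / np.sqrt(np.sum(f**2) * np.sum(o**2)))


def rps(probabilities, observed_category):
    """Ranked probability score of one forecast over ordered categories."""
    outcome = np.zeros_like(probabilities)
    outcome[observed_category] = 1.0
    return float(np.sum((np.cumsum(probabilities) - np.cumsum(outcome)) ** 2))


def rpss(ensemble, observed, edges):
    """RPSS of an ensemble (n_times, n_members) against climatological frequencies."""
    n_categories = len(edges) + 1
    observed_categories = np.digitize(observed, edges)
    climatology = np.bincount(observed_categories, minlength=n_categories) / len(observed)
    rps_forecast, rps_reference = 0.0, 0.0
    for members, category in zip(ensemble, observed_categories):
        counts = np.bincount(np.digitize(members, edges), minlength=n_categories)
        rps_forecast += rps(counts / len(members), category)
        rps_reference += rps(climatology, category)
    return 1.0 - rps_forecast / rps_reference


def block_bootstrap_acc(forecast, observed, block_length=5, n_resamples=500, seed=0):
    """2.5th and 97.5th percentiles of the ACC under block resampling with replacement."""
    rng = np.random.default_rng(seed)
    n = len(forecast)
    n_blocks = int(np.ceil(n / block_length))
    scores = np.empty(n_resamples)
    for k in range(n_resamples):
        starts = rng.integers(0, n - block_length + 1, size=n_blocks)
        index = np.concatenate([np.arange(s, s + block_length) for s in starts])[:n]
        scores[k] = anomaly_correlation(forecast[index], observed[index])
    return np.percentile(scores, [2.5, 97.5])


# Synthetic demonstration on a 47-value series (one value per start year).
rng = np.random.default_rng(1)
observed = 0.03 * np.arange(47) + rng.normal(scale=0.5, size=47)
ensemble = observed[:, None] + rng.normal(scale=0.5, size=(47, 7))
terciles = np.quantile(observed, [1 / 3, 2 / 3])

acc = anomaly_correlation(ensemble.mean(axis=1), observed)
score = rpss(ensemble, observed, terciles)
lower, upper = block_bootstrap_acc(ensemble.mean(axis=1), observed)
```

One simple reading of the bootstrap interval is to call the ACC significantly positive when `lower` exceeds zero; the exact test used by Goddard et al. (2013) may differ in detail.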

| Temperature-related variables
The mean near-surface temperature shows a high predictive skill in decadal hindcasts, both on the global and the regional scale (cf. Doblas-Reyes et al., 2013; Meehl et al., 2014; Bellucci et al., 2015; Feldmann et al., 2019). The main reason for this high skill is the contribution of the climate trend, which is more pronounced for temperature than for wind or precipitation. In this section, we analyse how far the skill for mean temperature extends to other temperature-related variables, with examples for warm and cold extremes. Feldmann et al. (2019) showed that the predictive skill of temperature in the regional MiKlip system is higher for the warm season than for the cold season (their Figures 9 and 10). Therefore, we chose the daily maximum summer temperature (TASMAX; Table 1) as a first example. TASMAX is an indicator of summer heat conditions relevant for, e.g., heat stress (Honda et al., 2007). Figures 2a, b depict spatial ACC and RPSS plots for TASMAX for LT2-5, using the climatology as reference. For most of Europe, the correlation is high and significant, reaching values close to one for the coastal Mediterranean regions. However, correlations drop below 0.5 for Scandinavia, the British Isles and Greece. The RPSS exhibits similar spatial patterns to the ACC, but with lower values. Nevertheless, RPSS is positive everywhere except for Greece. In general, the skill scores and the fraction of significant grid points exhibit nearly no lead-time dependence (Figure S2). While the skill of TASMAX is already high and significant in the raw data, the recalibration leads to a further improvement (Figure S3). Skill scores are much lower and less significant when using the regional historicals as reference (Figures S4-S8). While no added value of initialisation is detected for the British Isles and Greece, the hindcast initialisation improves the predictive skill for the rest of Europe (particularly for RPSS with values up to 0.6).
The results for the so-called summer days (SU, maximum temperature above 25 °C; Table 1) … where the temperatures seldom exceed the fixed threshold (e.g., Scandinavia) and grid points are often marked by missing values after recalibration. In these areas, percentile-based thresholds and indices may be more appropriate. Nevertheless, the recalibration is able to increase the overall predictive skill of SU (Figure S3), mainly by adjusting the hindcast trend to the observed one (Figures S9-S10). A more complex phenomenon related to warm temperature extremes is heat waves. So far, we analysed the predictive skill for the number of heat wave days per year (HWDS; Table 1). The skill pattern of ACC (Figure 2e) shows a North-South gradient, similar to the spatial patterns of SU: Correlations are positive and significant for Southern, Central and Eastern Europe, and negative for Scandinavia and the British Isles. In general, skill scores for HWDS are lower compared to SU or TASMAX.

Finally, frost days (FD, minimum temperature below 0 °C; Figures 2g, h) were also analysed as an example related to cold temperature extremes, revealing skill for large parts of Europe. Compared to warm temperature extremes, skill scores are lower (ACC of up to 0.7) for Southern, Central, and Eastern Europe, but higher (and positive) for Northern Europe. While the ACC is significant for large parts of Europe, the RPSS shows significant skill only for Scandinavia and parts of Italy and Spain.
The results hint at the potential of the regional decadal system to predict not only mean temperatures but also the likelihood of both cold and warm temperature extremes, especially when considering the climatology as reference. Skill scores using the historical ensemble as reference are generally lower and less significant (Figures S4-S8). Nevertheless, large parts of Europe exhibit an added value of initialisation, particularly for warm temperature extremes. For all indices, the skill scores vary only slightly with the choice of lead-time (Figure S2), leading to the conclusion that the skill is mostly attributed to the forcing and not the initialisation. Results for tropical nights (TR) and ice days (ID) are comparable to those of SU and FD, but with an increased fraction of missing values due to the higher thresholds (not shown).
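The threshold-based indices discussed in this section are simple day counts and can be sketched as below. The SU and FD thresholds are those stated in the text (above 25 °C, below 0 °C); the percentile-based variant is only an illustration of the alternative mentioned for regions where the fixed threshold is rarely exceeded, and its 90th-percentile default and the function names are assumptions of this example.

```python
import numpy as np


def summer_days(tmax_daily, threshold=25.0):
    """Number of days with daily maximum temperature above the threshold (deg C)."""
    return int(np.sum(np.asarray(tmax_daily, dtype=float) > threshold))


def frost_days(tmin_daily, threshold=0.0):
    """Number of days with daily minimum temperature below the threshold (deg C)."""
    return int(np.sum(np.asarray(tmin_daily, dtype=float) < threshold))


def percentile_exceedance_days(tmax_daily, reference_tmax, percentile=90.0):
    """Days above a percentile of a local reference sample (hypothetical variant)."""
    limit = np.percentile(reference_tmax, percentile)
    return int(np.sum(np.asarray(tmax_daily, dtype=float) > limit))


# One value per day; applying the functions year by year gives yearly index series.
su = summer_days([24.9, 25.0, 25.1, 30.0])
fd = frost_days([-1.0, 0.0, 0.5])
warm = percentile_exceedance_days([90.0, 91.0, 95.0], np.arange(1, 101))
```

A local percentile threshold always yields some exceedances in the reference sample, which avoids the missing values that fixed thresholds produce in cool regions such as Scandinavia.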

| Precipitation-related variables
Regarding precipitation-related variables, expectations are generally lower than for temperature-derived parameters (cf. Mieruch et al., 2014 or Reyers et al., 2019). In this section, we focus on heavy precipitation events, as extremes are more relevant for stakeholders than mean values. Figure 3a shows the ACC for heavy precipitation days (R10mm; Table 1) for LT2-5. The skill pattern is more heterogeneous compared to temperature-related variables. Correlations are positive (up to 0.8) and significant for Scandinavia and parts of Central and Southern Europe. Again, spatial patterns of the RPSS look similar, but with slightly lower values (Figure 3b). Similar results can be found for a percentile-based precipitation index aiming at the identification of precipitation events potentially causing floods in major European river catchments (RM7P95; Table 1). Both the ACC and the RPSS (Figures 3c, d) are comparable to those of R10mm. In the figures, black dots indicate significant skill at the 95% level.
The regional decadal prediction system is less skilful for variables related to precipitation than for temperature-based indices. However, results are promising for several European regions (e.g., Scandinavia or the Mediterranean), where the hindcasts beat not only the climatology, but also the uninitialized ensemble (Figures S4-S8). The recalibration improves the predictive skill and reduces the lead-time dependence for both R10mm and RM7P95 (Figures S2 and S3). Skill scores for other precipitation-related quantities (Table 1) like very heavy precipitation days (R20mm) or simple daily intensity index (SDII) are similar to the results presented above (not shown).

| Wind-related variables
For wind, only a few studies focus on decadal predictability. For example, Haas et al. (2016) and Moemken et al. (2016) found some predictive skill for wind gusts and wind energy potentials over Central Europe, while Kruschke et al. (2016) found some skill for Northern hemispheric cyclones. Here, we extend these studies and analyse further potentially user-relevant wind-related variables. The first example is the mean surface wind speed in the extended winter season (ONDJFM). This period is of interest as, for example, the probability of windstorms affecting Central Europe is higher than in summer (e.g., Donat et al., 2010). The ACC for LT2-5 (Figure 4a) is positive (up to 0.6) for the Mediterranean region, the North and the Baltic Sea, and parts of Central Europe. However, the skill is not significant for most of the grid points. The RPSS shows larger regions with negative skill (Figure 4b). For both skill metrics, the skill averaged over Europe decreases for longer lead-times in spite of the recalibration (e.g., LT6-9; Figure S2).
We also estimated the predictive skill of a simplified storm severity index (e.g., Pinto et al., 2012) for the extended winter season (SFCWIND98W; Table 1). Both ACC and RPSS (Figures 4c, d) show positive skill scores for the Mediterranean region, Eastern Europe and parts of North and Baltic Sea, with highest values (ACC up to 0.8) for the Adriatic Sea. However, only the ACC values are significant in certain regions.
In general, variables related to extreme wind speeds are less skilful than temperature or precipitation-based indices when using the climatology as reference. Nevertheless, some European regions show predictive skill. In addition, wind-related variables exhibit a stronger lead-time dependence (Figure S2). The applied recalibration method is able to enhance the predictive skill (Figure S3) and even turns the ACC values, which are negative on average, into slightly positive ones. Results look different if the historical ensemble is considered. Here, both indices show a significant added value of initialisation for several European regions (Figures S4-S8).

| Indices related to the agronomy sector
Finally, we analysed two indices relevant for the agronomy sector, namely, the length of the growing season (GSL) and the growing degree days (GDD; Table 1). Both parameters are based on temperature. Therefore, a certain predictive skill can be expected. Nevertheless, they focus on different aspects compared to the variables analysed in Section 3.1. GSL is often used to determine which crops can be grown in a specific region. With this climate indicator, we analyse if shifts in the timing and length of the growing season show predictive skill on decadal timescales. Figure 5a depicts the ACC of GSL for LT2-5. Correlations are positive (up to 0.6) and mostly significant for Northern, Central, and Eastern Europe. GSL shows no skill for parts of France, Spain, and the British Isles. RPSS values are lower and less significant, with the negative skill extending towards Eastern and Northern Europe (Figure 5b).
GDD addresses the integrated temperature over the summer half-year and is a frequently used measure for describing and predicting the growth and development processes of various crops. In general, the predictive skill is much higher for GDD than for GSL. Both ACC and RPSS (Figures 5c, d) show high positive (up to 0.9 for ACC) and significant values for Southern, Central, and Eastern Europe. Except for Scandinavia, the spatial pattern is comparable to TASMAX. This region exhibits partly negative skill scores, especially for RPSS and longer lead-times.
The regional decadal prediction system provides promising results for parameters relevant for the agronomy sector (see also heating degree days [HD] in Feldmann et al., 2019). Skill scores are comparable to both mean and extreme temperatures, irrespective of the applied reference dataset (Figures S4-S8). Additionally, the skill scores show practically no lead-time dependence when averaged over Europe (Figure S2). In contrast to other presented variables in this study, both GSL and GDD do not seem to benefit from the applied recalibration (Figure S3).
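As an illustration of why GDD behaves like an integrated temperature measure, the sketch below accumulates the daily mean temperature excess over a base temperature. The exact definition used in Table 1 is not reproduced here: the base temperature of 10 °C and the function name are assumptions of this example, and the caller is expected to pass only the days of the summer half-year.

```python
import numpy as np


def growing_degree_days(tmean_daily, base_temperature=10.0):
    """Sum of daily mean temperature excess above a base temperature (deg C days).

    The base temperature is an assumed value for illustration; days at or
    below the base contribute nothing.
    """
    excess = np.asarray(tmean_daily, dtype=float) - base_temperature
    return float(np.sum(np.clip(excess, 0.0, None)))


# Four example days: only the last two exceed the assumed 10 deg C base.
gdd = growing_degree_days([8.0, 10.0, 12.5, 15.0])
```

Because the index sums over a whole season, shifting a warm spell by a few days changes the result very little, whereas a threshold count such as SU can change whenever individual days move across the threshold.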

| SUMMARY AND DISCUSSION
In this study, we assess the potential for decadal predictability for user-oriented variables and extremes using the regional MiKlip decadal prediction system. This effort extends previous work, which focussed on basic variables (Reyers et al., 2019). The main conclusions are (see also Figure 6):
• All temperature-related indices show high predictive skill (mean ACC for Europe up to 0.8) for all lead-times, indicating that skill extends beyond mean temperature. The skill patterns are spatially robust and often significant (Figure 6). In general, the predictive skill is higher for variables related to summer and high temperatures (in line with Mieruch et al., 2014; Feldmann et al., 2019). The predictive skill of TASMAX is comparable to that of daily mean temperature. The variables SU and HWDS address more heat-related aspects, and show high predictive skill from Central to Southern Europe, where the MiKlip system might provide useful information on the expected heat stress for a given period. Notably, HWDS, which indicates the likely duration of heat waves, hints at the potential for valuable climate information for users. The skill of SU and HWDS is lower for the British Isles and Scandinavia, where a lower number of hot days and thus lower heat stress is generally found, making this type of climate information less valuable. For cold extremes, FD shows a lower but still significant predictive skill for most of Europe (Figure 6). This information might be valuable, for example, for de-icing activities (Schipper et al., 2019).
• As expected, results are more mixed for precipitation-based parameters: The spatial skill patterns are more heterogeneous and the skill scores are generally lower (mean ACC for Europe up to 0.3; Figure 6). This is in line with findings by Mieruch et al. (2014) and Reyers et al. (2019) for mean precipitation in a sub-sample of the full regional MiKlip ensemble. The selected indices (R10mm and RM7P95) address more extreme precipitation events and might thus be useful to users, as such events may, for example, trigger floods in large river catchments.
• Skill patterns are spatially diverse and often not significant for the analysed wind-related indices. Skill scores are highest for Southern and parts of Eastern Europe, but generally lower compared to both temperature and precipitation (mean ACC for Europe up to 0.1; Figure 6). Reyers et al. (2019) found similar results (with lower values) for mean wind speed in a sub-sample of the regional system. This is in partial contradiction with previous studies on wind energy potentials and wind gusts (Haas et al., 2016; Moemken et al., 2016), which had found higher predictive skill for Central Europe with statistical-dynamical approaches and for shorter lead-times (LT1-4). However, their results exhibited a pronounced lead-time dependence.
• Results look promising for indices connected to the agronomy sector, as skill scores are high and often significant for large parts of Europe (mean ACC up to 0.75; Figure 6). The skill patterns are comparable to those of temperature extremes. Indicators like GDD are calculated as integrals over longer periods and might therefore compensate potential timing errors in the annual cycle. Such integrated indices might thus provide more robust information than threshold-based indices.
• The majority of the discussed indices show a negligible lead-time dependence in their predictive skill (Figure 6), especially after recalibration. Feldmann et al. (2019) concluded that in some regions there is a superposition of skill due to (a) the initialisation in the first years and (b) the long-term climate trend in later years (their Figure 5). Particularly for temperature-based indices, the high skill originates mainly from the strong trend rather than from the represented variability or the initialisation (see Figures S9-S10).
• The applied recalibration approach is generally able to improve the skill of the regional decadal prediction system. The level of improvement depends on the variable and region. This is in line with results by Feldmann et al. (2019) for the surface temperature at the regional scale. On the global scale, DeFoReSt improves the prediction skill for the surface temperature (Pasternack et al., 2018) as well as the GPCC drought index (Paxian et al., 2019). Regarding the different components of the recalibration method, the hindcasts seem to benefit mostly from the adjustment of their trend towards the observed one (Figures S9-S10).
• The focus on the climatology as reference is the most common choice in climate services. In general, skill scores are lower and less significant when considering historical simulations as reference (Figures S4-S8). Nevertheless, several indices analysed here (e.g., SFCWIND or RM7P95) show a clear added value of initialisation for various regions in Europe. Future work could extend the current study by using other widely used references such as persistence forecasts or statistical models (e.g., Suckling et al., 2017).
We conclude that the regional MiKlip decadal prediction system has potential for several user-oriented variables and climate extremes (Figure 6). The predictive skill depends on both the variable and the region. While the temperature-based indices are skilful for most of Europe, significant predictive skill for precipitation- and wind-related variables is limited to certain European regions. These regions include Scandinavia and large parts of the European coastal areas for precipitation-based indices, and the Mediterranean and parts of Eastern Europe for wind extremes. The regional differences affect the applicability of the decadal prediction system. Nevertheless, keeping these limitations in mind, the MiKlip system can provide valuable information for users outside the science community, for example, decision makers in politics, economy or society. Potential areas of application for temperature extremes are the health sector (action plans to reduce mortality), humanitarian risk reduction (forecast-based financing to reduce effects of events) or the IT sector (heat-dependent server outages, cooling systems). For precipitation extremes, fields of application may be the hydrology sector (water management) or inland shipping (river water levels), while the predictability of wind extremes might be important for forestry (removal of storm damages) or insurance companies (wind damages to buildings).
Our results illustrate in general terms how the utility of decadal predictions can be enhanced substantially with little additional effort. It is often not necessary to drive an impact model with climate-model output to establish the predictive skill for a climate-impact variable. Instead, it suffices to establish the skill for a climate index that is empirically known to relate to important applications (Table 1). This strategy may well require defining climate indices that are not a priori obvious from a climate-physics standpoint. But once an index is established, investigating its predictive skill from an existing suite of climate predictions is straightforward.
The development of custom-tailored (user-driven) indices that show decadal predictive skill and the development of more specific user products in close cooperation with individual stakeholders are strongly needed and have already started within MiKlip and other projects (e.g., Falloon et al., 2018; Paxian et al., 2019). However, a distinction must be made between analysing the predictive skill of user-oriented variables and the actual applicability of information from decadal predictions. In order to create valuable information based on decadal predictions, that is, information that is integrated in the established workflow of stakeholders, the features of the predictions need to be specified from the users' (and not the scientific) point of view (Buontempo et al., 2019). These features include beneficial skill measures, prediction types (ensemble mean, probabilistic predictions), temporal or spatial requirements (lead-time, reference period, temporal aggregation, area of interest), reference datasets, and information formats (numerical, graphical, text). To succeed in identifying what drives the interest of potential users in the rather unknown field of decadal predictions, work must focus on cooperative communication, mutual understanding, and trust building between users and scientists. Nevertheless, the present work is an indispensable first step towards pushing the science of decadal predictability to actual "real world" applicability.

AUTHOR CONTRIBUTIONS
Moemken, Feldmann, and Pinto conceived and designed the study, and wrote the original paper draft. Moemken, Feldmann, Buldmann, and Laube performed the data analysis. Kadow contributed with software and analysis tools. Paxian and Tiedje organized the user workshops and gathered information on the user requirements for tailored products. All authors discussed the results and contributed with manuscript revisions.