A seamless filter for daily to seasonal forecasts, with applications to Iran and Brazil

A digital filter is introduced which treats the problem of predictability versus time averaging in a continuous, seamless manner. This seamless filter (SF) is characterized by a unique smoothing rule that determines the strength of smoothing as a function of lead time. The rule needs to be specified beforehand, either by expert knowledge or by user demand. As a result, skill curves are obtained that allow a predictability assessment across a whole range of time‐scales, from daily to seasonal, in a uniform manner. The SF is applied to downscaled SEAS5 ensemble forecasts for two focus regions in or near the tropical belt, the river basins of the Karun in Iran and the São Francisco in Brazil. Both are characterized by strong seasonality and semi‐aridity, so that predictability across various time‐scales is in high demand. Among other things, it is found that from the start of the water year (autumn), areal precipitation is predictable with good skill for the Karun basin two and a half months ahead; for the São Francisco it is only one month, with longer‐term prediction skill just above the critical level.


INTRODUCTION
Over the past decade or so, the field of seamless weather prediction has seen quite active development (Palmer et al., 2008; Bauer et al., 2015; Brunet et al., 2015; Vitart and Robertson, 2018). This was due both to a deeper understanding of the underlying nonlinear dynamics of the system and to greater computational capacity to run the corresponding numerical weather prediction (NWP) models. Originally used for the interface between weather and climate (Palmer et al., 2008; Shukla et al., 2009), the word "seamless" now refers to efforts to unify prediction systems (initialization, parametrization, numerics) across all time-scales. This study emphasizes predictions ranging from days to seasons, specifically lead times of 1 to 180 days, as produced by the latest seasonal prediction system SEAS5 of the European Centre for Medium-Range Weather Forecasts (ECMWF) (Johnson et al., 2019). Potential atmospheric predictability at seasonal time-scales is grounded in mostly coupled atmosphere-ocean processes, such as the El Niño-Southern Oscillation or the Madden-Julian Oscillation, that are themselves predictable at those time-scales (Rowell, 1998). Land processes such as soil moisture, snow cover and vegetation also play a role (Robertson and Vitart, 2019).
Depending on variable and region, each prediction lead time tends to have an associated characteristic time-scale whose variations are optimally predicted. As shown in fig. 1d of Reichler and Roads (2003) or fig. 2 of Buizza et al. (2015), for example, at 30 days lead time the 60-day average is better predicted than the 30-day average, as the former better eliminates unpredictable features that propagate with periods shorter than those 60 days. A systematic study of the dependence of forecast skill on time averaging and lead time was undertaken by Zhu et al. (2014); for a given lead time they employ a time-averaging window of the same width. A similar approach was taken by Ford et al. (2018); their averaging window is given by the Poisson distribution, with the lead time as its single parameter. Since the mean and variance of that distribution both equal the lead time, their approach resembles that of Zhu et al. (2014). Finding the optimum aggregation scale is a non-trivial problem, since very often the underlying process time-scales are uncertain, and the problem comes down to a trial-and-error exercise. But that is not the subject here. Instead, similar to the Zhu et al. (2014) and Ford et al. (2018) approaches (cf. also Wheeler et al., 2017), but more flexibly, the scales are prescribed in terms of a seamless filter (SF), as detailed in section 2.2. Basically, the SF is a standard low-pass filter that retains only frequencies below some cut-off value, but its characteristics change with lead time so that, in the standard setting, increasingly lower frequencies are retained as lead time grows. In contrast to the Zhu et al. (2014) and Ford et al. (2018) approaches, the rate of increase in the SF is fully configurable and governed either by expert knowledge or by user demand.
A more objective method of specifying time-scales exists for larger-scale problems, such as hemispheric or global, by decomposing the spatial and lead-time information into dominant modes of predictability (DelSole and Tippett, 2009a;2009b). Our focus regions are too small for this to be applied, however.
Early on, it was understood that seasonal predictability is concentrated in the Tropics (Stockdale et al., 2018). But extratropical predictability, mainly through influence from stratospheric variability, has recently come into focus as well (Kirtman and Pirani, 2008; Sigmond et al., 2013; Sun and Ahn, 2015). Here we study daily to seasonal predictability in two regions in or near the tropical belt, with climates that are partly arid: the São Francisco basin (SFB; 41° W, 11° S; 617,812 km²) in northeast Brazil and the Karun basin (KB; 49° E, 32° N; 65,230 km²) in southwest Iran. Weather and climate of the SFB are mainly influenced by the South American monsoon system (Silva and Kousky, 2012); the semi-arid climate of the KB, on the other hand, sees influence from Mediterranean cyclones and the Indian summer monsoon (Alijani, 2002; Yadav, 2016). For both regions the rainy season, during which all the reservoirs are being refilled, lasts from October-November to March-April, for which corresponding predictions are therefore most relevant.

METHODS AND DATA
The downscaling of the seasonal predictions follows a perfect-prognosis approach, which belongs to the standard procedures of climate downscaling: Using a set of observed local station data (section 2.4) and observed (i.e. analysed) atmospheric fields (section 2.5), a statistical model is calibrated (section 2.1). To obtain the local seasonal predictions, the model is applied to the same set of fields, but as predicted by an atmosphere-ocean model (section 2.5) (which is assumed to contain a perfect representation of the atmospheric fields). Finally, the seamless filter (section 2.2) is applied to each station record, both as observed and as predicted, and their areal mean values are compared to estimate the prediction skill.

Expanded downscaling
It is well known that large-scale, gridded atmospheric fields should not be compared directly with station data; some form of adjustment, such as downscaling, should be employed instead. Here we use Expanded Downscaling (XDS), which is a regression-based approach that simulates local events as close to and as consistent with the prevailing atmospheric circulation as possible. Unlike in classical regression, the error minimization is done under the constraint of preserving local covariability (of variables and stations). This preservation of covariability renders XDS particularly useful for applications related to hydrological extremes, such as floods and droughts (Bürger, 2002; see also Bronstert et al., 2007). XDS has been thoroughly validated, in numerous climate impact studies as well as for local weather forecasts, and it performs relatively well in comparison to other methods (Bürger et al., 2012). In those studies, for local station data such as temperature and precipitation, a mix of atmospheric upper-level and surface predictor (regressor) fields has proven most appropriate, as detailed in section 2.5. With a focus on extremes, XDS is most appropriate for the daily time-scale. In this study the daily local and atmospheric series described in sections 2.4 and 2.5 are used as predictands and predictors, respectively.

The seamless filter
It is common folklore that daily weather becomes fundamentally unpredictable beyond about 2 weeks lead time, according to the famous result on nonlinear error growth known as the butterfly effect (Lorenz, 1972). But as mentioned in section 1, predictions beyond this limit are nevertheless possible if adequate aggregations in time (or space) are formed. For seamless predictions, therefore, one would like to have this done automatically, that is, by increasingly aggregating data with growing lead time. Zhu et al. (2014) achieve this by categorizing lead times and aggregation levels into a few classes, such as "1d1d" and "2w2w", referring to lead and aggregation times of 1 day and 2 weeks, respectively. Instead of using categories, we have designed a digital filter that does the same continuously for all lead times. This seamless filter (SF) is now described in more detail.
Noting that any aggregation level can be realized by a smoothing (low-pass) filter with a characteristic cut-off period, we define a mapping A that assigns to each lead time d a cut-off period A(d). The mapping parameters are the two maxima of lead time, T_x, and cut-off period, A_x, plus one shape parameter that governs the growth of A with d. The mapping is designed so that A(T_x) = A_x and, for a shape parameter of 1, A(1) = 1. Similar realizations of the mapping are certainly possible; they should only roughly follow the dependence depicted in Figure 1. In general, the SF design requires a balance between forecast skill and forecast detail (amplitude), which translates to balancing expert knowledge and user needs. For example, the filter with (T_x = 180 days, A_x = 180 days, shape parameter 1) is essentially the setting used by Zhu et al. (2014) and Ford et al. (2018). In this study, "standard" parameters of T_x = 180 days (6 months), A_x = 150 days, and a shape parameter of 2.4 are used; what this means concretely will be shown in an example further below.
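A minimal sketch of such a mapping is given below. It assumes one functional form that satisfies the stated constraints and roughly reproduces the cut-off periods quoted in this paper for the standard parameters (e.g. about 52 days at 30 days lead). The functional form, the function name `cutoff_period` and its argument names are assumptions made for illustration; they are not the paper's definition of A.

```python
def cutoff_period(d, t_max=180.0, a_max=150.0, shape=2.4):
    """Hypothetical aggregation law A(d): cut-off period (days) at lead d (days).

    Assumed form (not taken from the paper):
        A(d) = 1 + (a_max - 1) * (1 - (1 - (d - 1) / (t_max - 1)) ** shape)
    It gives A(1) = 1 and A(t_max) = a_max, and is linear for shape = 1.
    """
    if not 1 <= d <= t_max:
        raise ValueError("lead time must lie in [1, t_max]")
    frac = (d - 1.0) / (t_max - 1.0)  # 0 at d = 1, 1 at d = t_max
    return 1.0 + (a_max - 1.0) * (1.0 - (1.0 - frac) ** shape)


if __name__ == "__main__":
    # Print the assumed cut-off period for a few lead times.
    for lead in (15, 30, 45, 90, 180):
        print(lead, round(cutoff_period(lead), 1))
```

With `t_max = a_max = 180` and `shape = 1` the sketch reduces to A(d) = d, the equal lead and averaging setting; larger `shape` values make the cut-off period saturate earlier.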
From this basic aggregation law A, the SF is constructed as follows. Suppose a sequence of forecasts, x(d), is given with growing lead time d, 1 ≤ d ≤ T_x. For each d we form the filtered series x̃_d(t), t = 1, …, T_x, which is obtained from x by applying a conventional low-pass filter with cut-off period A(d). We thus obtain T_x filtered copies of the original series or, taking those as rows, a square (T_x × T_x) matrix; its diagonal, SF(x)(d) = x̃_d(d) (Equation 2), defines the seamless filter SF. Note that by operating on the whole series, Equation 2 cannot be realized as a linear digital filter (as most smoothing filters themselves, e.g. Butterworth, are); it is not even finite, nor is it time invariant. Its computational cost is nevertheless manageable. The desired result of applying the SF to a sequence of forecasts is that with increasing lead time, the forecasts are increasingly smoothed, up to the longest forecast with maximum smoothing at cut-off period A_x. The single low-pass filters are realized by using a Butterworth filter of order 3, run in forward-backward mode (Gustafsson, 1996) to avoid any time shifts (phase lags). The content of Equation 2 is illustrated in the example of Figure 2. As promised, we give an example of a forecast issued on 1 November and valid for 1 December, i.e. with a lead time of 30 days. For that lead, the above SF characteristic (T_x = 180 days, A_x = 150 days, shape parameter 2.4) corresponds to a cut-off period of 52 days.
FIGURE 2 Example of the SF as applied to a 180-day forecast of areal P for Iran, issued on 1 December. Top: Raw and low-pass filtered series with three different cut-off periods, using a zero-phase (non-causal) Butterworth filter. Bottom: The SF (heavy black) of the series. The dots illustrate how SF is constructed from the filtered series above. The SF parameters are: …
This means that the user or expert is interested, for that specific lead, in the rainfall amount of a 52-day period centred about 1 December (i.e. between 4 November and 26 December). This amount is obtained from the 180-day predictions of 1 November as provided by SEAS5/XDS, by applying the SF to each predicted station. The expected skill of the areal average forecast can be read from the figures below (this being the subject of the current study).
FIGURE 3 The study regions of the Karun, Iran, and the São Francisco basin, Brazil, along with their respective precipitation stations
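The construction can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the low-pass step is swapped for a plain centred moving average of width A(d) instead of the order-3 forward-backward Butterworth filter, the aggregation law in `cutoff_period` is a hypothetical form, and all function names are illustrative. For each lead d the series is smoothed with cut-off A(d) and only the smoothed value at lead d is kept.

```python
def cutoff_period(d, t_max, a_max, shape):
    # Hypothetical aggregation law A(d); an assumption, not the paper's formula.
    frac = (d - 1.0) / (t_max - 1.0)
    return 1.0 + (a_max - 1.0) * (1.0 - (1.0 - frac) ** shape)


def centred_mean(x, i, width):
    """Mean of x over a window of about `width` samples centred on index i.

    Stand-in for a zero-phase low-pass filter; the window is truncated
    at both ends of the series.
    """
    half = int(round(width)) // 2
    lo, hi = max(0, i - half), min(len(x), i + half + 1)
    return sum(x[lo:hi]) / (hi - lo)


def seamless_filter(x, a_max=150.0, shape=2.4):
    """Return the SF of one forecast x, where x[d - 1] is the value at lead d."""
    t_max = len(x)
    # For each lead d: smooth with cut-off A(d), then read the value at lead d.
    return [
        centred_mean(x, d - 1, cutoff_period(d, t_max, a_max, shape))
        for d in range(1, t_max + 1)
    ]
```

A closer reproduction would replace `centred_mean` with a zero-phase Butterworth low-pass, for example built from `scipy.signal.butter` and `scipy.signal.sosfiltfilt`.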
The display of SF-based, or seamless, predictability depending on lead time, as presented here, may at first look somewhat surprising, and may violate one's expectations of prediction skill decreasing monotonically with lead time. By looking at a range of time-scales, one is faced with the complicated superposition of different physical processes that have their own predictability. For example, sea-surface temperature in the tropical Pacific has a fairly high predictability on seasonal time-scales due to the El Niño-Southern Oscillation, likely higher than corresponding variability on the daily or weekly scale. Seamless predictability, hence, does not automatically decrease with lead time; it may indeed increase, as our examples show.
One may object that smoothing will inflate the reported skill by using future information that is not available at issue time. This argument does not apply. At issue time, the (SEAS5) forecasts themselves only use information that is available then (this applies strictly only to the operational mode; in reforecast mode, future information is available but not used). The same applies to the downscaling. For the skill evaluation one can use all information from the forecasts, including data from beyond the valid time (e.g. for filtering). While such data are available for the (SEAS5/XDS) predictor at issue time, for the predictand they become available only later, which is when the smoothing filter can be applied (i.e. in hindsight).
FIGURE 4 P climatology (mm day⁻¹) for the focus regions
FIGURE 5 Observed and ERA5/XDS-downscaled areal P for the KB, for the year 1995
Note that our notion of "lead" is centred about the point of interest, extending both into the past and into the future (the reason we used a zero-phase filter). This can be circumvented by formulating a filter that is causal in a reversed sense, so that only future information (relative to valid time) is used, meaning: for the valid time 1 June and a 3-month average, use only the months June, July and August (which would be 3m3m in the terminology of Zhu et al. (2014) for forecasts issued on 1 March). Likewise, by using a causal filter, a "1-month lead" involves only past information (relative to valid time), like forecasting average May rainfall on 1 May, valid 1 June. As the examples show, the question of lead time is basically a semantic one.

Potential predictability
As a validation for the downscaled ensemble spread, the approach of Buizza (1997) to estimate potential predictability (PP) may serve. It rests on the assumption that at any given lead time, the ensemble spread is a perfect representation of the prevailing uncertainty. In other words, the actual observations are indistinguishable from any of the ensemble members. This allows the assessment of PP in the ideal case of a perfect model with perfect initial uncertainty. Conversely, discrepancies between potential and actual predictive skill give a measure of the deviations from those ideal conditions. It is worth noting that while the meteorological ensemble is designed to represent the true atmospheric uncertainty, including initial conditions and physical parametrizations, the local ensemble, as obtained from coupling a downscaling scheme to the atmospheric fields, may well diverge from the true local uncertainty (whose structure is anyway different from the grid-point errors), for example, if the downscaling scheme is mis-configured. To estimate PP, a "perfect ensemble" system is formed from the original ensemble by selecting one member at random as the verifying analysis, cf. Buizza (1997); this provides a representative distribution of the potential skill of the prediction.
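The perfect-ensemble idea can be sketched in a few lines. The sketch below is an illustration under assumptions, not the procedure of Buizza (1997) in detail: one member per year is drawn at random as the pseudo-observation, independently for each year, and it is correlated with the mean of the remaining members; the names `pearson` and `potential_skill` and the data layout are hypothetical.

```python
import random


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


def potential_skill(ensemble, n_draws=200, seed=0):
    """Distribution of perfect-model correlation skill for one lead time.

    ensemble[y][m] is the forecast of member m in year y.
    """
    rng = random.Random(seed)
    skills = []
    for _ in range(n_draws):
        truth, mean_rest = [], []
        for members in ensemble:
            k = rng.randrange(len(members))  # member taken as "observation"
            rest = members[:k] + members[k + 1:]
            truth.append(members[k])
            mean_rest.append(sum(rest) / len(rest))
        skills.append(pearson(truth, mean_rest))
    return skills
```

Repeating the draw many times gives a distribution of potential skill rather than a single value, which can then be compared with the actual skill at the same lead time.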

Focus regions
We use two focus regions, the Karun basin in southwest Iran and the São Francisco basin in Brazil, as shown in Figure 3 along with the weather stations used in this study. As evident from Figure 4, the rainy season lies between October-November and March-April for both regions, with a very pronounced seasonal cycle, making their climate semi-arid. During the wet period reservoirs are refilled, and that is the time for which seasonal predictions are most important. Both basins are near the tropical belt of best seasonal predictability. For the KB, between 1983 and 2015, 43 stations measure daily minimum (T_n) and maximum (T_x) temperature and precipitation (P). For the SFB, between 1981 and 2017, daily mean temperature (T) is also measured, altogether for 65 stations. We include temperature here because XDS simulates P-T covariability, which provides an extra constraint on the local data.

ERA5 and SEAS5 from ECMWF
As fully described by Molteni et al. (2011), the integrated forecasting system (IFS) of the ECMWF is a fully coupled model that predicts the global atmosphere-ocean system up to 180 days in advance. The new ERA5 reanalysis (Hersbach and Dee, 2016) (also based on IFS) serves to provide the currently best approximation of what can be called atmospheric observations. Uncertainty is modelled using a sophisticated set of perturbations, both of initial conditions and parameters, and simulating an ensemble of 25 members. During a hindcast or forecast experiment by any NWP model, no observational input is allowed or available, respectively, to serve as a corrective. The model will therefore tend to evolve from its initial state at zero lead time into its own climate, which will more or less differ from observations. This climate drift becomes visible already after just a few days, and increasingly so with the forecast horizon approaching seasonal. This is also the case for SEAS5, although the drift is already smaller than in the predecessor version SEAS4 (Stockdale et al., 2018). Obviously there is a need to correct for this drift, which was done as described in Appendix S1, cf. Figures S1-S4.
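For orientation, a common and very simple form of drift correction removes the mean forecast-minus-observation difference separately for each lead time, estimated from hindcasts. The sketch below shows only this generic idea; it is an assumption for illustration and may differ from the procedure of Appendix S1, and the function name and data layout are hypothetical.

```python
def drift_correct(hindcasts, observations, forecast):
    """Remove the mean lead-dependent drift from a forecast.

    hindcasts[y][d] and observations[y][d] hold past forecasts and the
    matching observations (or analyses) for year y and lead index d;
    forecast[d] is the new forecast to be corrected.
    """
    n_years = len(hindcasts)
    corrected = []
    for d in range(len(forecast)):
        # Mean forecast-minus-observation difference at this lead.
        bias = sum(hindcasts[y][d] - observations[y][d]
                   for y in range(n_years)) / n_years
        corrected.append(forecast[d] - bias)
    return corrected
```

In a skill assessment the bias would normally be estimated without the year being verified, so that the correction does not use the verifying observation.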
SEAS5 data are available for the years 1981 to the present (or in our case 2018). As predictors for the downscaling (cf. section 2.1) we use upper-level and surface fields from ERA5 and SEAS5: at pressure levels of 700, 850 and 1,000 hPa we use temperature, specific humidity, and the wind vectors, together with precipitation at the surface (that is, gridded ERA5 and SEAS5 precipitation). For the KB we use the area between (40° E, 20° N) and (60° E, 40° N), and for the SFB the sector between (50° W, 35° S) and (25° W, 5° S), all at 1° resolution. The fields are projected onto the dominant principal components, and from the resulting predictor time series the most important ones, 260 for SFB and 124 for KB, are selected using a least angle regression (Hastie et al., 2005) for the areal mean of all local variables. These serve as input to the XDS downscaling.

RESULTS AND DISCUSSION
Downscaling validation is done for the respective areal mean P series. The seamless seasonal predictions are evaluated using the corresponding SF versions. For all predictions (issued 1 January, 1 February, etc.) the skill of each lead time (i.e. a particular day) is accordingly estimated from those years between 1981 and 2018 for which observations and predictions are available for that particular day; for the KB and the SFB this gives sample sizes of 33 and 37, respectively (one value per year). We may assume that the 33 and 37 data pairs are independent (ignoring multi-annual random variations), which results in critical correlation values of 0.40 and 0.38, respectively, at a significance level of 1% (which we use throughout).
Figure 5 shows for the KB and the year 1995 daily areal P as observed and as downscaled by ERA5/XDS. The strong annual cycle is well matched, and even single events are reproduced with high accuracy; the actual scale of events is sometimes overestimated and sometimes underestimated, a consequence of the built-in feature of XDS to preserve local covariability, cf. section 2.1 and section S2 in Appendix S1. For the full validation period (1983-2000), this results in a correlation of 0.72 for daily P, de-seasonalized and normalized using the Normal Quantile Transform (NQT; cf. Appendix S1). The analogous result for downscaled T is shown in Figure S5 in Appendix S1, showing a very high accuracy with a correlation of 0.96.
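The critical correlation values can be approximated with Fisher's z-transform. The sketch below is an illustration under assumptions: it uses a one-sided test and the normal approximation for atanh(r), neither of which is stated in the text, and the function name is hypothetical. Under these assumptions it gives values close to the quoted 0.40 and 0.38.

```python
from math import sqrt, tanh
from statistics import NormalDist


def critical_correlation(n, alpha=0.01):
    """Approximate one-sided critical correlation for n independent pairs.

    Uses Fisher's z-transform: for zero true correlation, atanh(r) is
    roughly normal with standard deviation 1 / sqrt(n - 3).
    """
    if n <= 3:
        raise ValueError("need more than 3 pairs")
    z = NormalDist().inv_cdf(1.0 - alpha)  # upper alpha quantile
    return tanh(z / sqrt(n - 3))


if __name__ == "__main__":
    for n in (33, 37):
        print(n, round(critical_correlation(n), 2))
```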

ERA5/XDS validation
For completeness, a comparison is made between the downscaled and the "raw" gridded P fields from ERA5. As shown in Figure S8 in Appendix S1, compared to the grid-point-based areal mean P, based on the convex hull of the station locations, the downscaled version performs considerably better, as revealed by correlations of 0.53 vs. 0.85, respectively, underpinning the need for downscaling.

SEAS5/XDS local predictions
For the KB, the local authorities issue monthly-to-seasonal predictions (or "outlooks") in rolling monthly releases, starting with the beginning of the water year in October, in the form of monthly ("O"), seasonal ("OND"), or semi-annual ("ONDJFM") forecasts. These roughly correspond to lead times of 15 days (SF = 27 days), 45 days (SF = 74 days), and 90 days (SF = 121 days). We start by showing in Figure 6 the correlation skill obtained for the 180-day area and ensemble mean forecasts issued on the first of October and November. Unsurprisingly, the SF-processed forecasts are essentially all above the corresponding unfiltered daily scores (those fall below the significance level of 0.4 already after a week or so). The filtered October predictions remain skilful (correlation > 0.6) for almost a season, dropping below significance only in January and staying there. The November predictions start with a somewhat higher skill (> 0.8), then drop off to insignificant levels in December, but increase again to fairly large skill values (about 0.7). Moreover, as in all other cases, after about 4 months lead time the skill curves change little. This is a simple consequence of the SF characteristic, for which the smoothing changes little after that time; to be able to distinguish greater lead times, a flatter SF characteristic must be used.
FIGURE 6 Correlation skill for the raw (thin) and seamless (heavy) areal and ensemble mean forecasts issued on 1 October (top) and 1 November (bottom). For each issue and lead time, correlations are estimated from the years 1983-2015, which results in a critical value of 0.4 (see text), displayed by the grey confidence band. The SF characteristic curve is shown as inset
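How such a skill curve is obtained can be sketched as follows: for one issue date, the correlation across years is computed separately for each lead time. This is a minimal sketch with hypothetical names and data layout; it assumes that both the observations and the ensemble-mean forecasts have already been SF-processed and averaged over the basin.

```python
def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


def skill_curve(obs, fc):
    """Correlation skill per lead time for one issue date.

    obs[y][d] and fc[y][d] are the observed and predicted values for
    year y and lead index d; the correlation is taken across years.
    """
    n_leads = len(obs[0])
    return [
        pearson([row[d] for row in obs], [row[d] for row in fc])
        for d in range(n_leads)
    ]
```

Each value of the resulting curve can then be compared with the critical correlation for the available number of years.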
Why are the two predictions valid for December so different? Looking at the SF characteristic, one sees that the October predictions employ almost a 100-days smoothing for December, whereas that filtering is below 50 days for the November predictions. So apparently one can predict the season around 1 December more skilfully from October than (approximately) the month of December from November. This is supported by looking at seasonal predictions (with very strong smoothing, cf. Figure S9 in Appendix S1); they show a much higher skill especially for the November forecasts. Note, however, that due to the strong smoothing the predicted amplitude is much less in that case. The higher skill is evident from the time series shown in Figure 7; it shows normalized observations and ensemble-mean predictions valid for 1 December; due to the different smoothing (95 days vs. 54 days), observations slightly differ between the panels. It is obvious that the longer-lead October predictions are much better than those of November. Especially around the years 1985, 2000 and 2010, differences are noticeable, with large discrepancies in the November predictions.
It turns out that the failure to predict December rainfall from November is actually caused by the downscaling. As shown in Figure S10 in Appendix S1, there is no drop in skill for the (raw) gridded SEAS5 rainfall data, even without drift correction. Although this failure seems to be an exception (see Appendix S1), further analysis is needed to understand that discrepancy.
It is of interest whether and how strongly the skill depends on the fact that the ensemble mean is considered. To that effect, we re-plot the correlation skill for the October ensemble mean predictions, and compare it to the skill of the single members. For comparison, we do the same for the February predictions. In Figure 8 the October predictions show that for short leads of under about a week, the ensemble mean performs like an average member, as is to be expected. After that, however, up until about 2 months, the ensemble mean is better than any single member (except for one short period for one member around late October). The single and ensemble mean skills settle to their final value after about 4 months, which for the mean is not significant (< 0.4) and, probably randomly, within the spread of the single skills. With respect to PP, the initial skill is, as expected, much larger, this gap indicating the error in initial conditions. The November-December high skill of the mean is strikingly different from the fairly low PP skill, pointing again to a mismatch between real and predicted uncertainty. In a way, the February ensemble mean prediction shows the reverse behaviour: starting with very good initial skill (> 0.8), the skill drops early below significance (also for single members) but regains significant values after mid-April and ultimately converges to high values of about 0.6. Unlike the October case, the PP skill is highest at all lead times, which indicates an overall higher consistency between real and predicted uncertainty (i.e. prediction error and ensemble spread). Most interestingly, for long leads the ensemble mean again outperforms all the members, whereas in the shorter range (mid-February to March) the mean has average skill, similar to the long-lead October predictions. It appears, thus, that skilful prediction of the ensemble mean is directly linked to that mean being more skilful than all members.
This may be verified when looking at all months, as in Figure 9. One notices that when the ensemble mean skill is significantly high for a range of leads, as in the February, March, October and November predictions, it is indeed also higher than that of all members. Conversely, if it is low, then it tends to be lower than that of several members, as is evident from the long-lead predictions of June, August and December. For almost all months and lead times, PP skill is at least as high as that of the ensemble mean, with two major exceptions: the shorter-lead October and the longer-lead November predictions (see above), which again points to inconsistent prediction ensembles for those months.
FIGURE 8 Comparison of seamless correlation skill of single members (grey) vs. the ensemble mean (thick black), for the 1 October (top) and 1 February (bottom) predictions. For comparison, the potential predictability skill is also shown (thin black)
Figure 10 shows observed and ERA5/XDS simulated areal P for the SFB. The seasonal cycle is somewhat weaker than for the KB (with its extended dry period), but still considerable, and it is well reproduced by the downscaled ERA5. This includes the main events, which are sometimes overestimated and sometimes underestimated, as for the KB. The respective de-seasonalized, NQT-normalized correlations for the validation period 1981-1999 are: 0.70 (P), 0.78 (T_n), 0.78 (T), 0.79 (T_x). Time series are shown in Appendix S1 (Figures S6 and S7).
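The NQT normalization used for these correlations can be sketched as a rank-based mapping to standard-normal scores. The sketch below is an illustration only: the plotting position (rank - 0.5) / n and the handling of ties (broken by order of appearance) are assumptions, the de-seasonalization step is omitted, and the details in Appendix S1 may differ.

```python
from statistics import NormalDist


def normal_quantile_transform(values):
    """Map values to standard-normal scores according to their rank.

    Assumption: plotting position (rank - 0.5) / n; ties are broken by
    order of appearance.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = std_normal.inv_cdf((rank - 0.5) / n)
    return scores
```

For daily precipitation with many dry days, the treatment of ties matters and would need more care than in this sketch.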

SEAS5/XDS local predictions
Compared to the important October/November skill of the KB, corresponding skill for the SFB (cf. Figure 11) is less satisfactory. As skill is already low for zero lead time, this may indicate a rather bad initialization for those months; note that predictions issued in autumn are by far the worst for all variants (gridded with and without drift-correction, XDS downscaled, cf. Figure S10 in Appendix S1). The medium-term November forecasts are somewhat better, while the long-term forecasts are just above the significance level.
For the other months, skill values are often better, as demonstrated in Figure 12. Predictions are skilful when issued in the months from February to June, almost equally for all leads (except for the February case), in accordance with previous studies (Hastenrath et al., 2009 and references therein).
The mentioned behaviour, that the ensemble mean prediction skill is significantly high if and only if it is higher than that of all members, is violated only for the May and July predictions, whose skill is significant but slightly lower than that of some members. PP skill is higher in practically all significant cases. As an example, we show in Figure 13 a comparison of the predictions valid for 1 November, issued in June (SF = 148 days) and October (SF = 55 days). The higher skill is visible for the longer-lead June forecasts. The medium-term October forecasts are especially bad in the years before 2000.
A figure comparing the ensemble mean skill to that of the single members and to the PP, as in Figure 8, is shown in Appendix S1 (Figure S12).

Discussion
We need to understand the high prediction skill for long leads that is seen for both basins. A good example where skill is similarly high for both basins is the February prediction. Both start quite skilful (> 0.8), then level off in March, and regain high skill (> 0.6) up until the end of the forecast period. Forecasting for June involves a lead time of 4 months or about 120 days, which translates into an SF smoothing of about 140 days, representing a valid time from spring to the summer months. Inspecting climatology, rainfall for that period is to a larger portion determined by spring rainfall. If that is well predicted, the corresponding skill would be extended into the summer months and "smoothed" with their own skill. We have checked prediction skill confined to April alone, by employing a seamless filter with a much weaker, roughly monthly, smoothing, using (A_x = 30, shape parameter 100), and compare it to our standard filter (A_x = 150, shape parameter 2.4) in Figure 14. For the KB, there is indeed monthly skill for March and, marginally, for July. For the SFB, on the other hand, monthly skill is quite strong throughout from February to July. So at least for the SFB, the high skill values that we have seen for the original SF can be explained from the monthly skill already.

FIGURE 9 For KB, all 12 monthly seamless prediction skills for the ensemble mean (thick black) and potential predictability (thin black), along with the range of the single members (grey)
FIGURE 10 Similar to Figure 5, for the SFB

CONCLUSIONS
The seamless filter (SF) is introduced here as a tool to present daily to seasonal forecasts in a concise and continuous way, with lead time and time-scale varying consistently. The rule by which they vary, that is, how the time-scale (smoothing) increases with lead time, needs to be specified beforehand from expert knowledge or user demand. Applicability of the SF, and corresponding skill, was tested using ECMWF's SEAS5 system and 180-day forecasts for two river basins in Iran (KB) and Brazil (SFB). Being station-based, the forecasts, after the required downscaling (here done by XDS), fit easily into existing water management strategies.
FIGURE 11 Like Figure 6, for the SFB
The water year for both basins starts in October-November and ends in March-April the following year. Ideally, one would like to have skilful long-term predictions starting some time in autumn. According to Johnson et al. (2019), fig. 20, both regions lie at the boundary of skilful tropical predictions (DJF, 1-month lead). This is consistent with Figures 6 and 12 here. For the KB, predictability extends further into the water year with increased smoothing up until April. For the SFB, the same is not true since predictability is not clearly above the significance level. Greater predictability is seen starting with the January forecasts, which show higher medium-term skill, and it extends to the full 180-day range (with strong smoothing) for the February to June predictions.
The character of this study is mostly methodological, yet a word of caution is in order: with so many forecast skill outcomes, across multiple locations, lead times, and time-scales, there are enough statistical degrees of freedom for "skilful" predictions to be detected by pure chance. One should avoid overemphasizing them or, failing that, scrutinize them carefully with respect to physical plausibility and internal consistency. This was done only briefly here (e.g. Figures 7 and 13) but should be done elsewhere with greater diligence.
The study can be extended in three obvious directions. (a) By focusing solely on areal mean precipitation, spatial patterns varying within the season and, correspondingly, within the forecast period are ignored. For example, in the SFB (which spans more than 20° of latitude) the rainy season travels from Tres Marias, then to Sobradinho, and finally to Itaparica (personal communication, Francisco Vasconcelos Júnior, FUNCEME); this should be accounted for in a refined assessment. (b) The current study evaluates ensemble mean forecasts, which helped to somewhat reduce the complexity of the problem. The next step is to extend the evaluation to a fully probabilistic setting that assesses local forecasts along with their uncertainty. (c) The SF aggregation is currently formulated as a simple form of averaging (or Butterworth smoothing). When a greater focus on real events is intended, that may not be enough. In that case, one may employ a more refined statistic, such as the likelihood of events per time range, or their expected intensity, or any statistical measure that may fit that time range. With the current set-up this can be implemented easily.
FIGURE 12 Like Figure 9, for the SFB
FIGURE 13 Similar to Figure 7, the June (top) and October (bottom) seamless predictions valid 1 November, for the SFB
FIGURE 14 For both basins, seamless February forecast skills with two different SF characteristics, one with strong (solid) and one with weak smoothing (dashed)