Informativeness of wind data in linear Madden–Julian oscillation prediction

Linear inverse models (LIMs) are used to explore predictability and information content of the Madden–Julian Oscillation (MJO). Hindcast skill for outgoing longwave radiation (OLR) related to the MJO on intraseasonal timescales in the tropics has been examined for a variety of LIMs using OLR and optionally 200 and 850 hPa zonal wind information channels. The dependence of OLR hindcast skill on wind channels was evaluated by randomizing in time, averaging in space, or omitting data entirely. Results show positive prediction skill (relative to climatology) up to 3 weeks and wind information, mostly at the largest scales, adds 1–2 days of skill.


Introduction
The Madden-Julian Oscillation (MJO) is an intraseasonal zonally propagating atmospheric signal in tropical rainfall and related fields (Madden and Julian, 1971;Madden and Julian, 1972;Zhang, 2005;Wang, 2006;and Lau and Waliser, 2011). Besides impacting weather directly in the tropics it also has impacts in the extra-tropics through teleconnections and can impact short-term climate events (Martin and Schumacher, 2011;Zhang, 2013). The long timescale of the MJO suggests it could be a source of extended-range predictability.
Linear statistical models can have comparable MJO prediction skill to dynamical models (Newman et al., 2009;Xavier et al., 2014;Klingaman and Woolnough, 2014) and offer unique opportunities for decomposition that may reflect on the MJO's incompletely understood dynamics. Cavanaugh et al. (2014, hereafter C14) explored the skill of linear inverse models (LIMs) in hindcasting the MJO, and this article extends and complements that work methodologically and scientifically. Klingaman and Woolnough (2014) include LIM results from both C14 and the methods described here, as a baseline for evaluating numerical model hindcasts. Those hindcasts as well as C14 were scored in the time-longitude (latitudinally averaged) space of Wheeler and Hendon (2004)'s Realtime Multivariate MJO index (RMM) encompassing Outgoing Longwave Radiation (OLR) and zonal wind at 850 and 200 hPa (u850 and u200). The relevance of including wind field information in addition to OLR has been questioned, for both MJO definition and diagnosis (Kiladis et al., 2014) and for aspects of forecasting such as initiation of a new MJO event (Straub, 2013), whose final paragraph notes that "the RMM index is dominated by its circulation components. However, the clouds and rain are of special interest for impacts, so we will score all hindcasts in terms of OLR anomaly (OLR').
The goals of this article are: (1) to illustrate the workings of LIMs in a more intuitive physical channel space (time-longitude sections), rather than C14's truncated space of empirical orthogonal functions (EOFs); (2) to explore how close the resulting large number of channels brings us to the problem of statistical overfitting (the usual justification for such EOF truncation approaches); and (3) to estimate the value of wind information and small-scale information in statistical predictions of intraseasonal cloudiness signals, and consider the implications as a potential partial clue to MJO dynamics.

LIM summary and statistical forecasting issues
Linear inverse modeling (Penland and Sardeshmukh, 1995) is a generalization of the simple idea that anomalies in a stationary time series decay with time -exponentially, in the case of a postulated system obeying dx/dt = −BX + noise. In a multi-channel LIM (where a 'channel' is meant in the sense of an information stream, i.e. a time series or a column in a dynamical state vector), anomalies can oscillate and Wind information in Madden-Julian oscillation prediction 363 propagate among channels as they decay, because the complex exponential function has those behaviors in addition to the simple decay of the real exponential function.
In fitting a LIM from data, one postulates that those data came from a linear stochastically forced system with the form: where the state vector X comprises m columns of anomaly values and B is an m × m matrix. All linearly predictable dynamical interactions among the system variables are represented in the linear operator B, also known as the deterministic linear feedback matrix or the system sensitivity matrix (Shin et al., 2010). It can be shown for a system of form (Equation (1)) that if the noise term is white (uncorrelated in time, but not necessarily uncorrelated among channels) and Gaussian, then for any specific time lag 0 , B is related to the time-lagged covariance matrix C( 0 ) by: This result is formally identical to how one would estimate a decay coefficient from lagged autocorrelation in a univariate ODE, but here C and B are matrices and the ln[·] function is the matrix generalization of the ordinary logarithm. The optimum forecast (indicated by the caret) for such a system, optimal in the sense of minimizing squared error, is: where G is known as the propagator matrix that evolves initial anomalies, x(t), forward by any desired lead time ( ). When working from a finite, real-world data sample, we must view the B obtained from Equation (2) as an estimate, and view Equation (1) as a postulate of how the real world (which generated the data) acts. One can estimate B from Equation (2) for various training lags 0 . The similarity of these various estimates for B has been viewed as a test (called the 'tau test') of the validity of the postulate that the form (Equation (1)) characterizes the real system adequately (Penland and Sardeshmukh, 1995). Referring again to the simpler univariate case: if the autocorrelation decay rate estimated at different lags is similar, then indeed the decay curve must be close to exponential, which bolsters the case that the data-generating system acts like the simple linear decay equation being postulated (or fitted). We found that Equation (3) gives similar hindcast skill using B matrices estimated from 0 values ranging from 1 to 4 days in Equation (2) (not shown), supporting the LIM approach. Only a little more skill is gained by using lagged regression (LR) rather than LIM (not shown). In LR, one estimates C separately for each forecast lead time , so that Equation (3) simply becomeŝ LIM results offer similar scientific lessons to LR (not shown), but with more elegance and simplicity, and so will be the main focus of this article.

Data and experiments
This study utilizes time-longitude sections of daily data from 1979 to 2011, including interpolated outgoing long wave radiation (OLR) observed from satellites (Liebmann and Smith, 1996) along with zonal wind u at the 850 and 200 hPa levels derived from the NCEP-NCAR Reanalysis project (Kalnay et al., 1996). Each variable was averaged from 15 ∘ S to 15 ∘ N. A 25-year composite annual cycle  was removed from each variable to produce anomalies, and a 120-day mean prior to each day was subtracted to remove low frequency signals, following Wheeler and Hendon (2004), which means that usable data begins 120 days into the time series. Each channel in the training set thus consists of a time series of more than 7000 daily observations. These high-passed anomalies are denoted with a prime, for example OLR'. These data contain many kinds of variability, but the hindcast skill here mostly bears the hallmarks of the MJO (timescale and eastward propagation), so we have used that moniker in the text and title.
It is helpful to define a LIM baseline or control case: all three variables, in 15 degree longitude bins, using 0 = 2 days. Each of the 24 longitude bins thus contains three channels consisting of a daily time series of anomalous MJO index variables (OLR', u850' and u200'). In summary, the baseline LIM has a total of 72 input channels, 3 for each longitude bin. For clean comparisons to this baseline, including wind information denial experiments, we choose to score the hindcasts based on the twenty-four 15-degree OLR bins only. When other channels (u850', u200') are used, their impact is evaluated only in terms of the OLR prediction skill. Likewise, when additional longitude fine structure is included, we score its effect only on 15 degree scale OLR.
In many LIM studies (including C14), principal component series truncation has been used to minimize channel numbers while maximizing the variance represented. However, this encoding of the information channels makes a LIM's workings somewhat opaque. Following the examples of Shin et al. (2010) and Hakim (2013), we leave our channels as spatial boxes, and furthermore they are ordered by adjacent longitudes, so LIM forecasts and their errors can simply be contoured in longitude-time space.
We train LIMs on data from 1979 to 1999 and verify on the independent set 2000-2011, thus eliminating any chance of artificial skill (DelSole and Shukla, 2009). Seasonal masks are also optionally applied to the training period to seek an optimal LIM construction for verification in that specific season.

Longitude dependence of OLR' predictability
LIM skill can be displayed as a function of longitude ( Figure 1). Figure 1 illustrates the correlation coefficient between predicted OLR anomalies from the baseline LIM and observed OLR anomalies for the 2000-2011 period. Regional differences in skill are evident. The highest correlation at all lead times is found in the region of the maritime continent between 110 ∘ E and 130 ∘ E. Here, the correlation remains above +0.6 for 6 days and remains above +0.5 for 13 days. By contrast, the east Pacific region (longitude 230 in Figure 1) has the lowest one week hindcast correlation skill (r < +0.1). Summarizing Figure 1, three hindcast skill hot-spot peaks are identified within the central Indian Ocean, the maritime continent and central Pacific, with areas of low predictive skill from the east Pacific to the Pacific coast of Central America. These results are consistent with the notion that the MJO is the basis of long-range prediction skill, as presupposed in our title, even though the prediction is really for OLR' including all phenomena.
To have a single scalar skill score, we define the verification error score (to be minimized) in future experiments as a global sum of squared OLR' hindcast errors. The no-skill asymptote of this score is the global climatological variance, which is the skill of a forecast of zero anomaly every day (climatology used as a forecast).

Impacts of using a large number of channels
Is a 72-channel LIM too large? That is, will statistical overfitting of so many coefficients from finite training data samples lead to poor skill when tested on independent data? The skill of our baseline results, comparable to results from C14's reduced EOF space (Klingaman and Woolnough, 2014), suggests that the answer is no. To further address this question, we push the numbers much further by doubling the number of longitude bins from 24 to 48, making each longitude bin 7.5 rather than 15 degrees wide. This doubles the number of channels from 72 to 144, quadrupling the number of coefficients in the G and B matrix estimates. Furthermore, we reduce the training set into two independent and consecutive training periods to estimate the effects of sampling error in our final results graphical space. This experiment reduced the number of data points used per coefficient estimated from 50 to 25. * The change from 15 degree to 7.5 degree longitude resolution has a negligible effect on hindcast error (black curves are only slightly above the red curves in Figure 2), thus providing little evidence of overfitting at 7.5 degree longitude bin resolution. However, overfitting becomes steeply worse once longitude resolution increases to 2.5 degrees (8 data values per coefficient, not shown).

Estimated value of wind information: illustrating randomization method
Excluding u850 and u200 from the LIM provides physical insight regarding the impact of wind anomalies on the prediction of OLR anomalies associated with the MJO (Figure 3). The skill score omitting or randomizing wind channels during training and hindcasting (blue and green curves) is compared to the baseline LIM (red curves repeated from Figure 2). Figure 3 indicates that the wind information in the baseline LIM does indeed contribute unique and valuable information content to the LIM for OLR' prediction out to 15 days, a result clearly significant with respect to sampling noise (the gap between the red curves). Furthermore, we can conclude that omitting vs. randomizing the wind channels has a very similar effect, indicating again that our LIMs are far from the danger of overfitting, with the modest number of channels and large amounts of training and verification data used here.

Decomposition of the value of wind information
We also want to know what aspects of the wind field contain the important information content for predicting OLR: the zonal mean wind, the first two zonal wavenumbers (indicative of large-scale convergence and divergence) or other aspects? To address this question, we construct a 30-channel LIM built from an anomalous state vector with OLR' for each 15 degree longitude bin (24 channels), the zonal mean from u850' and u200' (two channels), and the first two wavenumbers of u850' and u200' (four channels). We apply the channel randomization technique introduced in the previous section to test the impact of various wind channels on the prediction of OLR'.
To partition the wind channels' information content, Figure 4 shows results from four 30-channel LIMs: (1) OLR data only (randomized zonal mean and the first two wavenumbers of u850 and u200), (2) OLR and the first two wavenumbers (randomized zonal means), (3) OLR and the zonal means (randomized wavenumbers 1 and 2), and (4) OLR with all six wind components. For reference, the 'baseline' LIM with all wind information at 7.5 degree scale is repeated in red, with two lines trained from independent halves of the training epoch as an indicator of the statistical significance of difference due to finite-sample effects.
The OLR' only LIM exhibits the worst skill (greatest error, magenta curve in Figure 4 up to 2 weeks lead time) while the 'baseline' is the best (red curves). The cases in between essentially map the amount of useful information content in various aspects of u850' and u200'. Results may be summarized as follows: . Squared error for five LIMs calculated using 15 degree longitude bins and trained on the following data channels. Listed in order from highest error to lowest error at a 10-day hindcast lead time are (1) OLR only (magenta), (2) OLR and wave numbers 1 and 2 for u850 and u200 (green), (3) OLR and zonal mean wind (black), (4) OLR, zonal mean winds, and wave numbers 1 and 2 (blue), and (5) the baseline LIM with all wind information (double red curves, for two non-overlapping half-length training periods). The vertical dashed grey line provides a visual reference at the 10-day lead time. Zoomed in insert provided as well.
• Value of all wind information ('baseline' vs. OLR only): 1.5-2 days of additional skill between 5 and 10 day hindcast lead time (red curves on Figure 4; as in Figure  The contributions of 'about 1/2' and 'about 1/3' need not sum to unity, because the channels are not orthogonal, merely linearly independent. In particular, wavenumbers 1 + 2 may contain sample-specific noise as well as robustly useful signal, and overfitting of that noise in the training period could yield lower skill (merely 1/3 instead of 1/2 of the total value of all wind information) in the evaluation period.

Seasonality of predictability
The MJO is seasonally strongest in the northern autumn and winter seasons, with summer intraseasonal variability sometimes given a different name such as MISO or BSISO (Kikuchi et al., 2012;Sharmila et al., 2013). Our findings so far suggest that our dataset is plentiful enough to give robust results even if subdivided. Might OLR' hindcast skill be increased if the training set and verification are confined to certain seasons, rather than pooling all data? To test this notion, the baseline LIM from Figure 2 was subdivided into an all season and boreal winter (DJF) datasets for both training and evaluation. OLR' hindcast skill is much better for DJF training and scoring, compared to the 72 channel all-season LIM (black vs red in Figure 5). If a normalized squared error of 0.85 is used as a no-skill threshold, that is achieved after 7 days for the all season LIM and 9 days for the DJF LIM. Alternatively, if differences are measured vertically on the graph, for a 1-week lead time prediction, the control LIM has a 7% increase in OLR anomaly hindcast error compared to the DJF trained LIM from a 1-week hindcast lead time. All squared errors asymptote toward 1 after a 14-day hindcast lead time. In summary, it is preferable to train the LIM on less data, but on the proper season (DJF), rather than a using a larger set of training data including data from all seasons.

Conclusions
Hindcast skill for 15 ∘ N-15 ∘ S averaged OLR' in time-longitude space has been examined for a variety of LIM models using daily OLR data and optionally 200 and 850 hPa zonal wind from 1979 to 2011. Results show some positive prediction skill (relative to climatology) up to 3 weeks, consistent with Cavanaugh et al. (2014) who used a LIM built with a reduced channel space of EOFs from maps (not just latitude belt averages). Klingaman and Woolnough (2014) shows that the present approach, with many more channels but simpler spatial interpretation, is just as skillful, or even more so for the Year of Tropical Convection intercomparison case presented there.
The dependence of OLR' anomaly prediction skill on information in other channels of the dynamical state vector was evaluated. LIM predictive skill is robust to the number of input channels up to 144, giving similar skill whether excess channels are omitted or randomized in their time ordering. Using 15-degree longitude bins for the 3-variable LIM results in a 72 × 72 lagged covariance matrix consisting of 5184 coefficients fitted to two training periods between 1979 and 1999 (3652 days × 72 channels), or about 50 data values per coefficient. Even with double the number of channels (7.5 degree bins; 1/2 as many data-per-coefficient), no skill loss (evidence of overfitting) was evident. Skill loss and possible overfitting is finally evident with 2.5 degree longitude bin channels or in this case at about 8 data values per coefficient.
Wind data (u850' and u200') adds 1.5-2 days of additional skill. Wavenumbers 1 and 2 contribute less than that in isolation (perhaps because they also contribute 'noise' distractions: sample-dependent patterns that are not repeatable in the verification period). All higher wavenumbers contribute negligibly (the blue line is statistically indistinguishable from the red lines in Figure 4).
In general, winds and OLR are correlated predictors, so their information content is mixed and they are not cleanly separable despite the labels which sound like they are two independent physical quantities. The results here do not necessarily shed light on fundamental MJO dynamics. Combined-variable indices always have debatable relative normalizations for the different variables (Liu et al., 2016). From this study's point of view, predicting anomalous clouds and rain (OLR), the wind information may help the system avoid being misinterpreted by happenstance occurrences of MJO-shaped equatorial cloud patterns that are not actually part of a predictable wave in the real physical memory variables (inertia or water vapor or perhaps SST). Predictability can be limited as much by the strength of distractions and noise as it is by the dynamics of the predictable subsystem, so results about predictability may be results about such noise, not about dynamics. LIM OLR' hindcast prediction error is better during the DJF season and in Indo-Pacific longitudes, both indicative of the region and season where the MJO contributes the most to OLR anomalies and where tropical OLR variance is greatest.