Prediction of the ENSO and EQUINOO indices during June–September using a deep learning method

The Equatorial Indian Ocean Oscillation (EQUINOO) and El Niño Southern Oscillation (ENSO) are important climatic oscillations over the Indian and Pacific oceans that influence the inter‐annual variation of the Indian monsoon. The literature has focused mainly on the study of these indices, including their mutual relationship and their influence on various climatic phenomena. Here, an attempt is made instead to predict the indices for different temporal periods. Although ENSO prediction is well established through many statistical and numerical models, prediction of the EQUINOO index has received little attention. A deep‐learning method using an autoencoder is proposed for the prediction of the EQUINOO and ENSO. The autoencoder performs feature learning, and the learned features are ranked using linear and nonlinear correlation measures. This identifies a set of potential predictors, which are used to predict the indices with an ensemble of regression trees and a decision forest model. Predictors identified by nonlinear correlation are observed to predict with better accuracy than those identified by linear correlation. The predicted indices show high correlation with the observed values. The EQUINOO is predicted at a long lead of 7 months with a correlation co‐efficient of 0.88 (p < 0.001), and the ENSO at a lead of 1 month with a correlation co‐efficient of 0.87 (p < 0.001) between the observed and predicted indices. Moreover, the proposed method predicts the sign (positive or negative) of the index values correctly in most test years. The ENSO prediction by the proposed approach is comparable with that of existing models.


| INTRODUCTION
The inter-annual variation of the Indian monsoon is influenced by variations in the Equatorial Indian Ocean Oscillation (EQUINOO) and El Niño Southern Oscillation (ENSO) (Gadgil et al., 2004). The Indian monsoon is an important climatic phenomenon: the southwest monsoon season, which spans June-September, accounts for > 75% of the annual rainfall. A departure of 10% in rainfall from its long-period average can lead to monsoon extremes. The study of these two influencing climatic indices is therefore extremely important for an agriculture-based country such as India.
The ENSO refers to the irregular, recurring variation in atmospheric winds and sea-surface temperature over the tropical eastern Pacific Ocean. The Southern Oscillation corresponds to the atmospheric component of the ENSO, and El Niño corresponds to the change in sea-surface temperature. The associations between El Niño (the warm phase of the ENSO) and drought, and between La Niña (the cool phase of the ENSO) and excess precipitation in India, are well established in the literature (Rasmusson and Carpenter, 1983).
The EQUINOO index is the oscillation in convection between the western equatorial Indian Ocean (WEIO; 50°E-70°E, 10°S-10°N) and the eastern equatorial Indian Ocean (EEIO; 90°E-110°E, 10°S-0°). The negative phase of the index, with enhanced convection over the EEIO, is associated with westerly anomalies of the equatorial zonal wind, and the positive phase is associated with easterly anomalies of the zonal wind at the equator. The EQUINOO plays a significant role in the variability of the Indian monsoon (Gadgil et al., 2004). Webster et al. (1999) have shown that the variability over the Indian Ocean is characterized by an internal mode of its oceanic system, despite the presence of external phenomena such as the ENSO. Ihara et al. (2007) have shown that a combined index of the EQUINOO and ENSO explains the variability of the Indian monsoon better than either index alone. Vinayachandran et al. (2009) have described the connection of the EQUINOO with the climate of the Indian Ocean and its surroundings. Francis and Gadgil (2013) introduced a new index for the EQUINOO based on outgoing longwave radiation, named EQUINOLR.
The mutual relationships among the ENSO, EQUINOO and Indian monsoon have been widely explored. Gadgil et al. (2004) have shown a strong association between monsoon extremes and a composite index of the EQUINOO and ENSO. Ihara et al. (2007) found a negative association between the EQUINOO and the Indian monsoon in the presence of the warm phase of the ENSO, but no strong association for the cool phase. Pokhrel et al. (2012) have studied the periodicity and coherence between the Indian Ocean dipole and the ENSO. A consolidated index of the EQUINOO and ENSO explains 54% of the variance of the Indian monsoon, which shows that the skill of monsoon prediction depends critically on how well the indices themselves can be predicted (Surendran et al., 2015). Several numerical models (including coupled models) have also been used to simulate the links between the two indices and the Indian monsoon. Vishnu et al. (2019) have shown that most of the EQUINOO events simulated by the National Centers for Environmental Prediction (NCEP) Climate Forecast System version 2 (CFSv2) are associated with the ENSO. This produces a high correlation between the EQUINOO and ENSO in the simulation (in contrast to the poor correlation between the two in observations), so the influence of the EQUINOO on the Indian monsoon is overshadowed by co-occurring ENSO events. Nanjundiah et al. (2013) have shown that state-of-the-art coupled models fail to simulate the observed links between the EQUINOO and the Indian monsoon. These studies underline the need for more advanced methods to predict the ENSO and EQUINOO indices. Predicted indices that are closer to the observed values can further assist in better prediction of the variability of the Indian monsoon.
In this paper, a deep-learning method is proposed for the prediction of the ENSO and EQUINOO. Data-driven machine-learning and deep-learning methods have shown good promise in addressing problems in climate science, owing to the availability of huge amounts of climatic data (Liu et al., 2015; Saha et al., 2016). They have been found to be efficient in predicting the Indian monsoon, both in aggregate and for regional parts (Saha et al., 2016, 2017). To date, studies on the EQUINOO have mainly focused on its characterization and its relationship to other climatic phenomena (Gadgil et al., 2004; Pokhrel et al., 2012). The approach proposed in the present paper attempts to predict the EQUINOO. An autoencoder model (Baldi, 2012) is used for unsupervised feature learning from climatic variables and for identifying potential predictors of the indices. Ensemble prediction models are then used to predict the ENSO and EQUINOO. The indices are predicted for the summer monsoon period of June-September and for each individual month. Accurate prediction of the ENSO and EQUINOO is important for better anticipation of the related climatic phenomena.
The paper is structured as follows. Section 2 elaborates the indices and the input variables. Section 3 explains the proposed data-driven deep-learning approach for the prediction of the indices. Section 4 elaborates the prediction results of the ENSO and EQUINOO. Lastly, Section 5 concludes.

| THE ENSO, EQUINOO AND INPUT CLIMATIC VARIABLES
The paper focuses on two important climatic indices: the ENSO, represented by the El Niño 3.4 index (henceforth the "NINO index") over the Pacific Ocean, and the EQUINOO, calculated using outgoing longwave radiation (EQUINOLRCI) over the Indian Ocean. The NINO index is an indicator of La Niña and El Niño events, that is, the cooling and warming phases of the east-central tropical Pacific Ocean (5°N-5°S, 170°W-120°W). Francis and Gadgil (2013) defined an index (EQUINOLR) representative of the EQUINOO based on outgoing longwave radiation (OLR), and showed a correlation co-efficient of 0.77 between the EQWIN (the EQUINOO index defined by Gadgil et al., 2004) and the EQUINOLR. A similar index is defined in Francis et al. (2007) for the analysis of daily variation in cloudiness over the eastern and western Indian Ocean in the context of the triggering of positive Indian Ocean dipole events. The EQUINOLRCI, which measures cloudiness, is calculated from daily OLR: at each grid point the OLR is subtracted from 200 W m⁻², and only the resulting positive values are summed, which gives a measure of high convective activity. The daily values are then summed to obtain monthly values. This is done separately for the western and eastern regions of the equatorial Indian Ocean, and the index is evaluated as the normalized difference between the two regions (Equation 1):

$$
\mathrm{OLRCI}^{R}_{y,m} = \sum_{d \in m} \sum_{g \in R} \max\left(0,\; 200 - \mathrm{OLR}_{d,g}\right), \qquad R \in \{\mathrm{WEIO}, \mathrm{EEIO}\}
$$

$$
\mathrm{EQUINOLRCI}_{y,m} = \frac{D_{y,m} - \mathrm{mn}(D_{m})}{\mathrm{std}(D_{m})}, \qquad D_{y,m} = \mathrm{OLRCI}^{\mathrm{WEIO}}_{y,m} - \mathrm{OLRCI}^{\mathrm{EEIO}}_{y,m} \qquad (1)
$$

where y and m denote the year and month; d and g denote the day and grid point; and mn(X) and std(X) signify the mean and standard deviation of X. Different climatic variables are considered, either individually or in combination, in order to study their relationship with the NINO and EQUINOLRCI. Seven climatic variables are considered: air temperature (AT), geopotential height at 200 hPa (HGT), sea level pressure (SLP), u-wind at the surface (UWND), v-wind at the surface (VWND), sea-surface temperature (SST) and OLR.
All these variables are considered because of their importance and influence on the indices. Seven combinations of variables are also considered: AT and HGT, AT and VWND, HGT and VWND, UWND and VWND, SLP and SST, SLP and UWND, and SLP and VWND. These are also used to identify important predictors for forecasting the NINO and EQUINOLRCI. The climatic variables AT, HGT, UWND, SLP and VWND are taken from the NCEP reanalysis (2.5° × 2.5°) (Kalnay et al., 1996). SST data are obtained from the National Oceanic and Atmospheric Administration (NOAA) Extended Reconstructed Sea Surface Temperature (ERSST) V3 (2.0° × 2.0°) (Smith et al., 2007). These variables are considered for the period 1958-2016. The OLR data (2.5° × 2.5°) are obtained from the NOAA interpolated OLR (Liebmann and Smith, 1996) for 1982-2013 (the last year available) and from the uninterpolated OLR (NOAA/OAR/ESRL PSD) for 2014-2016. The input climatic variables cover the entire globe and are post-processed before fitting the autoencoder (see Section 3.1). All the climatic variables are used at the monthly scale and converted to monthly anomalies (Equation 2):

$$
\mathrm{anomaly}_{y,m} = \text{climatic variable}_{y,m} - \mathrm{mn}\left(\text{climatic variable}_{m}\right) \qquad (2)
$$

where $\text{climatic variable}_{y,m}$ is the variable in the m-th month of the y-th year, and $\mathrm{mn}(\text{climatic variable}_{m})$ is the average of the m-th month over the years.
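The monthly anomaly of Equation 2 can be sketched in a few lines of NumPy. The sketch assumes the data are already arranged as an array of shape (years, 12, ...) so that the mean over axis 0 is the per-month climatology; that layout and the function name are illustrative choices, not taken from the paper.

```python
import numpy as np

def monthly_anomaly(field):
    """Equation 2: subtract each calendar month's mean over the years.

    field: array of shape (years, 12, ...) holding a monthly climatic variable;
    any trailing axes (e.g. lat, lon) are handled element-wise.
    """
    field = np.asarray(field, dtype=float)
    climatology = field.mean(axis=0, keepdims=True)  # mn(climatic variable_m), one per month
    return field - climatology

# Two years of a toy series: year 0 holds 0..11 and year 1 holds 12..23,
# so every anomaly is -6 in year 0 and +6 in year 1.
anom = monthly_anomaly(np.arange(24).reshape(2, 12))
```

Keeping the month as its own axis makes the climatology a single `mean` call and lets the same function work on gridded (years, 12, lat, lon) fields.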
The NINO index is obtained at the monthly scale from NOAA/PSD for the period 1958-2016. The EQUINOLRCI index is calculated from the daily uninterpolated OLR data (NOAA/OAR/ESRL PSD) using Equation 1.
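The EQUINOLRCI calculation described above (OLR subtracted from 200 W m⁻², positive values summed over grid points and days, then a normalized difference between the two regions) can be sketched in NumPy as follows. It assumes the daily OLR has already been cut to each region as a (days, lat, lon) array, that the normalization uses the mean and standard deviation over the years for one calendar month, and that the difference is taken west minus east so that a positive value means more deep convection over the WEIO; these choices and the function names are assumptions for illustration.

```python
import numpy as np

OLR_THRESHOLD = 200.0  # W m^-2; OLR below this is treated as deep convective cloud

def cloudiness_index(daily_olr):
    """Monthly cloudiness index for one region and one month.

    daily_olr: array of shape (days, lat, lon) with daily OLR over the region.
    Each grid value is subtracted from the threshold, only positive excesses are kept,
    and the result is summed over the grid points and over the days of the month.
    """
    excess = np.clip(OLR_THRESHOLD - np.asarray(daily_olr, dtype=float), 0.0, None)
    return float(excess.sum())

def equinolrci(ci_west, ci_east):
    """Standardized west-minus-east index for one calendar month across years.

    ci_west, ci_east: arrays of shape (years,) holding the monthly cloudiness index
    of the WEIO and EEIO. Mean and SD are taken over the years (assumption).
    """
    diff = np.asarray(ci_west, dtype=float) - np.asarray(ci_east, dtype=float)
    return (diff - diff.mean()) / diff.std()

# One day on a 2 x 2 grid: only the 190 and 150 W m^-2 points count (10 + 50 = 60).
example_ci = cloudiness_index([[[190.0, 210.0], [150.0, 250.0]]])
```

In use, `cloudiness_index` would be called once per region, month and year, and the two resulting yearly series passed to `equinolrci`.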

| METHODOLOGY
The proposed approach to the prediction of the indices comprises two main steps: first, the identification of potential predictors using a deep-learning-based autoencoder model and, second, the prediction of the indices from the identified predictors with an ensemble prediction model. The block diagram is shown in Figure 1. The methodology is similar to the approach of Saha et al. (2016), but with significant differences and improvements. The work of Saha et al. was designed for the study and prediction of the Indian monsoon, whereas the current study predicts climatic indices over the Indian and Pacific oceans. The current work also examines both linear and nonlinear relationships between the identified predictors and the indices, whereas Saha et al. examined only the linear relationship. Studying the nonlinear relationship helps to capture dependence between the potential predictors and the climatic indices that a linear measure may miss. The evaluation is also more exhaustive, using measures such as the confusion matrix, precision, recall and F1 score. The proposed methodology is now briefly described, highlighting these additional aspects.

| Feature learning by autoencoder
Feature learning is an unsupervised step that uses no prior knowledge of the output indices. A deep-learning-based autoencoder model is used to identify potential predictors. A single-layer autoencoder is an artificial neural network with a single hidden (internal) layer in addition to the input and output layers. The encoder maps the input to the hidden layer, which learns the nonlinearity of the data, and the decoder maps the hidden layer to the output layer, reconstructing the input. Because the target at the output layer is the input itself, the hidden layer learns a compact representation of the complex, nonlinear structure of the data.
The inputs to the autoencoder models are the individual climatic variables and their combinations (described in Section 2). Each layer consists of several nodes, all of which are connected to the nodes of the subsequent layer. The ratio of input to hidden nodes is 100:20 for autoencoders designed for individual variables and 100:10 for those designed for combinations. Each input node corresponds to the variable averaged over a box of 10° latitude × 20° longitude. Thus, the number of input nodes is 324 ((180/10) × (360/20)) for individual variables and 648 (324 + 324) for combined variables, as every combination consists of two individual variables.
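The input-layer arithmetic above can be checked with a small NumPy sketch that averages a global field into 10° × 20° boxes. It assumes a regular 2.5° grid of cell centres with 72 × 144 points that tiles the globe exactly; this is an assumption, since the NCEP grid has 73 latitudes including the poles and would need trimming or regridding first. The variable names are hypothetical.

```python
import numpy as np

def box_average(field, res=2.5, lat_step=10.0, lon_step=20.0):
    """Average a global (lat, lon) field into lat_step x lon_step boxes and flatten it."""
    nlat, nlon = field.shape
    fy, fx = int(lat_step / res), int(lon_step / res)  # 4 x 8 cells per box at 2.5 degrees
    if nlat % fy or nlon % fx:
        raise ValueError("grid does not tile evenly into boxes")
    boxes = field.reshape(nlat // fy, fy, nlon // fx, fx).mean(axis=(1, 3))
    return boxes.ravel()  # one autoencoder input node per box

rng = np.random.default_rng(0)
var_a = rng.normal(size=(72, 144))  # hypothetical anomaly map of one variable
var_b = rng.normal(size=(72, 144))  # hypothetical anomaly map of a second variable

single = box_average(var_a)                              # 18 x 18 = 324 input nodes
combined = np.concatenate([single, box_average(var_b)])  # 324 + 324 = 648 input nodes
n_hidden = round(single.size * 20 / 100)                 # 324 * 0.2 = 64.8 -> 65 hidden nodes
```

The same 100:20 ratio gives the 65 hidden nodes stated in the next paragraph, and applying the 100:10 ratio to the 648 combined inputs also rounds to 65.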
Separate autoencoders are trained for each input (seven for the individual variables and seven for the combinations). The autoencoder for each individual climatic variable (except SST) has 324 nodes in the input layer, 65 nodes in the hidden layer and 324 nodes in the output layer. Feature learning is performed by training on the input variables, with the model minimizing the reconstruction error between its input and its output. A nonlinear hyperbolic tangent activation function is used from the input to the hidden layer, and a linear function from the hidden to the output layer. The weights and biases are learned during training.
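Formally, let $\mathrm{node} \in \mathbb{R}^n$ represent the input (n nodes in the input layer). The activation of the j-th neuron in the hidden layer, $\mathrm{hidden}_j$, is given by Equation 3, where j = 1, …, m (m is the number of neurons in the hidden layer):

$$
\mathrm{hidden}_j = \mathrm{func}\left(\sum_{i=1}^{n} \mathrm{Weight}^{\mathrm{hid}}_{j,i}\,\mathrm{node}_i + \mathrm{bias}^{\mathrm{hid}}_j\right), \qquad \mathrm{func}(z) = \frac{e^{2z}-1}{e^{2z}+1} \qquad (3)
$$

where $\mathrm{func}(z)$ is the hyperbolic tangent function; $\mathrm{hidden} \in \mathbb{R}^m$ is the vector of learned hidden-layer nodes; the weight matrix $\mathrm{Weight}^{\mathrm{hid}}$ has dimensions m × n (from the input to the hidden layer); and $\mathrm{bias}^{\mathrm{hid}} \in \mathbb{R}^m$ denotes the bias of the hidden layer.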

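The activation function from the hidden to the output layer is given by Equation 4:

$$
\widehat{\mathrm{node}}_i = \mathrm{gunc}\left(\sum_{j=1}^{m} \mathrm{Weight}^{\mathrm{out}}_{i,j}\,\mathrm{hidden}_j + \mathrm{bias}^{\mathrm{out}}_i\right), \qquad \mathrm{gunc}(z) = a z + b \qquad (4)
$$

where $\mathrm{gunc}(z)$ denotes the linear activation function; a and b are constants; $\widehat{\mathrm{node}} \in \mathbb{R}^n$ is the output layer (the reconstruction of the input); $\mathrm{Weight}^{\mathrm{out}}$ is the n × m weight matrix from the hidden to the output layer; and $\mathrm{bias}^{\mathrm{out}} \in \mathbb{R}^n$ denotes the bias of the output layer.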
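To make Equations 3 and 4 concrete, here is a minimal NumPy sketch of a single-hidden-layer autoencoder with a tanh encoder and a linear decoder, trained by full-batch gradient descent on the reconstruction error. The optimizer, learning rate, initialization and the choice a = 1, b = 0 in the linear output are assumptions for illustration; the text does not state how the model was trained.

```python
import numpy as np

def reconstruct(X, W_hid, b_hid, W_out, b_out):
    H = np.tanh(X @ W_hid.T + b_hid)   # Equation 3: tanh hidden layer
    return H, H @ W_out.T + b_out      # Equation 4 with gunc(z) = z (a = 1, b = 0)

def train_autoencoder(X, n_hidden, lr=0.05, epochs=500, seed=0):
    """Fit a tanh-encoder / linear-decoder autoencoder to X of shape (samples, n_inputs).

    Minimizes the per-sample sum of squared reconstruction errors, averaged over samples.
    Returns (W_hid, b_hid, W_out, b_out); W_hid has shape (n_hidden, n_inputs).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_hid, b_hid = rng.normal(0.0, 0.1, (n_hidden, d)), np.zeros(n_hidden)
    W_out, b_out = rng.normal(0.0, 0.1, (d, n_hidden)), np.zeros(d)
    for _ in range(epochs):
        H, X_hat = reconstruct(X, W_hid, b_hid, W_out, b_out)
        g_out = 2.0 * (X_hat - X) / n             # gradient of the loss w.r.t. X_hat
        g_hid = (g_out @ W_out) * (1.0 - H ** 2)  # back-propagate through tanh
        W_out -= lr * (g_out.T @ H)
        b_out -= lr * g_out.sum(axis=0)
        W_hid -= lr * (g_hid.T @ X)
        b_hid -= lr * g_hid.sum(axis=0)
    return W_hid, b_hid, W_out, b_out

def reconstruction_error(X, params):
    _, X_hat = reconstruct(X, *params)
    return float(((X_hat - X) ** 2).sum(axis=1).mean())

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 20))  # toy stand-in for 60 months of 20 box-averaged anomalies
before = reconstruction_error(X, train_autoencoder(X, n_hidden=5, epochs=0))
after = reconstruction_error(X, train_autoencoder(X, n_hidden=5))
```

Only the encoder weights `W_hid` are needed downstream, since the predictors are built from the thresholded input-to-hidden edges; on real 324-node inputs the learning rate would likely need tuning.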
The inputs are the individual climatic variables (e.g. AT, SLP, SST) and their combinations (e.g. SLP + SST, AT + HGT). The features learned in the hidden layer represent the potential predictors of the indices.
The final step in identifying potential predictors is to apply a threshold to the learned weights on the edges between the input and hidden nodes. Only weights more than 2 SD above the mean of all the weights are considered when evaluating the features of the hidden nodes. The threshold is set such that at least 10% of the input nodes participate in the calculation of the feature at each hidden node. Each of these hidden nodes then represents a potential predictor.
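The thresholding rule and the predictor calculation of Equation 5 can be sketched in NumPy under one possible reading of the text: each hidden node starts from the global cut-off (mean + 2 SD of all encoder weights), and if fewer than 10% of its incoming weights exceed that cut-off, its threshold is lowered just enough to keep its strongest 10%. How the paper reconciles the two criteria is not stated, so this combination, the weighted-sum form of the predictor and the function names are assumptions.

```python
import numpy as np

def node_thresholds(W_hid, n_sd=2.0, min_frac=0.10):
    """Per-hidden-node thresholds on encoder weights (hypothetical reading of the rule).

    W_hid: (n_hidden, n_inputs) encoder weights. The global cut-off is mean + n_sd * SD
    of all weights; a node whose k-th strongest edge falls below it gets a lower threshold
    equal to that k-th largest weight, so at least k = ceil(min_frac * n_inputs) edges stay.
    """
    cutoff = W_hid.mean() + n_sd * W_hid.std()
    k = max(1, int(np.ceil(min_frac * W_hid.shape[1])))
    kth_largest = np.sort(W_hid, axis=1)[:, -k]  # k-th strongest incoming weight per node
    return np.minimum(cutoff, kth_largest)

def potential_predictors(X, W_hid, thresholds):
    """Weighted sum of the inputs on edges at or above each node's threshold.

    X: (samples, n_inputs), e.g. one row of box-averaged anomalies per month.
    Returns (samples, n_hidden): one potential predictor time series per hidden node.
    """
    mask = W_hid >= thresholds[:, None]  # edges kept for each hidden node
    return X @ (W_hid * mask).T

# Toy node with one dominant edge: only input 0 (value 2.0, weight 1.0) survives.
W_toy = np.array([[1.0, 0.0, 0.0, 0.0]])
pred_toy = potential_predictors(np.array([[2.0, 3.0, 4.0, 5.0]]), W_toy, node_thresholds(W_toy))
```

Here the global cut-off for the toy node is about 1.12, above its single strong weight, so the 10% rule lowers the threshold to 1.0 and keeps exactly that edge.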
The predictors are calculated from the features learned at the hidden nodes after the weights have been thresholded (Equation 5):

$$
\mathrm{predictor}_i = \sum_{k\,:\,\mathrm{Weight}_{k,i} > \mathrm{threshold}_i} \mathrm{Weight}_{k,i}\,\mathrm{Input}_k \qquad (5)
$$

where $\mathrm{predictor}_i$ is the potential predictor evaluated from the feature learned at the i-th hidden node; $\mathrm{Weight}_{k,i}$ is the weight learned on the edge between the k-th input node and the i-th hidden node; $\mathrm{Input}_k$ is the k-th input node; and $\mathrm{threshold}_i$ is the threshold set for the i-th hidden node.

| Feature ranking and correlation study
Feature ranking is performed using both linear and nonlinear correlation measures between the predictors and the indices. As feature learning exploits the nonlinearity within the data to identify potential predictors, it is worthwhile to study the nonlinear relationship between the indices and the predictors. Correlations are computed at leads of 1-12 months, and for each predictor the month with the highest correlation co-efficient with the index is selected. For the index averaged over June-September, a lead of 1 month means that the predictor in May is correlated with the average index (which starts in June).
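The lead-month selection can be sketched as a loop over leads of 1-12 months that keeps the lead with the strongest correlation. The sketch uses the Pearson correlation for brevity (any of the four measures could be substituted), ranks by absolute correlation so that strongly negative predictors also count, and pairs leads that reach into the previous calendar year with the previous year's row; these choices and the function name are assumptions.

```python
import numpy as np

def best_lead(predictor, index, target_month=6, max_lead=12):
    """Pick the lead (months) at which a predictor best correlates with an index.

    predictor: (years, 12) monthly values, columns Jan..Dec.
    index: (years,) target index; target_month is the first month of the target
    period (1-12, June = 6). Returns (lead, correlation) with the largest |r|.
    """
    best_lead_months, best_r = None, 0.0
    for lead in range(1, max_lead + 1):
        month = target_month - lead  # 1-based calendar month of the predictor
        if month >= 1:
            x, y = predictor[:, month - 1], index
        else:  # the month falls in the previous year: pair year t-1 with year t
            x, y = predictor[:-1, month + 11], index[1:]
        r = np.corrcoef(x, y)[0, 1]
        if abs(r) > abs(best_r):
            best_lead_months, best_r = lead, float(r)
    return best_lead_months, best_r

rng = np.random.default_rng(0)
toy_predictor = rng.normal(size=(40, 12))
toy_index = 2.0 * toy_predictor[:, 4] + 1.0  # driven by May, i.e. lead 1 for June
lead, r = best_lead(toy_predictor, toy_index)
```

With a June target, lead 7 maps to November of the previous year, matching the convention used later for the EQUINOLRCI forecasts.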
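One linear and three nonlinear correlation measures are considered:
1. Pearson correlation (μ), to study the linear relationship between the predictors and the indices:

$$
\mu = \frac{\sum_{\mathrm{year}=1}^{\mathrm{num}} \left(\mathrm{var1}_{\mathrm{year},\mathrm{month}} - \overline{\mathrm{var1}}_{\mathrm{month}}\right)\left(\mathrm{var2}_{\mathrm{year},\mathrm{month}} - \overline{\mathrm{var2}}_{\mathrm{month}}\right)}{\sqrt{\sum_{\mathrm{year}=1}^{\mathrm{num}} \left(\mathrm{var1}_{\mathrm{year},\mathrm{month}} - \overline{\mathrm{var1}}_{\mathrm{month}}\right)^2}\,\sqrt{\sum_{\mathrm{year}=1}^{\mathrm{num}} \left(\mathrm{var2}_{\mathrm{year},\mathrm{month}} - \overline{\mathrm{var2}}_{\mathrm{month}}\right)^2}} \qquad (6)
$$

where $\mathrm{var1}_{\mathrm{year},\mathrm{month}}$ and $\mathrm{var2}_{\mathrm{year},\mathrm{month}}$ are the index (NINO or EQUINOLRCI) and the potential predictor for the month-th month of the year-th year; $\overline{\mathrm{var1}}_{\mathrm{month}}$ and $\overline{\mathrm{var2}}_{\mathrm{month}}$ are the means for the month-th month; and num is the total number of years.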
2. Kendall correlation (τ), a nonlinear, rank-based measure of the strength of the dependence between two variables:

$$
\tau = \frac{\mathrm{num}_{\mathrm{concordant}} - \mathrm{num}_{\mathrm{discordant}}}{n(n-1)/2} \qquad (7)
$$

where $\mathrm{num}_{\mathrm{concordant}}$ and $\mathrm{num}_{\mathrm{discordant}}$ denote the numbers of concordant and discordant pairs of samples. Let $(v_1, w_1), (v_2, w_2), \ldots, (v_n, w_n)$ be the observations of var1 and var2, where all values of $v_i$ and $w_i$ are unique. A pair of observations $(v_i, w_i)$ and $(v_j, w_j)$, with $i \neq j$, is concordant if the ranks of both elements agree, that is, if both $v_i > v_j$ and $w_i > w_j$, or both $v_i < v_j$ and $w_i < w_j$; otherwise, the pair is discordant.

3. Mutual information (MI), which quantifies the information that one variable provides about the other:

$$
\mathrm{MI}(\mathrm{var1};\mathrm{var2}) = \sum_{v1}\sum_{v2} p(v1,v2)\,\log\frac{p(v1,v2)}{p(v1)\,p(v2)} \qquad (8)
$$

where $p(v1, v2)$ denotes the joint probability function, and $p(v1)$ and $p(v2)$ represent the marginal probability distributions of var1 and var2.

4. Spearman's rank correlation (ρ), which quantifies the statistical dependence between the rankings of two variables. For a sample of size n, the n raw scores of var1 and var2 are first converted to ranks, rankvar1 and rankvar2; that is, each $\mathrm{var1}_i$ and $\mathrm{var2}_i$ is replaced by $\mathrm{rankvar1}_i$ and $\mathrm{rankvar2}_i$, respectively, for i = 1, 2, …, n. Then

$$
\rho = \frac{\mathrm{cov}(\mathrm{rankvar1}, \mathrm{rankvar2})}{\sigma_{\mathrm{rankvar1}}\,\sigma_{\mathrm{rankvar2}}} \qquad (9)
$$

where cov(rankvar1, rankvar2) denotes the covariance of the ranked variables, and $\sigma_{\mathrm{rankvar1}}$ and $\sigma_{\mathrm{rankvar2}}$ are their standard deviations.
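The four measures can be written directly from their definitions in plain NumPy, which makes the differences between them easy to see on a toy example. The sketch assumes no ties (as the text does) and estimates the probabilities in the mutual information from a 2-D histogram with five equal-width bins; the binning is an assumption, and library routines (for example SciPy's `kendalltau` and `spearmanr`) would normally be used instead.

```python
import numpy as np

def pearson(a, b):
    """Equation 6: linear (Pearson) correlation."""
    return float(np.corrcoef(a, b)[0, 1])

def ranks(a):
    """Ranks 1..n; assumes no ties, as the text assumes unique values."""
    r = np.empty(len(a))
    r[np.argsort(a)] = np.arange(1, len(a) + 1)
    return r

def spearman(a, b):
    """Equation 9: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

def kendall(a, b):
    """Equation 7: (concordant - discordant) / (n(n-1)/2), assuming no ties."""
    n, s = len(a), 0.0
    for i in range(n):
        for j in range(i + 1, n):
            s += np.sign(a[i] - a[j]) * np.sign(b[i] - b[j])  # +1 concordant, -1 discordant
    return s / (n * (n - 1) / 2)

def mutual_information(a, b, bins=5):
    """Equation 8 (in nats) with probabilities from a 2-D histogram (binning is an assumption)."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p = joint / joint.sum()
    outer = p.sum(axis=1, keepdims=True) @ p.sum(axis=0, keepdims=True)  # p(v1) p(v2)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / outer[nz])).sum())

x = np.arange(10.0)
y = x ** 3  # monotonic but nonlinear relationship
```

For this monotonic but curved pair, Spearman and Kendall both equal 1 while the Pearson correlation stays noticeably below 1, which is the kind of dependence the rank-based measures are meant to pick up.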

| Building predictor sets
Predictors are identified from the individual variables and their combinations using the proposed deep-learning method (Sections 3.1 and 3.2). Four predictor sets, predSet1, predSet2, predSet3 and predSet4, consisting of the top 5, 10, 15 and 20 correlated predictors at their best lead month (the lead month with the highest correlation with the index), are built. The prediction can be issued only when all predictors in the set are available, that is, at the shortest lead of any predictor in the set (e.g. if three predictors have best lead months of January, March and April, the prediction can be issued in April).
For each index and each correlation measure, 56 predictor sets were built (four predictor sets for each of the 14 types of climatic variables, i.e. 4 × 14), giving 224 predictor sets across the four correlation measures.
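The construction of the predictor sets and of the issue month can be sketched as follows. The input format (a list of predictor names with their best lead in months, already sorted by correlation strength) and the function name are hypothetical; the issue month is the target month minus the shortest lead in the set, wrapped around the calendar.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def build_predictor_sets(ranked, target_month=6, sizes=(5, 10, 15, 20)):
    """Build predSet1..predSet4 from predictors ranked by correlation (best first).

    ranked: list of (predictor_name, best_lead_in_months) tuples.
    Returns {set name: (predictor names, issue lead, issue month)}; the forecast is
    issued when the latest predictor is available, i.e. at the shortest lead in the set.
    """
    sets = {}
    for k, size in enumerate(sizes, start=1):
        chosen = ranked[:size]
        lead = min(lead_months for _, lead_months in chosen)
        issue_month = MONTHS[(target_month - lead - 1) % 12]
        sets[f"predSet{k}"] = ([name for name, _ in chosen], lead, issue_month)
    return sets

# The example from the text: predictors in January, March and April for a June target.
example = build_predictor_sets([("p1", 5), ("p2", 3), ("p3", 2)], sizes=(3,))

# 4 predictor sets x 14 variable types x 4 correlation measures.
n_sets = 4 * 14 * 4
```

The modulo wrap means a 7-month lead for a June target resolves to November of the previous year.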

| Statistical prediction models
Two machine-learning-based ensemble models are used: an ensemble of regression trees built with a bagging technique (RegTreeEns) and a random decision forest model (DecForest). The inputs to the models are the predictor sets, and the outputs are the NINO or EQUINOLRCI indices. The RegTreeEns model combines the predictions of multiple regression tree models (Loh, 2008). These regression trees are trained using a bagging algorithm, which draws a different training sample with replacement for each tree; this reduces over-fitting of the model. The final prediction of the RegTreeEns model is a weighted sum of the predictions of the individual regression trees, where the weights are assigned according to each tree's performance on the validation set. The DecForest model is based on an ensemble of decision trees for the regression problem (Liaw and Wiener, 2002), with each tree of the forest built as a regression tree. The features used in building each tree are selected randomly from the predictor set at the decision splits. The prediction provided by the DecForest model is the weighted average of the predictions of the individual trees.
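The RegTreeEns idea (regression trees trained on bootstrap samples and combined by a validation-weighted sum) can be sketched in plain NumPy. The tree depth, the number of trees and the inverse validation-MSE weighting are assumptions, since the text only says the weights follow validation performance. The sketch does not cover DecForest, which would additionally draw a random subset of predictors at each split. In practice a library implementation, for example scikit-learn's `BaggingRegressor` or `RandomForestRegressor`, would normally be used.

```python
import numpy as np

def fit_tree(X, y, depth=3, min_leaf=2):
    """Greedy regression tree minimizing squared error; leaves are floats, splits are tuples."""
    if depth == 0 or len(y) < 2 * min_leaf:
        return float(y.mean())
    best = None
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f])[:-1]:
            left = X[:, f] <= t
            if left.sum() < min_leaf or (~left).sum() < min_leaf:
                continue
            sse = ((y[left] - y[left].mean()) ** 2).sum() + ((y[~left] - y[~left].mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, f, t)
    if best is None:
        return float(y.mean())
    _, f, t = best
    left = X[:, f] <= t
    return (f, t, fit_tree(X[left], y[left], depth - 1, min_leaf),
            fit_tree(X[~left], y[~left], depth - 1, min_leaf))

def predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_bagged_trees(X, y, X_val, y_val, n_trees=25, seed=0):
    """Bagging with validation-based weights (1 / validation MSE, normalized; an assumption)."""
    rng = np.random.default_rng(seed)
    trees, weights = [], []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), len(y))  # bootstrap sample, drawn with replacement
        tree = fit_tree(X[idx], y[idx])
        val_pred = np.array([predict_tree(tree, x) for x in X_val])
        trees.append(tree)
        weights.append(1.0 / (np.mean((val_pred - y_val) ** 2) + 1e-12))
    return trees, np.array(weights) / np.sum(weights)

def predict_ensemble(trees, w, X):
    preds = np.array([[predict_tree(t, x) for x in X] for t in trees])  # (n_trees, samples)
    return w @ preds  # weighted sum of the individual tree predictions

rng = np.random.default_rng(0)
X = rng.uniform(size=(90, 2))
y = np.where(X[:, 0] > 0.5, 1.0, -1.0) + 0.1 * rng.normal(size=90)  # toy index driven by predictor 0
trees, w = fit_bagged_trees(X[:50], y[:50], X[50:70], y[50:70])
pred = predict_ensemble(trees, w, X[70:])
y_test = y[70:]
```

Splitting the toy data into training, validation and test years mirrors the role of the validation set in weighting the trees.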

| RESULTS AND ANALYSIS
This section evaluates the proposed approach for the prediction of the NINO and EQUINOLRCI indices. Predictions are presented for the aggregate period of June-September and for June, July, August and September individually. A total of 14 climatic variables (individual and combined) and four types of correlation (Pearson, Kendall, mutual information and Spearman) are considered, and four predictor sets are built for each case. Thus, for each index and time span (the aggregate period or an individual month), there are 224 sets of prediction results (14 climatic variables × 4 correlation types × 4 predictor sets). Predictions are made with all the sets, but only the most accurate prediction for each index is presented.

| Pearson correlation
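The Pearson correlation is the primary evaluation measure and is used to report the correlation between the predicted and observed values of the indices (Equation 10). The closer the correlation is to 1, the closer the predicted values are to the observed index:

$$
r = \frac{\sum_{\mathrm{year}=1}^{k}\left(\mathrm{predicted}_{\mathrm{year}} - \overline{\mathrm{predicted}}\right)\left(\mathrm{observed}_{\mathrm{year}} - \overline{\mathrm{observed}}\right)}{\sqrt{\sum_{\mathrm{year}=1}^{k}\left(\mathrm{predicted}_{\mathrm{year}} - \overline{\mathrm{predicted}}\right)^2}\,\sqrt{\sum_{\mathrm{year}=1}^{k}\left(\mathrm{observed}_{\mathrm{year}} - \overline{\mathrm{observed}}\right)^2}} \qquad (10)
$$

where $\mathrm{predicted}_{\mathrm{year}}$ and $\mathrm{observed}_{\mathrm{year}}$ are the predicted and observed indices of the year-th test year; $\overline{\mathrm{predicted}}$ and $\overline{\mathrm{observed}}$ are the corresponding means; and k is the total number of test years.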

| Confusion matrix
A confusion matrix (Table 1) compares the predicted sign of the index with the observed sign. A positive index is referred to as the positive class, and a negative index as the negative class. True positive (TrP) denotes the number of test years in which both the predicted and observed indices are positive; true negative (TrN) denotes the number of years in which both are negative; false positive (FaP) denotes the number of years in which the predicted index is positive but the observed index is negative; and false negative (FaN) denotes the number of years in which the observed index is positive but the predicted index is negative. The other measures are defined as follows:
• Sensitivity: ratio of years correctly predicted as positive to the total number of observed positive years, TrP/(TrP + FaN).
• Specificity: ratio of years correctly predicted as negative to the total number of observed negative years, TrN/(TrN + FaP).
• Precision: ratio of correctly predicted positive years to the total number of predicted positive years, TrP/(TrP + FaP).
• Negative predictive value: ratio of correctly predicted negative years to the total number of predicted negative years, TrN/(TrN + FaN).
• Accuracy: ratio of years in which the predicted class matches the observed class, (TrP + TrN)/(TrP + TrN + FaP + FaN).
• F1 score: harmonic mean of precision and sensitivity, (2 × TrP)/(2 × TrP + FaP + FaN).
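The measures listed above can be computed from the signs of the predicted and observed indices with a short Python function. Treating an index of exactly zero as negative, returning NaN for an undefined ratio, and the function name are assumptions made for illustration.

```python
def sign_metrics(predicted, observed):
    """Confusion-matrix measures for the sign (phase) of an index.

    A value > 0 is the positive class; zero or below is treated as negative (assumption).
    Ratios with a zero denominator are returned as NaN.
    """
    pairs = list(zip(predicted, observed))
    trp = sum(1 for p, o in pairs if p > 0 and o > 0)
    trn = sum(1 for p, o in pairs if p <= 0 and o <= 0)
    fap = sum(1 for p, o in pairs if p > 0 and o <= 0)
    fan = sum(1 for p, o in pairs if p <= 0 and o > 0)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(trp, trp + fan),
        "specificity": ratio(trn, trn + fap),
        "precision": ratio(trp, trp + fap),
        "negative_predictive_value": ratio(trn, trn + fan),
        "accuracy": ratio(trp + trn, len(pairs)),
        "f1": ratio(2 * trp, 2 * trp + fap + fan),
    }

m = sign_metrics([1, 1, -1, -1, 1], [1, -1, -1, 1, 1])
```

For a 16-year test period, 13 correctly predicted signs give an accuracy of 13/16 = 0.8125, the kind of value reported as 81.2% in the results.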

| Predicted and observed NINO
The NINO index for June-September is predicted most accurately in May by predSet4 using the DecForest model with VWND predictors. In this case, the linear (Pearson) correlation works better than the nonlinear correlations. A correlation co-efficient of 0.87 (p < 0.001, statistically highly significant) is found between the observed and predicted NINO index. Figure 2 shows the variation of the observed and predicted NINO index during the test period. Most of the test years lie close to the 45° line, which indicates a good prediction; however, the model fails to capture the high NINO value of 2015. Figure 3a-d shows the observed and predicted NINO for the individual months. The NINO for June is predicted in May by predSet3 of SST-derived features, ranked with mutual information (nonlinear) correlation, using a RegTreeEns model. For all test years, the predicted sign (positive or negative class) matches the observed sign. The correlation co-efficient between the observed and predicted NINO for June is 0.88 (p < 0.001). The NINO for July is best predicted with the combined variables SLP + VWND and linear correlation, at a 3-month lead in April. A high correlation co-efficient of 0.88 (p < 0.001) is obtained using predSet2 with a RegTreeEns model. The predicted index is close to the observed values for all test years, except for the peak in 2015. The NINO for August is predicted in April with predictors derived by the autoencoder from the combined variables SLP + VWND. The predictor set with the top 15 correlated predictors (predSet3) predicts the index with the highest accuracy, and Spearman correlation works better than the other measures for the August NINO. The RegTreeEns prediction of the NINO index has a correlation co-efficient of 0.84 (p < 0.001) with the observed index. For 2010 and 2015, however, the model does not capture the observed index variability, which indicates that the model needs further improvement to capture extreme values more accurately. The combined variables AT + VWND predict the NINO for September at a lead of 5 months (April) with better accuracy (correlation co-efficient = 0.88; p < 0.001) than the other variables. Spearman correlation is used for predictor ranking, and predSet3 predicts the September index using the DecForest ensemble prediction model.

| Evaluation measures for the predicted NINO
The confusion matrices for the NINO prediction for June-September, June, July, August and September are presented in Table 2. The confusion matrix shows the correctly predicted positive and negative NINO classes, which helps to visualize how well the model identifies the phase of the index. All the positive and negative indices for June are predicted correctly. For July, all the negative indices are predicted with the correct sign, and a single positive index is predicted incorrectly. The August NINO has a slightly lower accuracy than the other months, with the sign predicted incorrectly in 3 of the 16 test years. Finally, the September NINO is predicted with the correct sign in 14 of the 16 test years. Table 3 presents the measures defined in Section 4.1. The accuracy of predicting the positive or negative NINO class for June-September is 81.2%, with a high precision of 0.87 and an F1 score of 0.82, indicating good performance of the prediction model. The accuracies of predicting the positive or negative class are 100, 93.7, 81.2 and 87.5% for June, July, August and September, respectively. The June NINO is predicted with the highest accuracy, with an F1 score of 1. The F1 scores for July, August and September are 0.94, 0.80 and 0.87, respectively. These results support the skill of the deep-learning approach.

| Prediction of the EQUINOLRCI index
The proposed models are trained on the period 1982-2000 (as OLR data are available only from 1982) and predict the EQUINOLRCI index for 2001-2016.

| Predicted and observed EQUINOLRCI
The EQUINOLRCI index for June-September is predicted with a correlation co-efficient of 0.88 (p < 0.001, statistically significant) using predictors derived from UWND. This prediction is provided at a lead of 7 months, in November of the previous year. The top 10 correlated predictors (predSet2), ranked by Pearson correlation, predict the EQUINOLRCI index using the RegTreeEns model. The predicted EQUINOLRCI index for June-September is compared with the observed index in Figure 4. The variation of the observed and predicted EQUINOLRCI for the individual monsoon months during 2001-2016 is presented in Figure 5a-d. The EQUINOLRCI for June is predicted most accurately in May using AT + VWND, with the nonlinear Kendall correlation used for predictor ranking and the prediction given by predSet3 using the DecForest model; a high correlation co-efficient of 0.94 (p < 0.001) is obtained. The combined variables SLP + VWND (predSet4 with Kendall correlation and RegTreeEns) predict the July EQUINOLRCI in January with a correlation co-efficient of 0.86 (p < 0.001). The August index is predicted in May using predictors derived from AT + VWND and ranked by Spearman correlation; predSet2 predicts the August EQUINOLRCI using DecForest with a correlation co-efficient of 0.91 (p < 0.001). The September EQUINOLRCI is predicted in October of the previous year by predSet1 with Spearman correlation; the combined variables AT + VWND predict it with a correlation co-efficient of 0.88 (p < 0.001). All results are statistically significant.

FIGURE 4 Variation in the observed and predicted EQUINOLRCI index for June-September during the test period 2001-2016

| Evaluation measures for the predicted EQUINOLRCI
The confusion matrix for the EQUINOLRCI prediction, in terms of correctly predicted positive or negative indices, is presented in Table 4. All the positive and negative EQUINOLRCI values for June-September are predicted with the correct sign. Of the 16 test years, 15, 14, 15 and 14 years are correctly predicted as positive or negative for June, July, August and September, respectively. Other measures are presented in Table 5.
The prediction achieves an accuracy of 100% in classifying the positive or negative index classes for the aggregate monsoon period. The accuracies of predicting the positive or negative class are 93.7, 87.5, 93.7 and 87.5% for June, July, August and September, respectively, and the corresponding F1 scores are 0.92, 0.88, 0.94 and 0.90.

| Comparisons of the NINO prediction by the proposed and existing models
The NINO prediction by the proposed approach is compared with existing numerical and statistical models. A total of 13 different NINO prediction models, namely NASA-GMAO, JMA, LDEO, AUS/POAMA, ECMWF, KMA-SNU, CPC-MRKOV, CDC-LIM, CPC-CA, CPC-CCA, CSU-CLIPR and UBC-NNET, are considered. The NINO predictions by these models are obtained from the International Research Institute for Climate and Society (Earth Institute, Columbia University). The NINO predictions during 2002-2016 (the period for which predictions by the existing models are available) are compared for two periods: June-August and July-September. The proposed deep-learning model predicts the NINO for June-August and July-September in May, with correlation co-efficients of 0.91 and 0.87 with the observed values. The mean absolute errors of the NINO predictions by the proposed deep-learning approach and the existing models for June-August and July-September are shown in Figure 6a,b, respectively. The deep-learning technique shows lower mean absolute errors for both the June-August and July-September predictions than many of the other models. The proposed approach performs best for the June-August NINO, whereas the JMA and ECMWF models are more accurate for the July-September NINO.

FIGURE 5 Variation in the observed and predicted EQUINOLRCI index for the individual months of (a) June, (b) July, (c) August, and (d) September during the test period 2001-2016
The variations of the observed NINO and the NINO predicted by the deep-learning approach and the existing models for June-August and July-September are shown in Figure 7a,b, respectively. The deep-learning predictions follow the phase of the observed NINO indices for both periods, although the magnitudes differ to some extent.

| Prediction of the NINO and EQUINOLRCI, 2017-2018
The NINO index for June-September 2017 is predicted as

| CONCLUSIONS
Predictors of the El Niño Southern Oscillation (ENSO), represented by the El Niño 3.4 (NINO) index, and of the Equatorial Indian Ocean Oscillation (EQUINOO), calculated using outgoing longwave radiation (EQUINOLRCI), are identified using a deep-learning method. These predictors are used to forecast the indices with statistical models. Predictors identified by nonlinear correlation ranking are observed to be superior to those identified by linear correlation. The NINO and EQUINOLRCI indices for June-September are predicted with high correlation co-efficients of 0.87 and 0.88 (p < 0.001) in May and in November of the previous year, respectively. For the individual months, the NINO prediction skill is similar for June, July and September, with a correlation co-efficient of 0.88 for each, whereas the skill is lower for August, with a correlation co-efficient of 0.84 between the observed and predicted indices. The EQUINOLRCI is predicted with higher skill for June (correlation of 0.94) and August (0.91), while July and September show lower performance, with correlations of 0.86 and 0.84, respectively. The NINO predictions for June-August and July-September are also comparable with those of existing numerical and statistical models.
Future work includes the use of other deep-learning models, such as convolutional neural networks and generative adversarial networks, to identify more complex influencing predictors and improve the prediction. Another direction is to use a long short-term memory (LSTM) network to study the NINO and EQUINOLRCI indices at finer temporal scales and to predict them from their past time-series values.