Comprehensive assessment of 12 soft computing approaches for modelling reference evapotranspiration in humid locations

Owing to the lack of lysimetric data in many regions, the standard Penman–Monteith equation adopted by the Food and Agriculture Organization of the United Nations (the FAO56-PM model) is usually used to calculate reference evapotranspiration (ETo). However, as this model needs many meteorological parameters that cannot be easily obtained in many regions, simpler models along with soft computing models are used to obtain ETo values. This paper presents a comprehensive comparison of 12 soft computing models: gene expression programming (GEP), neuro-fuzzy with grid partitioning (NF-GP), neuro-fuzzy with sub-clustering (NF-SC), multivariate adaptive regression spline (MARS), boosted regression tree (BT), random forest (RF), model tree (MT), support vector machine (SVM), SVM-firefly algorithm (SVM-FA), extreme learning machine (ELM), neural network-particle swarm optimization (NN-PSO) and neural network-differential evolution (NN-DE), to estimate daily ETo values in humid regions. Daily meteorological data from two weather stations (over a 12-year period) were used to assess the models. The results revealed that very good efficiency was obtained from all the applied methods. The temperature-based SVM-FA (root mean square error (RMSE) = 0.324 mm, Nash–Sutcliffe coefficient (NS) = 0.960) and NF-GP (RMSE = 0.272 mm, NS = 0.974) models generally provided the best accuracy when estimating the ETo of Sari and Babolsar, two humid stations in northern Iran, respectively. The accuracy ranks of the other models (from best to worst) were found to be: NN-PSO, NF-SC, ELM, NN-DE, MARS, GEP, RF, SVM, BT and MT. Among the radiation-based models, the NF-GP provided the best accuracy at estimating the ETo of both stations. The other models were ranked as: ELM, SVM-FA, NN-DE, NN-PSO, MARS, RF, BT, NF-SC, SVM and MT, respectively.


| INTRODUCTION
The reference evapotranspiration (ETo), as introduced by Allen et al. (1998), denotes the evapotranspiration from a hypothetical reference crop with a height of 12 cm, a surface resistance of 70 s·m⁻¹ and an albedo of 0.23. ETo estimation is needed for the computation of irrigation water requirements, water resources management, hydrology, environmental studies and determination of the water budget, especially under arid conditions where water resources are scarce and fresh water is a limited resource. Different meteorological parameters affect the ETo, including air temperature, humidity, radiation and wind speed. Access to these data is very difficult in most cases, especially in developing regions. Moreover, measuring the ETo directly through field observations is a difficult and time-consuming process, and in some regions it is not even possible owing to financial constraints. Therefore, many attempts have been made to estimate the ETo by indirect methods.
In this respect, some researchers have derived mathematical/regression-based relationships between the ETo and meteorological parameters. Some of these relationships have found universal application, for example, the Hargreaves (Hargreaves and Samani, 1985), Priestley–Taylor (Priestley and Taylor, 1972), Makkink (Makkink, 1957) and Turc (Turc, 1961) models. Under specified climatic conditions, these empirical models can be applied to calculate the ETo using different meteorological parameters. A crucial drawback of such models is the necessity of a local calibration against lysimetric or standard ETo values in different regions.
Other methods, developed on a purely physical basis, are called combination models, among which the standard Food and Agriculture Organization of the United Nations (FAO) FAO56-PM model (Allen et al., 1998) has been approved for use in different climatic conditions without the need for local calibration. The model has been commonly used by agronomists, irrigation engineers and scientists of other disciplines as the standard ETo calculation model (e.g. Cai et al., 2007; Shiri, 2017). However, this model has its own shortcomings owing to its need for many meteorological parameters, which are generally not easy to measure.
Besides mathematical and physically based methods, a promising indirect alternative for estimating the ETo is the use of soft computing approaches. Accordingly, various soft computing approaches have been applied by researchers worldwide to estimate the ETo using meteorological parameters. Among others, artificial neural networks (ANN), neuro-fuzzy systems (NF), gene expression programming (GEP), support vector machines (SVM) and model trees (MT) can be mentioned as the most applied techniques in this context (Guven et al., 2008; Guven and Kisi, 2011; Shiri et al., 2014; Karimi et al., 2017; Kiafar et al., 2017; Landeras et al., 2018; Mattar, 2018).
As a brief literature review, Kim and Kim (2008) combined generalized regression neural networks (GRNN) and genetic algorithms to model the alfalfa evapotranspiration in 14 meteorological stations in the Republic of Korea. Wang et al. (2017a) used three types of soft computing techniques (fuzzy genetic (FG), ANFIS with grid partitioning (ANFIS-GP) and M5Tree) for evaporation modelling. Wang et al. (2017b) compared six heuristic computing methods in order to model pan evaporation in China and found the multilayer perceptron ANN to be the superior model in all climatic contexts. Wang et al. (2017c) compared the FG, least square support vector regression (LSSVR), M5Tree, MARS and multivariate linear regression (MLR) models to analyse climatic factors in the pan evaporation modelling process and introduced the LSSVR and FG models as superior to the rest. Gavili et al. (2018) applied the ANN, ANFIS and GEP to model ETo in arid and semi-arid climates. Sanikhani et al. (2018) investigated six artificial intelligence models (multi-layer perceptron neural network (MLPNN), GRNN, radial-basis neural network (RBNN), ANFIS-GP, ANFIS-SC and GEP) to model ETo in arid areas. Keshtegar et al. (2018) used the subset ANFIS, M5 MT and ANN to estimate the daily ETo. In another study, Rezaie-Balf et al. (2019) employed two hybrid models using soft computing techniques (SVM and MT) to forecast monthly pan evaporation.
The overall findings of these reports indicate that the use of soft computing techniques in comparison with other indirect methods for modelling the evaporation process is very promising, and further studies incorporating these techniques are recommended. In this regard, the current study makes a comprehensive comparison between different data-driven approaches, namely, the GEP, NF with grid partitioning (NF-GP) and sub-clustering (NF-SC) identification methods, neural networks-extreme learning machine (NN-ELM), neural networks-particle swarm optimization (NN-PSO), neural networks-differential evolution (NN-DE), SVM, SVM-firefly algorithm (SVM-FA), multivariate adaptive regression spline (MARS), random forest (RF), MT and boosted regression tree (BT), using different kinds of learning algorithm to estimate the daily ETo in humid locations. To the best of the authors' knowledge, this is the first study to make comprehensive comparisons between heuristic models for ETo estimation in Iran.

| Used data
Meteorological inputs, namely air temperature (T), relative humidity (RH), solar radiation (RS) and wind speed (WS), for a period of 12 years (2001-2012) from two humid stations in northern Iran (Babolsar [latitude 36°43′N, longitude 52°39′E, elevation −21 m with respect to mean sea level] and Sari [latitude 36°33′N, longitude 53°00′E, elevation 23 m above mean sea level]) were used to construct the applied models. For each station, a k-fold testing technique was used for data partitioning: at each stage, the patterns corresponding to one year were considered as test data, while the data from the other 11 years were used for training. The process was repeated until all available patterns had been involved in both the training and testing stages. Table 1 summarizes the statistical characteristics of the daily data set, including the mean (Xmean), maximum (Xmax), minimum (Xmin), standard deviation (SD), coefficient of variation (CV) and skewness coefficient (CSX). At both stations the wind speed has a high variation (in terms of the CV in Table 1), while the RH values show minimal variation. The CV values of the FAO56-PM ETo in Sari are relatively greater than those of wind speed.
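The leave-one-year-out k-fold scheme described above can be sketched as follows (an illustrative sketch only; the function name `year_folds` and the index-based interface are assumptions, not taken from the paper):

```python
def year_folds(years):
    """Leave-one-year-out folds: each calendar year serves once as the
    test set, while the remaining years form the training set."""
    for test_year in sorted(set(years)):
        train = [i for i, y in enumerate(years) if y != test_year]
        test = [i for i, y in enumerate(years) if y == test_year]
        yield test_year, train, test
```

Applied to daily records spanning 2001-2012, this yields the 12 test stages used here per station and input combination.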
The ETo values calculated here by the FAO56-PM model were considered as benchmark values for calibrating/assessing the other applied models. As advised by the literature, unless lysimetric data exist, the application of the FAO56-PM model as a benchmark is common in such studies (Allen et al., 1998):

ETo = [0.408 Δ (Rn − G) + γ (900 / (Tmean + 273)) WS (eS − ea)] / [Δ + γ (1 + 0.34 WS)]

where ETo is the reference evapotranspiration (mm·day⁻¹); Δ is the slope of the saturation vapour pressure function (kPa·°C⁻¹); γ is the psychrometric constant (kPa·°C⁻¹); Rn is the net radiation (MJ·m⁻²·day⁻¹); G is the soil heat flux density (MJ·m⁻²·day⁻¹); Tmean is the mean air temperature (°C); WS is the average 24 hr wind speed at 2 m height (m·s⁻¹); eS is the saturation vapour pressure (kPa); and ea is the actual vapour pressure (kPa).
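As a sketch, the FAO56-PM equation above can be evaluated directly. The function below is illustrative (its name and the fixed sea-level value of the psychrometric constant are assumptions; Δ is computed from the standard FAO56 saturation vapour pressure relation):

```python
import math

def fao56_pm_eto(t_mean, rn, g, ws, es, ea, gamma=0.0665):
    """Daily FAO56-PM reference evapotranspiration (mm/day).

    t_mean : mean air temperature (deg C)
    rn, g  : net radiation and soil heat flux density (MJ m-2 day-1)
    ws     : average wind speed at 2 m height (m/s)
    es, ea : saturation and actual vapour pressure (kPa)
    gamma  : psychrometric constant (kPa/deg C), ~0.0665 at sea level
    """
    # Slope of the saturation vapour pressure curve (kPa/deg C)
    e0 = 0.6108 * math.exp(17.27 * t_mean / (t_mean + 237.3))
    delta = 4098.0 * e0 / (t_mean + 237.3) ** 2
    num = 0.408 * delta * (rn - g) + gamma * (900.0 / (t_mean + 273.0)) * ws * (es - ea)
    return num / (delta + gamma * (1.0 + 0.34 * ws))
```

For humid-climate inputs the vapour pressure deficit (eS − ea) is small, which is one reason temperature-only surrogates can approach the benchmark there.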
The data were introduced to the applied models through the following two input combinations:
- Temperature based: Tmax, Tmin, Tmean, Ra
- Radiation based: Tmax, Tmin, Tmean, Ra, RS

The following statistical evaluation measures were applied to assess and compare the models:
1. Coefficient of determination (R²): R² = [Σᵢ (ETio − ETo,mean)(ETiM − ETM,mean)]² / [Σᵢ (ETio − ETo,mean)² · Σᵢ (ETiM − ETM,mean)²]
2. Root mean square error (RMSE): RMSE = √[(1/n) Σᵢ (ETio − ETiM)²]
3. Scatter index (SI): SI = RMSE / ETo,mean
4. Mean absolute error (MAE): MAE = (1/n) Σᵢ |ETio − ETiM|
5. Nash–Sutcliffe coefficient (NS): NS = 1 − [Σᵢ (ETio − ETiM)²] / [Σᵢ (ETio − ETo,mean)²]

[TABLE 1 Statistical characteristics of the data set for the studied locations]

where ETio is the ET value observed at the i-th time step; ETiM is the corresponding simulated ET value; n is the number of time steps; ETo,mean is the mean of the observations (the ET values of the FAO56-PM model); and ETM,mean is the mean of the simulations. The squared Pearson correlation coefficient (R²) only analyses the linear interrelations between the simulations and the benchmark ETo values (Legates and McCabe, 1999), so other statistical measures are necessary to assess modelling accuracy. The MAE measures the average magnitude of the errors regardless of their direction, whereas the RMSE and its normalized variant (SI) give more weight to large errors. Finally, the NS compares the models' errors with the variance of the observed data, with an optimal value of unity.
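For reference, the five measures listed above can be computed as follows (a minimal sketch; the function name `metrics` is an assumption):

```python
import numpy as np

def metrics(obs, sim):
    """Global error statistics; obs are the FAO56-PM benchmark values."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    rmse = float(np.sqrt(np.mean((obs - sim) ** 2)))
    mae = float(np.mean(np.abs(obs - sim)))
    si = rmse / float(obs.mean())                      # scatter index
    ns = 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)
    r2 = float(np.corrcoef(obs, sim)[0, 1] ** 2)       # squared Pearson r
    return {"RMSE": rmse, "MAE": mae, "SI": si, "NS": float(ns), "R2": r2}
```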

| Methods
In general, soft computing techniques fall into four main categories: (1) fuzzy logic, (2) evolutionary computation, (3) machine learning and (4) probabilistic reasoning. The present paper carries out a comprehensive study by considering eight different soft computing models and four different meta-heuristic learning algorithms for modelling the ETo. Table 2 describes the main category and characteristics of the applied models. Each model is briefly described in the following subsections.

| Gene expression programming (GEP)
The GEP is known as an evolutionary algorithm with the potential to create simulative/predictive models. It uses a parse-tree architecture to search for solutions and can extract explicit expressions between the inputs and targets of the studied cases by using several operators (Koza, 1992). Like genetic programming (GP), the GEP uses a similar kind of diagram representation, but it takes advantage of expression trees to express a genome. In general, similar to other evolutionary algorithms, the GEP uses populations of individuals, chooses them based on their fitness, selects the best controlling expressions using fitness magnitudes and then introduces genetic variation by exploiting one or more genetic operators (Ferreira, 2006). One advantage of GEP (as a variant of GP) is that it yields a direct formulation of the input-target relationships of a specified phenomenon. For more detailed information on the modelling procedure of the GEP, see, for example, Ferreira (2006).

| Neuro-fuzzy systems (NF-SC and NF-GP)
Combining the benefits of supervised neural networks and a fuzzy inference system (FIS) creates a robust and efficient artificial intelligence model called a neuro-fuzzy system (also known as an adaptive neuro-fuzzy inference system). In an NF system, the adjustable parameters of the FIS are identified with the aid of neural networks. In classic NF systems, the IF-THEN rules of the FIS are optimized via a hybrid optimization algorithm, which combines back-propagation gradient descent and least squares error approaches. Different FIS types, such as Mamdani and Sugeno, can be used in an NF system. In the present study, the Sugeno FIS is used to derive the values of the output vector (target parameter) from the input vector (Takagi and Sugeno, 1985). To set up the model to map the relationships between the input and output vectors, the grid partitioning (GP) and sub-clustering identification methods were used. For more detailed information on the NF, see Jang (1993).

| Artificial neural networks (ANNs)
The ANNs, which are inspired by biological neural networks, are fundamentally parallel information-handling systems. They are known to be capable and flexible models for the identification of patterns in data and simulating/ predicting nonlinear phenomena. Several types of ANNs have been introduced and successfully applied to date. In the following, three types of applied ANNs with different learning algorithms, including the ELM, particle swarm optimization (PSO) and differential evolution (DE), are described.

| Neural networks-extreme learning machine (NN-ELM)
The ELMs are categorized as learning algorithms for feed-forward neural networks, which can reduce the training time of simulation and provide more accurate outcomes (Huang et al., 2003, 2004; Shiri et al., 2016). The ELMs can be constructed with a single or multiple layers of hidden nodes, but unlike a standard feed-forward neural network, in the ELM the parameters of the hidden nodes need not be tuned: the hidden nodes can be randomly assigned and are never adjusted. Since the output weights of the hidden nodes in the ELM are learned in a single step as a linear model, it is faster than a traditional feed-forward neural network. For a complete description of the ELM, see, for example, Liang et al. (2006) and Singh and Balasundaram (2007).
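The one-step training just described can be illustrated with a minimal NumPy ELM (a sketch under stated assumptions: the function names, the tanh activation and the hidden-layer size are illustrative choices, not from the paper):

```python
import numpy as np

def elm_train(X, y, n_hidden=40, seed=0):
    """Single-hidden-layer ELM: random hidden weights, least-squares output."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))  # random, never adjusted
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                           # hidden-layer activations
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)     # one-step linear solve
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```

Because only `beta` is fitted, training reduces to a single least-squares solve, which is the source of the speed advantage noted above.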

| Neural networks-particle swarm optimization (NN-PSO)
In addition to standard mathematical learning algorithms such as gradient descent, the feed-forward neural network can also be trained using different meta-heuristic methods such as the PSO algorithm. The PSO algorithm is inspired by the movement of animals in groups (e.g. schools of fish or flocks of birds) and comprises a group of particles that refine their knowledge of the search space. Each particle has two main characteristics: its position and its velocity (Kennedy and Eberhart, 1995). An iterative method is employed by the PSO to obtain the optimal solutions based on the fitness magnitude of each particle, identified through an optimization function. Each particle modifies its own path by following two pieces of information: (1) the best position it has visited (Pbest) and (2) the global best achieved by the swarm (Gbest) (Assareh et al., 2010). Each particle is accelerated at each generation towards its past Pbest and the Gbest locations: a new velocity is computed from its current velocity and its distances from Pbest and Gbest, and the updated velocity is then used to compute the particle's next location in the search space. This iterative process continues for a set number of iterations, or until a minimal error is obtained. For more details on the PSO algorithm, see, for example, Kennedy and Eberhart (1995) and Shi and Eberhart (1998).
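The position/velocity updates above can be sketched as a minimal global-best PSO (illustrative only; the inertia and acceleration coefficients, box bounds and function name are assumptions, and in the NN-PSO setting f would be the network's training error as a function of its weights):

```python
import numpy as np

def pso_minimize(f, dim, n_particles=30, iters=200, lo=-5.0, hi=5.0, seed=0):
    """Minimal global-best PSO for minimizing f over the box [lo, hi]^dim."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, (n_particles, dim))      # positions
    v = np.zeros_like(x)                             # velocities
    pbest = x.copy()
    pbest_val = np.array([f(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(iters):
        r1 = rng.random(x.shape)
        r2 = rng.random(x.shape)
        # velocity update: inertia term plus pulls towards Pbest and Gbest
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        val = np.array([f(p) for p in x])
        better = val < pbest_val
        pbest[better], pbest_val[better] = x[better], val[better]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest, float(pbest_val.min())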

| Neural networks-differential evolution (NN-DE)
Besides the PSO algorithm, other types of meta-heuristic technique, such as differential evolution (DE), can be coupled to neural networks. The DE is a population-based stochastic search approach used to solve continuous optimization problems. The DE algorithm includes three main operators, that is, mutation, crossover and selection. Mutation upgrades the algorithm by introducing new solutions that do not exist in the population; in the DE it perturbs individuals with scaled difference vectors. Once the algorithm is started, this operation is applied with respect to each target individual xi(t). A mutant vector Vi(t) is then generated from the present population through this relation (Storn, 1996):

Vi(t) = xi1(t) + F · (xi2(t) − xi3(t))

where i1, i2 and i3 denote randomly selected, mutually distinct indices within the range {1, 2, …, NP} and F is the scaling factor. A crossover operation, which mixes components of the target and mutant vectors, is applied once the mutation is finished. This operator acts on each component xij(t) of the target vector and the corresponding component vij(t) of the mutant vector to produce an offspring component uij(t):

uij(t) = vij(t) if randj ≤ Pr or j = r; otherwise uij(t) = xij(t)

where randj is a uniform random number, Pr is the crossover probability and r is a randomly chosen component index. Even when Pr = 0, at least one of the offspring's parameters will differ from the parent (as a result of forcing j = r). Individuals are then chosen, based on fitness, to form the next generation, so the fitter an individual is, the more likely it is to survive, while the one-to-one comparison helps to keep the diversity of the population high. The trial vector Ui(t) = {ui1(t), ui2(t), …, uiD(t)} becomes a member of the next generation only after being compared with the corresponding target vector xi(t). The selection operator for the next target vector is:

xi(t + 1) = Ui(t) if f(Ui(t)) ≤ f(xi(t)); otherwise xi(t + 1) = xi(t)

These steps are repeated until a prespecified stopping criterion is satisfied.
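The three DE operators can be combined into a minimal DE/rand/1/bin loop (an illustrative sketch; the population size, scaling factor F, crossover probability and box bounds are assumed values, and for NN-DE f would be the network training error over the weight vector):

```python
import numpy as np

def de_minimize(f, dim, pop=20, F=0.8, pr=0.9, iters=200, lo=-5.0, hi=5.0, seed=0):
    """Minimal DE/rand/1/bin for minimizing f over the box [lo, hi]^dim."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, (pop, dim))
    fx = np.array([f(p) for p in x])
    for _ in range(iters):
        for i in range(pop):
            # mutation: V_i = x_i1 + F * (x_i2 - x_i3), indices distinct from i
            i1, i2, i3 = rng.choice([k for k in range(pop) if k != i], 3, replace=False)
            v = x[i1] + F * (x[i2] - x[i3])
            # binomial crossover with one forced component (j = r)
            mask = rng.random(dim) < pr
            mask[rng.integers(dim)] = True
            u = np.clip(np.where(mask, v, x[i]), lo, hi)
            # greedy selection between trial and target vectors
            fu = f(u)
            if fu <= fx[i]:
                x[i], fx[i] = u, fu
    return x[np.argmin(fx)], float(fx.min())
```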

| Support vector machines (SVM and SVM-FA)
The SVMs are supervised learning algorithms that can be used for classification and regression analysis (Kisi et al., 2015). The SVM formulation rests on the structural risk-minimization principle, which minimizes an upper bound on the expected risk (Vapnik et al., 1997; Gunn, 1998). The basic idea of the SVM is to model the data through nonlinear mapping. The SVMs can perform nonlinear classification/modelling efficiently using the kernel method (e.g. the polynomial, string or radial basis function (RBF) kernels); the kernel functions implicitly transform the original data to a higher dimensional feature space. Among the different kernel functions of the SVM, the RBF has shown superiority over the others, as reported previously (e.g. Rajasekaran et al., 2008; Yang, 2009; Kisi et al., 2015), and it leads to the solution of a set of linear equations in the training phase (Shamshirband et al., 2014). Given this superiority, the RBF kernel was used in the current study for the SVM-based estimation of the ETo. The general form of this kernel is:

K(xi, xj) = exp(−γ ‖xi − xj‖²)

where xi and xj are the input space vectors and the gamma (γ) parameter is related to the inverse of the width (standard deviation) of the RBF kernel. Several algorithms can be used to find the optimum gamma. In addition to the standard version of the SVM, the present study also applied the firefly algorithm (FA) for the training process.
The concept of luminance production by fireflies has led to the development of algorithms for solving different optimization problems (Yang, 2009). With reference to Fister et al. (2013), the FA is efficient at finding both local and global optima. The main steps in establishing the FA are the formulation of the objective function and of the variation of light intensity. For instance, in an optimal design problem comprising the maximization of an objective function, the fitness function would be proportional to brightness (the amount of light emitted by the firefly). Therefore, any reduction in light intensity caused by the distance between fireflies results in variations of intensity and reduces the attractiveness between them (Kisi et al., 2015). For detailed information about the FA, see, for example, Yang (2009).

| Multivariate adaptive regression spline (MARS)
The MARS is a non-parametric regression model, regarded as a complement to linear models, that automatically captures nonlinearities and interactions between parameters (Friedman, 1991). Its algorithm comprises a forward and a backward stepwise phase. In the forward phase, the exhaustive forward selection procedure produces a complex and over-trained model after several splits (Andres et al., 2010), which has a lower performance accuracy. Therefore, the backward phase removes the non-essential terms that were previously selected. The procedure maps variable X to variable Y using pairs of basis functions defined at a knot, that is, the value of a variable that defines a junction point within the input range (Sharda et al., 2006; Adamowski et al., 2012). The MARS uses piecewise linear basis functions of the forms (x − t)+ and (t − x)+, where the suffix "+" stands for the positive part only. Therefore:

(x − t)+ = max(0, x − t) and (t − x)+ = max(0, t − x)

The general form of the MARS equation reads:

f(x) = c0 + Σ ci Bi(x), i = 1, …, M

The model is thus a weighted sum of basis functions Bi(x). The ci are constant coefficients, computed by minimizing the residual sum of squares in a standard linear regression procedure, and may be regarded as weights that depict the importance of each term.
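The hinge basis pair and the weighted-sum form above can be demonstrated with a toy fit (illustrative only; the function name `hinge_pair`, the knot location and the target function are assumptions):

```python
import numpy as np

def hinge_pair(x, t):
    """MARS piecewise-linear basis pair for a knot t: (x - t)+ and (t - x)+."""
    x = np.asarray(x, float)
    return np.maximum(x - t, 0.0), np.maximum(t - x, 0.0)

# Toy fit: y = |x - 0.5| is exactly a weighted sum of the two hinges at t = 0.5.
x = np.linspace(0.0, 1.0, 21)
y = np.abs(x - 0.5)
b1, b2 = hinge_pair(x, 0.5)
B = np.column_stack([np.ones_like(x), b1, b2])   # [intercept, (x-t)+, (t-x)+]
c, *_ = np.linalg.lstsq(B, y, rcond=None)        # coefficients c_i by least squares
```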

| Random forest (RF)
RFs (also known as random decision forests) are categorized as ensemble-learning algorithms that handle high-dimensional regression problems. An RF is formed from a number of regression trees based on a collection of random variables: each tree is grown on a random sample of the training data, and a random subset of the variables is considered when splitting. The final decision is obtained by averaging the outputs of the fitted single trees (Breiman, 2001; Hastie et al., 2009).

| Model tree (MT)
Introduced by Quinlan (1992), the M5 MT is structured as a binary decision tree with linear regression functions at the leaves (terminals). Two distinct stages are considered when constructing the M5 MT. In the first stage, the data are split and categorized into several subsets according to the standard deviation reduction (SDR) function at the nodes, thus forming the tree.
The M5 model is one of several approaches frequently employed to implement the MT technique on the basis of this domain-splitting criterion. The splitting process may create a large tree-like structure, which can cause poor performance or overfitting of the model. Therefore, in the second stage the overgrown tree is pruned and, finally, linear regression functions are substituted in place of the subtrees (Wang and Witten, 1996; Quinlan, 1992). For more details on the MT, see, for example, Goyal and Ojha (2011), Goyal (2014) and Mesbahi et al. (2017).
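The SDR splitting criterion used at the M5 nodes can be sketched as follows (a minimal illustration; the function name and mask-based interface are assumptions): the split that most reduces the size-weighted standard deviation of the target in the child subsets is chosen.

```python
import numpy as np

def sdr(y, left_mask):
    """Standard deviation reduction of a candidate split (M5 criterion):
    sd(parent) minus the size-weighted sd of the two child subsets."""
    y = np.asarray(y, float)
    left, right = y[left_mask], y[~left_mask]
    return float(np.std(y)
                 - (left.size / y.size) * np.std(left)
                 - (right.size / y.size) * np.std(right))
```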

| Boosted regression tree (BT)
The BT is a data-mining model that integrates two modelling techniques: (1) gradient boosting and (2) classification and regression trees. The boosting process is used in the BT to improve model accuracy by combining many simple rules into a single, more accurate one. In regression trees, decision rules are set according to the relationship between the input and output variables (Mousavi et al., 2017). Although the general procedure of the BT is complex, it can be applied to various simulation/prediction problems and has better simulation/prediction accuracy than most conventional models (Elith et al., 2008; Toming et al., 2016). For detailed information about the basic theory of the BT, see Freidman et al. (2000) and França and Cabral (2015).

| RESULTS AND DISCUSSION
In the present study, temperature- and radiation-based models were developed using the GEP, NF-GP, NF-SC, MARS, BT, RF, MT, SVM, SVM-FA, ELM, NN-PSO and NN-DE methods, and the results were compared with each other with respect to the R², RMSE, SI, MAE and NS criteria. The indicators shown in Tables 3 and 4 refer to the complete data set: the outcomes of each stage of k-fold testing (here, 12 folds per station and input combination) were merged into a global matrix containing the simulations of all stages, from which the error statistics were computed. Alternatively, a weighted average of the indicators of each test stage (year) might be taken to produce the global error statistics; however, as each test stage is one year, the weights would be the same for all stages. Combining the simulations of all test stages to produce the global statistical indices provides a thorough insight into the models' global performance accuracy, as discussed by Shiri et al. (2014). Table 3 gives the global statistical indices of the applied models.

[TABLE 4 Accuracy ranks of the applied models]

In the applications, 12 different methods and five different comparison criteria were used; therefore, the accuracy ranks of the applied models with respect to the applied criteria are also provided in Table 4 in order to see the differences better. According to Table 4, for the temperature-based models at Sari station the SVM-FA model outperforms the other models, followed by the ELM, NF-GP, NN-PSO, MARS, NF-SC or NN-DE, RF, SVM, BT, GEP and MT. At Babolsar station, however, the NF-GP has the best accuracy. Total average ranks across both stations are also given in Table 4 to indicate the overall best model(s): according to these totals, the NF-GP has the best rank, closely followed by the SVM-FA and NN-PSO methods.
The NF-SC, ELM and NN-DE are also good candidates for modelling the ETo of both stations using only temperature data, compared with the MARS, GEP, RF, SVM, BT and MT methods. The global statistical indices of the radiation-based models and the corresponding ranks are given in the lower parts of Tables 3 and 4, respectively. According to Table 4, the NF-GP has the best accuracy in modelling the ETo of Sari station, followed by the SVM-FA, ELM, NN-PSO, NF-SC, MARS, GEP, RF, SVM, BT, NN-DE and MT. At Babolsar station, the NF-GP also outperforms the other models, followed by the GEP, NN-DE, MARS, NN-PSO, ELM, RF, BT, SVM-FA, MT, NF-SC and SVM. Total averages indicate that the NF-GP has the best accuracy in modelling the ETo of both stations using temperature and solar radiation data.

[FIGURE 2 Scatterplots of the observed versus simulated reference evapotranspiration (ETo) values for temperature-based models at Sari station, northern Iran]

It can be seen from Tables 3 and 4 that each method shows different accuracies for different data sets. For example, the temperature-based NF-GP, GEP and NF-SC or NN-DE models have the first, second and third best accuracies in modelling the ETo of Babolsar station, while they are in third, ninth and second or seventh place for Sari station, respectively. Moreover, the radiation-based NF-GP, GEP and NN-DE models have the first, second and third ranks for Babolsar, while their accuracy ranks are first, seventh and 11th for Sari station, respectively. The main reason for this might be the basic assumptions and learning algorithms used for each method, as summarized in Table 2. This confirms the statement: "In [the] application of machine learning algorithms to hydrological or hydrogeological data, no algorithm is superior to all problems and periods of data" (Bardossy and Singh, 2008).
Accordingly, the NF-GP method can be suggested for modelling the ETo of Babolsar station using only temperature, or temperature and radiation, data as inputs, while the SVM-FA, NF-GP and ELM methods are good alternatives for Sari station. The ETo estimates of the applied temperature- and radiation-based models are illustrated in Figures 1-4 for both stations. From the graphs, it is obvious that the radiation-based models have less scattered estimates than the temperature-based models, which indicates the effect of the RS data when modelling the ETo.

[FIGURE 3 Scatterplots of the observed versus simulated reference evapotranspiration (ETo) values for radiation-based models at Babolsar station, northern Iran]

For Babolsar station, the slope and bias coefficients of the fitted line equations are closer to 1 and 0, with higher R² values, for the radiation-based GEP and NN-DE, which were selected as the best models, compared with the corresponding temperature-based models. Also, for Sari station, the difference between the temperature- and radiation-based models can be clearly observed for the SVM-FA, NF-GP and ELM, which were selected as the best models for this station.
Overall, all the applied models showed very good efficiency (NS > 0.900) in modelling the ET o (Moriasi et al., 2007). Most of the models provided better accuracy at Babolsar station compared with Sari (Tables 3 and 4 and Figures 1-4). The reason may be the differences in the variation and distribution of the data sets used. It is clear from Table 1 that the T data have a higher variation at Sari than at Babolsar. Moreover, the R S and ET o of Sari station have more skewed distributions than those of Babolsar (see the C SX values in Table 1).
Summarizing, it might be stated that the applied models showed good capabilities for modelling the ETo in both studied locations, although there were obvious differences among the performance accuracies of the models in some cases that should be taken into consideration. Moreover, it should be noted that the present comprehensive comparison was made using data from two humid locations, where the variations in the ETo magnitudes would be smaller than those of arid regions. Therefore, these specific outcomes might not warrant the applicability of these models in arid regions, where separate studies should be carried out to check and confirm those outcomes.

[FIGURE 4 Scatterplots of the observed versus simulated reference evapotranspiration (ETo) values for radiation-based models at Sari station, northern Iran]

Furthermore, only two weather stations were considered in the present study, with only a local assessment of the models, because the computational cost of using data from more stations, as well as a regional assessment, could not be assumed here. This point might be strengthened in future studies using more stations along with a regional assessment of the models (but using fewer models than in the present case). Finally, the presented assessments were based on the common assumption of using the FAO56-PM model as the benchmark for evaluating the other models, so replacing the benchmark with lysimetric data might change some of the specific outcomes obtained here, as discussed by Marti et al. (2015). However, as advised by the literature, unless lysimetric data exist, the application of the FAO56-PM as a benchmark model is common and accepted in such studies.

| CONCLUSIONS
The present paper assessed the capabilities of 12 soft computing approaches for estimating daily reference evapotranspiration (ETo) in humid locations. Using daily data from two humid stations in northern Iran covering a period of 12 years, a k-fold testing data-assignment method was adopted to assess the applied models at each location. Based on the performed analysis, the following conclusions were reached:
• All the applied models showed good capabilities (generally Nash–Sutcliffe coefficient (NS) ≥ 0.900) when obtaining the ETo values in the studied locations.
• Among the temperature-based models, the support vector machine with firefly algorithm (SVM-FA) (root mean square error (RMSE) = 0.324 mm, scatter index (SI) = 0.210, mean absolute error (MAE) = 0.225 mm, coefficient of determination (R²) = 0.960, NS = 0.960) and the neuro-fuzzy model with grid partitioning (NF-GP) (RMSE = 0.272 mm, SI = 0.100, MAE = 0.203 mm, R² = 0.973, NS = 0.974) provided the best accuracy when estimating the ETo of Sari and Babolsar stations, respectively.
• The considerable fluctuation of the methods' accuracy with respect to the test data indicated the necessity of k-fold testing for better evaluation of the various data-driven methods in estimating the ETo.
• The outcomes of the present study should encourage more comparative evaluations of soft computing models in other regions with different climatic contexts in order to strengthen the conclusions.