This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Two artificial neural networks (ANNs), a Radial Basis Function (RBF) network and a Time Delay Neural Network (TDNN), were used to predict total dissolved solids (TDS) in the river Zayanderud. Water quality parameters in the river for ten years, 2001–2010, were prepared from data monitored by the Isfahan Regional Water Authority. A factor analysis was applied to select the input water quality parameters, which identified total hardness, bicarbonate, chloride and calcium. Input data to the neural networks were pH, Na+, Mg2+, carbonate (CO3−2), HCO3−1, Cl−, Ca2+ and total hardness. For the learning process, 5-fold cross validation was applied. In the best configuration, the TDNN contained 2 hidden layers with 15 neurons in each layer and the RBF had one hidden layer with 100 neurons. The Mean Squared Error and the Mean Bias Error for the TDNN during the training process were 0.0006 and 0.0603, and for the RBF neural network they were 0.0001 and 0.0006, respectively. In the RBF, the coefficient of determination (R2) and the index of agreement (IA) between the observed and predicted data were 0.997 and 0.999, respectively. In the TDNN, the R2 and the IA between the actual and predicted data were 0.957 and 0.985, respectively. The sensitivity analysis showed that Ca2+ and SO42− had the highest effect on the TDS prediction.
The river Zayanderud is the lifeblood of Isfahan province, so protecting its water quality for drinking, agriculture and industry is vital. The first step in the proper and sustainable management of water resources is to analyze water quality and its changes in time and place, and to identify the main sources and types of water pollutants. The main sources of pollution in the river Zayanderud are agricultural land use and domestic and industrial wastewater. Agricultural lands contribute major cations, anions, nitrogen and phosphorus to the river. Domestic sewage and industry add pollutants such as phosphorus and heavy metals.
Salinity in surface waters is a significant concern for agricultural use and domestic consumption. The amount of total dissolved solids (TDS) is an indicator of salinity, and determining its changes over time is very important for the planning and management of water usage. River salinity is a problem of great importance and sensitivity in all Iranian rivers and can be caused by several factors, for example, minerals in river water and catchment soils that contain both suspended and soluble particles.
Interpretation of water quality is a very important part of water quality management. Several methods are available to analyze water quality data. Maier and Dandy used Artificial Neural Networks (ANNs) to predict salinity in the Murray River in South Australia. Zhang et al. applied an ANN to predict water quality in the North Saskatchewan River. Huang and Foo used an ANN to assess variations in salinity in the Apalachicola River in Florida. Misaghi and Mohammadi studied the water quality of the river Zayanderud, Iran, using an ANN. They employed a Generalized Regression Neural Network (GRNN) to predict the biochemical oxygen demand (BOD) and the dissolved oxygen (DO) of the river. Kanani et al. used a Multilayer Perceptron (MLP) and an Input Delay Neural Network (IDNN) to predict TDS in the Ajichay River. Their results indicated good performance and acceptable accuracy in predicting salinity in the river. Asadollahfardi et al. applied an MLP and a recurrent neural network (RNN) to predict TDS in the Talkheh Rud River, Iran. They reported that the results of the RNN agreed well with the field monitoring data. Nemati et al. studied the salinity of the Siminrud River in Iran using an ANN. They concluded that magnesium had a high impact in predicting salinity.
The objective of our study was to predict TDS in the river Zayanderud, Iran, using a Radial Basis Function (RBF) network and a Time Delay Neural Network (TDNN). We also applied a sensitivity analysis to find the effect of each parameter on the prediction of TDS.
1.1. Study Area
The river Zayanderud basin includes the southwestern region of Iran located between 31°30′N and 33°32′N and 49°30′E and 49°52′E. The area includes four cities, Shahrekord, Frieden, Lenjan and Isfahan, and covers part of the Chahar Mahal Bakhtiari province (1.7% of the total area). Fig. 1 shows the study area. Average annual rainfall varies from 1,600 mm in the Zard Kuh Mountains to less than 40 mm in the eastern regions of Isfahan.
Generally, the amount of rainfall in the catchment area of the river Zayanderud decreases from west to east. The mean annual air temperature is 3.5°C in the northwestern highlands and can reach 21.5°C in the eastern parts of the central region. The relative humidity is highest in January and lowest in July. From a geological perspective, the studied area consists of three main geological zones: Zagros, Sanandaj-Sirjan and Central Iran. Each of these zones has affected the area according to its specific characteristics. Jurassic metamorphic and sedimentary rock and new Quaternary alluvium are the most abundant constituents of river rocks in the area. Two aspects of fine-grained rock (sedimentary and metamorphic) are of great importance. First, the geochemistry of the rock partly contributes to increasing the natural concentrations of minerals in the river Zayanderud, and second, its erodibility plays an important role in increasing fine-grained particles in the river. In general, fine-grained particles cause an uptake of potentially toxic elements in the river bed sediments because of their high adsorption capacity. The Iranian Central region, including the river Zayanderud basin from the western highlands to the eastern regions, consists of areas with mild, cool and dry summers and areas with cold winters and very hot and dry summers.
2. Materials and Methods
We used monthly water quality data from the river Zayanderud from 2001 to 2010, about 120 observations for each variable. The data were monitored by the Isfahan Regional Water Authority and the Department of Environment (Iran). Table 1 presents the statistical summary of the data at the Mosian monitoring station.
Selection of suitable input parameters for an artificial neural network is a very important step. One of the techniques to identify the relationship between water quality parameters is a factor analysis method.
2.1. Factor Analysis
Factor analysis is a technique for reducing the number of input parameters to the artificial neural network and thereby finding a more effective way to predict TDS. In factor analysis, the variables (water quality parameters) are grouped into factors such that each successive factor explains less of the variance than the one before it. The variables that load on the first factor are the most influential ones. To perform the factor analysis, we applied SPSS version 21 (2014). Before performing a factor analysis, we must ensure that the variables are suitable for it. For this purpose, we used the Kaiser-Meyer-Olkin (KMO) index, Eq. (1).
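In its standard form, and assuming this is the definition intended here, the overall KMO index compares the squared correlations with the squared partial correlations over all pairs of distinct variables:

```latex
\mathrm{KMO} = \frac{\sum_{i \neq j} r_{ij}^{2}}{\sum_{i \neq j} r_{ij}^{2} + \sum_{i \neq j} a_{ij}^{2}} \tag{1}
```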
Where rij is the correlation coefficient between indicators xi and xj, and aij is the partial correlation coefficient between indicators xi and xj. KMO values close to one indicate that the correlations between pairs of variables can be explained by the other variables, which justifies applying factor analysis to the variables. The following steps should be carried out for factor analysis.
Creating a matrix of correlations between the water quality parameters which is a square matrix of correlation coefficients.
Determining KMO to demonstrate the suitability of factor analysis.
Rotating the factors around the origin to obtain a new position.
Finally, retaining as many factors as there are eigenvalues of the correlation matrix greater than one [11–15].
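The suitability check and the eigenvalue rule in the steps above can be sketched in a few lines of NumPy. This is only an illustration on synthetic data, not the SPSS procedure used in the study; the function names `kmo_index` and `kaiser_factor_count` and the random data are assumptions, and the partial correlations are obtained from the inverse of the correlation matrix, which is the usual approach.

```python
import numpy as np

def kmo_index(data):
    """Overall KMO index for a (samples x variables) array."""
    r = np.corrcoef(data, rowvar=False)            # correlation matrix r_ij
    inv = np.linalg.inv(r)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    a = -inv / scale                               # partial correlations a_ij
    off = ~np.eye(r.shape[0], dtype=bool)          # keep only pairs with i != j
    r2 = np.sum(r[off] ** 2)
    a2 = np.sum(a[off] ** 2)
    return float(r2 / (r2 + a2))

def kaiser_factor_count(data):
    """Number of eigenvalues of the correlation matrix greater than one."""
    eigenvalues = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))
    return int(np.sum(eigenvalues > 1.0))

# Synthetic stand-in for monthly water quality data (assumption):
# 120 samples of 6 variables driven by 2 hidden factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(120, 2))
loadings = rng.normal(size=(2, 6))
data = latent @ loadings + 0.5 * rng.normal(size=(120, 6))

kmo = kmo_index(data)
n_factors = kaiser_factor_count(data)
print(f"KMO = {kmo:.3f}, factors with eigenvalue > 1: {n_factors}")
```

Factor rotation (for example Varimax) is left out of this sketch; a statistics package would normally handle that step.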
2.2. ANN Modeling
An ANN is a computing technique that learns a mapping between the input space (input layer) and the desired output space (output layer) by using processors called neurons, which identify inherent relationships in the data. A hidden layer receives data from the input layer, processes it and sends it to the output layer. Each network is trained through examples. Learning takes place when the connection weights between the layers change so that the difference between the predicted and the measured values falls within acceptable limits. Reaching this condition completes the learning process. The weights thus represent the memory and knowledge of the network. Trained neural networks can be used to predict outputs corresponding to new data. Because knowledge is distributed over the network weights, ANNs offer high-speed processing, the ability to learn from example patterns, the ability to generalize after learning, flexibility against unexpected failures, and no significant disruption when part of the connections fail.
2.3. Architecture of the Network
We used a TDNN with two hidden layers, a tangent sigmoid transfer function and a linear output. The number of neurons in the hidden layer can be determined by the output component. The remaining architectural choice was the number of hidden layers, which we selected by comparing the errors of networks with one, two and three hidden layers and choosing the minimum. To determine the number of neurons in the hidden layers, trial and error was applied to arrive at the best network. Simplicity of the network was also considered: between two alternatives, the one with fewer neurons in the hidden layers was selected for the TDNN to avoid unnecessary complexity. To assess the adequacy of the model during training, the Mean Bias Error (MBE) and the Mean Squared Error (MSE) were used. If the MBE equals zero, the model is adequate; if the MBE is less than zero, the model underestimates; and an MBE greater than zero shows that the network overestimates. The MBE is thus a tool to guard against overestimation by the neural network. In the training process, the weights and biases were adjusted using momentum methods to minimize the network performance function, and the performance was evaluated as the MSE between the network outputs and the target outputs. If the calculated MSE was small enough and stable at the end of each learning epoch after adjusting the learning rate, epochs, number of hidden layers and neurons, the parameter set was fixed and post-processing was carried out.
When the parameters of a prediction function are learned and tested on the same data, a model that merely repeats the labels of the samples it has just seen would have a perfect score but would fail to predict anything useful on yet-unseen data. This situation is called overfitting. To avoid it, k-fold cross validation is common practice. In this method, the original sample is randomly split into k subsamples of almost equal size. A model is trained using k-1 of the folds as training data, and the resulting model is validated on the remaining fold, which serves as testing data for calculating the accuracy of the model. The cross-validation process is then repeated k times, with each of the k subsamples used exactly once as the validation data. The k results from the folds can then be averaged (or otherwise combined) to produce a single estimate. This approach can be computationally expensive, but it does not waste much data, which is a major advantage in problems such as inverse inference where the number of samples is small.
As usual, the true error is estimated as the average error rate on test examples, Eq. (2).
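Written out from the description above, the averaged estimate takes the following form, where the symbols E for the estimated true error and E_i for the error measured on the i-th held-out subsample are notation assumed here:

```latex
E = \frac{1}{N} \sum_{i=1}^{N} E_{i} \tag{2}
```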
Where N is the number of subsamples.
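The splitting and averaging described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the helper names, the synthetic data and the linear least-squares model standing in for the neural networks are all assumptions.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Randomly split sample indices into k folds of almost equal size."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cross_validate(x, y, k, fit, predict, seed=0):
    """Train on k-1 folds, test on the remaining one, repeat k times."""
    folds = k_fold_indices(len(x), k, seed)
    fold_rmse = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(x[train_idx], y[train_idx])
        residual = y[test_idx] - predict(model, x[test_idx])
        fold_rmse.append(float(np.sqrt(np.mean(residual ** 2))))
    # The estimate of the true error is the average over the k folds.
    return fold_rmse, float(np.mean(fold_rmse))

# Placeholder model (assumption): ordinary least squares with an intercept.
def fit_linear(x, y):
    design = np.hstack([x, np.ones((len(x), 1))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_linear(coef, x):
    return np.hstack([x, np.ones((len(x), 1))]) @ coef

# Synthetic data (assumption): 120 samples, 3 inputs, small noise.
rng = np.random.default_rng(1)
x = rng.normal(size=(120, 3))
y = x @ np.array([1.0, 2.0, 3.0]) + 0.5 + 0.1 * rng.normal(size=120)

fold_rmse, true_error = cross_validate(x, y, 5, fit_linear, predict_linear)
folds = k_fold_indices(len(x), 5)
print(fold_rmse, true_error)
```

Any model can be plugged in through the `fit` and `predict` arguments, so the same loop would apply to a TDNN or an RBF network.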
2.4. The TDNN Network
A time delay neural network is a multilayer neural network that is able to deal with the dynamic nature of sample data and to hold input signals. A TDNN consists of three layers whose weights are coupled with time delay cells. Each cell uses a tangent sigmoid transfer function and receives D (N + 1) weighted inputs. A TDNN is a dynamic network whose output depends on the network's previous inputs and outputs in addition to its current inputs. Since such networks have dynamic memories, they can be used for learning time-varying, sequential patterns. A time delay operator receives an input signal, keeps it for one time step and emits it as its output in the next time step. By connecting N such operators in series, a Tapped Delay Line (TDL) is obtained. Its output is a vector with N + 1 components, which are the inputs at the current time step and at the previous N time steps. The present study evaluated the modeling of a new TDNN network with the training function trainlm to predict TDS variation trends. The network used a tangent sigmoid transfer function and two hidden layers and was trained on 10 years of data.
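To make the tapped delay line concrete, the sketch below builds the N + 1 component input vectors from a series. It only illustrates the windowing idea; the function name `tapped_delay_line` and the toy series are assumptions, and the network that would consume these vectors is not shown.

```python
import numpy as np

def tapped_delay_line(series, n_delays):
    """Stack the current value and the previous n_delays values per time step.

    Each output row is [x(t), x(t-1), ..., x(t-n_delays)], so a row has
    n_delays + 1 components for each input variable.
    """
    series = np.asarray(series, dtype=float)
    if series.ndim == 1:
        series = series[:, None]                   # treat as one variable
    rows = [
        series[t - n_delays:t + 1][::-1].ravel()   # newest sample first
        for t in range(n_delays, len(series))
    ]
    return np.array(rows)

# Toy series (assumption) with N = 2 delays.
window = tapped_delay_line([1, 2, 3, 4, 5], n_delays=2)
print(window)
```

The first N time steps produce no row because their history is incomplete, which is why a series of length T yields T - N input vectors.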
2.5. RBF Neural Network
The Gaussian RBF is a non-normalized form of the Gaussian distribution function; it is nonlinear and has good properties for learning. Gaussian neural networks used for complex mapping can also learn, identify, synchronize and control nonlinear dynamic systems.
An RBF network is naturally derived from an interpolation problem. The RBF has a non-linear input layer and a Gaussian hidden layer. Fig. 2 depicts a view of an RBF neural network.
According to Fig. 2, the input layer of an RBF neural network is directly connected to the hidden layer. The output of the j-th hidden neuron is obtained from Eq. (3).
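For a Gaussian hidden unit, a common way to write this output is shown below. The factor 2 in the denominator is an assumption; some formulations absorb it into the spread.

```latex
h_{j} = \phi\left(\lVert X - c_{j} \rVert\right) = \exp\left(-\frac{\lVert X - c_{j} \rVert^{2}}{2\delta_{j}^{2}}\right) \tag{3}
```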
Where hj is the output of the j-th neuron; ϕ is the nonlinear radial basis function; X is the input vector; cj is the neuron center and δj is the neuron's spread. The nonlinearity is due to the ϕ functions. Neurons in the output layer have a linear function, and the output yk of neuron k in the output layer is obtained from Eq. (4).
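With a linear output layer, and assuming no bias term, this output is the weighted sum of the hidden-layer outputs:

```latex
y_{k} = \sum_{j=1}^{m} W_{kj}\, h_{j} \tag{4}
```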
Where Wkj is the synaptic weight connecting the j-th hidden neuron to neuron k of the output layer and m is the number of hidden-layer neurons.
We used the newrb network function with the radbas transfer function. Neurons are added to the network sequentially until the MSE reaches the set target.
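The idea of a Gaussian hidden layer, a linear output layer and neurons added one at a time until an MSE goal is met can be sketched in NumPy as follows. This is a simplified stand-in and not the algorithm behind newrb: the greedy rule used here (place the next center at the training sample with the largest current error), the shared spread, the function names and the toy sine data are all assumptions.

```python
import numpy as np

def rbf_hidden(x, centers, spread):
    """Gaussian hidden-layer outputs h_j for every row of x."""
    sq_dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * spread ** 2))

def fit_rbf(x, y, spread, mse_goal, max_neurons):
    """Add hidden neurons one at a time until the training MSE meets the goal."""
    chosen = []                                    # indices of samples used as centers
    weights = np.zeros(0)
    prediction = np.zeros_like(y)
    while len(chosen) < max_neurons:
        error = np.abs(y - prediction)
        if chosen:
            error[chosen] = -1.0                   # never reuse a sample as a center
        chosen.append(int(np.argmax(error)))
        hidden = rbf_hidden(x, x[chosen], spread)
        # Linear output layer: least-squares weights, no bias term.
        weights, *_ = np.linalg.lstsq(hidden, y, rcond=None)
        prediction = hidden @ weights
        if np.mean((y - prediction) ** 2) <= mse_goal:
            break
    return x[chosen], weights

def rbf_predict(x, centers, weights, spread):
    return rbf_hidden(x, centers, spread) @ weights

# Toy problem (assumption): approximate sin(x) on [0, 2*pi].
x = np.linspace(0.0, 2.0 * np.pi, 60)[:, None]
y = np.sin(x[:, 0])
centers, weights = fit_rbf(x, y, spread=1.0, mse_goal=1e-4, max_neurons=30)
train_mse = float(np.mean((y - rbf_predict(x, centers, weights, spread=1.0)) ** 2))
print(len(centers), train_mse)
```

Because only the output weights are solved for at each step, every added neuron can only lower or keep the training error, which is why a stopping target on the MSE is a natural control on network size.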
2.6. Model Efficiency
To compare different prediction results, the forecast errors over different periods need to be considered as benchmark criteria. Among these criteria, we used the root mean squared error (RMSE) and the MBE, which is used to assess the adequacy of the model (Eq. (5)).
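In their usual form, and with the sign of the MBE chosen so that negative values indicate underestimation as described in Section 2.3 (an assumption about the authors' convention), the two measures are:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left(A_{t} - F_{t}\right)^{2}}, \qquad
\mathrm{MBE} = \frac{1}{N} \sum_{t=1}^{N} \left(F_{t} - A_{t}\right)
```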
Where N is the number of data points; At is the actual value and Ft is the predicted value.
The coefficient of determination (R2) and the Index of Agreement (IA) also indicate the reliability of the model. R2 and IA are defined as follows:
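One common way to write these two indices, assuming R2 is the squared correlation between actual and predicted values and IA is Willmott's index of agreement, is:

```latex
R^{2} = \frac{\left[\sum_{t=1}^{N} (A_{t} - \bar{A})(F_{t} - \bar{F})\right]^{2}}{\sum_{t=1}^{N} (A_{t} - \bar{A})^{2} \, \sum_{t=1}^{N} (F_{t} - \bar{F})^{2}}, \qquad
\mathrm{IA} = 1 - \frac{\sum_{t=1}^{N} (F_{t} - A_{t})^{2}}{\sum_{t=1}^{N} \left(\lvert F_{t} - \bar{A} \rvert + \lvert A_{t} - \bar{A} \rvert\right)^{2}}
```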
Where Ā and F̄ are the means of the actual data and the predicted data, respectively.
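The efficiency measures of this section can be computed together in one small function. The sketch below assumes the usual definitions (MBE as mean of predicted minus actual, R2 as squared Pearson correlation, IA as Willmott's index of agreement); the function name `evaluate` and the four-point example are hypothetical.

```python
import numpy as np

def evaluate(actual, predicted):
    """Return RMSE, MBE, R2 and IA for paired actual/predicted values."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(predicted, dtype=float)
    err = f - a                                    # positive means overestimation
    rmse = np.sqrt(np.mean(err ** 2))
    mbe = np.mean(err)
    r2 = np.corrcoef(a, f)[0, 1] ** 2              # squared Pearson correlation
    denom = np.sum((np.abs(f - a.mean()) + np.abs(a - a.mean())) ** 2)
    ia = 1.0 - np.sum(err ** 2) / denom            # Willmott index of agreement
    return {"RMSE": float(rmse), "MBE": float(mbe), "R2": float(r2), "IA": float(ia)}

# Hypothetical example: every prediction is 0.1 too high.
scores = evaluate([1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 3.1, 4.1])
print(scores)
```

In this example the constant offset leaves R2 at one while the MBE and RMSE both pick it up, which shows why a bias measure is reported alongside the correlation-based ones.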
3. Results and Discussion
The first step is to identify the parameters that contribute to the prediction of TDS. We used factor analysis for this purpose. The first stage in factor analysis is to construct the correlation matrix of the parameters. If the correlation coefficients are less than 0.3, using factor analysis is questionable [13, 23]. However, according to Table 2, many of the correlation coefficients between the water quality parameters at the Mosian Station are larger than 0.3.
According to Table 3, the data are suitable for factor analysis because the KMO index of all stations is greater than 0.5. Bartlett's test results indicate that, when the p value is less than 0.05, the null hypothesis of uncorrelated variables is rejected, and a significant correlation exists between the variables.
Table 4 lists the eigenvalues for the factor analysis of hydrochemical data for the Mosian Station. To determine the number of factors, we selected the eigenvalues that were greater than 1.
Fig. 3 shows the scree plot, in which the horizontal axis gives the factor number and the vertical axis gives the eigenvalues. As presented in Fig. 3, the eigenvalues are in descending order, and a sudden drop between eigenvalue 1 and eigenvalue 2 confirms the existence of at least two main factors. In general, factors on the steep part of the curve are the most helpful in the analysis and factors on the flat part have less impact on the analysis.
The first three factors account for 80.63% of the total variance (Table 4). Table 5 presents the rotated factor loadings obtained with the Varimax method. We used the result of the factor analysis to select the proper inputs to the ANN. As indicated in Table 5, the first factor comprises total hardness (TH), bicarbonate (HCO3−1), chloride (Cl−) and calcium (Ca2+), which are the most important parameters in the water quality of the river Zayanderud. We selected these parameters as input parameters to the ANN.
3.1. The ANN Results
Input data for training the ANN are based on the results of the factor analysis. We selected pH, Na+, Mg2+ and carbonate together with the parameters identified by the factor analysis, which were HCO3−1, Cl−, Ca2+ and TH, as input data to the ANN. The accuracy of the factor analysis is examined in the sensitivity analysis. To avoid overfitting, in this study, we used 6-fold cross validation to estimate the true error. We applied six sets of data, in which the testing data were changed. The validation results of each of the five subsamples for both the TDNN and RBF networks are listed in Table 6.
As indicated in Table 6, the second set had the best performance. The true RMSE is obtained by averaging the errors of the subsamples. The true RMSE values were 0.843 and 0.516 for the TDNN and RBF networks, respectively, which indicates that the RBF is more accurate than the TDNN model. The results of the RBF and TDNN models are as follows:
3.2. The Time Delay Neural Network (TDNN) Results
Fig. 4 presents the MSE between the actual and the simulated data during the modeling process in the TDNN method for training, validation and testing. The change in the error after epoch 6 is negligible; therefore, we stopped the training process at epoch 11. For the TDNN, we reached the minimum error with 2 hidden layers and 15 neurons in each layer.
As indicated in Fig. 5, the coefficient of determination (R2) and the IA between the predicted TDS and the observed data were 0.957 and 0.986, respectively, which means that the accuracy of the model in predicting TDS was acceptable.
3.3. The RBF Neural Network Results
As illustrated in Fig. 6, the error decreased sharply at first and then declined gradually until it approached zero during the training process of the network. After epoch 90, the error was approximately the same for validation and testing. As indicated in Fig. 6, using one hidden layer with 100 neurons, we reached an MSE of 0.0001, after which training was stopped.
Fig. 7 presents the actual and the predicted TDS data. As illustrated in Fig. 7, the R2 between the predicted and the observed TDS data at the Mosian station was 0.997 and the IA was 0.999.
Table 7 lists the MSE, the MBE and the RMSE for the TDNN and the RBF. As shown in Table 7, the errors of the RBF are lower than those of the TDNN, so the RBF neural network is more suitable than the TDNN for predicting the TDS of the river. Comparing the R2 of the two networks also shows that the RBF predicts TDS more accurately than the TDNN (Fig. 5, Fig. 7).
The prediction of TDS concentrations in the river may be beneficial for water quality management to make proper decisions in using river water for irrigation.
Similar research on predicting TDS in the Ajichay River, Iran, was carried out by Kanani et al. (2008) using MLP and TDNN methods at one station. The R2 values between the observed and the predicted TDS concentrations were 0.859 and 0.949, respectively. Their only input to the model was the amount of flow. In our study, however, eight parameters, pH, Na+, Mg2+, carbonate (CO3−2), HCO3−1, Cl−, Ca2+ and TH, were used as input parameters to the ANN. We also carried out a sensitivity analysis to determine the role of each input parameter in predicting TDS in the river. Asadollahfardi et al. predicted the TDS of the Talkheh Rud River, Iran, using MLP and ELMAN methods. They studied two stations, and the R2 values between the observed and the predicted TDS concentrations were 0.964 and 0.96, respectively. Their only input was the rate of flow in the river, and the R2 in our study is larger than in theirs. Nemati et al. applied an ANN to predict the TDS of the Siminehrud River, Iran, and obtained an R2 of 0.841. The R2 of our study was 0.997 for the RBF network, and we applied factor analysis to select suitable input parameters.
3.4. Sensitivity Analysis
Sensitivity analysis is a method to assess the importance of each input parameter for the value of the output parameter. We decreased and increased each input parameter by 20% while keeping the other input parameters unchanged. The impact of each parameter on the prediction of the TDS concentrations was then identified.
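The one-at-a-time perturbation described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure run on the trained TDNN: the linear placeholder model, the generic input names, the synthetic inputs and the choice of mean absolute change in the output as the impact measure are all hypothetical.

```python
import numpy as np

def sensitivity(predict, x, names, change=0.20):
    """Perturb one input at a time by -change and +change; report the mean shift."""
    base = predict(x)
    effect = {}
    for j, name in enumerate(names):
        shifts = []
        for factor in (1.0 - change, 1.0 + change):
            perturbed = x.copy()
            perturbed[:, j] *= factor              # all other inputs stay unchanged
            shifts.append(np.mean(np.abs(predict(perturbed) - base)))
        effect[name] = float(np.mean(shifts))
    return effect

# Placeholder for a trained network (assumption): a fixed linear model.
placeholder_weights = np.array([3.0, 1.0, 0.1])

def placeholder_model(x):
    return x @ placeholder_weights

rng = np.random.default_rng(2)
x = rng.uniform(1.0, 2.0, size=(120, 3))
names = ["input_a", "input_b", "input_c"]          # generic names, not real parameters

effect = sensitivity(placeholder_model, x, names)
ranking = sorted(effect, key=effect.get, reverse=True)
print(ranking, effect)
```

Ranking the inputs by this impact gives the kind of ordering reported for a sensitivity analysis, with the caveat that one-at-a-time perturbation ignores interactions between inputs.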
Fig. 8 presents the results of the sensitivity analysis using the TDNN network. As indicated in Fig. 8, Ca2+ and SO42− had the greatest impact on the TDS prediction, followed by Cl−, HCO3−1 and TH, in that order. Except for SO42−, the results agree with the factor analysis results used for the selection of input parameters.
We summarize the results and discussion of using the RBF and the TDNN to predict the TDS of the river Zayanderud at the Mosian monitoring station as follows:
For TDS prediction with the TDNN, the R2 and the IA between the predicted and the observed data were 0.957 and 0.986, respectively, which means that the TDNN results are acceptable.
The R2 and the IA between the predicted and the observed data for TDS prediction with the RBF were 0.997 and 0.999, respectively. The TDNN contained 2 hidden layers with 15 neurons in each layer, and the RBF had one hidden layer containing 100 neurons.
The MSE, MBE and RMSE for the TDNN were 0.0006, 0.0603 and 0.843, respectively. For the RBF neural network, these errors were 0.0001, 0.43 and 0.516, respectively.
The RBF is more accurate than the TDNN in predicting TDS in the river Zayanderud.
The sensitivity analysis indicated that Ca2+ and SO42− had the highest effect on the TDS prediction. According to its results, all the parameters selected by the factor analysis played an important role in changes of TDS. SO42− was not identified by the factor analysis.
We wish to acknowledge Mr. Ernest Rammel's assistance in editing our manuscript.
1. Hossaini Abari H. Zayandehrood from source to swamp. 2nd ed. Esfahan: Golha press; 2000.
2. Asadollahfardi G, Taklify A, Ghanbari A. Application of artificial neural network to predict TDS in Talkheh Rud River. J Irrig Drain Eng. 2012;138:363–370.
3. Maier HR, Dandy GC. The use of artificial neural networks for the prediction of water quality parameters. J Water Resour Res. 1996;32:1013–1022.
4. Zhang Q, Stanley SJ. Forecasting raw-water quality parameters in the North Saskatchewan River by neural network modeling. Water Res. 1997;1354:72–79.
5. Huang W, Foo S. Neural network modeling of salinity variation in the Apalachicola River. Water Res. 2000;36:356–362.
6. Misaghi F, Mohammadi K. Estimating water quality changes in the Zayandeh Rud River using artificial neural network model. 2000. In: written for presentation at CSAE/SCGR;
7. Kanani SH, Asadollahfardi G, Ghanbari A. Application of artificial neural network to predict total dissolved solid in Achechay River basin. J World Appl Sci. 2008;4:646–654.
8. Nemati S, Naghipour L, Fazeli Fard MH. Artificial neural network modeling of total dissolved solid in the Simineh River, Iran. J Civil Eng Urban. 2014;4:8–14.
9. Moienian M. The natural landscape of the Zayandehrood River. Esfahan: Jahad Daneshgahi; 1999.
10. Kaiser HF. The application of electronic computers to factor analysis. Educ Psychol Meas. 1960;20:141–151.
11. Sarmad Z, Bazargan A, Hejazi A. Research methods in the behavioral sciences. 2nd ed. Tehran: Nshragah Institute; 1997.
12. Lapin L. Probability and statistics for modern engineering. Tamuri M, Rezaeian M, editors. 1st ed. Tehran: Univ. of Technology; 2007.