Combination of ANNs and heuristic algorithms in modelling and optimizing of Fenton processes for industrial wastewater treatment

This study evaluated the COD removal performances of the classical-Fenton and photo-Fenton processes by using different prediction models. To optimize both Fenton processes, which were performed in batch reactors, H2O2 dose, Fe(II) dose, H2O2/Fe(II) rate, and contact time were taken as the independent variables of the prediction models. Besides response surface methodology (RSM), three neural networks were used to predict the behavior of the dependent variable more reliably and effectively at different values of the relevant parameters. A Multi-Layer Perceptron trained by Levenberg–Marquardt (MLP-LM), and Multi-Layer Perceptron and Single Multiplicative Neuron models trained by the Particle Swarm Optimization algorithm (MLP-PSO; SMN-PSO) were studied. The prediction performances of the models were evaluated by the Root-Mean-Square Error (RMSE) and Mean Absolute Percent Error (MAPE) criteria. Regression analysis was applied to assess the performance of the best model. The results from both criteria indicated that the SMN-PSO model produced the best predictive results in almost all cases. Moreover, the key process parameters were determined by applying the genetic algorithm to the SMN-PSO model outputs. The optimized conditions achieved the optimum removal with over 99% desirability. The optimum Fe(II) dose was determined as 399.99 mg/L in both Fenton processes. The optimum H2O2 doses were 726.18 and 894.07 mg/L, and the removal efficiencies were 86.50 and 87.49% for classical Fenton and photo-Fenton, respectively. As a result, the obtained data make it possible to simulate and improve different Fenton processes and to determine the optimum process parameters for wastewater with similar characteristics without many experiments, which are difficult and costly.


Introduction
For this reason, the toxicity of cosmetic wastewater should be eliminated or reduced to the required limits before discharge to receiving environments.
Chemical and biological treatment technologies are applied in the treatment of cosmetic wastewater. The most investigated technologies have been coagulation/flocculation (El-Gohary et al. 2010), dissolved ozone flotation (Wiliński et al. 2017), electro-coagulation, membrane systems (Monsalvo et al. 2014), submerged membrane bioreactors (Friha et al. 2014), up-flow anaerobic sludge blanket reactors (Puyol et al. 2011), advanced oxidation processes (AOPs) such as the Fenton (Naumczyk et al. 2014) and photo-Fenton processes (Muszyński et al. 2019), and catalytic wet peroxide oxidation (Bautista et al. 2010). Compared to other treatment methods, AOPs are considered expensive because of their high energy and chemical requirements (Oller et al. 2011). The common feature of AOPs, which are applied under different operating conditions, is the formation of hydroxyl radicals at normal pressure and room temperature. The hydroxyl radical is a strong, non-selective oxidant that reacts through three mechanisms: hydrogen abstraction, radical addition, and electron transfer. Moreover, AOPs are continually being improved with new equipment and more efficient methods (Paździor et al. 2019). AOPs can be grouped as Fenton-based, ozone-based, photo-catalytic, electrochemical (EAOPs), and others. The Fenton-based AOPs are Fenton, electro-Fenton, sono-Fenton, photo-Fenton, photo-Fenton/TiO2, photo-sono-Fenton, photo-electro-Fenton, and sono-electro-Fenton. Fenton processes are preferred in the treatment of various wastewaters because of their flexible operation, simple setup, and ability to operate over a wide temperature range and at atmospheric pressure (Fernandes et al. 2018).
The performance of a treatment method depends on determining the optimum operating conditions. For this purpose, a limited number of trials are performed according to designed experiment sets, and the optimum conditions are determined from the resulting data. Recently, Artificial Neural Networks (ANNs) and Response Surface Methodology (RSM) have come forward as effective experimental modeling and optimization methods. In the literature, several approaches have been used to model Fenton-based processes and to predict their behavior under given operational parameters. Most of these approaches are statistical models such as RSM. More recently, machine learning models such as ANNs have been applied as alternative modeling and prediction tools. Although ANNs are the subject of many studies on time series prediction (Egrioglu et al. 2013; Cagcag Yolcu et al. 2018; Yolcu et al. 2019), only a limited number of studies have used different ANN types to model Fenton-based processes (Elmolla et al. 2010; Zarei et al. 2010; Jaafarzadeh et al. 2012; Sabour and Amiri 2017; Radwan et al. 2018; Talwar et al. 2019; Tolba et al. 2019; Baştürk and Alver 2019; Cüce et al. 2021; Gholizadeh et al. 2021). A dependent variable is modeled through certain independent variables essentially in order to predict it at different, and especially untested, values of the independent variables. RSM is known in the literature as a traditional modeling and prediction tool. However, this methodology relies on assumptions such as a linear model form and a specific error distribution. These assumptions are generally not met by data sets in environmental sciences and ecological research. When the relationships among variables are not linear, its performance will be insufficient. To handle such a case, either some variables must be transformed or a state-of-the-art modeling tool must be used (Lek et al. 1996).
As computing systems, ANNs are designed to learn from data and to generate new information in a manner similar to the human brain. An ANN learns the data structure through its hidden layer and offers more suitable models than the traditional methodology when explicit programming is impossible or exceedingly difficult.
The main purpose of applying the modeling tools in this study is to predict the behavior of the dependent variable more reliably and effectively at different values of the relevant parameters. For this purpose, the effects of H2O2 dose, Fe(II) dose, H2O2/Fe(II) rate, and contact time on the COD removal performances of classical and photo-Fenton processes from cosmetic wastewater were evaluated by four different prediction models. The first of the applied models was the traditional model, RSM. The second was a multi-layer perceptron (MLP) trained with the Levenberg-Marquardt algorithm (MLP-LM). MLP and Single Multiplicative Neuron networks trained by the particle swarm optimization algorithm (MLP-PSO; SMN-PSO) were the distinctive models and innovative aspects of this study. The generalization abilities of the models were assessed via training, validation, and test sets. The model results were compared comprehensively by the RMSE and MAPE criteria. The performance of the best model was determined by regression analysis.
The key process parameters were obtained by applying a genetic algorithm.

Materials
In this study, the wastewater formed during the production of automobile care products was used. The real wastewater samples were supplied by a company in Nevşehir, Turkey. The automobile care products produced by this company are car washing shampoos, multi-purpose cleaning products, brake pad cleaners, tire care/cleaning products, car waxes, air conditioning care/cleaning products, lubricants for car care, and special cleaning products for rim care.
Wastewater was collected as 2-h composite samples, taken to the laboratory in a transport box, and refrigerated at 4°C until use. The main properties are listed in Table 1.

Experimental procedure
The photo-Fenton system has four main parts. As shown in Fig. 1, the system consists of 8 W UV-C radiation lamps mounted in parallel, two magnetic stirrers (Mtops MS200), 500 mL reactors, and a light-proof wooden cabin. The dimensions of the cabin are 42 cm × 50 cm × 50 cm (H × L × W). The classical-Fenton system comprises 500 mL reactors and a jar test flocculator (Velp JLT6). Both processes were conducted in batch mode with 200 mL wastewater samples with an inlet COD concentration of 1129 mg/L. The influences of Fe(II) dose (50-400 mg/L), H2O2 dose (200-1050 mg/L), H2O2/Fe(II) rate (200/150-1050/400), and contact time (0-60 min) on COD removal were investigated for both Fenton processes. The wastewater sample and reagents were first mixed at 300 rpm for two minutes. Then, the mixing speed was decreased to 90 rpm, and mixing continued for 45 min in the classical-Fenton reactors and 20 min in the photo-Fenton reactors. All experiments were performed at 20 ± 3 °C and pH 3. The pH of the wastewater was adjusted with 6 N H2SO4 and 3 N NaOH. After the precipitation process, the mixture was filtered through 0.45 µm membrane filters. COD concentration was analyzed with a thermoreactor (Hach LT200) and a spectrophotometer (Hach DR3900) according to the Closed Reflux Method (Baird et al. 2017). The pH value was measured with a multi-parameter meter (Hach HQ40d). COD removal performance was calculated from the final ($C_f$) and inlet ($C_i$) COD concentrations (mg/L) as follows:

$$\text{COD removal (\%)} = \frac{C_i - C_f}{C_i} \times 100$$
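The removal calculation described above can be sketched in a few lines of Python. The function name is hypothetical, and the final COD value in the example is an illustrative number rather than a measurement from this study; only the inlet concentration of 1129 mg/L comes from the text.

```python
def cod_removal_percent(c_inlet: float, c_final: float) -> float:
    """Return COD removal efficiency (%) from inlet and final COD in mg/L."""
    if c_inlet <= 0:
        raise ValueError("inlet COD must be positive")
    return (c_inlet - c_final) / c_inlet * 100.0


# Inlet COD from the text; the final COD is a hypothetical value.
removal = cod_removal_percent(1129.0, 282.25)
print(f"COD removal: {removal:.2f} %")  # prints "COD removal: 75.00 %"
```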

Multi-layer perceptron neural networks (MLP-NN)
Feed-forward neural networks (FF-NNs) are among the most popular architectures because of their structural flexibility, good representational capabilities, and the large number of available training algorithms (Haykin 1999). FF-NNs are designed to replicate the human brain's ability to create and derive new information. The multi-layer perceptron neural network (MLP-NN) was first proposed by Werbos to solve nonlinear problems by means of an architecture that includes one or more hidden layers (Werbos 1974). Rumelhart et al. later improved the MLP-NN method (Rumelhart et al. 1986). MLP-NN methods have been widely used in many areas such as prediction, classification, and modeling. An MLP-NN comprises neurons arranged in an input layer, an output layer, and one or more hidden layers. Every neuron is connected to all neurons of the next layer. The number of neurons in the hidden layer has a critical effect on the performance of the network (Li et al. 2017). The data-driven character that comes from the hidden layers enables these neural networks to produce flexible and adaptable results for nonlinear problems. The neurons are connected by weights, and the output signal of each neuron is a function of the weighted sum of its inputs, modified by a simple nonlinear transfer (activation) function.
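As an illustration of the layered structure described above, the following minimal sketch computes the forward pass of an MLP with one hidden layer. It is not the MATLAB code used in this study: the sigmoid hidden activation, the linear output neuron, the layer sizes, and the random weights are all assumptions chosen for the example.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """Forward pass of a one-hidden-layer MLP with a linear output neuron."""
    # Each hidden neuron applies the activation to the weighted sum of all inputs.
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(w_row, x)) + b)
        for w_row, b in zip(hidden_w, hidden_b)
    ]
    # The output neuron combines all hidden activations.
    return sum(w * h for w, h in zip(out_w, hidden)) + out_b


rng = random.Random(0)
n_inputs, n_hidden = 2, 3  # hypothetical sizes
hidden_w = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
hidden_b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
out_w = [rng.uniform(-1, 1) for _ in range(n_hidden)]
out_b = rng.uniform(-1, 1)

# Two scaled inputs (illustrative values only).
y_hat = mlp_forward([0.5, 0.8], hidden_w, hidden_b, out_w, out_b)
```

Changing `n_hidden` changes the number of weights to be trained, which is the architecture selection issue mentioned in the text.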

Single multiplicative neuron model (SMN)
SMN was first introduced to the literature by Yadav et al. (2007). Unlike the MLP-NN, the SMN structure has just one neuron in place of a hidden layer. This makes SMN more advantageous than MLP-NN, especially because the problem of determining an appropriate architecture does not arise. There is only one neuron in the SMN structure, and a multiplication operator instead of an addition operator is applied to the signals arriving at the neuron. There are m inputs, shown with $(X_1, X_2, \ldots, X_m)$, and just one output, given by $y$; $f$ is the activation function, which specifies the nonlinear relationship between the inputs and the output. The net ($net_i$) and the output ($y_i$) values of the neuron are calculated as:

$$net_i = \Omega(X_i, \Theta) = \prod_{j=1}^{m} \left( w_j X_{ij} + b_j \right) \tag{1}$$

$$y_i = f(net_i) \tag{2}$$

In Eq. (1), the function $\Omega(X, \Theta)$ is the product of the weighted inputs, $X_{ij}$ is the $i$th sample of the $j$th input, and $\Theta = (w_1, \ldots, w_m, b_1, \ldots, b_m)$ is the vector that includes the weights ($w_j$) and the biases ($b_j$) of the model.
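The single multiplicative neuron described above can be sketched as follows. The text does not name the activation function f, so the logistic function used here is an assumption, as are the function name and the example weights.

```python
import math


def smn_output(x, weights, biases):
    """Output of a single multiplicative neuron for one sample x."""
    net = 1.0
    # Multiply, rather than add, the weighted and biased inputs.
    for xj, wj, bj in zip(x, weights, biases):
        net *= wj * xj + bj
    # Logistic activation (assumed; the source only calls it f).
    return 1.0 / (1.0 + math.exp(-net))


# Two inputs with hypothetical weights and biases.
y = smn_output([0.5, 0.8], weights=[1.2, -0.7], biases=[0.1, 0.9])
```

With m inputs the model has only 2m free parameters, which is why no hidden-layer size has to be selected.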

Particle swarm optimization (PSO)
PSO is a heuristic optimization method first proposed by Kennedy and Eberhart (1995). It was later improved by adding some coefficients to the optimization process (Shi and Eberhart 1999; Ma et al. 2006). The most significant feature of this algorithm is that it searches for the optimum from several points at the same time. This feature helps the PSO algorithm escape local optima and approach the global optimum. Recently, the PSO algorithm has been widely applied because of its high solution quality, simplicity, and good convergence properties. In the PSO method, each particle has a position, which represents a candidate solution of the optimization problem, and a velocity, which shows its direction in the search space. The best position found by each particle is stored in its Pbest vector, and the best position found by all particles, which represents the global optimum, is stored in the Gbest vector. In this study, a modified PSO was used to train the SMN. The modified PSO differs from the traditional one in that the cognitive ($c_1$) and social ($c_2$) coefficients and the inertia parameter ($w$) are recalculated at each iteration by using Eqs. 3-5.
Here, ($c_{1i}$, $c_{1f}$) is the interval for the cognitive coefficient, ($c_{2i}$, $c_{2f}$) is the interval for the social coefficient, and ($w_1$, $w_2$) is the interval for the inertia parameter. $maxt$ is the maximum number of iterations, and $t$ is the current iteration number. Finally, the new values of the positions and velocities are calculated with Eqs. 6-7, where $rand_1$ and $rand_2$ are random numbers between 0 and 1. After the predetermined number of iterations is reached, Gbest is taken as the optimal parameter set of the system.
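The procedure above can be illustrated with a minimal PSO sketch in which the coefficients change at every iteration. Eqs. 3-7 are not reproduced in this text, so the linear schedules, the parameter intervals, and the toy sphere function used as the fitness are assumptions for illustration; only the standard velocity and position updates with Pbest and Gbest are taken as given.

```python
import random


def linear_schedule(start, end, t, max_t):
    """Linearly interpolate from start (t = 0) to end (t = max_t)."""
    return start + (end - start) * t / max_t


def pso_minimize(fitness, dim, n_particles=20, max_t=200, bounds=(-5.0, 5.0),
                 c1_range=(2.0, 0.5), c2_range=(0.5, 2.0), w_range=(0.8, 0.4),
                 seed=0):
    """Minimize fitness with PSO using iteration-dependent c1, c2 and w."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for t in range(1, max_t + 1):
        # Recompute the coefficients for the current iteration (assumed linear).
        c1 = linear_schedule(*c1_range, t, max_t)
        c2 = linear_schedule(*c2_range, t, max_t)
        w = linear_schedule(*w_range, t, max_t)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + cognitive + social terms.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(p):
    """Toy fitness with its minimum at the origin (stands in for a training error)."""
    return sum(v * v for v in p)


best_pos, best_val = pso_minimize(sphere, dim=2)
```

To train a network in this way, the particle position would hold the weights and biases, and the fitness would be a prediction error such as RMSE on the training set.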

Genetic algorithm (GA)
GA was introduced by Holland (1992) and further developed by Goldberg (1989). GA is one of the heuristic optimization methods used to find good solutions to complicated problems. Its main components are a population and the selection, crossover, and mutation operators. First, random solutions (individuals) that have several features (chromosomes) are generated in the algorithm. Then, in analogy with the laws of genetics, crossover and mutation occur in the chromosomes to create a second generation of individuals with different properties. The GA calculations were performed in MATLAB 2018b.
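The selection, crossover, and mutation steps described above can be sketched with a small real-coded GA. This is not the MATLAB 2018b routine used in the study: the operators (tournament selection, blend crossover, Gaussian mutation, elitism), their settings, and the toy objective on a normalized [0, 1] search range are assumptions made for illustration.

```python
import random


def toy_fitness(ind):
    """Hypothetical smooth objective with its maximum at (0.7, 0.3)."""
    x1, x2 = ind
    return 1.0 - (x1 - 0.7) ** 2 - (x2 - 0.3) ** 2


def tournament(pop, fits, rng, k=3):
    """Return the fittest of k randomly drawn individuals."""
    best = max(rng.sample(range(len(pop)), k), key=fits.__getitem__)
    return pop[best]


def ga_maximize(fitness, dim=2, pop_size=30, generations=60,
                crossover_rate=0.9, mutation_rate=0.1, seed=1):
    """Maximize fitness over [0, 1]^dim with a simple real-coded GA."""
    rng = random.Random(seed)
    # Initial population of random individuals.
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(ind) for ind in pop]
        elite = pop[max(range(pop_size), key=fits.__getitem__)][:]
        children = [elite]  # elitism: keep the best individual unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop, fits, rng), tournament(pop, fits, rng)
            if rng.random() < crossover_rate:
                a = rng.random()  # arithmetic (blend) crossover
                child = [a * g1 + (1 - a) * g2 for g1, g2 in zip(p1, p2)]
            else:
                child = p1[:]
            for d in range(dim):
                if rng.random() < mutation_rate:
                    # Gaussian mutation, clipped to the [0, 1] search range.
                    child[d] = min(1.0, max(0.0, child[d] + rng.gauss(0.0, 0.05)))
            children.append(child)
        pop = children
    fits = [fitness(ind) for ind in pop]
    best = max(range(pop_size), key=fits.__getitem__)
    return pop[best], fits[best]


best_ind, best_fit = ga_maximize(toy_fitness)
```

In an application like the one in this study, `toy_fitness` would be replaced by the output of a trained prediction model, and the genes would be rescaled to the experimental ranges of the operating variables.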

Results and discussion
The results of the batch studies were evaluated in detail, with respect to the aspects summarized below, in a previous article that formed the first step of the study (Cüce and Aydın Temel 2021). The influences of pH, H2O2 dose, Fe(II) dose, H2O2/Fe(II) rate, and contact time were examined to find the key operating conditions in the batch studies. Zero-order, first-order, second-order, and Behnajady-Modirshahla-Ghanbary (BMG) kinetic models were analyzed to understand the COD removal mechanisms of the classical Fenton and photo-Fenton processes. To summarize the first step of the study, the effective pH value was determined as 3 for both processes. Therefore, pH 3 was used as a fixed variable in the present study. The key operating conditions were an Fe(II) dose of 300 mg/L, an H2O2 dose of 1050 mg/L, and an H2O2/Fe(II) rate of 600/300 for the classical-Fenton process. For the photo-Fenton process, these values were 300 mg/L, 900 mg/L, and 600/300, respectively. Under these operating conditions, COD removals of 75% in the classical-Fenton process and 85% in the photo-Fenton process were achieved. According to the kinetics, both Fenton processes were well represented by the BMG model. Consequently, the photo-Fenton process came into prominence because of its higher COD removal with less reagent.
The present study aims to predict the COD removal efficiency of classical and photo-Fenton processes from cosmetic wastewater by using a traditional method (RSM) and three state-of-the-art models (MLP-PSO, MLP-LM, and SMN-PSO). The experiment design of the classical and photo-Fenton processes is summarized in Table 2. An illustration of the combination of ANNs and heuristic algorithms is presented in Fig. 2.

Performance measures
The prediction results produced by the RSM, MLP-LM, MLP-PSO, and SMN-PSO models were evaluated from different perspectives. First, RMSE (Eq. 8) and MAPE (Eq. 9), basic statistical criteria that are widely applied in the prediction literature and that reveal the predictive performance of models, were examined. In addition, scatter diagrams showing the agreement between predictions and actual values were presented.
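The two error criteria named above can be computed as in the following sketch, which uses the standard definitions of RMSE and MAPE (Eqs. 8-9 are not reproduced in this text). The function names and the sample values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual, predicted):
    """Mean absolute percent error (%); observed values must be nonzero."""
    return 100.0 * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)
    ) / len(actual)


# Hypothetical removal efficiencies (%), not data from this study.
actual = [50.0, 80.0, 100.0]
predicted = [49.0, 82.0, 100.0]

print(f"RMSE = {rmse(actual, predicted):.3f}")   # about 1.291
print(f"MAPE = {mape(actual, predicted):.2f} %")  # about 1.50 %
```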
The other perspective is the analysis of a regression model fitted between the predictions and the target values, given by Eq. (10), and of some of its properties. For a successful prediction tool, the determination coefficient ($R^2$) and the estimated regression coefficient ($\hat{\beta}$) of this model should be equal or very close to 1.
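A check of this kind can be sketched as follows. The exact form of Eq. (10) is not reproduced in this text, so the sketch assumes an ordinary least-squares line of the actual values on the predictions (with an intercept) and reports its slope and determination coefficient; the function name and the data are hypothetical.

```python
def slope_and_r2(predicted, actual):
    """Slope and R^2 of the least-squares line actual = a + beta * predicted."""
    n = len(actual)
    mean_x = sum(predicted) / n
    mean_y = sum(actual) / n
    sxx = sum((x - mean_x) ** 2 for x in predicted)
    syy = sum((y - mean_y) ** 2 for y in actual)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predicted, actual))
    beta = sxy / sxx               # regression coefficient
    r2 = sxy ** 2 / (sxx * syy)    # determination coefficient
    return beta, r2


# Hypothetical predicted and observed removal efficiencies (%).
predicted = [60.0, 70.0, 80.0, 90.0]
actual = [61.0, 69.0, 81.0, 89.0]

beta_hat, r_squared = slope_and_r2(predicted, actual)
```

Values of `beta_hat` and `r_squared` near 1 correspond to points lying close to the 1:1 line in a scatter plot of predictions against observations.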

RSM modeling and optimization
The second-order polynomial model structure for the case where there are k independent variables is written as:

$$y = \beta_0 + \sum_{i=1}^{k} \beta_i X_i + \sum_{i=1}^{k} \beta_{ii} X_i^2 + \sum_{i<j} \beta_{ij} X_i X_j + \varepsilon$$

The results of the data determined in terms of non-coded factors with RSM for all experimental designs of both processes are listed in Table 3. As seen in Table 1, MAPE values indicated that the predictions obtained by RSM in 5 of

Neural networks-based modeling
Artificial neural networks have been widely used in many scientific areas. In particular, thanks to the rapid development of computer technology in recent years, neural network-based prediction models have come into frequent use. One of these models is the MLP-LM. Derivative-based training algorithms such as the Levenberg-Marquardt algorithm can sometimes get stuck in a local optimum, whereas population-based particle swarm optimization is less prone to this risk. For this reason, unlike existing studies in the literature that use MLP for similar purposes, MLP and SMN trained by PSO were used as prediction tools in this study. Unlike MLP, SMN has no architecture selection problem because it has only one neuron, which makes it easier to apply. Moreover, using a multiplicative aggregation function instead of an additive one makes SMN more flexible and successful, especially in solving nonlinear problems. The use of SMN as a predictive tool in this field is another pioneering and distinguishing feature of this study compared to other studies in the literature. In this respect, this study is the first in the related literature to take all the above-mentioned issues into account. The analysis and modeling with MLP-LM, MLP-PSO, and SMN-PSO were conducted with MATLAB program codes written by the researchers of this study. MLP and SMN structures with two inputs are given in Fig. 3.
For all experiment designs, the prediction results produced by the MLP-LM, MLP-PSO, and SMN-PSO models are listed in Tables 4 and 5 for the RMSE and MAPE criteria, respectively. These tables also give the success rankings of the prediction models according to the corresponding criteria. The findings in Tables 4 and 5 show that, in terms of both criteria, all three NN-based prediction models produced better predictive results than RSM in all cases but one. Among the three NN-based models, the best predictions were produced by the SMN-PSO model according to the average success rankings of both criteria. The main reason is that SMN-PSO, being trained by PSO, has less risk of getting stuck in local optima than MLP-LM trained with the Levenberg-Marquardt algorithm. In addition, SMN has no architecture selection problem and, thanks to its multiplicative aggregation function, is more flexible and successful than MLP in problems dominated by nonlinearity. Accordingly, SMN-PSO produced predictions with errors of around 2-3%, and in many cases much lower, down to less than 1%, for the training, validation, and test sets as well as for the whole data set of each experiment. Another important reason for preferring NN-based prediction tools is that, unlike RSM, they analyze the data in separate training, validation, and test sets and therefore produce more reliable and consistent results. In particular, for the out-of-sample test sets, the MAPE values were about 1% for almost all experiments. In the 2nd, 7th, and 8th experiments, errors even below 1% were observed in the out-of-sample data set for SMN-PSO. These findings indicate that SMN-PSO produced highly satisfactory and consistent predictions even for out-of-sample data.
One of the most important implications of these results is that, where experimental designs are costly or difficult, SMN-PSO can produce satisfactory predictions for out-of-sample data without the need for any additional experiments.
Another way to reveal the superior predictive ability of a prediction tool is to examine some properties of a linear regression model to be established between predictions and target values. Such a strategy was also followed for SMN-PSO, which showed superior performance among the three NN-based prediction models. Table 6 presents the findings for this regression analysis.
The findings given in Table 6 can be examined from three angles. First, for all experiments, the estimated β coefficients of the regression equation were very close to 1, a sign that the predictions produced by SMN-PSO were very close to the actual observations. Second, the 95% confidence intervals of the β coefficients contained 1 and were very narrow. In other words, the β coefficients did not differ significantly from 1 at the 95% confidence level. Third, the determination coefficient, $R^2$, was very close to 1 for each experiment, which indicates a very strong linear relationship between SMN-PSO predictions and actual removal performances. This is another feature that a superior prediction tool like SMN-PSO should have.
Moreover, in addition to these statistical evaluations, the predicted and observed removal efficiencies were visualized by scatter diagrams for SMN-PSO, which gave the best prediction performance among the NN-based models used in this study. When most of the points in a scatter plot lie close to the reference line, the predictions are satisfactory. The scatter plots in Fig. 4 show exactly this situation: the points are spread very close to the line.

Optimization via genetic algorithm
After the modeling process, as the final goal of the study, the operating conditions applied in both Fenton processes were optimized. For this purpose, GA was used to determine the values of the independent variables that maximize the removal efficiency. In general, the objective function is given as in Eq. (12):

$$y = f(X_1, X_2) \tag{12}$$

where y is the removal performance of each Fenton process, and $X_1$ and $X_2$ represent the independent variables such as H2O2 dose, Fe(II) dose, H2O2/Fe(II) rate, and contact time. The optimization process was applied to SMN-PSO, which gave the best prediction. In this case, following the SMN definition with two inputs, the objective function can be given as follows:

$$y = f\big( (w_1 X_1 + b_1)(w_2 X_2 + b_2) \big) \tag{13}$$

The basic framework of the optimization process for each experiment is summarized in Table 7. An important advantage of modeling with NNs is that removal efficiencies corresponding to parameter values not covered by the performed experiments can also be revealed. Moreover, by using the fitness function of a trained neural network, an optimization

Comparison of the results
When the results obtained from each model were examined in detail, the NN models gave better predictions than RSM. The success rankings of all models for the eight experiments, determined according to the RMSE and MAPE criteria, also support this. The averages of the success rankings are given in Figs. 5 and 6. As seen in Figs. 5 and 6, SMN-PSO performed much better than the other NN-based models for the training, validation, and test sets. Moreover, when all data sets were considered, SMN-PSO also showed better prediction performance than both the other NN-based models and RSM.

Conclusion
In this study, COD removal performances of classical and photo-Fenton processes from cosmetic wastewater were predicted by using statistics-based RSM and three NN-based machine learning models.
• Among all methods, SMN-PSO was observed to be the model that produced the best predictive results in almost all cases.
• SMN-PSO often produced predictions with an RMSE value of around 1% or less.
• In terms of the MAPE criterion, a measure of the error percentage, SMN-PSO revealed errors of less than 2% in most cases.
• Considering the average of the success rankings, SMN-PSO produced far superior prediction results compared to the other models in terms of both criteria.
The main reasons why SMN-PSO exhibits superior prediction success are as follows:
• Since SMN-PSO is trained by PSO, it is less likely to get caught in local optimum traps than models trained with derivative-based algorithms such as LM.
• With its multiplicative aggregation function, SMN-PSO adapts better to nonlinear problems.
• Moreover, unlike MLPs, SMN-PSO does not involve the problem of determining the architecture.
By using the genetic algorithm, the values of the independent variables that maximize COD removal performance were obtained within a particular search space. The COD removal performances corresponding to these optimum values had high desirability values.
With this study, the key process conditions can be determined more accurately by modeling the findings obtained from the limited number of experiments conducted to determine the performance of the treatment processes. In this way, technical difficulties, costs, manpower, and time requirements can be reduced. The estimates of the models trained with the available data allow the process to work efficiently. In future studies, statistical prediction models and NN-based machine learning models can be combined to obtain better predictions of removal performance.