“Electricity price forecasting in Turkey with artificial neural network models”


In network learning, the weights are adjusted according to a learning algorithm with the help of training inputs. A common learning technique computes the error at the output layer and propagates it backward through the hidden layers. It is known as back propagation of error with the modified delta rule (Rumelhart, 1986). The main aim of all such algorithms is to minimize the error. A frequently used error function can be defined as follows.
$E = \frac{1}{2} \sum_{k=1}^{n} (t_k - o_k)^2$ (3)
where $n$ is the total number of output nodes, $o_k$ is the network output at the $k$th output node and $t_k$ is the target output at the $k$th output node. Training algorithms attempt to reduce the global error by adjusting weights and biases. Various training algorithms have been proposed to improve neural network learning, such as gradient descent, conjugate gradients, quasi-Newton and Levenberg-Marquardt.
Gradient descent is the standard back-propagation algorithm, in which the network weights are moved along the negative of the gradient of the performance function. The back propagation algorithm with gradient descent is given as follows:
$\Delta w_k = -\alpha_k g_k$ (4)
where $\Delta w_k$ is the vector of weight changes, $g_k$ is the current gradient and $\alpha_k$ is the learning rate, which determines the length of the weight update. The major disadvantages of standard back propagation are its relatively slow convergence and its tendency to become trapped in local minima (Azar, 2013). To avoid oscillations and to reduce the sensitivity of the network, a momentum term is added to the gradient descent algorithm, as shown in the following formula:
$\Delta w_k = -\alpha_k g_k + p \Delta w_{k-1}$ (5)
where $p$ is the momentum parameter. Furthermore, momentum allows the search to escape from small local minima. Neither gradient descent nor gradient descent with momentum produces the fastest convergence, and both can be too slow to converge in practice.
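The two update rules can be illustrated with a minimal Python sketch. The quadratic error surface, the starting point, the learning rate of 0.05 and the momentum value of 0.5 are all assumptions chosen for illustration; none of them comes from the study.

```python
def error(w):
    """Toy quadratic error surface (an assumption, not the study's network)."""
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)


def gradient(w):
    """Gradient g of the toy error surface above."""
    return [w[0], 10.0 * w[1]]


def train(w, learning_rate=0.05, momentum=0.0, steps=100):
    """Apply delta_w_k = -learning_rate * g_k + momentum * delta_w_{k-1}."""
    w = list(w)
    delta = [0.0] * len(w)  # previous weight change, zero before the first step
    for _ in range(steps):
        g = gradient(w)
        delta = [-learning_rate * gi + momentum * di for gi, di in zip(g, delta)]
        w = [wi + di for wi, di in zip(w, delta)]
    return w


plain = train([1.0, 1.0], momentum=0.0)          # plain gradient descent
with_momentum = train([1.0, 1.0], momentum=0.5)  # gradient descent with momentum
print(error(plain), error(with_momentum))
```

With momentum set to zero the update reduces to plain gradient descent. On this particular surface the momentum run ends with a lower error after the same number of steps, but that ordering depends on the chosen constants.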
There has been considerable research on training methods to speed up the convergence of neural networks. These techniques include such ideas as varying the learning rate, using momentum and rescaling variables. There are four types of algorithms that are commonly used to minimize the network error. These methods are steepest descent, conjugate gradients, Quasi-Newton and Levenberg-Marquardt (Fine, 1999).
If the error function is truly quadratic, Newton's method can find the minimizing weight vector in a single step (Fine, 1999). While the gradient descent algorithm requires only the first partial derivatives, quasi-Newton algorithms improve convergence by also using second-derivative information.
The Newton step $\Delta w_i$ for a second-order Taylor series approximation of $f(w)$ at any current point $w_i$ is obtained from the following equation:
$H(w_i) \Delta w_i = -\nabla f(w_i)$ (6)
Assuming that the Hessian matrix $H$ (the matrix of second derivatives) is non-singular, Eq. (6) can be written as
$\Delta w_i = -H(w_i)^{-1} \nabla f(w_i)$ (7)
The Broyden, Fletcher, Goldfarb and Shanno (BFGS) algorithm has been the most successful algorithm in quasi-Newton method studies (Prasad, 2011).
The Levenberg-Marquardt (LM) algorithm approximates the Hessian matrix as
$H \approx J^T J$ (8)
where $J$ is the Jacobian matrix that contains the first derivatives of the network errors with respect to the weights. LM uses this approximation to update the weights as in the following formula:
$w_{k+1} = w_k - (J^T J + \mu I)^{-1} J^T e$ (9)
where $w$ is the weight vector, $I$ is the identity matrix, $\mu$ is the combination coefficient, $J$ is the Jacobian matrix and $e$ is the error vector. If $\mu$ is zero, this becomes Newton's method with the approximate Hessian; when $\mu$ is large, it becomes gradient descent with a small step size.
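The role of the combination coefficient in the Levenberg-Marquardt update can be seen in a small Python sketch. It assumes a hypothetical two-weight linear model o = w[0] * x + w[1] in place of a neural network, defines the error as output minus target, and solves the 2 x 2 system in closed form; the data points are invented for illustration.

```python
def lm_step(w, xs, ts, mu):
    """One update w_new = w - (J^T J + mu * I)^-1 J^T e for o = w[0] * x + w[1]."""
    # Error vector e = o - t; each Jacobian row is d e_i / d w = [x_i, 1].
    e = [w[0] * x + w[1] - t for x, t in zip(xs, ts)]
    jac = [[x, 1.0] for x in xs]

    # A = J^T J + mu * I (symmetric 2 x 2) and b = J^T e.
    a00 = sum(r[0] * r[0] for r in jac) + mu
    a01 = sum(r[0] * r[1] for r in jac)
    a11 = sum(r[1] * r[1] for r in jac) + mu
    b0 = sum(r[0] * ei for r, ei in zip(jac, e))
    b1 = sum(r[1] * ei for r, ei in zip(jac, e))

    # Solve A * step = b with the closed-form inverse of a 2 x 2 matrix.
    det = a00 * a11 - a01 * a01
    step0 = (a11 * b0 - a01 * b1) / det
    step1 = (a00 * b1 - a01 * b0) / det
    return [w[0] - step0, w[1] - step1]


xs = [0.0, 1.0, 2.0, 3.0]
ts = [1.0, 3.0, 5.0, 7.0]  # invented targets generated by t = 2 * x + 1

newton_like = lm_step([0.0, 0.0], xs, ts, mu=0.0)  # reaches the exact fit in one step
damped = lm_step([0.0, 0.0], xs, ts, mu=1000.0)    # small, gradient-descent-like step
print(newton_like, damped)
```

Because the toy model is linear in its weights, the step with a zero coefficient lands on the least-squares solution immediately, whereas the large coefficient yields a short step of roughly the negative gradient divided by the coefficient.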
Many other learning algorithms, such as the resilient methods and conjugate gradient, are not explained here. Table 1 lists the commonly used neural network learning algorithms that are used in this study. (Guoqiang, 1998).
Neural networks have been employed in volatility forecasting, risk rating of bonds, stock market prediction, option pricing and inflation forecasting (Habib, 2014). In addition, electricity price forecasting with neural networks in deregulated markets has been achieved with reasonable accuracy (Singhal, 2011).

Artificial neural network design.
Despite the many satisfactory characteristics of neural networks, building a neural network for a particular forecasting problem is a challenging task. Generally, artificial neural network design procedures include the following steps: the selection of architecture; the selection of input, hidden and output nodes; the selection of layers; the selection of activation functions; data preprocessing methods; training and test sets; the selection of training algorithm; and the selection of performance measures.
Design decisions on the items listed affect the network performance. The important question is how to develop a specialized structure for a neural network, since there are no well-defined rules and only ad hoc procedures that yield useful results (Haykin, 1999).
Preprocessing makes the forecasting problem more manageable. By preprocessing, the data can be simplified before the actual calculations. Removing noise from the input data improves the performance of the network.
Based on their architecture, artificial neural networks are generally grouped into two categories: feed-forward networks and recurrent networks (Jain, 1996). In feed-forward networks, the output of one layer is used as the input to the following layer. In recurrent networks, a layer can take input from any other layer, so feedback connections are possible. Feed-forward networks are generally preferred for forecasting.
Generally, neural networks consist of input, hidden and output layers. Regarding the number of hidden layers, Zhang experimented with networks with more than two hidden layers, but this did not provide a significant improvement (Guoqiang, 1998). The number of neurons in each layer also has to be determined heuristically. The most common way to do so is by trial-and-error experiments.
Another design factor is the sample size used to train and test the model. In general, the larger the sample size, the more accurate the results. In practice, the sample size is also limited by the availability of data.
No training algorithm currently available guarantees the optimal solution for a general nonlinear optimization problem. The most popular training methods are the back propagation algorithms.
Transfer functions limit the output of neurons. These functions must be differentiable and nondecreasing. Most papers use either the logsig or the tansig function.
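As a rough illustration of the two transfer functions and of how a feed-forward layer uses them, the following Python sketch applies the standard log-sigmoid and hyperbolic tangent sigmoid definitions. The network size, weights, biases and inputs are hypothetical values, not those of the study.

```python
import math


def logsig(x):
    """Log-sigmoid: maps any real input into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tansig(x):
    """Hyperbolic tangent sigmoid: maps any real input into (-1, 1)."""
    return math.tanh(x)


def layer(inputs, weights, biases, transfer):
    """Output of one feed-forward layer: transfer(weights * inputs + biases)."""
    return [
        transfer(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Hypothetical network with 2 inputs, 2 hidden neurons and 1 output neuron.
hidden = layer([0.5, -1.0], [[0.1, 0.4], [-0.3, 0.2]], [0.0, 0.1], tansig)
output = layer(hidden, [[0.7, -0.5]], [0.05], logsig)
print(hidden, output)
```

Both functions are differentiable and nondecreasing, and the output of the hidden layer feeds the output layer, as in a feed-forward network.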
Mean absolute percentage error (MAPE), weighted mean absolute percentage error (WMAPE), mean absolute error (MAE) and root mean square error (RMSE) are widely used to measure network performance. Commonly used network performance measures are shown in Table 2.
[...] a total of 400 [...] in Table 3. [...] Figure 6 shows [...] neuron number [...]
The input data are divided into two parts: 72% of the hours of the total data train the models and 28% of the hours test them. Figure 7 shows the data with its training and testing parts.
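The split can be checked against the hour counts reported elsewhere in the paper (26,304 hours of historical data in total, 7,296 hours of test data). The short Python sketch below does only that arithmetic; the variable names are illustrative.

```python
TOTAL_HOURS = 26_304  # total hours of historical data reported in the study
TEST_HOURS = 7_296    # hours of test data reported in the study

train_hours = TOTAL_HOURS - TEST_HOURS
train_share = 100.0 * train_hours / TOTAL_HOURS
test_share = 100.0 * TEST_HOURS / TOTAL_HOURS

print(train_hours, round(train_share, 1), round(test_share, 1))
```

This gives 19,008 training hours, or about 72.3% of the data for training and 27.7% for testing, consistent with the rounded 72% and 28% figures.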

Fig. 8. Number of hidden layer neurons and transfer functions price forecast results
Because the Levenberg-Marquardt and BFGS algorithms achieve the best MAPE values, they are shown separately in Figure 9. When the MAPE results of the two algorithms are compared as the number of hidden layer neurons varies, the Levenberg-Marquardt values stay within a fixed range, whereas the BFGS MAPE values increase with the number of hidden layer neurons. The Levenberg-Marquardt algorithms generally appear to be more successful than the BFGS algorithms.
Source: author's calculation.
The electricity price forecasts produced by the model for the 7,296 hours of test data are shown together with the actual data in Figure 8. The average difference between the model output and the actual values over the 7,296 hours of test data is calculated as -1.12 TL/MWh.

Fig. 10. Actual data & model prediction data comparisons
The error rate on 29.09.2014 exceeds 1000%. We limit the graph to 1000% for readability. On 29.09.2014 the actual day-ahead price is 0.92 TL/MWh, whereas the model's estimate is about 73.13 TL/MWh. Another very low price, 0.79 TL/MWh, occurred on 31.12.2014 at 23.00; the model's estimate for that hour is 119.88 TL/MWh. Such abnormal day-ahead electricity prices therefore affect the estimation negatively. The histogram of prices for 03.03.2014-31.12.2014 (7,296 hours of test data) is shown in Figure 11. The day-ahead electricity price generally ranges between 100 TL/MWh and 250 TL/MWh.
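The size of these error rates follows directly from the percentage error calculation, as the Python sketch below shows. It assumes the standard definition of the absolute percentage error, |actual - forecast| / |actual|, averaged to obtain MAPE; the paper's own formulas in Table 2 are not reproduced here, and the third data point is a hypothetical hour added for contrast.

```python
def absolute_percentage_error(actual, forecast):
    """Absolute percentage error of a single forecast, in percent."""
    return 100.0 * abs(actual - forecast) / abs(actual)


def mape(actuals, forecasts):
    """Mean absolute percentage error over paired observations, in percent."""
    errors = [absolute_percentage_error(a, f) for a, f in zip(actuals, forecasts)]
    return sum(errors) / len(errors)


# The two hours quoted in the text (prices in TL/MWh).
print(absolute_percentage_error(0.92, 73.13))
print(absolute_percentage_error(0.79, 119.88))

# Hypothetical hour at a typical price level, for contrast.
print(absolute_percentage_error(150.0, 165.0))
```

Because the actual price is the denominator, a near-zero price turns an ordinary forecast into an error of several thousand percent, while a comparable absolute miss at typical price levels stays small. A few such hours can noticeably raise the MAPE of an otherwise accurate model.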

Conclusion
This study investigates different artificial feed-forward neural network models to achieve the best forecasting results for the Turkish day-ahead electricity market. We employed 13 factors with 26,304 hours of historical data and created 400 different neural network models. The same 26,304 hours of data were used for each model in order to find the most successful feed-forward model. Model success is measured by each model's MAPE value.
In this study, the best MAPE value obtained for the day-ahead electricity price is 9.76%. Aggarwal et al. review electricity price forecasting in deregulated markets and compare neural network performance; across the price forecasting studies they cover, MAPE values range between 2.18% and 25.77% (Aggarwal, 2009). Together with these studies, the results suggest that artificial neural networks can be a significant alternative for financial decision makers in forecasting the day-ahead electricity price.
In neural network models, gradient descent and gradient descent with momentum are more sensitive to weight changes and to abnormalities in their input because they rely only on first-order derivatives. The Levenberg-Marquardt and BFGS methods use second-order derivative information and produce better results in the presence of data anomalies.
Selection of input variables for a particular model is still an open area of research. Further research can include selection of more appropriate data classification.