Prediction of Financial Strength Ratings Using Machine Learning and Conventional Techniques

Financial strength ratings (FSRs) have become more significant, particularly since the financial crisis of 2007-09, when rating agencies failed to forecast defaults and the downgrade of some banks. The aim of this paper is to predict the group membership of Capital Intelligence (CI) banks' financial strength ratings using machine learning and conventional techniques. We use five different statistical techniques, namely CHAID, CART, multilayer-perceptron neural networks, discriminant analysis and logistic regression. We also use three different evaluation criteria, namely average correct classification rate, misclassification cost and gains charts. Our data are collected from the Bankscope database for Middle Eastern commercial banks over the first decade of the 21st Century. Our findings show that when predicting bank FSRs during the period 2007-2009, discriminant analysis is surprisingly superior to all other techniques used in this paper. When only machine learning techniques are used, CHAID outperforms the other techniques. In addition, our findings highlight that when a random sample is used to predict bank FSRs, CART outperforms all other techniques. Our evaluation criteria confirm our findings: both CART and discriminant analysis are superior to the other techniques in predicting bank FSRs. This has implications for Middle Eastern banks, as we suggest that improving their FSRs can improve their presence in the market.


Introduction
A bank's financial strength, risk profile, soundness and financial stability are assessed by Capital Intelligence (CI) banks' Financial Strength Ratings (FSRs). These ratings incorporate factors within the bank's internal and external environment. CI implements a specialized approach, including some qualitative and quantitative factors, in assessing a bank's stability and thus assigning the appropriate FSR. This is achieved by grouping factors into the following six broad categories: ownership and governance; operating environment; management and strategies; franchise value; risk profile; and financial profile. Internally, CI assesses a bank's governance and specifically the extent to which there is a division between ownership and the management of its operations. Bridging the gap between a bank's internal and external environment, CI examines a bank's domestic market share as reflected in its assets and its potential future earnings (see for example, Abdallah, 2013). As such, CI assesses these factors and generates a bank's FSR.
In the Middle East region, financial stability and soundness are entirely affected by the host country's banking system. This is mainly due to the absence of a role for capital markets in resource allocation, and thus the FSR is seen as an important indicator of the banking system's soundness and stability. As such, a bank's FSR is considered an important indicator for various stakeholders in assessing the bank's financial strength. This is particularly important due to deficiencies in legal and regulatory systems and a lack of transparency within banking sectors and financial markets (Abdallah, 2013). The difficulty in developing accurate rating systems for banks as opposed to countries is reflected in the relative inability of rating agencies to agree on a universal rating system. A strong bank FSR assists a bank in accessing capital markets on more favourable conditions as well as positively affecting its operations and performance (Hammer et al., 2012). In addition, rating agencies have been accused of being liable for the 'housing bubble' and the consequent financial crash of 2007-08 (Diomande et al., 2009).
In the literature, less attention is paid to the Middle East region due to a number of factors that appear to be influential in this respect. First, governments are the main source of Middle Eastern banks' equity financing. Second, the need to assess a bank's creditworthiness is reduced where the bank is government owned, because governments use their banks to finance economic activities. This may cause a disconnect between the bank's FSRs and its capital structure. Third, the underdeveloped legal and regulatory system has resulted in a weak system for monitoring capital risk in Middle Eastern countries (see for example, Abdallah, 2013).
This highlights the importance of our investigation, as approximately 47% of commercial banks in the Middle East (64 out of 135, as per the Bankscope database, 2011) are rated. The development of stock markets in the Middle East has encouraged the operation of foreign rated banks within the region, and this in turn has improved the competitiveness and performance of non-rated banks. This has raised banks' interest in obtaining adequate FSRs.
The motivation of our investigation is to evaluate and rank the predictive capabilities of machine learning and conventional techniques using different decision criteria, namely error rates, misclassification costs and gains charts, for different sample sizes. Due to the scarcity of studies related to banks' FSRs under Capital Intelligence (CI), the objective of this paper is to determine whether Middle Eastern banks' financial and non-financial indicators can be used to predict their FSR group membership. The novelty of this paper is to apply machine learning and conventional techniques to predict a bank's CI FSR by distinguishing high ratings from low ratings using financial and non-financial indicators. We use banks' FSRs issued by the CI rating agency for Middle Eastern commercial banks 1 in the first decade of the 21st Century 2, a setting which is ignored in the literature. To the best of our knowledge, there is no empirical study that uses non-financial indicators to capture the effect of country-specific differences, together with other firm-level characteristics, to determine whether they are able to distinguish high from low CI FSRs.
The remainder of this paper is organized as follows: section 2 reviews literature; section 3 outlines the research methodology and data collection; section 4 provides a discussion of the empirical findings and compares results of different bank FSR group membership models; and the last section concludes the paper and highlights areas for future research.

Review of relevant literature
As early as the 1960s, there were studies that focused on forecasting business events and classifying companies into two or more separate groups. Many researchers have applied different conventional and advanced statistical techniques to build classification models to address problems such as financial failure; bankruptcies; financial information and stock price manipulation; and the prediction of bond and credit ratings. The launch of Moody's Bank Financial Strength Ratings (BFSRs) in 1995 was followed by Poon et al.'s (1999) logistic regression model to predict Moody's BFSRs. Many researchers have paid attention to the determination and prediction of bank ratings for developed economies (see for example, Poon et al., 1999; Poon and Firth, 2005; Hammer et al., 2012; Beisland et al., 2014) but not to the relationship between financial/non-financial factors and bank ratings. Unsurprisingly, less attention has been paid to developing economies and in particular to the Middle East region.
Various statistical machine learning techniques are used in predicting bank ratings (see for example, Chen, 2012; Chen and Cheng, 2013). CART algorithms have been employed in a number of situations, for example: to predict bankruptcy (Chandra et al., 2009; Li et al., 2010); to develop credit scoring models for assessing the credit risk of bank customers (Lee et al., 2006; Kao et al., 2012); to develop early warning models to assess the soundness of individual banks (Ioannidis et al., 2010); and to predict bank performance (Ravi et al., 2008). Many studies into early warning system models for financial risk (Koyuncugil and Ozgulbas, 2012) and for developing credit scoring models for assessing bank customers' credit risk (Thomas et al., 2002; Bijak and Thomas, 2012) have utilized CHAID algorithms. To the best of our knowledge, this is the first paper that uses both CART and CHAID algorithms to predict Middle Eastern commercial banks' FSRs.

1 CI is more specialized in rating banks in the Middle East region than Fitch and Moody's. According to the Bankscope database as at January 2011, CI assigns bank FSRs for 64 commercial banks in the Middle East region, compared to Fitch and Moody's, who assign bank ratings for only 50 and 48 commercial banks, respectively. S&P has no publicly available equivalent individual bank ratings in the region from 2001-2009.
2 The reason for choosing the first decade of the 21st Century is to avoid any potential effect of the Arab Spring, which commenced in 2010, and the substantial missing data due to this phenomenon. However, it is part of our future research plan to investigate the effect of the Arab Spring on bank ratings in the Middle East.
Inspired by the human brain, neural networks are non-parametric computational techniques that are used to identify significant patterns or structures in data, which are then used to predict future phenomena. Neural networks have been applied in various financial studies, such as: to predict bankruptcy of banks (Kumar and Ravi, 2007; Ravi and Pramodh, 2008; Zhao et al., 2009; Ioannidis et al., 2010); to predict bankruptcy of firms (Chandra et al., 2009; Falavigna, 2012); to evaluate banks' creditworthiness (see for example, Huang et al., 2004); and to predict banks' financial strength ratings (Poon et al., 1999; Pasiouras et al., 2007; Hammer et al., 2012).
Altman (1971) introduced the DA z-score model that discriminates bankrupt from non-bankrupt firms. In the finance literature, Altman and Sametz (1977), Canbas et al. (2005) and Li et al. (2010) apply many forms of DA to predict corporate and bank failure and to assess financial distress. In addition, DA has been employed by Lee et al. (2006), Abdou et al. (2008), Abdou (2009a) and Akkoc (2012) in building credit scoring models. In the field of banking, DA and hybrid techniques are used in rating predictions (see for example, Chen, 2012; Chen and Cheng, 2013).
In the finance literature, LR is a widely used technique among practitioners in predicting corporate and bank failure (Kolari et al., 2002; Canbas et al., 2005; Zhao et al., 2009; Li et al., 2010; Abdou et al., 2016); in predicting credit ratings (Oelerich and Poddig, 2006; Kim and Ahn, 2012); as well as in building credit scoring models (Lee et al., 2006; Abdou et al., 2008; Abdou, 2009a; Akkoc, 2012; Abdou et al., 2016). Finally, the LR model is employed by Poon et al. (1999) and Hammer et al. (2012) to predict bank financial strength ratings. The prediction of both Moody's BFSRs (see Poon et al., 1999) and Fitch FBRs (Pasiouras et al., 2007; Hammer et al., 2012) has been the focus of the majority of previous studies. It is notable that no previous study has focused upon CI FSRs (see for example, Abdallah, 2013). Consequently, the focus of our investigation is to bridge this gap by using both machine learning and conventional techniques to predict banks' CI FSR group membership for Middle Eastern commercial banks.

Research methodology
Using PASW® Modeler 14, the auto-classifier node is initially applied to automatically create and compare a number of different statistical predictive techniques. The auto-classifier node uses specific criteria to generate, compare and rank a set of candidate predictive statistical techniques in order to identify the best-performing techniques. In our paper, the 'overall accuracy percentage' is used to rank the predictive accuracy of the different statistical techniques. This is the percentage of observations correctly classified by each technique relative to the total number of observations. Moreover, the auto-classifier node provides an evaluation chart that enables the performance of each predictive statistical technique to be visually assessed and compared. The software automatically chooses the best five statistical techniques, namely CHAID, CART, MLP NN, DA and LR, to predict banks' FSRs. Figure 1 provides a graphical visualization of the chosen five predictive statistical techniques in terms of differences in their overall accuracy (SPSS Inc., 2012). From Figure 1, it can be observed that the auto-classifier node ranks the two decision tree techniques, namely CHAID, with an overall accuracy of 96.30%, and CART, with an overall accuracy of 95.44%, as first and second. These two techniques are followed by MLP NN with an overall accuracy of 94.02%. In addition, there is a role for DA as one of the conventional techniques, with an overall accuracy of 93.16%, which is comparable with the machine learning techniques. However, the auto-classifier node ranks LR far below the other four techniques, with an overall accuracy of only 73.5%. Therefore, it can be suggested that CHAID, CART, MLP NN and DA could perform better than LR in predicting Middle Eastern commercial banks' FSRs.
Finally, four different evaluation criteria namely average correct classification (ACC) rate, error rates, estimated misclassification cost (EMC) and gains charts are used to evaluate the predictive capabilities of these statistical modeling techniques.

CHAID
The Chi-squared Automatic Interaction Detector (CHAID) is a statistical technique used to assess the relationship between a target variable and a series of predictor variables (see for example, Koyuncugil and Ozgulbas, 2012; Abdallah, 2013). A CHAID model divides the data into mutually exclusive and exhaustive sub-sets that best describe the target variable and predict the interaction between predictor variables (Bijak and Thomas, 2012; Abdallah, 2013).
For categorical dependent variables, the chi-squared test is used as the splitting criterion, whilst for continuous dependent variables the F test is used instead (SPSS Inc., 2012). In building our CHAID models, we use Pearson chi-squared statistics, which are calculated using both observed and expected cell frequencies, with the p-value being based on the calculated statistics.
The Pearson chi-squared statistic is calculated as follows (see for example, PASW, 2012, p. 77; Abdallah, 2013, modified):

$$X^2 = \sum_{j=1}^{J} \sum_{i=1}^{I} \frac{(n_{ij} - \hat{m}_{ij})^2}{\hat{m}_{ij}}$$

where $n_{ij}$ refers to the actual cell frequency; $\hat{m}_{ij}$ refers to the expected cell frequency for cell $(x_n = i, y_n = j)$ from the independence model; and $p = \Pr(\chi^2_d > X^2)$ refers to the calculation of the corresponding p-value, where $\chi^2_d$ follows a chi-square distribution with $d = (J - 1)(I - 1)$ degrees of freedom.
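As an illustration of the statistic described above, the following minimal Python sketch computes the Pearson chi-squared value and its degrees of freedom for a small contingency table. The function names and the counts are hypothetical and are not taken from our data; the p-value helper is only valid for one degree of freedom (a 2 x 2 table), which is an assumption made to keep the sketch free of external libraries.

```python
import math

def pearson_chi_squared(observed):
    """Return (statistic, degrees of freedom) for an I x J table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, n_ij in enumerate(row):
            # Expected frequency under independence: row total * column total / N
            m_ij = row_totals[i] * col_totals[j] / total
            statistic += (n_ij - m_ij) ** 2 / m_ij
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof

def p_value_one_df(statistic):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))

# Hypothetical counts: rows = two categories of a predictor,
# columns = low-FSR / high-FSR group membership.
table = [[30, 10],
         [10, 30]]
stat, dof = pearson_chi_squared(table)
print(stat, dof, p_value_one_df(stat))
```

In CHAID, a statistic of this kind is computed for each candidate predictor, and the predictor with the smallest p-value is preferred for the split.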

CART
Classification and Regression Trees (CART) is a non-parametric statistical classification model based on a binary decision tree procedure. It can be applied to both categorical and continuous data using a set of 'if-then' rules. It automatically searches complex databases to isolate significant patterns and relationships (Ravi et al., 2008; Chandra et al., 2009; Abdallah, 2013). CART methodology can be divided into three phases: first, the construction of a maximum tree (tree-growing process); second, the selection of the right-sized tree (pruning process); and third, the classification of new data using the constructed tree. The Gini index is used as the splitting criterion, and the model repeats the splitting process until either the homogeneity criterion is reached or other stopping criteria are fulfilled.
The Gini index uses the following impurity function g(t) at a node t in a CART tree (PASW, 2012, p. 63; Abdallah, 2013; Abdou et al., 2016, modified):

$$g(t) = \sum_{j \neq i} p(j \mid t)\, p(i \mid t)$$

where $i$ and $j$ are categories of the target (dependent) variable, and

$$p(j \mid t) = \frac{p(j, t)}{p(t)}, \qquad p(j, t) = \frac{\pi(j)\, N_j(t)}{N_j}, \qquad p(t) = \sum_{j} p(j, t)$$

where $\pi(j)$ refers to the prior probability value for category $j$; $N_j(t)$ refers to the number of records in category $j$ of node $t$; and $N_j$ refers to the number of records of category $j$ in the root node.
The Gini index guides splitting during the tree-growing process. As such, $N_j(t)$ and $N_j$ are calculated only from the records in node $t$ and in the root node, respectively, that have valid values for the split predictor.
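The impurity function can be illustrated with a short Python sketch. It assumes that the prior probabilities are estimated from the data, in which case p(j|t) reduces to the class proportions within the node and the impurity equals one minus the sum of squared proportions. The function names and the example labels are hypothetical.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum_j p(j|t)^2, using class proportions."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def gini_gain(parent, left, right):
    """Reduction in impurity achieved by splitting parent into left and right."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
             + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted

# Hypothetical node with five high-FSR and five low-FSR banks
parent = ["high"] * 5 + ["low"] * 5
left = ["high"] * 4 + ["low"] * 1    # records satisfying the candidate rule
right = ["high"] * 1 + ["low"] * 4   # remaining records
print(gini_impurity(parent), gini_gain(parent, left, right))
```

A tree-growing procedure of this kind would evaluate the gain for every candidate split and keep the one with the largest reduction in impurity.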
Then, the 'pruning process' improves generalization and avoids over-fitting by applying two pruning algorithms. The first is optimization by the number of points in each node, which implies that splitting is stopped when the number of observations in the node is less than the pre-defined required minimum number of observations. The second is the cross-validation pruning algorithm, which establishes an optimal trade-off between the misclassification error and the complexity of the tree. As such, the focus of the cross-validation pruning algorithm is to use the minimal cost-complexity function to minimize both the misclassification risk and the complexity of the tree in order to obtain an optimal tree, as follows (see for example, PASW, 2012, p. 67; Abdallah, 2013):

$$R_{\alpha}(T) = R(T) + \alpha\, |\tilde{T}|$$

where $R(T)$ refers to the misclassification risk of tree $T$; $|\tilde{T}|$ refers to the number of terminal nodes for tree $T$; and $\alpha$ refers to the complexity cost per terminal node for the tree. Finally, following the construction of the right-sized tree with the lowest cross-validated rate, the outcome of the third phase is to classify the new data. As such, based on a set of rules, each new observation is assigned to a class or response value that fits with one of the terminal nodes of the tree.
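The cost-complexity trade-off can be sketched as follows. The candidate sub-trees, their risks, their numbers of terminal nodes and the value of the complexity cost are all hypothetical numbers chosen for illustration; they are not results from our models.

```python
def cost_complexity(risk, n_terminal_nodes, alpha):
    """Cost-complexity measure: misclassification risk plus alpha per terminal node."""
    return risk + alpha * n_terminal_nodes

# Hypothetical candidate sub-trees: name -> (misclassification risk, terminal nodes)
candidates = {
    "full": (0.05, 12),
    "medium": (0.08, 5),
    "stump": (0.20, 2),
}

alpha = 0.01  # hypothetical complexity cost per terminal node
best = min(candidates, key=lambda name: cost_complexity(*candidates[name], alpha))
print(best)
```

With a complexity cost of zero the largest tree would always be preferred; raising the cost shifts the choice towards smaller trees, which is the mechanism that guards against over-fitting.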

Multilayer-Perceptron Neural Networks
The Multilayer-Perceptron Neural Network (MLP NN) enables the analysis of complex relationships between different variables and consists of layers of interconnected nodes between the input layer and the output layer. During training, predicted outputs are generated and compared with actual outputs in order to calculate an error function.
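The network repeats the process until either the maximum number of iterations is reached or the error function is almost zero.

The logistic regression (LR) model can be written as:

$$\log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n$$

where $p$ refers to the output probability; $\beta_0$ refers to the intercept of the equation; and $\beta_1, \beta_2, \dots, \beta_n$ refer to the coefficients in the linear combination of the independent variables $X_1, X_2, \dots, X_n$.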
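The terms defined above (an output probability, an intercept and coefficients in a linear combination of the independent variables) correspond to the standard logistic regression form. The following sketch shows how such a model converts a linear combination into a probability; the function name, the coefficients and the input values are hypothetical and are not estimates from our models.

```python
import math

def logistic_probability(intercept, coefficients, values):
    """Output probability from an intercept and a linear combination of inputs."""
    z = intercept + sum(b * x for b, x in zip(coefficients, values))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical intercept, coefficients and indicator values for one bank
p_high = logistic_probability(-1.0, [0.5], [4.0])
print(p_high)
```

A bank would then be assigned to the high-FSR group when the output probability exceeds a chosen cut-off; a cut-off of 0.5 is a common default, not necessarily the one used here.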

Data collection
In order to develop the proposed bank FSR group membership models, our original sample comprises the 64 commercial banks rated by Capital Intelligence (CI) out of a total of 135 Middle Eastern banks. As the vast majority of banks in the Middle Eastern region are commercial banks, we focus on this group to avoid any potential comparison problems between different types of banks and to ensure homogeneity across the different countries included in our final sample. We use data from 10 Middle Eastern countries 3, as shown in Table 1.

Dependent variable:
As shown in Table 2, we rank CI banks' FSRs using a scale from 1 to 20, where 1 refers to the lowest FSR rating category (D) and 20 refers to the highest FSR rating category (AAA) (see for example, Poon et al. 2009). As also shown in Table 2, the highest FSR rating category for banks in the Middle East region in our sample is AA- (17) and the lowest FSR rating category is B (6). We use a simple weighted average to divide the data into four quartiles.
Then, we use the highest quartile (15 to 17) versus the lowest quartile (6 to 11) as our dependent categorical variable 4.
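The construction of the dependent variable can be sketched as a simple mapping from the numeric rating scale to group membership. The function name and the example scores are hypothetical; only the quartile boundaries (15 to 17 and 6 to 11) are taken from the text, and the treatment of the middle quartiles as excluded observations is our reading of the description above.

```python
def fsr_group(score):
    """Map a numeric FSR score (1-20 scale) to the binary group label.

    Scores in the middle quartiles return None and are left out of the
    high-versus-low comparison.
    """
    if 15 <= score <= 17:
        return "high-FSR"
    if 6 <= score <= 11:
        return "low-FSR"
    return None

# Hypothetical scores for a handful of bank-year observations
scores = [17, 15, 13, 11, 6]
labels = [fsr_group(s) for s in scores]
print(labels)
```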

Independent variable:
Selected independent variables for the proposed models are reduced to 17 financial and non-financial variables 5.

Financial variables
We use different financial ratios under the following categories: asset quality, capital adequacy, profitability, credit risk and liquidity, following CI rating agency, to predict Middle Eastern banks' FSR group membership, as shown in the Appendix.

Non-financial variables
In this paper, we examine non-financial variables that may improve a model's predictive capability in terms of a bank's FSR group membership. The following three non-financial variables are used. First, we use size as a dummy variable, which is measured by the natural logarithm of total assets.
To reflect qualitative characteristics such as product diversification and geographic location, we classify banks by size into small, medium and large. Second, we use a dummy variable for the effect of time. Third, we use CI's country sovereign risk ratings (SR) to reflect differences in the implemented regulatory systems across countries. In calculating SR, the following macroeconomic factors are considered: inflation, taxation, exchange rates, infrastructure, employment rate, size and growth of the economy, and regulations. Sovereign ratings reflect the probability that a government may default in meeting its obligations (see for example, Abdallah, 2013; Laere et al. 2012). Correlation analysis of our finally selected variables indicates that no serious correlation (i.e. > 0.60) is found amongst these variables, as shown in the following section.
We divided the data-set into two samples.

Descriptive statistics
Correlation results between our predictor indicators including the dependent variable (high-FSR versus low-FSR), are shown in Table 3. All correlations between predictor indicators are within an acceptable range i.e. < 0.60.
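The correlation screen described above can be sketched as follows. The indicator values are hypothetical and are chosen so that one pair breaches the 0.60 threshold; the use of the absolute value of the Pearson coefficient is an assumption of this sketch, as is the function naming.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def high_correlation_pairs(columns, threshold=0.60):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Hypothetical values for three indicators across five banks
columns = {
    "TCR": [10.0, 12.0, 14.0, 16.0, 18.0],
    "CS": [5.0, 6.0, 7.0, 8.0, 9.5],
    "NIM": [3.0, 1.0, 4.0, 1.0, 5.0],
}
flagged = high_correlation_pairs(columns)
print(flagged)
```

In a screen of this kind, one indicator from each flagged pair would be a candidate for removal before model building.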

Non-financial indicators
Descriptive statistics for the three non-financial predictor indicators are shown in Table 6. As per the information value 6 score, 'Size' is the most influential non-financial predictor with a score

CART
CART is used to explore the anticipated differences between the proposed models in relation to ACC rates using the same 17 financial and non-financial predictor indicators.

Multilayer-Perceptron Neural Networks
MLP NNs are designed using the same 17 financial and non-financial indicators under sub-sample1 and sub-sample2. The overall ACC rate using testing sub-sample1 is 81%, with 80.6% and 81.6% for high-FSR and low-FSR, respectively, as shown in Table 7. As for testing sub-sample2, the classification matrix shows that the overall ACC rate is 86.2%; in addition, the MLP NN model predicts high-FSR (i.e. 91.4%) better than low-FSR (i.e. 81%). The increased overall ACC rate is a result of the higher predictive capability rate of 91.4% for high-FSR in testing sub-sample2, compared to a rate of 80.6% in sub-sample1.
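The ACC rates reported from a classification matrix can be computed as in the following sketch. The counts are hypothetical and do not reproduce Table 7; the function name and the dictionary layout (actual group first, predicted group second) are choices made for this illustration.

```python
def acc_rates(matrix):
    """Return (overall ACC, per-class ACC) from matrix[actual][predicted] counts."""
    per_class = {
        actual: row[actual] / sum(row.values())
        for actual, row in matrix.items()
    }
    total = sum(sum(row.values()) for row in matrix.values())
    correct = sum(row[actual] for actual, row in matrix.items())
    return correct / total, per_class

# Hypothetical classification matrix for a testing sub-sample
matrix = {
    "high-FSR": {"high-FSR": 40, "low-FSR": 10},
    "low-FSR": {"high-FSR": 5, "low-FSR": 45},
}
overall, per_class = acc_rates(matrix)
print(overall, per_class)
```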

Discriminant analysis
We run DA models using the same 17 financial and non-financial predictor indicators, and they are statistically significant at the 99% confidence level. The DA results are shown in Table 8.

Logistic regression
We also run LR models using the 17 financial and non-financial predictor indicators, and they are statistically significant at the 99% confidence level. As summarized in … (2009b) is applied in calculating the EMC:

$$EMC = C(\text{predicted low-FSR}/\text{actually high-FSR}) \times P(\text{predicted low-FSR}/\text{actually high-FSR}) \times \pi_1 + C(\text{predicted high-FSR}/\text{actually low-FSR}) \times P(\text{predicted high-FSR}/\text{actually low-FSR}) \times \pi_2$$

where C(predicted low-FSR/actually high-FSR) and C(predicted high-FSR/actually low-FSR) refer to the corresponding misclassification costs of Type I and Type II errors; P(predicted low-FSR/actually high-FSR) and P(predicted high-FSR/actually low-FSR) refer to the probabilities of Type I and Type II errors; and π2 and π1 are the prior probabilities of low-FSR and high-FSR, respectively. We use a ratio of 5:1 to represent the misclassification costs associated with Type II and Type I errors following, for example, Abdou et al. (2008) and Abdou (2009b).

Table 7.
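The EMC calculation can be illustrated with a short sketch that applies the 5:1 cost ratio for Type II and Type I errors. The error probabilities and the equal prior probabilities below are hypothetical and are not taken from our tables; the function and parameter names are also introduced only for this example.

```python
def estimated_misclassification_cost(p_type1, p_type2, prior_high, prior_low,
                                     cost_type1=1.0, cost_type2=5.0):
    """EMC = C_I * P_I * prior(high-FSR) + C_II * P_II * prior(low-FSR).

    Type I: a high-FSR bank is classified as low-FSR.
    Type II: a low-FSR bank is classified as high-FSR.
    """
    return cost_type1 * p_type1 * prior_high + cost_type2 * p_type2 * prior_low

# Hypothetical error probabilities and equal priors
emc = estimated_misclassification_cost(p_type1=0.2, p_type2=0.1,
                                       prior_high=0.5, prior_low=0.5)
print(emc)
```

Because a Type II error is weighted five times as heavily as a Type I error, a model with a slightly lower ACC rate can still have a lower EMC if it makes fewer Type II errors.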

FIGURE 3 AND FIGURE 4 HERE
For more details relating to testing sub-sample1 (predicting 2007-2009) and testing sub-sample2 (randomly predicting 33% of the overall sample), the reader is referred to Figure 3 and Figure 4; these illustrate our third criterion, namely the gains chart, using the machine learning and conventional techniques applied in this paper, respectively. The gains chart is a valuable method of visualizing how good a predictive model is, as it plots the values in the Gain (%) column from the gains table. Gains refer to the incremental number of hits divided by the overall number of hits, multiplied by one hundred. If the models are not used, the 'diagonal line' plots the expected response in the testing sub-samples. The higher percentiles of gains, reflected in the curved line, represent how much the model improves on this baseline, with steeper curves representing higher gains. Visual analysis of the gains charts has indeed confirmed the results obtained for both sub-sample1 and sub-sample2 using the other criteria, namely ACC rate and EMC.

7 High-FSR is misclassified as low-FSR.
8 Low-FSR is misclassified as high-FSR.
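The gains values plotted in such a chart can be sketched as follows. The model scores and actual outcomes are hypothetical, a 'hit' is assumed to be an observation that actually belongs to the high-FSR group, and the sketch reports cumulative gains at equal-sized percentile bins after ranking observations by model score.

```python
def cumulative_gains(scores, actual, n_bins=10):
    """Cumulative gain (%) at each bin after ranking by descending score."""
    ranked = [a for _, a in sorted(zip(scores, actual),
                                   key=lambda pair: pair[0], reverse=True)]
    total_hits = sum(ranked)
    n = len(ranked)
    gains = []
    for b in range(1, n_bins + 1):
        cutoff = round(b * n / n_bins)
        # Hits captured in the top-ranked records, as a share of all hits
        gains.append(100.0 * sum(ranked[:cutoff]) / total_hits)
    return gains

# Hypothetical predicted scores and actual outcomes (1 = high-FSR hit)
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actual = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
gains = cumulative_gains(scores, actual, n_bins=5)
print(gains)
```

The diagonal baseline for five bins would be 20, 40, 60, 80 and 100; a model whose gains rise above these values earlier captures more hits in its top-ranked observations.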

Conclusion and areas for future research
The assessment of the creditworthiness of banks and other financial institutions has become very challenging due to structural changes in the global banking sector and the variability of creditworthiness within this sector. In addition, the financial crisis of 2007-2009 highlighted that banking systems are facing severe problems across different regions and that predicting 'correct' banks' FSR group membership seems more important than ever. This paper presents … Second, to compare rated and non-rated banks to identify what non-rated banks need to achieve in order to secure higher ratings. Third, to apply other statistical modelling techniques such as SVM and genetic algorithms. Finally, to use cross-validation techniques to reduce any possible inconsistencies in results.

References:
Abdallah …

Appendix
Asset quality: the ratio of loan loss provision to net interest revenue (LLPNIR); the ratio of loan loss reserve to impaired loans (LLRIL); the ratio of impaired loans to gross loans (ILGL).
Capital adequacy: the total capital ratio (TCR); the ratio of equity to total assets (CS); the ratio of equity to net loans (ENL); the equity multiplier (EM).
Profitability: the net interest margin (NIM); the ratio of non-interest expense to total average assets (NIEAA); the recurring earning power ratio (REP); the asset utilization ratio (AU); the tax management efficiency ratio (TME).
Credit risk: the ratio of loan loss provisions to total loans (LLPTL).
Liquidity: the ratio of liquid assets to deposit and short-term funding (LADSTF)*.
Notation: *Liquid assets are short-term assets that can be easily converted into cash, such as cash itself and deposits with the central bank, treasury bills, other government securities and interbank deposits.