“Minimum sum regression as the optimum robust algorithm in the computation of financial beta”

In the world of finance and portfolio management, “beta” refers to the sensitivity of a security’s return to the return of the “market” portfolio and is an indication of the level of systematic risk, i.e., the amount of risk that a company’s equity shares with the entire market. Portfolio managers must have accurate estimates of beta in order to control portfolio risk adequately. Typically, beta is estimated using ordinary least squares (OLS), but OLS relies on some very stringent assumptions. Here, betas are computed and compared using OLS and four robust regression algorithms. Minimum sum regression is identified as the superior robust regression algorithm for estimating beta.


Introduction
In the world of finance and portfolio management, "beta" refers to the sensitivity of a security's return to the return of the "market" portfolio and is an indication of the level of systematic risk, i.e., the amount of risk that a company's equity shares with the entire market.
Harry Markowitz [1] developed the notion of beta as the mathematical slope in a linear regression of company rate of return onto the market rate of return. Eqn. 1 below displays the equation for beta which Markowitz described as the characteristic line.
$$rr_i = \alpha + \beta_i \, rr_{mkt} \qquad (1)$$
where $rr_i$ is the rate of return for company $i$, $rr_{mkt}$ is the rate of return for the market, $\alpha$ is alpha, the intercept, and $\beta_i$ is beta, the slope.
The intercept $\alpha$ is the expected return when the market return is equal to 0, and the slope $\beta_i$ is the percent change in the security's return for a one percent change in the market return, on average, other things equal. While Eqn. 1 is straightforward, its estimation is less so. The conventional method for estimating the alpha and beta of the characteristic line is OLS, i.e., ordinary least squares. The validity of OLS rests on important assumptions. If these assumptions are violated, the parameter estimates for $\alpha$ and $\beta_i$ will be inaccurate. The usual violation is outlying observations in the y or x domain. Inaccurate betas lead to incorrect portfolio construction and unanticipated portfolio returns.
Beta is an important metric in portfolio management, managerial finance, and investment banking. On a casual basis, an individual retail investor can assess the expected volatility of a company's equity relative to a universe benchmark by looking up a beta on Yahoo/Finance, brokerage house information sites, or other free sources. In a more formal, institutional asset management context, portfolio managers serving institutional clients (e.g., foundations, pension funds, and mutual funds) must satisfy a number of constraints in their portfolio management activities. For example, the following are some of the many constraints imposed in the contract:
1. Portfolio turnover: usually limited to 100% per year.
2. Number of names in the portfolio: usually required to be 50 to 100.
3. Tracking error: usually limited to ±3% of the index.
4. Weighted average beta: usually constrained to 0.98 to 1.02.
All of these constraints are imposed to prevent excessive risk taking by the portfolio manager. In the case of item 4, which constrains portfolio beta to a value close to 1.0, incorrect beta estimates can lead to unexpected volatility, with concomitant problems in meeting items 1-3. Institutional asset managers rely on betas provided by vendors such as Barra or Bloomberg. But even these vendors need to be sure their betas accurately reflect reality. In a capital budgeting context, accurate betas are needed to estimate the cost of capital. An inaccurate beta generates an inaccurate cost of capital, which could lead to the incorrect acceptance or rejection of a capital project or acquisition.
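As a minimal sketch of the beta constraint in item 4, the portfolio beta can be computed as the weight-averaged beta of the holdings and checked against the permitted band (taken here to be 0.98 to 1.02). The function names, weights, and betas below are hypothetical and chosen only for illustration; they show how a change in a single beta estimate can move the same portfolio outside the band.

```python
def weighted_average_beta(weights, betas):
    """Portfolio beta as the weight-averaged beta of the holdings."""
    if len(weights) != len(betas):
        raise ValueError("weights and betas must have the same length")
    total_weight = sum(weights)
    return sum(w * b for w, b in zip(weights, betas)) / total_weight


def within_beta_band(portfolio_beta, low=0.98, high=1.02):
    """True when the portfolio beta lies inside the permitted band."""
    return low <= portfolio_beta <= high


# Hypothetical three-stock portfolio; all numbers are made up.
weights = [0.5, 0.3, 0.2]
betas_a = [1.00, 0.90, 1.15]
betas_b = [1.10, 0.90, 1.15]  # same holdings, first beta estimated differently

beta_a = weighted_average_beta(weights, betas_a)
beta_b = weighted_average_beta(weights, betas_b)

print(round(beta_a, 3), within_beta_band(beta_a))
print(round(beta_b, 3), within_beta_band(beta_b))
```

With these illustrative inputs, the first set of betas satisfies the constraint and the second does not, even though the holdings are identical.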
This research examines conventional and alternative linear regression estimation techniques to estimate beta. Specifically, OLS will be compared to four robust linear regression estimation methods, and the superior algorithm will be identified.

Methodology
Monthly closing prices for all SP500 constituents as of 12/31/2015 were downloaded from Bloomberg, and monthly returns were calculated from 12/13/1980 onward. Because some companies are newer to the index, they have fewer data points than others. Eqns. 2, 3, and 4 display the functional specification, the population regression line, and the sample regression line to be estimated.
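The return calculation can be sketched as follows. The text does not say whether simple or log returns were used, so simple returns are an assumption here. The helper names and the prices are hypothetical. The second helper aligns a shorter company history with the market series by keeping the most recent overlapping months, one plausible way to handle companies with fewer data points.

```python
def simple_returns(prices):
    """Period-over-period simple returns: p_t / p_(t-1) - 1."""
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]


def align_recent(stock_returns, market_returns):
    """Keep the most recent overlapping observations of both series."""
    n = min(len(stock_returns), len(market_returns))
    if n == 0:
        return [], []
    return stock_returns[-n:], market_returns[-n:]


# Made-up month-end prices; the company has a shorter history than the index.
market_prices = [1000.0, 1020.0, 1010.0, 1040.0, 1050.0]
stock_prices = [50.0, 49.0, 52.0]

rr_mkt = simple_returns(market_prices)
rr_i = simple_returns(stock_prices)
rr_i, rr_mkt = align_recent(rr_i, rr_mkt)

print(len(rr_i), len(rr_mkt))
```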
$$rr_i = f(rr_{mkt}) \qquad (2)$$
$$rr_i = \alpha + \beta_i \, rr_{mkt} \qquad (3)$$
$$rr_i = a + b \, rr_{mkt} \qquad (4)$$
Specifically, beta coefficients, standard deviation of residuals and residual inter-quartile ranges for each company for each of the five regression methods, i.e., OLS, lmRobMM, ltsreg, lmsreg and ms, are computed. The five regression methodologies are briefly discussed below.
Ordinary Least Squares (OLS). OLS, first published by Legendre in 1805 and also credited to Gauss, finds the estimates for the intercept and coefficients by minimizing the sum of squared residuals, as seen in (5) below. An advantage of OLS is that it has a closed-form solution and is, therefore, computationally easy. The disadvantage of OLS is that its parameter estimates are highly influenced by outliers in the response and/or the explanatory variables.
$$\min_{a,b} \sum (y - (a + b x))^2 \qquad (5)$$
Robust Maximum Likelihood Estimation (lmRobMM). Huber introduced the method in 1973; it finds estimates for the intercept and coefficients using maximum likelihood as the estimation method. The method is robust to outliers in the response variable but not to outliers in the explanatory variables. When there are outliers in the explanatory variables, the method has no advantage over OLS.
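The sensitivity of OLS to a single outlying observation can be illustrated with the closed-form slope and intercept for a one-regressor model. This is a minimal sketch: the function name and the return series are hypothetical, with the stock constructed to have a beta of exactly 1.2 before one month is replaced by an outlier.

```python
def ols_fit(x, y):
    """Closed-form OLS intercept and slope for y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Made-up monthly market returns; the stock follows them with beta 1.2.
market = [-0.04, -0.02, -0.01, 0.00, 0.01, 0.02, 0.03, 0.05]
stock = [1.2 * m for m in market]
a_clean, b_clean = ols_fit(market, stock)

# Replace one month with an outlying stock return.
stock_outlier = list(stock)
stock_outlier[-1] = -0.30
a_out, b_out = ols_fit(market, stock_outlier)

print(round(b_clean, 3))  # slope on clean data: 1.2
print(round(b_out, 3))    # slope with one outlier: negative
```

In this illustration a single outlying month at a high-leverage point is enough to turn the estimated beta negative.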
Least Median Squares (LMS). Rousseeuw developed LMS in 1984; it replaces the sum of squared residuals with the median of the squared residuals. As a result, the fitted equation is insensitive to contamination in up to roughly 50% of the data.
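A rough sketch of the LMS criterion for a one-regressor model is given below. It is not necessarily the algorithm behind the lmsreg results reported here: it approximates the LMS fit by trying the line through every pair of observations and keeping the one with the smallest median squared residual, which is practical only for small samples. The function name and data are hypothetical.

```python
from itertools import combinations
from statistics import median


def lms_fit(x, y):
    """Approximate least-median-of-squares line via all two-point lines."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a valid candidate
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        crit = median((yk - (a + b * xk)) ** 2 for xk, yk in zip(x, y))
        if best is None or crit < best[0]:
            best = (crit, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


# Made-up data: beta is 1.2, and three of the eight months are outliers.
market = [-0.04, -0.02, -0.01, 0.00, 0.01, 0.02, 0.03, 0.05]
stock = [1.2 * m for m in market]
stock[0], stock[6], stock[7] = 0.25, -0.20, -0.30

a_lms, b_lms = lms_fit(market, stock)
print(round(a_lms, 3), round(b_lms, 3))
```

Because five of the eight points lie on the true line, the median squared residual is zero there and the search recovers the slope of 1.2 despite the three outliers.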
Least Trimmed Squares (LTS). LTS minimizes the sum of squared residuals over a subset of k of the n points. The (n-k) points not included do not influence the equation. The subset of k observations to be included is determined by optimizing a loss function for the subset of points.
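The LTS idea can be sketched in two stages, under the assumption of a one-regressor model and a small sample. Stage 1 scores the line through every pair of points by the sum of its k smallest squared residuals. Stage 2 refines the best candidate by repeatedly refitting OLS to the k best-fitting points. This is an illustrative approximation with hypothetical names and data, not necessarily the algorithm behind the ltsreg results.

```python
from itertools import combinations


def ols_fit(x, y):
    """Closed-form OLS intercept and slope for y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def trimmed_sse(a, b, x, y, k):
    """Sum of the k smallest squared residuals."""
    squared = sorted((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return sum(squared[:k])


def lts_fit(x, y, k, max_steps=50):
    """Approximate least-trimmed-squares line keeping k of n points."""
    n = len(x)
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    # Stage 1: best two-point line under the trimmed criterion.
    best = None
    for i, j in combinations(range(n), 2):
        if x[i] == x[j]:
            continue
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        crit = trimmed_sse(a, b, x, y, k)
        if best is None or crit < best[0]:
            best = (crit, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    crit, a, b = best
    # Stage 2: refit OLS on the k best-fitting points until no improvement.
    for _ in range(max_steps):
        keep = sorted(range(n), key=lambda t: (y[t] - (a + b * x[t])) ** 2)[:k]
        xs = [x[t] for t in keep]
        ys = [y[t] for t in keep]
        if len(set(xs)) < 2:
            break
        a_new, b_new = ols_fit(xs, ys)
        crit_new = trimmed_sse(a_new, b_new, x, y, k)
        if crit_new >= crit:
            break
        crit, a, b = crit_new, a_new, b_new
    return a, b


# Made-up data: beta is 1.2, and three of the eight months are outliers.
market = [-0.04, -0.02, -0.01, 0.00, 0.01, 0.02, 0.03, 0.05]
stock = [1.2 * m for m in market]
stock[0], stock[6], stock[7] = 0.25, -0.20, -0.30

a_lts, b_lts = lts_fit(market, stock, k=5)
print(round(a_lts, 3), round(b_lts, 3))
```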
Minimum Sum Regression (MS). MS minimizes the sum of a specified function of the residuals for any specified model. Not only can any linear or nonlinear model be estimated, but any objective function can be used, be it the sum of squares, the sum of absolute residuals, the sum of percent errors, or any other. Eqn. 6 below minimizes the sum of the absolute residuals.
$$\min_{a,b} \sum |y - (a + b x)| \qquad (6)$$
Five betas will be estimated for each company, one for each regression methodology. The best regression methodology will be determined using absolute measures of fit. Table 1 displays the beta coefficients for ten large companies, a subset of the time series regression results for all SP500 constituents. The regression results for all 500 companies are contained in Appendix 1. Notice that the coefficients differ substantially depending on the regression method.
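For the absolute-residual objective in Eqn. 6, a small-sample sketch can exploit a known property of least-absolute-deviations fits: for a one-regressor model, at least one optimal line passes through two of the data points, so an exhaustive search over point pairs finds a minimizer. The code covers only this special case of MS, not the general method, and the function name and data are hypothetical.

```python
from itertools import combinations


def min_abs_sum_fit(x, y):
    """Line minimizing sum(|y - (a + b*x)|), searched over two-point lines."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        crit = sum(abs(yk - (a + b * xk)) for xk, yk in zip(x, y))
        if best is None or crit < best[0]:
            best = (crit, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


# Made-up data: beta is 1.2, with one outlying month at the end.
market = [-0.04, -0.02, -0.01, 0.00, 0.01, 0.02, 0.03, 0.05]
stock = [1.2 * m for m in market]
stock[-1] = -0.30

a_ms, b_ms = min_abs_sum_fit(market, stock)
print(round(a_ms, 3), round(b_ms, 3))
```

The search costs on the order of n cubed operations, so a linear-programming or iterative solver would be the more practical choice for long return histories.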

Results
To compare model fits across the five algorithms, a common regression goodness-of-fit metric is needed. Any measure of fit involving squared deviations is not appropriate, as the results are highly sensitive to outlying observations. Thus, metrics such as R-squared, the standard deviation of residuals, and the like are suspect. One metric that does not employ squared deviations, and the metric used here to compare fits, is the inter-quartile range of residuals.
Table 2 displays the regression coefficients, standard deviation of residuals and inter-quartile range of residuals for each of the five regression methods depicted in Fig. 1 for BMY, i.e., Bristol-Myers Squibb Co. Notice that the standard deviation of the residuals is lowest for OLS among the five methods. This should come as no surprise, as minimizing the sum of squared residuals, and hence their standard deviation, is the objective of OLS. However, the standard deviation of residuals for each method is suspect, as it relies on squared residuals. A large residual will disproportionately influence beta (up or down) and raise the standard deviation of residuals. The standard deviation is an accurate and relevant measure only when the residuals are normally distributed. Also, notice that the skew metric is greatest for the OLS residuals. This indicates a bias in fit.
Notice that LMS provides the minimum inter-quartile range (IQR). The IQR is defined as $Q_3 - Q_1$, i.e., the upper limit of the 3rd quartile minus the upper limit of the 1st quartile. The IQR of the residuals gives the range of the middle 50% of the observations and is unaffected by outlying observations. The IQR is higher than the standard deviation: for a normally distributed variable, the interval from the mean to one standard deviation above it covers about 34% of the distribution, while the IQR spans the middle 50%, which makes the IQR about 1.35 standard deviations.
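The contrast between the two dispersion measures can be sketched with Python's standard library. The residuals below are made up, and the quartile convention (the "inclusive" method of `statistics.quantiles`) is an assumption, since the text does not state which convention was used. The last lines compute the IQR of a standard normal distribution for comparison with its standard deviation of 1.

```python
from statistics import NormalDist, pstdev, quantiles


def iqr(values):
    """Inter-quartile range Q3 - Q1 using the 'inclusive' quartile method."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


# Made-up residuals, then the same residuals with one replaced by an outlier.
residuals = [-0.03, -0.02, -0.01, -0.005, 0.0, 0.005, 0.01, 0.02, 0.03]
with_outlier = residuals[:-1] + [0.60]

print(round(iqr(residuals), 4), round(pstdev(residuals), 4))
print(round(iqr(with_outlier), 4), round(pstdev(with_outlier), 4))

# IQR of a standard normal variable, in units of its standard deviation.
normal_iqr = NormalDist().inv_cdf(0.75) - NormalDist().inv_cdf(0.25)
print(round(normal_iqr, 3))
```

In this example the outlier leaves the IQR unchanged while raising the standard deviation roughly tenfold.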
This research adopts the decision rule that the best regression algorithm is the one that generates the minimum median IQR across all 500 regressions. In the case of BMY alone, least median squares is the optimum regression algorithm to measure beta. Notice that minimum sum regression has the lowest IQR for four of the ten companies. Table 4 displays the medians of all inter-quartile ranges for the SP500 regressions. Notice that MS, i.e., minimum sum regression, generates the smallest median IQR of the regression residuals. The conclusion is that minimum sum regression generates the best betas for SP500 equities. This is not to suggest that MS is the best method of robust regression for all regression exercises, or even in all circumstances related to portfolio management.
Table 5 displays a copy of Table 1 from above, augmented by the addition of the equal-weighted (10% each) portfolio betas. The weighted average betas differ substantially across methods. We concluded above that MS is the superior algorithm to estimate betas. Choosing minimum sum regression instead of the other methods yields a weighted average beta of 1.054.
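The decision rule described above (pick the method with the smallest median IQR across companies) and the equal-weighted portfolio beta can be sketched as follows. The method labels "A", "B", "C" and all numbers are placeholders invented for illustration; they are not the values in Tables 4 and 5.

```python
from statistics import median


def best_method_by_median_iqr(iqr_by_method):
    """Return the method with the smallest median IQR, plus all medians."""
    medians = {name: median(values) for name, values in iqr_by_method.items()}
    best = min(medians, key=medians.get)
    return best, medians


def equal_weighted_beta(betas):
    """Portfolio beta when every holding has the same weight."""
    return sum(betas) / len(betas)


# Placeholder residual IQRs for three hypothetical companies per method.
iqr_by_method = {
    "A": [0.080, 0.095, 0.110],
    "B": [0.078, 0.096, 0.108],
    "C": [0.079, 0.090, 0.105],
}
best, medians = best_method_by_median_iqr(iqr_by_method)
print(best, medians[best])

# Placeholder betas from the chosen method for a three-stock portfolio.
print(round(equal_weighted_beta([0.9, 1.0, 1.25]), 3))
```

Note that a method can win for an individual company (as "B" does for the first company here) without having the smallest median across all companies.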

Conclusions
This research evaluated five regression algorithms to identify which algorithm best measures beta, i.e. the sensitivity of the security return to the market and the measure of systematic risk. The five regression algorithms evaluated were OLS, lmRobMM, least trimmed squares, least median squares and minimum sum regression. The superior algorithm was determined to be minimum sum regression on the basis of minimizing the median of the interquartile ranges of the residuals over the aggregate of the SP500 constituent beta regressions.