Do Coherent Risk Measures Identify Assets Risk Profiles Similarly? Evidence from International Futures Markets

The authors consider Lévy processes with conditional distributions belonging to the generalized hyperbolic family and compare full density-based Lévy expected shortfall (ES) and Lévy spectral risk measures (SRM) with those of a traditional tail-based unconditional extreme value (EV) approach. Using futures data from leading markets, the authors find that ES and SRM often differ in identifying the risk profiles of different assets. While EV is often found to be more consistent than the Lévy models, the Lévy measures often perform better than the EV measures when compared with empirical values. This becomes increasingly apparent as investors become more risk averse.

Sharif Mozumder (Bangladesh), M. Humayun Kabir (New Zealand), Michael Dempsey (Vietnam)

BUSINESS PERSPECTIVES LLC “СPС “Business Perspectives”, Hryhorii Skovoroda lane, 10, Sumy, 40022, Ukraine, www.businessperspectives.org

Received on: 20th of August, 2017; Accepted on: 4th of October, 2017


INTRODUCTION
In the last few decades, we have experienced increasing financialization and securitization, as well as convergence among banking, insurance and securities markets. This trend has led to a significant increase in financial risk and in the unpredictability of extreme events that cause large losses for individual and institutional investors. Under such circumstances, measuring and managing risk have become even more challenging. Since the advent of Value-at-Risk (VaR), both academics and practitioners have been trying to devise models that measure risk more effectively. VaR, a popular risk measure, has the advantage of simplicity but comes with inherent weaknesses. Specifically, VaR does not satisfy the subadditivity requirement, which is an important property of a coherent risk measure (Artzer et al., 1999). VaR fixes the tail cut-off corresponding to a given confidence level and reports the loss at that cut-off, while ignoring the actual size of extreme catastrophic events beyond it. Thus, VaR gives only a partial snapshot of potential losses and fails to take into account the size of extreme losses beyond the cut-off point.
In order to overcome this weakness and to ensure that the subadditivity (and hence coherence) requirement is met, the expected shortfall (ES) measure has been proposed. ES estimates the potential loss by averaging all possible losses in the tail of the distribution¹. However, ES gives all tail losses equal weight, implying that the individual is risk neutral at the margin between better and worse tail outcomes (Grootveld & Hallerbach, 2004; Cotter & Dowd, 2006). The spectral risk measure (SRM) proposed by Acerbi (2002, 2004), on the other hand, is not tied to any particular confidence level: it assigns different weights to catastrophic tail events and to ordinary non-tail events, and the weight given to catastrophic tail events varies with the investor's degree of risk aversion. In contrast, extreme value (EV) models use only the data in the tail of the distribution beyond the cut-off point. The SRM gives investors the flexibility to choose their individual degree of risk aversion, but this flexibility comes at a greater computational cost². Cotter and Dowd (2006) use tail density-based extreme value ES and SRM risk measures and compare the precision of their estimates. They find that the ES standard errors are higher than those of the VaR for the S&P 500, FTSE100, DAX and Hang Seng futures contracts, but not for the Nikkei 225. However, the ES has higher coefficients of variation and narrower confidence intervals, suggesting that it is more precisely estimated. The SRM, on the other hand, has significantly wider confidence intervals than the VaR and ES.
In this paper, we focus on estimating the coherent risk measures ES and SRM for both Lévy and EV models. A Lévy process has stationary, independent increments whose distributions are infinitely divisible, a class flexible enough to represent the skewness and excess kurtosis in the data. Moreover, Lévy models use the entire dataset (the full density) to estimate the model parameters, whereas an EV model uses only the tail of the distribution. Formally, ES_α is restricted to the tail at the extreme end of the distribution (with the confidence level α as high as 0.95 or even 0.99):

$$\mathrm{ES}_\alpha = \frac{1}{1-\alpha}\int_\alpha^1 q_p\,dp,$$

where q_p denotes the p-quantile of the loss distribution, whereas the SRM

$$\mathrm{SRM}_\phi = \int_0^1 \phi(p)\,q_p\,dp, \qquad \phi(p) = \frac{R\,e^{-R(1-p)}}{1-e^{-R}},$$

(where R is the Arrow-Pratt coefficient of absolute risk aversion) is not restricted to such a tail but also embraces the data outside it. Cotter and Dowd (2006) focus only on EV-ES and EV-SRM to determine the clearing house's margin requirements.

¹ Recently, Acharya et al. (2010, 2012) have introduced the marginal expected shortfall (MES) as a measure of the losses faced by a firm in the tail of the aggregate sector's loss distribution, as well as the systemic expected shortfall (SES), which increases with the firm's leverage and with its expected loss in the tail of the system's loss distribution. For more detail on the estimation, see Brownlees and Engle (2012).

² The computational issues are discussed by Kevin and Cotter (2006) in the context of an extreme value (EV) approach. Specifically, the authors evaluate the integrals associated with the calculation of VaR, ES and SRM.

³ We consider futures data on the S&P 500, FTSE100, DAX, Hang Seng and Nikkei 225 indexes. The data are discussed in section 2.
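ES_α averages the loss quantiles q_p over p in [α, 1] with the equal weight 1/(1 − α). The sketch below evaluates that integral with a midpoint rule. It assumes standard normal losses purely so that a closed form, φ(z_α)/(1 − α), is available for comparison; the function name `expected_shortfall` is ours, not the authors'.

```python
import numpy as np
from scipy.stats import norm

def expected_shortfall(quantile_fn, alpha, n_slices=100_000):
    """ES_alpha = 1/(1-alpha) * integral_alpha^1 q_p dp, via the midpoint rule.

    Every tail quantile receives the same weight 1/(1-alpha), which is the
    'risk neutral at the margin' property of ES.
    """
    h = (1.0 - alpha) / n_slices
    p_mid = alpha + h * (np.arange(n_slices) + 0.5)   # midpoints never reach p = 1
    return np.sum(quantile_fn(p_mid)) * h / (1.0 - alpha)

es_numeric = expected_shortfall(norm.ppf, 0.95)
es_exact = norm.pdf(norm.ppf(0.95)) / 0.05   # closed form for standard normal losses
```

The midpoint rule is used because q_p diverges at p = 1, and midpoints never evaluate the quantile function there.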
We investigate the performance of the full density-based Lévy-SRM and Lévy-ES risk measures and compare the results with the tail density-based EV-SRM and EV-ES. To the best of our knowledge, this is the first paper to apply full density-based Lévy-SRM and Lévy-ES risk measures to international futures markets.
We discuss the computational challenges that arise in implementing Lévy models to estimate the SRM. We then conduct a detailed empirical analysis of international futures markets³ to determine whether the coverage-based coherent risk measure ES and the risk-aversion-based coherent risk measure SRM provide similar risk scenarios.
The Lévy approach, though mathematically elegant, has a major drawback: with few exceptions, there are no closed-form formulae for its risk measures. As such, even a relatively straightforward VaR estimation is difficult to implement. Since ES and SRM are weighted integrals of VaR over a range of confidence levels, their implementation is even more difficult. Our approach first fixes the tail, as in EV calibration, and then calculates ES and SRM. We then use Lévy models from the generalized hyperbolic class, calibrated on the entire dataset, and calculate ES and SRM for these models.
The paper is structured as follows. Section 1 briefly describes the Lévy and EV frameworks. Section 2 provides the initial data analysis. Section 3 discusses conceptual matters regarding the estimation and bootstrapping of ES and SRM. Section 4 discusses goodness of fit under the Lévy and EV models. Section 5 presents the analysis of the ES and Lévy-SRM estimates. Section 6 describes the empirical findings. The last section concludes the paper.

⁴ The theory of Lévy processes can be found in Bertoin (1996), Sato (1999), and Kyprianou (2006), amongst others.

CHARACTERIZATION IN LÉVY FRAMEWORK
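The characteristic function of a stochastically continuous process X_t starting at zero and with stationary independent increments can be written, by the Lévy-Khintchine formula, as

$$E\left[e^{iuX_t}\right] = \exp\left\{t\left(i\gamma u - \frac{\sigma^2u^2}{2} + \int_{\mathbb{R}}\left(e^{iux} - 1 - iux\,\mathbf{1}_{\{|x|<1\}}\right)\nu(dx)\right)\right\}, \qquad (1)$$

where γ is the drift, σ² is the variance of the Gaussian component and ν is the Lévy measure of the process. Thus, an inverse Fourier transform on any time scale t can be used to obtain the transition density numerically from the characteristic function (1). The numerical transition densities can then be used to estimate the risk measures under different model assumptions.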
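To make the inversion step concrete, the sketch below recovers a density from a characteristic function using f(x) = (1/π)∫₀^∞ Re[e^{−iux} φ(u)] du. It assumes a variance gamma process, whose characteristic function is (1 − iuθν + σ²νu²/2)^{−t/ν} (times a drift term), and illustrative parameter values (σ = 1, ν = 0.5, θ = −0.1) that are not the paper's estimates. The truncation point `u_max` and grid sizes are our choices.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule along the last axis of y."""
    return 0.5 * np.sum((y[..., 1:] + y[..., :-1]) * np.diff(x), axis=-1)

def vg_char_fn(u, t=1.0, sigma=1.0, nu=0.5, theta=-0.1, mu=0.0):
    """E[exp(iuX_t)] for a variance gamma process (illustrative parameters)."""
    base = 1.0 - 1j * u * theta * nu + 0.5 * sigma**2 * nu * u**2
    return np.exp(1j * u * mu * t) * base ** (-t / nu)

def density_from_char_fn(x, char_fn, u_max=60.0, n_u=6001):
    """f(x) = (1/pi) * int_0^inf Re[exp(-iux) * char_fn(u)] du, truncated at u_max."""
    u = np.linspace(0.0, u_max, n_u)
    integrand = np.real(np.exp(-1j * np.outer(x, u)) * char_fn(u))
    return trapezoid(integrand, u) / np.pi

x = np.linspace(-15.0, 15.0, 601)
f = density_from_char_fn(x, vg_char_fn)
total_mass = trapezoid(f, x)   # should be close to 1
mean = trapezoid(x * f, x)     # should be close to mu + theta = -0.1
```

Checking that the recovered density integrates to one and has the theoretical mean is a cheap sanity test before using it for quantiles.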
In this paper, our interest is limited to those members of the generalized hyperbolic (GH) family of Lévy processes that have been widely used in financial modeling. Lévy processes have been used extensively in option analysis (German, …). Later, Eberlein and Prause (2002) and Prause (1999) studied the whole family of GH distributions as a tool to model the log-returns of financial assets. Some of its subclasses have been studied separately in a financial context: Eberlein and Keller (1995) and Bingham and Kiesel (2001) studied the hyperbolic distribution (HYP), and Barndorff (1995) applied the normal inverse Gaussian (NIG) distribution to financial data. Eberlein and Hammerstein (2002) provide a complete and useful overview of limiting cases for this rich family of processes. We focus on four subclasses of Lévy processes: variance gamma (VG), normal inverse Gaussian (NIG), hyperbolic (HYP) and generalized hyperbolic (GH)⁵. Restricting ourselves to these subclasses allows us to obtain either the transition densities across time for processes closed under convolution, or at least the densities at time t = 1 for those that are not closed under convolution. Furthermore, in our empirical section we use daily return data for the indices under consideration and maintain a time scale in days, so that t = 1 in equation (3). This ensures that we do not need any numerical inversion to obtain the transition densities, even when the underlying distribution is not closed under convolution.
Let $X_{t+1} = \log(S_{t+1}/S_t)$ for any non-negative integer t, where S_t is the futures price on day t and X is characterized by the Lévy-Khintchine formula in equation (1). For our models, the equivalent processes are described more conveniently by their densities.
The availability of closed-form densities makes it easier to obtain the standard error of each parameter from Fisher's information matrix.
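As an illustration of how a closed-form density yields standard errors, the sketch below fits an NIG model by maximum likelihood and inverts a finite-difference Hessian of the negative log-likelihood (the observed information). It assumes SciPy's `norminvgauss` parametrization (a, b, loc, scale), which differs from the (α, β, δ, μ) notation common in the literature, and it uses simulated returns, not the paper's data. The helper names and the finite-difference step are our own choices.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norminvgauss

def nig_neg_loglik(params, x):
    """Negative log-likelihood in SciPy's (a, b, loc, scale) parametrization."""
    a, b, loc, scale = params
    if a <= 0 or scale <= 0 or abs(b) >= a:
        return np.inf
    return -np.sum(norminvgauss.logpdf(x, a, b, loc=loc, scale=scale))

def fit_nig(x):
    """ML fit; searching over z with a = e^z0, b = a*tanh(z1), scale = e^z3 keeps |b| < a."""
    def objective(z):
        a = np.exp(z[0])
        return nig_neg_loglik((a, a * np.tanh(z[1]), z[2], np.exp(z[3])), x)
    z0 = np.array([np.log(1.5), 0.0, np.mean(x), np.log(np.std(x))])
    simplex = np.vstack([z0, z0 + 0.2 * np.eye(4)])
    res = minimize(objective, z0, method="Nelder-Mead",
                   options={"initial_simplex": simplex, "maxiter": 5000,
                            "maxfev": 10000, "xatol": 1e-7, "fatol": 1e-7})
    a = np.exp(res.x[0])
    return np.array([a, a * np.tanh(res.x[1]), res.x[2], np.exp(res.x[3])])

def numerical_hessian(f, x, rel_step=1e-4):
    """Central finite-difference Hessian of a scalar function f at x."""
    x = np.asarray(x, dtype=float)
    h = rel_step * np.maximum(np.abs(x), 1.0)
    k = x.size
    H = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k)
            ei[i] = h[i]
            ej = np.zeros(k)
            ej[j] = h[j]
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h[i] * h[j])
    return H

# Simulated daily returns (hypothetical parameters, not estimates from the paper)
returns = norminvgauss.rvs(2.0, -0.5, loc=0.05, scale=1.0, size=10_000, random_state=42)
params = fit_nig(returns)
info = numerical_hessian(lambda p: nig_neg_loglik(p, returns), params)
std_errors = np.sqrt(np.diag(np.linalg.inv(info)))
```

The search runs in unconstrained coordinates so that the optimizer never leaves the valid region, while the Hessian is taken in the original parameters so that the standard errors refer to (a, b, loc, scale) directly.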
The approach competing with the Lévy models in this paper is the extreme value (EV) model, which uses only extreme returns in calibration. As explained in Dowd (2005), and subsequently applied in Cotter and Dowd (2006), perhaps the most elegant approach for this purpose is the peaks-over-threshold (POT) method. The essence of the POT approach is that, as the threshold u becomes larger, the distribution of exceedances x over u converges to a two-parameter generalized Pareto (GP) distribution:

$$G_{\xi,\beta}(x) = \begin{cases} 1 - \left(1 + \dfrac{\xi x}{\beta}\right)^{-1/\xi}, & \xi \neq 0, \\[4pt] 1 - \exp\left(-\dfrac{x}{\beta}\right), & \xi = 0. \end{cases}$$

The parameters ξ and β > 0 are, respectively, the shape and scale parameters, contingent upon the threshold u.
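A minimal POT sketch follows. It assumes simulated Student-t losses (not the futures data) and a threshold at the empirical 95% quantile, both chosen only for illustration; SciPy's `genpareto` shape parameter `c` plays the role of ξ, and fixing `floc=0` fits the two-parameter GP to the exceedances over u.

```python
import numpy as np
from scipy.stats import genpareto, t as student_t

def fit_pot(losses, threshold_quantile=0.95):
    """Fit a GP distribution to the exceedances over a quantile-based threshold u."""
    losses = np.asarray(losses, dtype=float)
    u = np.quantile(losses, threshold_quantile)
    excess = losses[losses > u] - u
    xi, _, beta = genpareto.fit(excess, floc=0.0)   # returns (shape, loc, scale)
    return u, xi, beta, excess.size

# Hypothetical heavy-tailed losses: Student t with 4 degrees of freedom
losses = student_t.rvs(df=4, size=100_000, random_state=1)
u, xi, beta, n_exceed = fit_pot(losses)
```

In practice the threshold is chosen from a stable region of the tail-index plot rather than fixed at a quantile; the fixed quantile here only keeps the example short.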

INITIAL DATA ANALYSIS
Our analysis is based on futures contract return data. More specifically, we study the returns on futures on the heavily traded S&P 500, FTSE100, DAX, Hang Seng and Nikkei 225 indices for the period from January 1, 1991 to December 31, 2003, collected from Datastream. The data refer to futures contracts that expire in the following trading months, rolling over from one expiring contract to the next at the start of each trading month. For bank holidays, Datastream pads the dataset by taking the bank holiday's end-of-day price to be the previous trading day's end-of-day price. Thus, we have the same number of daily returns for all contracts (3,392). We chose this dataset and sample period deliberately so that we can compare our results with those of Cotter and Dowd (2006), who use the same data and sample period for the EV-ES and EV-SRM models. The threshold in each extreme tail plot is selected according to extreme value theory, as discussed in Cotter and Dowd (2006). The extremity of returns clearly differs across indexes, and the visual goodness of fit of the various models, both tail-based and full density-based, is also clearly distinct.
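The padding convention and the return definition X_{t+1} = log(S_{t+1}/S_t) can be sketched as follows. The price series is hypothetical, `nan` stands in for a bank holiday, and contract rollover is assumed to be already reflected in a single continuation series. A padded holiday therefore contributes a zero return, and a short position simply negates the return vector.

```python
import numpy as np

def pad_holidays(prices):
    """Replace missing (holiday) prices with the previous trading day's price."""
    padded = np.asarray(prices, dtype=float).copy()
    for i in range(1, padded.size):
        if np.isnan(padded[i]):
            padded[i] = padded[i - 1]
    return padded

def log_returns(prices):
    """X_{t+1} = log(S_{t+1} / S_t) for consecutive prices."""
    return np.diff(np.log(prices))

settle = [100.0, 101.0, np.nan, 102.0]   # hypothetical prices; nan marks a bank holiday
long_returns = log_returns(pad_holidays(settle))
short_returns = -long_returns            # short position: returns multiplied by (-1)
```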
In Table 1, we replicate the unconditional maximum likelihood estimates of the generalized Pareto (GP) distribution for the futures indexes reported by Cotter and Dowd (2006); the GP distribution provides a good fit to the data for both long and short positions. The tail indices are positive except for the Nikkei 225, and the estimated scale parameters fluctuate around 1. Table 1 also provides the assumed thresholds u, the associated numbers of exceedances N_u and the observed exceedance probabilities (Prob). While the numbers and probabilities of exceedances change, the assumed thresholds lie in the stable tail-index regions identified from the tail-index plots.
The tail-based calibration of the extreme value GP model provides significantly different estimates for long and short positions, as it makes use of tail observations only. For a tail-based EV model, when a left-skewed density becomes right-skewed, the numbers of tail observations for long and short positions can differ significantly and hence significantly affect the estimates. As a result, we find tail asymmetry between long and short positions under the extreme value model in Table 1. Evidently, the same cut-off … In contrast, Lévy-based calibration makes use of the complete data for both long and short positions, so that the densities are reflected about the y-axis, and switching between long and short positions only alters the sign of the parameter characterizing the skewness of the model. Thus, while a particular model, for example VG, gives a left-skewed density for a long position, the shape of the density remains the same but becomes right-skewed for a short position. The return vector used in estimation for a long position is simply multiplied by (-1) before being used in estimation for a short position. The net effect is that the estimated density is reflected about the y-axis, and only the skewness-controlling parameter changes sign between long and short positions⁶. Table 2 presents the conditional maximum likelihood estimates of the parameters of the four Lévy models for all five indexes. For brevity, we report only the estimates for long positions. The skewness parameter is θ for the VG model and β for the NIG, HYP and GH models; for short positions, this parameter has the opposite sign.

⁶ In relation to all the models, under turbulent market conditions an investment with a short position is riskier than an investment with a long position.
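The reflection argument can be checked directly on a closed-form density. The sketch below assumes SciPy's NIG parametrization and illustrative parameter values: the density of the short-position return −X equals an NIG density with the skewness parameter b negated. In this parametrization a non-zero location also changes sign; with a zero location, only the skewness parameter changes.

```python
import numpy as np
from scipy.stats import norminvgauss

a, b, loc, scale = 2.0, -0.5, 0.05, 1.0   # hypothetical long-position fit (left-skewed)
y = np.linspace(-6.0, 6.0, 241)

# Density of the short-position return Y = -X is f_X(-y)
short_density = norminvgauss.pdf(-y, a, b, loc=loc, scale=scale)
# The same density written as an NIG with the skewness (and location) sign flipped
reflected = norminvgauss.pdf(y, a, -b, loc=-loc, scale=scale)
```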
The tail masses for observations in excess of the thresholds differ between the EV and Lévy models, and for the same threshold they differ further across the Lévy models. These differences possibly reflect how the different Lévy models use information from observations outside the tails when fitting the tails. As a result, the corresponding quantiles of the extreme value and Lévy models do not lie along a vertical line. To illustrate the differences between the two approaches, we use the same number of tail observations and compare the QQ plot of EV with each of the Lévy models separately, as illustrated in Figures 1 to 5.
At the very extreme tail, there is clear evidence of deviation between the EV and Lévy quantiles, and in most cases the EV quantiles deviate less from the data.
Specifically, EV provides a better fit for the S&P 500, FTSE100, DAX and Hang Seng. Furthermore, for these indices, NIG and GH provide a better tail fit than VG and HYP. For the Nikkei 225, however, we observe the opposite. A closer look reveals that while the S&P 500, FTSE100, DAX and Hang Seng indices show larger falls in price than rises, the Nikkei 225 shows larger rises than falls during the sample period. To visualize the tail fits of the models, we plot the generalized Pareto EV tail against each of the Lévy models separately. Figures 1 to 5 show the tails for the S&P 500, FTSE100, DAX, Hang Seng and Nikkei 225, respectively. We obtain the EV quantiles in excess of the thresholds and then obtain the corresponding quantiles from the Lévy models. In other words, we fix the thresholds rather than the tail mass. The consequence is that some of the Lévy quantiles close to the EV thresholds are, in fact, somewhat smaller in magnitude than the thresholds. This means that the Lévy tails are slightly fatter than the EV tails. This, in turn, explains the difference in the tail masses covered by the EV and Lévy models as reported in Tables 1 and 2.
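The quantile comparison described above can be sketched as follows. For probabilities beyond the threshold probability 1 − N_u/n, the GP quantile is computed in closed form and the Lévy quantile at the same probability is taken from the model's quantile function. All numbers (n, N_u, u, β, ξ and the NIG parameters) are hypothetical, and NIG stands in for any of the four Lévy models.

```python
import numpy as np
from scipy.stats import norminvgauss

def gp_tail_quantile(p, u, beta, xi, n, n_u):
    """EV (GP) quantile beyond the threshold; valid for p > 1 - n_u / n and xi != 0."""
    return u + beta / xi * ((n / n_u * (1.0 - p)) ** (-xi) - 1.0)

n, n_u = 3392, 170             # hypothetical sample size and number of exceedances
u, beta, xi = 1.5, 0.6, 0.2    # hypothetical GP threshold and parameters
p = np.sort(1.0 - (np.arange(1, n_u + 1) - 0.5) / n)   # plotting positions in the tail

ev_q = gp_tail_quantile(p, u, beta, xi, n, n_u)
levy_q = norminvgauss.ppf(p, 2.0, 0.5, loc=0.0, scale=1.0)   # hypothetical fitted NIG
gap = levy_q - ev_q            # Lévy minus EV quantile at the same probability
```

Plotting `ev_q` against `levy_q` gives the kind of QQ comparison shown in Figures 1 to 5.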

ESTIMATION OF RISK MEASURES: METHODOLOGY AND PERFORMANCE
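Apart from a few specific cases, VaR is in general obtained as the solution of the quantile-integral equation

$$\int_{-\infty}^{\mathrm{VaR}_\alpha} f(x)\,dx = \alpha, \qquad (10)$$

where f is the density of losses and α is the coverage level.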
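When only a density is available, equation (10) can be solved with a root finder wrapped around numerical integration. The sketch below assumes an NIG loss density with illustrative parameters and cross-checks the result against SciPy's own quantile function. Splitting the integral at `center` is our choice, made so that the quadrature does not miss the peak of the density; the bracket `width` assumes the density is negligible beyond it.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import norminvgauss

def var_from_density(pdf, alpha, center=0.0, width=40.0):
    """Solve  int_{-inf}^{VaR} pdf(x) dx = alpha  for VaR (equation (10))."""
    mass_left_of_center = quad(pdf, -np.inf, center)[0]

    def cdf_minus_alpha(v):
        # quad returns a negative value when v < center, as required
        return mass_left_of_center + quad(pdf, center, v)[0] - alpha

    return brentq(cdf_minus_alpha, center - width, center + width, xtol=1e-12)

loss_pdf = lambda x: norminvgauss.pdf(x, 2.0, 0.5, loc=0.0, scale=1.0)   # hypothetical
var_99 = var_from_density(loss_pdf, 0.99)
```

Every evaluation of the root finder requires a numerical integral, which is why VaR, and every measure built on it, is expensive for Lévy models.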
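For the GP model, the α-th quantile at a sufficiently high confidence level α, which is also the VaR, is given by

$$\mathrm{VaR}_\alpha = u + \frac{\beta}{\xi}\left[\left(\frac{n}{N_u}(1-\alpha)\right)^{-\xi} - 1\right], \qquad (11)$$

and the expected shortfall (ES) at coverage level α is

$$\mathrm{ES}_\alpha = \frac{\mathrm{VaR}_\alpha}{1-\xi} + \frac{\beta - \xi u}{1-\xi}.$$

In equation (11), n is the total number of observations and N_u is the number of observations that exceed the threshold u.

For the Lévy models, expected shortfall (ES) is estimated using

$$\mathrm{ES}_\alpha = \frac{1}{1-\alpha}\int_{\mathrm{VaR}_\alpha}^{\infty} x\,f(x)\,dx, \qquad (15)$$

where VaR_α solves equation (10) and f is the VG loss density. The ES formulae for the other Lévy models can be obtained by using the other densities from section 1.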
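The GP closed forms for VaR and ES (the standard peaks-over-threshold expressions) translate directly into code. The sketch below assumes ξ ≠ 0 and ξ < 1, and the input values are hypothetical numbers chosen so the result can be checked by hand.

```python
def gp_var(alpha, u, beta, xi, n, n_u):
    """GP VaR: u + beta/xi * [((n / n_u) * (1 - alpha))^(-xi) - 1]; requires xi != 0."""
    return u + beta / xi * ((n / n_u * (1.0 - alpha)) ** (-xi) - 1.0)

def gp_es(alpha, u, beta, xi, n, n_u):
    """GP ES: VaR / (1 - xi) + (beta - xi * u) / (1 - xi); requires xi < 1."""
    var = gp_var(alpha, u, beta, xi, n, n_u)
    return var / (1.0 - xi) + (beta - xi * u) / (1.0 - xi)

# Hypothetical inputs: u = 1, beta = 0.5, xi = 0.2, n = 1000, N_u = 50, alpha = 0.99
var_99 = gp_var(0.99, 1.0, 0.5, 0.2, 1000, 50)   # 1 + 2.5 * (5 ** 0.2 - 1), about 1.9493
es_99 = gp_es(0.99, 1.0, 0.5, 0.2, 1000, 50)     # about 2.8117
```

Because both are closed form, bootstrapping them takes seconds, which is the contrast with the Lévy case drawn in the text.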
Spectral risk measures, however, do not rely on any particular confidence level. Instead, given a parameter characterizing the degree of the investor's risk aversion, they consider the entire spectrum of losses. For our benchmark EV model, the closed-form VaR formula (11) provides a relatively simple expression for SRM:

$$\mathrm{SRM}_\phi = \int_0^1 \phi(p)\,\mathrm{VaR}_p\,dp, \qquad \mathrm{VaR}_p = u + \frac{\beta}{\xi}\left[\left(\frac{n}{N_u}(1-p)\right)^{-\xi} - 1\right].$$

In the case of Lévy models, however, computation of SRM is very time consuming because there is no closed-form VaR: each VaR_p in

$$\mathrm{SRM}_\phi = \int_0^1 \phi(p)\,\mathrm{VaR}_p\,dp, \qquad \int_{-\infty}^{\mathrm{VaR}_p} f(x)\,dx = p,$$

must itself be obtained numerically. The variance gamma SRM is then obtained by taking f to be the VG density. The SRM estimates of the other Lévy models can be obtained by using their respective densities in equation (18). The subscript φ indicates that the SRM is calculated using the exponential risk aversion function φ(p) = R e^{-R(1-p)}/(1 - e^{-R}).

Even with only 100 resamples, we find that a machine with a sophisticated configuration takes several hours to provide the SE and CI for ES from a Lévy model at a single confidence level. The same is true for SRM with each particular choice of risk aversion parameter. However, comparison with the SE and CI that Cotter (2006) reports using 5000 resamples for an EV model (which has closed-form expressions for both VaR and ES, and whose closed-form VaR allows SRM to be calculated in seconds as well) shows that the difference between 100 and 5000 resamples is not significant for VaR and ES. In the case of SRM, however, the difference is enormous. This is because, in addition to using a small number of resamples, we evaluate the SRM integral using only 100 slices. This makes the estimation performance of SRM comparable only across Lévy models.
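The 100-slice evaluation of the SRM integral with the exponential risk aversion function can be sketched as a weighted average of quantiles at slice midpoints. The normalisation of the discrete weights to sum to one is our choice, and standard normal losses are assumed only so the behaviour is easy to check: as R approaches 0 the weights become uniform and the SRM tends to the mean loss, and a larger R shifts weight towards the worst losses.

```python
import numpy as np
from scipy.stats import norm

def exponential_weights(p, R):
    """phi(p) = R * exp(-R * (1 - p)) / (1 - exp(-R)), the exponential risk spectrum."""
    return R * np.exp(-R * (1.0 - p)) / (-np.expm1(-R))

def spectral_risk_measure(quantile_fn, R, n_slices=100):
    """SRM = int_0^1 phi(p) q_p dp approximated on n_slices midpoints."""
    p = (np.arange(n_slices) + 0.5) / n_slices   # midpoints avoid p = 0 and p = 1
    w = exponential_weights(p, R)
    w = w / w.sum()                              # discrete weights sum exactly to one
    return np.sum(w * quantile_fn(p))

srm_neutral = spectral_risk_measure(norm.ppf, 1e-6)   # near-uniform weights: about the mean
srm_5 = spectral_risk_measure(norm.ppf, 5.0)
srm_20 = spectral_risk_measure(norm.ppf, 20.0)
```

For a Lévy model, `quantile_fn` would itself solve equation (10) at each midpoint, so the cost grows with the number of slices, which is why the text limits the integral to 100 slices.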
The parametric bootstrap is applied to obtain the standard errors (SE) and confidence intervals (CI) of each risk measure. However, as we are dealing with Lévy models, which have no closed-form expressions for their risk measures, it is infeasible to use bootstraps with a large number of resamples. For each resample we draw as many uniform (0, 1) random numbers as there are observations and, after sorting them in ascending order, we find the relevant quantile corresponding to the coverage level. This quantile is then used as the bootstrap coverage level, at which we obtain the VaR and ES as given by equations (10) and (14). Since, for a given bootstrap coverage level, the VaR equation needs to be solved numerically, the corresponding ES equation takes a long time to converge. This is because any numerical scheme applied to obtain the ES searches for the converging value by evaluating the integrand vector by vector, and for each element of a vector the VaR needs to be obtained as a solution of the quantile-integral equation (10). Bootstrapped VaR and ES vary because of the variation in the bootstrapped confidence level. Since SRM does not depend on any particular confidence level, however, obtaining bootstrapped estimates of SRM requires randomizing the whole spectrum. We thus need to approximate the integral in (20).
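The coverage-level bootstrap described above can be sketched generically: each resample draws as many uniforms as there are observations, takes the order statistic at the coverage level as the bootstrap coverage level, and re-evaluates the risk measure there. The risk function passed in is a placeholder; a hypothetical GP VaR is used only because it is fast, whereas for a Lévy model each call would solve equation (10) numerically.

```python
import numpy as np

def bootstrap_coverage(risk_fn, alpha, n_obs, n_boot=100, ci=0.90, seed=0):
    """SE and CI of risk_fn(alpha) obtained by resampling the coverage level itself."""
    rng = np.random.default_rng(seed)
    k = int(np.ceil(alpha * n_obs)) - 1          # index of the alpha order statistic
    draws = np.empty(n_boot)
    for b in range(n_boot):
        uniforms = np.sort(rng.uniform(0.0, 1.0, size=n_obs))
        alpha_star = uniforms[k]                 # bootstrap coverage level
        draws[b] = risk_fn(alpha_star)
    tail = (1.0 - ci) / 2.0
    lower, upper = np.quantile(draws, [tail, 1.0 - tail])
    return draws.std(ddof=1), lower, upper

def gp_var(p, u=1.0, beta=0.5, xi=0.2, n=3392, n_u=170):
    """Hypothetical GP tail with closed-form VaR, used only to keep the example fast."""
    return u + beta / xi * ((n / n_u * (1.0 - p)) ** (-xi) - 1.0)

se, lower, upper = bootstrap_coverage(gp_var, alpha=0.99, n_obs=3392)
point = gp_var(0.99)
```

The default of 100 resamples and a 90% interval mirror the settings discussed in the text.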

GOODNESS-OF-FIT TESTS
The Anderson-Darling (AD) test is particularly suitable for assessing the goodness of fit of tail-based risk management models⁸. We report the goodness-of-fit statistics in Table 3 using the AD and e AD ν test statistics. Although the AD test uses the entire dataset, it puts more weight on tail observations. Thus, both AD and e AD ν are informative about the tail fit while providing hardly any information about the fit of the density far from the tail. Table 3 shows that both the EV and full density Lévy models perform well in the tail, with EV appearing more reliable when the tail fit alone is concerned. However, because AD and e AD ν provide hardly any information about the fit far from the tails, a better tail fit says little about the accuracy of risk measures, such as SRM, that also use quantiles far from the tails. This is demonstrated by the results in Table 4.
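The tail emphasis of the AD statistic comes from the terms ln F and ln(1 − F), which grow large when the fitted CDF misses observations in either tail. The sketch below computes the standard A² statistic against a fully specified CDF; the simulated sample and the deliberately mis-located model are illustrative assumptions, and the clipping constant is our guard against log(0).

```python
import numpy as np
from scipy.stats import norm

def anderson_darling(sample, cdf):
    """A^2 = -n - (1/n) * sum_i (2i - 1) [ln F(x_(i)) + ln(1 - F(x_(n+1-i)))]."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    F = np.clip(cdf(x), 1e-12, 1.0 - 1e-12)   # guard the logs at the extremes
    i = np.arange(1, n + 1)
    return -n - np.mean((2 * i - 1) * (np.log(F) + np.log1p(-F[::-1])))

data = norm.rvs(size=1000, random_state=0)
a2_good = anderson_darling(data, norm.cdf)                        # correct model
a2_bad = anderson_darling(data, lambda x: norm.cdf(x, loc=1.0))   # mis-located model
```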

ESTIMATES OF ES AND LÉVY SRM
In this section, we discuss and compare the estimates of ES and SRM based on the GP model and the four chosen Lévy models. The calibrated parameters for each are as in Table 4. The ES risk measure is based on a high confidence level and the SRM risk measure on a large risk aversion parameter. We consider con… We now focus on the precision of the risk estimates. For the GP model, SRM standard errors are often higher than those of ES for all indices (clearly so at higher risk levels). Thus, we could argue that for the GP model, ES estimation is more precise than SRM estimation. Further, we note that for the GP model, ES estimates have higher coefficients of variation (estimated risk measure divided by standard error) than SRM estimates. This further suggests that GP-ES estimation is more precise than GP-SRM estimation.
Table 4 also includes bootstrapped estimates of the 90% confidence interval for both risk measures. For the GP model across all the indices, these confidence intervals are narrower for ES estimates than for SRM estimates, suggesting that, in general, ES risk exposures are estimated more precisely than SRM exposures for the GP model. We also find that the estimated confidence intervals are symmetric at low confidence levels. At higher risk levels, however, the confidence intervals are asymmetric, with the right bound further from the mean of the bootstrapped estimates. Cotter and Dowd (2006) also find that EV-ES standard errors are higher than those of the VaR, implying that VaR is estimated more precisely than ES. However, EV-ES has higher coefficients of variation than VaR, suggesting that ES is more precisely estimated. Similarly, they find that SRM estimates are less precise than VaR or ES estimates. We find that the ES estimates of the four Lévy models have narrower confidence intervals across all the indices. Thus, ES estimation based on Lévy models is, in general, more precise than that based on GP.
Finally, the standardized confidence intervals for the SRM estimates are similar across positions but widen as the risk aversion parameter increases. As with ES, the confidence intervals for very high values of R display asymmetry.
Furthermore, the SRM estimates have considerably wider confidence intervals than ES, so SRM estimation is less precise than ES estimation. This discrepancy can be explained by the effective sample size. If we have n observations in the tail, the ES makes use of these observations only. The SRM estimator, on the other hand, makes use of all observations but places considerable weight on a small subset of the tail observations, and thus effectively uses a smaller sample (Cotter & Dowd, 2005). Unfortunately, we cannot investigate similar precision issues for Lévy models or between Lévy and GP models. This is because we do not have analytic formulae for either of the Lévy coherent risk measures, and subjective numerical implementations are not comparable with analytic implementations.

DISCUSSION
Full density-based Lévy models are considered for estimating the tail-based coherent risk measure ES (with the coverage level as parameter) and the entire-distribution-based coherent risk measure SRM (with the degree of risk aversion R as parameter), alongside ES and SRM estimates for a tail-targeting EV model. Our findings reveal that neither approach (Lévy or EV) is comprehensively superior to the other.
In Table 5, we report the frequency of significant EV and Lévy estimates presented in Table 4. We find that the extreme value model of ES outperforms the Lévy models in terms of precision, in spite of the fact that the Lévy densities are calibrated on the entire dataset and the EV models only on the extreme observations. Thus, the tail-based risk measure ES performs better if the model is calibrated on the tail alone. SRM, by contrast, incorporates all quantiles, with the probability weight spread over the entire spectrum. When compared with empirical values, the SRMs of Lévy models often outperform those of the EV model. To explain this feature, we note that SRM is not a tail-based risk measure, whereas EV parameters are calibrated using only extreme quantiles. We observe that such a calibration could yield misleading quantiles, especially when outcomes fall in the most extreme end of the opposite tail. Lévy models, on the other hand, which use the entire dataset in calibration, are expected to generate consistent values of quantiles far from the extreme tail⁹.
The practical implication of our findings is that portfolio managers and hedge fund managers should consider Lévy SRM models, as these allow them to choose appropriate risk aversion parameters to determine the expected loss in the event of market turmoil. In addition, the SRMs of Lévy models often outperform those of the EV model when compared with their empirical values. This becomes increasingly apparent as investors become more risk averse. We note that even though VaR is more precise than the ES risk measure, professional investors these days are more likely to use ES because it has the benefit of coherence. Similarly, even though the SRM measure is less precise than the extreme value measure, it has the benefit of incorporating the investor's risk aversion while most closely matching empirical losses, especially in the case of the Lévy-SRM.

⁹ Note that we consider only 100 slices in evaluating the SRM integrals, uniformly for all models. Cotter and Dowd (2006) report the SRM for EV using one million slices, which is almost impossible to apply to Lévy models. Even so, the Lévy SRM with 100 slices for numerical integration is in general fairly comparable with their EV SRM, and is superior for the NIG and GH models in particular.