“Optimized consortium formation through cluster analysis”

Some problems cannot be solved optimally, and compromises become necessary. In some cases, obtaining an optimal solution may require combining algorithms and iterations. This often occurs when the problem is complex and a single procedure does not reach optimality. This paper presents a combination of algorithms, applied iteratively, to form an optimal consortium using cluster analysis. Hierarchical methods and distance measures lead the process. A small number of companies is desirable in optimal consortium formation. However, this study shows that optimization cannot be predetermined based on a fixed number of companies. The experiential exercise forms an optimal consortium of four companies from six shortlisted competitors.


Introduction 
Combinations of entities for working together are sometimes inevitable. The aim of combining is to create synergies for improved performance. These combined entities do not always produce the desired outcome. Hence, when such combinations are formed, mechanisms should be designed to ensure that they perform at the required levels. Among the common and important combinations that have shown failures in recent times are public-private partnerships and consortia (Joshi, 2010). A consortium is a conglomerate of several entities working together towards a collective objective (Dolnicar & Lazarevski, 2009). Companies form consortia through the process of clustering almost daily. Some consortia succeed while others fail (Larson et al., 2005). Partnerships fail because of 'lack of chemistry' between the component entities (Koti, 2006). This study designs a systematic way of creating such attraction in consortium formation. Using systematic methods enhances consortium success, and not using them heightens the chances of consortium failure. This paper applies cluster analysis to form optimal consortia.

Cluster analysis techniques
The purpose of cluster analysis is to discover a system of organizing entities into groups in which group members share properties (Seo & Shneiderman, 2002). This study applies cluster analysis to determine optimal consortia. The companies involved should form synergies. In the sense of this paper, the attributes of the optimal consortium should possess the best possible performance promise.
1.1. Fundamental cluster analysis steps
Cluster analysis starts from a proximities matrix of the items to be grouped (Everitt, Landau & Leese, 2001). It combines items such that grouped items do not duplicate one another; rather, they should complement one another (Kaufman & Rousseeuw, 1990). Grouped items should add value by enabling synergies, and the strengths of one item should offset the weaknesses of others. In practice, the starting point is a data matrix, with items in columns and criteria (or attributes) in rows, which is converted into a proximities matrix (Dhillon & Modha, 2001). The proximities matrix shows the proximity values of the different items, while its diagonal elements are all zero to indicate no distance between an item and itself. Two tasks are imperative. Firstly, a decision is required about the items to be gathered for inclusion. Secondly, the method for combining multiple measures into a single similarity measure between the items should be decided. A typical data matrix takes the familiar form of rows and columns (Table 1).
If the number of items for a cluster/consortium is not known beforehand, hierarchical linkage methods are useful (Kraskov et al., 2003). The linkage methods are discussed next, followed by distances.
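The conversion from a data matrix to a proximities matrix can be sketched as follows. This is a minimal illustration only: it assumes Euclidean distance as the proximity measure, and the scores and the names `data`, `euclidean` and `proximity_matrix` are made up for the example rather than taken from the study.

```python
import math

# Hypothetical data matrix: one list of attribute scores per item.
data = {
    "Item1": [80, 20, 50],
    "Item2": [30, 70, 55],
    "Item3": [75, 25, 40],
}

def euclidean(u, v):
    """Euclidean distance between two equally long score vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def proximity_matrix(items):
    """Return a symmetric dict-of-dicts of pairwise distances."""
    names = list(items)
    return {a: {b: euclidean(items[a], items[b]) for b in names} for a in names}

prox = proximity_matrix(data)
for name, row in prox.items():
    print(name, [round(d, 2) for d in row.values()])
```

The diagonal entries are zero and the matrix is symmetric, which matches the properties described above.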
Single linkage: Single linkage is based on clustering items that are nearest neighbors. It calculates the distance between two groups as the minimum distance between any two items, one taken from each group.
Allocation is done for the first pair that shows the minimum distance, and the scores are combined as applicable. The distances for the new groups are then recalculated. The next allocation is based on the shortest distance that emerges at this stage. The process is repeated in the subsequent steps.
Clustering is considered complete when an optimal state is achieved.

Complete linkage:
Complete linkage is the opposite of single linkage in its use of distance, as it uses the furthest neighbor as the criterion for clustering items. It also starts by calculating the distances between pairs of items in each step. It then groups items using this maximum (furthest-neighbor) distance.

Average linkage:
The procedure of average linkage is similar to that of the single and complete linkage methods, but considers the average distance. It computes the distance between two subgroups at each step as the average of the distances between their items. The procedure continues as for the previous linkage methods, and the process stops when optimality is achieved.
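The three linkage rules differ only in how the distance between two groups is summarized. The sketch below illustrates this under stated assumptions: the proximities in `dist` are invented, the standard convention of merging the pair of groups with the smallest linkage distance is used for all three rules, and merging stops at a fixed number of groups (`n_clusters`) because the optimality check used in this paper is not modeled here.

```python
from itertools import combinations

# Hypothetical proximities between four items (symmetric, zero diagonal).
dist = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 6.0,
    frozenset(("A", "D")): 10.0,
    frozenset(("B", "C")): 5.0,
    frozenset(("B", "D")): 9.0,
    frozenset(("C", "D")): 4.0,
}

LINKAGE = {
    "single": min,                             # nearest neighbor
    "complete": max,                           # furthest neighbor
    "average": lambda ds: sum(ds) / len(ds),   # mean over all cross-pairs
}

def cluster_distance(c1, c2, method):
    """Distance between two groups under the chosen linkage rule."""
    pair_dists = [dist[frozenset((a, b))] for a in c1 for b in c2]
    return LINKAGE[method](pair_dists)

def agglomerate(items, method, n_clusters=1):
    """Merge the closest pair of groups until n_clusters remain."""
    clusters = [(i,) for i in items]
    merges = []
    while len(clusters) > n_clusters:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda p: cluster_distance(p[0], p[1], method))
        merges.append((c1, c2, cluster_distance(c1, c2, method)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

for method in LINKAGE:
    print(method, agglomerate("ABCD", method))
```

On this toy input all three rules produce the same merge order, but the distance recorded for the final merge differs, which is what changes the outcome on less clear-cut data.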

Mahalanobis distance
The Mahalanobis distance (MD) measures the distance between two correlated variables (Weisstein, 2003). Let $N_p(\mu, \Sigma)$ be the probability density function of the $p$-variate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$. The MD measure between $x$ and $y$ in the $p$-dimensional space is given by:
$$d^2(x, y) = (x - y)^{T} \Sigma^{-1} (x - y).$$
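The Mahalanobis distance weights the difference between two points by the inverse of the covariance matrix. The following sketch is restricted to the two-dimensional case so that the matrix inverse can be written out by hand; the function name `mahalanobis_2d` and the covariance values are assumptions made for illustration.

```python
import math

def mahalanobis_2d(x, y, cov):
    """Mahalanobis distance between 2-D points x and y for a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Explicit inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = (x[0] - y[0], x[1] - y[1])
    # Quadratic form (x - y)^T * inv * (x - y).
    q = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    return math.sqrt(q)

identity = ((1.0, 0.0), (0.0, 1.0))
correlated = ((1.0, 0.8), (0.8, 1.0))
print(mahalanobis_2d((3, 4), (0, 0), identity))    # reduces to the Euclidean distance
print(mahalanobis_2d((3, 4), (0, 0), correlated))  # shorter along the correlated direction
```

With an identity covariance matrix the measure coincides with the Euclidean distance; with positively correlated variables, a displacement along the direction of correlation counts for less.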

Geometric distance
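Geometric distances are often measured in the Euclidean space, where a distance is a numerical description of how far apart items lie (Seker, Altun, Ayan & Mert, 2014). Distance is a metric function that describes whether items are "close to" or "far away from" each other. In real numbers, the metric distance $d$ between $x$ and $y$ satisfies the conditions: $d(x, y) \ge 0$; $d(x, y) = 0$ if and only if $x = y$; $d(x, y) = d(y, x)$; and $d(x, z) \le d(x, y) + d(y, z)$. The analytic geometry definition of the Euclidean distance between two items $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ is:
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}.$$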

Manhattan distance
When p = 1, the p-norm distance reduces to the Manhattan distance:
$$d_1(x, y) = \sum_{i=1}^{n} |x_i - y_i|.$$

Euclidean distance
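The case p = 2 yields the generalization of the Pythagorean theorem:
$$d_2(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}.$$
The p-norm distance in general is:
$$d_p(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}.$$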

Chebyshev distance
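The Chebyshev distance is the limiting case of the p-norm distance as p tends to infinity, and is defined by:
$$d_\infty(x, y) = \max_{i} |x_i - y_i|.$$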
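The Manhattan, Euclidean and Chebyshev distances are all members of the p-norm family, which the short sketch below makes concrete. The function names and the two example vectors are illustrative only.

```python
def minkowski(x, y, p):
    """p-norm distance between two equally long vectors (p >= 1)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

def chebyshev(x, y):
    """Limit of the p-norm distance as p grows without bound."""
    return max(abs(a - b) for a, b in zip(x, y))

x, y = (1, 2, 3), (4, 6, 3)
print(minkowski(x, y, 1))    # Manhattan distance
print(minkowski(x, y, 2))    # Euclidean distance
print(minkowski(x, y, 50))   # already very close to the Chebyshev distance
print(chebyshev(x, y))       # Chebyshev distance
```

As p increases, the largest coordinate difference dominates the sum, which is why the p-norm distance approaches the Chebyshev distance.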

Matthews correlation coefficient
In machine learning, the Matthews correlation coefficient (MCC) is a measure of the quality of binary (two-class) classifications (Perruchet & Peereman, 2004; Powers, 2011). The MCC takes account of true and false positives and negatives. Fawcett (2006) describes the MCC as a correlation coefficient between the observed and predicted binary classifications, ranging from −1 to +1. Let n be the total number of observations. The MCC statistic (or the phi coefficient) is:
$$\mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}},$$
where $TP$, $TN$, $FP$ and $FN$ are the numbers of true positives, true negatives, false positives and false negatives, so that $n = TP + TN + FP + FN$.
1.3. Fuzzy logic techniques

Decision making under pure uncertainty
Personality type and decision making work together. People often make decisions due to their inner influences (Triantaphyllou, 2000). When a person controls a system fully, and is influential, their approach tends to depend on their 'basic' expectations. The focus here is on 'pessimists' and 'optimists', to complement the methods presented earlier. 'Pessimism' expects that bad things always happen, and considers the worst possible case of each alternative. It starts by identifying the minimum payoff of each alternative, and then selects the maximum of these minima (MaxMin process, or maximizing the minimum possible gain). 'Optimism' is the approach of selecting the maximum of the maximum gains (MaxMax process, or maximizing the maximum possible benefit).
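The MaxMin and MaxMax rules can be sketched in a few lines. The payoff table below is hypothetical, and the names `payoffs`, `maximin` and `maximax` are chosen only for this illustration.

```python
# Hypothetical payoff table: alternative -> payoffs under different outcomes.
payoffs = {
    "Alt1": [40, 70, 20],
    "Alt2": [50, 55, 45],
    "Alt3": [10, 90, 30],
}

def maximin(table):
    """Pessimist's choice: the alternative whose worst payoff is largest."""
    return max(table, key=lambda alt: min(table[alt]))

def maximax(table):
    """Optimist's choice: the alternative whose best payoff is largest."""
    return max(table, key=lambda alt: max(table[alt]))

print("MaxMin:", maximin(payoffs))
print("MaxMax:", maximax(payoffs))
```

In this example the pessimist prefers the alternative with the steadiest payoffs, while the optimist prefers the one with the single highest payoff, so the two rules need not agree.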

Regret approach
The approach in decision making is to minimize risks (Sharma, 2006). A smaller coefficient of variation (CV) indicates more reliability. Thus, data with a smaller CV are more stable (i.e. lower risk).
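A regret matrix, the expected opportunity loss (EOL) and the CV can be computed as in the sketch below. Several choices here are assumptions rather than details confirmed by the text: regret is taken as the best score on an attribute minus the company's own score, the EOL weights all attributes equally, the CV uses the sample standard deviation, and the scores are invented.

```python
import statistics

# Hypothetical scores: company -> scores on three attributes.
scores = {
    "X": [60, 30, 50],
    "Y": [40, 70, 45],
    "Z": [55, 50, 20],
}

def regret_matrix(table):
    """Regret = best score on the attribute minus the company's own score."""
    n = len(next(iter(table.values())))
    best = [max(row[j] for row in table.values()) for j in range(n)]
    return {c: [best[j] - row[j] for j in range(n)] for c, row in table.items()}

def expected_opportunity_loss(regrets):
    """EOL per company, assuming all attributes carry equal weight."""
    return {c: sum(r) / len(r) for c, r in regrets.items()}

def coefficient_of_variation(values):
    """CV = sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)

regrets = regret_matrix(scores)
eol = expected_opportunity_loss(regrets)
print(regrets)
print(eol, "-> minimum EOL at", min(eol, key=eol.get))
print({c: round(coefficient_of_variation(v), 2) for c, v in scores.items()})
```

A company with zero regret on an attribute is the leader on that attribute, and the company with the smallest EOL gives up the least on average.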

Data
In an exercise in which a tender invitation was issued, several companies were evaluated using scores on nine project attributes to determine the winner. Each attribute was judged out of 100. No competitor was found adequate, but some had attributes indicating promising performance on some aspects of the identified project. Also, when combined, they contained all the desirables. The scores of the top six shortlisted companies were used to form consortia in identifying the optimal consortium.

3. Findings

Company mean strengths.
The averages (also the expected pay-offs) of the points awarded to each company are considered at this stage to determine the rated performances of these companies. The fourth column of Table 3 shows the average scores of company strengths. The merit order is C1 (45.11 points); C4 (44.67 points); C3; C2; C6; and C5. Leader C1 has the most attributes in which it leads all the others, but performs poorly at attributes A2, A3 and A8. It is the winning candidate, but should be clustered to offset its weak parts. At C1's weakest attributes, C3 leads at A2 and A3; C6 leads at A8; and C4 leads at A1. C2 and C5 are not leading at any attribute, and cannot improve the weaknesses of other companies. They are candidates for exclusion from any consortium. Initial possible cluster pairs could be (C1:C3), (C1:C6) and (C4:C6).

Consideration of relative company stability.
CVs measure the stabilities of the companies (Table 3).

Interpretation of the ANOVA table
The null hypothesis of the ANOVA test is that the means are all equal. The ANOVA table (Table 4) indicates that the points awarded to the companies are not significantly different, since the p-value exceeds 0.05. The table does not, however, indicate the differences in the strengths obtained on the attributes during shortlisting. This information therefore shows that the mean strengths of the companies are not significantly different.

Applying distance measures.
The p-norm distance is applied to the data matrix of the current problem to obtain the proximities matrix (Table 6). The two best items do not define optimal consortia. The first one is outperformed at attributes A1 to A3, A6 and A8. Elsewhere, in the other attributes, the second one is outperformed. The aim is to find an optimal solution to consortium formation with the smallest possible number of companies. Since optimality has not materialized at this stage, the process continues, and a new data matrix of strengths is formed. Again, neither of these two consortia is optimal. The first one is outperformed at attributes A2, A3 and A8.

Complete linkage: From
Elsewhere, in the other attributes, the second consortium is outperformed.
Average linkage: From Table 7, the most similar members are C3 and C6. The next proximities matrix is then obtained by merging C3 with C6, forming cluster C3:C6. The new distance between each other company and the cluster is the average of that company's distances to C3 and to C6. The next clusters, based on closest proximity, are C1:C4 and C3:C6. The next cluster, based on closest proximity, is C3:C4:C6.

MinMax approach
The approach requires identifying the lowest performers in each attribute. C1 is not leading at attributes A1, A2, A3, A6 and A8; C3 is not leading at attributes A1 and A4 to A9; C4 is not leading at attributes A2 to A5, and A7 to A9; and C6 is not leading at attributes A1 to A7 and A9. Thus, in a consortium including these companies, they cannot lead activities related to the attributes in which they underperform. Rather, they can be considered for transfer of skills from the leading companies in these attributes. The next step focuses on the optimist's approach.
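The lists of attributes at which each company does not lead can be inverted to show where each company does lead. The sketch below takes the four lists exactly as given in the paragraph above; it assumes that each attribute has a single leader with no ties, and the variable names are illustrative.

```python
ATTRIBUTES = [f"A{i}" for i in range(1, 10)]

# Attributes at which each company does NOT lead, as listed in the text.
not_leading = {
    "C1": {"A1", "A2", "A3", "A6", "A8"},
    "C3": {"A1", "A4", "A5", "A6", "A7", "A8", "A9"},
    "C4": {"A2", "A3", "A4", "A5", "A7", "A8", "A9"},
    "C6": {"A1", "A2", "A3", "A4", "A5", "A6", "A7", "A9"},
}

# Complement: attributes at which each company leads.
leads = {c: [a for a in ATTRIBUTES if a not in nl] for c, nl in not_leading.items()}

# Invert to a leader-per-attribute map.
leader_of = {a: c for c, attrs in leads.items() for a in attrs}

for company, attrs in leads.items():
    print(company, "leads at", attrs)
```

Under this reading, every one of the nine attributes is led by exactly one of the four companies, which is consistent with the leaders named in the discussion of company mean strengths.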
Optimist's approach
In attributes where a company performs below average, it cannot be used for that attribute, while in attributes where it performs above average, it could be considered for inclusion on the basis of that attribute. At an attribute where a company shows zero regret, the company should be considered on the basis of that attribute. The minimum EOL is obtained at C1. Thereafter, the order of merit of the next companies is C4, C3, C6.

Consortia formation
Pair-based consortia: The strongest companies in descending order are C1, C4, C3, C6. The correlation of C1 and C4 is 0.51. The proposed consortium has the attributes below:
The consortium shows an improved strength with mean 52.1, compared to strength 45.1 of C1 and 44.7 of C4. The CV = 0.35 of cluster C1:C4 is lower than those of its components (C1 has CV = 0.49; C4 has CV = 0.42). Hence, the cluster has improved strength and improved stability. Despite these developments, this consortium is deficient at attributes A2, A3 and A8 when compared to the possibilities of strengths from other companies discussed earlier.
The assessment continues with consortium C3:C4. The correlation of C3 and C4 is 0.1966. The proposed consortium follows:
The strength of the proposed consortium is 60.0, which is higher than strength 43.2 of company C3 and 44.7 of C4. The CV = 0.15 of cluster C3:C4 is less than those of its components. Thus, this cluster also has improved strength and improved stability. Despite the strong points shown, this consortium is still sub-optimal at attributes A4, A5 and A7 to A9 as compared to the possibilities of strengths from other companies.
Next is consortium C1:C3. The correlation of companies C1 and C3 is -0.7318.

Consortium C1:C4.
The proposed consortium has:
The strength of the proposed consortium is 57.8, which is higher than strength 45.1 of company C1 and strength 43.2 of company C3. The CV = 0.12 of this cluster is lower than those of its components. This cluster, too, has improved strength and improved stability. This consortium is still sub-optimal at attributes A1, A6 and A8 as compared to the possibilities of strengths from other companies.
3.4.6. Consortium C3:C6.
The correlation of C3 and C6 is 0.7115. The proposed consortium has:
The strength of the proposed consortium is 45.7. It is only slightly higher than strength 43.2 of C3 and 44.7 of C4. The CV = 0.27 of this new cluster is higher than CV = 0.24 of C3, but lower than CV = 0.42 of C4. Hence, the strength of this cluster cannot be said to be convincingly better, while the stability has also not improved. In addition to the weaknesses shown, the cluster is also sub-optimal at attributes A1, A4 to A7 and A9.
The next is consortium C4:C6. The correlation of C4 and C6 is -0.8027. The proposed consortium has:
The correlation of C1 and C6 is -0.8920. The proposed consortium has:
The strength of the proposed consortium is 58.0, which is higher than strengths 45.1 of company C1 and 32.2 of C6. The CV = 0.10 of the cluster is lower than those of its components. Hence, the cluster has improved strength and improved stability. This consortium is sub-optimal at attributes A1 to A3 and A6.
Thus, the pairs of consortia cannot give an optimal solution. There can be consortia of more than two companies, and the next discussion proceeds to such cases. The above accounts lead to the possible consortia pairs: C1:C4; C3:C4; C1:C3; C3:C6; C4:C6; C1:C6.
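The assessment applied to each pair (correlation of the members, then mean strength and CV of the combined profile) can be sketched as follows. The pooling rule is an assumption: the text does not state how the attribute scores of a consortium are derived, so this sketch takes the better of the two members' scores on each attribute. The scores are invented and the CV uses the sample standard deviation.

```python
import statistics

# Hypothetical attribute scores (out of 100) for two companies.
company_a = [80, 20, 30, 70, 60]
company_b = [40, 75, 65, 35, 50]

def combine(u, v):
    """Assumed pooling rule: the consortium takes the better score per attribute."""
    return [max(a, b) for a, b in zip(u, v)]

def cv(values):
    """Coefficient of variation using the sample standard deviation."""
    return statistics.stdev(values) / statistics.mean(values)

def pearson(u, v):
    """Pearson correlation coefficient of two equally long score lists."""
    mu, mv = statistics.mean(u), statistics.mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v)) ** 0.5
    return num / den

consortium = combine(company_a, company_b)
print("correlation of members:", round(pearson(company_a, company_b), 3))
for name, s in [("A", company_a), ("B", company_b), ("A:B", consortium)]:
    print(name, "mean =", round(statistics.mean(s), 1), "CV =", round(cv(s), 2))
```

In this made-up case the two companies are negatively correlated, so their strengths fall on different attributes, and the pooled profile has both a higher mean and a lower CV than either member.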

Observations
One observation about the correlation of the members of a consortium is that a low or negative value does not imply anything regarding the strength of the consortium. Cluster C3:C4 had a low positive correlation while C1:C6 consisted of highly negatively correlated companies, yet both pairs formed strong consortia. Almost all the consortia showed improved stability (lower CVs) and were stronger than their original members. Cluster C3:C6 was less stable. Also, all the pairs were sub-optimal, as some attributes were still sub-optimal. Based on the mean strength, the strongest consortium was C3:C4. However, this consortium was not the most stable according to CV. Cluster C1:C6 was the most stable, and the second strongest according to mean strength. This consortium is not optimal because some of its attributes were outperformed by the corresponding ones of other companies. A further observation is that, across the paired consortia, the correlation between consortium members was itself highly positively correlated (coefficient > 0.5) with the CV of the consortium formed by the same members. This trait can be investigated further in another study.
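The relationship between member correlation and consortium CV can be checked directly from the figures reported for the pairs. The sketch below uses only the five pairs for which both a correlation and a CV are stated in the text (no CV is given for C4:C6, so that pair is left out); the helper `pearson` is written for this illustration.

```python
import math

# (correlation between members, CV of the resulting pair) as reported in the text.
reported = {
    "C1:C4": (0.51, 0.35),
    "C3:C4": (0.1966, 0.15),
    "C1:C3": (-0.7318, 0.12),
    "C3:C6": (0.7115, 0.27),
    "C1:C6": (-0.8920, 0.10),
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

correlations = [c for c, _ in reported.values()]
cvs = [v for _, v in reported.values()]
r = pearson(correlations, cvs)
print(round(r, 2))
```

On these five pairs the coefficient comes out at roughly 0.84, in line with the "> 0.5" stated above, although five data points are too few to draw firm conclusions.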

Top three consortia of pairs of companies.
The strongest consortium, C3:C4, has a mean strength of 60.0. It was formed from companies that initially had a low positive correlation of only 0.20. This consortium is sub-optimal at attributes A4, A5, A7, A8 and A9. The second strongest consortium, C1:C6, has a mean strength of 58.0, and was formed from companies that initially had a high negative correlation of -0.89. This consortium is also sub-optimal, at attributes A1, A2, A3 and A6. The next strongest consortium, C1:C3, has a mean strength of 57.8, and was formed from companies that initially had a high negative correlation of -0.73. This consortium is also sub-optimal, at attributes A1, A6 and A8.

Consortia of more than two companies
The idea was to form a cluster of more than two companies. The consortium starting with cluster C1:C6 is inevitable since it was explained that C1:C3 can address the sub-optimality problem. Hence, the new consortium is C1:C3:C4. The strength of the proposed consortium is 60.7. However, sub-optimality occurred at attributes A1 and A2. For these two attributes, C3 in particular showed no regrets, and can be examined. The solution sought is a consortium that maximizes all the possible benefits and minimizes all the detriments as far as is practically possible. Consortium C1:C3:C4:C6 is optimal. It possesses all the maximum benefits in each attribute. Its performance shows an increased strength. It also has the smallest CV. Thus, the optimal consortium derived for this study is C1:C3:C4:C6.
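The search for the smallest optimal consortium can be illustrated with a brute-force sketch. It rests on two assumptions: a consortium is treated as optimal when it contains the leading company of every attribute (following the statement that the optimal consortium possesses the maximum benefit in each attribute), and the leader of each attribute is the one implied by the lists in the MinMax discussion. The function names are hypothetical.

```python
from itertools import combinations

# Leading company per attribute, as implied by the MinMax discussion.
leader_of = {
    "A1": "C4", "A2": "C3", "A3": "C3", "A4": "C1", "A5": "C1",
    "A6": "C4", "A7": "C1", "A8": "C6", "A9": "C1",
}
companies = ["C1", "C3", "C4", "C6"]

def suboptimal_attributes(consortium):
    """Attributes whose leading company is not a member of the consortium."""
    return [a for a, c in leader_of.items() if c not in consortium]

def smallest_optimal_consortium(candidates):
    """Search subsets by increasing size; return the first one with no gaps."""
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            if not suboptimal_attributes(subset):
                return subset
    return None

for pair in [("C1", "C4"), ("C3", "C4"), ("C1", "C3"), ("C3", "C6"), ("C1", "C6")]:
    print(":".join(pair), "sub-optimal at", suboptimal_attributes(pair))
print("smallest optimal consortium:", ":".join(smallest_optimal_consortium(companies)))
```

Under these assumptions the sketch reproduces the sub-optimal attributes reported for the five assessed pairs, and no subset smaller than all four companies covers every attribute, which illustrates why the number of companies cannot be fixed in advance.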

Discussion
Each method was able to identify strong and weak companies as well as strong and weak consortia. However, no single method was able to provide an optimal consortium. Iterations and amalgamations of distance and hierarchical clustering algorithms were necessary to verify that the weak consortia identified were indeed weak, and that the strong ones were indeed strong. Only this dynamic approach could provide an optimal consortium.

Conclusion
The logical iterations and the combination of various methods showed consistency in identifying strong and weak consortia. The methodical approach produced a dynamic, efficient and effective result. This approach proved crucial in optimizing the ultimate consortium formed.

Recommendation
Care should be taken during the formation of consortia or other partnerships aimed at delivering results. A consortium should not be formed from an unsubstantiated or speculative standpoint. The study recommends that, for the purpose of clustering, the application of cluster analysis should combine several different methods and allow logical iterations.

Table 1. Data matrix format

desirable, but they should be adequate to possess a desirable number of attributes for the tasks required.

Table 2. Data matrix

Table 3. Comparison of mean strength of companies

Table 4. ANOVA

performer C5 is excluded. The weak attributes in high performer C1 are points of weakness that need to be strengthened. Low performer C6 performs extremely well at attribute A8. This could offset the A8 weakness of partner companies when included in a cluster. Statistical

Table 5. Correlation matrix of companies

Table 6. Proximities matrix for companies

Table 10. Single linkage-based data matrix

Table 13. Complete linkage-based data matrix

Table 24. Proximities matrix

Complete linkage clustering leads to the next table.

Table 25. Single linkage cluster matrix

Table 26. Summary of consortia formation

Table 28. Data matrix

Table 31. Regret matrix with EOL

Table 32. Performances of C1 and C4

Table 34. Performances of C3 and C4

Table 36. Performances of C1 and C3

Table 38. Performances of C3 and C6

Table 44. Summary of consortia

Table 45. Summary of consortia