I am writing a thesis in which I study the relationship between 3 independent variables and 1 dependent variable. In addition, I use 3 metric control variables and 2x3 dummy variables representing 2 different categorical variables with 4 categories each. All variables are pooled across 36 firms (cases); there are 36 firms because the study is conducted in the US airline industry.
Because all these variables are pooled over 4 years (2004, 2005, 2006 and 2007), I intended to analyse the regression using GLM repeated measures in SPSS.
The difficulty I currently run into is that the three independent variables, central to my hypotheses, correlate above .9 with one another. This high level of multicollinearity inflates the standard errors and makes the sign and significance of the coefficients unstable, so I have to address it.
I found a possible solution in principal component regression (PCR), in which the independent variables are transformed via principal component (factor) analysis into uncorrelated principal components. The relations with the dependent variable are then tested with OLS linear regression in SPSS.
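To make the PCR idea concrete, here is a minimal sketch in Python/NumPy rather than SPSS, using synthetic data in place of the real airline variables (all names and numbers below are illustrative assumptions, not my actual data):

```python
import numpy as np

# Minimal PCR sketch with synthetic data: 3 predictors driven by one
# latent factor, so they correlate above .9 -- as in my real data.
rng = np.random.default_rng(0)

n = 36                              # firms (cases)
z = rng.normal(size=n)              # latent driver causing collinearity
X = np.column_stack([z + 0.05 * rng.normal(size=n) for _ in range(3)])
y = 2.0 * z + rng.normal(size=n)    # dependent variable

# 1) Standardise the predictors (PCA is scale-sensitive).
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# 2) Extract principal components from the correlation matrix.
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Xs, rowvar=False))
order = np.argsort(eigvals)[::-1]   # largest eigenvalue first
scores = Xs @ eigvecs[:, order]     # uncorrelated component scores

# 3) OLS of y on the retained components (here: only the first).
Z = np.column_stack([np.ones(n), scores[:, 0]])
beta, *_ = np.linalg.lstsq(Z, y, rcond=None)

# Components are orthogonal, so their mutual correlations are ~0,
# which is why SPSS reports tolerance and VIF values of 1.
corr = np.corrcoef(scores, rowvar=False)
```

The off-diagonal entries of `corr` are numerically zero, which is the whole point of the transformation: the components can be entered into a regression without any multicollinearity.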
1 - Should I create uncorrelated principal components for ALL independent variables, including the control variables and dummies? Or should I only create components for the variables that show multicollinearity?
2 - SPSS requires that, after the components have been created, the regression be run as OLS linear regression. Doing so, however, forces me to give up the repeated-measures option of GLM. Should I therefore run the principal component linear regression separately for each year? Or is it also possible to compute uncorrelated principal components for each year's variables and then combine them in a GLM repeated measures analysis?
I would be very pleased if somebody could help me, because my questions are so specific that I cannot find the answers in SPSS tutorials, help desks, forums or elsewhere. My problem combines two difficulties: identical cases over different time periods, and overcoming multicollinearity with PC regression.
I already tested a simple linear regression after running the PC factor analysis on the 2007 data. The results show nice, significant beta coefficients with tolerance and VIF values of 1. However, the adjusted R2 is above .94 even with only my three hypothesized independent variables, which is so high that I question whether it is correct. Is it possible that the uncorrelated principal components affect the interpretation of the coefficient of determination (R2)?
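One check I could run on this: regressing on all principal components is just a full-rank linear reparameterisation of the original predictors, so the transformation by itself cannot change R2. A small NumPy sketch with synthetic data (not my real 2007 figures) shows this:

```python
import numpy as np

# Sanity check: OLS on the original predictors and OLS on ALL of their
# principal components span the same column space, so the fitted values
# -- and hence R2 -- must be identical. Synthetic data, for illustration.
rng = np.random.default_rng(1)
n = 36
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -0.5, 0.3]) + rng.normal(size=n)

def r_squared(design, y):
    b, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ b
    return 1 - resid.var() / y.var()

Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ vt.T                  # principal component scores

ones = np.ones((n, 1))
r2_x  = r_squared(np.hstack([ones, Xc]), y)      # original predictors
r2_pc = r_squared(np.hstack([ones, scores]), y)  # all components
```

So if the very high adjusted R2 is suspicious, the cause would have to lie in the data (e.g. a dependent variable that shares a common scale or level with the predictors), not in the PCA step itself.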
Thanks in advance for any help and answers to my questions.