company  date  hbVol  wikiVol  stockVol0
------------------------------------------------
1        1     200    150      2423325
1        2     194    152      2455343
.        .     .      .        .
1        103   205    103      2563463
2        1     752    932      7434124
2        2     932    823      7464354
.        .     .      .        .
.        .     .      .        .
86       103   3      55       32324
A Durbin-Watson statistic of 0.276 suggests significant positive autocorrelation of the residuals. The residuals are, however, bell-shaped, as can be seen from the P-P plot below. The partial autocorrelation function shows significant spikes at lags 1 to 5 (above the upper limit), confirming the conclusion drawn from the Durbin-Watson statistic.
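For reference, the Durbin-Watson statistic is the sum of squared successive residual differences divided by the sum of squared residuals, and it is roughly 2(1 - rho), where rho is the lag-1 autocorrelation of the residuals. The sketch below is a minimal illustration, not the output of any statistics package; the function names `durbin_watson` and `implied_rho` are made up for this example. One assumption to keep in mind: if the statistic is computed over all companies stacked in one column, the differences at company boundaries mix residuals from different companies.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for one ordered series of residuals."""
    # Numerator: squared differences between successive residuals.
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    # Denominator: sum of squared residuals.
    den = sum(e ** 2 for e in residuals)
    return num / den


def implied_rho(dw):
    """Rough lag-1 autocorrelation implied by DW, using DW ~ 2 * (1 - rho)."""
    return 1.0 - dw / 2.0


# Values of DW reported in this post.
for dw in (0.276, 2.408):
    print(dw, round(implied_rho(dw), 3))
```

By this approximation, a value of 0.276 corresponds to a lag-1 residual autocorrelation of about 0.86, and a value near 2 corresponds to roughly none.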
The presence of first-order autocorrelated residuals violates the assumption of uncorrelated residuals that underlies the OLS regression method. Different methods have been developed, however, to handle such series. One method I read about is to include a lagged dependent variable as an independent variable. So I created a lagged `stockVol1` and added it to the model:
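A minimal sketch of how such a lag could be built with pandas, assuming the data sit in one long table with the column names shown above. The key assumption is that the lag is taken within each company, so that the first date of company 2 does not inherit the last value of company 1. The four rows are copied from the table above purely for illustration.

```python
import pandas as pd

# Tiny excerpt of the data: first two dates of companies 1 and 2.
df = pd.DataFrame({
    "company": [1, 1, 2, 2],
    "date": [1, 2, 1, 2],
    "stockVol0": [2423325, 2455343, 7434124, 7464354],
})

# Sort so that shifting by one row means "previous date of the same company".
df = df.sort_values(["company", "date"]).reset_index(drop=True)

# Lag by one date within each company; the first date of each company is NaN.
df["stockVol1"] = df.groupby("company")["stockVol0"].shift(1)

print(df)
```

Rows with a missing `stockVol1` (one per company) would then drop out of the regression.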
Now, Durbin-Watson is at an acceptable 2.408. But obviously, R-squared is extremely high because of the lagged variable; see also the coefficients below:
Another method I read about for dealing with autocorrelation is autoregression with the Prais-Winsten (or Cochrane-Orcutt) method. After performing this, the model reads:
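To make the idea concrete, here is a minimal sketch of the iterative Cochrane-Orcutt procedure for a single series, written with NumPy. It is not the routine of any particular package and does not reproduce the output above. The simulated data, the true values (intercept 1, slope 2, rho 0.8) and the function names are assumptions for illustration only. Prais-Winsten differs in that it keeps the first observation, rescaled by sqrt(1 - rho^2), instead of dropping it.

```python
import numpy as np


def cochrane_orcutt(X, y, max_iter=50, tol=1e-8):
    """Iterative Cochrane-Orcutt for one series; X must contain a constant column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # start from plain OLS
    rho = 0.0
    for _ in range(max_iter):
        resid = y - X @ beta  # residuals on the original scale
        # Lag-1 autoregression coefficient of the residuals.
        rho_new = (resid[1:] @ resid[:-1]) / (resid[:-1] @ resid[:-1])
        # Quasi-difference both sides; the first observation is dropped.
        y_star = y[1:] - rho_new * y[:-1]
        X_star = X[1:] - rho_new * X[:-1]
        beta, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
        converged = abs(rho_new - rho) < tol
        rho = rho_new
        if converged:
            break
    return beta, rho


def simulate(n=2000, rho=0.8, seed=0):
    """Simulated regression with AR(1) errors (made-up data, not the post's)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    u = rng.normal(size=n)
    e = np.zeros(n)
    for t in range(1, n):
        e[t] = rho * e[t - 1] + u[t]
    y = 1.0 + 2.0 * x + e
    X = np.column_stack([np.ones(n), x])
    return X, y


X, y = simulate()
beta, rho_hat = cochrane_orcutt(X, y)
print("slope:", beta[1], "rho:", rho_hat)
```

Because the constant column is quasi-differenced along with the regressors, the first element of `beta` is still the intercept on the original scale. For panel data like mine, the quasi-differencing would presumably have to be done within each company.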
This is what I don't understand: two different methods, and I get very different results. Other suggestions for analyzing this data include (i) not including a lagged variable but instead transforming the dependent variable by differencing, and (ii) fitting an AR(1) or ARIMA(1,0,0) model. I haven't calculated those because I am now lost on how to proceed, given the different results of the two methods I did try.
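To keep the candidates apart, they can be written out in symbols. This is only a sketch of the specifications, assuming `hbVol` and `wikiVol` are the regressors; $y_{i,t}$ stands for `stockVol0` of company $i$ at date $t$, $h_{i,t}$ for `hbVol`, and $w_{i,t}$ for `wikiVol`. The Greek letters are notation introduced here, not estimates.

```latex
% (a) Lagged dependent variable as a regressor
y_{i,t} = \alpha + \gamma\, y_{i,t-1} + \beta_1 h_{i,t} + \beta_2 w_{i,t} + u_{i,t}

% (b) Regression with AR(1) errors (the structure assumed by
%     Prais-Winsten and Cochrane-Orcutt)
y_{i,t} = \alpha + \beta_1 h_{i,t} + \beta_2 w_{i,t} + e_{i,t},
\qquad e_{i,t} = \rho\, e_{i,t-1} + u_{i,t}

% (b) rewritten by substituting e_{i,t-1} from the lagged equation
y_{i,t} = \alpha(1-\rho) + \rho\, y_{i,t-1}
        + \beta_1 \left(h_{i,t} - \rho\, h_{i,t-1}\right)
        + \beta_2 \left(w_{i,t} - \rho\, w_{i,t-1}\right) + u_{i,t}

% (c) Differenced dependent variable, as in suggestion (i)
\Delta y_{i,t} = y_{i,t} - y_{i,t-1}
```

In the rewritten form, (b) also contains the lagged regressors with restricted coefficients, which (a) leaves out, so the two are not the same specification. That may be part of why the estimates differ, but I am not sure how to choose between them.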
What model should I use to perform a proper regression on my data? I'm very keen on understanding this, but have never had to analyze a time-series dataset like this before.