## How to perform pooled cross-sectional time series analysis?


reveller
Posts: 8
Joined: Wed Jul 18, 2012 8:14 am


For 86 companies over 103 days, I have collected (i) the number of tweets about each company (variable `hbVol`) and (ii) the number of pageviews of the company's Wikipedia page (`wikiVol`). The dependent variable is each company's stock trading volume (`stockVol0`). My data is structured as follows:


```
company  date  hbVol  wikiVol  stockVol0
----------------------------------------
      1     1    200      150    2423325
      1     2    194      152    2455343
      .     .      .        .          .
      1   103    205      103    2563463
      2     1    752      932    7434124
      2     2    932      823    7464354
      .     .      .        .          .
      .     .      .        .          .
     86   103      3       55      32324
```
As I understand it, this is called pooled cross-sectional time series data. I have taken the log of all variables to smooth out the large differences between companies. A regression of the dependent variable `stockVol0` on both independent variables returns:
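For readers who want to reproduce this step, here is a minimal sketch of a pooled log-log regression in plain Python. It is not the software or output used in the post. The function names `solve_linear_system` and `pooled_log_ols` are hypothetical, and the sketch assumes every row is a dict with strictly positive `hbVol`, `wikiVol` and `stockVol0` values, since the log of zero is undefined.

```python
import math


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Move the row with the largest entry in this column to the pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def pooled_log_ols(rows):
    """Pooled OLS of log(stockVol0) on log(hbVol) and log(wikiVol).

    All company-days are stacked into one sample, ignoring which company
    a row belongs to. Returns [intercept, coef_hbVol, coef_wikiVol].
    """
    design = [[1.0, math.log(r["hbVol"]), math.log(r["wikiVol"])] for r in rows]
    target = [math.log(r["stockVol0"]) for r in rows]
    k = 3
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)
```

Because every variable is logged, each slope reads as an elasticity: the approximate percentage change in trading volume associated with a one percent change in tweets or pageviews.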

A Durbin-Watson statistic of 0.276 suggests significant positive autocorrelation of the residuals. The residuals are, however, bell-shaped, as can be seen from the P-P plot below. The partial autocorrelation function shows significant spikes at lags 1 to 5 (above the upper limit), confirming the conclusion drawn from the Durbin-Watson statistic:
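To make the Durbin-Watson check concrete, here is a minimal sketch of how the statistic is computed: the sum of squared differences between consecutive residuals divided by the sum of squared residuals. Values near 2 indicate little first-order autocorrelation and values near 0 indicate strong positive autocorrelation. With stacked panel data, a statistic computed over the whole residual column also differences across company boundaries (day 103 of one company followed by day 1 of the next), so the sketch includes a pooled variant that skips those boundaries. The function names and the dict-of-lists layout are assumptions made for illustration.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for one time-ordered series of residuals."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


def panel_durbin_watson(residuals_by_company):
    """Pooled variant that never differences across a company boundary.

    residuals_by_company maps a company id to that company's residuals,
    ordered by date.
    """
    num = 0.0
    den = 0.0
    for series in residuals_by_company.values():
        num += sum((series[t] - series[t - 1]) ** 2
                   for t in range(1, len(series)))
        den += sum(e ** 2 for e in series)
    return num / den
```

Comparing the stacked and the within-company versions on the same residuals shows how much of the reported value comes from the artificial jumps between companies.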

The presence of first-order autocorrelated residuals violates the assumption of uncorrelated residuals that underlies the OLS regression method. Different methods have been developed, however, to handle such series. One method I read about is to include a lagged dependent variable as an independent variable. So I created a lagged version of the dependent variable, `stockVol1`, and added it to the model:
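One detail worth checking when building a lagged variable in stacked data is that the lag is taken within each company, so that the first day of company 2 does not inherit the last day of company 1. The sketch below illustrates this, assuming rows are dicts with `company`, `date` and `stockVol0` keys and that each company has one row per consecutive day. The function name `add_within_company_lag` is hypothetical.

```python
def add_within_company_lag(rows, value_key="stockVol0", lag_key="stockVol1"):
    """Return copies of rows with the same company's previous value attached.

    The first observation of each company has no predecessor and is
    dropped, so 86 companies with 103 days each leave 86 * 102 rows.
    Assumes no missing days; with gaps, "previous row" is not the same
    as "previous day".
    """
    ordered = sorted(rows, key=lambda r: (r["company"], r["date"]))
    previous = {}  # company id -> value from that company's previous row
    out = []
    for row in ordered:
        company = row["company"]
        if company in previous:
            lagged = dict(row)  # copy so the input rows are not modified
            lagged[lag_key] = previous[company]
            out.append(lagged)
        previous[company] = row[value_key]
    return out
```

If the model uses logged variables, the same function can be applied to the logged `stockVol0` column.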

Now the Durbin-Watson statistic is an acceptable 2.408. But obviously, R-squared is extremely high because of the lagged variable; see also the coefficients below:

Another method I read about for dealing with autocorrelation is autoregression using the Prais-Winsten (or Cochrane-Orcutt) method. After performing this, the model reads:
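As a rough illustration of what the Cochrane-Orcutt procedure does, here is a sketch for a single series with one regressor. It estimates the AR(1) coefficient `rho` from the residuals, quasi-differences both variables, re-estimates the regression, and repeats until `rho` settles. Prais-Winsten differs in that it keeps the first observation, rescaled by `sqrt(1 - rho**2)`, rather than dropping it. The function names are hypothetical. For the panel in this post, the quasi-differencing would have to be done within each company before pooling, which the sketch does not do.

```python
def ols_simple(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cochrane_orcutt(x, y, tol=1e-8, max_iter=100):
    """Iterative Cochrane-Orcutt estimate for one series with AR(1) errors.

    Returns (intercept, slope, rho) for y_t = a + b * x_t + u_t with
    u_t = rho * u_{t-1} + e_t.
    """
    intercept, slope = ols_simple(x, y)
    rho = 0.0
    for _ in range(max_iter):
        # Residuals on the original scale, using the current coefficients.
        resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
        new_rho = (sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
                   / sum(e ** 2 for e in resid[:-1]))
        # Quasi-difference both variables; the first observation is lost.
        x_star = [x[t] - new_rho * x[t - 1] for t in range(1, len(x))]
        y_star = [y[t] - new_rho * y[t - 1] for t in range(1, len(y))]
        a_star, slope = ols_simple(x_star, y_star)
        # The transformed intercept equals a * (1 - rho); undo the scaling.
        intercept = a_star / (1.0 - new_rho)
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return intercept, slope, rho
```

Writing out the quasi-differenced equation gives `y_t = a*(1 - rho) + rho*y_{t-1} + b*x_t - rho*b*x_{t-1} + e_t`. This contains a lagged regressor term that a model with only a lagged dependent variable leaves out, so the two approaches are different specifications and their coefficients need not agree.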

This is what I don't understand: two different methods, and I get very different results. Other suggestions for analyzing this data include (i) not including a lagged variable but instead transforming the dependent variable by differencing, and (ii) fitting an AR(1) or ARIMA(1,0,0) model. I haven't calculated those because I am now lost on how to proceed, given the different results of the two methods I did try.
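For reference, the differencing option in (i) can be sketched as follows. Since the variables are already logged, the first difference of a logged series is the day-over-day log change, which is close to the percentage change when changes are small. As with the lag, differences are taken within each company. The function names and the dict-of-lists layout are assumptions made for illustration.

```python
import math


def log_differences(values):
    """Day-over-day change in log(values): log(v_t) - log(v_{t-1}).

    values must be strictly positive and ordered by date. The result is
    one element shorter than the input.
    """
    logs = [math.log(v) for v in values]
    return [logs[t] - logs[t - 1] for t in range(1, len(logs))]


def difference_by_company(volume_by_company):
    """Apply log_differences separately to each company's series."""
    return {company: log_differences(series)
            for company, series in volume_by_company.items()}
```

Differencing changes the question being asked: the regression then relates changes in trading volume to changes in tweets and pageviews, rather than levels to levels.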

What model should I use to perform a proper regression on my data? I'm very keen to understand this, but I have never had to analyze a time series dataset like this before.
