I am using Generalized Estimating Equations (GEE) in SPSS to analyze panel data. Each participant provided data for 2-5 years between 2004 and 2008, so people joined and left the panel at different times and stayed for different durations.
I am mainly checking whether and how changes in life (independent variables, e.g. increase in household size, salary change, transition from work to retirement; all binary, with 1 = change happened and 0 = no change) affect changes in behavior (dependent variable, also binary: 1 = change happened, 0 = no change). The changes were computed from the previous year's values with the LAG and AGGREGATE commands.
Now my question is: should I filter out the first year of data for every participant? (Since I cannot know whether their behavior changed from the previous year, the value should not be 0 but "missing" or filtered out, right?) And how should I do this?
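To make the "missing instead of 0" alternative concrete, here is a minimal sketch of how a change variable could be built so that each participant's first observed year stays system-missing. It assumes the data are sorted by ID and YEAR and that ID is non-missing; Behavior (a raw yearly behavior measure), first_row, prev_behavior and Behavior_change_m are hypothetical names used only for illustration, not the variables in my file. As far as I understand, GENLIN then leaves out rows where the dependent variable is missing, but I have not verified that this matches explicit filtering.

```spss
* Sketch only: Behavior, first_row, prev_behavior and Behavior_change_m are hypothetical names.
SORT CASES BY ID YEAR.
* Flag the first observed row of each participant (1 = first row, 0 = later row).
COMPUTE first_row = ($CASENUM = 1 OR ID NE LAG(ID)).
* Previous row's behavior, taken unconditionally so LAG is evaluated on every case.
COMPUTE prev_behavior = LAG(Behavior).
* The change indicator is computed only for later rows, so first rows stay system-missing.
IF (first_row = 0) Behavior_change_m = (Behavior NE prev_behavior).
EXECUTE.
```

Because the first-row flag is computed per participant rather than per calendar year, people who joined the panel after 2004 are handled the same way as those who started in 2004.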
- When I filter out only year 2004 (affecting 52% of participants), the GEE results still look nice.
- When I filter out the first year of every participant, the GEE results change dramatically: all of the most expected interactions become non-significant (the Sig. value rises sharply compared with no filtering or with removing only year 2004). I used this syntax to determine which participants are filtered out (the data were sorted beforehand in ascending order by ID and YEAR):
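* Number the rows within each participant (data sorted by ID and YEAR).
* The counter restarts at 1 on the first row of every ID.
IF ($CASENUM = 1 OR ID NE LAG(ID)) counter = 1.
* Assumed step: later rows of the same ID increment the counter.
IF (ID EQ LAG(ID)) counter = LAG(counter) + 1.
* Select only rows whose within-participant counter is at least 3.
COMPUTE filter_$ = (counter >= 3).
VARIABLE LABELS filter_$ 'counter >= 3 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMATS filter_$ (f1.0).
FILTER BY filter_$.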
The results differ again (slightly closer to what I expected) if I then also filter out those participants who are left with only one year of data in the dataset...
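For reference, this is a sketch of how participants with only one remaining row could be excluded. It assumes keep_row is a hypothetical 0/1 flag marking the rows that survive the first-year filter; n_kept and filter_multi are likewise hypothetical helper names, not variables from my file.

```spss
* Sketch only: keep_row, n_kept and filter_multi are hypothetical names.
* Count, per participant, how many rows survive the first-year filter.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=ID
  /n_kept=SUM(keep_row).
* Select surviving rows that belong to participants with at least two of them.
COMPUTE filter_multi = (keep_row = 1 AND n_kept >= 2).
FILTER BY filter_multi.
EXECUTE.
```

MODE=ADDVARIABLES writes the per-participant count back onto every row of the active dataset, so no separate merge step is needed.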
So, which approach would give me more correct results: filtering out the first year of every participant (which leaves most of my independent variables non-significant), or leaving that data in?
The syntax of my GEE is as follows:
DATASET ACTIVATE DataSet1.
* Generalized Estimating Equations.
GENLIN Behavior_change (REFERENCE=FIRST) BY work_ret Change_salary (ORDER=DESCENDING) WITH Salary_level
/MODEL work_ret Change_salary Salary_level INTERCEPT=YES
/CRITERIA METHOD=FISHER(1) SCALE=1 MAXITERATIONS=100 MAXSTEPHALVING=5 PCONVERGE=1E-006(ABSOLUTE)
SINGULAR=1E-012 ANALYSISTYPE=3(WALD) CILEVEL=95 LIKELIHOOD=FULL
/REPEATED SUBJECT=hhnr WITHINSUBJECT=YEAR SORT=YES CORRTYPE=INDEPENDENT ADJUSTCORR=YES
COVB=ROBUST MAXITERATIONS=100 PCONVERGE=1e-006(ABSOLUTE) UPDATECORR=1
/PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION (EXPONENTIATED).
I would very much appreciate any help with this problem! Please don't hesitate to ask if anything is unclear.