New Variable to average last three variables of time series

Post by alexmsvh » Tue Jun 24, 2014 7:33 pm

My time series data consists of one data point per month. I need to create a new variable that averages the last three entries in the time series for each case. The problem is that not all cases have the same number of entries. For example, let's say each case represents a student, and each month the student is assessed and receives a score. Some students stay in the program for three years, others for just one, etc. However, they all still take an assessment each month. Therefore, some students have 36 data points (assessments) while others have just 12. It is easy to create a new variable that averages the first three scores, but more difficult to create one that averages the last three scores. Does anyone have any advice?

Just as an aside, I am averaging the first and last scores to account for a poor monthly assessment (e.g., a student had a cold and tested poorly), as we are looking to see if there is general improvement over time, rather than looking at a snapshot of an initial test and a final test.

Please feel free to post questions or critique my logic. Examples of simple create syntax below:

Simple create new variable syntax for average of first three assessments:

COMPUTE F3TM_AVERAGE = (total_tpm_1+total_tpm_2+total_tpm_3)/3.
EXECUTE.

Post by KeithMurray » Wed Jun 25, 2014 8:48 am

Hi alexmsvh

I'm assuming scores for an individual student are all on a single row, and that you have variables labelled total_tpm_1 to total_tpm_36.

I'm also assuming that cells have been left blank for months students have not been assessed.

If you have defined user-missing values instead, the "do if not (missing(score))" test should still work, since MISSING() is true for both system-missing and user-missing values.

First, compute a new variable, "month", for the number of assessments per student. Then compute your mean of three assessments based on this new variable. You can also compute a first and last month mean - I've called this new variable "aside", using the month variable too.

The syntax is divided into three sections.

The first computes the new month variable. Substitute total_tpm_1 to total_tpm_36 where I've used score1 to score6.

compute month = 0.
do repeat score = score1 to score6.
do if not (missing(score)).
compute month = month+1.
end if.
end repeat.
execute.
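
As a possible shortcut for this counting step, the NVALID function returns the number of non-missing values across a list of variables, so the DO REPEAT block can probably be collapsed into a single COMPUTE. A sketch, assuming the score variables sit next to each other in the file (so that TO picks up exactly that range) and using the same placeholder names score1 to score6:

```spss
* Count the non-missing assessments per student in one step.
COMPUTE month = NVALID(score1 TO score6).
EXECUTE.
```

Like the loop above, this counts every non-missing score, so it only gives the position of the last assessment if there are no blank months in the middle of a student's record.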

The second section computes the three-month mean. This bit is spaghetti-like, as you'll need 34 lines of syntax to cover every value of month from 3 to 36, so maybe someone else could suggest a loop structure which cuts out the verbiage.

if (month = 3) avescore = (score1+score2+score3)/3.
if (month = 4) avescore = (score2+score3+score4)/3.
if (month = 5) avescore = (score3+score4+score5)/3.
if (month = 6) avescore = (score4+score5+score6)/3.

etc .......

execute.
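
One possible way to cover every value of month at once is the VECTOR command, which lets a variable be picked by its position. This is a sketch rather than syntax tested on the real file: it assumes score1 to score6 are adjacent in the file, that month holds the position of each student's last assessment (no blank months before it), and sc is just a hypothetical vector name that must not clash with an existing variable.

```spss
* Refer to the scores by position: sc(1) is score1, sc(2) is score2, and so on.
VECTOR sc = score1 TO score6.
* Students with fewer than three assessments are left system-missing.
DO IF (month >= 3).
  COMPUTE avescore = (sc(month - 2) + sc(month - 1) + sc(month)) / 3.
END IF.
EXECUTE.
```

For the real data, score1 TO score6 would become total_tpm_1 TO total_tpm_36, and the single COMPUTE would stand in for all 34 IF lines.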

The third section computes aside. Again, spaghetti-like.


if (month = 3) aside = (score1 + score3)/2.
if (month = 4) aside = (score1 + score4)/2.
if (month = 5) aside = (score1 + score5)/2.
if (month = 6) aside = (score1 + score6)/2.

etc .......

execute.

Hope this is clear.

Post by alexmsvh » Wed Jun 25, 2014 3:02 pm

KeithMurray,

Thank you for your insight, this is a huge help. I actually already have a variable with the number of months that a student was in the program, so that saves me a step!

Your assumptions are correct: all scores are columns in a row for each student (total_tpm_1, total_tpm_2, etc.), and cells with no data have been left blank.

As for the "aside" variable, I think I was a bit unclear. Sorry about that! I was actually explaining my reasoning for averaging the first three months' scores and averaging the last three months' scores. I do not think I need a variable that averages the first and last score together.

That being said, does it seem like a wise statistical decision to take the average of the first three months and the average of the last three months, and then look at the change in those average scores? Time series are a bit new to me, and I want to make sure I do not misstep in my analysis.

I should probably explain the scores a little more. They are assessments in a sense, but are more so just a count of occurrences within a given time (one month). We are looking for change in these occurrences over time, with one month being the unit of measurement. I used the student example in the original post because it seemed easier to explain.

On a final note, aside from z-scores, is there a good way to standardize scores like this? The scores vary drastically, as some students go from scoring 40 to scoring 5 over time, while others go from scoring 1 to scoring 0 over time, and they are not normally distributed. I think this will complicate the analysis. We are looking for the average % change per student, and looking at variables that may influence someone's rate of change or amount of change.

Thanks for your help, I hope this makes sense.

-alexmsvh

Post by KeithMurray » Thu Jun 26, 2014 10:42 am

Hi alexmsvh

There will be better statisticians on this forum than me who would be able to provide you with a more satisfactory answer to your standardisation issues, but this is my tuppence worth.

First of all, if you compare the mean of the first three months against the mean of the final three months this would at least smooth out some of the variation in the observed scores.

I would also be tempted to simplify your design: given the large individual differences between students and the non-normal distribution of the scores, I would suggest computing for each student a simple binary variable which would indicate whether their mean score has increased or decreased. This simple measure would disregard the extent to which they have changed, but that would appear to be difficult to capture anyway. Percentage changes are vulnerable to the first score obtained; if one student changes from 2 to 1 and another student from 40 to 20, both have changed by 50% but have actually behaved in entirely different ways.
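
To make the percentage point concrete: going from 2 to 1 is a change of 100 * (1 - 2) / 2 = -50%, and going from 40 to 20 is 100 * (20 - 40) / 40 = -50%, although the absolute changes are 1 and 20. A minimal sketch of the binary variable, assuming F3TM_AVERAGE holds the mean of the first three scores and avescore the mean of the last three (the names decreased and pct_change are made up for illustration):

```spss
* 1 = mean of the last three scores is lower than the mean of the first three, 0 = otherwise.
COMPUTE decreased = (avescore < F3TM_AVERAGE).
* Percentage change, left system-missing when the starting mean is zero.
IF (F3TM_AVERAGE > 0) pct_change = 100 * (avescore - F3TM_AVERAGE) / F3TM_AVERAGE.
EXECUTE.
```

Students whose mean did not change are coded 0 here, so a third category for "no change" may be worth considering.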

If you adopt a binary variable approach, this would enable you to express results in terms of how many students changed one way or the other, and these could be expressed as percentages. If your "variables that may influence someone's rate of change" were categorical variables, or could be made categorical, then this would allow you to analyse your results using simple chi-squared tests.
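
In SPSS, that kind of test can be run with CROSSTABS. A sketch, where group is a hypothetical categorical predictor and decreased is a hypothetical 0/1 variable marking students whose mean score went down:

```spss
* Cross-tabulate the predictor against the direction of change and request a chi-squared test.
CROSSTABS
  /TABLES=group BY decreased
  /STATISTICS=CHISQ
  /CELLS=COUNT ROW.
```

The ROW keyword adds row percentages, which gives the "how many students changed one way or the other" figures directly in the table.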

Anyway, best of luck.
