Select Cases that occur in both years

Moderators: statman, Analyst Techy, andris, Fierce, GerineL, Smash

Joostaw
Posts: 4
Joined: Tue May 05, 2015 8:41 am

Select Cases that occur in both years

Postby Joostaw » Tue May 05, 2015 8:52 am

I have a big dataset with surveys on persons. Most of these people have multiple surveys per year. I want to select all the cases (peron surveys) that occur in both 2007 and 2009. In other words: I want to filter out all the persons that only did the survey in one of the two years (I already filtered out all other years). How can this be done?
GerineL
Moderator
Posts: 1477
Joined: Tue Jun 10, 2008 4:50 pm

Re: Select Cases that occur in both years

Postby GerineL » Tue May 05, 2015 8:54 am

how are your data set up?

if they are 1 year per line (max 2 cases per person), you could create an identifier using a lag function to select cases.
If they are 1 person per line (year 1 and 2 follow each other), you can use an if-function.

More info is needed about how your data are set up for a more detailed response.
Joostaw
Posts: 4
Joined: Tue May 05, 2015 8:41 am

Re: Select Cases that occur in both years

Postby Joostaw » Tue May 05, 2015 9:33 am

Thank you for your quick reply. Every row has an identifier for the person the survey is on and the year of the survey and of course all the outcomes of the survey (so a column with all the identifiers and a column with years). Some persons have up to 15 surveys (rows) in one year. If that person also has at least one survey in the other year, all of his cases should be kept in the dataset. I hope this makes my question more clear.
GerineL
Moderator
Posts: 1477
Joined: Tue Jun 10, 2008 4:50 pm

Re: Select Cases that occur in both years

Postby GerineL » Tue May 05, 2015 12:02 pm

Does it also happen that people only have data for the second year and not for the first year?
Joostaw
Posts: 4
Joined: Tue May 05, 2015 8:41 am

Re: Select Cases that occur in both years

Postby Joostaw » Tue May 05, 2015 1:01 pm

Yes, as these years are taken from a much bigger dataset, covering almost 25 years, some people only have data in 2009 and not in 2007.
GerineL
Moderator
Posts: 1477
Joined: Tue Jun 10, 2008 4:50 pm

Re: Select Cases that occur in both years

Postby GerineL » Tue May 05, 2015 2:01 pm

okay, there might be an easier way but I would go like this:

let's assume that the variable with year is called "year" and the variable with person ID is called "ID".


1. compute a new variable "yeardich" in which 2007 gets a value of 1 and 2009 gets a value of 2

Code: Select all

recode year (2007 = 1) (2009 = 2) into yeardich.
execute. 

2. use identify duplicate cases with break on person to create a variable with 1 for a new occuring year within a person and a 0 for not new (lets call this duplicate_1).

Code: Select all

SORT CASES BY id(A) year(A).
MATCH FILES
  /FILE=*
  /BY id year
  /FIRST=PrimaryFirst
  /LAST=PrimaryLast.
DO IF (PrimaryFirst).
COMPUTE  MatchSequence=1-PrimaryLast.
ELSE.
COMPUTE  MatchSequence=MatchSequence+1.
END IF.
LEAVE  MatchSequence.
FORMATS  MatchSequence (f7).
COMPUTE  InDupGrp=MatchSequence>0.
SORT CASES InDupGrp(D).
MATCH FILES
  /FILE=*
  /DROP=PrimaryFirst InDupGrp MatchSequence.
VARIABLE LABELS  PrimaryLast 'Indicator of each last matching case as Primary'.
VALUE LABELS  PrimaryLast 0 'Duplicate Case' 1 'Primary Case'.
VARIABLE LEVEL  PrimaryLast (ORDINAL).
FREQUENCIES VARIABLES=PrimaryLast.
EXECUTE.


3. compute temp1 which only has the value 1 / 2 for non-duplicate cases.

Code: Select all

compute temp1 = primarylast * yeardich.
execute.
4. aggregate the temp1 variable as a sum.

Code: Select all

AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=id
  /temp1_sum=SUM(temp1).
5. use max function to create a variable in which the highest value of each case is given to the person.

Code: Select all

AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=id
  /temp1_sum_max=MAX(temp1_sum).

6. recode into filter variable.

Code: Select all

recode temp1_sum_max (3 = 1) (else = 0) into filter.
execute.

use = 1, not use = 0.


and then to make it complete, delete unnessecary variables:

Code: Select all

delete variables  yeardich PrimaryLast temp1 temp1_sum temp1_sum_max.
Joostaw
Posts: 4
Joined: Tue May 05, 2015 8:41 am

Re: Select Cases that occur in both years

Postby Joostaw » Tue May 05, 2015 3:48 pm

Thanks, that works!
GerineL
Moderator
Posts: 1477
Joined: Tue Jun 10, 2008 4:50 pm

Re: Select Cases that occur in both years

Postby GerineL » Tue May 05, 2015 4:40 pm

let me know if you come up with an easier solution :-)

Who is online

Users browsing this forum: No registered users and 1 guest

cron