Duplicates OK, triplicates bad


Shirtless
Posts: 3
Joined: Fri Aug 01, 2014 1:15 am

Duplicates OK, triplicates bad

Postby Shirtless » Fri Aug 01, 2014 1:19 am

Hello,

I've run into an interesting issue. I have a dataset in which I want to keep every observation whose value on a selected variable is either unique or shared by exactly two observations, but drop the observations whose value is shared by three observations.

I've looked at the identify duplicates and data validation options, but they only deal with duplicates, not triplicates. Any ideas? Thanks!
Shirtless
Posts: 3
Joined: Fri Aug 01, 2014 1:15 am

Re: Duplicates OK, triplicates bad

Postby Shirtless » Fri Aug 01, 2014 2:26 am

Maybe another way to ask this, a non-beginner way, is:

I can use match sequencing to identify which cases are a triplicate (match sequence of 3). Then I want to store the ID number of that case in an array. Then I want to remove every case with an ID number that matches the ID number in the array.
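The plan described above can be sketched in Python to make the logic concrete. This is only an illustration of the idea, not SPSS syntax: the sample data, the `cases` list and the `flagged_ids` set are hypothetical names, and the set stands in for the "array" of ID numbers.

```python
from collections import defaultdict

# Hypothetical sample data: (id, value) pairs; ID 50 appears three times.
cases = [(10, "a"), (20, "b"), (20, "c"), (50, "d"), (50, "e"), (50, "f")]

# Pass 1: keep a running sequence number per ID and
# remember every ID whose sequence number reaches 3.
seen = defaultdict(int)
flagged_ids = set()
for case_id, _ in cases:
    seen[case_id] += 1
    if seen[case_id] >= 3:
        flagged_ids.add(case_id)

# Pass 2: drop every case whose ID was flagged, not just the third one.
kept = [case for case in cases if case[0] not in flagged_ids]
print(kept)  # [(10, 'a'), (20, 'b'), (20, 'c')]
```

Two passes are needed because the first two cases of a triplicated ID cannot be recognised as such until the third one has been seen.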
RubenGeert
Posts: 100
Joined: Mon May 19, 2014 6:06 am

Re: Duplicates OK, triplicates bad

Postby RubenGeert » Fri Aug 01, 2014 4:59 am

Create a variable holding the sequence number for observations on the same ID. See the second syntax example of http://www.spss-tutorials.com/lag/

Then you can use this variable in a SELECT IF command: http://www.spss-tutorials.com/select-if/
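As a rough illustration of the sequence-number idea, here is a Python sketch (not SPSS syntax). The function name `add_sequence` and the sample IDs are made up for the example, and unlike a LAG-based approach it does not assume the cases are sorted by ID.

```python
def add_sequence(ids):
    """Pair each ID with its running occurrence number (1, 2, 3, ...)."""
    counts = {}
    sequenced = []
    for case_id in ids:
        counts[case_id] = counts.get(case_id, 0) + 1
        sequenced.append((case_id, counts[case_id]))
    return sequenced

# Hypothetical IDs: 20 is duplicated, 50 is triplicated.
ids = [10, 20, 20, 50, 50, 50]
sequenced = add_sequence(ids)

# Selecting on the sequence number alone keeps the first two
# occurrences of each ID and drops only the third and later ones.
kept = [(case_id, seq) for case_id, seq in sequenced if seq < 3]
print(kept)  # [(10, 1), (20, 1), (20, 2), (50, 1), (50, 2)]
```

Note that filtering on the sequence number by itself still leaves two cases for ID 50.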

Kind regards,

Ruben Geert van den Berg
www.spss-tutorials.com
Shirtless
Posts: 3
Joined: Fri Aug 01, 2014 1:15 am

Re: Duplicates OK, triplicates bad

Postby Shirtless » Fri Aug 01, 2014 5:41 am

RubenGeert wrote:Create a variable holding the sequence number for observations on the same ID. See the second syntax example of http://www.spss-tutorials.com/lag/

Then you can use this variable in a SELECT IF command: http://www.spss-tutorials.com/select-if/

Kind regards,

Ruben Geert van den Berg
www.spss-tutorials.com

Hi Ruben,

Thanks for the response. I apologize for not having been more specific. I don't want to just remove the third (triplicate) observation. I want to remove all observations of any ID that has a triplicate. In other words, if person 50 appears three times in the dataset, I want to remove all three observations, not just the third observation. I think the link you provided would only allow me to identify and remove the third observation, not all three.
RubenGeert
Posts: 100
Joined: Mon May 19, 2014 6:06 am

Re: Duplicates OK, triplicates bad

Postby RubenGeert » Sat Aug 02, 2014 6:24 am

OK, then use AGGREGATE with MODE=ADDVARIABLES and BREAK=ID to add a variable holding the number of cases per ID to every case. Next, delete cases with SELECT IF ...

Also, see http://www.spss-tutorials.com/aggregate/ and http://www.spss-tutorials.com/select-if/
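The logic of this approach (attach a per-ID frequency to every case, then select on it) can be sketched in Python. This is an illustration only, not SPSS syntax: the field names `id` and `n_id` and the sample data are hypothetical, and the cutoff `n_id < 3` is an assumption that IDs occurring three or more times should all go.

```python
from collections import Counter

# Hypothetical sample data: ID 50 appears three times.
cases = [
    {"id": 10}, {"id": 20}, {"id": 20},
    {"id": 50}, {"id": 50}, {"id": 50},
]

# Count cases per ID (the "break" variable) ...
n_per_id = Counter(case["id"] for case in cases)

# ... and add that count to every case as a new variable.
with_freq = [dict(case, n_id=n_per_id[case["id"]]) for case in cases]

# Keep only cases whose ID occurs fewer than three times.
kept = [case for case in with_freq if case["n_id"] < 3]
print([case["id"] for case in kept])  # [10, 20, 20]
```

Because every case carries the total count for its ID, all three cases of ID 50 are removed together, which is what filtering on a running sequence number cannot do.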

HTH,

Ruben
