Selecting unique cases by a condition on a variable

Moderators: statman, Analyst Techy, andris, Fierce, GerineL, Smash

DrNerj
Posts: 3
Joined: Tue Jul 10, 2012 3:30 pm

Selecting unique cases by a condition on a variable

Postby DrNerj » Tue Jul 10, 2012 5:33 pm

Hi everyone, I'm a first-time poster here and quite new to SPSS. I'm hoping someone here can help me out, this seems like a great resource/community.

I have a table of several hundred cases and at least 20 variables. For some of the statistical analysis that I want to do, I need to exclude certain cases (but I don't want to delete them from the table).

My first variable is 'CustomerID' (8-digit number) and a single customer can have multiple cases. I have 2 other variables called 'LeftHand' and 'RightHand', both of which have possible values within the range of 0-7.

I want to select unique CustomerIDs, and the case that I want to select for each customer is the case that has the highest number in either LeftHand or RightHand.

For example, if for one of the customers the variables of importance from the table look like....

CustomerID | LeftHand | RightHand
| |
02439568 | 6 | 3
02439568 | 2 | 5
02439568 | 1 | 2
02439568 | 4 | 7
02439568 | 2 | 6

When I select my cases, I want the 4th row shown to be selected for this customer.
(sorry for the crudity of my table)

Any help with this would be much appreciated!
Penguin_Knight
Posts: 473
Joined: Thu Apr 05, 2012 5:58 pm

Re: Selecting unique cases by a condition on a variable

Postby Penguin_Knight » Tue Jul 10, 2012 6:02 pm

What about this?

Code: Select all

CustomerID   |    LeftHand       |   RightHand
02439568     |        6          |       3
02439568     |        2          |       5
02439568     |        7          |       2
02439568     |        4          |       7
02439568     |        2          |       6
Do you want the 3rd or the 4th? Or you are sure that does not happen?
DrNerj
Posts: 3
Joined: Tue Jul 10, 2012 3:30 pm

Re: Selecting unique cases by a condition on a variable

Postby DrNerj » Tue Jul 10, 2012 6:47 pm

Thanks for the quick reply.

That's a good question.
In that case, I would want the 4th because the secondary value is higher in the 4th case than in the 3rd case (4 vs 2).

If the secondary values are also equal, I would want to choose the most recent case. (I also have a 'Date' variable where values are of the format DD-MMM-YYYY)

Thanks for bringing that to my attention, I didn't consider that possibility. Do you think it's possible to select cases in this way?
Penguin_Knight
Posts: 473
Joined: Thu Apr 05, 2012 5:58 pm

Re: Selecting unique cases by a condition on a variable

Postby Penguin_Knight » Tue Jul 10, 2012 8:03 pm

Let's see if this will work:

Code: Select all

COMPUTE higher=max(left, right).
EXECUTE.

AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=id
  /higher_max=MAX(higher).

USE ALL.
COMPUTE filter_$=(higher_max = higher).
VARIABLE LABEL filter_$ 'higher_max = higher (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.

AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=id
  /date_max=MAX(date).

COMPUTE ToSelect = (date_max = date) .
EXECUTE .

FILTER OFF.
USE ALL.
EXECUTE.

IF SYSMIS(ToSelect) ToSelect = 0 .
EXECUTE .

DELETE VARIABLES higher higher_max filter_$ date_max .
EXECUTE .
Save a copy of your data and then try use the syntax above. To apply a syntax, go to File > New > Syntax. Copy and paste the thing in the box to the syntax window. And the choose Run > All in the tool bar.

I named the variables as how you named them. Except date. I just called that "date" and you'll need to change it to however you called your date variable.

A simple run down:

Code: Select all

COMPUTE higher=max(left, right).
EXECUTE.
1) First use max() function to select which one is the higher value of the two.

Code: Select all

AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=id
  /higher_max=MAX(higher).
2) Use aggregate to add the largest left/right value for each ID to the data as a new variable.

Code: Select all

USE ALL.
COMPUTE filter_$=(higher_max = higher).
VARIABLE LABEL filter_$ 'higher_max = higher (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
EXECUTE.
3) Select entries whose larger left/right value matches the aggregated largest value.

Code: Select all

AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=id
  /date_max=MAX(date).
4) Apply another round of aggregate to select maximal date (aka latest date). This procedure will only be done to cases that were selected. You'll see a lot of cells with missing "." and that's normal.

Code: Select all

COMPUTE ToSelect = (date_max = date) .
EXECUTE .
5) Select entries, among the selected ones, whose date matches the aggregated largest date value.

Code: Select all

FILTER OFF.
USE ALL.
EXECUTE.

IF SYSMIS(ToSelect) ToSelect = 0 .
EXECUTE .
6) Compute a flag variable to identify the cases to be selected. (1 = flagged, 0 = not flagged). Each ID should only have one 1 with the rest being 0, unless you have a case that has two lines with same date and same largest data, which I hope not.

Code: Select all

DELETE VARIABLES higher higher_max filter_$ date_max .
EXECUTE .
7) Cleaning up the junk.
DrNerj
Posts: 3
Joined: Tue Jul 10, 2012 3:30 pm

Re: Selecting unique cases by a condition on a variable

Postby DrNerj » Wed Jul 11, 2012 1:23 pm

Thank you so much Penguin_Knight. It worked wonderfully.

And thanks for explaining each step of the code, very helpful.
jdsfg
Posts: 2
Joined: Thu Jul 19, 2012 8:07 am

Re: Selecting unique cases by a condition on a variable

Postby jdsfg » Thu Jul 19, 2012 8:25 am

Thanks for the quick reply.

That's a good question.
In that case, I would want the 4th because the secondary value is higher in the 4th case than in the 3rd case (4 vs 2).

If the secondary values are also equal, I would want to choose the most recent case. (I also have a 'Date' variable where values are of the format DD-MMM-YYYY)


____________________

Sac Longchamp 2012

Who is online

Users browsing this forum: No registered users and 1 guest

cron