spssforum.com


All times are UTC




Posted: Wed Jun 13, 2012 5:06 pm

Joined: Wed Jun 13, 2012 4:50 pm
Posts: 6
Hi,
I have a question about how to transform my dataset. It has 188 variables and 1308 cases. Each row is a single case (i.e. one person). The variables represent all the diagnosis codes this person has ever received at the clinic. There can be up to 6 diagnosis codes per visit. There are a lot of missing values, since some people came only once and have only 1 diagnosis code, while others came 10 times and have multiple codes per visit, some of which are the same (e.g. the code for hypertension may come up more than once, since this is a chronic condition and would be reported on each visit).

So, I was wondering if there is a way to transform the data so that each diagnosis is only listed once per person (i.e. if hypertension is listed more than once for a person, I only want to count it once for that person). The ultimate goal is to see what the top 20 diagnosis codes are for the entire clinic, counted by number of patients. The messy part is that there are also roughly 980 diagnosis codes in use.

Does anybody have any suggestions as to how I can transform this huge dataset so I can get some simple frequencies?

Thanks very much!


Posted: Wed Jun 13, 2012 5:14 pm

Joined: Thu Apr 05, 2012 5:58 pm
Posts: 463
Restructure the data to long form, where the variables to be transposed are the six diagnosis codes and the fixed variable is the patient ID. The final data set should have only ID and diagnosis, and 6x1308 cases. This transformation can be found under Data > Restructure.

Then identify the duplicates with Data > Identify Duplicate Cases. Delete the cases that have a duplicated combination of ID and diagnosis. Afterwards you can run the frequencies.
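
As a rough sketch of the "delete the duplicated ID and diagnosis" step, the long file can also be collapsed with AGGREGATE instead of the Identify Duplicate Cases dialog. This is a swapped-in alternative, not the menu procedure described above. The variable names id and diagnosis are hypothetical, and the diagnosis variable is assumed to be a string:

```spss
* Sketch only. Assumes the restructured (long) file is the active dataset.
* Hypothetical variable names: id = patient ID, diagnosis = one code per row (string).

* Drop rows with no diagnosis code.
SELECT IF (diagnosis ~= "").

* Collapse to one row per distinct patient-diagnosis pair.
* n_rows records how many times the pair occurred before collapsing.
AGGREGATE
  /OUTFILE=*
  /BREAK=id diagnosis
  /n_rows=N.
```

Because /OUTFILE=* replaces the active dataset, it is worth saving a copy of the long file before running this.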


Posted: Wed Jun 13, 2012 5:41 pm

Joined: Wed Jun 13, 2012 4:50 pm
Posts: 6
Thanks for your quick response. However, maybe I didn't explain it correctly: there are up to 6 diagnosis codes PER visit, so there could be up to 186 diagnosis codes per patient (it appears that one patient has visited the clinic 31 times, so 31*6=186). I'm attaching just a portion of the dataset copied into Excel. The variable names go all the way up to Diagnosis 631.
Thanks!


Posted: Wed Jun 13, 2012 6:33 pm

Joined: Thu Apr 05, 2012 5:58 pm
Posts: 463
In that case, move all 186 variables into the list of variables to be transposed, yielding 186x1308 cases (243,288 rows). The procedure is otherwise the same.


Posted: Wed Jun 13, 2012 6:40 pm

Joined: Wed Jun 13, 2012 4:50 pm
Posts: 6
Sorry, I'm still relatively new to SPSS. How do I do that? Could you help with some syntax?


Posted: Wed Jun 13, 2012 6:44 pm

Joined: Thu Apr 05, 2012 5:58 pm
Posts: 463
Can you upload part of the file? Even just one case is enough. I need to see your variable name and data set structure.


Posted: Fri Jun 22, 2012 3:48 pm

Joined: Wed Jun 13, 2012 4:50 pm
Posts: 6
Hi!
I uploaded a sample of the data in my previous post - I had to copy it into excel and then save it as a picture in order to upload. Do you want to see the variable view from the dataset?
Thanks!


Posted: Fri Jun 22, 2012 4:22 pm

Joined: Thu Apr 05, 2012 5:58 pm
Posts: 463
kot4x wrote:
Hi!
I uploaded a sample of the data in my previous post - I had to copy it into excel and then save it as a picture in order to upload. Do you want to see the variable view from the dataset?
Thanks!


No, I'd need the real data so that I can see your list of variables. But it doesn't matter much. Here is the syntax:

Code:
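* Restructure from wide to long: one row per patient per diagnosis variable.
* id1 is a new variable holding the original case number; id is assumed to be your patient ID.
VARSTOCASES
  /ID=id1
  /MAKE trans1 FROM Diagnosis11 TO Diagnosis631
  /INDEX=Index1(trans1)
  /KEEP=id
  /NULL=KEEP.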


You may need to modify the syntax a bit. I used Diagnosis11 as your first diagnosis variable and Diagnosis631 as your last (I can't be sure, because I don't have the data set to check). It's also important that all 186 variables are contiguous, meaning they sit next to each other in your data set with no other variables mixed in between, because TO selects every variable between the two names in file order.

Remember to save a backup before restructuring.

After restructuring, you can remove the empty cases and the duplicates with this:
Code:
* Drop the empty rows (assumes trans1 is a string variable).
FILTER OFF.
USE ALL.
SELECT IF (trans1  ~=  "").
EXECUTE.

* Identify Duplicate Cases.
* If PrimaryFirst is left over from an earlier run, remove it first with DELETE VARIABLES PrimaryFirst.
SORT CASES BY id1(A) trans1(A).
MATCH FILES
  /FILE=*
  /BY id1 trans1
  /FIRST=PrimaryFirst
  /LAST=PrimaryLast.
DO IF (PrimaryFirst).
COMPUTE  MatchSequence=1-PrimaryLast.
ELSE.
COMPUTE  MatchSequence=MatchSequence+1.
END IF.
LEAVE  MatchSequence.
FORMATS  MatchSequence (f7).
COMPUTE  InDupGrp=MatchSequence>0.
SORT CASES InDupGrp(D).
MATCH FILES
  /FILE=*
  /DROP=PrimaryLast InDupGrp MatchSequence.
VARIABLE LABELS  PrimaryFirst 'Indicator of each first matching case as Primary'.
VALUE LABELS  PrimaryFirst 0 'Duplicate Case' 1 'Primary Case'.
VARIABLE LEVEL  PrimaryFirst (ORDINAL).
FREQUENCIES VARIABLES=PrimaryFirst.
EXECUTE.

* Keep only the first row of each patient-diagnosis pair.
FILTER OFF.
USE ALL.
SELECT IF (PrimaryFirst  =  1).
EXECUTE.


In the GUI, you can get the same results with Data > Select Cases, followed by Data > Identify Duplicate Cases.
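
For the final "top 20" step, a minimal sketch of the frequency run might look like the following. It assumes the de-duplicated long file produced by the syntax above is still the active dataset and that the diagnosis variable is named trans1, as in that syntax:

```spss
* Sketch only. Assumes the de-duplicated long file is the active dataset.
* Each remaining row is one patient-diagnosis pair, so each count is a number of patients.
* DFREQ sorts the table by descending frequency.
FREQUENCIES VARIABLES=trans1
  /FORMAT=DFREQ.
```

The first 20 rows of the resulting table are then the 20 most common codes. Note that the percentages in the table are relative to the number of patient-diagnosis pairs, not to the 1308 patients, so divide a count by 1308 to get the share of patients with that diagnosis.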


Last edited by Penguin_Knight on Fri Jun 22, 2012 8:41 pm, edited 1 time in total.

Posted: Fri Jun 22, 2012 4:27 pm

Joined: Wed Jun 13, 2012 4:50 pm
Posts: 6
Thank you so much! I will try it and see if it works!


Posted: Fri Jun 22, 2012 5:03 pm

Joined: Wed Jun 13, 2012 4:50 pm
Posts: 6
Oh my gosh, Penguin_Knight! You are my hero! Thank you SOOO much! That worked beautifully!!


Posted: Fri Jun 22, 2012 8:12 pm

Joined: Thu Apr 05, 2012 5:58 pm
Posts: 463
Wonderful, glad to hear.

Good luck with your analysis.

