I have a dataset of 1.5 million death records, and I need to identify the top 2 leading causes of death separately for each of 300 cohorts.
The dataset has these variables: "cohort" and "deathcause". There are 300 cohorts, each with 5,000 death records (300 cohorts x 5,000 deaths = 1.5 million death records).
The values in the "deathcause" variable are disease codes for the underlying cause of death, such as HIV, heart disease, cancers, Alzheimer's, etc.
For each of the 300 cohorts, I need the top 2 leading causes of death, for example:
For cohort #1, the leading cause of death is heart disease, followed by cancer.
For cohort #2, the leading cause of death is HIV, followed by injuries.
For cohort #3, the leading cause of death is cancer, followed by heart disease.
For cohort #4, the leading cause of death is heart disease, followed by HIV.
...and so on for all 300 cohorts.
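Since no SPSS syntax is shown here, the following is only a language-neutral sketch of the required logic, written in Python rather than SPSS: count the records for each (cohort, deathcause) pair, sort each cohort's causes by count in descending order, and keep the first 2. In SPSS the same idea would have to be expressed with commands such as AGGREGATE (counting records by cohort and deathcause) followed by SORT CASES; that mapping is an assumption, not a tested recipe. The function name `top_causes_by_cohort` and the alphabetical tie-breaking rule are hypothetical choices made for illustration, and the toy records below are made up, not real data.

```python
from collections import Counter, defaultdict


def top_causes_by_cohort(records, k=2):
    """Return {cohort: [(cause, count), ...]} holding the k most frequent causes per cohort.

    records: iterable of (cohort, deathcause) pairs, one pair per death record.
    Ties in count are broken alphabetically by cause code (a hypothetical rule,
    since the question does not say how ties should be handled).
    """
    # One Counter per cohort: counts[cohort][cause] = number of deaths.
    counts = defaultdict(Counter)
    for cohort, cause in records:
        counts[cohort][cause] += 1

    result = {}
    for cohort, counter in counts.items():
        # Sort by descending count, then by cause name, and keep the first k.
        ranked = sorted(counter.items(), key=lambda item: (-item[1], item[0]))
        result[cohort] = ranked[:k]
    return result


# Made-up toy records: (cohort, deathcause)
records = (
    [(1, "heart disease")] * 3 + [(1, "cancer")] * 2 + [(1, "HIV")]
    + [(2, "HIV")] * 4 + [(2, "injuries")] * 2 + [(2, "cancer")]
)
print(top_causes_by_cohort(records))
# {1: [('heart disease', 3), ('cancer', 2)], 2: [('HIV', 4), ('injuries', 2)]}
```

With 300 cohorts the result has 300 entries with at most 2 causes each, which is the compact per-cohort table the question asks for, instead of one long ranking for the whole dataset.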
I am new to SPSS, so I am not really sure what to do. I am able to use this...
It displays the causes of death for the entire dataset in rank order: the leading cause first, followed by the second, third, fourth, and so on, across dozens of rows. Unfortunately, there are 2 problems with this: 1) it ranks the causes for the entire dataset, but I need the ranking separately for each of the 300 cohorts; and 2) it lists every cause of death, when I only need the top 2 leading causes for each cohort.
Any suggestions for how I should approach this challenge?
OS=Windows 7 Enterprise; SPSS version 21 (64-bit)
Info about study: Identifying and describing the leading causes of death for various cohorts.