## Calculating a score - huge amount of variables and IFs

Moderators: statman, Analyst Techy, andris, Fierce, GerineL, Smash

Lor259
Posts: 3
Joined: Tue Jan 14, 2014 3:17 pm

### Calculating a score - huge amount of variables and IFs

Hi,

I have a problem that theoretically speaking is quite easy to solve, but becomes really hard practically: I have to calculate a clinical score based on the discharge diagnosis of patients, which are coded with the International Classification of Diseases (ICD) coding scheme.

The first difficulty arises from the fact that there are up to 170 diagnostic codes per patient, but each patient has a different amount of diagnostic codes.

The dataset, in its original form, is structured so that for each diagnostic code for each patient for each admission, there is a record: so that a patient that for a particular admission had, let's say, 15 diagnosis, will have 15 records: each record will have patient's name, medical record number, admission and discharge dates, repeated; and the only variable that will vary for each record will be the diagnostic code. In the original form of the dataset, there are 184K records.

Original dataset

Code: Select all

``````ID	 Name	 Date of birth	 Gender	 Date of admission	Date of discharge	 Diagnostic_code
1	 Doe, John	 01-01-1950	 M	         10-12-2009	        01-01-2010	         443.9
1	 Doe, John	 01-01-1950	 M	         10-12-2009	        01-01-2010      	 V56.8
1	 Doe, John	 01-01-1950	 M	         10-12-2009	        01-01-2010	         221.02
1	 Doe, John	 01-01-1950	 M	         10-12-2009	        01-01-2010	         428
``````
Thinking that it might the analysis, I ran successfully a CASESTOVAR, so that now for each patient's admission there is only one record. After this procedure I have 21K records. The drawback of this is that now each admission has 170 diagnostic code variables (diagnostic_code.1, diagnostic_code.2, and so on).

After CASESTOVAR

Code: Select all

``````ID	 Name	 Date of birth	 Gender	 Date of admission	 Date of discharge	 Diagnostic_code.1	 Diagnostic_code.2	 Diagnostic_code.3	 Diagnostic_code.4
1	 Doe, John	  01-01-1950	 M	         10-12-2009	         01-01-2010	         443.9	                 V56.8	            221.02	                 428
``````
The second difficulty of my problem is due to the fact that the clinical score comprises 30 items, and each one has a variable number of points that add up to the final score. This items, as I said, coded using the ICD scheme. This is were the hard part begins. For example, for a given item, the possible codes are something like:
B18.x, K70.0--K70.3, K70.9, K71.3--K71.5, K71.7, K73.x, K74.x, K76.0, K76.2--K76.4, K76.8, K76.9, Z94.4
Where "x" and "-" are wildcards that indicate "any value after the '.' sign" and a range, respectively. I imagine that coding this is not gonna be easy.

So you can imagine that if I have to write the code in the traditional way (for each combination of diagnostic code variable, item and ICD diagnostic code), its gonna take me forever.

The other option that I thought of was to calculate the score on the original dataset (the one with each record corresponding to a diagnostic code). In this way we would avoid the problem due to the 170 different variables for each possible diagnostic code, but we'd have the obstacle of figuring out how to calculate the score on a per-admission basis. For example, I don't know how I could tell SPSS how to identify an admission based on multiple records, and calculate the score adding points only for a given set of records that I know that represent an admission, and then move on to the next set of records that represent another admission.

I hope to have been able to clearly explain myself and, especially, that someone can help me out...

Thank you very much!

Lo
GerineL
Moderator
Posts: 1477
Joined: Tue Jun 10, 2008 4:50 pm

### Re: Calculating a score - huge amount of variables and IFs

As I am not familiar with this system, it is not clear to me yet what you want exactly.
What variable do you want to create exactly?
I understand what your dataset looks like, but not yet what you want to achieve.
Lor259
Posts: 3
Joined: Tue Jan 14, 2014 3:17 pm

### Re: Calculating a score - huge amount of variables and IFs

The ICD codes are codes that are assigned to medical diagnoses at discharge from the hospital, to provide a unified framework. This allows hospitals to ask for reimbursement from health authorities, to objectively assess the severity of a specific patient condition, to compare the prevalence of certain diagnoses between different healthcare systems, and so on.
We are currently using ICD-10 coding scheme, although ICD-9 has been used (and is being used, in certain countries) until recently.

A quick look at ICD-10 codes: http://en.wikipedia.org/wiki/ICD-10
Just a random page to show you the structure of the ICD-9 coding scheme: http://www.icd9data.com/2014/Volume1/29 ... efault.htm
And the ICD-10: http://www.icd10data.com/ICD10CM/Codes/ ... 0-H05/H02-

So, in my case, what I want to achieve is to construct a comorbidity index that takes into account the relative weight of certain diseases on prognosis. The two comorbidity indexes that I want to calculate are the Deyo-Charlson and the Elixhauser. The former is based on 17 conditions, and the latter on 30. Since they were developed by physicians, and not by coders, there is an equivalence table between each condition and the corresponding ICD-9/ICD-10 codes. For example:

Myocardial infarction:
For ICD-9: 410.x, 412.x
For ICD-10: I21.x, I22.x, I25.2

Congestive heart failure:
For ICD-9: 428.x
For ICD-10: I09.9, I11.0, I13.0, I13.2, I25.5, I42.0, I42.5–I42.9, I43.x, I50.x, P29.0

and so on

(where: "x" is a wildcard for any other code after the "."; "-" is a wildcard that stands for a range of values)
Please note that the variables that contain the codes should be treated as a string, because it can contain numbers and letters, and also "428", "428.0" and "428.00" are three different codes.

So, one difficulty might arise from this. And also from the fact that, even though "-" can be coded with "FROM ... THRU ...", it's not clear to me how to code "x".

For each condition, a score (1, 2, 3, etc) is assigned. In the end, the final score is calculated by adding the points of each condition that the patient presents.

I hope to have been more clear and that you can better understand what I want to achieve now. If not, please don't hesitate to ask me.

Thanks a lot!

Lo
GerineL
Moderator
Posts: 1477
Joined: Tue Jun 10, 2008 4:50 pm

### Re: Calculating a score - huge amount of variables and IFs

Unfortunately, it is still not very clear to me.

Please note that in order to answer your question I don't think it is necessary cary to know what diagnoses are in the system etc. It is just necessary to know what your data looksl ike, and what you want it to look like.
Especially the last part is not clear to me.

Can you try to rephrase your question so it is more on the level of variables rather than content?

Can you - for instance - give some examples of what values your variables hold now, and what values you want them to hold?
Lor259
Posts: 3
Joined: Tue Jan 14, 2014 3:17 pm

### Re: Calculating a score - huge amount of variables and IFs

For example:

ID Code.1 Code.2 Code.3 Code.4
1 410.1 324.33 311.1 042.1
2 123.1 124.10 125.11 111.1
3 250.4 428.0 490.1 290.0

Let's say that only the following codes account in the final score:
410.x --> myocardial infarction --> +1 point
042.1 --> AIDS --> +6 points
250.4 --> diabetes with complications --> +2 points
428.0 --> chronic heart failure --> +1 point
490.1 --> chronic pulmonary disease --> +1 point
290.0 --> dementia --> +1 point

So, the scores should be as follows:
Patient 1 --> 1+0+0+6 = 7 points
Patient 2 --> 0+0+0+0 = 0 points
Patient 3 --> 2+1+1+1 = 5 points

Did this example clarify practically what I mean?

Thanks again!
GerineL
Moderator
Posts: 1477
Joined: Tue Jun 10, 2008 4:50 pm

### Re: Calculating a score - huge amount of variables and IFs

Ok if you have an overview somewhere of what diagnostic code gives what pionts, I would use recode and then create a new var with these points.

You can easily do this with excel for instance, just place diagnostic code in column B, and the value you want it to be in column D, make column c "=", column A "(" and column E ")"

For instance:

recode diagnosticcode1 diagnosticcode2 diagnosticcode3
("410.x" = 1)
("042.1" = 6)
into diagnosticcode1a diagnosticcode2a diagnosticcode3a.

etc.

that is the basics, and now there were some additional problems, could you please clarify these (for instance: what do you mean with the X part exactly?)

### Who is online

Users browsing this forum: No registered users and 1 guest