## Creating count variables from nested data?

Moderators: statman, Analyst Techy, andris, Fierce, GerineL, Smash

secondarydatageek

### Creating count variables from nested data?

Greetings,

I am working with a secondary data set and am having trouble figuring out how to get SPSS to calculate variables that I want.

The data capture sentences for criminal offenses. The unit of analysis is the count of conviction. For each conviction count, there is unique offender ID (Variable 1--string format). However, an individual offender can have multiple conviction counts. In other words, s/he CAN be--but isn't alwasy--several times but won't necessarily be (e.g. they won't if they've only got one count of conviction). Additionally, each count has a 'Batch' number (Variable 2--string format). This tells you which counts belong to the same court case. As I understand it, counts are nested within cases ('Batch' number) as well as nested within offenders (offender ID). However, any offender can be involved in multiple cases. Some cases have multiple offenders, but the only way you can tell is to match up the offender IDs to the same Batch number.

Presumably, SPSS can uses these variables to determine which counts belong to each individual offender as well as to match multiple offenders to a single case (e.g. whether there were co-defendants).

What I would like to do is create variables that tell me :
1) How many counts are related to a given
a) offender
b) case
2) How many co-defendants are involved in a given
a) count
b) case

Once I have these, I want to create a weight that will enable me to control for the counts that are linked to other counts when looking at the sentences given out for each case and another one for the number of co-defendants (does this make sense)?

I have started doing this by hand by simply sorting the data by offender ID and entering (in a new variable) the number of times s/he appears for the same batch number. I was then going to do the same thing, sorting by Batch number and hand keying the number of co-defendants involved in each count. However, I have over 5,000 cases and am hoping that syntax might solve this more quickly. Any help you could offer would be GREATLY appreciated.

Thank you very much
Smash
Moderator
Posts: 233
Joined: Tue Aug 07, 2007 11:48 am
Sorry mate,
quite complicated description. Could you insert here a sample data - structure of the file (variable names + few examples of multiple cases/offenders...), so I could imagine it
Does the structure is like:
conviction offender_ID batch
1 'tom **CENSORED**' 'case_ZZ1'
1 'john yyy' case_ZZ1
2 'tom **CENSORED**' 'case_ZZ2'
...
Multiple convictions for one offender + multiple offenders for one batch?

I think that in this case both: aggregate option and a lag fucntion should works for you.
secondarydatageek
Posts: 8
Joined: Tue Oct 16, 2007 12:32 pm
Righto! Sorry for the confusion.

The data structure is like this:

CaseNo ID BATCH COUNT
'1' '2241' 'XL142' 'Armed Robbery'
'2' '2241' 'XL142' 'Assault'
'3' '2241' 'XL142 ' 'Kidnapping'
'4' '3451' 'XL142' 'Armed Robbery'
'5' '7417' 'KM481' 'Rape'
'6' '8419' 'PR472' 'Arson'
'7 ' '3451' 'QT771' 'Armed Robbery'
'8 ' '4723' 'QT771' 'Armed Robbery'
'9' '3451' 'RV737' 'Armed Robbery'

Offender ID 3451 has 3 different batches (court cases), 2 of which have co-defendants. Meanwhile, offender ID 2241 has 3 counts but only one court case (batch) as well as one co-defendant.

Does this make more sense?

Regardless, thank you very much for your help!
secondarydatageek
Posts: 8
Joined: Tue Oct 16, 2007 12:32 pm
I realised that I should include the variables I want to result from this.

CaseNo ID BATCH COUNT Target1 Target2 Target3 Target4
'1' '2241' 'XL142' 'Armed Robbery' '3' '2' '2' '4'
'2' '2241' 'XL142' 'Assault' '3' '2' '1' '4'
'3' '2241' 'XL142 ' 'Kidnapping' '3' '2' '1' '4'
'4' '3451' 'XL142' 'Armed Robbery' '1' '2' '2' '4'
'5' '7417' 'KM481' 'Rape' '1' '1' '1' '1'
'6' '8419' 'PR472' 'Arson' '1' '1' '1' '1'
'7 ' '3451' 'QT771' 'Armed Robbery' '1' '2' '2' '2'
'8 ' '4723' 'QT771' 'Armed Robbery' '1' '2' '2' '2'
'9' '3451' 'RV737' 'Armed Robbery' '1' '1' '1' '1'

Target1 is the number of counts per offender
Target2 is the number of defendants per batch
Target3 is the number of defendants per count
Target4 is the number of counts per batch

Hopefully this makes sense...
Smash
Moderator
Posts: 233
Joined: Tue Aug 07, 2007 11:48 am
Yes, now it is absolutely clear
and, 4 aggregassions are enouth here,
one must be done from previouse aggregassion but others can be simply added to main file. here is the syntax part, received results are identical as in your example:

Code: Select all

``````*** target1: counts per offender (ID+BATCH).
GET FILE='E:\tmp\court_cases.sav'.
AGGREGATE
/OUTFILE='E:\tmp\court1.sav'
/BREAK=ID BATCH
/target1=N.

*** target2: offenders per batch  (done from court1 aggregassion).
GET FILE='E:\tmp\court1.sav'.
AGGREGATE
/OUTFILE=*
/BREAK=BATCH
/target2=N.
SAVE OUTFILE='E:\tmp\court2.sav'
/COMPRESSED.

*** matching target1 & 2 with main file.
GET FILE='E:\tmp\court_cases.sav'.
sort cases by ID BATCH.
MATCH FILES
/FILE=*
/TABLE = "E:\tmp\court2.sav"
/BY=ID BATCH.
exec.

*** target3: offenders per count.
AGGREGATE
/OUTFILE=*
/BREAK=BATCH COUNT
/target3=N.

*** target4: count per batch.
AGGREGATE
/OUTFILE=*