## Selection problems


Anders

### Selection problems

I am stuck with a problem in SPSS.

I have a huge data file (about 6000 records) containing data about criminality. The file contains codes for crimes committed, dates of crimes committed, sentences (judges' decisions about penalties), dates of sentences, penalties, and dates of penalties (for example imprisonment). For a ten-year period there are 82 judge decisions (sentences). Each judge decision can contain up to 30 individual crimes.

Now I want to find out how many crimes were committed two years and one year before the year of sentence, and one and two years after the year of sentence. To find out, I have to know how many crimes of a certain kind were committed in a specific year.

The data file is organised in the following way:

--------------Sentence 1----------- to ---------- Sentence 82 -----------
|Data, Data, Crime, Year of crime | Data, Data, Crime, Year of crime|

Now I created "flags" or "needles" for each combination of year and type of crime committed. For example, the codes 30050 and 30051 are codes for drunken driving. So to create a flag for drunken driving (DD) in 1990 I wrote "compute DD90 = 1, if year EQ 90 AND DD = 30050 OR DD = 30051". I have done this for all years from 1990 to 2005. Now I wanted to count or sum the number of crimes committed in a certain year across all sentences. There are about 2065 variables about crimes committed and 2065 variables about dates of crimes.

If I pick a "needle" like DD90 and then sum all crimes in all sentences, I get exactly the same result as when using DD91 or DD92.

It is not likely that some clients decide to commit exactly 22 or 43 DUI crimes year after year. So what am I doing wrong?
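
One possible contributor, offered as a guess since the full syntax is not shown: in SPSS logical expressions AND is evaluated before OR, so the condition as written groups as (year EQ 90 AND DD = 30050) OR DD = 30051, and every 30051 case is flagged regardless of year. Python uses the same precedence, so a minimal sketch can show the effect. The function names and sample rows below are hypothetical, not taken from the data file.

```python
# Hypothetical sample rows: (two-digit year, crime code).
rows = [(90, 30050), (90, 30051), (91, 30051), (92, 30051), (91, 10001)]

def flag_as_written(year, code, target_year):
    # Mirrors "year EQ 90 AND DD = 30050 OR DD = 30051":
    # "and" binds tighter than "or", so code 30051 matches in any year.
    return year == target_year and code == 30050 or code == 30051

def flag_with_parentheses(year, code, target_year):
    # Parentheses make the year test apply to both crime codes.
    return year == target_year and (code == 30050 or code == 30051)

for target in (90, 91, 92):
    loose = sum(flag_as_written(y, c, target) for y, c in rows)
    strict = sum(flag_with_parentheses(y, c, target) for y, c in rows)
    print(target, loose, strict)
```

In this sketch the unparenthesized flag counts every 30051 row for every target year, which is one way year-specific flags can end up with nearly identical sums.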

If there is any help out there, it would be much appreciated.

Thanks !

/A.
arandren
Posts: 9
Joined: Wed Apr 16, 2008 8:57 pm

### Deep or wide files

Oh !
I forgot to tell you one thing. The data structure shown above is wide (horizontal), where each individual is a case in one row. The original data file is deep (i.e. each sentence is a case), and some individuals can be present on as many rows in the SPSS data editor as there are sentences.

I had a problem deciding which data file format to use, but decided to use the wide version. Maybe this was an incorrect decision?

Any comments on this?

/A.
statman
Posts: 2734
Joined: Tue Jun 12, 2007 12:08 pm
Location: Florida, USA
Have to think about this, might need a restructuring or special syntax to extract needed cases. Someone will get back to you.
See the note below

NOTE: Please read the Posting Guidelines and always tell us your OS, the SPSS version and information about your study and data!

Statman
Statistical Services
arandren
Posts: 9
Joined: Wed Apr 16, 2008 8:57 pm
Thanks, I need it !

Besides, the data structure in the "deep" file is:

Sentence No (1-82), date of sentence, crime, crime date, type of verdict, date of verdict, Id-number etc

In the flat file the structure is :

Id-number (for each person) , crime, crime date, Sentence number, Verdict, date of verdict etc

Someone told me that I should use Python to program SPSS in this case, but I have little experience with Python and SPSS together.

Another suggestion was to move to SAS, but likewise, I have little experience with SAS.

A third proposal was to export the file to a database and use SQL. However, I am still stuck.

A fourth proposal was to migrate to SPSS 16. But I do not have access to SPSS 16, and I do not know what that version can do beyond SPSS 15, which I am working with.

So - any help would be appreciated.

/ A.
Smash
Moderator
Posts: 233
Joined: Tue Aug 07, 2007 11:48 am
A question:
What is the difference between a "deep" file and a "flat" file?
I see the same structure in both, so every line contains one crime/verdict per person (id_nr).
That still looks better than the wide version, which is overcomplicated and would definitely need macros to go further. So, go back to the first data layout and post a small example here, for one person.
We'll go further after this, OK?
arandren
Posts: 9
Joined: Wed Apr 16, 2008 8:57 pm
Smash!
First, thanks for your replies.

The difference between what I call a "deep" and a "flat" file is as follows:

Remember: I have data for about 6000 persons (convicted criminals). From 1990 to 2005, some of these guys had up to 82 trials in court and were sentenced to penalties the same number of times (i.e. 82). The procedure at Swedish courts is that they kind of "collect" a number of crimes in each trial, so each trial can address up to 30 crimes. I thus have 2065 variables about individual crimes committed and 2065 variables about dates of crimes. I also have data about the year of each trial.
Penalties can include a lot of sanctions (from imprisonment to attendance of special programs within the prison and probation service). Now I want to know which sanction (prison, program 1, program 2, etc.) has the best preventive effect. So I want to be able to find out:

| Nr of crimes 2 years before sanction | Nr of crimes 1 year before sanction | SANCTION | Nr of crimes 1 year after sanction | Nr of crimes 2 years after sanction |

I have data about the year of sanction, let's say 1990. But 1990 can pop up in many trials (theoretically 82 times). Anyhow, I have to count or sum the number of crimes committed before and after a sanction for a specific year.
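
The before/after counts described above can be sketched as follows. This is only an illustration under stated assumptions: it takes a plain list of crime years for one person and one sanction year, and the function name and sample values are hypothetical.

```python
from collections import Counter

def crimes_around_sanction(crime_years, sanction_year):
    """Count crimes in each of the two years before and after a sanction year."""
    per_year = Counter(crime_years)
    # Offsets -2 and -1 are the years before the sanction, 1 and 2 the years after.
    return {offset: per_year[sanction_year + offset] for offset in (-2, -1, 1, 2)}

# Hypothetical crime years for one person who was sanctioned in 1990.
crime_years = [1988, 1988, 1989, 1990, 1991, 1992, 1992, 1992, 1995]
counts = crimes_around_sanction(crime_years, 1990)
print(counts)
```

Crimes in the sanction year itself and outside the four-year window are left out, which matches the four columns requested. If the same year appears as a sanction year in several trials, the function would simply be called once per trial.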

I have the same information in one "deep" and one "wide" file.

The deep file structure is organized so that each trial/conviction is a case (a row in SPSS):

--------------------------- COLUMNS --------------------------
Row -> Trial 1, Crime, Date of Crime, Trial, Date of Trial, Sanction, ID
Trial 2, Crime ,Date of Crime, Trial, Date of Trial, Sanction,ID

Since some individuals have been convicted several times, the same ID can pop up in two or more rows.
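
With a deep layout, counting crimes of a given kind per person and year reduces to a grouped count. A minimal sketch, assuming one crime per row and using hypothetical rows (only the codes 30050 and 30051 come from the thread):

```python
from collections import Counter

DUI_CODES = {30050, 30051}  # drunken-driving codes mentioned in the thread

# Hypothetical deep-file rows: (person ID, crime code, year of crime).
deep_rows = [
    (1, 30050, 1990),
    (1, 30051, 1990),
    (1, 30050, 1991),
    (1, 12345, 1991),  # hypothetical non-DUI code
    (2, 30051, 1990),
]

# Key each count by (ID, year) so repeated IDs accumulate across rows.
dui_per_person_year = Counter(
    (person_id, year)
    for person_id, code, year in deep_rows
    if code in DUI_CODES
)
print(dui_per_person_year)
```

Because the year is part of the key rather than baked into a variable name, no separate flag per year is needed.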

The "flat" file structure is organised like this (C1_1 is crime number one in trial one, D1_1 is the year of C1_1, and so on):

--------------------------- COLUMNS ------------------

Row -> ID, C1_1, D1_1, C2_1, D2_1, ..., C1_82, D1_82, ...

Of course there are other variables intermingled in this data structure as well (but for simplicity I only show the crimes and crime years).

In this data structure an individual ID occupies only one row in the SPSS data editor.
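
Going from this wide layout back to one row per crime is mechanical once the variable names follow the C{crime}_{trial} / D{crime}_{trial} pattern described above. SPSS has a VARSTOCASES command for this kind of restructuring; the Python sketch below only illustrates the idea, and its row contents and function name are hypothetical.

```python
import re

# Hypothetical wide row for one person; None marks an unused crime slot.
wide_row = {
    "ID": 7,
    "C1_1": 30050, "D1_1": 1990,
    "C2_1": 30051, "D2_1": 1990,
    "C1_2": 30050, "D1_2": 1992,
    "C2_2": None, "D2_2": None,
}

def wide_to_long(row):
    """Turn one wide row into (ID, trial, crime_no, code, year) tuples."""
    long_rows = []
    for name, code in row.items():
        match = re.fullmatch(r"C(\d+)_(\d+)", name)
        if match is None or code is None:
            continue  # skip ID, date columns, and empty crime slots
        crime_no, trial = int(match.group(1)), int(match.group(2))
        year = row.get(f"D{crime_no}_{trial}")  # matching date column
        long_rows.append((row["ID"], trial, crime_no, code, year))
    # Order by trial, then by crime number within the trial.
    return sorted(long_rows, key=lambda r: (r[1], r[2]))

long_rows = wide_to_long(wide_row)
for r in long_rows:
    print(r)
```

Each crime then carries its own year, so counts per year can be computed directly instead of through thousands of paired variables.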

Does this make sense to you?

/ Anders

PS When previewing my post I can see that the formatting of my text is corrupted; however, I think you can still understand what I tried to show you. DS
Smash
Moderator
Posts: 233
Joined: Tue Aug 07, 2007 11:48 am
OK, so "flat" is the same as "wide"...
I still think the "deep" file is better than the "wide" one, because it is easier to understand. Maybe during the analysis we will come back to the "wide" version, but for understanding, "deep" is much clearer.
> Penalties can include a lot of sanctions (from imprisonment to attendance of special programs within the prison and probation service). Now I want to know which sanction (prison, program 1, program 2, etc.) has the best preventive effect. So I want to be able to find out:

Ah, that clarifies your problem a little.
So, as I understand it, you want to check whether a person stopped committing a selected kind of crime after a given sanction (dated by the year of the verdict?), and to calculate how many crimes of the same type he committed after each verdict?

To help you I still need an example. Please send me part of your data and the result you want to get from that part (numbers). Just select a few records and calculate the result by hand, so I can picture the whole process.
arandren
Posts: 9
Joined: Wed Apr 16, 2008 8:57 pm
Thanks!
I will prepare a file for you. However, it will take some hours, because I have to label it so you can understand the contents (if you don't speak Swedish, that is).

I'm grateful.

Back soon..

/ Anders
