Mon Sep 10, 2012 5:32 pm

I have a large dataset (approximately 600,000 cases), of which a small subset of cases (about 2,000) participated in a treatment option (trmt=1). I have demographic characteristics for this subset:

% caucasian = 19%
% hispanic = 57%
% male = 48%

I'd like to create a comparison group from the cases who did not participate in the treatment (trmt=0, about 598,000). I'd like to retain as many cases as possible (i.e. keep the sample size as large as possible), but I want the comparison group to have the same demographic characteristics as the treatment group (males = 48%, etc.).

Is this possible, and if so, how would I go about creating the comparison group? I don't currently have the Python add-on.

Thank you very much for your assistance!
Re: Creating control group

Wed Oct 03, 2012 6:29 pm

Re: Creating control group

Mon Mar 14, 2016 8:22 am

I would create a single variable that combines all your variables of interest, for example:

11 = male caucasian
12 = male hispanic
21 = female caucasian
22 = female hispanic


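Then run frequencies on that variable for both the treatment and non-treatment cases, use the counts to decide how many people should be in each group of the comparison sample (probably limited by the smallest group), and then use SPSS's random case selection to draw that many cases from each group.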
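To make the arithmetic concrete, here is a minimal sketch of that logic in plain Python (not SPSS syntax, and it does not need the SPSS Python add-on). It assumes the combined codes above (11, 12, 21, 22), and it reads "based on the smallest group" as: the cell with the fewest available controls per treated case sets the limit. The function names and the toy counts are made up for illustration; they are not from the original data. If the comparison group matches the treatment group cell by cell, the overall percentages (male, hispanic, and so on) match as well, up to rounding.

```python
import random
from collections import Counter, defaultdict
from fractions import Fraction


def cell_code(sex, ethnicity):
    """Combine two coded variables into one cell code, e.g. sex=1, ethnicity=2 -> 12."""
    return 10 * sex + ethnicity


def matched_quotas(treated_cells, control_cells):
    """Largest per-cell control counts that keep the treated group's cell mix."""
    treated = Counter(treated_cells)
    available = Counter(control_cells)
    # Controls available per treated case in each cell; the scarcest cell is the limit.
    ratio = min(Fraction(available[c], n) for c, n in treated.items())
    # Round down so no cell asks for more controls than exist.
    return {c: int(ratio * n) for c, n in treated.items()}


def draw_controls(controls, quotas, seed=0):
    """Randomly pick quotas[cell] case ids per cell; controls is a list of (case_id, cell)."""
    rng = random.Random(seed)
    by_cell = defaultdict(list)
    for case_id, cell in controls:
        by_cell[cell].append(case_id)
    chosen = []
    for cell, k in quotas.items():
        chosen.extend(rng.sample(by_cell[cell], k))
    return chosen


# Toy data (hypothetical counts, far smaller than the real file).
treated_cells = [11] * 1 + [12] * 3 + [21] * 1 + [22] * 3
control_counts = {11: 40, 12: 30, 21: 50, 22: 90}

controls = []
next_id = 0
for cell, count in control_counts.items():
    for _ in range(count):
        controls.append((next_id, cell))
        next_id += 1

quotas = matched_quotas(treated_cells, [cell for _, cell in controls])
chosen = draw_controls(controls, quotas)
print(quotas)
print(len(chosen), "controls kept out of", len(controls))
```

In this toy example cell 12 is the bottleneck (30 controls for 3 treated cases), so every cell keeps 10 controls per treated case. With real data, the same quotas could be read off the frequency tables by hand and then applied with a random selection inside each group.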

