I have a large dataset (approximately 600,000 cases), of which a small subset (about 2,000 cases) participated in a treatment option (trmt=1). I have the following demographic characteristics for this subset:
% Caucasian = 19%
% Hispanic = 57%
% male = 48%
I'd like to create a comparison group from the cases that did not participate in the treatment (trmt=0, about 598,000 cases). I want to retain as many cases as possible (i.e., keep the comparison group as large as possible), but the comparison group must have the same demographic characteristics as the treatment group (48% male, etc.).
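One way to frame the constraint is frequency matching on joint cells (e.g. race by sex): each cell of the comparison group must hold the same share as in the treatment group, so the largest achievable size N is capped by the scarcest cell, N <= available_in_cell / share_of_cell. The minimal sketch below, in standalone Python independent of any statistics-package add-on, assumes each case is a dict with hypothetical fields `race` (with a hypothetical "other" category covering the remaining 24%) and `male`; the function name `frequency_match` and the toy counts are illustrative only, not real data. Matching the joint cells also matches each marginal percentage, although matching only the marginals could in principle allow a somewhat larger group.

```python
import random
from collections import Counter


def frequency_match(treated, pool, key, seed=0):
    """Return the largest subsample of `pool` whose joint distribution over
    the cells defined by `key(case)` matches the treated group's."""
    # Count treated cases in each joint cell, e.g. (race, male).
    target = Counter(key(case) for case in treated)
    n_treated = sum(target.values())

    # Bucket the untreated pool by the same cells.
    cells = {}
    for case in pool:
        cells.setdefault(key(case), []).append(case)

    # Each cell must supply (treated_count / n_treated) * N cases, so
    # N <= available * n_treated / treated_count for every cell;
    # the scarcest cell therefore sets the overall size.
    max_n = min(
        len(cells.get(cell, [])) * n_treated // count
        for cell, count in target.items()
    )

    # Randomly draw each cell's share; flooring keeps every draw
    # within the number of cases available in that cell.
    rng = random.Random(seed)
    matched = []
    for cell, count in target.items():
        k = count * max_n // n_treated
        matched.extend(rng.sample(cells.get(cell, []), k))
    return matched


if __name__ == "__main__":
    # Hypothetical toy data; substitute the real treated and untreated cases.
    treated = ([{"race": "hispanic", "male": 1}] * 5
               + [{"race": "caucasian", "male": 0}] * 5)
    pool = ([{"race": "hispanic", "male": 1}] * 100
            + [{"race": "caucasian", "male": 0}] * 30
            + [{"race": "other", "male": 1}] * 50)
    group = frequency_match(treated, pool, key=lambda c: (c["race"], c["male"]))
    print(len(group))  # 60: the 30 available caucasian/female cases cap the size
```

In the toy data the treatment group is half in each of two cells, so the 30 available cases in the scarcer cell limit the comparison group to 60 (30 per cell), even though the pool has 180 cases.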
Is this possible, and if so, how would I go about creating the comparison group? I don't currently have the Python add-on.
Thank you very much for your assistance!