Append cases only when no match exists in destination file

Moderators: statman, Analyst Techy, andris, Fierce, GerineL, Smash

0101
Posts: 52
Joined: Fri Mar 04, 2011 9:42 pm

Append cases only when no match exists in destination file

Postby 0101 » Mon Apr 30, 2012 7:18 am

I am trying to append cases (Data -> Merge Files -> Add Cases) from File A to File B, but I want to append only those cases in File A that are not in File B (i.e., no match on two variables), within (breaking on) a third variable.

For example, File A contains Survey IDs and the questions (QstnID) on each survey:

Code: Select all

SurveyID QstnID
---------------
4        1
4        2
4        3
4        4
4        5
-----------
5        1
5        2
5        3
5        4
-----------
etc.

Survey 4 has five questions, survey 5 has four questions, etc.

File B contains surveys that have been completed (CompletionID), but doesn't always include every possible question for the survey. For example:

Code: Select all

CompletionID SurveyID QstnID
1            4        1
1            4        2
--------------------------
2            4        5
--------------------------
3            5        1
3            5        3 
--------------------------
- The first completion of Survey 4 includes only 2 of the 5 questions (I need to append QstnIDs 3-5 from File A.)
- The second completion of Survey 4 includes only 1 of the 5 questions (I need to append QstnIDs 1-4 from File A.)
- The completion of Survey 5 includes 2 of the 3 questions for Survey 5 (I need to append QstnIDs 2 and 4 from File A.)

The result should look like below (* indicates what was appended):

Code: Select all

CompletionID SurveyID QstnID
1            4        1
1            4        2
1            4        3*
1            4        4*
1            4        5*
--------------------------
2            4        1*
2            4        2*
2            4        3*
2            4        4*
2            4        5
--------------------------
3            5        1
3            5        2*
3            5        3
3            5        4* 
--------------------------
Any suggestions about how to do this? I obviously need the ADD FILES command, plus maybe some sort of loop and aggregate?
apeape
Posts: 56
Joined: Mon May 02, 2011 6:07 pm

Re: Append cases only when no match exists in destination fi

Postby apeape » Mon Apr 30, 2012 1:49 pm

This should work for you. Rather than trying to conditionally append, it merges everything and conditionally removes duplicates.

Code: Select all

** sample data.
data list list /CompletionID SurveyID QstnID.
begin data.
1     4        1
1     4        2
1     4        3
1     4        4
1     4        5
2     4        1
2     4        2
2     4        3
2     4        4
3     5        1
3     5        2
3     5        3
3     5        4
end data.
dataset name set1.

data list list /CompletionID SurveyID QstnID.
begin data.
1            4        1
1            4        2
2            4        5
3            5        1
3            5        3
end data.
dataset name set2.


** merge files.
add files /file=* 
  /file='set1' 
  /in=source. 
exe.

dataset close set1.

** identify and remove the duplicates from dataset 1.
sort cases by CompletionID SurveyID QstnID. 
match files 
  /file=* 
  /by CompletionID SurveyID QstnID 
  /first=duplicate.
select if (source = 0 and duplicate = 1) or (source = 1 and duplicate = 1).
exe.
delete variables duplicate.
0101
Posts: 52
Joined: Fri Mar 04, 2011 9:42 pm

Re: Append cases only when no match exists in destination fi

Postby 0101 » Mon Apr 30, 2012 5:01 pm

The dataset you created for set 1 doesn't match the my first dataset. Yours has a CompletionID (mine doesn't), and also lists every completed survey (e.g., SurveyID 4 appears twice) with the full list of questions for each. Mine lists each Survey only once.

I've edited the sample data routine accordingly:

Code: Select all

* Sample data.
* SurveyID 4 has 5 questions, SurveyID 5 has 4 questions, SurveyID 6 has 2 questions.
data list list /SurveyID QstnID.
begin data.
4        1
4        2
4        3
4        4
4        5
5        1
5        2
5        3
5        4
6        1
6        2
end data.
dataset name set1.

* SurveyID 4 was completed twice. The first instance has questions 1 & 2 but not 3-5; the second instance has question 5 but not 1-4.
* SurveyID 5 was completed once. It includes questions 1 & 3 but not 2 & 4. SurveyID 6 was not completed.
data list list /CompletionID SurveyID QstnID.
begin data.
1            4        1
1            4        2
2            4        5
3            5        1
3            5        3
end data.
dataset name set2.

Who is online

Users browsing this forum: No registered users and 1 guest

cron