Combine two datasets

Moderators: statman, Analyst Techy, andris, Fierce, GerineL, Smash

Tim Allan

Combine two datasets

Postby Tim Allan » Fri Jun 15, 2007 11:10 am

Apologies if this is a real beginners' question - I am quite new to SPSS.

I have two SPSS datasets (D1 and D2), each with different layouts. Both
datasets contain one common field (Field X).

What I want to do is this: for each occurence of Field X in dataset D1, I
want to check if it exists in D2 and, if so, either flag it or filter it
out to a new dataset. Both datasets contains tens of thousands of records,
thus preventing me from doing it manually.

All occurences of Field X in D1 are present in D2. However, in D2 there
are many other occurences of Field X that I am not interested in, hence the
exercise of removing them.

Can anyone help out please?
forum
Site Admin
Posts: 14
Joined: Tue Apr 10, 2007 8:17 am

Postby forum » Fri Jun 15, 2007 11:42 am

Hi Tim,

I don't know if this is possible in SPSS. I always use Excel for this task. Paste both datasets in a different Excel sheet. Than use the VLOOKUP function for every record one of the sheets to check if it excists in the other sheet. Use the function as follows:

=VLookup(A1, Sheet2!A1:A10000, 1, TRUE)

A complete explanation of the function is give on -for example- this page:

http://techonthenet.com/excel/formulas/vlookup.php

Please let me know if this works out for you.
statman
Administrator
Posts: 2735
Joined: Tue Jun 12, 2007 12:08 pm
Location: Florida, USA

Postby statman » Fri Jun 15, 2007 1:14 pm

Right, not directly available in SPSS but might be in it programmability facility via Phython. Go to Developer Central where you might get some advice at

Code: Select all

http://www.spss.com/devcentral/
See the note below

NOTE: Please read the Posting Guidelines and always tell us your OS, the SPSS version and information about your study and data!

Statman
Statistical Services
Niat
Posts: 5
Joined: Wed Sep 28, 2011 12:15 pm

Re: Combine two datasets

Postby Niat » Wed Sep 28, 2011 3:53 pm

Tim Allan wrote:Apologies if this is a real beginners' question - I am quite new to SPSS.

I have two SPSS datasets (D1 and D2), each with different layouts. Both
datasets contain one common field (Field X).

What I want to do is this: for each occurence of Field X in dataset D1, I
want to check if it exists in D2 and, if so, either flag it or filter it
out to a new dataset. Both datasets contains tens of thousands of records,
thus preventing me from doing it manually.

All occurences of Field X in D1 are present in D2. However, in D2 there
are many other occurences of Field X that I am not interested in, hence the
exercise of removing them.

Can anyone help out please?

This is exactly what i need help with but i need the Field X in dataset 1 to be matched with the ones available in dataset x and extract those observations to a separate fil or just move them over to Dataset 1. My dataset is huge and I cannot do this one by one in excel. Is there a faster way with SPSS??

Thanks!!!!!
GerineL
Moderator
Posts: 1477
Joined: Tue Jun 10, 2008 4:50 pm

Re: Combine two datasets

Postby GerineL » Wed Sep 28, 2011 4:14 pm

Ok so you have 1 variable (field X) let's call it ID.
I am not sure if I understand correctly, so just bare with me.
I have an example of 5 ID's here, it is of course way more complex for you since you have more, but the principle is the same.

File 1 contains:

ID [all sorts of other variables, lets say X1 to X20]
1
3
5
7

File 2:

ID [all sorts of other variables, lets say Z1 to Z20].
1
2
3
4
5

So the ones that should be moved to a new file are ID 1, 3 and 5 since they are in both datasets.
Now you want a new file with only ID 1, 3 and 5, which includes variables X1 - X20 and variables Z1 to Z20.

There is an easy solution for this:

- In file 1, create a new variable with the value 1 for each case (call it CONS or something).
- Sort on ID in both files, merge file 1 to file 2
- Recode CONS (sysmis = 0).
- Now select cases on CONS, filtering out all 0

done.

of course, using select cases you can also copy all the 1's to a new dataset etc.
Niat
Posts: 5
Joined: Wed Sep 28, 2011 12:15 pm

Re: Combine two datasets

Postby Niat » Wed Sep 28, 2011 4:32 pm

Thank you so much for taking time to help. I think I need to explain the situation more closely.

Dataset 1:
ID
2
4
6
8

Dataset 2:
ID ...and plenty more variables for each ID-nr
1
2
3
4
5
6
7
8

So all the ID-numbers in Dataset 1 are definitley available in Dataset 2 but I want to extract those ID-number in Dataset 1 from Dataset 2 including the values of the variables. Do I do it the same way?

And could you explain the steps a bit more closely.
When I sort the variable ID, does it make a difference since the ID observations are not listed the same on the txo datasets since the other one contains less observations. Is this a problem when I merge?
How do I recode??

Sorry, Im new to SPSS, Im a SAS user :)
GerineL
Moderator
Posts: 1477
Joined: Tue Jun 10, 2008 4:50 pm

Re: Combine two datasets

Postby GerineL » Thu Sep 29, 2011 8:02 am

Niat wrote: So all the ID-numbers in Dataset 1 are definitley available in Dataset 2 but I want to extract those ID-number in Dataset 1 from Dataset 2 including the values of the variables. Do I do it the same way?
Niat wrote: And could you explain the steps a bit more closely.
- In file 1, create a new variable with the value 1 for each case (call it CONS or something).

compute CONS = 1.
execute.


- Sort on ID in both files,
When I sort the variable ID, does it make a difference since the ID observations are not listed the same on the txo datasets since the other one contains less observations. Is this a problem when I merge?
in order to be able to merge, the ID variables have to be of the same type and length etc.
E.g. numeric width is 8.

It does not matter that one of them contains more or less observations, you take one as a startingpoint (here: file 2, since this is where you want to extract the relevant variables) and merge the other.
I've done this oftentimes and everything goes ok as long as you click the right options.
But you do need to make sure the ID variables look the same in variable view, and that there are no double ID's in the file you merge FROM (i.e., file one which only contains 2, 4, 6, 8 in your example).

- merge file 1 to file 2
Just follow the menu, indiciate the ID variable as keyed variable, click option "non-active dataset is keyed table"

- Recode CONS (sysmis = 0).

recode CONS (sysmis = 0).
execute


- Now select cases on CONS, filtering out all 0
Look into select cases online, there is several options and you can take whatever you want, I think it is pretty much self explanatory but otherwise look in the manual of your spss version, since it differs a bit between versions.
Just remember that cases with a 0 are now only in file 2, cases with a 1 are also in file 1.
Niat
Posts: 5
Joined: Wed Sep 28, 2011 12:15 pm

Re: Combine two datasets

Postby Niat » Thu Sep 29, 2011 8:57 am

THANK YOU SO MUCH!

This is an amazing site :)))

Who is online

Users browsing this forum: No registered users and 1 guest

cron