Merge two files with duplicates in both

Moderators: statman, Analyst Techy, andris, Fierce, GerineL, Smash

0101
Posts: 52
Joined: Fri Mar 04, 2011 9:42 pm

Merge two files with duplicates in both

Postby 0101 » Wed Jan 14, 2015 6:42 pm

I would like to merge two datasets, but both have duplicate IDs (Metric is the ID I'm matching on) in them.

My first dataset has:

Code: Select all

State Metric Result
AL  Employment   .40
CA  Employment   .52 
AL  Poverty      .30
CA  Poverty      .24
...
My second dataset has:

Code: Select all

Metric      UsedFor
Employment  Forecasting
Employment  Reporting
Poverty     Forecasting
Poverty     Reporting
...
I would like to merge variables from the second dataset into the first, matching on Metric.

The results should be:

Code: Select all

State Metric Result   UsedFor
AL  Employment   .40  Forecasting
AL  Employment   .40  Reporting
CA  Employment   .52  Forecasting
CA  Employment   .52  Reporting
AL  Poverty      .30  Forecasting
AL  Poverty      .30  Reporting
CA  Poverty      .24  Forecasting
CA  Poverty      .24  Reporting
...
But due to the duplicate keys, I will always get the "Duplicate key in a file" warning.

What's the best way to do this merge?
GerineL
Moderator
Posts: 1477
Joined: Tue Jun 10, 2008 4:50 pm

Re: Merge two files with duplicates in both

Postby GerineL » Thu Jan 15, 2015 12:27 pm

in the example you give, it seems like you just want to double your cases and give 1 "forecasting" and 1 "reporting" on this new variable.
if that is the case, why use merging at all?

Otherwise, you can think about using 2 variables as ID (e.g., employment + something else) to make it unique.

Who is online

Users browsing this forum: No registered users and 2 guests

cron