Merge Using a Key Variable Range

Moderators: statman, Analyst Techy, andris, Fierce, GerineL, Smash

jchand77
Posts: 2
Joined: Mon Oct 29, 2012 2:29 am

Merge Using a Key Variable Range

Postby jchand77 » Mon Oct 29, 2012 2:52 am

Hi all,

I'm having trouble understanding how to merge two files, where one of the key variables is a range rather than a single value. My data is as follows:

Dataset 1 - Subscription information:
company/ product/ sub start date/ length of sub/ price/ variables 6 to 10
xyz abc 4/1/2012 12 months $$ other attributes
xyz abc 2/1/2011 2 months $$ other attributes

Dataset 2 - Usage Information:
company/ product/ user name/ usage date
xyz abc jon 5/1/2012
xyz abc jon 6/1/2012
xyz abc tim 7/1/2012
xyz abc tim 8/7/2012
xyz abc tim 3/1/2011

I need to merge dataset1 and dataset2 based on the unique combination of company name, product, and whether the usage date falls within the subscription range. Company name and product are easy. But the dates involve matching a single date to a date range. And I have no idea how to do this. Also, not sure if this matters but my datasets are pretty large, 1.5M for one and 38K for the other.Any suggestions will be greatly greatly appreciated.

Thanks!
J
pythonforspss.org
Posts: 116
Joined: Sat Oct 06, 2012 6:21 am

Re: Merge Using a Key Variable Range

Postby pythonforspss.org » Mon Oct 29, 2012 7:28 am

Dear jchnad77,

Do I understand correctly that the desired end result consists of the SECOND data file with info from the first data file added to it? That is, each record in the second data file becomes exactly one record in the end result?

What are the exact formats of all relevant variables in both files?

I believe you can do it with "ADD FILES" (indeed, not "MATCH FILES" which seems more likely at first glance), sort by company/product/date and then use some LAG function to evaluate whether the date is within the date range.

It's not the easiest question I've ever seen but it's certainly doable with a bit of syntax. If you work on a reasonable computer then 1.500.000 cases should be fine.
Kind regards,

Ruben Geert van den Berg
http://www.spss-tutorials.com
jchand77
Posts: 2
Joined: Mon Oct 29, 2012 2:29 am

Re: Merge Using a Key Variable Range

Postby jchand77 » Mon Oct 29, 2012 12:23 pm

Thanks Ruben. Yes, the desired result is the second file with data from the first added to it, so adding the usage data to the subscription data. The final dataset would look like:

company/ product/ user name/ usage date / sub start date/ length of sub/ price/ variables 6 to 10
xyz abc jon 5/1/2012 4/1/2012 12 months $$ other attributes
xyz abc jon 6/1/2012 4/1/2012 12 months $$ other attributes
xyz abc tim 7/1/2012 4/1/2012 12 months $$ other attributes

Re the formats, company name and product name are strings, usage date and sub start date are both dates. And the length of time for the subscription is numeric.

Can you please explain where I can find "Add Files"? I only see "Merge Files". Also how does the LAG function work?

Thanks so much.
Jenn
pythonforspss.org
Posts: 116
Joined: Sat Oct 06, 2012 6:21 am

Re: Merge Using a Key Variable Range

Postby pythonforspss.org » Mon Oct 29, 2012 8:00 pm

Dear Jenn,

You can look up both functions in the Command Syntax Reference. I should warn that in order to get this done properly, you need to master syntax to quite some extent. This is not an easy request.
Kind regards,

Ruben Geert van den Berg
http://www.spss-tutorials.com

Who is online

Users browsing this forum: No registered users and 2 guests

cron