I have details relating to over 700,000 customer contacts. Our customers may use more than one address (e.g. home and business) and more than one telephone number (e.g. mobile and landline). More than one name may also be recorded for a particular person (e.g. because of a misspelling). I have identified 511,000 unique combinations of name / address / telephone number. There are also 4,000 unique names, 210 unique telephone numbers and 2,000 unique addresses.
My challenge is to identify all of the customer contacts that may be related to a single customer. I have a file which lists all 700K contacts, along with the unique identifier created for each combination of name / address / telephone number. For each contact, we also know all of the names, addresses and telephone numbers associated with it.
In testing, and on small-scale data, the following macro works well. However, the macro cannot cope with the entire dataset in a single run. Is there another, more efficient way to write this process in SPSS?
Thank you in anticipation.
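* Raise the macro loop limit so that the loop below can run all of its iterations.
SET MITERATE 1000001.

* The DEFINE and ENDDEFINE wrapper and the macro name are assumed.
* The original listing showed only the loop body.
DEFINE !Group_Contacts ()
!DO !Repeat_Contact = 1 !TO 155649
DATASET ACTIVATE CONTACT_DATA_FILE.
DATASET COPY Nominal_DATA.
DATASET ACTIVATE Nominal_DATA.
COUNT TARGET=UNIQUE_REF to END_1 !CONCAT ("('","URN_",!Repeat_Contact,"')").
IF (TARGET ge 1) Group_URN = !QUOTE (!CONCAT ("Group_URN_",!Repeat_Contact)).
SELECT IF Group_URN = !QUOTE (!CONCAT ("Group_URN_",!Repeat_Contact)).
SELECT IF (Num_Link_Contacts ge 5).
DATASET ACTIVATE Output_TEMPLATE.
/* The second FILE below is assumed. The original line ended at FILE=* */
ADD FILES /FILE=* /FILE=Nominal_DATA.
EXECUTE.
DATASET CLOSE Nominal_DATA.
!DOEND
!ENDDEFINE.

* RUN MACRO!.
*SET MPRINT ON.
!Group_Contacts.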
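
For comparison, the task described above can also be framed as finding connected components: two contacts belong to the same customer group if they share a name, an address or a telephone number, directly or through a chain of other contacts. The sketch below is plain Python with a union-find structure, not SPSS syntax, and it makes a single pass over the contacts instead of copying the dataset once per URN. The function name `group_contacts`, the tuple layout and the toy records are all invented for illustration; reading from and writing back to the SPSS active dataset (for example from inside a BEGIN PROGRAM block) is not shown.

```python
from collections import defaultdict


def group_contacts(contacts):
    """Group contacts that share a name, address or phone (hypothetical helper).

    contacts: iterable of (contact_id, name, address, phone) tuples.
    Returns a dict mapping each contact_id to a group label, where the
    label is the smallest contact_id in that group.
    """
    contacts = list(contacts)
    parent = {}

    def find(x):
        # Return the root of x, creating x as its own root if unseen.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for cid, name, address, phone in contacts:
        node = ("contact", cid)
        for kind, value in (("name", name), ("address", address), ("phone", phone)):
            if value:  # skip blanks so missing values do not link everyone
                union(node, (kind, value))

    groups = defaultdict(list)
    for cid, *_ in contacts:
        groups[find(("contact", cid))].append(cid)
    return {cid: min(members) for members in groups.values() for cid in members}


# Invented toy data: contacts 1 and 2 share an address, 2 and 3 share a name.
contacts = [
    (1, "A Smith", "1 High St", "0100"),
    (2, "A Smyth", "1 High St", "0200"),
    (3, "A Smyth", "9 Mill Rd", "0300"),
    (4, "B Jones", "5 Park Ln", "0400"),
]
labels = group_contacts(contacts)
print(labels)
```

Note that linking on names alone will merge different people who happen to share a name, so in practice the set of fields allowed to create a link would need to be chosen with care.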