SPSSINC ANON VARIABLES (anonymize) command very slow


rubbershark
Posts: 2
Joined: Tue Feb 23, 2016 7:37 am

SPSSINC ANON VARIABLES (anonymize) command very slow

Postby rubbershark » Tue Feb 23, 2016 8:02 am

Hi,
I'm running SPSS Statistics 23 and I'm trying to anonymize a bunch of usernames in a data set of about 500 K records with the following command:

SPSSINC ANON VARIABLES=user_name
/OPTIONS ONETOONE=user_name MAXRVALUE=9999999
METHOD=SEQUENTIAL.

If I take a random sample of 5 K users, the command completes successfully in about one minute.
With the full data set it is very slow: it has now been running for about 45 minutes with no end in sight (this is my third attempt at running this command; rebooting my laptop did not help).
I've presorted the file by user_name in ascending order.

Is my data set simply too big for this command?
GerineL
Moderator
Posts: 1477
Joined: Tue Jun 10, 2008 4:50 pm

Re: SPSSINC ANON VARIABLES (anonymize) command very slow

Postby GerineL » Tue Feb 23, 2016 8:22 am

Well, if it takes about 1 minute to do 1/100 of the data set, it makes sense that the full data set would take on the order of 100 minutes.

What you could do as an alternative is to draw 10 variables, each holding a random number between 0 and 7, and then merge these together into 1 variable.
You could use "identify duplicate cases" to see whether any IDs occur more than once, and then repeat the process for those cases.
That might go faster.
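
For illustration only, here is one reading of that suggestion as a minimal Python sketch rather than SPSS syntax. The function names `draw_id` and `assign_unique_ids` are made up for this example, and it assumes one ID per case, with each ID built from 10 digits in the range 0 to 7 and any duplicated IDs redrawn until none remain.

```python
import random


def draw_id(rng, n_digits=10, low=0, high=7):
    """Concatenate n_digits random digits, each between low and high inclusive."""
    return "".join(str(rng.randint(low, high)) for _ in range(n_digits))


def assign_unique_ids(n_cases, seed=None):
    """Draw one random ID per case, redrawing any ID that duplicates an earlier one."""
    rng = random.Random(seed)
    ids = [draw_id(rng) for _ in range(n_cases)]
    while True:
        seen = set()
        dup_positions = []
        # "Identify duplicate cases": keep the first occurrence, flag the rest.
        for i, value in enumerate(ids):
            if value in seen:
                dup_positions.append(i)
            else:
                seen.add(value)
        if not dup_positions:
            return ids
        # Repeat the draw only for the flagged cases, then check again.
        for i in dup_positions:
            ids[i] = draw_id(rng)


ids = assign_unique_ids(1000, seed=1)
```

With 10 digits of 8 possible values each there are 8**10 (about 1.07 billion) distinct IDs, so collisions should be rare at this scale and the redraw loop should finish after very few passes.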
rubbershark
Posts: 2
Joined: Tue Feb 23, 2016 7:37 am

Re: SPSSINC ANON VARIABLES (anonymize) command very slow

Postby rubbershark » Tue Feb 23, 2016 8:50 am

Thanks for the suggestion!
I forgot to mention in my original question that the data set includes usage data, i.e. the same user has several events (=cases) within the data set. Each unique event should be allocated to the corresponding anonymized user. Thus 'straight' randomization does not work.
GerineL
Moderator
Posts: 1477
Joined: Tue Jun 10, 2008 4:50 pm

Re: SPSSINC ANON VARIABLES (anonymize) command very slow

Postby GerineL » Tue Feb 23, 2016 11:00 am

Might it go faster if you use "identify duplicate cases" to get a list of unique persons, create ID codes for them, and merge them back in?
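
The logic of that idea can be sketched in Python (again, not SPSS syntax, and not a tested replacement for SPSSINC ANON). The function name `anonymize_users` and the sample user names are hypothetical. The sketch builds a lookup with one code per unique user, shuffles the codes so they do not follow the alphabetical order of the names, and then maps every event back to its user's code.

```python
import random


def anonymize_users(user_names, seed=None):
    """Return one anonymous code per event, identical for events of the same user."""
    rng = random.Random(seed)
    # Step 1: list of unique persons.
    unique_users = sorted(set(user_names))
    # Step 2: create ID codes 1..n and shuffle them so the code order
    # does not reveal the alphabetical order of the user names.
    codes = list(range(1, len(unique_users) + 1))
    rng.shuffle(codes)
    lookup = dict(zip(unique_users, codes))
    # Step 3: merge the codes back onto every event (case).
    return [lookup[name] for name in user_names]


# Hypothetical usage data: several events per user.
events = ["carol", "alice", "bob", "alice", "carol", "alice"]
anon = anonymize_users(events, seed=42)
```

Because the lookup is built once over the unique users and each event needs only a single dictionary access, the work grows roughly linearly with the number of cases. In SPSS itself, the same steps would probably be expressed with commands such as AGGREGATE (to get the unique users) and MATCH FILES (to merge the codes back in), but that is an assumption and not shown here.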
