Different median in Custom Tables vs. Frequencies


djphilthy
Posts: 15
Joined: Fri Nov 28, 2014 8:49 pm

Different median in Custom Tables vs. Frequencies

Postby djphilthy » Fri Nov 28, 2014 10:11 pm

I have just discovered that when computing the median for the same data (scale variables to 2 d.p.) using different functions, I get two different results. The differences are negligible and only noticeable at the 4th decimal place, but they are nonetheless different.

So, does anyone know the reason for this? The difference is between (1) Custom Tables and (2) Compare Means and Frequencies - those two match each other.

I have checked, and the results match when weighting is turned off; with weights on, however, I cannot get them to match, despite trying various combinations of decimal places, etc. I assume a default setting in one of the functions treats the data in a slightly different way from the other, but I cannot find anything on it. Has anyone come across this before?
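For reference, the comparison boils down to syntax along these lines - just a sketch, with placeholder variable names ('score' for the scale variable, 'wt' for the weight):

Code: Select all
* Placeholder variable names: score (scale variable), wt (weight).
WEIGHT BY wt.
* Median from Frequencies.
FREQUENCIES VARIABLES=score /FORMAT=NOTABLE /STATISTICS=MEDIAN.
* Median from Compare Means.
MEANS TABLES=score /CELLS=MEDIAN.
* Median from Custom Tables.
CTABLES /TABLE score [MEDIAN].

The first two agree with each other; the CTABLES figure is the odd one out.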

Thanks,
Phil
djphilthy
Posts: 15
Joined: Fri Nov 28, 2014 8:49 pm

Re: Different median in Custom Tables vs. Frequencies

Postby djphilthy » Mon Dec 01, 2014 11:45 am

Is anyone able to help me with my query?

When I convert the weights to integers, the two functions match each other, but both now differ from the two original results - which seems counter-intuitive to my naive, inexperienced mind...

FYI, I am using SPSS 19 on Windows 7.

Thanks in advance
Phil
djphilthy
Posts: 15
Joined: Fri Nov 28, 2014 8:49 pm

Re: Different median in Custom Tables vs. Frequencies

Postby djphilthy » Wed Dec 03, 2014 9:39 am

Has anyone come across this issue before?

Any help would be greatly appreciated!

Many thanks,
Phil
GerineL
Moderator
Posts: 1477
Joined: Tue Jun 10, 2008 4:50 pm

Re: Different median in Custom Tables vs. Frequencies

Postby GerineL » Wed Dec 03, 2014 12:22 pm

No, I have not encountered this issue before. I suspect you are right and there is a difference in the algorithms used.
You could contact IBM to find out which algorithms are in fact used.
Also, I can't help but wonder whether it actually matters...?
djphilthy
Posts: 15
Joined: Fri Nov 28, 2014 8:49 pm

Re: Different median in Custom Tables vs. Frequencies

Postby djphilthy » Wed Dec 03, 2014 1:23 pm

Thanks for the response. I am trying to contact IBM directly. I'll update if I get to the bottom of it :)

And your point is perfectly valid: in this instance it doesn't really matter, but there are definitely circumstances where it could. I suspect it comes down to rounding the weight variable at a slightly different stage of the process, which seems bizarre. For me it's more a peace-of-mind thing (and personal frustration at not getting to the bottom of it).

Thanks again,
Phil
GerineL
Moderator
Posts: 1477
Joined: Tue Jun 10, 2008 4:50 pm

Re: Different median in Custom Tables vs. Frequencies

Postby GerineL » Thu Dec 04, 2014 10:55 am

Ok, let us know what you find out, I am sort of curious now ;-)
djphilthy
Posts: 15
Joined: Fri Nov 28, 2014 8:49 pm

Re: Different median in Custom Tables vs. Frequencies

Postby djphilthy » Thu Dec 04, 2014 3:17 pm

So, I got a reply from IBM, linking me to a previous discussion of the topic:

SPSS has five different methods for computing percentiles (see the statistical algorithms for the EXAMINE procedure, available via Help > Algorithms). The method used in FREQUENCIES, and the default method in EXAMINE, is what's known as HAVERAGE: the weighted average at position (w+1)*p in the ordered data, i.e. the weighted average of X(i) and X(i+1), where i is the integer part of (w+1)*p and w is the weighted case count (which would often be called N). This method can be confusing, as the estimated percentile can be higher than the case representing p% of the sample distribution. It is, however, the method that yields an unbiased estimate of a population percentile for p.

A useful discussion of percentiles that features this version in its primary definition is provided in an online engineering statistics text from the U.S. National Institute of Standards and Technology at http://www.itl.nist.gov/div898/handbook ... prc262.htm . Another widely used method is AEMPIRICAL, which returns either one of the values in the data set or the halfway point between two values. CTABLES and the Visual Bander both use the AEMPIRICAL method. It is the "third way" discussed on the NIST web site.
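To see both figures from a single procedure, EXAMINE lets you request each estimation method explicitly - again just a sketch, with the same placeholder variable names as above (HAVERAGE should reproduce the FREQUENCIES/Compare Means figure, AEMPIRICAL the CTABLES one):

Code: Select all
* Placeholder variable names: score (scale variable), wt (weight).
WEIGHT BY wt.
* Default method - the one FREQUENCIES uses.
EXAMINE VARIABLES=score /PERCENTILES(50)=HAVERAGE /PLOT=NONE.
* The method CTABLES and the Visual Bander use.
EXAMINE VARIABLES=score /PERCENTILES(50)=AEMPIRICAL /PLOT=NONE.

A toy example shows why the two can differ: for the five ordered values 1, 2, 3, 4, 5, the 25th percentile is 1.5 under HAVERAGE (interpolating at position (5+1)*0.25 = 1.5, i.e. halfway between the 1st and 2nd values) but 2 under AEMPIRICAL (5*0.25 = 1.25 is not an integer, so it takes the next ordered value, the 2nd).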


I've read over the literature, and although I'm by no means a statistics expert, it seems to me that the CTABLES approach is less precise, as it effectively rounds to the data at an earlier stage. Either way, though, the result is an estimate of some degree, so it's a case of choosing the method that best suits your needs (which I imagine will normally be the default HAVERAGE method).

If anyone can help translate IBM's response into slightly more user-friendly language, that would be great, as I need to explain this to a client and I'm not sure I could, based on the above!

Cheers,
Phil
