Mahalanobis Distances

r4ph437
Posts: 2
Joined: Thu Jan 08, 2009 10:41 pm

Mahalanobis Distances

Postby r4ph437 » Sat Mar 07, 2009 1:29 pm

Hey guys, I hope this is the appropriate SubForum

I'm attempting to detect outliers that might distort the correlation between 2 variables.
On both variables the same ladies have values of about +2 SD, and a scatter plot makes it quite obvious that they are outliers.
Now I want to detect them bivariately.
If I understood Mahalanobis distances correctly, they measure the distance of each sample point not from the center of the entire sample, but from a line (otherwise it wouldn't make any sense to use them, unless we want the remaining data to form a sphere).
What does not make sense to me is that the distances change dramatically if I switch the dependent and independent variable. In one case I get distances that exceed the critical value, in the other I don't.
Can somebody explain to me why this is and what the appropriate way to deal with it is?

Thanks a lot
statman
Administrator
Posts: 2721
Joined: Tue Jun 12, 2007 12:08 pm
Location: Florida, USA

Re: Mahalanobis Distances

Postby statman » Sun Mar 08, 2009 6:37 pm

Based on SPSS' description of Mahalanobis, "it is a measure of how much a case's values on the independent variables differ from the average of all cases. A large Mahalanobis distance identifies a case as having extreme values on one or more of the independent variables." Thus, it appears to me that your definition is different.
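
To illustrate why swapping the variables can change the result: if, as the quoted description suggests, the distance is computed from the independent variables only, then with a single IV the squared distance reduces to that case's squared z-score on the IV. The Python sketch below is not SPSS output; the data and the function name are made up for illustration.

```python
from statistics import mean, stdev

def mahalanobis_sq_one_predictor(values):
    """Squared Mahalanobis distance per case for a single predictor.

    With only one variable this is the squared z-score.
    """
    m = mean(values)
    s = stdev(values)  # sample SD (n - 1 in the denominator)
    return [((v - m) / s) ** 2 for v in values]

# Hypothetical scores: the last case is extreme on x, less so on y
x = [10, 12, 11, 13, 12, 11, 30]
y = [20, 22, 21, 23, 22, 21, 24]

d_x_as_iv = mahalanobis_sq_one_predictor(x)  # x entered as the IV
d_y_as_iv = mahalanobis_sq_one_predictor(y)  # y entered as the IV

for i, (dx, dy) in enumerate(zip(d_x_as_iv, d_y_as_iv)):
    print(f"case {i}: x as IV = {dx:.2f}, y as IV = {dy:.2f}")
```

Under that assumption the two runs measure different things (extremeness on x versus extremeness on y), so there is no reason for them to agree.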

Even so, it would be natural to get different results if the variables change. You have swapped the IV and the DV, so am I missing something?
See the note below

NOTE: Please read the Posting Guidelines and always tell us your OS, the SPSS version and information about your study and data!

Statman
Statistical Services
r4ph437
Posts: 2
Joined: Thu Jan 08, 2009 10:41 pm

Re: Mahalanobis Distances

Postby r4ph437 » Mon Mar 09, 2009 8:34 pm

Thanks for the answer.
"Differing from the average of all cases" doesn't really make sense to me, because if we use it to detect outliers we are assuming the values should be distributed in a sphere. That defeats the purpose of bivariate outlier detection, which should include some sensitivity to the correlation between the 2 variables.
http://en.wikipedia.org/wiki/Mahalanobi ... xplanation

Maybe I got it all wrong though.

And the reason I was surprised by the effect of swapping the IV and DV is that the scatterplot, the correlation, and the outliers all of course stay the same, so if I want to use the Mahalanobis distances to detect outliers, they should stay the same too.

So maybe I'm totally lost.
So here is my basic question: I have a scatterplot of two psychometric measures, and two observations seem to be outliers distorting the correlation. How do I formally detect them, and what can I do with them?
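
For reference, a distance computed on both variables at once (using their covariance matrix) does not depend on which variable is called the IV, and it is sensitive to the correlation between them. The following is a minimal Python sketch of that bivariate calculation, not a description of what SPSS does. The data are made up, the function name is hypothetical, and comparing the squared distance with a chi-square cutoff on 2 degrees of freedom is a common convention assumed here, not something taken from this thread.

```python
import math
from statistics import mean

def bivariate_mahalanobis_sq(a, b):
    """Squared Mahalanobis distance of each (a, b) pair from the bivariate mean."""
    n = len(a)
    ma, mb = mean(a), mean(b)
    # Sample variances and covariance (n - 1 in the denominator)
    saa = sum((u - ma) ** 2 for u in a) / (n - 1)
    sbb = sum((v - mb) ** 2 for v in b) / (n - 1)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - 1)
    det = saa * sbb - sab ** 2  # determinant of the 2x2 covariance matrix
    out = []
    for u, v in zip(a, b):
        du, dv = u - ma, v - mb
        # d' S^-1 d written out for the 2x2 case
        out.append((sbb * du * du - 2 * sab * du * dv + saa * dv * dv) / det)
    return out

# Hypothetical data: strongly correlated, with the last pair off the trend
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8, 7.0, 3.0]

d_xy = bivariate_mahalanobis_sq(x, y)
d_yx = bivariate_mahalanobis_sq(y, x)  # variables swapped

# Chi-square cutoff with 2 df: the CDF is 1 - exp(-t / 2), so solve for t
alpha = 0.001  # assumed significance level
critical = -2 * math.log(alpha)
flagged = [i for i, d in enumerate(d_xy) if d > critical]
```

Here `d_xy` and `d_yx` agree case by case, and the off-trend pair gets the largest distance even though neither of its values is the most extreme on its own variable.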

Thanks a lot
statman
Administrator
Posts: 2721
Joined: Tue Jun 12, 2007 12:08 pm
Location: Florida, USA

Re: Mahalanobis Distances

Postby statman » Tue Mar 10, 2009 12:06 pm

Try standardizing the variables via Descriptives and saving the z-scores, then look for large positive or negative scores, or filter for cases beyond a specified cutoff.
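
Outside SPSS, the same idea can be sketched in a few lines of Python. This is only an illustration: the scores are made up, the function names are hypothetical, and the cutoff of 2 SD is an assumption chosen to match the "+2SD" mentioned earlier in the thread.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values using the sample mean and sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def flag_extreme(values, cutoff=2.0):
    """Return the indices of cases whose |z| exceeds the cutoff."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > cutoff]

# Hypothetical scores with one high value at the end
scores = [10, 12, 11, 13, 12, 11, 30]
extreme_cases = flag_extreme(scores)
```

Note that this checks each variable separately, so it finds cases that are extreme on one variable rather than cases that are unusual in combination.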
