Mahalanobis Distances


r4ph437
Posts: 2
Joined: Thu Jan 08, 2009 10:41 pm

Mahalanobis Distances

Hey guys, I hope this is the appropriate subforum.

I'm attempting to detect outliers that might distort the correlation between two variables.
On both variables the same women have values around +2 SD, and a scatter plot makes it quite obvious that they are outliers.
Now I want to detect them with a bivariate method.
If I understand Mahalanobis distances correctly, they measure the distance of each sample point not from the center of the entire sample but from a line (otherwise it wouldn't make any sense to use them unless we expected the remaining data to form a sphere).
What doesn't make sense to me is that the distances change dramatically if I switch the dependent and independent variables. In one case I get distances that exceed the critical value; in the other I don't.
Can somebody explain to me why this happens and what the appropriate way to deal with it is?

Thanks a lot
statman
Posts: 2760
Joined: Tue Jun 12, 2007 12:08 pm
Location: Florida, USA

Re: Mahalanobis Distances

Based on SPSS's description of the Mahalanobis distance, "it is a measure of how much a case's values on the independent variables differ from the average of all cases. A large Mahalanobis distance identifies a case as having extreme values on one or more of the independent variables." Thus, it appears to me that your definition is different.
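For reference, the usual textbook definition (the notation here is mine, not SPSS's) measures each case's distance from the centroid of whichever variables enter the computation, scaled by those variables' sample covariance matrix S. The two-variable case is written out in full below it:

```latex
D_i^2 = (\mathbf{x}_i - \bar{\mathbf{x}})^{\top} \mathbf{S}^{-1} (\mathbf{x}_i - \bar{\mathbf{x}})

% Two variables x and y, sample variances s_x^2, s_y^2, sample covariance s_{xy}:
D_i^2 = \frac{s_y^2 (x_i - \bar{x})^2 - 2 s_{xy} (x_i - \bar{x})(y_i - \bar{y}) + s_x^2 (y_i - \bar{y})^2}{s_x^2 s_y^2 - s_{xy}^2}
```

The cross term involving s_{xy} is what lets the correlation between the variables shape the distance, so the reference point is the centroid but the contours of equal distance are ellipses rather than circles.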

Even so, it would be natural to get different results when the variables change; you have swapped the IV and the DV, so am I missing something?
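If SPSS's description is taken literally, only the IV enters the distance. With a single IV the covariance matrix is 1×1 (just the variance), so D² collapses to the squared z-score of whichever variable is designated the IV. That would explain why swapping IV and DV can push a case above or below the critical value. A minimal Python sketch, assuming that reading of SPSS and the sample (n − 1) variance; `one_variable_d2` is a hypothetical helper and the scores are invented:

```python
def one_variable_d2(values):
    """Squared Mahalanobis distance of each case when a single variable
    enters the computation. The covariance "matrix" is then just the
    sample variance (n - 1 denominator), so D^2 is the squared z-score."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return [(v - mean) ** 2 / var for v in values]


# Hypothetical scores for six cases on two measures (not the poster's data).
# Case 5 is extreme on measure A but unremarkable on measure B.
a = [10, 12, 11, 13, 12, 20]
b = [30, 31, 33, 32, 34, 31]

d2_a_is_iv = one_variable_d2(a)  # only A enters the distance
d2_b_is_iv = one_variable_d2(b)  # only B enters the distance

for i, (da, db) in enumerate(zip(d2_a_is_iv, d2_b_is_iv)):
    print(f"case {i}: A as IV -> {da:.2f}, B as IV -> {db:.2f}")
```

Case 5 gets a large distance only when A plays the role of the IV; the DV never influences the number at all.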
See the note below

Statman
Statistical Services
r4ph437
Posts: 2
Joined: Thu Jan 08, 2009 10:41 pm

Re: Mahalanobis Distances

"Differing from the average of all cases" doesn't really make sense to me, because if we use it to detect outliers, we are assuming the values should be distributed in a sphere. That defeats the purpose of bivariate outlier detection, which should be sensitive to the correlation between the two variables.
http://en.wikipedia.org/wiki/Mahalanobi ... xplanation
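As far as I understand the standard definition, the distance is measured from the centroid, but the inverse covariance matrix stretches the contours of equal distance into ellipses aligned with the correlation. A point lying along the trend therefore counts as closer than a point lying against it, even when both are the same Euclidean distance from the centre. A small Python sketch, assuming two standardized variables with a made-up correlation of r = 0.8; `mahalanobis_sq` is a hypothetical helper:

```python
def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance of a 2-D point from `mean`, given a
    2x2 covariance matrix cov = [[var_x, cov_xy], [cov_xy, var_y]]."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy * sxy
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    # (dx, dy) S^-1 (dx, dy)^T, with S^-1 = [[syy, -sxy], [-sxy, sxx]] / det
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


# Two standardized variables with an assumed correlation of r = 0.8.
mean = (0.0, 0.0)
cov = [[1.0, 0.8], [0.8, 1.0]]

along = (1.0, 1.0)     # follows the positive trend
against = (1.0, -1.0)  # goes against the trend

# Both points are sqrt(2) from the centroid in Euclidean terms,
# but their Mahalanobis distances differ by a factor of about 9.
print(round(mahalanobis_sq(along, mean, cov), 2))    # 1.11
print(round(mahalanobis_sq(against, mean, cov), 2))  # 10.0
```

With a correlation matrix these reduce to 2/(1 + r) and 2/(1 − r), so the stronger the correlation, the more a point that breaks the pattern stands out.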

Maybe I got it all wrong though.

The reason I was surprised by the change when swapping the IV and the DV is that the scatterplot, the correlation, and the outliers of course all stay the same, so if I want to use the Mahalanobis distances to detect outliers, they should stay the same too.
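Computed on both measures jointly, rather than on the IV alone, the distance treats the two variables symmetrically, so swapping them cannot change any case's distance. A minimal Python sketch under that assumption; `joint_d2` is a hypothetical helper, it is not necessarily what SPSS's regression procedure saves, and the scores are invented:

```python
def joint_d2(xs, ys):
    """Squared Mahalanobis distance of each case from the bivariate
    centroid, using BOTH variables and their sample covariance matrix."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy * sxy
    out = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        out.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return out


# Hypothetical scores (not the poster's data).
a = [10, 12, 11, 13, 12, 20, 14, 9]
b = [30, 31, 33, 32, 34, 45, 33, 29]

d2_ab = joint_d2(a, b)
d2_ba = joint_d2(b, a)  # variables swapped

# Which variable is called "IV" no longer matters: both orders agree.
print(all(abs(p - q) < 1e-9 for p, q in zip(d2_ab, d2_ba)))  # True
```

As a sanity check, with the sample covariance matrix these distances always sum to (n − 1) times the number of variables, here 7 × 2 = 14.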

So maybe I'm totally lost.
So here is my basic question: I have a scatterplot of two psychometric measures, and two data points seem to be outliers distorting the correlation. How do I formally detect them, and what can I do with them?
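One commonly used formal rule (an assumption here, and one that presumes roughly bivariate-normal data) compares each case's squared Mahalanobis distance, computed on both measures, with a chi-square distribution whose degrees of freedom equal the number of variables, often at p < .001. For two variables the critical value has a closed form, so the sketch below needs only the standard library; the helper names and the D² values are hypothetical:

```python
import math


def chi2_df2_critical(alpha):
    """Upper-tail critical value of a chi-square distribution with 2 df.
    For df = 2 the tail probability is exp(-x / 2), so the cutoff is
    exactly -2 * ln(alpha); no statistics library is needed."""
    return -2.0 * math.log(alpha)


def flag_bivariate_outliers(d2_values, alpha=0.001):
    """Indices of cases whose squared Mahalanobis distance (computed on
    two variables) exceeds the chi-square(2) critical value."""
    cutoff = chi2_df2_critical(alpha)
    return [i for i, d2 in enumerate(d2_values) if d2 > cutoff]


print(round(chi2_df2_critical(0.001), 2))  # 13.82
print(round(chi2_df2_critical(0.05), 2))   # 5.99

# Hypothetical D^2 values for five cases:
print(flag_bivariate_outliers([0.4, 1.9, 15.2, 3.1, 0.7]))  # [2]
```

Since the outliers themselves pull the mean and inflate the covariance used to compute D², this check can understate how unusual they are in a small sample.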

Thanks a lot
statman