Postby **statsanon** » Fri Nov 06, 2015 3:30 am

Hi

I'll set aside the many critiques of median splits (losing a large amount of information, distorting your distribution, treating everyone within the high and low groups as equivalent, making an arbitrary distinction between an IQ of 99 and 101, etc.). One question first: for a measure such as IQ, are you using the published median (100) or the median of your sample?

Some tests in SPSS include a cut point as an option. For example, the independent-samples t-test lets you define your groups on a continuous variable using a cut point (e.g., your median). SPSS uses ">= cut" to define the higher group, so cases exactly at the cut point go into the higher group.

Alternatively, create a new variable (IQ_group) with two values (Low/High or 0/1), then use Transform->Compute Variable or write some simple syntax to classify your cases into the two groups.

e.g.

* Swap <= and > for < and >= if cases at the median should go in the high group.

IF (IQ <= <insert your median here>) IQ_group = 0.

IF (IQ > <insert your median here>) IQ_group = 1.

EXECUTE.

One obvious problem with median splits is that, depending on the nature of your variable (e.g., how closely it approximates a continuous variable) and your sample size, the sizes (and characteristics) of your low and high groups can change simply because you used <= rather than < for the low group (and > rather than >= for the high group). In theory the median divides a distribution in half, so a case is equally likely to fall above or below it, but when many cases sit exactly on the median, forcing them all to one side with < or <= can leave you with very unequal groups. IQ usually has a reasonable range and frequency distribution, so it may not be an issue there. A variable with only a small number of discrete values can cause problems.

Consider the following 10 approximately normal data values with a median of 5:

0 1 4 5 5 5 5 5 6 7

A median split could produce the groups (0 1 4 5 5 5 5 5) and (6 7), or (0 1 4) and (5 5 5 5 5 6 7), depending on which side the cases at the median go to: 8 versus 2 cases, or 3 versus 7. If you wanted more equal groups, the only way around this would be to randomly assign the cases sitting at the median to the two groups, which gives some semblance of a reasonable split. It gets messy, really.
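If you want to check those group sizes yourself, here is a minimal sketch in Python (not SPSS, just because it is quick to run anywhere). The function name split_sizes and its ties_go_high argument are made up for this illustration; the data are the ten values listed above.

```python
from statistics import median

# The ten example values from the post.
values = [0, 1, 4, 5, 5, 5, 5, 5, 6, 7]
cut = median(values)  # average of the two middle values (5 and 5)


def split_sizes(data, cut_point, ties_go_high):
    """Return (n_low, n_high) for a two-group split at cut_point.

    ties_go_high=True  -> high group is 'value >= cut_point'
    ties_go_high=False -> high group is 'value >  cut_point'
    """
    if ties_go_high:
        n_high = sum(1 for v in data if v >= cut_point)
    else:
        n_high = sum(1 for v in data if v > cut_point)
    return len(data) - n_high, n_high


ties_low = split_sizes(values, cut, ties_go_high=False)
ties_high = split_sizes(values, cut, ties_go_high=True)

print("ties in low group: ", ties_low)
print("ties in high group:", ties_high)
```

Half of the cases sit exactly on the median here, so the choice of comparison operator alone moves five of the ten cases from one group to the other.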

Correlation, regression and other types of linear models are much preferable to forcing data into nominal variables simply so that you can use t-tests and ANOVAs.
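To give a rough feel for the information lost by splitting, here is a toy sketch in Python. The data are invented purely for illustration (ten evenly spaced scores and an outcome that is perfectly linear in them), and pearson_r is a small hand-written helper, not a library function. Real data will behave less neatly.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented toy data: predictor 1..10, outcome identical to the predictor.
x = list(range(1, 11))
y = list(x)

# Median split of the predictor (the median of 1..10 is 5.5, so no ties).
group = [1 if v > 5.5 else 0 for v in x]

r_continuous = pearson_r(x, y)   # predictor kept continuous
r_split = pearson_r(group, y)    # predictor reduced to low/high

print(round(r_continuous, 3), round(r_split, 3))
```

Even with a perfectly linear relationship and two groups of equal size, the dichotomised predictor shows a weaker correlation with the outcome than the original scores do.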