creating overall score, aggregating by manager ID

Moderators: statman, Analyst Techy, andris, Fierce, GerineL, Smash

a.a.marshall84
Posts: 3
Joined: Thu Jan 08, 2009 9:46 pm

creating overall score, aggregating by manager ID

Postby a.a.marshall84 » Mon Jan 12, 2009 12:22 am

I posted before explaining a problem I am having replicating the way a computer system computates the overall score for managers. If a manager has 4 employees, the system adds up all of the individual answers to each question, and then divides by the total number of answers recieved.

I have the raw data set and need to aggregate by manager ID. If I combine all of the items to create an overall score, such as Compute an overall variable like, MEAN(q1,q2,q3,q4..), then I get an average overall score for each individual employee, and when I aggregate by manager ID, it computes the mean of those employee means, by manager. This results in slightly different numbers than when all of the employee items under one manager are added together, and divided by number of items.

Previously, it was suggested that when I aggregate, instead of getting the mean, I combine the items into a sum, and then divide by the number of items mulitiplied by the N_Break variable. The problem with this, I found out, is that not everything is being divided by a consistent number, because items that were not answered (system missing) are not included in the overall count.

I essentially need a way to group output by manager id, where all items are added together and divided by the number of items that were added.

Anyone have any ideas?
Fierce
Moderator
Posts: 391
Joined: Sun Dec 28, 2008 5:55 pm
Location: Belo Horizonte, Brasil

Postby Fierce » Mon Jan 12, 2009 3:20 pm

Can ou define missing values as 0 or as a very high number? Probably yes (if 0 appears as a value, then 10000000 shouldn't appear).
For this example, let's call you vriables ID, v1, v2, v3, v4 and v5.
First thing, you'll need a variable consisting of the missing frequencies (TRANSFORM>COMPUTE).

Code: Select all

COMPUTE v_miss=SYSMIS(v1)+SYSMIS(v2)+SYSMIS(v3)+SYSMIS(v4)+SYSMIS(v5). 
EXECUTE.
After that, you can redode you missing values (if possible, as 0, it will be easier; (TRANSOFMR>RECODE INTO SAME VARIABLES).

Code: Select all

RECODE v1 v2 v3 v4 v5 (MISSING=0). 
EXECUTE.
Now you need to sum the 5 variables and aggregate by manager ID (DATA>AGGREGATE). You also need to aggregate the v_miss variable.

Code: Select all

COMPUTE v_sum=v1+v2+v3+v4+v5.
EXECUTE.
AGGREGATE 
  /OUTFILE=* MODE=ADDVARIABLES 
  /BREAK=VAR00001 
  /V_sum_sum=SUM(v_sum) 
  /V_miss_sum=SUM(v_miss).
And now, finally, divide by the number of cases. Notice that the missing values were incluede as zero, so is like they were not included, but now you have the sum for all variables, eve the ones who had missing values. Of course, you want to divide by the total number os variables excluding missing, so you'll use the v_miss variable (that's why you aggregate it). Now is easy to compte the mean (v_mean)

Code: Select all

COMPUTE v_mean=v_sum_sum/(10-v_miss_sum).
EXECUTE.
IF you can't use 0 for missing, use something like 1000000 (after computing the v_miss, recode the missing values as 1000000). Compute all sums but, before compute v_mean, change the sum vaiables:

Code: Select all

COMPUTE v_sum_correct=v_sum_sum-v_miss_miss*1000000.
EXECUTE.
It will "remove the excess" of numbers. COmpute v_mean using v_sum_correct and not v_sum_sum.

After doing all of that, define 0 or 1000000 as missing again.

Who is online

Users browsing this forum: No registered users and 1 guest

cron