Performance Reviews of Performance Reviews and Bayesian Blindness

Recently, while researching the pros and cons of performance appraisal systems, I came across a lecture from The Deming Institute by the educator David Langford, which seemed pretty good. But, sadly, as if to prove a point about how bad social science research can be, here's a comment he made about the value of education.

Wanting to show the positive effect of school education, the speaker cites data showing that students who went through the school system had significantly lower rates of unemployment (less than 5%) than students who had not graduated from high school (40% unemployment). It was an 11-year study, tracking students until they were 24 to 27 years old. The speaker then notes:

So we knew from just looking at that statistic that we are creating people who can go out and [look at the next system].

(The last bit of that quote is garbled in the audio, but I think the idea is that the graduates were able to be successful, in some sense, in society compared with early school leavers.)

So what's the big problem here? Seems fairly definitive, right? Wrong!

Although the study says something useful, all it tells me is that early school leavers are, on average, unlikely to find consistent employment, while school graduates are able to find it. Is that not what the study tells you?

Yes, sure.

What this cited data does not show at all is that school helps people find employment.

It may of course be true, but there is no evidence for it in these data. It is as if these social science researchers have Bayesian blindness. If you do not know what I mean, then this is not your favourite WordPress blog. (Go look up "Bayesian inference".) The point is that, even without going through school, those top students would most likely have found employment anyway. It is not necessarily going to school that influences future employment rates; there is a prior correlation between the probability of staying in school and doing well there, and the ability to find employment later.
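To make the point concrete, here is a toy simulation (my own illustration with completely made-up numbers, nothing to do with Langford's actual data). One hidden trait drives both graduation and later employment, and schooling is given exactly zero causal effect, yet a large unemployment gap between graduates and non-graduates still shows up:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# One hidden trait (aptitude, conscientiousness, circumstances) that the study never sees.
trait = rng.normal(0.0, 1.0, n)

# Whether a student graduates depends only on the hidden trait.
p_graduate = 1.0 / (1.0 + np.exp(-2.0 * trait))
graduated = rng.random(n) < p_graduate

# Employment ALSO depends only on the hidden trait: schooling has zero causal effect here.
p_employed = 1.0 / (1.0 + np.exp(-(1.5 + 2.0 * trait)))
employed = rng.random(n) < p_employed

for label, mask in [("graduates", graduated), ("non-graduates", ~graduated)]:
    unemployment = 1.0 - employed[mask].mean()
    print(f"{label:>13}: unemployment rate {unemployment:.1%}")
```

A study that only compares the two groups cannot distinguish this world from one in which school genuinely causes employability; you need the prior, or a better study design, to tell them apart.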

*    *    *

Now, to be even-handed, there is one really nice bit in Langford’s talk that was a little eye-opener for me:

The number one factor in variability of performance is time.

Cool to know!

Ah yes, but now can we trust this guy with his flimsy research methods? In this case I’m prepared to risk a bit of trust. No one is wrong all of the time. Still, I’m not going to go around quoting this cause of performance variability as if it were gospel. But it was a nice semi-factoid.

Furthermore, I've heard Sir Roger Penrose say something about this on more than one occasion. When he was a school student he was (apparently) very dull-witted at mathematics. He did poorly on the school tests. Luckily, though, he had a lovely mathematics teacher who took an interest and recognised young Penrose's ability to focus and work hard, so he told Penrose he could take as long as he liked on the tests.

Result: Penrose was superb at mathematics. But he was very slow. Why? Because he tried to work everything out himself, not taking too much for granted. He was deriving results rather than mindlessly applying rote formulae. You can imagine the young Albert Einstein might have told similar anecdotes about school life.

*    *    *

While doing my research I also found a lot of convergence between scholastic tests & exams and the ubiquitous employee performance appraisal. My conclusion is that W. Edwards Deming was a genius and a true humanitarian, and that almost all organizations and managers who support performance review systems are blindingly stupid, or ignorant, or evil.

This goes for the much-lauded ex-Google head of People Operations, Laszlo Bock. He did some good things. But Google have the luxury of being able to hire high-performing people who are not in need of performance appraisals. As with the school-value example, Google employees will phreakin' vie to outperform each other in drinking-water contests without touching the glass. They will vie to outperform each other in flatulence aroma. You can give them anything and they will compete for fun. Under such a culture, doing performance assessments is always going to show results. But that proves nothing about the performance rating system. All it proves is that these people love to compete. (Of course some don't, but they will still be top coders or whatever.) You hire the best, you get the best.

Nor does any of this justify behavioural management. These Googlers are not responding to carrot-and-stick reward systems or incentive pay or whatever. They are basically just playing at games they naturally enjoy. It is cognitive psychology through and through. It just looks like performance rewards are working, but that's a chimera. (Give me a million-dollar research grant and I'll prove it for you with robust statistics. … I'm only half joking about that!)

Truly, I was so overwhelmed by the pathetic quality of the research that supports the use of performance appraisals (it is all of the same ilk as that ill-considered comment about the value of schooling, and please shoot me if I ever publish "research findings" that make such spurious claims) that I wrote a long 20-page memo to my department. It was not well received. People get so agitated and fearful when they cannot see that a criticism of a system is not a criticism of the people within the system. Even after I tried to explain my motives, the response was, "Well, you should have informed management first before emailing your memo to everyone. You have created disharmony."

Well, I could understand their fear. But I still find it hard to understand the poor quality of the research literature. Or maybe I do understand it, since it is, ironically, part of the same problem. People publish fast-and-loose research not because they wish to, but because they are under performance appraisal pressures that amount to various versions of "publish or perish". Under such career pressure academics will publish any rubbish they can dress up as respectable, and a kind of intellectual myopia sets in whereby they eventually cannot even see that their research is rubbish. The thing is, 90% of it is not rubbish at all; it is often really good work, and at least the data are usually fine. It's just the conclusions and summaries that are trash.

In fact, I became so incensed that I wrote a research grant proposal to simulate the effects of performance rating systems in the academic work environment, using evolutionary models. I tend not to listen to the publish-or-perish meme. I do feel the ambient stress related to it, but I actively craft my work to make it deform away. Consequently, you might not see my proposal turn into a paper any time soon, but when it is published I'll write a note on it at OneOverEpsilon for sure.
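To give a flavour of what I mean by an evolutionary model (this is just a toy sketch I knocked up for this post, not the model in the actual proposal, and every number in it is invented): each academic splits a fixed effort budget between rigour and raw paper output, the annual "performance review" keeps only the most prolific half, and over the generations rigour gets selected out even though no individual ever decides to be sloppy.

```python
import numpy as np

rng = np.random.default_rng(1)

POP, GENERATIONS, EFFORT = 200, 60, 10.0   # toy sizes; all numbers invented

# Each academic splits a fixed effort budget between rigour and raw paper output.
rigour = rng.uniform(0.2, 0.8, POP)        # fraction of effort spent on careful work

for gen in range(GENERATIONS):
    papers = rng.poisson(EFFORT * (1.0 - rigour))   # quantity: what the rating rewards
    # "Performance review": keep the most prolific half, replace the rest with
    # mutated copies of the survivors. Quality never enters the selection rule.
    survivors = rigour[np.argsort(papers)[POP // 2:]]
    offspring = np.clip(survivors + rng.normal(0.0, 0.03, POP // 2), 0.0, 1.0)
    rigour = np.concatenate([survivors, offspring])
    if gen % 20 == 0 or gen == GENERATIONS - 1:
        print(f"generation {gen:3d}: mean rigour {rigour.mean():.2f}, "
              f"mean papers {papers.mean():.1f}")
```

In this toy set-up the mean rigour drifts steadily downward over the generations; the decline is produced by the rating rule, not by any individual's choices, which is precisely the sort of systemic effect Deming kept banging on about.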


CC BY-NC-SA 4.0 (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode)