Archive for August, 2012

Spurious Correlations and Ratios

Friday, August 24th, 2012

Kronmal, Richard A. (1993) „Spurious Correlation and the Fallacy of the Ratio Standard Revisited“, Journal of the Royal Statistical Society. Series A (Statistics in Society), 156 (3), pp. 379-392.

This has been on my mind for a while. A lot of our research looks at cost overruns as the variable to measure project performance. More precisely, we most often use Actual/Estimated Cost – 1 to derive a figure for the cost overrun. A project that was budgeted at 100 and comes in at 120 thus has a +20% cost overrun. If the scale needs to be transformed, which in most cases it does, the simple Actual/Estimated ratio offers some advantages, e.g., figures being non-negative.
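The arithmetic behind the two measures can be written down in a few lines. This is only a minimal sketch; the function names `cost_overrun` and `cost_ratio` are made up for illustration and do not come from any of the cited work.

```python
def cost_ratio(actual: float, estimated: float) -> float:
    """Plain Actual/Estimated ratio; non-negative for non-negative costs."""
    if estimated <= 0:
        raise ValueError("estimated cost must be positive")
    return actual / estimated


def cost_overrun(actual: float, estimated: float) -> float:
    """Relative cost overrun, i.e. Actual/Estimated - 1."""
    return cost_ratio(actual, estimated) - 1.0


# The example from the text: budgeted at 100, delivered at 120.
print(f"{cost_overrun(120, 100):+.0%}")  # +20%
print(cost_ratio(120, 100))              # 1.2
```

A project that comes in under budget gets a negative overrun but still a positive ratio, which is the advantage mentioned above when the scale has to be transformed.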

Most criticism of this comes from the corner of Atkinson (1999)*, i.e., that holding the project accountable for its initial Cost-Benefit-Analysis (+Time) is an unfairly narrow view that ignores the value of building stuff itself, the wider and possibly non-quantitative benefits for the organisation, and the wider and most likely non-quantitative benefits for the stakeholder community.

However, a second corner of critics also has a powerful argument. Ratios cause all sorts of statistical headaches. First, dividing one normally distributed variable by another does not give a normally distributed variable. The ratio is heavy-tailed (for two independent zero-mean normals it is Cauchy distributed), i.e., it creates outliers that are solely an artefact of the ratio.
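The outlier problem is easy to see in a small simulation. The sketch below makes a deliberately simple assumption that is not taken from the article: numerator and denominator are independent standard normal draws. Real cost data will have non-zero means, so this only illustrates the mechanism, not its size in practice.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible
n = 10_000

# Assumption for illustration: both variables are independent N(0, 1).
numerator = [random.gauss(0.0, 1.0) for _ in range(n)]
denominator = [random.gauss(0.0, 1.0) for _ in range(n)]

# Skip the (practically impossible) case of a denominator of exactly zero.
ratios = [a / b for a, b in zip(numerator, denominator) if b != 0.0]


def share_beyond(values, threshold):
    """Share of values whose absolute value exceeds the threshold."""
    return sum(abs(v) > threshold for v in values) / len(values)


share_normal = share_beyond(numerator, 10.0)
share_ratio = share_beyond(ratios, 10.0)

print(f"share of |normal| > 10: {share_normal:.4f}")
print(f"share of |ratio|  > 10: {share_ratio:.4f}")
```

Neither input ever comes close to 10 in absolute value, yet a noticeable share of the ratios does. Those extreme values exist only because of the division.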

More important than distributional concerns are spurious correlations. This is an example from the article:

„… a fictitious friend of Neyman (1952), in an empirical attempt to verify the theory that storks bring babies, computed the correlation of the number of storks per 10000 women to the number of babies per 10000 women in a sample of counties. He found a highly statistically significant correlation and cautiously concluded that ‚. . . although there is no evidence of storks actually bringing babies, there is overwhelming evidence that, by some mysterious process, they influence the birth rate‘!“ (Kronmal 1993:379)

What happened in that example? The regression should have tested the relationship between the number of storks and the number of babies in a county. The argument for the ratio is that it controls for the number of women in the county. The argument against it is that it creates a spurious correlation. An ANCOVA-type structure would be better. Or, as the article puts it:

„This example exemplifies the problem encountered when the dependent variable is a ratio. Even though Y, the numerator of the ratio, is uncorrelated with X, the independent variable, conditional on Z, the ratio is significantly correlated to X through its relationship to Z, the denominator of the ratio.“ (Kronmal 1993:386)
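The stork effect can be reproduced with made-up numbers. The following is a minimal sketch under loud assumptions: the counts of storks, babies and women per county are hypothetical and drawn independently from the same uniform range, so by construction storks and babies have nothing to do with each other. None of the figures come from Kronmal or Neyman.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


random.seed(1)  # fixed seed so the sketch is reproducible
n_counties = 5_000

# Hypothetical, mutually independent county-level counts.
storks = [random.uniform(50, 150) for _ in range(n_counties)]
babies = [random.uniform(50, 150) for _ in range(n_counties)]
women = [random.uniform(50, 150) for _ in range(n_counties)]

# Correlation of the raw counts versus correlation of the per-woman rates.
r_counts = pearson(storks, babies)
r_rates = pearson(
    [s / w for s, w in zip(storks, women)],
    [b / w for b, w in zip(babies, women)],
)

print(f"storks vs babies (counts):    {r_counts:+.2f}")
print(f"storks vs babies (per woman): {r_rates:+.2f}")
```

The raw counts are essentially uncorrelated, while the two rates correlate clearly because they share the same denominator. Entering the number of women as a separate covariate, rather than dividing by it, avoids manufacturing that link.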

Three more observations are made in the article:

  1. Using the two variables and their interactions instead of a ratio commonly makes for a worse model than using the ratio, particularly in stepwise regression models.
  2. Ratios are an interaction and can only be adequately interpreted in an equation that includes both of these variables (the main effects).
  3. Use a full regression model with interactions, then include the ratio if it adds to it.

The final advice is

But what if the ratio is the ’natural quantity of interest‘, just like in our performance measurement?

Dividing the outcome by the estimate is meant to remove the estimate’s effect from the numerator variable. Kronmal questions whether „this is the optimal way to accomplish this“. He goes further: „…even when such rates are used, there is no reason not to include the reciprocal of the population size as a covariate. For other ratios, the purpose of the denominator is usually to adjust for it. In these instances, there is little to commend the use of this method of adjustment.“ (Kronmal 1993:391)

I will think about this for a while. Get in touch if you want to share thoughts on this.

* Atkinson, Roger (1999) „Project management: cost, time and quality, two best guesses and a phenomenon, its time to accept other success criteria“, International Journal of Project Management, 17 (6), pp. 337-342.

How to write a good essay

Wednesday, August 8th, 2012

Yesterday at lunch I had a discussion with two of our MSc students on how to write. We started off on how to write a good thesis and ended up talking about how to write a good essay. This morning I got an e-mail from the Chair of the Examiners, who is the person running the committee that decides the marks for student work.

N.B. Marking in Oxford is its own case study of accountability, transparency, and power. I don’t understand how such an intricate system has evolved, one that relies on double-blind processes combined with committee decisions and multiple levels of hierarchy as quality control, all to derive ‚objective‘ marks, while the revelation that facts are constructed came to this institution as a big surprise.

The e-mail I got this morning asked me to give some students feedback on one of their essays. I have to admit that switching from communication by PowerPoint to communication via unformatted, double-spaced prose was one of the greatest challenges of starting this DPhil. I also just read Dan Ariely’s brilliant blog post and the subsequent op-ed in the LA Times on this topic.

Drum roll. Here is my list on ‚How not to write your essay‘:

  1. Answer a different question. Well, why wouldn’t you? Time is short, the deadline looms. Luckily, in this other course there was a required reading, which you still remember and which could shine some new light on the question. Brilliant idea! Of course there are bonus points to be earned for bringing in new literature. This is the perfect murder of two birds with one stone. Unfortunately, the execution often falls through. The argument, already a basket case full of apples and oranges, doesn’t get the cream and chocolate sprinkles on top that it deserves, but rather gets a completely new addition, which looks more like a block of cheese with a smell of old socks than a fresh idea.
  2. Look up the etymology of the key concepts. No argument has ever been advanced by looking up the etymology, well, outside the realm of etymologists anyway. It is always good to know that the word project can be traced back to 1450. Always a good way to use space.
  3. Give good solid definitions for all concepts. A good essay ought to start with a long laundry list of working definitions for key concepts. Let’s define risk, organisations, bias, projects, and, my favourite, major programmes. Once that is out of the way we can actually start looking at the question. Again, a great way to use the space.
  4. Write up the lecture slides. Just on the off-chance that the marker hasn’t read the slides, copy them and expand the text a little. Did you make a recording of the lecture? Even better. Easy peasy lemon squeezy.
  5. Cover everything that has been touched upon in class. Decision-making is hard; deciding which concepts to use and which ones to ignore is risky. Avoid cutting something out whenever possible. On the flip side, if you do cut something out, you should not talk about why you took a specific lens.
  6. Make shit up. Drop names. I do have 10 years of experience in this, so let me tell you what I think. I think that the following 8 factors are the key to success in the field. Also, since it is my own opinion I don’t need to add references. Time saved! Damn, they want a reference. Let’s just put an article here whose title sounds as if they would agree with my thinking. Done!
  7. Be Malcolm Gladwell: „A cursory reading of 5 journal articles has brought me here today to tell you…“

My list for a good essay

  1. One idea per paragraph: the first sentence explains how this is important for answering the question, the last sentence gives the so what? and answers the question. Sounds simple? Then go on and do it!

My background is in Computer Science, and my old prof Eric Schoop introduced me to information mapping. Most essays I have to read would certainly benefit from bringing stronger principles to writing.