The importance of birth month for professional athletes, pt. 2

Terje Kristensen

Birth quarter for 736 players in the 2014 World Cup

Well, hey, look at that. If you further divide the players by continent (see last post), this pattern becomes even clearer. Of the 92 players representing Asia (incl. Australia) in this World Cup, only 7 (7.6%!) were born in the last three months of the year. In case you wonder: no, Kagawa (17.03.89) is not one of them. For Europe, the number is 53 out of 299 (~17.7%). Interestingly, the pattern appears to be the opposite for African players.
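As a rough check on those shares, here is a minimal R sketch. It recomputes the percentages from the counts quoted above and then compares each against a flat 25% expectation with an exact binomial test. That 25% baseline (births spread evenly over the four quarters) is my assumption for illustration, not something taken from the data set.

```r
# Counts quoted in the post: players born in Q4 (Oct-Dec) and squad totals
q4    <- c(Asia = 7,  Europe = 53)
total <- c(Asia = 92, Europe = 299)

# Share of players born in the last quarter, in percent
share <- round(100 * q4 / total, 1)
print(share)

# Assumption (not from the post): births are evenly spread, i.e. 25% per quarter
p_asia   <- binom.test(q4[["Asia"]],   total[["Asia"]],   p = 0.25)$p.value
p_europe <- binom.test(q4[["Europe"]], total[["Europe"]], p = 0.25)$p.value
print(c(Asia = p_asia, Europe = p_europe))
```

A small p-value here only says that the Q4 share is hard to reconcile with an even 25% split; it says nothing about why.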

If you’d like to work with the data, just leave me a note.

Posted in Sports

Your Birth Month May Determine Your Chances Of Playing In The World Cup


I analyzed the 736 players in this year’s World Cup: ~30% were born in the first quarter of the year, while only ~20% were born in Q4.


So why is this so?

Here’s what author Malcolm Gladwell told ESPN (about a similar skew among Canadian hockey players):

It’s a beautiful example of a self-fulfilling prophecy. In Canada, the eligibility cutoff for age-class hockey programs is Jan. 1. Canada also takes hockey really seriously, so coaches start streaming the best hockey players into elite programs, where they practice more and play more games and get better coaching, as early as 8 or 9. But who tends to be the “best” player at age 8 or 9? The oldest, of course – the kids born nearest the cut-off date, who can be as much as almost a year older than kids born at the other end of the cut-off date. When you are 8 years old, 10 or 11 extra months of maturity means a lot.

So those kids get special attention. That’s why there are more players in the NHL born in January and February and March than any other months. You see the same pattern, to an even more extreme degree, in soccer in Europe and baseball here in the U.S. It’s one of those bizarre, little-remarked-upon facts of professional sports. They’re biased against kids with the wrong birthday.


Error bars in statistics. Do you know the difference?

And if you don’t, you’re not alone. Difficulties with the use and interpretation of error bars are not only an issue for undergraduates and Malcolm Gladwell; even experienced researchers get these wrong from time to time. The most common error bars are the range, standard deviation (SD), confidence interval (CI) and standard error (SE). Different bars, different information.

Descriptive error bars: Range and standard deviation

Standard deviation of different data spread (same sample size)


Descriptive error bars tell the reader about the spread of the data. The range is simply the distance between the most extreme values (max – min), while the standard deviation is roughly the average (typical) difference between each of the data points and the overall mean. For roughly normally distributed data, one can expect about 2/3 of all data points to be within 1 SD of the mean and ~95% to be within 2 SD. The length of these bars does not necessarily change with the number of observations you have; they only say something about the spread (variability) of the data.
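To make the descriptive measures concrete, here is a small R sketch. The two samples are simulated (made-up numbers, not data from this blog), with the same size and mean but different spread, and the last lines check the 2/3 and ~95% rule of thumb on normally distributed data.

```r
set.seed(1)  # fixed seed so the illustration is reproducible

# Two made-up samples with the same size and mean but different spread
narrow <- rnorm(10000, mean = 50, sd = 5)
wide   <- rnorm(10000, mean = 50, sd = 15)

# Descriptive measures: range (max - min) and standard deviation
spread <- data.frame(
  sample = c("narrow", "wide"),
  range  = c(diff(range(narrow)), diff(range(wide))),
  sd     = c(sd(narrow), sd(wide))
)
print(spread)

# Share of points within k SDs of the sample mean
within_sd <- function(x, k) mean(abs(x - mean(x)) <= k * sd(x))
within1 <- within_sd(wide, 1)
within2 <- within_sd(wide, 2)
print(c(within_1_sd = within1, within_2_sd = within2))
```

The rule of thumb holds for roughly bell-shaped data; strongly skewed data can deviate from it.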



Inferential error bars: Confidence intervals and standard error (of the mean)

Error bars

While the widths of the 95% CI and SE decrease with increasing sample size, the SD remains relatively unaffected.

Instead of measuring the entire population, we usually collect a number of random observations from it. Why? In some cases it’s a matter of time and cost, while other times, such as when a nurse collects blood samples, well… you’d probably prefer to have some blood left. Because we only have a sample of the population, we can only present a sample mean. When the sample mean is presented together with either a confidence interval or a standard error, it gives an indication of where you can expect the ‘real’ mean of the whole population (μ) to lie. The more random observations you have, the more likely it is that your sample mean is close to the true mean.

So what’s the difference between SE and confidence interval?
SE measures the amount of variability in the sample mean. If we did a new collection of random data from the same population, our mean would likely not be exactly the same; it would vary from time to time. The SE is a measure of how we would expect the mean to vary, purely by chance.
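A quick way to see this is to compute the SE for samples of different sizes. The sketch below uses the usual estimate SE = SD / sqrt(n) and a t-based 95% CI; the population (normal, mean 50, SD 10) and the sample sizes are assumptions chosen purely for illustration.

```r
set.seed(2)  # fixed seed so the illustration is reproducible

# Hypothetical sample sizes to compare
sizes <- c(10, 100, 1000)

summary_for <- function(n) {
  x  <- rnorm(n, mean = 50, sd = 10)        # assumed population: mean 50, SD 10
  s  <- sd(x)                               # spread of the data
  se <- s / sqrt(n)                         # standard error of the mean
  half_ci <- qt(0.975, df = n - 1) * se     # half-width of the 95% CI
  c(n = n, mean = mean(x), sd = s, se = se, ci_half_width = half_ci)
}

result <- as.data.frame(t(sapply(sizes, summary_for)))
print(round(result, 2))
```

The SD column should stay in the neighbourhood of 10 whatever the sample size, while the SE and the CI half-width shrink as n grows.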

Confidence interval 95% CI

The top portion of this figure presents the population of scores with a mean of 50 (blue dotted line) and a standard deviation of 10. The bottom portion of the figure presents the sample means (shaded circles) and the 95% CIs about each mean (bars) for 20 independent samples from the population.

Many people (even some textbooks) get confidence intervals wrong: strictly speaking, you cannot say that you are 95% sure that the true mean is within the confidence limits you computed. Suppose we drew every possible sample of size 20 and constructed the 95% CI for the population mean from each of these samples. Then 95% of these intervals would contain the true population mean and 5% would not. You don’t know whether the particular confidence interval you see contains the true mean. It is just one interval from a large collection of possible CIs for a given parameter, 95% of which capture the population parameter.
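This repeated-sampling idea is easy to simulate. The sketch below reuses the population from the figure above (mean 50, SD 10) and samples of size 20; the normal distribution, the t-based interval and the number of repetitions are my assumptions for the illustration.

```r
set.seed(3)  # fixed seed so the illustration is reproducible

true_mean <- 50     # population mean, as in the figure above
true_sd   <- 10     # population standard deviation
n         <- 20     # observations per sample
n_samples <- 10000  # number of repeated samples (an arbitrary choice)

# Draw one sample and check whether its 95% CI contains the true mean
ci_covers <- function() {
  x    <- rnorm(n, mean = true_mean, sd = true_sd)
  half <- qt(0.975, df = n - 1) * sd(x) / sqrt(n)
  abs(mean(x) - true_mean) <= half
}

# Proportion of intervals that capture the true mean; should land near 0.95
coverage <- mean(replicate(n_samples, ci_covers()))
print(coverage)
```

Each individual interval either contains 50 or it does not; the 95% describes the procedure over many samples, not any single interval.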

Posted in Statistics

Spoiler alert: 2014 FIFA World Cup winners predicted by bookmaker consensus rating

A new Econ paper predicts the FIFA 2014 World Cup winners. The technique used correctly predicted the EURO 2008 final, with better results than other rating/forecast methods, and correctly predicted Spain as the 2010 FIFA World Champion and EURO 2012 Champion.

Using a bookmaker consensus rating – obtained by aggregating winning odds from 22 online bookmakers – the clear favorite is the host Brazil with a forecasted winning probability of 22.5%, followed by three serious contenders. Neighbor country Argentina is the expected runner-up with a winning probability of 15.8% before Germany with 13.4% and Spain with 11.8%. All other competitors have much lower winning probabilities with the “best of the rest” being the “insider tip” Belgium with a predicted 4.8%. (Zeileis et al. 2014 working paper)

Download Working Paper from Faculty of Economics and Statistics, University of Innsbruck:

EDS. Stephen Hawking finds the winning formula for England:



How to load data from Excel into R

Like many other researchers, I use Excel for organizing data and R for analysis. Here’s a short but useful code snippet:

# Install the package (only needed once)
install.packages("XLConnect")
# Load the package
library(XLConnect)
# Or specify the directory where the package is installed
library("XLConnect", lib.loc="C:/Users/YOURNAME/Documents/R/win-library/3.0")
# Open the workbook
wb <- loadWorkbook("C:/Users/YOURNAME/Documents/SpreadsheetName.xlsx")
# Load the first sheet (with default header = TRUE)
df <- readWorksheet(wb, sheet = 1)
# Or specify which sheet and cell range to load
df <- readWorksheet(wb, sheet = "mtcars",
	startRow = 3, startCol = 3, endRow = 15, endCol = 18)


The worst place to be stung by a bee

This study rated the painfulness of honey bee stings at 25 body locations in one subject. Pain was rated on a 1–10 scale, relative to an internal standard, the forearm (see the Schmidt Sting Pain Index). The author, Michael Smith from Cornell’s Department of Neurobiology, received approximately 5 stings per day for three months. The worst place? First the nostril, then the upper lip and, surprisingly, only third… the penis shaft. There is no interest in replicating the study in order to verify its findings. But hey, kudos to Michael for the willingness to get his scrotum and shaft stung by angry bees!



Pulling Conclusions Out of a Black Bowler Hat

“Overall, these empirical patterns suggest that we need to be less cavalier in addressing questions of human nature on the basis of data drawn from this particularly thin, and rather unusual, slice of humanity”. Joe Henrich and his colleagues are shaking the foundations of psychology and economics. A few years old (2010), but still a goodie.

Behavioral scientists routinely publish broad claims about human psychology and behavior in the world’s top journals based on samples drawn entirely from Western, Educated, Industrialized, Rich, and Democratic (WEIRD) societies. Researchers – often implicitly – assume that either there is little variation across human populations, or that these “standard subjects” are as representative of the species as any other population. Are these assumptions justified?

Here, our review of the comparative database from across the behavioral sciences suggests both that there is substantial variability in experimental results across populations and that WEIRD subjects are particularly unusual compared with the rest of the species – frequent outliers. The domains reviewed include visual perception, fairness, cooperation, spatial reasoning, categorization and inferential induction, moral reasoning, reasoning styles, self-concepts and related motivations, and the heritability of IQ. The findings suggest that members of WEIRD societies, including young children, are among the least representative populations one could find for generalizing about humans. Many of these findings involve domains that are associated with fundamental aspects of psychology, motivation, and behavior – hence, there are no obvious a priori grounds for claiming that a particular behavioral phenomenon is universal based on sampling from a single subpopulation.

Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33(2-3), 61-83.

Full article:

