Are women Nobel Prize winners younger than men on average?
When our impression and data collide …
The story:
The motivation for this analysis comes from a random comment I heard after the 2020 Nobel Prize announcements. The comment was that women seemed to win Nobel Prizes at relatively younger ages than men. The commenter then listed a few female winners of recent years as supporting evidence, including Malala Yousafzai, who won the 2014 Nobel Peace Prize at the age of 17, and this year’s winners for Physics and Chemistry. My first reaction was that this was a hand-wavy statement based on cherry-picked evidence, hence unlikely to be true.
Before you cast an opinion of your own, allow me to digress to a concept in machine learning. I am digging neural networks (a branch of machine learning) and artificial intelligence (AI) lately. The holy grail of AI is to build human brain-like machines. However, our brain is not a perfect neural network. It suffers from high bias (prejudice; models too simple for the data) and high variance (overfitting; models too complicated for the data). There is an essential step in machine learning called ‘cross-validation’. A machine learning model built from a training set must be tuned and validated on the ‘cross-validation’ dataset before being applied to new datasets. This validation step anchors the neural network to the ground truth so that the model has minimal bias and variance. For those interested in the basics of cross-validation, I refer to Andrew Ng’s Machine Learning course (week 6) and this podcast.
Do we have a validation step when we use our brain? It certainly isn’t enforced, otherwise we would not live in a world plagued by bias and misinformation. Using our brain without a validation step is scary. For example, if we have a bias on something (e.g., racism), increasing the size of the training data (aka life experience) would not help us see the truth. On the other hand, if we overfit something (e.g., over-reacting), our thoughts would fit a contrived scenario but fail to reflect the general reality. Intriguingly, or scarily, this is where AI can outperform human beings, because the built-in cross-validation step may get them closer to the ground truth of the universe… muahaha.
I digress too far. This article is not about neural networks or The Singularity. It is simply an example of doing a cross-validation step before going with our ‘impressions’. Spoiler: the answer I get at the end of this analysis is not what I initially guessed!
Dataset:
I compile a dataset of Nobel Prize recipients from 1901 to 2020. The data for 1901–2016 are collected from a project on Datacamp’s data scientist career track and the data for 2017–2020 are collected from the Nobel Prize website (www.nobelprize.org). I also use another dataset, from Our World in Data, for the average life expectancy of different countries over the past few centuries.
The analysis is done in Python, using toolboxes such as pandas and seaborn. You can download the full code, Jupyter notebook and data on GitHub and run the analysis yourself.
Load the data: Jupyter notebook example here.
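A minimal sketch of the loading step, assuming a CSV with columns named `year`, `category`, `sex`, `birth_date` and `full_name`. These column names, the helper `load_nobel` and the three inline sample rows are placeholders for illustration; the actual notebook may be organised differently. The age at the award is approximated as the award year minus the birth year.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV file; three rows for illustration only.
SAMPLE_CSV = """year,category,sex,birth_date,full_name
1903,Physics,Female,1867-11-07,Marie Curie
1921,Physics,Male,1879-03-14,Albert Einstein
2014,Peace,Female,1997-07-12,Malala Yousafzai
"""

def load_nobel(source):
    """Read the recipients table and add an approximate age-at-award column."""
    df = pd.read_csv(source, parse_dates=["birth_date"])
    # Organisations have no birth date or sex, so drop those rows.
    df = df.dropna(subset=["birth_date", "sex"]).copy()
    # Approximation: award year minus birth year (ignores the exact day).
    df["age"] = df["year"] - df["birth_date"].dt.year
    return df

nobel = load_nobel(io.StringIO(SAMPLE_CSV))
print(nobel[["full_name", "year", "age"]])
```

To use a real file, pass its path to `load_nobel` instead of the in-memory sample.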
Data visualisation: overall
I put the recipients’ age as a function of the award year on the same plot (Figure 1). This quick overview plot helps us decide what kind of analyses are needed to get the answer.
My take-away points for Figure 1:
- The recipients’ age is correlated with the award year. The average age of Nobel Prize recipients has increased from ~45 years old to ~60 years old. Is this owing to the increasing life expectancy with time? To gain insights, I overlay the growth of the average life expectancy in the recipients’ birth countries on Figure 1. This analysis does not focus on the effect of life expectancy so I will just leave it there.
- The data are dominated by male recipients (see also Figures 2–3). Data points for women are subject to small-number statistics. We should keep this in mind throughout the analysis.
- The shaded areas for the blue (male) and orange (female) groups overlap, meaning that the age vs year correlations of the two groups do not differ at a 95% statistical confidence level. However, this doesn’t answer the question posed in the article’s title yet. We will dig deeper below.
Skewed data and by category:
The data is skewed towards the male gender. Six percent of the total recipients over the past 120 years are women (Figure 2).
To explore the data a bit more, we can divide the data by the award categories (Figure 3). The STEM fields (Chemistry, Medicine, and Physics) are shown in the top row.
Over the 120 years’ history of the world’s most prestigious scientific award, there are only 4 women recipients in Physics, and 2 in Economics. The fraction of women in each category is less than 5% in the STEM fields (including Economics). For Literature and Peace, the fraction is in the low range of 12–14%. These are well-known facts among the scientific community, but seeing them in figures gives me a refreshed shock. These low numbers are relevant here because small numbers set the limit of our analysis.
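The per-category fractions can be computed with a single groupby. This is a sketch under assumptions: the column names `category` and `sex`, the helper `female_fraction` and the toy rows are all made up for illustration and are not the real counts.

```python
import pandas as pd

def female_fraction(df):
    """Return the fraction of female recipients in each award category."""
    is_female = df["sex"].eq("Female")
    # The mean of a boolean column is the fraction of True values.
    return is_female.groupby(df["category"]).mean()

# Made-up toy rows purely to show the call; these are not real counts.
toy = pd.DataFrame({
    "category": ["Physics", "Physics", "Physics", "Physics", "Peace", "Peace"],
    "sex": ["Male", "Male", "Male", "Female", "Female", "Male"],
})
fractions = female_fraction(toy)
print(fractions)
```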
The positive correlation between the age and year is present for most fields except ‘Peace’, which has a negative correlation. If we exclude the ‘Peace’ category, Figure 1 would look like this (Figure 4). The difference between the Male and Female groups is now less pronounced. We can treat the Peace category separately (Table 1 below).
Statistical tests:
Pearson test for the correlation. To quantify the statistical significance of the age vs. award year correlation for each category, we can use Pearson tests. The Pearson coefficient r ranges from -1 to 1 and is sensitive to outliers. A perfect positive (negative) correlation has r = 1 (-1), and 0 means no linear correlation. The p value is the probability of seeing a correlation at least as strong as the observed one if the null hypothesis, that there is no linear correlation between the age and year, were true. The smaller the p, the more significant the correlation.
The Pearson tests statistically confirm what we have already seen in Figures 1, 3 and 4: the correlation between the age and year is significant, and the Peace category has a negative correlation. When dividing the Peace recipients into Male and Female groups, the statistical significance of the linear relation becomes too weak to draw confident conclusions (based on the p-value).
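For readers who want to try the Pearson test, here is a small sketch using `scipy.stats.pearsonr`. The data are synthetic: the award years, the built-in upward trend and the scatter are invented for the demonstration and are not the Nobel dataset.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic data: random award years and ages with an invented upward trend.
years = rng.integers(1901, 2021, size=300)
ages = 45 + 0.12 * (years - 1901) + rng.normal(0, 10, size=300)

# pearsonr returns the correlation coefficient r and the two-sided p value.
r, p = stats.pearsonr(years, ages)
print(f"r = {r:.2f}, p = {p:.2g}")
```

On the real data, the same call would be made once per award category, with that category's years and ages as the two inputs.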
A few words about the p-value. The p value is widely used in statistics. It is a popular index in evaluating machine learning models too. It is easy to calculate, but does not say much about the real uncertainties of parameters (that is where the more computationally expensive Bayesian statistics shine). The debates about the use of p value in hypothesis testing are never ending. I find it super useful to have some intuitions about p so that you can stand your own ground in those debates.
Here are my intuitions. p stands for probability. A p value of 0.07–0.12 is akin to the probability of getting the same side of a coin 3–4 times in a row in a coin-tossing game; it is uncommon but possible in the statistical fluctuations of our Universe. A p value of 0.01–0.03 is like landing the same side of a coin 5–6 times in a row (0.5^5 ≈ 0.03, 0.5^6 ≈ 0.016), which is bizarre and begs for an investigation. This is why many people choose a p value of less than 0.05 or even 0.005 as a threshold to claim a signal or reject a null hypothesis. Of course, if you are looking for those extremely rare events in the Universe, as rare as throwing a coin 20 times and getting the same side 20 times, then feel free to set an extremely low p value (0.000001) to claim your signal. The ‘flexibility’ in the threshold of p is where most debates about p values come from. But the little p is a powerful number when it is used in the right context. I recommend this recent paper and podcast for good insights about p.
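The coin-tossing intuition is simple arithmetic: the chance that a fair coin lands on one chosen side n times in a row is 0.5 to the power n. The helper name `same_side_probability` below is made up for this sketch.

```python
def same_side_probability(n_tosses):
    """Probability that a fair coin lands on one chosen side n times in a row."""
    return 0.5 ** n_tosses

# Print the probabilities for a few run lengths to compare with p values.
for n in (1, 2, 3, 4, 5, 6, 7, 20):
    print(f"{n:2d} tosses in a row: p = {same_side_probability(n):.7f}")
```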
Kolmogorov–Smirnov (KS) test for the age distribution difference. The effort so far is to establish the correlation between the age and year, so that we can answer our question most intelligently. However, if we want the simplest answer, and ignore the dependence of age on the year, then it is only a one-step analysis.
We can use the p-value of the KS test to evaluate the null hypothesis that the two age distributions are drawn from the same parent distribution. When p drops to a very low (<0.005) number, we reject the null hypothesis.
So the simple answer we seek is in Figure 5, where the age distributions of the male and female recipients are shown. The KS test yields p=0.3655, meaning no statistically significant difference between the two distributions. Using the coin-tossing analogy again, p=0.36 is about as unremarkable as tossing a coin and getting the same side 1 or 2 times in a row. So we cannot reject the hypothesis that the age distributions (including the mean and the spread) of the male and female groups are the same. If the title of this article is phrased more precisely as ‘Are women recipients younger than men for the sample of all Nobel winners in the past 120 years?’, then the answer is no.
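The one-step KS comparison can be sketched with `scipy.stats.ks_2samp`. The two samples below are synthetic: both are drawn from the same invented distribution, with deliberately unequal sizes to mimic the imbalance between the groups. They are not the real ages.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic ages: both groups come from the same made-up distribution,
# with very different (invented) sample sizes.
male_ages = rng.normal(60, 12, size=900)
female_ages = rng.normal(60, 12, size=60)

# Two-sample KS test: null hypothesis is a common parent distribution.
result = stats.ks_2samp(male_ages, female_ages)
print(f"KS statistic = {result.statistic:.3f}, p = {result.pvalue:.3f}")
```

Because both samples share a parent distribution here, a large p value is the typical outcome, although any single random draw can still give a small one by chance.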
But wait. Didn’t the commenter in the story use data from ‘recent years’ as evidence? Maybe the age distributions have changed in recent years? We can loop over the years and plot the KS p-value as a function of the starting year of our sample (Figure 6). Indeed, the male and female age distributions start to differ significantly after the year 1990.
So if the comment was that “women seem to win Nobel Prizes at relatively younger ages than men in recent years”, then it is supported by this statistical analysis. And the recent years count from 1990. Owing to the small numbers of women recipients, we do not divide the years further into more subsamples.
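The loop over starting years might look like the sketch below. It assumes a table with columns `year`, `sex` and `age`; the helper `ks_p_by_start_year`, the `min_count` cut and the synthetic table are assumptions made for illustration, not the code from the GitHub repository.

```python
import numpy as np
import pandas as pd
from scipy import stats

def ks_p_by_start_year(df, start_years, min_count=5):
    """KS p-value of male vs female ages for awards from each start year on."""
    rows = []
    for start in start_years:
        recent = df[df["year"] >= start]
        male = recent.loc[recent["sex"] == "Male", "age"]
        female = recent.loc[recent["sex"] == "Female", "age"]
        if len(male) < min_count or len(female) < min_count:
            continue  # too few recipients for a meaningful comparison
        rows.append((start, stats.ks_2samp(male, female).pvalue))
    return pd.DataFrame(rows, columns=["start_year", "p_value"])

# Synthetic table: 10 recipients per year, one of them female, random ages.
rng = np.random.default_rng(2)
years = np.repeat(np.arange(1901, 2021), 10)
sex = np.where(np.arange(len(years)) % 10 == 0, "Female", "Male")
demo = pd.DataFrame({"year": years, "sex": sex,
                     "age": rng.normal(60, 12, size=len(years))})

curve = ks_p_by_start_year(demo, range(1901, 2001, 10))
print(curve)
```

Plotting `p_value` against `start_year` would then give a curve of the kind described for Figure 6.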
There is one last piece to figure out. The mean age of the female group (58) is smaller than that of the male group (66) (Figure 6). How significant is this difference in the means? The p-value evaluates the distribution function as a whole; it does not yield a specific probability for the mean. This problem can be phrased as: what are the odds of a randomly selected woman being younger than a randomly selected man from the 1990–2020 recipient pool? We can get this number by randomly selecting the recipients from the distributions in Figure 6 tens of thousands of times, and counting the number of times that a woman is younger (algorithms can be found in the GitHub code). As I am bored of my own writing, I think I should just show the numbers in Table 2.
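The random-pair sampling described above can be sketched as follows. This is an illustrative version, not the code from the GitHub repository: the function name `prob_woman_younger`, the number of draws and the toy ages are all assumptions.

```python
import numpy as np

def prob_woman_younger(female_ages, male_ages, n_draws=50_000, seed=0):
    """Estimate the odds that a random woman is younger than a random man."""
    rng = np.random.default_rng(seed)
    # Draw one woman and one man per trial, with replacement.
    women = rng.choice(np.asarray(female_ages), size=n_draws, replace=True)
    men = rng.choice(np.asarray(male_ages), size=n_draws, replace=True)
    # Fraction of trials in which the woman is strictly younger.
    return float(np.mean(women < men))

# Toy ages, made up for the demonstration (not real recipients).
toy_female = [45, 52, 58, 61, 70]
toy_male = [55, 60, 66, 68, 72, 80]
estimate = prob_woman_younger(toy_female, toy_male)
print(f"Estimated probability that the woman is younger: {estimate:.2f}")
```

Ties are counted as "not younger" in this sketch; with ages in whole years, that choice can shift the estimate slightly.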
Conclusions:
The answer to the title of this article is no. But if we add an extra condition to the question, “Are women Nobel Prize winners younger than men on average in recent years?”, then the answer is yes (I didn’t see this coming), and the odds of that yes are summarised in Table 2.
Personally, I am not sure how much value there is in asking this question or writing this article. Two valuable things that I get from this analysis: 1) I should not reject a claim based on how hand-wavy it is, 2) The only useful statistic in the Physics category is that I cannot do statistics with 4 women vs 206 men. Science knows no gender. We have such a long way to go.
Thanks for reading, and feel free to comment and cross-validate!