Size Matters

Merle van den Akker
Jan 28, 2021

Whether you like the innuendo or not, size matters. And I'm referring to sample size, you dirty dog.

In my previous article I outlined that a lot of findings within behavioural science have gone bust, rather than having been robust (give me credit for this pun please). What predominantly came out of that article was that a lot of people had been fooled by shady scientists trying to sell books. Shame on them. Now, shady people are everywhere, not just in behavioural science, but there are also people who do research with the best intentions. They have a set methodology, test it, find some really cool things and, as a result, are likely to publish that research on the basis of the coolness of the results. Coolness sells, and this research will soon be picked up by other academics, practitioners and people who want to sell you books, courses and consultancy. Now if you’re an avid reader of mine you know where this is going. Even before I dropped the “BS” article, I wrote about loss aversion, and the difficulty of replicating it on a larger scale. Because that’s just it. One of the most fundamental assumptions of behavioural science doesn’t (always) hold up at scale. Guess what? Size matters.

When it comes to sample sizes, it matters how many people you test. I look into personal finance myself, how people spend money, and I don’t leave the house for a sample under a thousand. The exception is repeated observations: then the sample is measured in observations rather than participants, and you can have fewer participants as long as there are plenty of observations, again preferably above a thousand. Now a lot of older studies didn’t exactly test 1000 people. They tested approximately 50 (if we’re lucky) students in the lab (which is problematic not just size-wise). Their entire findings are based on a group of 50 people. Hmmmmm…

What happens when you increase sample size? You increase certainty that your result isn’t just based on a whole bunch of outliers. Let’s exemplify this: I’ve got a group of 100 people. Twenty are Christian, thirty are Muslim, ten are Jewish and the remainder (forty) are atheist. I want to figure out how this group of people, without knowing who is who, feels about the colour purple (nice and non-controversial). Now let’s blatantly assume that religion influences each participant’s perception of the colour purple: Christians hate it, Muslims love it, Jewish people prefer it to blue and the atheists don’t give a damn. If I sample only 10% of this group, without replacement, so 10 different people, it is in fact possible (if very unlikely) that I sample only the Jewish participants. So then my idea of this entire group is that all 100 people prefer purple to blue. It is more likely, though still rare, that my 10 people all come from the largest group: atheists. Suddenly my idea is that the entire group of people doesn’t give a damn about purple. Might as well take it out of the damn rainbow then.

This example is obviously very simple. And there are ways around it as well: taking a representative sample. If I were to test only 10 people in this group, and I knew the divisions beforehand, I would just have to pick 4 atheists, 3 Muslims, 2 Christians and 1 Jewish person.
That’s (sort of) fine. But this ignores yet another important issue: heterogeneity within groups. Just because you’re Christian doesn’t automatically mean you hate purple. And this goes for all the other blanket statements made as well. Suddenly you need to subdivide the earlier categories further by other characteristics: age, gender, level of education, nationality, income, etc. And then you need to know those divisions within the population (e.g. if 20% is above 65 years of age, then out of a group of 500 you’d need 100 people to be over 65), and divvy them up accordingly. As a result, to get a somewhat accurate representation of what’s going on, your initial sample size of 100 has grown to over 1000, just to fit in one representative of each category combination you could think of. And you’re going to need more than one representative…
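
For those who like to see the numbers, here is a minimal Python sketch of the purple thought experiment. It computes the exact chance that a sample of 10 comes entirely from one group, the proportional split for a representative sample of 10, and how quickly category combinations pile up. The group sizes come from the example; the function names and the number of levels per characteristic (five age bands, three gender options and so on) are my own assumptions, invented purely for illustration.

```python
from math import comb, prod

# Group composition from the example: 100 people in total.
group = {"Christian": 20, "Muslim": 30, "Jewish": 10, "atheist": 40}
population = sum(group.values())
sample_size = 10


def prob_only_one_group(group_size, population, sample_size):
    """Chance that a simple random sample, drawn without replacement,
    consists entirely of members of a single group."""
    return comb(group_size, sample_size) / comb(population, sample_size)


def proportional_allocation(group, sample_size):
    """People to draw per group so the sample mirrors the population.
    Floor division is exact here; other sizes would need a rounding rule."""
    total = sum(group.values())
    return {name: sample_size * size // total for name, size in group.items()}


for name, size in group.items():
    p = prob_only_one_group(size, population, sample_size)
    print(f"P(all {sample_size} sampled are {name}): {p:.2e}")

print(proportional_allocation(group, sample_size))

# Hypothetical numbers of levels per characteristic (invented for illustration).
levels = {"religion": 4, "age band": 5, "gender": 3, "education": 4, "income": 5}
cells = prod(levels.values())  # one person per combination needs this many people
print(f"{cells} category combinations")
```

Change the group sizes or the levels and rerun it to see how the odds of a lopsided draw and the number of combinations shift. Note that the combination count only covers one person per cell, and says nothing about how many people each cell really needs.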

So that’s what a large sample size is meant to do: account for variations and representativeness within a population. Now I’m not saying testing larger samples is the be-all and end-all of behavioural science, but it is most definitely more desirable than testing 50 white undergraduates in a lab. Ideally, we would test people from all walks of life. So not just participants that are WEIRD (people from Western, Educated, Industrialized, Rich and Democratic countries). Because raising the sample from 50 to 50,000, whilst only testing rich white men, is going to tell us absolutely nothing about the decision-making processes of those who are poorer, non-white, or non-male, or a combination of those characteristics. It will, however, make us more certain about the processes of the men. So that’s something, I suppose.
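
The same point can be sketched as a small simulation. Everything in it is made up for illustration: two hypothetical subgroups, "A" and "B", with invented average scores of 7 and 3 and an invented spread, so the average for a half-and-half population is 5. The sketch draws samples of 50 and 50,000, once from subgroup A only and once from both, and reports the sample mean and its standard error.

```python
import random
import statistics

# Invented averages for two hypothetical subgroups; not real data.
GROUP_MEANS = {"A": 7.0, "B": 3.0}
SPREAD = 2.0  # invented standard deviation within each subgroup


def sample_scores(n, groups, rng):
    """Draw n scores, picking each person's subgroup uniformly from `groups`."""
    return [rng.gauss(GROUP_MEANS[rng.choice(groups)], SPREAD) for _ in range(n)]


def mean_and_se(scores):
    """Sample mean and its standard error (sample sd divided by sqrt(n))."""
    return statistics.fmean(scores), statistics.stdev(scores) / len(scores) ** 0.5


rng = random.Random(2021)  # fixed seed so the run is repeatable
results = {}
for n in (50, 50_000):
    for label, groups in (("only A", ["A"]), ("A and B", ["A", "B"])):
        results[(n, label)] = mean_and_se(sample_scores(n, groups, rng))
        mean, se = results[(n, label)]
        print(f"n={n:>6}, {label:<8}: mean={mean:.2f}, standard error={se:.3f}")
```

In this toy setup, the bigger sample shrinks the standard error in both cases, but the "only A" estimate settles near A's own average rather than the population's. More data makes you more certain about the people you sampled, not about the people you left out.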

Increasing sample sizes is, in my opinion, one of the best things to come out of data science. We’re finally able to deal with a wealth of data, from a wealth of people. And this is great. This will give us so much more insight, into so many more different people, but also into people as a whole. It will show us where decision-making processes overlap, and where they don’t. And it will show us that findings might hold for one group, and not for another. That’s interesting stuff. That’s worth researching!