A while back a meta-analysis on “The effectiveness of nudging…” was published, and it took off after it received a reply attacking its main conclusion. Now, I’ve already written a post about the “Death of a Nudge”, which I’d be very happy for you to read. That post focuses almost exclusively on the topic of nudging and its role in behavioural science, however. What I want to talk about today is a bit different. From the sh-t-storm that quickly spiraled on Twitter, it became rather apparent to me that most people don’t seem to understand how to read a meta-analysis. Is it possible, theoretically, for two meta-analyses to reach opposite conclusions? Yes, it is. How? Let me explain.
A meta-analysis is a systematic review of all the existing work/results in a particular field, on a particular topic/method (or sometimes both). I’m going to stick with the example of the recent nudging meta-analysis, because it’s in my field and nice and fresh in my hippocampus. Meta-analyses tend to be structured so that the punchline is upfront, while the method, the actual meat of the paper, is at the end, sometimes only presented in the supplementary materials. This “nudge meta” was no exception. There are two quick paragraphs on the method before the paper dives into the results. The only two relevant things these paragraphs highlighted were the sample size (n>200) and the main questions the authors were trying to answer. There was nothing more. After the results, discussion and conclusion came the methods and materials section. This section told you most of what you needed to know. In it we find the frameworks the authors used: specifically, references to Siddaway et al. (2019) and Moher et al. (2009), with the former being guidelines for conducting systematic reviews and the latter being the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) standards. This is about half of what you need to know, because it explains how the paper is written, what is reported, and how the authors went about it. If you want more information on meta-analyses, both on how to read and how to write them, I really do recommend the Siddaway paper. However, unless the authors of other meta-analyses used radically different frameworks (unlikely), the framework alone doesn’t tend to produce a different conclusion. What really determines the outcome of a meta-analysis is the sample you end up working with.
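For a sense of what the analysis stage actually computes once the sample is fixed: at its core, a meta-analysis pools effect sizes across studies, weighting each study by its precision. Here is a minimal fixed-effect sketch in Python; the effect sizes and standard errors are entirely made up for illustration.

```python
import numpy as np

# Three hypothetical studies: estimated effect sizes (e.g. Cohen's d)
# and their standard errors. All numbers are invented.
effects = np.array([0.40, 0.10, 0.25])
ses = np.array([0.10, 0.05, 0.20])

# Fixed-effect pooling: weight each study by the inverse of its variance,
# so more precise studies pull the pooled estimate harder.
weights = 1.0 / ses**2
pooled = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

print(f"pooled effect = {pooled:.3f} (SE = {pooled_se:.3f})")
```

Note how the most precise study (SE = 0.05) dominates and drags the pooled estimate toward 0.10; which is exactly why which studies make it into the sample matters so much.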
In the nudge meta, the authors ended up basing their analysis on 212 papers. However, those 212 papers are the result of a search and then a cull. The search is very well explained in the methods section. The authors mention the databases used and the keywords entered into each of them. Whatever results popped up were probably scraped and aggregated in a lovely Excel file. This led to 9,934 papers found. They also looked for unpublished work, and here the selection gets a lot more vague. To the extent where a direct replication would be difficult, if not impossible. Which is just bad science. Regardless, this led to another 617 papers and results to analyse. Now the fun stuff: the inclusion/exclusion criteria. The authors selected work as follows:
Studies that were published no earlier than 2008;
Studies that empirically tested one or more choice architecture techniques using a randomized controlled experimental design (RCT);
Studies that had a behavioral outcome measure that was assessed in a real-life or hypothetical choice situation;
Studies that used individuals as the unit of analysis;
Studies that were published in English;
Studies that examined choice architecture in combination with other intervention measures, such as significant economic incentives or education programs, were excluded from the analyses, to isolate the unique effects of choice architecture interventions on behavior.
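Mechanically, screening with criteria like these is just a filter applied to the scraped list of candidate studies. A toy sketch of the idea in Python; the field names and records below are hypothetical, not the authors' actual coding scheme.

```python
# Each candidate study from the search, coded on the screening criteria.
# Records and field names are invented for illustration.
candidates = [
    {"year": 2012, "design": "RCT", "behavioral_outcome": True,
     "unit": "individual", "language": "English", "combined": False},
    {"year": 2005, "design": "RCT", "behavioral_outcome": True,
     "unit": "individual", "language": "English", "combined": False},  # pre-2008
    {"year": 2015, "design": "survey", "behavioral_outcome": True,
     "unit": "individual", "language": "English", "combined": False},  # not an RCT
    {"year": 2018, "design": "RCT", "behavioral_outcome": True,
     "unit": "individual", "language": "English", "combined": True},   # nudge + incentive
]

def passes_screening(study):
    """Apply the stated inclusion/exclusion criteria to one study."""
    return (study["year"] >= 2008
            and study["design"] == "RCT"
            and study["behavioral_outcome"]
            and study["unit"] == "individual"
            and study["language"] == "English"
            and not study["combined"])

included = [s for s in candidates if passes_screening(s)]
print(f"{len(included)} of {len(candidates)} candidates survive screening")
```

The point is that every one of these boolean checks is a researcher choice: move the year cutoff, or allow combined interventions, and a different set of 212 papers, and potentially a different conclusion, falls out the other end.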
If you’re thinking that this was a neat list in the method section, you’re partially right, but some exclusion criteria got mentioned a lot earlier on, so I still had to kludge this together. Again, not great. Now, if you want to see what the impact of these exclusions is, you’re up against a hunt and a half. The exclusion criteria of this meta can only be found on page 5 of the supplementary materials, a .pdf file that can be downloaded from the supporting materials section. If you’re not keen on a scavenger hunt, see the picture below.
I think, after kludging this information together from different parts of the paper, I wouldn’t struggle to replicate this specific meta. It would be doable. Bit of a pain, but doable. If you didn’t trust their statistical analysis of the 212 papers that were left, they do provide a list of the papers they ended up analysing, from page 17 of the .pdf file onwards. So that should be replicable. Which it is. And that is also what happened. A reply was written to the 2021 nudge meta, also published in PNAS. This reply didn’t change the sample; it changed the analysis. Instead of using Egger’s regression, as in the original work, the authors of the reply used “a newly proposed bias correction technique, robust Bayesian meta-analysis (RoBMA)”. From that direct quote you should be able to gather what the reply is attacking: the level of publication bias associated with nudging. Running RoBMA, the authors find an even lower effectiveness of nudging once publication bias is corrected for. This difference is entirely driven by the analysis used. What does this mean for replicating meta-analyses? Do I expect to get the same results as the authors did if I use the same approach, same sample and same analysis? Yes. But as soon as one of these changes, we’re doomed.
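To make the publication-bias piece concrete: Egger's test regresses each study's standardised effect (effect/SE) on its precision (1/SE); an intercept far from zero signals funnel-plot asymmetry, the classic footprint of publication bias. Below is a sketch on simulated, bias-free data (so the intercept should sit near zero); this is an illustration of the general technique, not the original paper's code.

```python
import numpy as np

# Simulate 50 bias-free studies around a true effect of 0.2.
rng = np.random.default_rng(0)
se = rng.uniform(0.05, 0.5, size=50)    # varying study precision
effects = 0.2 + rng.normal(0.0, se)     # pure sampling noise, no bias built in

# Egger's regression: z-score on precision. The slope estimates the
# underlying effect; a nonzero intercept indicates small-study /
# publication-bias asymmetry.
z = effects / se
precision = 1.0 / se
slope, intercept = np.polyfit(precision, z, deg=1)

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

RoBMA, by contrast, averages over many candidate bias models rather than fitting a single regression, which is why swapping it in can move the corrected estimate so much even on an identical sample.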
A while back I wrote a post on another meta-analysis showing that teaching people personal finance does work. Within the behavioural finance domain this is quite the controversial topic, as it seems that every 5 years we get a new meta-analysis rejecting the findings of the prior one. It seems that nudging will get the same treatment. How is that possible? Now that we’ve talked through the approaches and method (sample selection) of meta-analyses, you ought to know the answer: change your approach, change your outcome! The “lucky” thing with the personal finance metas is that they kept using the same sample criteria. So that was consistent. However, a lot of research can be conducted in 5 years, and so two metas conducted several years, if not a decade, apart can find radically different things. Just because the sample grew. Same criteria, different sample, due to the growth in interest in the topic. As it happened, for this particular personal finance meta the sample more than tripled compared to the previous one. Of course, the same goes for the analysis. As newer models and methods become available, it remains to be seen what does, and doesn’t, stand the test of time.
Most of the nudging metas suffer from this temporal aspect as well; however, they also run into the issue of having different sample criteria and, now, different analyses too. As a result, they aren’t comparable. What now?! If we moved away from metas for a second and discussed a different kind of study, let’s say a very simple dictator game among different kinds of people (young rich kids vs. poor seniors), we’d not be remotely surprised to end up with completely different results. We’d almost expect them. Why? Different samples. Duh. This is also why WEIRD samples have been such an issue for behavioural science – they over-extended the theories we had to cover much more of human behaviour than they should have. So is it shocking to have two meta-analyses on the same topic claim opposite results? No. Once you understand how a meta works, and that it’s all in the framework, the exclusion criteria, and the methods of analysis, there’s no real surprise to be had. And that is exactly how to read a meta. You can read the result first, no issue at all. But then it’s time to start digging into how they got there. And keep in mind: science is as objective as the people practicing it.