Issues with Self-Reported Data

Merle van den Akker
Dec 16, 2021

How do you test whether people are adhering to their medication? What their beliefs are about certain political issues? How often they exercise per week, if they exercise at all? Or how much alcohol they consume per week? Well, you ask them. Don’t you? A large part of behavioural science research is grounded in simply asking people about their behaviour. Sure, there have been technological advances that allow us to see whether people are actually telling us the truth. Expenditure trackers, Fitbits, social media data scraping and the like allow us to observe behaviour directly, without the participant’s own “interpretation” of their behaviour playing a role. And that can be very helpful.

Issue is, you can’t do this with everything. Not to knock data scraping, but I don’t expect someone trying to reduce my alcohol consumption, behavioural scientist and doctor alike, to scrape all my social media platforms to find out whether I did or did not have a double G&T that week (chances are, I did). Neither can you do this to find out everyone’s political opinions and intention to vote. How often have you seen polls of voting intention say one thing, only for the actual results to present us with a whole different outcome?! Sometimes even technology doesn’t have the answer. Sometimes, you just have to ask. And hope people tell you the truth, to the best of their ability. And that is, unfortunately, a bit of an issue.

I’m writing this article because a dear friend of mine is currently working with self-reports of medication adherence and trying to find out whether they’re remotely valid. And let me tell you, this is fraught with difficulties. With certain behaviours and interventions, people know exactly what “the right” answer is. We know what is socially desirable. There’s even a bias named after it: the social desirability bias. This bias refers to the tendency of people to give socially desirable responses instead of choosing responses that reflect their true feelings. The resulting distortion becomes a major issue when the study involves socially sensitive issues such as politics, religion, and the environment. Or much more personal issues such as drug use, cheating, and smoking. We know that people in groups, online or offline, often become more extreme as they try to adhere to the ideal of the group. Social desirability works similarly. We do not condone cheating, so when asked about it, no one condones cheating. Cheating happens all the time, so clearly some people do (sort of) condone it. But no one is going to come out and say that… Solutions have been proposed to deal with the social desirability aspect of survey responses. A social desirability scale (yes, it’s literally a scale measuring how prone you are to the bias) can be incorporated into the survey. However, that leaves you with the question of how much to adjust a participant’s answers once you find that they are very prone to the bias.
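One cautious way to approach that adjustment question is not to adjust at all, but to check how sensitive a result is to the bias-prone participants. The sketch below compares the average self-reported alcohol consumption with and without participants who score high on a social desirability scale. Everything in it is an assumption made for illustration: the data are invented, and the cutoff of 7 and the 0 to 10 scoring are hypothetical rather than taken from any published scale.

```python
from statistics import mean

# Invented example data: (social desirability score, self-reported units of alcohol per week).
# The 0-10 scoring is an assumption, not a property of any specific published scale.
responses = [
    (2, 14.0),
    (3, 12.0),
    (8, 4.0),
    (9, 2.0),
    (4, 10.0),
]

HIGH_SCORE_CUTOFF = 7  # hypothetical threshold for "very prone to the bias"


def split_by_bias(rows, cutoff):
    """Separate self-reports of low scorers from those of high scorers."""
    low = [units for score, units in rows if score < cutoff]
    high = [units for score, units in rows if score >= cutoff]
    return low, high


low_bias, high_bias = split_by_bias(responses, HIGH_SCORE_CUTOFF)

all_mean = mean(units for _, units in responses)
low_mean = mean(low_bias)

print(f"Mean units per week, everyone:         {all_mean:.1f}")
print(f"Mean units per week, low scorers only: {low_mean:.1f}")
print(f"Difference: {low_mean - all_mean:.1f}")
```

A large gap between the two averages does not tell you what the true value is. It only signals that the conclusion leans heavily on the participants most likely to be polishing their answers.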

So there is the idea that people alter their answers based on what they think the right answer is. These types of answers aren’t really lies, they’re just a small adjustment of the truth. Anchored in our true value, we move it down or up a bit, to make it sound better than it is. Most people do this. We run into bigger issues when we notice that participants are coming out with some very interesting answers, especially when their answers can be checked through other means (saying you exercise five times a week but your Fitbit says otherwise…). It’s unfortunately also entirely possible that participants are straight-up lying. Lying in surveys can happen for a variety of reasons:

One reason can be extreme embarrassment about the truth, where a slight shift up or down doesn’t even come close to the desirable answer. You can see this with people who have to track their own calorie intake during a rigorous weight loss program. They might claim they only eat 1500 calories per day, knowing damn well the truth lies closer to 1500 calories per meal. In this case the truth obviously comes out, as no weight is actually being lost: other sources of data (in this case just looking at a person and weighing them) can validate their responses. Or in this case, invalidate them.
A second reason can be not knowing, or any other form of ignorance, which isn’t lying in the strict sense but gives you a wrong answer all the same. Maybe the person in the weight loss program is genuinely convinced they only ate 1500 calories, because they are tracking their calories wrong. Often, technology can help out here as well. But even food tracker apps can only tell you the calories of the foods you logged into them. Research by Abby Sussman et al. has also found that people don’t count exceptional expenses: splurges, both in terms of food and spending, are not part of the normal “everyday” behaviour, so they aren’t counted. To exemplify: someone on a strict diet wouldn’t count the one Big Mac they had that week, because they normally don’t eat Big Macs. So it doesn’t count. Issue is, it does.
A third reason for lying can be carelessness. Does the participant actually give a damn about your survey? Or are they just rushing through it to get some money, maximising their reward/time ratio by clicking the first option every time and writing one-word answers, if they don’t just put down gibberish? This is why a lot of surveys, especially online surveys, have attention checks in them. If you answer the question “what colour is the sky today?” by selecting the first multiple-choice option (“red”) or by jamming your keyboard (“qetb”) to continue, you’ll be filtered out of the analysis. Issue is, if you filter out a lot of people this way you might have a very small sample left. On top of this, the sample you have left is likely to care strongly about the issue at hand, which makes it unlikely to be representative of the population as a whole, skewing your results.
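Two of the checks mentioned above, the attention check and the comparison with tracker data, can be sketched in a few lines. This is a minimal illustration, not a recommended pipeline: the `Response` fields, the list of accepted sky colours and the example participants are all invented for the purpose of the sketch.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical answer key for the attention check; which answers to accept is a judgment call.
ACCEPTED_SKY_ANSWERS = {"blue", "grey", "gray"}


@dataclass
class Response:
    participant_id: str
    sky_colour: str                  # answer to the attention check
    reported_sessions: int           # self-reported exercise sessions per week
    tracked_sessions: Optional[int]  # sessions logged by a tracker, if there is one


def passes_attention_check(response: Response) -> bool:
    """True if the attention-check answer is one of the accepted colours."""
    return response.sky_colour.strip().lower() in ACCEPTED_SKY_ANSWERS


def overreport(response: Response) -> Optional[int]:
    """Self-report minus tracker count, or None when no tracker data exists."""
    if response.tracked_sessions is None:
        return None
    return response.reported_sessions - response.tracked_sessions


# Invented example participants.
responses = [
    Response("p1", "Blue", 5, 2),
    Response("p2", "red", 3, None),
    Response("p3", "qetb", 7, 1),
    Response("p4", "grey", 2, 2),
]

kept = [r for r in responses if passes_attention_check(r)]
retained_share = len(kept) / len(responses)
print(f"Kept {len(kept)} of {len(responses)} responses ({retained_share:.0%})")

for r in kept:
    gap = overreport(r)
    if gap is None:
        print(f"{r.participant_id}: no tracker data, self-report cannot be checked")
    else:
        print(f"{r.participant_id}: reported {gap:+d} sessions relative to the tracker")
```

Printing the retained share alongside the filtered data keeps the trade-off visible: every stricter check buys cleaner answers at the cost of a smaller, and possibly less representative, sample.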

Even if people don’t know what “the right” answer looks like and aren’t in the business of lying to you, there is a bigger issue with self-reports, and with research in general. Participants, unless randomly selected out of the population as a whole (say country, region, continent, company), can be just as biased. In line with the third reason for lying, some people care deeply about certain issues. So if there are surveys about that topic coming out, they’ll be much more likely to fill them in, and fill them in (semi-)truthfully. Luckily, survey platforms such as Prolific Academic allow you to collect data from a sample that is representative of the country you’re in (or want to collect data from). This goes a long way towards solving the issue. However, another remains: to end up in this representative sample, people need to have a Prolific account. Meaning, they are aware that this survey platform exists, and are happy to participate in surveys in exchange for compensation. And suddenly, we’re back to where we started.

I don’t propose any solutions for these issues. I don’t work with a lot of self-reports because I know that all these issues are rampant, especially within my niche: personal finance. No one wants to admit to having broken the bank for a bag they couldn’t afford by a long shot. There is a large literature out there discussing these issues, proposing solutions, testing those solutions and caveating their implementation. I just wrote this article to make you aware of these issues and hopefully direct you to further literature on this topic, as it is a topic that warrants further study and discussion!