Surveying the Surveys: Bias in our Online Questionnaires?

Let’s start off on the right foot. I like surveys about hikers. I like surveys about gear. I like surveys about food. I like surveys about tents and tarps. I like surveys about trail magic. I like them all.

However, the academic in me . . . the statistician in me . . . feels the need to delve into the use of surveys in our community. I worry that novice hikers and hikers considering thru-hikes place too much value on the internet surveys we launch, perhaps relying on them in lieu of compiling research from multiple sources and weighing facts independently to come to their own best conclusions. Ironically, I don’t have data to support this fear. However, as I think this blog post hurts nothing and could be of interest to our readers, I feel no compunction about putting in my two cents’ worth.

What are surveys intended to do?

Surveys are designed to measure or investigate some characteristic of a given population. The population includes everyone who fits the survey criteria. For example, when a survey asks all past and present thru-hikers for responses, the population is quite literally every single person who has ever thru-hiked a trail.

Well-conducted surveys put their questions to a representative sample of the population. What in the world is that? It’s a sample that accurately represents the population in question. In other words, it’s a sample whose characteristics faithfully mirror those of the population. That’s really wordy . . . what do we need? Pictures!

This is a population:


A representative sample would look something like this:

A non-representative sample would look something like this:
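
For readers who prefer numbers to pictures, the contrast between the two kinds of sample can also be sketched in a few lines of Python. This is only an illustration, not real data: the three hiking styles, the 500/300/200 split, the sample size of 100, and the helper name `shares` are all assumptions I made up for the example.

```python
import random
from collections import Counter

# Hypothetical population: the 500/300/200 split is invented for illustration.
population = ["NOBO"] * 500 + ["SOBO"] * 300 + ["flip-flop"] * 200


def shares(group):
    """Return each hiking style's share of the group, as a fraction of 1."""
    counts = Counter(group)
    return {style: counts[style] / len(group)
            for style in ("NOBO", "SOBO", "flip-flop")}


rng = random.Random(2016)  # fixed seed so the sketch is repeatable

# Simple random sample: every hiker has the same chance of being asked.
random_sample = rng.sample(population, 100)

# Convenience sample: only the first 100 hikers on the list,
# who all happen to be NOBOs.
convenience_sample = population[:100]

print("population :", shares(population))
print("random     :", shares(random_sample))
print("convenience:", shares(convenience_sample))
```

The random sample should land reasonably close to the population's 50/30/20 mix, while the convenience sample reports that every hiker is a NOBO. Same population, very different picture.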

Let’s talk about the population of thru-hikers. Heck, let’s just narrow it down to AT thru-hikers from 2016 because I am one and I feel comfortable throwing out some generalizations about this population. We included: solo-hiking males, solo-hiking females, parents with children, couples, seniors, recent college graduates, veterans, mid-life crisis hikers, survivalists, NOBOs, SOBOs, flip-floppers, glampers, minimalists, photographers, neophytes, experts, handicapped hikers, professional hikers (hey there Hiking Viking! It was super cool meeting you!), and wow – that’s just off the top of my head. My point is that we were (like every class of thru-hikers is) an incredibly diverse group of people from myriad backgrounds and walks of life.

Are our samples representative of all thru-hikers?

Now – when I post a survey on Facebook or on The Trek – my question to you is: is that a representative sample of thru-hikers? Is a representative sample of thru-hikers in our Facebook groups? Are they on The Trek? Are the surveys truly capturing a picture of the population they purport to represent? My fear, the raison d’être for this article, is that they aren’t.

So what?

That’s the next question, right? SO WHAT?

Bias can impact our results.

Well, I’ll tell you what. It means that these surveys are potentially full of bias. Bias comes in many forms, but it boils down to the fact that our survey results don’t reflect the population they purport to represent. There are dozens of types of bias, so I’m just going to talk a bit about those to which I worry we are the most susceptible.

  • Undercoverage: This is systematic under-sampling of a particular demographic. As a fictive example, let’s imagine that the minimalist hikers also eschew technology. Questionnaires launched only online, then, will always under-represent this group of hikers and therefore not accurately reflect our population.


  • Nonresponse: Some people just don’t respond to surveys. They don’t like them, they don’t care, they don’t think their opinions are important – whatever. We must therefore be comfortable making the assumption that, when we’re collecting survey data, our non-respondents do not differ from our respondents in any marked way. Unfortunately, there’s not a lot we can do about this one, but it still bears keeping in mind.


  • Voluntary Response: This is the other side of the nonresponse coin. Volunteers have a well-documented history of holding more ardent opinions than non-respondents. Just think about this one in terms of politics for a second. Who votes consistently and loyally in every election? Voters (of both parties) who fervently believe in a specific position upheld by one party or the other, or in a particular candidate for whatever reason. In terms of hikers, gearheads are most likely to respond to gear surveys, hikers who experienced an injury may be most likely to respond to questionnaires about health on the trail, and hikers with dietary requirements may feel more inclined to answer surveys about food procurement and nutrition on the trail. These are all suppositions to illustrate the point: when a survey is posted in one of our groups, it is likely that the same group of users is responding each time, and this may be skewing results.


  • A few other pitfalls await the brave soul who posts the next survey. Among these are leading questions, which subtly prompt survey-takers to respond in a certain way. Make sure you’re not asking them.


  • Another potential difficulty is designing surveys such that the “social desirability” factor is mitigated. For example, if you post a survey which is not anonymous and you ask thru-hikers how often they showered on the trail, Susie-Thru may well tell you that she found herself a nice hot shower every week because she doesn’t want to tell you that, really, she only showered once every three weeks. She’s worried that if she tells the truth, she’ll be a stark outlier of dirtiness and judged negatively for how little she showered on the trail. Multiply this by dozens of respondents and you’ll see a movement towards socially acceptable answers and away from (in this example) the dirty truth. Keeping surveys anonymous or aggregating data is often a tidy way to mitigate this problem.


  • The last danger I want to mention here is researcher bias. Are you, the author, inadvertently letting your opinion creep into your survey? Or, as you write up results, are you inadvertently placing more emphasis or weight on results you favor or hypothesized to be true? The best way to avoid this is either to have a qualified editor (e.g., someone who knows as much as you do about tents or shoes or whatever you happen to be writing about) re-read your first draft and comment on possible bias, or to re-read it yourself very carefully (and critically).
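
To see how undercoverage, nonresponse, and voluntary response can add up, here is a minimal sketch with invented numbers. Everything in it is an assumption for illustration only: the three groups, their sizes, how many of each answer an online gear survey, and their average gear spend are all made up, and `Group` and `weighted_mean` are hypothetical helper names rather than part of any survey tool.

```python
from dataclasses import dataclass


@dataclass
class Group:
    name: str
    hikers: int        # how many hikers of this kind exist (hypothetical)
    respondents: int   # how many of them answer the online survey (hypothetical)
    gear_spend: float  # average dollars spent on gear (hypothetical)


groups = [
    # Undercoverage: offline minimalists almost never see the survey.
    Group("offline minimalists", hikers=300, respondents=6, gear_spend=800),
    # Nonresponse: most typical hikers see it but simply do not answer.
    Group("typical hikers", hikers=500, respondents=50, gear_spend=1500),
    # Voluntary response: gearheads love gear surveys and answer in droves.
    Group("gearheads", hikers=200, respondents=100, gear_spend=3000),
]


def weighted_mean(groups, weight):
    """Average gear spend, weighting each group by the named attribute."""
    total = sum(getattr(g, weight) for g in groups)
    return sum(getattr(g, weight) * g.gear_spend for g in groups) / total


true_mean = weighted_mean(groups, "hikers")         # the whole population
survey_mean = weighted_mean(groups, "respondents")  # what the survey reports

print(f"true average spend:   ${true_mean:,.0f}")
print(f"survey average spend: ${survey_mean:,.0f}")
print(f"overstatement:        ${survey_mean - true_mean:,.0f}")
```

With these made-up numbers the true average is $1,590, but the survey reports roughly $2,435, because half of the gearheads answered and almost none of the minimalists did. Nobody lied and no question was badly worded; the skew comes entirely from who showed up.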

We need to recognize bias to mitigate bias

Bias in any of these forms is problematic if we don’t recognize it and address it. There are fancy mathematical ways to tackle these problems, and intense survey methods used to ensure a representative sample. But how do we mitigate these problems realistically? Short of standing on the trail and peppering current thru-hikers with questions as they pass you, short of going to Trail Days and eliciting responses from every hiker you meet, short of constructing the perfect survey, what we can do is recognize the group that has been surveyed when presenting survey results. Let’s present surveys posted on The Trek’s Facebook pages as having surveyed The Trek’s Facebook group users. Let’s present internet surveys as just that. Let’s be clear to our readers that (most if not all of) our surveys do not represent all thru-hikers or all thru-hikers’ opinions. Let’s recognize the fact that thru-hikers are a diverse community with myriad opinions which might not all have been captured by our online survey techniques. Let’s encourage the readers of our surveys to conduct their own research – especially when it comes to gear and equipment – in addition to making use of the information the surveys have gleaned.

I’m going to reiterate here – I like our surveys. I participate in our surveys. I read your survey articles and enjoy them. I just hope that this short post better illuminates some of the potential problems of writing and analyzing surveys and encourages all of our readers to keep thinking critically about our data and findings.

