Dealing with demographic data in language education research

Every time you are asked to complete a survey, chances are that it includes a section with questions that elicit demographic data. In language education research, these demographic questions usually ask participants to provide details about their age, the composition of their family, their parents’ jobs and income, and so on. Perhaps because these questions are so common, researchers tend to think of their design as a relatively straightforward task — and as is often the case in educational research, this is a rather dangerous assumption to make.

When I was vetting such questionnaires in my role as research supervisor at the Experimental school where I used to work, what struck me was that, although most student researchers could speak intelligently about all the items in the surveys they had designed, they seemed unable to provide a rationale for the demographics section. All too often, I was told that this section was there ‘because all questionnaires must include one’. This was a disappointing response, not only because the purpose of any student research project must be to help students learn the hows and the whys of educational research, but also because such thinking leads to many avoidable mistakes in data collection.

So, in the interest of doing things just a little better, this post looks into the demographic sections of questionnaires. The post consists of three parts:

  1. First, I discuss the purposes of generating demographic data;
  2. Then, I point out some possible pitfalls;
  3. And finally, I offer some suggestions that could help to improve practice.

What’s the demographics section for?

There are two primary reasons why a researcher might want to include demographic questions in a questionnaire survey:

  1. because demographic data is needed to answer the research questions; and
  2. because it helps to compare your findings with those of other studies.

Answering your research questions

In some studies, answering the research questions requires demographic data. For example, a researcher might be interested in finding out how family income or parental educational attainment impacts learning. Or she might want to understand the relation between teacher qualifications and teaching style.

When such explicit connections exist between the data collected and the expected findings, that’s great. What is less great is a tendency to collect any data that might conceivably be relevant, in the hope that an effect might show up in analysis. Such an approach very often wastes both the respondents’ and the researchers’ time. Moreover, with the exception of exploratory studies, hypothesising after the results are known (or HARKing) is frowned upon as a questionable research practice.

Making your findings comparable

Even if demographic data are not used in the analysis, researchers collect and report data about their sample in order to better understand how their findings compare to those of other studies. A new study can confirm an older one if it produces similar findings in a similar population. If similar findings are discovered in a different population, they can extend the generalisability of a hypothesis. If results are different, knowing about the samples of the studies could help us to hypothesise why this might be the case.

In all these cases, where researchers are interested in a summary description rather than individual responses, it makes sense to check whether such data is already available. For instance, it might not be necessary or efficient to ask every participant about their family income, if you already know that the school’s catchment area is a middle-class neighbourhood.

Such data are surprisingly easy to find. In our school, for example, summary information about the demographics of the students (age groups, family size, parental education, and family income) was available to researchers upon request. Useful information can also be found in census reports and commercial databases such as ACORN, or obtained from the local education authorities. Published research which explicitly describes the demographics of geographical areas might also be found in the literature. In addition to saving time, using information from such sources makes the data easier to compare across studies. So, in short, before embarking on data collection, just ask!


What’s wrong with collecting demographic data from scratch?

There are three potential problems associated with demographics sections in a questionnaire survey, especially if they haven’t been constructed with due care:

  1. they risk alienating the respondents;
  2. they generate respondent fatigue; and
  3. they create possible liabilities.

Let’s take a closer look at each one.

You do not want respondents to get upset

When personal questions are included in a survey, especially at the beginning of a questionnaire, they risk unduly alienating or alarming respondents. Even when the usual reassurances about confidentiality and anonymity are provided, some respondents may be reluctant to share information that they consider sensitive, or information through which they feel that they might be identified.

In my experience, many Greeks consider information about family income sensitive, and students often avoid answering questions about it. Other respondents may be uncomfortable sharing information about their family status, religious affiliation, languages spoken at home, etc. Asking such questions can create distrust, so they should be handled tactfully.

You do not want respondents to get tired

No matter how much time and energy you are willing to invest in analysing as much data as you can get, your respondents, especially young learners, are very unlikely to match your enthusiasm. On the whole, it is a bad idea to waste their time, energy and goodwill by making them fill out long forms with information that may not be strictly necessary. Long demographics sections cause respondent fatigue, which means that respondents might either quit before completing the questionnaire or, even worse, engage with the last sections in a very superficial way (e.g., by selecting the same answer for every item).

You do not want respondents to get suspicious

Finally, demographics sections risk making respondents more self-aware, or even identifying them. This is especially true in small-scale surveys, such as the ones typical of student research. In the last school where I was employed, I was the only male MFL specialist, so once I had answered the demographic questions, I had to self-censor my responses because I knew that anyone who read the questionnaire would be able to match the answers to me. I eventually learned to strategically omit responses to the demographics section, so as to be able to answer the rest of the questionnaire with more freedom. Still, this is a problem that could be solved more easily if the researchers were more discreet.

It has also been my experience that many student researchers seem blithely unaware of the responsibilities involved in collecting personal and sensitive data. They also tend to lack the experience and resources to comply with legal requirements for processing such data. The type of data that we generate in educational research is not usually very sensitive, but a data breach is always problematic, and any complaint is likely to set your research back for a long time. In this sense, collecting more information than one needs seems like an unnecessary risk.


What can we do to improve the demographics section?

From the paragraphs above, it should be clear that you had best avoid collecting information that you don’t strictly need. When it comes to collecting information that is indeed necessary, here are some tips:

  • Avoid embarrassing the respondents: For example, some respondents may feel uncomfortable placing themselves in the highest age group, or the lowest educational attainment group. You can easily avoid such problems by adding more possible responses to your questions: For instance, I often recommended that questions asking for the respondents’ age include the options ‘51-60’ and ‘over 60’, rather than just ‘over 50’, so as to avoid putting respondents on the spot.
  • Allow respondents to opt out: Respondents should always be given the option of not answering any or all of the questions in the demographics section. At a minimum, a ‘prefer not to say’ option should be included in every item. You may also want to include a statement reaffirming consent at the top of the demographics section. Here’s a possible format: “This section asks questions about you. This information is necessary [insert brief justification]. The data you share with us will not be used to personally identify you, and will not be passed on to anyone else. If you prefer not to answer these questions, tick the following box …”
  • Place the demographics section at the end of the questionnaire. This will help to minimise the effects of respondent fatigue (see above).
  • If you need the demographic information for descriptive purposes only, i.e., if you do not plan to analyse it in conjunction with other questions in the survey, consider placing the demographics section on a separate page. This page can then be detached from the rest of the questionnaire and analysed independently of the other data as an additional anonymity safeguard.

In summary, demographic sections in questionnaires should be designed on a strict ‘need-to-know’ basis; alternative sources of data must be considered before personal or sensitive data are collected; and their format and sequencing need to be such that they do not affect other sections of the questionnaire.


I hope that the information in this post has been helpful. If you are working on a student project which involves the use of questionnaire surveys, you may also want to take a look at other posts in this series. These examine the wording, bias, structure and sequencing of questionnaire items, describe how scale items can be used to elicit information, and give some tips on overall questionnaire layout. If you have additional questions that these posts do not answer, feel free to drop me a line!

Good luck with your project, and feel free to share this content with any fellow students who might find it useful!
