Welcome! Chances are that you landed on this page looking for information on Likert scales, ordinal data and the best way to make sense of them. If that is the case, you will probably want to skip directly to the part of this post where I talk about a common mistake people make with ordinal data and mean values. You should also take a look at the list of additional resources. If you have time to kill, feel free to read the other sections as well. Or not; just go ahead and crush my feelings :p
Contents of this post
Some background
I was prompted to write this post while reading an edited collection on applied linguistics which I was asked to review. I have published substantive comments on the book elsewhere (it’s a good book), but we linguists are not natural statisticians; if anything, most of us are in this job because we were bad at STEM at school. So it’s perhaps not surprising to find a common statistical mistake in one of the book chapters. At the time, I felt shocked and somewhat livid (how can there be a mistake in a published book?), but it’s perhaps more productive to help explain how to do things in a better way.
Specifically, what sparked my interest was one study in the collection, which used Likert scales to record participants’ attitudes towards a certain educational construct. Those who are not familiar with the fascinating minutiae of quantitative research can find a discussion of Likert scaling and ordinal data in the section that immediately follows. Those of you who are unlucky enough to have studied statistics may want to skip to the next section.
Let us go then, you and I...
Likert scales and ordinal data
What are Likert scales?
A Likert-type question (or item, to be precise) asks respondents to select one of several (usually four or five) responses that appear in order of strength. Here’s an example:
Indicate what you think about the following statements using the scale below:
(1) Strongly Agree; (2) Agree; (3) Neither agree nor disagree; (4) Disagree; (5) Strongly Disagree
| a. Apples are rubbish | 1 | 2 | 3 | 4 | 5 |
| b. Yoghurt is my favourite food | 1 | 2 | 3 | 4 | 5 |
| c. Beans are evil | 1 | 2 | 3 | 4 | 5 |
| d. Fish fingers and custard taste great | 1 | 2 | 3 | 4 | 5 |
Each of these items measures a variable, i.e., a construct about which we want to learn more. Sometimes, we might use sets of similar items, dispersed across the questionnaire. This helps researchers to probe different aspects of the same construct (or ‘latent variable’) by putting together information from all the related items. I will not go into any of this in more detail here, but if you want to find out more, this post has some additional information.
We frequently use Likert scales when we want to measure constructs like satisfaction rates, attitudes towards different things, and more. They are very flexible and very useful, provided you use them carefully.
Many researchers tend to use Likert scales to do things that they were never designed to do
Interpreting Likert scales
Likert items and scales produce what we call ordinal data, i.e., data that we can rank. In the example above, people who select response (1) to item (d) are more fond of fish fingers and custard than people who choose responses (2), (3), (4) and (5). People who choose response (2) like this snack more than those who choose responses (3), (4) and (5), and so on. In addition to ranking ordinal data, we can also tally them. For example, I might want to divide my sample by age group, count how many people chose each of the responses, and compare results across ages. This, however, is almost the extent of what one can do with ordinal data.
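To illustrate the kind of tallying I have in mind, here is a minimal Python sketch (the age groups and response codes here are invented purely for illustration):

```python
from collections import Counter

# Hypothetical raw data: (age_group, response_code) pairs, using the coding
# from the example above (1 = Strongly Agree ... 5 = Strongly Disagree)
responses = [
    ("under 30", 1), ("under 30", 2), ("under 30", 2), ("under 30", 3),
    ("30 and over", 3), ("30 and over", 4), ("30 and over", 5), ("30 and over", 5),
]

# Count how many people in each age group chose each response
for group in ("under 30", "30 and over"):
    counts = Counter(code for g, code in responses if g == group)
    print(group, dict(sorted(counts.items())))
```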
The problem with Likert items is that many researchers (including the ones whose paper prompted this post) tend to use them for things that they were never designed to do. Calculating average scores is one of them, and here’s why it’s wrong.
Imagine that you are conducting a survey and ask ten participants about their attitudes towards fish fingers and custard. The table below shows a hypothetical distribution of answers:
|  | n | % |
|---|---|---|
| Strongly agree | 1 | 10 |
| Agree | 1 | 10 |
| Neither agree nor disagree | 3 | 30 |
| Disagree | 2 | 20 |
| Strongly disagree | 3 | 30 |
The wrong way to do it
If I want to describe the attitudes of a ‘typical person’ (whoever that might be), then I might be tempted to calculate a mean score for this data. The ‘mean’ is the technical term for what most people call the ‘average’. To do this, I might use the following formula:
[(number of people who selected response 1) × (weighting of response 1) + (number of people who selected response 2) × (weighting of response 2) + … + (number of people who selected response n) × (weighting of response n)] / (total number of respondents)
In the example above, this would yield:
[(1*1)+(1*2)+(3*3)+(2*4)+(3*5)]/10 = 3.5
Going back to the descriptors, I would then conclude that an ‘average’ response of 3.5 corresponds to something between ‘no opinion’ and ‘disagreement’. I would therefore go on to write something along the lines of: ‘Our study revealed mild disagreement regarding the palatability of fish fingers and custard (M=3.5)’.
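For the record, here is a minimal Python sketch of the calculation just described, using the hypothetical counts from the table above (this is, of course, the approach I am about to argue against):

```python
# Hypothetical counts from the table above, keyed by response weighting
counts = {1: 1, 2: 1, 3: 3, 4: 2, 5: 3}

total = sum(counts.values())  # 10 respondents
mean = sum(weight * n for weight, n in counts.items()) / total
print(mean)  # 3.5
```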
A better way
Plainly put, the option suggested above is not an optimal interpretation; many would call it outright statistical nonsense (update: I feel less strongly about this than I used to in 2013, but I still think it is usually wrong).
For this interpretation to be valid, I would need to make assumptions like the following:
- Firstly, I would need to assume that the psychological distance between ‘strong agreement’ and ‘agreement’ is the same as that between ‘agreement’ and ‘no opinion’.
- A corollary of the above would be that the distance between ‘agreement’ and ‘strong disagreement’ is three times as great as that between ‘agreement’ and ‘strong agreement’.
The mathematical model needs these assumptions in order to work, but they are simply not built into the questionnaire design. And even if we forced them into the questionnaire, that would constitute a gross distortion of psychological attitudes and the social world to fit our statistical model.
Ordinal data cannot yield mean values.
If you think they can, proceed at your own risk.
To put it in the simplest terms possible: Ordinal data cannot yield mean values. If you think that they can (and some statistics guidance websites might encourage you to think so), you can still take your chances. But please make sure you justify your rationale well when you write up your methods section.
A safer way forward, if you really need to describe what the ‘average’ or ‘typical’ person might answer, is to look at the median response. The median is a type of average value, like the mean, except that it is the value that sits in the middle of the ordered data, i.e., the response with half the observations at or below it and half at or above it. You can find out more about how to calculate the median here.
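By way of illustration, here is how the median of the hypothetical data above could be computed with Python’s standard library (a sketch; note that with an even number of respondents the plain median interpolates between the two middle values, so median_low or median_high may be preferable if you want to stay on the original scale):

```python
import statistics

# Reconstruct the ten individual responses from the frequency table above
counts = {1: 1, 2: 1, 3: 3, 4: 2, 5: 3}
responses = [code for code, n in counts.items() for _ in range(n)]

print(statistics.median(responses))       # 3.5 (interpolated midpoint)
print(statistics.median_low(responses))   # 3 -> 'Neither agree nor disagree'
print(statistics.median_high(responses))  # 4 -> 'Disagree'
```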
Summary
- Likert items ask people to choose from ordered responses (e.g., strongly agree → strongly disagree).
- These items generate ordinal data. We can rank responses, but not assume equal spacing between them.
- Calculating the mean of Likert data is tempting but usually misleading, because it imposes false assumptions about equal distances between categories.
- A safer choice is to use the median or to analyse response distributions directly.
Frequently asked questions about Likert scales
What is a Likert scale?
A Likert scale is a survey format where respondents indicate their level of agreement with statements on an ordered scale (e.g., strongly agree to strongly disagree). It is widely used to measure attitudes, opinions, and perceptions.
Are Likert scales ordinal or interval data?
Strictly speaking, Likert items produce ordinal data: we can rank the responses, but we cannot assume that the distance between categories (e.g. between agree and strongly agree) is the same as between others. Some researchers treat them as interval, but this is methodologically questionable.
Can you calculate the mean of Likert scale responses?
It is not advisable. A mean assumes equal spacing between categories, which Likert data do not guarantee. Reporting a mean score like M = 3.5 risks distorting the results. Instead, consider the median or present the distribution of responses.
What is the best way to analyse Likert scale data?
Use frequencies and percentages to describe distributions. Report the median as a measure of central tendency. Consider grouping multiple items that measure the same construct into a composite scale, but justify the method carefully.
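To make this concrete, here is a minimal sketch of how such a distribution might be reported (Python; the labels and data are the hypothetical example from earlier in this post):

```python
from collections import Counter

labels = {1: "Strongly agree", 2: "Agree", 3: "Neither agree nor disagree",
          4: "Disagree", 5: "Strongly disagree"}
responses = [1, 2, 3, 3, 3, 4, 4, 5, 5, 5]  # hypothetical data from above

# Tally the responses and print counts alongside percentages
counts = Counter(responses)
total = len(responses)
for code, label in labels.items():
    n = counts[code]
    print(f"{label:<28} n={n}  ({100 * n / total:.0f}%)")
```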
More to read about Likert scales
If you came to this page looking for information on Likert scales, you may find the following posts useful: Things you don’t know about Likert scales, and On Likert scales, levels of measurement and nuanced understandings. I also recommend reading this overview of Likert scales and this post by Stephen Westland (University of Leeds), for a more nuanced understanding of Likert scaling and an excellent discussion of how to analyse the data that these scales produce.

On the peer review process
As I wrote at the beginning of this post, one of the papers in the volume that I reviewed made the statistical mistake that I just described: it reported a set of findings that had been generated by extracting mean values from Likert items. In the authors’ defence, they were neither the first nor the last to engage in this controversial practice: averaging ordinal data is as widespread as it is wrong. Unfortunately, this problem had gone unnoticed by the editors of the collection, and by the peer-reviewers employed by the press. As the book was already in print, I was left wondering whether it was productive to flag this mistake at that stage.
What went wrong with peer review in this instance?
Readers often take it on faith that the people who conducted a study knew what they were doing.
This faith is sometimes misplaced.
It is the nature of the peer-review process that the people who review academic articles can make intelligent substantive judgments on the findings, but might not always have the requisite background to comment on the research process (or vice versa). For better or for worse, research methods are too diverse and too specialized for reviewers to have more than a passing acquaintance with most of them. In addition, there are limits to the time one can reasonably spend providing unpaid service to the profession, and these often preclude reading up on research methodology every time one comes across a novel research design.
Every now and then, reviewers have to take it on faith that the people who conducted a study knew what they were doing, and they must trust that there are no major flaws in the methods. So, rather than double-checking such matters, we tend to focus our feedback on more substantive aspects of the research (e.g., Are the claims commensurate with the scope of the study? Do the findings add significantly to the existing body of knowledge?). Mistakes in the methodology will, on occasion, slip by.
What can you do if you come across mistakes in published research?
So the question I faced was: what should I do when asked to provide an informed opinion on the quality of a study that has a major flaw? This was not made any easier by the knowledge that the people responsible for finding this flaw had failed to spot it, or deemed it unimportant. So, in this case, I decided to let it pass. Besides, the findings of this particular study were inconclusive and broadly consistent with what was already known about the phenomenon in question. I therefore thought that there was little harm in having one more voice in the literature to add some more weak agreement to the prevailing views – even if the empirical evidence that informed this voice was not very strong.
If there is a take-home message from all this, it is that as a reader you should not put too much faith in the published literature. Just because something has made it into print does not mean that it is right.
Before you go
Are you still reading? I hope that the content of this post was useful to you. Feel very free to jump into the conversation in the comments below, or to share the post with anyone who might find it interesting.

About me
Achilleas Kostoulas is an applied linguist and language teacher educator. He teaches at the Department of Primary Education at the University of Thessaly, Greece. Previous academic affiliations include the University of Graz, Austria, and the University of Manchester, UK (which is also where he was awarded his PhD). He has extensive experience teaching research methods in the context of language teacher education.
About this post
This post was originally written in 2013, when I was still a doctoral student. For reasons I do not fully understand, it has come to rank very highly on SERPs about Likert scale measurement. It has been revised several times since, with a view to making it more useful to readers who are looking for statistical advice. The last update was on 9 September 2025. The featured image is from Adobe Stock, and is used with license. The content of the post does not represent the views of my past or present employers.


