Likert scales: Four things you may not know (Achilleas Kostoulas)

Likert scales are among the most frequently used instruments in questionnaire surveys. Because they are relatively simple to design and fairly straightforward to interpret, we tend to use them a lot in applied linguistics research. This post has some information that you need to know if you are using Likert scales in your research project.

In this post, you will learn the following four things:

How to correctly pronounce ‘Likert’;
Whether it is better to use odd or even numbers of responses;
What the best number of responses is;
Why you should not use weighted averages when analysing Likert data.

Some preliminaries

Before we start, I will very briefly go over what Likert scales are and what my assumptions are about you, the intended reader. This will help to ensure that we are sharing the same initial assumptions.

Who is this post for?

When writing this post, the primary audience that I had in mind is students in an applied linguistics or language education course, perhaps working towards the completion of their dissertation or similar projects. I will therefore assume that you understand basic mathematics, such as calculating averages. I will also assume that you do not need to know much about the technical aspects of Likert measurement. Finally, I will assume that you are competent with using statistical software, so I will not cover any of that here.

That said, much of the information in this post is likely to be useful for people working in diverse fields where Likert measurement is used.

What is a Likert scale?

A Likert scale is a group of statements, each followed by predefined responses that measure the intensity of the respondents’ feelings towards that statement. Each statement and the answers that go with it are called an item. The construct that an item measures is called a variable. Here’s an example of an item:

	Strongly Agree	Agree	Disagree	Strongly Disagree
I just love Mondays!

A Likert scale typically has multiple items, all of which measure the same underlying construct (or ‘latent variable’). In the example below, the four items measure a latent variable of ‘garfieldness’.

	Strongly Agree	Agree	Disagree	Strongly Disagree
I just love Mondays
I am very fond of lasagna
I am afraid of spiders
I am quite uncomfortable at the vet’s office

1. Lick, not Like

Likert scales were created by Rensis Likert, a sociologist at the University of Michigan. The proper pronunciation of his name is “Lick – uhrt”. The pronunciation “like – uhrt”, though common, is incorrect.

2. Getting even helps

A Likert item consists of a prompt and a set of responses, often ranging from Strongly agree to Strongly disagree. There are usually five responses for each item, but items with seven responses are also quite common. When using an odd number of responses, the midpoint is a ‘neutral’ option, such as “no opinion”, “neither agree nor disagree”, “not sure” or some phrase to that effect.

What’s wrong with an odd number of options?

Providing respondents with an odd number of options, and therefore a neutral midpoint, has some advantages, but there are also two somewhat important problems, at least for our purposes in language teaching and applied linguistics research.

Firstly, many respondents tend to avoid voicing extreme opinions or taking a stand on controversial topics. This means that respondents are likely to select a ‘safe’ choice at the centre of the scale if one is available, rather than reveal their ‘true’ opinion – a phenomenon called the central tendency bias. This is especially the case when respondents are conscious of power imbalances (e.g., students responding to a questionnaire designed by their professors or teachers engaging with university-based research).

A second potential problem with middle options is that they can be hard to interpret. While we might assume that it means something along the lines of ‘I have no strong views either way’, this may not be true of all respondents. For some respondents, for example, the ‘neutral option’ could mean that ‘I don’t care either way’; for others it may mean that ‘I have no knowledge of this’.

Is there a better way to do this?

We can avoid some of these problems by using items that have an even number of responses. In the following example, respondents are presented with four ‘true’ options, which encourage them to voice a positive or negative opinion. This response format is called a ‘forced choice’ or ‘ipsative’ item.

	Strongly agree	Agree	Disagree	Strongly disagree	Never tasted it
Fish fingers and custard taste great

The table above shows an ipsative item. This contains four ‘proper’ responses under the statement, in order to force respondents to register some agreement or disagreement.

There is also an additional ‘opt-out’ option for those respondents who truly cannot respond, but the wording of the item and the layout discourage its unnecessary use.

Disclaimer 1: Whether you use a ‘neutral’ option or not will depend a lot on your research aims, and the power dynamics in your research context. You might want to read more about the pros and cons of adding a neutral option in this article by TalentMap.

3. Less is more

Some Likert items contain large numbers of possible response options (7, 9 or 10) to capture a variety of nuanced positions. While such scales seem quite sensitive and accurate, they are not always very helpful. For one thing, any benefit from large numbers of options is subject to the law of diminishing returns. From the 7-option format and upwards, the scales just become too cumbersome to use. At that point, any additional benefits tend to be cancelled out by respondent fatigue, and reliability can suffer. Secondly, a large number of options might compromise the analytical sensitivity of the scales, because respondents tend to interpret the scales in different ways: what I describe as “often” may mean the same, in absolute terms, as what you might call “sometimes”. This phenomenon becomes more pronounced when the number of potential responses is large.

When interpreting the data, Likert items with many potential responses can sometimes be helpfully condensed into fewer, more meaningful categories. If you have an item with seven or nine responses, but a small sample size, this could mean only a small number of respondents have selected each option. This is problematic because small numbers of respondents often limit the effectiveness of certain statistical procedures. In such cases, it might make sense to group all the ‘positive’ and ‘negative’ answers together. Doing so involves the loss of some analytical detail, but this is an imperfect universe…
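If you are curious what this grouping looks like in practice, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the 1 to 7 coding, the `collapse` helper, the decision to keep the midpoint as its own category, and the twelve invented responses.

```python
from collections import Counter


def collapse(code: int) -> str:
    """Map a hypothetical 7-point code (1 = Strongly disagree, 7 = Strongly agree)
    onto three broader categories. The midpoint (4) stays 'neutral'."""
    if not 1 <= code <= 7:
        raise ValueError(f"unexpected response code: {code}")
    if code < 4:
        return "negative"
    if code > 4:
        return "positive"
    return "neutral"


# Invented responses from a small sample of twelve people.
responses = [1, 2, 2, 3, 4, 5, 5, 6, 6, 6, 7, 7]

# With seven options, each option attracts only a handful of respondents.
original_counts = Counter(responses)

# After grouping, the same twelve answers fall into three larger categories.
collapsed_counts = Counter(collapse(r) for r in responses)

print(sorted(original_counts.items()))
print(dict(collapsed_counts))
```

Whether the midpoint should stay separate, or be dropped, is a judgement call that depends on your research question; the sketch simply keeps it visible.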

4. The mean is meaningless

One of the most common mistakes in interpreting Likert scale data is reporting the mean values for responses. I have ranted about this practice elsewhere, but here’s the gist:

To facilitate coding or save space on a questionnaire, we sometimes use numbers to represent response options in Likert items (e.g., Figure 1, top). These numerals are just descriptive codes, not ‘true’ numbers. From a mathematical perspective, a ‘Strongly Agree’ response indicates more agreement than ‘Agree’, but it does not show agreement that is five times stronger than ‘Strongly Disagree’. We could just as easily have used colours to anchor the responses, or any other symbol to show the same effect (e.g., Figure 1, bottom).

In other words, we can use the data from Likert items (ordinal data, to be technical) if we want to rank responses, but that’s about the limit of what we can do with them.

*Figure 1. Two ways of doing the Science Fiction Attitude Survey*

Is it such a bad thing to calculate means?

To make this even clearer: We would be very unlikely to say that ‘the average response is agree and three quarters’. Using numbers to express the same idea makes no more sense. Similarly, when we describe the fruit on a grocery stand, we can say that strawberries are smaller than apples, which are smaller than watermelons, and we can count how many fruit of each type are on sale, but we would never say that ‘the fruit on display are, on average, apples’. Reporting that “the average of two agreements and one strong disagreement equals ‘plain’ disagreement” is just as bizarre.
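The ‘two agreements and one strong disagreement’ example can be checked with a few lines of Python. This is only a sketch, and it assumes a four-option item coded from 1 (Strongly Disagree) to 4 (Strongly Agree); the codes and the `LABELS` dictionary are mine, not part of any standard.

```python
from statistics import mean, median_low

# Assumed coding for a four-option item (an illustration, not a standard).
LABELS = {1: "Strongly Disagree", 2: "Disagree", 3: "Agree", 4: "Strongly Agree"}

# Two respondents agree, one strongly disagrees.
responses = [3, 3, 1]

# The arithmetic mean treats the codes as if they were real numbers.
average = mean(responses)            # 7 / 3, roughly 2.33
nearest = LABELS[round(average)]     # the label closest to the mean

# The median only relies on the order of the responses.
middle = LABELS[median_low(responses)]

print(f"mean = {average:.2f}, which rounds to '{nearest}'")
print(f"median = '{middle}'")
```

Under these assumptions, the mean lands nearest to a response that nobody actually selected, whereas the median reports what the middle respondent said.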

Once more: when it comes to analysing the data that Likert items produce, reporting the mean makes very little mathematical sense (I am being charitable: others have called it an ‘indefensible’ practice, and one of the seven ‘deadly sins’ of statistics).

Averaging out the responses of Likert items is a problematic procedure.

So what should one do instead?

The following set of posts contains some advice on how to analyse and interpret Likert scale data. The gist is that the safest measure of central tendency to use is the median. The mode is also safe, but less useful.

For similar reasons, when we want to estimate the spread of responses in a Likert scale, it is best to use the range and the interquartile range (IQR). The standard deviation is not a good choice, for reasons like the ones we have mentioned above.
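For readers who like to see the arithmetic, the sketch below computes the median, the range and the quartiles for one invented five-option item. The data, the 1 to 5 coding and the `nearest_rank` helper are assumptions made for this example. I have used the ‘nearest rank’ definition of a quartile because it always returns a response that somebody actually chose; statistical packages often use other definitions, so their output may differ slightly.

```python
import math

# Assumed coding for a five-option item (an illustration only).
LABELS = {
    1: "Strongly disagree",
    2: "Disagree",
    3: "Neither agree nor disagree",
    4: "Agree",
    5: "Strongly agree",
}


def nearest_rank(sorted_values, p):
    """Return the observed value at the nearest-rank percentile p (0 < p <= 1)."""
    rank = math.ceil(p * len(sorted_values))
    return sorted_values[rank - 1]


# Invented responses from twelve people, sorted from lowest to highest code.
responses = sorted([4, 2, 3, 5, 1, 4, 3, 4, 2, 3, 5, 4])

q1 = nearest_rank(responses, 0.25)
med = nearest_rank(responses, 0.50)
q3 = nearest_rank(responses, 0.75)
lowest, highest = responses[0], responses[-1]

print(f"median: {LABELS[med]}")
print(f"range: {LABELS[lowest]} to {LABELS[highest]}")
print(f"interquartile range: {LABELS[q1]} to {LABELS[q3]}")
```

Reporting the quartiles as labels (here, from ‘Disagree’ to ‘Agree’) keeps the description in the terms that respondents actually saw.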

It is also safer to avoid statistical procedures that rely on the mean (e.g. t-tests). Non-parametric tests, such as the Mann‐Whitney U-test, the Wilcoxon signed‐rank test and the Kruskal‐Wallis test are better alternatives.
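To give a feel for why these tests suit ordinal data, here is a rough sketch of the statistic behind the Mann‐Whitney U-test. It only asks, for every pair of respondents from two groups, which response is higher, and it never adds or averages the codes. The two groups and the `mann_whitney_u` function are invented for illustration, and the sketch stops at the U statistic without producing a p-value; for real analyses, use your statistical software (in Python, for example, `scipy.stats.mannwhitneyu`).

```python
def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: the number of (a, b) pairs in which a is higher
    than b, with tied pairs counted as half a point."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Invented responses on a five-option item, coded 1 to 5 (assumed coding).
group_a = [4, 5, 3, 4, 5]
group_b = [2, 3, 1, 3, 2]

u_a = mann_whitney_u(group_a, group_b)
u_b = mann_whitney_u(group_b, group_a)

# The two statistics always add up to the total number of pairs.
total_pairs = len(group_a) * len(group_b)

print(f"U for group A = {u_a}, U for group B = {u_b}, pairs = {total_pairs}")
```

Because only the ordering of the codes matters, relabelling the options with any other increasing set of numbers would give exactly the same U.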

For presenting data, it’s best to use bar charts, rather than histograms.
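A bar chart starts from a simple count of how many respondents chose each option. The sketch below prints such a count as a rough text chart; the option labels and the eight responses are invented for illustration.

```python
from collections import Counter

# Options in their questionnaire order (an assumed four-option item).
LABELS = ["Strongly disagree", "Disagree", "Agree", "Strongly agree"]

# Invented responses from eight people.
responses = [
    "Agree", "Strongly agree", "Agree", "Disagree",
    "Agree", "Strongly disagree", "Agree", "Strongly agree",
]

counts = Counter(responses)

# One bar per option, in scale order, including options nobody chose.
rows = [(label, counts[label]) for label in LABELS]

for label, n in rows:
    print(f"{label:<18} {'#' * n} ({n})")
```

Each bar stands for one discrete option, with the bars kept in scale order. A histogram, by contrast, implies a continuous numerical axis, which Likert responses do not have.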

Here is some more advice about using Likert scales

How to interpret Ordinal Data: Median and Interquartile Range for Likert Scales

by Achilleas Kostoulas, 23 February 2014 / 30 December 2024

How to Interpret Likert Scales: Midpoints, Means, Medians and Statistical Significance

by Achilleas Kostoulas, 19 November 2013 / 6 October 2025

How to summarise Likert scale data using SPSS

by Achilleas Kostoulas, 15 December 2014 / 31 December 2024

Disclaimer 2: Under certain circumstances, a Likert scale (i.e., a collection of Likert items) can produce data that are suitable for calculating means, or running statistical tests that rely on the mean. These can be called ‘ordinal approximations of continuous data’. Experienced statisticians can probably get away with this, and they might be able to argue convincingly why their approach was appropriate. But if you’re doing a student project, the conservative approach suggested here is safer.

Additional reading about Likert scales

The advice and opinions in the previous sections were written to help you use Likert scales more effectively in your research projects. It has not been my intention to create an authoritative or comprehensive research methods guide, and I strongly encourage you to follow up on some of the things that you’ve just read. Some more resources that you may find helpful include the following:

General reading

Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 22(140).
Cohen, L., Manion, L., & Morrison, K. (2000). Research methods in education (5th edn, pp. 253-255). Routledge.
Gilbert, G. N. (2008). Researching social life (3rd edn, pp. 212ff.). SAGE.

Limitations of Likert scales

Jamieson, S. (2004). Likert scales: how to (ab)use them. Medical Education, 38(12), 1217-1218.
Matell, M. S., & Jacoby, J. (1971). Is there an optimal number of alternatives for Likert scale items? Educational and Psychological Measurement, 31(3), 657-674.
Jacoby, J., & Matell, M. S. (1971). Three-point Likert scales are good enough. Journal of Marketing Research, 8(4), 495-500.

Some different views about Likert scales

The articles listed below describe perspectives on Likert scaling that are not in line with the recommendations I have made above.

Norman, G. (2010). Likert scales, levels of measurement and the “laws” of statistics. Advances in Health Sciences Education, 15(5), 625-632. [This is a ‘rogue’ article, where the argument is made that, despite what purists claim, parametric procedures are robust enough to yield usable findings even when fed with ordinal (i.e., Likert-type) data.]
Sullivan, G. M., & Artino, A. R. (2013). Analyzing and interpreting data from Likert-type scales. Journal of Graduate Medical Education, 5(4), 541-542. [This article extends the argument put forward by Norman (above). The authors concede that parametric tests tend to yield ‘correct’ results even if their assumptions are violated, but point out that “means are often of limited value unless the data follow a classic normal distribution and a frequency distribution of responses will likely be more helpful”.]

Before you go: If you’ve landed on this page while preparing for a student project, I wish you good luck with your work. I hope that this information was helpful, but if there’s anything that was not clear, feel free to drop a line in the comments below or send me a message using the contact form. Also, please feel free to forward this information to anyone who might find it useful.

About me

Achilleas Kostoulas is an applied linguist and language teacher educator. He teaches at the Department of Primary Education at the University of Thessaly, Greece. Previous academic affiliations include the University of Graz, Austria, and the University of Manchester, UK (which is also where he was awarded his PhD). He has extensive experience teaching research methods in the context of language teacher education.

About this post

This post was originally written in September 2013, based on lecture notes from a research methodology seminar that I was teaching at the time. It was last updated in July 2025. The featured image, from Adobe Stock, is used with license.

Comments

19 responses to “Likert scales: Four things you may not know”

goldandfish

29 September 2015

Can I transform the Likert scale of variable X: Let say the previous scholars using 5-points Likert scale for measuring variable X and I intended to use the same measurement but with 7-Likert scale. If possible, is there any restrictions or rules? Thank you.

1. Achilleas
  
  30 September 2015
  
  Yes, you can and no there’s nothing to be concerned about!
  
  1. goldandfish
    
    19 January 2016
    
    Thank you!
    
Nusirat Yusuf

14 February 2016

Thanks, the page helps a lot

Jane Li

23 February 2016

I found this post quite helpful to me, esp. the “getting even helps” part.
If I’d like to cite this part in my paper, how would I cite or which paper(s) should I cite?

1. Achilleas
  
  23 February 2016
  
  I’m glad it was of some help! It’s likely that your tutors will prefer a reference to a proper statistics book, rather than a blog, but if you really want to cite me, here’s how to do it in APA style:
  
  In the text, you can use my name and the date of publication in brackets at the end of the sentence (Kostoulas, 2013).
  
  In the list of works cited, you can list the following information:
  Kostoulas, A. (2013). Four things you didn’t know about Likert scales. Retrieved from [url] on [date].
  
  Other citation formats will present this information in different orders, but I think this is all the information you need.
  
  1. Jane Li
    
    23 February 2016
    
    Thank you for the information.
    
  2. Jane Li
    
    23 February 2016
    
    It’s true that it’d be better to cite a book or a published paper but no book or paper that I read is so positive about using a scale of four or six items.
    
David C

14 September 2017

Hi Achilleas, when you say “these numbers are just descriptive codes, devoid of numerical value” I know where you’re coming from. They are in one sense completely arbitrary. But a typical Likert item does have at least ‘ordinal’ numeric value. So, a 5 is greater than a 4 wrt to the concept measured. There’s clear consensus that ‘ratio’ quality is lacking (4 is not double 2). The disagreements tend to occur regarding the extent to which such items (or scales when aggregated) have ‘interval’ properties. Most would agree that it’s wrong to assume equal difference between a 5 and 3, as compared to a 4 and 2. However, from Nunnally onwards, many in social science have felt that Likert scales often have sufficient ‘equal interval’ properties to support the use of means, t-tests etc.
An interesting discussion here:
https://www.researchgate.net/post/Is_a_Likert-type_scale_ordinal_or_interval_data

1. Achilleas
  
  14 September 2017
  
  Thanks for this, David. What I was trying to say is that these numerical descriptors do not have the precision one would associate with numbers. I think you have explained it better than I have.
  
mah

27 August 2018

hi Achilleas Thanks for your insightful sharing. I have a query regarding Likert Scale Score Codes.

During my thesis my supervisor asked me to give scoring code of 1 to Strongly Agree and of 5 to Strongly Disagree, thus my high mean score is either (1 or 2). But the problem is can high mean value be given low codes (1 and 2 instead of 5 and 4), is it right practice or not?

Can I justify it by using your words that ” these numbers are just descriptive codes, devoid of numerical value” with some literature to support it.

1. Achilleas Kostoulas
  
  28 August 2018
  
  Hi! Like you said, ‘high’ and ‘low’ are relative terms: they depend on what you define as the ‘top’ and ‘bottom’ of the scale, not the numerical value of the descriptor. Hope that helps, and good luck with your project!
  
mah

28 August 2018

thanks a lot again for your insightful sharing, Now I got clear understanding !

Samuel kobina otu

8 September 2019

Please Achilleas, my test value is =2.5, but in the questionnaire the Likert scale was 1 to 5 ,starting from strongly agreed strongly disagreed. My supervisor said,my test value which is 2.5 should rather be somewhere 3 since the Likert scale is 1 to 5. What should I do?

1. Achilleas Kostoulas
  
  8 September 2019
  
  Your supervisor is the person you need to discuss this question with.
  
Senapathy

3 March 2020

Revered Professor Achilleas Kostoulas,
All the scales are psychometric measurements of the statements or items, which are preferably selected by the researcher according to the nature of his research. I am from a Rural Development background and I am guiding scholars now. Either 7 point scale or 10 point scales are advisable to construct the statements and the type of research focus on Micro, and Small Enterprises activities at zonal level. Looking for your valuable advice, Prof.

1. Achilleas Kostoulas
  
  3 March 2020
  
  Dear Senapathy,
  
  Thanks for your message. I would be happy to help if I can, but I am not sure I understand what you are trying to research and what advice you need. Would you like to re-phrase your question for me, please?
  
Christine espana

9 February 2021

Why does the weighted mean not match the degree of agreement?

1. Achilleas Kostoulas
  
  9 February 2021
  
  Because e.g. ‘strongly agreeing’ does not mean ‘agreeing’ x 2. These are ordinal data.
  