Four things you probably didn’t know about Likert scales

Likert scales are among the most frequently used instruments in questionnaire surveys. They consist of statements and pre-defined responses which measure the intensity of one’s feelings towards the preceding statement. Here’s an example of such a statement, or ‘item’:

[Figure: a Likert item consisting of a prompt and five response options]
Figure 1. Example of a Likert item (Purists might argue that the response options must be displayed horizontally. They are technically correct, but I can’t be bothered to change the figure, so they can argue on…)

Likert scales are easy for respondents to understand, and easy for researchers to interpret, which accounts for their widespread use in research and student projects. However, despite their popularity (or maybe because of it), they are often used in loose ways that are not always optimal. Here are four tips to help you avoid common pitfalls. In this post you will learn:

  1. How to correctly pronounce the name of the measurement technique
  2. If it is better to use odd or even numbers of responses
  3. What the best number of responses is
  4. Why you should not use weighted averages when analysing Likert data

1. Lick, not Like

Likert scales were created by Rensis Likert, a sociologist at the University of Michigan. The proper pronunciation of his name is “Lick – uhrt”. The pronunciation “like – uhrt”, though common, is incorrect.

2. Getting even helps

A Likert item consists of a prompt and a set of responses. Most frequently, there are five responses for each item, often ranging from Strongly agree to Strongly disagree. Seven-point scales are also quite common. When the number of responses is odd, the midpoint is a ‘neutral’ option, such as “no opinion”, “neither agree nor disagree”, “not sure” or some phrase to that effect.

Such a practice can be problematic for at least two reasons. Firstly, many respondents tend to avoid voicing extreme opinions or taking a stand on controversial topics. This means that respondents are likely to select a ‘safe’ choice at the centre of the scale if one is available, rather than reveal their ‘true’ opinion – a phenomenon called the central tendency bias. This is especially the case when respondents are conscious of power imbalances (e.g., students responding to a questionnaire designed by their professors, or teachers engaging with university-based research).

A second potential problem with middle options is that they can be hard to interpret. While we might assume that the middle option means something along the lines of ‘I have no strong views either way’, this may not be true of all respondents. For some, the ‘neutral’ option could mean ‘I don’t care either way’; for others, it may mean ‘I have no knowledge of this’.

Figure 2. A ‘forced-choice’ Likert item

We can avoid some of these problems by using items with an even number of responses. Figure 2 shows such an item: respondents are presented with four ‘true’ options under the statement, which force them to register some agreement or disagreement. This format is called ‘forced choice’ or ‘ipsative’. An additional ‘opt-out’ option is provided for those respondents who genuinely cannot respond, but the wording of the item and its layout discourage unnecessary use.

Disclaimer 1: Whether you use a ‘neutral’ option or not will depend a lot on your research aims, and the power dynamics in your research context. You might want to read more about the pros and cons of adding a neutral option in this article by TalentMap. 


3. Less is more

Some Likert items contain large numbers of possible response options (7, 9 or 10) to capture a variety of positions. While such scales seem quite sensitive and accurate, they are not always very helpful. For one thing, any benefit from large numbers of options is subject to the law of diminishing returns: from seven options upwards, the scales become too cumbersome to use, any additional benefits are cancelled out by respondent fatigue, and reliability plummets. Secondly, the analytical sensitivity of the scales is compromised by the fact that respondents tend to interpret them in different ways: what I describe as “often” may mean the same, in absolute terms, as what you might call “sometimes”. This phenomenon is amplified when the number of potential responses is large.

When interpreting the data, Likert items with many potential responses can sometimes be helpfully condensed into fewer, more meaningful categories. If you have an item with seven or nine responses, but a small sample size, this could mean that most responses have been selected by very few participants. This is problematic because small numbers of respondents often limit the effectiveness of certain statistical procedures. In such cases, it might make sense to group all the ‘positive’ and ‘negative’ answers together. Doing so would involve the loss of some analytical detail, but this is an imperfect universe…
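As a sketch of this condensing step (the cut-offs, labels and data below are my own illustration, not from the post), a seven-point item can be collapsed into three broader groups before tabulating:

```python
# Collapse a 7-point Likert item into three broader categories.
# The cut-offs used here (1-3 negative, 4 neutral, 5-7 positive) are one
# common choice, not the only defensible one.
from collections import Counter

collapse = {1: "negative", 2: "negative", 3: "negative",
            4: "neutral",
            5: "positive", 6: "positive", 7: "positive"}

raw = [1, 2, 2, 3, 4, 5, 6, 6, 7, 7]   # made-up responses from a small sample
condensed = Counter(collapse[r] for r in raw)
print(dict(condensed))   # frequencies of the three condensed categories
```

The individual options in `raw` are each chosen by only one or two respondents; after collapsing, each category holds enough cases to describe meaningfully.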


4. The mean is meaningless

The most common mistake in interpreting the data that Likert scales generate is reporting the mean values for responses. I have ranted about this practice elsewhere, but here’s the gist: to facilitate coding or save space on a questionnaire, we sometimes substitute the possible responses to Likert items with numbers (e.g., Figure 3, top). These numbers are just descriptive codes, not ‘true’ numbers. A ‘Strongly Agree’ response indicates more agreement than ‘Agree’, but not agreement that is five times stronger than ‘Strongly Disagree’. We could just as easily have used colours, or any other symbols, to anchor the responses (e.g., Figure 3, bottom). In other words, the data from Likert items (ordinal data, to be technical) can be used to rank responses, but that’s about the limit of what we can do with them.

Figure 3. Science Fiction Attitude Survey
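One way to see that the codes are arbitrary (using made-up data): any order-preserving relabelling leaves the ranking of responses intact, yet the mean shifts while a rank-based summary such as the median does not.

```python
# The numeric codes on a Likert item are labels, not measurements:
# stretching the top code from 5 to 10 preserves the ordering of every
# response, but the mean changes while the median stays put.
from statistics import mean, median

responses = [1, 2, 2, 4, 5, 5, 5]                      # coded 1..5 (made-up)
stretched = [10 if r == 5 else r for r in responses]   # same order, new labels

print(mean(responses), mean(stretched))      # the mean depends on the coding
print(median(responses), median(stretched))  # the median does not
```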

To make this even clearer: we would be very unlikely to say that ‘the average response is agree and three quarters’; using numbers to express the same idea makes no more sense. Similarly, we can describe the fruit on a grocery stand, noting that strawberries are smaller than apples, which are smaller than watermelons, and we can count how many fruit of each type are on sale, but we would never say that ‘the fruit on display are, on average, apples’. Reporting the ‘average’ of two agrees and a strongly disagree is just as bizarre. To reiterate, when it comes to analysing the data that Likert items produce, reporting the mean makes little sense mathematically (and I am being charitable: others have called it an ‘indefensible’ practice and one of the seven ‘deadly sins’ of statistics). Other metrics, such as the median or the mode, are more appropriate. (You can find out more about these metrics here.)
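As a minimal sketch of the alternative (the labels and data below are hypothetical), a Likert item can be summarised with the median, the mode and a frequency table instead of a mean:

```python
# Summarise a 5-point Likert item without a mean: median, mode, frequencies.
from statistics import median, mode
from collections import Counter

labels = {1: "Strongly disagree", 2: "Disagree", 3: "Neutral",
          4: "Agree", 5: "Strongly agree"}
item = [4, 4, 5, 2, 4, 3, 5, 4, 1, 4]   # made-up responses

print("Median:", labels[int(median(item))])   # middle-ranked response
print("Mode:", labels[mode(item)])            # most frequent response

freq = Counter(item)                          # frequency distribution
for code in sorted(freq):
    print(f"{labels[code]:<18} {freq[code]}")
```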

For similar reasons, the spread of answers in a Likert scale is best estimated using the range and inter-quartile range, rather than the standard deviation. Statistical procedures that rely on the mean, such as t-tests, should also be avoided in most cases; non-parametric tests such as the Mann-Whitney U-test, the Wilcoxon signed-rank test and the Kruskal-Wallis test can be used in their place. When it comes to presenting data, it’s best to use bar charts, rather than histograms.
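To illustrate one of these rank-based procedures (the group data are invented; in practice you would reach for a statistics package, e.g. scipy.stats.mannwhitneyu, which also gives a p-value), the Mann-Whitney U statistic can be computed directly from rank sums:

```python
# A minimal Mann-Whitney U statistic, using midranks for tied values.
# U close to len(a) * len(b) suggests group a tends to respond higher;
# U close to 0 suggests the opposite. (Significance testing omitted.)

def mann_whitney_u(a, b):
    pooled = sorted(a + b)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1                           # j is one past the tie block
        ranks[pooled[i]] = (i + 1 + j) / 2   # midrank: average of ranks i+1..j
        i = j
    rank_sum_a = sum(ranks[x] for x in a)
    return rank_sum_a - len(a) * (len(a) + 1) / 2

group_a = [4, 5, 4, 3, 5]   # made-up Likert responses, group A
group_b = [2, 3, 1, 2, 4]   # made-up Likert responses, group B
print(mann_whitney_u(group_a, group_b))   # 22.5 out of a maximum of 25
```

Here U is close to its maximum (len(group_a) * len(group_b) = 25), suggesting group A tends to agree more strongly than group B.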

Disclaimer 2: Under certain circumstances, a Likert scale (i.e., a collection of Likert items) can produce data that are suitable for calculating means, or running statistical tests that rely on the mean. These can be called ‘ordinal approximations of continuous data’. Experienced statisticians can probably get away with this, and they might be able to argue convincingly why their approach was appropriate, but if you’re doing a student project, the conservative approach suggested here is safer.

~

The advice and opinions presented in the previous sections are intended to help you use Likert scales more effectively in your research projects. This is not an authoritative or comprehensive research methods guide, and I strongly encourage you to follow up on some of the things that you’ve just read. Some more resources that you may find helpful include the following:

  • Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 22(140).
    This is the seminal article which introduced Likert measurement.
  • Cohen, L., Manion, L., & Morrison, K. (2000). Research methods in education (5th edn). New York: Routledge. (pp. 253-255)
  • Gilbert, G. N. (2008). Researching social life (3rd edn). London: SAGE. (pp. 212 et seq.).
    Two widely available manuals which cover the use and limitations of Likert scales.
  • Jamieson, S. (2004). Likert scales: How to (ab)use them. Medical Education, 38(12), 1217-1218.
    A two-page article on the limitations of Likert scaling. The article covers much of the same ground as this post, in a somewhat more eloquent way.
  • Matell, M. S., & Jacoby, J. (1971). Is there an optimal number of alternatives for Likert scale items? Educational and Psychological Measurement, 31(3), 657-674.
  • Jacoby, J., & Matell, M. S. (1971). Three-point Likert scales are good enough. Journal of Marketing Research, 8(4), 495-500.
    These articles use empirical data to put forward the claim that Likert scales with too many options tend to be unreliable.
  • Norman, G. (2010). Likert scales, levels of measurement and the “laws” of statistics. Advances in Health Sciences Education, 15(5), 625-632.
    This is a ‘rogue’ article, where the argument is made that, despite what purists claim, parametric procedures are robust enough to yield usable findings even when fed with ordinal (i.e., Likert-type) data.
  • Sullivan, G. M., & Artino, A. R. (2013). Analyzing and interpreting data from Likert-type scales. Journal of Graduate Medical Education, 5(4), 541–542.
    This article extends the argument put forward by Norman (above). The authors concede that parametric tests tend to yield ‘correct’ results even if their assumptions are violated, but point out that “means are often of limited value unless the data follow a classic normal distribution and a frequency distribution of responses will likely be more helpful”.

Before you go: I hope that this information was helpful, but if there’s anything that was not clear, feel free to drop a line in the comments below. You may also want to check out some more posts I have written on quantitative research, including:

If you’ve come to this page while preparing for one of your student projects, I wish you all the best with your work. There’s a range of social sharing buttons below in case you feel like sharing this information with fellow students who might find it useful. Also feel free to ask any other questions you may have, using the contact form.


Featured Image by Michael Kwan [CC BY-NC-ND]

18 Replies to “Four things you probably didn’t know about Likert scales”

  1. Would you like to provide me simple and clear details step by step what correlation test to use with a 5 likert scale data? 1= strongly disagree, 3= neither agree nor disagree, 5= strongly agree. Variables are: Beliefs, attitude, intention, social pressure, behavior,…
    I am just confused whether I should use Pearson or Spearman. If Spearman (as recommended by most papers) how to start? here is my big problem.
    Thank you

  2. Can I transform the Likert scale of variable X: Let say the previous scholars using 5-points Likert scale for measuring variable X and I intended to use the same measurement but with 7-Likert scale. If possible, is there any restrictions or rules? Thank you.

  3. I found this post quite helpful to me, esp. the “getting even helps” part.
    If I’d like to cite this part in my paper, how would I cite or which paper(s) should I cite?

    1. I’m glad it was of some help! It’s likely that your tutors will prefer a reference to a proper statistics book, rather than a blog, but if you really want to cite me, here’s how to do it in APA style:

      In the text, you can use my name and the date of publication in brackets at the end of the sentence (Kostoulas, 2013).

      In the list of works cited, you can list the following information:
      Kostoulas, A. (2013). Four things you didn’t know about Likert scales. Retrieved from [url] on [date].

      Other citation formats will present this information in different orders, but I think this is all the information you need.

      1. It’s true that it’d be better to cite a book or a published paper but no book or paper that I read is so positive about using a scale of four or six items.

  4. Hi Achilleas, when you say “these numbers are just descriptive codes, devoid of numerical value” I know where you’re coming from. They are in one sense completely arbitrary. But a typical Likert item does have at least ‘ordinal’ numeric value. So, a 5 is greater than a 4 wrt to the concept measured. There’s clear consensus that ‘ratio’ quality is lacking (4 is not double 2). The disagreements tend to occur regarding the extent to which such items (or scales when aggregated) have ‘interval’ properties. Most would agree that it’s wrong to assume equal difference between a 5 and 3, as compared to a 4 and 2. However, from Nunnally onwards, many in social science have felt that Likert scales often have sufficient ‘equal interval’ properties to support the use of means, t-tests etc.
    An interesting discussion here:
    https://www.researchgate.net/post/Is_a_Likert-type_scale_ordinal_or_interval_data

    1. Thanks for this, David. What I was trying to say is that these numerical descriptors do not have the precision one would associate with numbers. I think you have explained it better than I have.

  5. Hi Achilleas, thanks for your insightful sharing. I have a query regarding Likert scale score codes.

    During my thesis my supervisor asked me to give scoring code of 1 to Strongly Agree and of 5 to Strongly Disagree, thus my high mean score is either (1 or 2). But the problem is can high mean value be given low codes (1 and 2 instead of 5 and 4), is it right practice or not?

    Can I justify it by using your words that ” these numbers are just descriptive codes, devoid of numerical value” with some literature to support it.

    1. Hi! Like you said, ‘high’ and ‘low’ are relative terms: they depend on what you define as the ‘top’ and ‘bottom’ of the scale, not the numerical value of the descriptor. Hope that helps, and good luck with your project!
