Four things you probably didn’t know about Likert scales

Likert scales are among the most frequently used instruments in questionnaire surveys. They consist of a statement and a range of pre-defined responses which measure the intensity of one’s feelings towards the statement. Here’s an example:

Figure 1 Example of a Likert scale

Likert scales are easy for respondents to understand, and easy for researchers to interpret, which accounts for their widespread use in research. However, despite their popularity (or maybe because of it), they are not always used optimally. Here are some tips for increasing their effectiveness.

1. Lick, not Like

Likert scales were created by Rensis Likert, a sociologist at the University of Michigan whose name is properly pronounced “Lick – urt”. The pronunciation “like – urt”, though common, is incorrect.

2. Getting even helps

Most frequently, Likert scales consist of five items (as in the example above), and seven-item scales are also quite common. The midpoint of such scales is a ‘neutral’ option, such as “no opinion”, “neither agree nor disagree”, “not sure” or some phrase to that effect.

Such a practice is problematic for two reasons: Firstly, most respondents tend to avoid voicing extreme opinions – a phenomenon called the central tendency bias. Secondly, many respondents are averse to taking a stand on controversial topics. The combined effect of these tendencies is that, when presented with a ‘safe’ choice at the centre of the scale, respondents are likely to select that, rather than reveal their ‘true’ opinion.

By contrast, a four-item scale (or indeed any scale with an even number of items) encourages respondents to voice an opinion. In the following example, four items have been used in the Likert scale proper. These are flanked by an additional ‘opt-out’ option for those respondents who truly cannot respond, but the wording of the item and the layout discourage its unnecessary use.

Figure 2 A ‘forced-choice’ Likert scale

3. More is not better

Some Likert scales contain large numbers of items (7, 9 or 10) to capture a variety of responses. While such scales seem quite sensitive to nuanced positions, they are not always very helpful. Firstly, any benefit from a large number of options is subject to the law of diminishing returns: from seven items upwards, scales become too cumbersome to use, any additional benefits are cancelled out by respondent fatigue, and reliability plummets. Secondly, the analytical sensitivity of the scales is compromised by the fact that respondents interpret them in different ways: what I describe as “often” may mean the same, in absolute terms, as what you might call “sometimes”. This problem is amplified when the number of potential responses is large.

When interpreting the data, Likert scales with large numbers of items can sometimes be helpfully condensed into fewer, more meaningful categories. In Figure 3, eight items have been condensed into three categories. This may have resulted in a loss of analytical detail, but the new categories are easier to interpret.

Figure 3 Condensing categories
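The condensation step is easy to automate. Here is a minimal sketch in Python; the band boundaries and labels below are illustrative, not the ones actually used in Figure 3:

```python
# Sketch: collapsing a hypothetical 8-point frequency scale into three
# categories. The bands (1-3, 4-5, 6-8) and labels are assumptions.
collapse = {
    1: "rarely", 2: "rarely", 3: "rarely",
    4: "sometimes", 5: "sometimes",
    6: "often", 7: "often", 8: "often",
}

raw = [2, 5, 7, 1, 8, 4, 6, 3]           # hypothetical responses, coded 1-8
condensed = [collapse[r] for r in raw]   # e.g. 2 -> "rarely"
```

As the post notes, some analytical detail is lost in the mapping, but the condensed categories are easier to report and interpret.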

4. Mean is meaningless

The most common mistake in interpreting data produced by Likert scales is generating mean values for responses. I have ranted about this practice elsewhere, but here’s the gist: To facilitate coding or save space on a questionnaire, we sometimes substitute Likert-type data with numbers (e.g., Figure 4, top). However, these numbers are just descriptive codes, devoid of numerical value, except in the sense that they can be ranked (thanks, David, for pointing out my unhelpful wording!). Put differently, while a response of Strongly Agree shows more agreement than Agree, a response of Strongly Agree (5) does not show agreement that is five times stronger than Strongly Disagree (1). We could just as easily have used colours, or any other symbol to show the same effect (e.g., Figure 4, bottom).

Figure 4 Science Fiction Attitude Survey
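The point about codes being ordinal rather than numerical can be made concrete. In this sketch, the agreement labels follow the post and the 1–5 codes are the usual convention; the letter relabelling is an illustration of the “any other symbol” idea:

```python
# Sketch: Likert codes are ordinal labels, not quantities.
codes = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neither agree nor disagree": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}

# Ranking is meaningful: Strongly Agree expresses more agreement than Agree.
ranked = codes["Strongly Agree"] > codes["Agree"]

# Ratios are not: the same ordering survives any order-preserving
# relabelling, e.g. letters instead of numbers, so "5 is five times 1"
# carries no information about strength of agreement.
letters = dict(zip(codes, "ABCDE"))
```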

What all this means is that adding, multiplying or dividing Likert-type values does not make mathematical sense. It also means that calculating mean values makes as much sense as suggesting that ‘the average of a watermelon and two strawberries is an apple’. Other metrics, such as the median or the mode, are more appropriate. It follows that one should avoid statistical methods that rely on the mean: variability in a Likert scale should be estimated using the range and inter-quartile range, not the standard deviation. Parametric tests, such as t-tests, should be avoided; non-parametric tests such as the Mann-Whitney U-test, the Wilcoxon signed-rank test and the Kruskal-Wallis test can be used in their place. For similar reasons, bar charts, rather than histograms, should be used to present the data.
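As a minimal sketch of the appropriate descriptives, using hypothetical responses to a single 5-point item coded 1–5, Python’s standard library is enough; for the non-parametric tests mentioned above, `scipy.stats` provides `mannwhitneyu`, `wilcoxon` and `kruskal`:

```python
import statistics

# Hypothetical responses to one 5-point Likert item (1 = Strongly Disagree).
responses = [5, 4, 4, 3, 5, 2, 4, 5, 3, 4]

med = statistics.median(responses)   # central tendency: the median
mod = statistics.mode(responses)     # the most frequent response

# Spread: inter-quartile range, instead of the standard deviation.
q1, _, q3 = statistics.quantiles(responses, n=4, method="inclusive")
iqr = q3 - q1
```

Note that the medians and quartiles here respect only the ordering of the codes, which is exactly the property Likert-type data can support.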

[NB. Some very well-designed Likert scales can indeed produce data that are suitable for calculating means, or for running statistical tests that rely on the mean. These scales are the product of careful weighting and extensive testing across large numbers of respondents. Unless one is an experienced statistician, one is advised not to follow their example.]


The advice and opinions presented in the previous sections are intended to help readers use Likert scales more effectively in their research projects. Some more resources, which readers may find helpful, include the following:

  • Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 22(140).
    This is the seminal article which introduced Likert measurement.
  • Cohen, L., Manion, L., & Morrison, K. (2000). Research methods in education (5th ed.). New York: Routledge. (pp. 253-255)
  • Gilbert, G. N. (2008). Researching social life (3rd ed.). London: SAGE. (pp. 212 et seq.).
    Two widely available manuals which cover the use and limitations of Likert scales.
  • Jamieson, S. (2004). Likert scales: How to (ab)use them. Medical Education, 38(12), 1217-1218.
    A two-page article on the limitations of Likert scaling. The article covers much of the same ground as this post, in a somewhat more eloquent way.
  • Matell, M. S., & Jacoby, J. (1971). Is there an optimal number of alternatives for Likert scale items? Educational and Psychological Measurement.
  • Jacoby, J., & Matell, M. S. (1971). Three-point Likert scales are good enough. Journal of Marketing Research, 8(4), 495-500.
    These articles use empirical data to put forward the claim that Likert scales with too many options tend to be unreliable.
  • Norman, G. (2010). Likert scales, levels of measurement and the “laws” of statistics. Advances in Health Sciences Education, 15(5), 625-632.
    This is a ‘rogue’ article, where the argument is made that, despite what purists claim, parametric procedures are robust enough to yield usable findings even when fed with ordinal (i.e., Likert-type) data.

Featured Image by Michael Kwan [CC BY-NC-ND]

18 Replies to “Four things you probably didn’t know about Likert scales”

  1. Would you like to provide me simple and clear details step by step what correlation test to use with a 5 likert scale data? 1= strongly disagree, 3= neither agree nor disagree, 5= strongly agree. Variables are: Beliefs, attitude, intention, social pressure, behavior,…
    I am just confused whether I should use Pearson or Spearman. If Spearman (as recommended by most papers) how to start? here is my big problem.
    Thank you

  2. Can I transform the Likert scale of variable X: Let say the previous scholars using 5-points Likert scale for measuring variable X and I intended to use the same measurement but with 7-Likert scale. If possible, is there any restrictions or rules? Thank you.

  3. I found this post quite helpful to me, esp. the “getting even helps” part.
    If I’d like to cite this part in my paper, how would I cite or which paper(s) should I cite?

    1. I’m glad it was of some help! It’s likely that your tutors will prefer a reference to a proper statistics book, rather than a blog, but if you really want to cite me, here’s how to do it in APA style:

      In the text, you can use my name and the date of publication in brackets at the end of the sentence (Kostoulas, 2013).

      In the list of works cited, you can list the following information:
      Kostoulas, A. (2013). Four things you didn’t know about Likert scales. Retrieved from [url] on [date].

      Other citation formats will present this information in different orders, but I think this is all the information you need.

      1. It’s true that it’d be better to cite a book or a published paper but no book or paper that I read is so positive about using a scale of four or six items.

  4. Hi Achilleas, when you say “these numbers are just descriptive codes, devoid of numerical value” I know where you’re coming from. They are in one sense completely arbitrary. But a typical Likert item does have at least ‘ordinal’ numeric value. So, a 5 is greater than a 4 wrt to the concept measured. There’s clear consensus that ‘ratio’ quality is lacking (4 is not double 2). The disagreements tend to occur regarding the extent to which such items (or scales when aggregated) have ‘interval’ properties. Most would agree that it’s wrong to assume equal difference between a 5 and 3, as compared to a 4 and 2. However, from Nunnally onwards, many in social science have felt that Likert scales often have sufficient ‘equal interval’ properties to support the use of means, t-tests etc.
    An interesting discussion here:

    1. Thanks for this, David. What I was trying to say is that these numerical descriptors do not have the precision one would associate with numbers. I think you have explained it better than I have.
