Likert scales are among the most frequently used instruments in questionnaire surveys. They consist of a statement and a range of pre-defined responses which measure the intensity of one’s feelings towards the statement. Here’s an example:
Likert scales are easy for respondents to understand, and easy for researchers to interpret, which accounts for their widespread use in research. However, despite their popularity (or maybe because of it), they are not always used optimally. Here are some tips for increasing their effectiveness.
1. Lick, not Like
Likert scales were created by Rensis Likert, a sociologist at the University of Michigan whose name is properly pronounced “Lick – urt”. The pronunciation “like – urt”, though common, is incorrect.
2. Getting even helps
Most frequently, Likert scales consist of five items (as in the example above), and seven-item scales are also quite common. The mid-point of the scale is a ‘neutral’ option, such as “no opinion”, “neither agree nor disagree”, “not sure” or some similar phrase.
Such a practice is problematic for two reasons: Firstly, most respondents tend to avoid voicing extreme opinions – a phenomenon called the central tendency bias. Secondly, many respondents are averse to taking a stand on controversial topics. The combined effect of these tendencies is that, when presented with a ‘safe’ choice at the centre of the scale, respondents are likely to select that, rather than reveal their ‘true’ opinion.
By contrast, a four-item scale (or indeed any scale with an even number of items) encourages respondents to voice an opinion. In the following example, four items have been used in the Likert scale proper. These are flanked by an additional ‘opt-out’ option for those respondents who truly cannot respond, but the wording of the item and the layout discourage its unnecessary use.
3. More is not better
Some Likert scales contain large numbers of items (7, 9 or 10) to capture a variety of responses. While such scales seem quite sensitive to nuanced positions, they are not always very helpful. Firstly, any benefit from a large number of options is subject to the law of diminishing returns: from seven items upwards, the scales become too cumbersome to use, any additional benefits are cancelled out by respondent fatigue, and reliability plummets. Secondly, the analytical sensitivity of the scales is compromised by the fact that respondents tend to interpret them in different ways: what I describe as “often” may mean the same, in absolute terms, as what you might call “sometimes”. This phenomenon is amplified when the number of potential responses is large.
When interpreting the data, Likert scales with large numbers of items can sometimes be helpfully condensed into fewer, more meaningful categories. In Figure 3, eight items have been condensed into three categories. This may have resulted in a loss of analytical detail, but the new categories are easier to interpret.
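The condensing step above amounts to a simple mapping from response codes to broader labels. Here is a minimal sketch in Python; the cut-off points and category names are illustrative assumptions, not the ones used in Figure 3:

```python
# Condensing an 8-point Likert-type scale into three broader categories.
# The cut-offs (1-3 = "Disagree", 4-5 = "Neutral", 6-8 = "Agree") are
# hypothetical; in practice they should be justified by the scale's wording.

def condense(response: int) -> str:
    """Map an 8-point response (coded 1-8) to one of three categories."""
    if not 1 <= response <= 8:
        raise ValueError(f"response out of range: {response}")
    if response <= 3:
        return "Disagree"
    if response <= 5:
        return "Neutral"
    return "Agree"

responses = [1, 2, 4, 5, 7, 8, 8, 3]
print([condense(r) for r in responses])
```

Note that the mapping preserves the ordering of the original categories, so no ordinal information is distorted; only granularity is lost.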
4. Mean is meaningless
The most common mistake in interpreting data produced by Likert scales is generating mean values for responses. I have ranted about this practice elsewhere, but here’s the gist: To facilitate coding or save space on a questionnaire, we sometimes substitute Likert-type data with numbers (e.g., Figure 4, top). However, these numbers are just descriptive codes, devoid of numerical value, except in the sense that they can be ranked (thanks, David, for pointing out my unhelpful wording!). Put differently, while a response of Strongly Agree shows more agreement than Agree, a response of Strongly Agree (5) does not indicate agreement five times as strong as Strongly Disagree (1). We could just as easily have used colours, or any other symbols, to show the same effect (e.g., Figure 4, bottom).
What all this means is that adding, multiplying or dividing Likert-type values does not make mathematical sense. It also means that calculating mean values makes as much sense as suggesting that ‘the average of a watermelon and two strawberries is an apple’. Other metrics, such as the median or the mode, are more appropriate. It follows that one should avoid statistical methods that rely on the mean: variability in a Likert scale should be estimated using the range and inter-quartile range, not the standard deviation. Parametric tests, such as t-tests, should be avoided, and non-parametric tests such as the Mann–Whitney U-test, the Wilcoxon signed-rank test and the Kruskal–Wallis test used in their place. For similar reasons, bar charts, rather than histograms, should be used to present the data.
[NB. Some very well-designed Likert scales can, indeed, produce data that are suitable for calculating means, or for running statistical tests that rely on the mean. These scales are the product of careful weighting and extensive testing across large numbers of respondents. Unless one is an experienced statistician, one is advised not to follow their example.]
The advice and opinions presented in the previous sections are intended to help readers use Likert scales more effectively in their research projects. Some more resources, which readers may find helpful, include the following:
- Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 22(140).
This is the seminal article which introduced Likert measurement.
- Cohen, L., Manion, L., & Morrison, K. (2000). Research methods in education (5th ed.). New York: Routledge. (pp. 253-255)
- Gilbert, G. N. (2008). Researching social life (3rd ed.). London: SAGE. (pp. 212 et seq.).
Two widely available manuals which cover the use and limitations of Likert scales.
- Jamieson, S. (2004). Likert scales: how to (ab)use them. Medical education, 38(12), 1217-1218.
A two-page article on the limitations of Likert scaling. The article covers much of the same ground as this post, in a somewhat more eloquent way.
- Matell, M. S., & Jacoby, J. (1971). Is there an optimal number of alternatives for Likert scale items? Educational and psychological measurement.
- Jacoby, J., & Matell, M. S. (1971). Three-point Likert scales are good enough. Journal of marketing research, 8(4), 495-500.
These articles use empirical data to put forward the claim that Likert scales with too many options tend to be unreliable.
- Norman, G. (2010). Likert scales, levels of measurement and the “laws” of statistics. Advances in health sciences education, 15(5), 625-632.
This is a ‘rogue’ article, where the argument is made that, despite what purists claim, parametric procedures are robust enough to yield usable findings even when fed with ordinal (i.e., Likert-type) data.
Featured Image by Michael Kwan [CC BY-NC-ND]