This post discusses measures of central tendency in Likert scales, i.e., whether you should calculate means or medians, and how large a difference needs to be before it matters. In the paragraphs that follow, you will read:
- What the ‘true midpoint’ is in a Likert scale;
- Why a median is a preferable metric to the mean (or ‘average’);
- How to estimate statistical significance in Likert scale data.
Before we do any of this, let’s do some ground clearing. While writing this post, I will assume that you know what Likert items and scales are (if not, this post may be of some help), and you are familiar with the differences between levels of measurement (you can read more about this here). I will also assume that you know how to process data in SPSS. Finally, I will assume that you are a student working on an applied linguistics or ELT/TESOL project, although the content of the post is likely valid in most other disciplines as well.
With this in mind, let’s go ahead and answer the student’s question:
Could you please offer a brief explanation: what is the true mid-point of a 7-point Likert scale? Also, how much deviation in “mean” or “average scoring” is significant? In other words, is the difference between 2.09 and 2.39 significant? Or does the gap need to be wider to be of consequence? Thank you.
What is the midpoint of the scale?
The mid-point of a 7-point Likert item would be 4, i.e., the point equally distant from both ends of the scale. You should remember that Likert items extend symmetrically in two directions around this central point. This means that if the two ends of the scale are ‘strongly agree’ and ‘strongly disagree’, the mid-point represents a truly neutral stance.
If you had an item with an even number of response anchors, the midpoint would not be expressly represented in the scale. Rather, it would be a hypothetical point between the two options closest to the middle. In a scale between 1 and 4, this would be 2.5. Respondents would not be able to select it, but it would still be useful when it comes to analysing the data.
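If you prefer to see the arithmetic spelled out, here is a minimal Python sketch. The function name `scale_midpoint` is my own invention for illustration, and the sketch assumes that the response options are coded as consecutive whole numbers.

```python
def scale_midpoint(lowest: int, highest: int) -> float:
    """Return the point equally distant from both ends of a response scale."""
    if highest <= lowest:
        raise ValueError("highest must be greater than lowest")
    return (lowest + highest) / 2


# A 7-point item: the midpoint is an option respondents can actually select.
print(scale_midpoint(1, 7))  # 4.0

# A 4-point item: the midpoint is a hypothetical point between options 2 and 3.
print(scale_midpoint(1, 4))  # 2.5
```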
Means and medians in Likert scales
When it comes to calculating the central tendency (‘average’) in data, we have three options: the mode, the median and the mean. You can read more about the differences between these metrics here.
Some researchers (including many who lack formal training in statistical methods) may disagree, but I strongly believe that calculating the mean for data generated by a Likert item is a statistically flawed procedure. That is to say, a Likert item cannot meaningfully yield average values like, e.g., 2.09 or 2.39. The reason is that the data Likert items produce are not continuous.
Rather, Likert items produce ordinal data: the response options can be ranked, but we cannot assume that the distances between them are equal. This means that it is a safer choice to report their median. The median is the point that separates the upper and lower halves of the data if we arrange them from largest to smallest. You can read more about calculating the median here.
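To make the contrast concrete, here is a small Python sketch that uses only the standard `statistics` module. The responses are made up for the sake of the example; they do not come from the student’s data.

```python
from statistics import mean, median, median_low

# Hypothetical responses to one 7-point item
# (1 = strongly disagree, 7 = strongly agree).
responses = [1, 2, 2, 2, 3, 3, 7, 7, 7]

# The median is the middle value once the responses are put in order.
item_median = median(responses)

# The mean is shown only for comparison: it lands between two response
# options and treats the gaps between options as equal.
item_mean = mean(responses)

print(f"median = {item_median}, mean = {item_mean:.2f}")

# With an even number of responses, median() averages the two middle values.
# median_low() returns the lower of the two, which is always a real option.
print(median([1, 2, 3, 4]), median_low([1, 2, 3, 4]))
```

In this made-up sample the median is one of the options respondents could actually choose, whereas the mean is not.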
Estimating statistical significance
To estimate statistical significance, you need two numbers: the p-value (the probability of obtaining results at least as extreme as yours if chance alone were at work), and a cut-off value (which indicates how certain you need to be). When the p-value is smaller than the cut-off value, the findings are significant. The p-value depends on the distribution of your results (which are provided in the student’s question above), and your sample size (which is not), so it is difficult to tell, on the basis of the information in the example, whether the findings are likely to be significant. In some output tables (e.g., correlation tables), SPSS will flag statistically significant values with one or more asterisks. If it hasn’t, then the results are likely inconclusive.
With regard to the cut-off point, that is something of a judgement call. When putting forward the concept of statistical significance, Ronald Fisher was initially happy to accept any p-value under 0.05 as significant (i.e., if results this extreme would turn up by chance less than one time in twenty, he accepted them as statistically significant). This is, however, an arbitrary decision, and conventions vary across fields. In some areas of medical research, for instance, a 5% probability of error may be considered unacceptable, so stricter cut-off thresholds are sometimes used (p<0.005, or even p<0.001). To decide what an appropriate threshold of statistical significance is for your study, you’d need to know the conventional cut-off points for your discipline. Published literature will provide some clues as to what is customary in your field.
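For readers who would like to see how a p-value and a cut-off value come together outside SPSS, here is a rough Python sketch. It is only one possible approach and not a procedure this post prescribes: a permutation test on rank sums, which is in the spirit of the Mann-Whitney test and suits ordinal data. The helper names (`midranks`, `permutation_p_value`), the two sets of responses and the 0.05 cut-off are all assumptions made for illustration.

```python
import random


def midranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Sorted positions i..j-1 hold the same value and share ranks i+1..j.
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [rank_of[v] for v in values]


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value by shuffling the ranks between two groups."""
    rng = random.Random(seed)
    ranks = midranks(list(group_a) + list(group_b))
    n_a = len(group_a)
    # Rank sum we would expect for group A if the groups did not differ.
    expected = n_a * (len(ranks) + 1) / 2
    observed = abs(sum(ranks[:n_a]) - expected)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(ranks)
        if abs(sum(ranks[:n_a]) - expected) >= observed:
            at_least_as_extreme += 1
    # Adding 1 to both counts keeps the estimate from being exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical responses from two groups to the same 7-point item.
group_a = [1, 2, 2, 1, 2, 1, 2, 2, 1, 2]
group_b = [6, 7, 6, 7, 7, 6, 6, 7, 7, 6]

alpha = 0.05  # assumed cut-off; check what is customary in your field
p_value = permutation_p_value(group_a, group_b)
print(f"p = {p_value:.4f}; significant at {alpha}: {p_value < alpha}")
```

Note that the test needs every individual response, not just two summary values. That is why the figures in the student’s question (2.09 and 2.39) are not enough on their own to settle the matter.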
About this post: This post was originally written on 19th November 2013 in response to a question I was sent. The post was last revised on 8th February 2020. The featured image is by Michael Kwan, and shared via a CC BY-NC-ND license.