Suppose you have created a questionnaire with Likert-type questions, and you now want to interpret the data. This post contains some guidance on how to do this. Specifically, we will look into central tendency measures, such as means and medians. We will also see how one might decide whether a difference is large enough to be meaningful (i.e., statistically significant).

In the paragraphs that follow, you will read:

- What the ‘true midpoint’ is in a Likert scale;
- Why the median is preferable to the mean (or ‘average’);
- How to estimate statistical significance in Likert scale data.

## Assumptions

Before we do any of this, let’s do some ground clearing. While writing this post, I will assume that you already know what Likert items and scales are (if not, this post may be of some help), and that you are familiar with the differences between levels of measurement. I will also assume that you know how to process data in SPSS, so I will not give detailed instructions about this. Lastly, I will frame the discussion around the assumption you are working on an applied linguistics or ELT/TESOL project. However, the content of the post is likely valid in most other disciplines as well.

With this in mind, let’s go ahead and answer this question that a student asked:

> Could you please offer a brief explanation: what is the true mid-point of a 7-point Likert scale? Also, how much deviation in “mean” or “average scoring” is significant? In other words, is the difference between 2.09 and 2.39 significant? Or does the gap need to be wider to be of consequence? Thank you.

## What is the midpoint of a Likert scale?

Likert scales are ‘bivalent’ and ‘equidistant’. This is a sophisticated way to say that they extend symmetrically in two directions around a midpoint. The distance is measured in ‘steps’ around the mid-point, and these steps are usually numbered for convenience. These numbers are just labels, not true quantities: you should be very careful not to misuse them in calculations.

The mid-point of the scale is the ‘number’ that is equally distant from both ends of the scale. In the example above, the student is using a scale from 1 to 7, so the mid-point is 4. If the 7-point scale ranged between 0 and 6, the mid-point would be 3. Note that this would not change anything in the data or the information that the numbers encode.

Some scales have an even number of response anchors. In this case, the midpoint is not expressly represented in the scale. Rather, it is a hypothetical point between the two options closest to the middle. In a scale between 1 and 4, this would be 2.5. Respondents are not able to select it, but it is still quite useful when it comes to analysing data.
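The arithmetic behind these mid-points can be sketched in a few lines of Python. This is only an illustration, and the function name `scale_midpoint` is a hypothetical one chosen for this example; it assumes the response options are coded as consecutive whole numbers.

```python
def scale_midpoint(lowest, highest):
    """Return the point equally distant from both ends of a scale.

    Assumes the options are coded as consecutive whole numbers,
    e.g. 1..7 or 0..6.
    """
    return (lowest + highest) / 2


print(scale_midpoint(1, 7))  # 4.0 (7-point scale coded 1 to 7)
print(scale_midpoint(0, 6))  # 3.0 (same scale coded 0 to 6)
print(scale_midpoint(1, 4))  # 2.5 (even number of options)
```

Note that the mid-point depends only on how the ends of the scale are coded, not on any of the responses you have collected.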

One word of caution about interpreting data clustered around the mid-point: the correct interpretation of a mid-point is that it shows a truly neutral stance. Suppose you ask me to select an option between “1: strongly disagree” and “5: strongly agree” that statistics is hard. If I select “3”, this means that I do not have an opinion either way. This is not the same as arguing that statistics is “somewhat hard”.

## Central tendency in Likert scales: Means and medians

The mid-point is the ‘centre’ around which the scale was constructed. The central tendency is the ‘centre’ around which the responses have clustered. Imagine a 5-point scale, asking people whether pizza is tasty. The mid-point is 3, as is always the case with scales coded from 1 to 5. But the central tendency is likely to be closer to 5 or 1, depending on how you have coded ‘strongly agree’.

When it comes to calculating the central tendency (‘average’), we generally have three options: the mode, the median and the mean. These will usually produce similar results, but they show different things. You can read more about the differences between these metrics here.

Some researchers (including many who lack formal training in statistical methods) may disagree, but I strongly believe that calculating the mean for data generated by a Likert item is a statistically flawed procedure. That is to say, a Likert scale cannot meaningfully yield average values like, e.g., 2.09 or 2.39. The reason is that the data Likert items produce are not continuous.

Rather, Likert items produce **ordinal data**. This means that it is a safer choice to report their median. The median is the point that separates the upper and lower halves of the data if we arrange them from largest to smallest. You can read more about calculating the median in the post below.
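To make the contrast concrete, here is a minimal sketch in Python that summarises a small set of responses to a single 7-point item. The responses are made up for illustration, and the function name `summarise_likert` is hypothetical; the sketch uses only Python's standard `statistics` and `collections` modules.

```python
from collections import Counter
from statistics import mean, median, multimode


def summarise_likert(responses):
    """Summarise responses to one Likert item (coded as integers)."""
    return {
        "counts": dict(sorted(Counter(responses).items())),
        "modes": multimode(responses),  # most frequent option(s)
        "median": median(responses),    # appropriate for ordinal data
        "mean": mean(responses),        # shown only for contrast
    }


# Hypothetical responses on a 1-7 scale (not real data)
responses = [1, 2, 2, 2, 3, 7, 7]
summary = summarise_likert(responses)

print(summary["counts"])
print(summary["median"])  # 2
print(summary["mean"])
```

In this made-up example, the median sits on a value that respondents actually selected, whereas the mean is pulled towards the two responses at the far end of the scale. With an even number of responses, `median` averages the two middle values; `statistics.median_low` is an alternative if you would rather report an option that exists on the scale.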

## Estimating statistical significance

Sometimes, when analysing data, you may have to comment on the difference between two sets of responses. It is very unlikely that both sets will have identical central tendencies, but just stating that one is larger than the other is not very helpful. We also need to explain whether this difference means something, or whether it was just a product of chance. To do this, you need to estimate whether the difference is statistically significant.

To estimate statistical significance, you need two numbers: the **p-value** (a metric that shows how likely it is that a difference at least as large as yours would arise by chance alone), and a **cut-off value** (which indicates how certain you need to be). The p-value depends on the difference between the items you are comparing (which is provided in the student’s question above), on the distribution of your results, and on your sample size (which are not), so it is difficult to tell, on the basis of the information in the example, whether the findings are likely to be significant. It is a fairly complicated calculation, but if you’re using a statistical analysis package, the p-value will be automatically calculated and displayed in the output.

With regard to the cut-off point, that is something of a judgement call. Ronald Fisher, the statistician who popularised the concept of statistical significance, seemed happy to accept any *p* value under 0.05 as significant. This meant that results would be considered significant if the probability of them being due to chance was smaller than 1/20. This is, however, an arbitrary decision, and conventions vary across fields. In some areas of medical research, for instance, a 5% probability of error would be unacceptable, so the cut-off thresholds can be much lower (p<0.005, or even p<0.001). To decide what an appropriate threshold of statistical significance is for your study, you’d need to know the conventional cut-off points for your discipline. Published literature will provide some clues as to what is customary in your field.

When the p-value is smaller than the cut-off value, then the findings are significant. Normally, SPSS will flag statistically significant values in the output with one or more asterisks. If it hasn’t, then the results are likely inconclusive.
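If you do not have SPSS to hand, the logic of comparing a p-value against a cut-off can be sketched in Python. This is only one possible approach, not a prescription: it assumes two independent groups of respondents, and it approximates the p-value with a permutation test on the Mann-Whitney U statistic, which relies only on the order of the responses. The function names, the data and the 0.05 cut-off are all hypothetical choices made for this illustration.

```python
import random


def u_statistic(group_a, group_b):
    """Mann-Whitney U: count pairs where a > b; ties count as half."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Approximate a two-sided p-value by reshuffling group labels."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_a, n_b = len(group_a), len(group_b)
    centre = n_a * n_b / 2  # expected U if the groups do not differ
    observed = abs(u_statistic(group_a, group_b) - centre)
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled = abs(u_statistic(pooled[:n_a], pooled[n_a:]) - centre)
        if shuffled >= observed:
            hits += 1
    # Add-one correction keeps the estimate above zero
    return (hits + 1) / (n_permutations + 1)


# Hypothetical responses from two groups on a 1-7 scale (not real data)
group_a = [2, 3, 2, 1, 4, 2, 3, 2]
group_b = [3, 4, 5, 4, 3, 5, 4, 6]

cut_off = 0.05  # assumed convention; check what is customary in your field
p_value = permutation_p_value(group_a, group_b)
print(f"p = {p_value:.4f}; significant: {p_value < cut_off}")
```

The sketch reshuffles the group labels many times and checks how often a difference at least as large as the observed one appears by chance. A dedicated statistics package will usually give a more precise figure, so treat this as a way of seeing the reasoning rather than as a replacement for your usual software.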

## Before you go

If you arrived at this page while preparing for one of your student projects, I wish you all the best with your work. Also feel free to get in touch and ask any other questions you may have.

**Achilleas Kostoulas**

Achilleas Kostoulas is an applied linguist and language teacher educator. He teaches at the Department of Primary Education at the University of Thessaly, Greece. Previous academic affiliations include the University of Graz, Austria, and the University of Manchester, UK (which is also where he was awarded his PhD). He has extensive experience teaching research methods in the context of language teacher education.

**About this post:** This post was originally written on 19th November 2013 in response to a question that a student asked me. The post was last revised on 26th April 2023.
