**Welcome!** Chances are that you landed on this page looking for information on Likert scales and averages. If that is the case, you will probably want to skip directly to the part of this post where I talk about a common mistake people make with ordinal data and mean values. You should also take a look at the list of additional resources.

This post has been prompted by an edited collection that I was recently asked to review. Substantive comments on the book have been published elsewhere; but what I want to do in this post, instead, is share some thoughts regarding a common statistical mistake and a common misconception about published works.

Specifically, what sparked my interest was one study in the collection, which used Likert scales to record participants’ attitudes towards a certain educational construct. Those who are not familiar with the fascinating minutiae of quantitative research can find a discussion of Likert scaling and ordinal data in the section that immediately follows. Those of you who are unlucky enough to have studied statistics may want to skip to the next section.

## Likert scales and ordinal data

### What are Likert scales?

A Likert-type question (or ‘item’) asks respondents to select one of several (usually four or five) responses that are ranked in order of strength. Here’s an example:

*Indicate what you think about the following statements using the scale below: (1) Strongly Agree; (2) Agree; (3) Neither agree nor disagree; (4) Disagree; (5) Strongly Disagree*

a. Apples are rubbish | 1 | 2 | 3 | 4 | 5 |

b. Yoghurt is my favourite food | 1 | 2 | 3 | 4 | 5 |

c. Beans are evil | 1 | 2 | 3 | 4 | 5 |

d. Fish fingers and custard taste great | 1 | 2 | 3 | 4 | 5 |

Each of these items measures a *variable*, i.e., a construct about which we want to learn more. Sometimes, sets of similar items are dispersed in the same questionnaire. This helps researchers to probe different aspects of the same construct (or ‘*latent variable’*), by putting together information from all the related items. I will not go into any of this in more detail here, but if you want to find out more, this post has some additional information.

Likert scales are very frequently used to measure constructs like satisfaction rates, attitudes towards different things, and more. They are very flexible and very useful, provided you use them carefully.

### Interpreting Likert scales


Likert items and scales produce what we call ordinal data, i.e., data that can be ranked. In the example above, people who select response (1) to item (d) are more fond of fish fingers and custard than people who choose responses (2), (3), (4) and (5). People who choose response (2) like this snack more than those who choose responses (3), (4) and (5), and so on. In addition to being ranked, ordinal data can be tallied: for example, I might want to divide my sample by age group, count how many people chose each of the responses, and compare results across ages. This, however, is almost the extent of what one can do with such data.
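Tallying is the kind of operation ordinal data comfortably support. As a minimal sketch (in Python, with made-up responses and hypothetical age-group labels), counting how many people in each group chose each response might look like this:

```python
from collections import Counter, defaultdict

def tally_by_group(responses):
    """Count how many people in each group chose each response code (1-5)."""
    tallies = defaultdict(Counter)
    for group, answer in responses:
        tallies[group][answer] += 1
    return tallies

# Hypothetical (age group, response to item d) pairs, for illustration only.
responses = [("under 30", 1), ("under 30", 2), ("under 30", 2),
             ("30 and over", 4), ("30 and over", 5), ("30 and over", 5)]

tallies = tally_by_group(responses)
for group, counts in tallies.items():
    # Report every point on the scale, including options nobody chose (Counter gives 0).
    print(group, [counts[code] for code in range(1, 6)])
```

The counts can then be compared across groups without assuming anything about the distance between the response options.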

The problem with Likert items is that many researchers, including the ones whose paper prompted this post, tend to use them in order to do things that they were never designed to do. Calculating average scores is one of them, and here’s why it’s wrong:

Imagine that ten survey participants were asked about their attitudes towards fish fingers and custard. The table below shows a hypothetical distribution of answers:

| Response | n | % |
|---|---|---|
| Strongly agree | 1 | 10 |
| Agree | 1 | 10 |
| Neither agree nor disagree | 3 | 30 |
| Disagree | 2 | 20 |
| Strongly disagree | 3 | 30 |

#### The wrong way to do it

If I were interested in knowing the beliefs of a ‘typical person’ (whatever that might be), then I might be tempted to calculate a *mean score* for this data. ‘Mean’ is a technical word for ‘average’. To do this, I might use the following formula:

$$M = \frac{(n_1 \times w_1) + (n_2 \times w_2) + \dots + (n_k \times w_k)}{N}$$

where $n_i$ is the number of people who selected response $i$, $w_i$ is the weighting of response $i$, $k$ is the number of response options, and $N$ is the total number of respondents.

In the example above, this would yield:

$$M = \frac{(1 \times 1) + (1 \times 2) + (3 \times 3) + (2 \times 4) + (3 \times 5)}{10} = \frac{35}{10} = 3.5$$

Going back to the descriptors, I would then conclude that an ‘average’ response of 3.5 corresponds to something between ‘no opinion’ and ‘disagreement’. I would therefore report something along the lines of: ‘Our study revealed mild disagreement regarding the palatability of fish fingers and custard (*M*=3.5)’.
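For concreteness, here is the calculation just described, written out as a short Python function. It reproduces the ‘wrong way’ so that you can see exactly what it computes; the function name is illustrative, not a standard one.

```python
def weighted_mean(counts, weights):
    """Mean score from a frequency table: sum(count * weight) / number of respondents."""
    total = sum(counts)
    return sum(n * w for n, w in zip(counts, weights)) / total

# Counts from the fish-fingers-and-custard table (Strongly agree ... Strongly disagree)
counts = [1, 1, 3, 2, 3]
weights = [1, 2, 3, 4, 5]  # the conventional 1-5 coding
print(weighted_mean(counts, weights))  # 3.5
```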

#### A better way

~~Plainly put~~, the option suggested above is ~~statistical nonsense~~ *not an optimal interpretation (update: I feel less strongly about this than I used to in 2013, but I still think it is usually wrong)*.

For this interpretation to be valid, I would need to make assumptions like the following:

- Firstly, I would need to assume that the psychological distance between ‘strong agreement’ and ‘agreement’ is the same as that between ‘agreement’ and ‘no opinion’.
- A corollary of the above would be that the distance between ‘agreement’ and ‘strong disagreement’ is three times greater than that between ‘agreement’ and ‘strong agreement’.

The mathematical model needs these assumptions in order to work, but they are simply not in the questionnaire design. And even if we forced them into the questionnaire, that would constitute a gross distortion of psychological attitudes and the social world to fit our statistical mould.
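One way to see why these assumptions matter: the numbers 1 to 5 are only labels for an order, so any coding that keeps the same order is, in principle, just as defensible. The Python sketch below uses a hypothetical ‘stretched’ coding (not something respondents told us) to show that changing the spacing moves the mean, while the middle of the data stays between the same two categories.

```python
from statistics import mean, median_low, median_high

LABELS = ["Strongly agree", "Agree", "Neither agree nor disagree",
          "Disagree", "Strongly disagree"]
COUNTS = [1, 1, 3, 2, 3]  # from the fish-fingers-and-custard table

def summarise(codes):
    """Mean of the coded responses, plus the labels of the two middle responses."""
    values = [code for code, n in zip(codes, COUNTS) for _ in range(n)]
    lo, hi = median_low(values), median_high(values)
    return mean(values), LABELS[codes.index(lo)], LABELS[codes.index(hi)]

equal_steps = [1, 2, 3, 4, 5]   # the usual coding
stretched = [1, 2, 3, 4, 10]    # hypothetical: same order, much bigger final step

for codes in (equal_steps, stretched):
    m, lo_label, hi_label = summarise(codes)
    print(codes, "mean:", m, "| middle responses:", lo_label, "/", hi_label)
```

Both codings agree that the middle of the data lies between ‘Neither agree nor disagree’ and ‘Disagree’; only the mean depends on how far apart we decided the options are.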


To put it in the simplest terms possible: Ordinal data cannot yield mean values. If you think that they can (and some statistics guidance websites might encourage you to think so), you can still take your chances. But please make sure you justify your choice well when you write up your methods section.

A safer way forward, if you are interested in finding what the ‘average’ or ‘typical’ response is, is to look at the **median** response. The median is a type of average value, like the mean, except that it is the value exactly in the middle of the data once the responses are put in order, i.e., there are as many responses above it as below it. You can find out more about how to calculate the median here.
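If you already have a frequency table like the fish-fingers one, you don’t need to list every response to find the median. A minimal Python sketch (the function name is hypothetical) walks through the running totals to find the category, or pair of categories, that holds the middle response(s):

```python
from itertools import accumulate

LABELS = ["Strongly agree", "Agree", "Neither agree nor disagree",
          "Disagree", "Strongly disagree"]
COUNTS = [1, 1, 3, 2, 3]  # from the fish-fingers-and-custard table

def median_categories(labels, counts):
    """Return the category (or pair of categories) holding the middle response(s)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no responses to summarise")
    # 1-based positions of the middle response(s) once all answers are lined up in
    # order: one position for an odd total, two for an even total.
    positions = sorted({(total + 1) // 2, total // 2 + 1})
    cumulative = list(accumulate(counts))
    result = []
    for position in positions:
        # The first category whose running total reaches this position contains it.
        for label, running_total in zip(labels, cumulative):
            if running_total >= position:
                if label not in result:
                    result.append(label)
                break
    return result

print(median_categories(LABELS, COUNTS))
```

For the fish-fingers data the ten responses split evenly, so the middle lies between ‘Neither agree nor disagree’ and ‘Disagree’. Reporting both categories keeps the summary on the scale that respondents actually used.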

### More to read about Likert scales

If you came to this page looking for information on Likert scales, you may find the following posts useful: Things you don’t know about Likert scales, and On Likert scales, levels of measurement and nuanced understandings. I also recommend reading this overview of Likert scales and this post by Stephen Westland (University of Leeds), for a more nuanced understanding of Likert scaling, and an excellent discussion of how to analyse the data that these scales produce.

## On the peer review process

As I wrote at the beginning of this post, one of the papers in the volume that I reviewed made the statistical mistake that I just described, namely it described a set of findings that had been generated by extracting mean values from Likert items. In the authors’ defence, they were neither the first nor the last to engage in this controversial practice: averaging ordinal data is as widespread as it is wrong. Unfortunately, this problem had gone unnoticed by the editors of the collection, and by the peer-reviewers employed by the press. As the book had already been published, I was left wondering whether there was anything to be gained by flagging it at this stage.

### What went wrong with peer review in this instance?


It is the nature of the peer-review process that the people who review academic articles can make intelligent substantive judgements on the findings, but might not always have the requisite background to comment on the research process (or vice versa). For better or for worse, research methods are too diverse and too specialized for reviewers to have more than a passing acquaintance with most of them. In addition, there are limits to the time one can reasonably spend providing unpaid service to the profession, and these often preclude reading up on research methodology every time one comes across a novel research design.

Every now and then, reviewers have to take it on faith that the people who conducted a study knew what they were doing, and they must trust that there are no major flaws in the methods. So, rather than double checking on such matters, we tend to focus our feedback on more substantive aspects of the research (e.g., *Are the claims commensurate to the scope of the study? Do the findings add significantly to the existing body of knowledge?*). Mistakes in the methodology will, on occasion, slip by.

### What can you do if you come across mistakes in published research?

So the question I faced was: what should I do when asked to provide an informed opinion on the quality of a study that has a major flaw? This was not made any easier by the knowledge that the people responsible for finding this flaw had failed to spot it, or deemed it unimportant. So, in this case, I decided to let it pass. Besides, the findings of this particular study were inconclusive and broadly consistent with what was already known about the phenomenon in question. I therefore thought that there was little harm in having one more voice in the literature to add some more weak agreement to the prevailing views – even if the empirical evidence that informed this voice was not very strong.

If there is a take-home message from all this, it is that as a reader you should not put too much faith in the published literature. Just because something has made it into print does not mean it is right.

**About this post:** This post was originally written in 2013 for my blog (www.achilleaskostoulas.com). For reasons I do not fully understand, it has come to rank very highly on SERPs about Likert scale measurement. It has been revised several times since, with a view to making it more useful to readers who are looking for statistical advice. The last update was in February 2020. The featured image comes from The Leaf Project @ Flickr and is shared via a CC BY-SA 2.0 license.

Hi. I’m doing an experiment on liking and I used a Likert scale to gather data.

I only have one question, to be answered on this rating scale:

5-like very much

4-like much

3-moderately like

2-slightly like

1-not like

What am I going to do with the data? I’ve just tallied the results and I don’t know what to do next. Please help me.

PS. I used two groups, 20 respondents each.

Hi,

In my opinion, the best way to present a data set such as yours would be as a bar chart. This would help readers see how all the responses are distributed, without any loss of detail.

However, if you want to condense this information into a single number, you’ll have to use a measure of central tendency: the mode or the median. SPSS, or any equivalent statistical package, will generate both values for you, and I believe MS-Excel also contains a function for calculating these values. Even if you don’t have access to such software, your dataset is not too big, so it should be easy to calculate them manually.

The mode is the easiest to calculate: it’s just the option that most people chose. To calculate the median, you will have to arrange your responses in order of magnitude e.g.,

1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,4,4,5,5,5

Then you start crossing out one number from the end of the list and one from the beginning, and repeat until you are left with a single number (or two). That number is your median; if you are left with two different numbers, the median lies halfway between them.
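The crossing-out procedure translates directly into code. The Python sketch below applies it to the example list of twenty responses and uses `statistics.multimode` (Python 3.8+) for the mode; it is an illustration, not a replacement for SPSS or Excel.

```python
from statistics import multimode

responses = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5]

def cross_out_median(values):
    """Median by crossing out: drop the lowest and the highest value
    repeatedly until one or two values remain."""
    remaining = sorted(values)
    while len(remaining) > 2:
        remaining = remaining[1:-1]
    return remaining

print("left after crossing out:", cross_out_median(responses))
print("mode:", multimode(responses))  # every most-common value, in case of ties
```

Here the two values left over are both 3, so the median is 3, while the mode is 4.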

After describing the distribution of responses and their central tendency (mode/median), you would normally want to compare the two groups. If the responses are very different, you should then try to account for what might be causing the difference.

Hope that was of help.

Hi sir

I am using a Likert scale in a survey, as follows:

1 = fully implemented

2 = partially implemented

3 = not implemented yet

I want to analyse the data using the mean value. Can you tell me how to calculate the range of means that corresponds to each point on the scale?

The scale you are describing is not a Likert Scale. It is a simple ordinal scale, and as such it does not yield a mean. You can calculate a median value, following instructions found elsewhere in this blog.

Sir, then how do I mark them as 1, 2 or 3 based on their median value?

Here is how to calculate the median: http://achilleaskostoulas.com/2014/02/23/how-to-interpret-ordinal-data/ Note that it is not necessary to replace verbal descriptors with numbers.

I am very thankful to you, sir. Thank you very much.

Please tell me the difference between a Likert scale and an ordinal scale, as I have to defend my work. Please also share how to calculate ranges for the means of a Likert scale. Thanks a lot.

Please explain the proper way of doing the analysis, then. I don’t mind having it in my inbox. Thanks.

There are several statistical procedures you may want to use, depending on your research questions. If you need a measure of central tendency, you will have to calculate the median. You can refer to previous comments for instructions and examples of how to calculate that.

Hello, my case is a bit different: I applied a 4-item scale anchored from 1 (strongly disagree) to 5 (strongly agree) to evaluate a certain variable (e.g., attitude towards drug consumption) in an experiment with 80 subjects. Have I made a mistake by calculating the mean of each subject’s answers on that scale in order to summarize the data and create a macrovariable, “attitude”?

Say Peter’s answers were 2, 3, 4, 2. Peter’s mean for the variable attitude is 2.75. Can I work with that, bearing in mind that the objective of the study is to observe relationships between different variables (using ANOVA, correlations, etc.)? Thanks in advance.

Hi Milena,

I’d hesitate to answer without more context. My feeling is that for this kind of calculation to work, you need to assume that the ‘distance’ between “strongly disagree” and “disagree” is the same as the one between “disagree” and “not sure”, and so on. If you can convincingly argue that this is how your respondents interpreted the scale (perhaps the wording of the question encouraged them to think so?), the mean attitude might make some sense.

Otherwise, I think it would be preferable to use the median in each set of scales. So, to use your example, if Peter’s answers were 4, 3, 2, 2, the median would be 2.5. In most cases you will find that the difference is small in practice, but, in my opinion, it is theoretically important.
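Here is Peter’s example in Python, computing both the mean and the median of one respondent’s item scores. The dictionary layout is only an assumption about how the data might be stored; further respondents would be added in the same way.

```python
from statistics import mean, median

# Peter's answers to the four items of the attitude scale (from the comment above).
answers = {"Peter": [2, 3, 4, 2]}

def composite_scores(answers_by_person):
    """Return (mean, median) of each respondent's item scores."""
    return {name: (mean(items), median(items))
            for name, items in answers_by_person.items()}

print(composite_scores(answers))
```

With an even number of items, `statistics.median` averages the two middle values, which is how Peter’s median comes out as 2.5.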

If re-calculating the variables is no longer an option, I think that the best way forward would be to acknowledge in the write-up that this was not the statistically optimal treatment, and then go on to argue that it is nevertheless a robust procedure that will generate plausible results, even when its underlying assumptions are violated.

Thank you very much for your answer. I wish you all the best in your further research.

Hi, I have conducted a survey to see what influences people in choosing a destination. One of the questions is a Likert scale measuring the likelihood of visiting their dream destination, with 10 points, where 1 = not likely at all and 10 = very likely (I have not given a definition for the points in between). Does this mean that I can calculate the ‘mean’? What statistics should I calculate, please? Thanks :)

Hi Joanne,

The short answer is, yes, you can calculate the mean. Here’s a somewhat longer one, which explains why:

To begin with, I am not sure I’d call this type of item a ‘Likert’ item. A key difference is that Likert items are bivalent (i.e., they extend in two directions) and symmetrical, which is not the case with your item. Moreover, the ranks of a Likert scale could extend infinitely in both directions, which is not the case here. What you have is a rating scale that measures the likelihood of something happening.

The way I understand it, likelihood or probability can, in principle at least, be measured on a continuous cline ranging from ‘impossible’ (0%) to ‘certain’ (100%). Your scale reflects this, except that, for reasons of simplicity, you sensibly asked respondents to position themselves on a scale with 10 equidistant (evenly spread out) anchors. This type of rating scale generates what we call ‘interval’ data. These data are just like the ones that Likert-type items produce, with the added advantage that, because the anchors are evenly spaced, they can be used in more sophisticated ways. This includes reducing them to a mean value, and calculating the standard deviation to show how widely the responses are spread around the mean.
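With interval-type ratings like these, the mean and standard deviation are straightforward to compute. A minimal Python sketch follows; the five ratings are invented for illustration, and `statistics.stdev` gives the sample (n-1) standard deviation rather than the population one.

```python
from statistics import mean, stdev

def summarise_ratings(ratings):
    """Mean and sample standard deviation of 1-10 likelihood ratings."""
    return mean(ratings), stdev(ratings)

# Hypothetical ratings from five respondents, for illustration only.
ratings = [7, 9, 10, 4, 8]
avg, sd = summarise_ratings(ratings)
print(f"mean = {avg:.2f}, SD = {sd:.2f}")
```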

Hello, Achilleas!

Thank you for a useful and informative post (which I’ve just discovered) – though I disagree with your strong view of Likert scales always eliciting ordinal data.

*The* problem is assuming that the interval between two adjacent response options is always the same. This doesn’t make sense when labelling all the options, as this clearly makes the data ordinal (or nominal). However, if only the first and the last response options are labelled and the respondent is asked for the strength of their reported opinions/ feelings (e.g., on a scale of 1-6, where 1=very bad and 6=very good), then the intervals can be assumed to be equal.

I am not hoping to persuade you – I just think it is fair that this alternative point of view is added to this discussion.

Copy-pasting below something I wrote about this a while ago, with some references:

‘There has been some controversy regarding the nature of the data produced by self-reported scales, these being considered a grey area between ordinal and continuous variables (Field, 2009; Kinnear & Gray, 2008). Although attitudes and feelings cannot be measured with the same precision of pure scientific variables, it is generally accepted in the social sciences that self-reported data can be regarded as continuous (interval) and used in parametric statistics (Agresti & Finlay, 1997; Pallant, 2007; Sharma, 1996). […] Blunch (2008, p. 83) maintains that treating self-reported scales as interval/ continuous variables is most realistic if the scales have at least 5 possible values and the variable distribution is “nearly normal”.’

Agresti, A., & Finlay, B. (1997). Statistical methods for the social sciences (3rd ed.). Upper Saddle River, NJ: Pearson Education.

Blunch, N. J. (2008). Introduction to structural equation modelling using SPSS and AMOS. London: Sage.

Field, A. (2009). Discovering statistics using SPSS (3rd ed.). London: Sage.

Kinnear, P. R., & Gray, C. D. (2008). SPSS 15 made simple. Hove: Psychology Press.

Pallant, J. (2007). SPSS survival manual: A step by step guide to data analysis using SPSS for Windows (3rd ed.). Maidenhead: Open University Press.

Sharma, S. (1996). Applied multivariate techniques. New York: Wiley.

Hi there!

Thank you for this great post and for the effort you make to reply to all of the questions. I used a 3-point Likert scale (agree, disagree, uncertain) to measure attitudes towards writing. I feel that my data should be nominal. Is that correct? Another thing: I get confused when I try to interpret the mean, as some studies treat a mean of 2.3 as ‘uncertain’.

Nominal/categorical data are data that are categorically discrete (e.g., baby names, places of birth etc.). You could treat your data as nominal, if you want, but an argument can be made that these responses can be ranked: someone who ‘agrees’ has stronger beliefs than someone who is ‘uncertain’. If you treat the data as nominal, the only measure of central tendency you can use is the mode (i.e., you can only report which answer is the most common). If you treat them as ordinal data, then you can also calculate the median, which is a slightly more refined measure of central tendency. Either way, I would not recommend calculating a mean. The studies you mention seem to calculate a mean for these data and, unless they have a good rationale for doing so, may be statistically suspect.
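The difference between the two treatments shows up clearly in code. In this Python sketch (invented responses), the mode works on the labels directly, but the median needs the order to be stated explicitly: sorting the words themselves would put them in alphabetical order (agree, disagree, uncertain), which is not the order of the scale.

```python
from statistics import mode, median_low

# Scale order from lowest to highest agreement, as argued in the reply above.
ORDER = ["disagree", "uncertain", "agree"]

def nominal_mode(responses):
    """Most common answer; needs no ordering at all."""
    return mode(responses)

def ordinal_median(responses, order=ORDER):
    """Median answer, computed on positions in the scale rather than on the words."""
    ranks = [order.index(r) for r in responses]
    return order[median_low(ranks)]  # median_low keeps the result on the scale

# Hypothetical responses, for illustration only.
responses = ["agree", "uncertain", "uncertain", "disagree",
             "agree", "disagree", "disagree"]
print("mode:", nominal_mode(responses))
print("median:", ordinal_median(responses))
```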

Sir, my questionnaire uses a four-point scale. How will I interpret the data? What should I be looking for?

Hi sir, I am doing research with five-point Likert scales. May I know whether Likert scales are considered continuous or categorical scales? I need to know which test is suitable for comparing the demographic profile of my respondents with my dependent variables. Thanks, sir.

Hi,

You can find the information you asked for in the post above, or in this post.

I have developed 14 Likert-scale statements to be administered to 100 teachers. Please tell me how to present the results.

Bar charts work well.

Respected sir, thank you for your response. I still want to know a few more things, as I am a student and new to research work. I have administered a Likert scale to 100 teachers and collected the data. Please tell me how to measure the median and mode, how to express variability in terms of the range and interquartile range, and how to display the data in a dot plot.

Is this helpful?

Yes sir, it will be helpful, as I am doing an MEd. That information will help me complete my dissertation.

Best of luck!
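The teachers’ data aren’t available here, so this Python sketch reuses the twenty example responses from the earlier reply on calculating the median. It computes the range and the interquartile range with `statistics.quantiles` (Python 3.8+, `method="inclusive"`) and prints a plain-text dot plot. Statistical packages use slightly different quartile conventions, so SPSS may report somewhat different quartiles for the same data.

```python
from statistics import quantiles

responses = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5]

def spread(values):
    """Range and interquartile range of ordinal responses coded as numbers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"range": max(values) - min(values), "Q1": q1, "Q3": q3, "IQR": q3 - q1}

def text_dotplot(values, scale=range(1, 6)):
    """Print one row per scale point, with an 'o' for each response."""
    for point in scale:
        print(f"{point} | {'o' * values.count(point)}")

print(spread(responses))
text_dotplot(responses)
```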

Hi sir. I’m conducting a survey for my final paper. I’m using two questionnaires: one uses a three-point Likert scale and the other a five-point Likert scale. My study is about learning styles and language learning strategies. How am I supposed to find out the extent of the relationship between these two variables, and the extent to which my subjects use them?

Many thanks sir.

Spearman’s rank correlation can show whether there is a relationship between the two variables.
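Spearman’s rho is Pearson’s correlation computed on ranks, with tied responses sharing the average of their ranks. The minimal Python sketch below assumes that each respondent has one score on each questionnaire (how you derive those scores is a separate decision) and uses invented numbers. It returns the coefficient only; SPSS, or `scipy.stats.spearmanr` if you work in Python, will also give you a p-value. It would fail if everyone gave the same answer on one of the variables, since there is then nothing to correlate.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical scores: each respondent's score on the three-point questionnaire
# and on the five-point questionnaire, for illustration only.
styles = [1, 2, 2, 3, 3, 1]
strategies = [2, 3, 4, 5, 4, 1]
print(round(spearman_rho(styles, strategies), 3))
```

Because only ranks are used, it does not matter that the two questionnaires have different numbers of response options.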

Hello sir, I wanted to ask how to key in the data for my research. My group is doing research on city bus users’ experiences from the safety and technical perspectives. We are using a Likert scale with two categories of questions, safety and technical, each with its own questions: for example, ‘the bus condition is clean’ is part of the technical category and ‘I’ve ever been harassed in the bus’ is part of the safety category. So how do I calculate results from the respondents’ answers, and how do I key them into SPSS? Thank you.

Hello Nadia,

I am not sure I understand your question. You should normally define a different variable for each questionnaire item. SPSS will automatically name the variables for you, but you can use names such as SAF01, SAF02, TEC01, TEC02, etc. for convenience. Each variable should have several values (one for each possible answer, such as ‘strongly agree’, ‘agree’, and so on). You can use numbers to correspond to these values (e.g., Strongly Agree = 1, Agree = 2, etc.). Then you enter the answers given by each participant: for instance, the respondent to Questionnaire 1 replied “1” to SAF01, “3” to SAF02, and so on. That should take care of keying in.
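The layout described above (one row per questionnaire, one numeric column per item) is the same whichever software you use. As a hedged illustration, this Python sketch turns verbal answers into codes and writes them to CSV, a plain-text format that SPSS and most other packages can import. The answers are invented, and a missing answer would raise a `KeyError` rather than being handled.

```python
import csv
import io

# Numeric codes for each verbal answer, following the reply above
# (Strongly Agree = 1, Agree = 2, and so on).
CODES = {"Strongly agree": 1, "Agree": 2, "Neither agree nor disagree": 3,
         "Disagree": 4, "Strongly disagree": 5}
ITEMS = ["SAF01", "SAF02", "TEC01", "TEC02"]  # one variable per item

# Hypothetical answers, one dict per returned questionnaire.
questionnaires = [
    {"SAF01": "Strongly agree", "SAF02": "Neither agree nor disagree",
     "TEC01": "Agree", "TEC02": "Disagree"},
    {"SAF01": "Agree", "SAF02": "Agree",
     "TEC01": "Strongly disagree", "TEC02": "Neither agree nor disagree"},
]

def to_rows(questionnaires):
    """One row per respondent, one numeric column per questionnaire item."""
    rows = []
    for number, answers in enumerate(questionnaires, start=1):
        row = {"ID": number}
        row.update({item: CODES[answers[item]] for item in ITEMS})
        rows.append(row)
    return rows

buffer = io.StringIO()  # swap for open("survey.csv", "w", newline="") to save a file
writer = csv.DictWriter(buffer, fieldnames=["ID"] + ITEMS)
writer.writeheader()
writer.writerows(to_rows(questionnaires))
print(buffer.getvalue())
```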

To answer “how you calculate”, I would need to know what you are trying to find out. It is your research questions, not the data, that drive the methods.

Hello sir, I am working on my paper, which aims to determine the level of students’ awareness of disaster risk reduction (specifically for the following natural disasters: earthquake, fire, flood, and typhoon). A 48-item test was made to determine their level of awareness. It is composed of 12 items per disaster (e.g., 3 questions for earthquake prevention and mitigation, 3 questions for earthquake preparedness, 3 questions for earthquake response and 3 questions for earthquake recovery and rehabilitation), which adds up to 48 questions. There are 167 students as my respondents (students from Grades 5, 6, 7, 8 and 9). Please help me with how I should use the data collected to interpret the level of awareness. The data I gathered were the scores of the students per grade level and the number of errors per item. Would it be appropriate to use a Likert scale for the level of awareness (1 = not at all aware, 2 = slightly aware, 3 = moderately aware, 4 = very aware, 5 = extremely aware) and assign a range of scores to each level of awareness, or do I need to use a statistical tool to interpret my data? Please shed some light on my concern. Thank you and more power.

Hi Helen,

If I understand correctly, you have two questions: (a) what kinds of statistical procedures you need to run in order to analyse your data, and (b) how to do these procedures (i.e., what ‘statistical tool’ to use).

The methods of analysis you should use depend on your research questions, i.e., what you are interested in finding out. I am afraid that I cannot answer your question without knowing what the research questions are. However, you will likely need to estimate the central tendency and dispersion for each item: you can find some ideas here. You might also need to compare responses given by different groups of your sample, e.g., responses pertaining to earthquakes against responses pertaining to fire. This is done by crosstabulating: creating a contingency table where one variable (e.g., Grade) forms the rows and another (e.g., questionnaire item) forms the columns. Ideally, you should also run a test like chi-square to determine the statistical significance of your findings, but it seems that you have too many contingencies and the sample is not large enough for that. It might also be possible to conflate some of the items to reduce the number of contingencies, but I can’t advise you about that without knowing how your questionnaire items were worded.
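To make the crosstabulation idea concrete, here is a minimal Python sketch with invented data for a single test item. It builds the contingency table, computes the Pearson chi-square statistic and its degrees of freedom, and counts cells whose expected count falls below 5, a common rule of thumb for when the test becomes unreliable (which is exactly the risk with many contingencies and a modest sample). It does not compute a p-value; SPSS reports one, or you can look the statistic up in a chi-square table.

```python
from collections import Counter

def crosstab(pairs):
    """Build a contingency table from (row_label, column_label) pairs."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

def chi_square(table):
    """Pearson chi-square statistic, degrees of freedom and expected counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    stat = sum((obs - exp) ** 2 / exp
               for obs_row, exp_row in zip(table, expected)
               for obs, exp in zip(obs_row, exp_row))
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof, expected

# Hypothetical (grade, answer) pairs for a single test item, for illustration only.
pairs = ([("Grade 5", "correct")] * 6 + [("Grade 5", "wrong")] * 9
         + [("Grade 9", "correct")] * 12 + [("Grade 9", "wrong")] * 3)

rows, cols, table = crosstab(pairs)
stat, dof, expected = chi_square(table)
small_cells = sum(e < 5 for row in expected for e in row)
print(rows, cols, table)
print(f"chi-square = {stat:.2f}, df = {dof}, cells with expected count < 5: {small_cells}")
```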

As for the latter question, a statistical package such as SPSS could make your work much easier, but if you don’t have access to that, many of the procedures can be done with widely available packages, like Excel. There is some advice about that in this post.

My sincere gratitude for your quick reply to my concern. I am interested in knowing the level of awareness of my students with regard to earthquake, fire, flood and typhoon along the four thematic areas: prevention and mitigation, preparedness, response, and recovery and rehabilitation.

Specifically, this is the statement of my research:

The study seeks to analyze the integration of risk reduction in the science curriculum along prevention and mitigation, preparedness, response, recovery and rehabilitation.

Specifically, this study seeks to:

1. Determine the level of awareness of students on disaster risk reduction

2. Identify the integration of disaster risk reduction concepts in the Science Curriculum along the four thematic areas.

3. Identify the strategies in integrating disaster risk reduction as utilized by the teachers.

The first question will be answered using the 48 item questionnaire (multiple choice type of test with the table of specification).

Estimating the central tendency (median) and dispersion (interquartile range) is a good start.

Great article on ordinal data. Learnt quite a lot.

Thanks for saying so! Appreciated!

I thoroughly enjoyed your post.

I would like to know if you have a reference I could use, especially one that mentions the formula you used to calculate a mean score.

Thanks Erica! I haven’t used a reference because I was writing from memory (In the post, I only intended to mention statistics in passing). That said, I think that most statistics books will have some reference on how to calculate the mean (e.g., Muijs 2004: 99). You will probably find it harder to find a reference about calculating the mean for likert scales in particular, because it is considered by many statisticians to be a poor practice. This post contains some references to statistics books where it is argued that you can, in some cases, calculate the mean – I am not sure whether any of them provides a formula though.

Hello Achilleas,

Thanks for your quick answer. I’m doing a survey in my area of research. I used the Likert scale method to formulate the questions, but I’m having difficulty finding the best way to analyze the responses. I’ll look at the suggested references. Thank you for your time.

Best regards,

Hi, thank you so much for your post. It is extremely helpful.

I just have one question.

If my Likert scale has both positive and negative statements, how should I calculate the total score?

You reverse the coding for the negative statement(s). So for instance, if ‘strongly agree’ was coded 5, you make it 1; ‘agree’ becomes 2, and so on. Thanks for the kind comment!
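Reverse coding is a one-line calculation: on a scale running from low to high, a score becomes (low + high) minus the score, so on a 1-5 scale 5 becomes 1, 4 becomes 2 and 3 stays 3. The Python sketch below applies it before adding up a respondent’s answers; which items count as negatively worded (passed in by position) is a hypothetical example.

```python
SCALE_MIN, SCALE_MAX = 1, 5

def reverse_code(score, low=SCALE_MIN, high=SCALE_MAX):
    """Flip a response on a low..high scale: 5 -> 1, 4 -> 2, 3 -> 3, and so on."""
    return low + high - score

def total_score(answers, negative_items):
    """Sum one respondent's answers after reverse-coding the negatively worded items.

    negative_items holds 0-based positions of the negatively worded statements.
    """
    return sum(reverse_code(a) if i in negative_items else a
               for i, a in enumerate(answers))

# Hypothetical respondent: items 2 and 4 (positions 1 and 3) are negatively worded.
print(total_score([5, 1, 4, 2], negative_items={1, 3}))
```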

Hi Sir,

I used a survey to collect data on a 5-point Likert scale. my research operating model is based on Structural Equation Modelling (SEM). I am planning to use SPSS AMOS to do the analysis. The question is, how can I use my Likert scale data in the AMOS? Please help.

Thanks

I am sorry, I do not provide AMOS tutorials in this space.

How are you, Achilleas? I am conducting my research on the corporate social responsibility of hotels. The aim is to identify the corporate social responsibility practices of the hotels. I use a five-point Likert scale (yes, occasionally, no, don’t know and not applicable) and a five-point strongly agree… strongly disagree scale. I want to use the mean to identify the activities with a cut-off point. Is it possible to reject mean points below the cut-off and accept those above it? Thanks in advance.

Hi Tesfaye. A few points:

1. The first scale you describe is not a Likert scale.

2. Both scales produce ordinal data, so calculating their mean is problematic. You should use the median instead.

3. I am not sure I understand what you mean by cut-off point.

By cut-off point I mean identifying the activities of the hotels based on the responses: add the five scale values and divide by five ((5+4+3+2+1)/5 = 3). Means below 3 are taken as negative and rejected; means above 3 are taken as positive and accepted. Based on this, I identify the corporate activities of the hotel. Thank you. There is also one more question: are questions with the options very large level, large level, no level, low level and very low level considered Likert scale questions? Thanks very much. I like your comments.


No, it’s not. It is a scale, but not a Likert scale. Likert scales are symmetrical, around a central point.

Hi. I am using an eating behaviour questionnaire with a 1-5 Likert response scale. The questionnaire has 3 subscales measuring 3 dimensions of eating behaviour. Can I use the mean scores for each subscale for further analysis? I am unable to use the total score of each subscale, as there are missing data points in some questionnaires. Also, the distributions of the responses (mean scores) for each subscale are highly skewed. I have a sample size of 500+. Can I use Pearson’s correlation, or do I have to go with Spearman’s? Thanks

The first problem to address is the missing data. These have to be filled in. There are several ways to do this (e.g., insert the mode, use a central value as a default answer), but if you are working on an academic paper you must document how you addressed this problem in the methods section of your paper.

Regarding the other questions:

– It’s best not to use the mean.

– I do not know why the responses are skewed, and cannot advise you further without consulting your data, questionnaire and research questions.

– It depends on what part of the data you are using.

Thanks for the advice. On the missing data, when you say insert a central value, do you mean the median? Can I insert the mode/central values in SPSS? Also, would it be wrong to replace the missing values with the ‘person mean’ or ‘item mean’, as suggested by Downey & King (1998)? And how about multiple imputation? (Sorry about the string of questions!)

I used the mean scores in my analysis, as this was the method used in papers by the original author of the questionnaire (the Dutch Eating Behaviour Questionnaire). I presume their data followed a normal distribution. In my case the data are skewed, as many responses were low values (1 or 2) on the Likert scale for certain eating behaviour subscales.

Inserting the median value for each item (the item median) is one option. Another one is inserting the median value for each person, assuming that your scale is cohesive. Or you might argue that participants who didn’t answer have no strong views either way, so you would insert the central value (i.e., 3 in a five-point scale). Using Person and Item means are also reasonable options, but I would need to know more about your data before recommending either, and I am also rather wary about using means in ordinal data. Overall, there is no single ‘right’ method. Rather, there are advantages and disadvantages associated with each alternative, but going into them would involve an extended discussion, which is outside the scope of a blog comment.

It is possible that the data in the original paper were normally distributed, in which case an argument can be made for using parametric methods, but it seems like a bad idea in your case.
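The simpler options mentioned above (item median, person median, central value) can be sketched in a few lines of Python. This is a minimal illustration rather than a recommendation: it assumes each participant's answers are stored as a list of integers on a 1–5 scale, with `None` marking a missing answer, and the data and function names are hypothetical. Note that an item or person median can fall half-way between two codes (e.g. 3.5) when the number of available answers is even.

```python
from statistics import median

# Hypothetical data: one row per participant, one column per item (1-5 scale).
# None marks a missing response.
responses = [
    [4, 5, None, 4],
    [2, None, 3, 2],
    [5, 4, 4, None],
    [3, 3, 2, 3],
]

def impute_item_median(rows):
    """Replace each gap with the median of that item across participants."""
    n_items = len(rows[0])
    medians = [median(r[i] for r in rows if r[i] is not None) for i in range(n_items)]
    return [[medians[i] if v is None else v for i, v in enumerate(r)] for r in rows]

def impute_person_median(rows):
    """Replace each gap with the median of that participant's other answers
    (only sensible if the scale is cohesive)."""
    out = []
    for r in rows:
        m = median(v for v in r if v is not None)
        out.append([m if v is None else v for v in r])
    return out

def impute_central_value(rows, centre=3):
    """Treat a missing answer as 'no strong view either way'."""
    return [[centre if v is None else v for v in r] for r in rows]

print(impute_item_median(responses))
print(impute_person_median(responses))
print(impute_central_value(responses))
```

Whichever option is chosen, it should be documented in the methods section, as noted earlier in this thread.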

Good evening, sir. How should I use the 8-point Likert scale to compute the results of my questionnaire, taken from Mohammadi’s research? The questionnaire consisted of 27 items which asked learners to rate their replies on an 8-point Likert scale ranging from “Strongly Agree = 8” to “Strongly Disagree = 0”. It assessed learners’ views towards second language learning in five areas, namely self-image, inhibition, risk-taking, ego-permeability and ambiguity tolerance. Thank you.

I am afraid that I am not familiar with the research you are describing, so I cannot comment on it. The way you will use your data depends on your research questions – not your instrument, i.e. it depends on what you are trying to find out with your research.

Hi Achilleas

This blog post has really sorted out most of my concerns regarding data analysis. But I still have a few queries, and the list is a little bit extensive.

I am doing research on the perception of credibility of online information and news.

1. I am asking respondents the degree to which they agree with a few statements, e.g. “online information is believable”, on a scale of 1–10, where 1 means not believable at all and 10 means 100% believable. There are four similar statements, and their mean will make up the credibility index.

So should I consider it as a simple rating scale or as a Likert scale? And if it is a Likert scale, should the analysis be done by taking the median?

Previous studies have used the mean to calculate the same index.

2. There is another series of questions on a 1–10 scale, where I ask respondents how often they do the following online: “check the information’s author”, “check the author’s credentials”, etc.

So how should I analyse this set of data? My inclination is towards calculating the mean.

Thanks in advance

Thanks for your kind comments. The scales you’re describing are not Likert scales, and the data they produce are continuous (or ‘scale’). You can use the mean in these cases.

Thanks. Solves the problem completely.

[…] agree nor disagree, based on the simple arithmetic that the mean of two 2s and one 5, (2+2+5)/3, is 3. Mr. Achilleas Kostoulas explains it in more detail in his blog. Many others point out the limitations of the Likert scale, in […]

I have 16 five-point Likert questions for one dependent variable. How should I compile these 16 questions, and the 9 independent variables? Which type of correlation is the best method to see the relationship?

I am sorry, I am afraid I cannot answer this question because it’s too vague. To be able to give you useful advice I’d need to know more about your research questions, your sample and your data. However, you might find some help in this post.

The output looks OK. If you go back to your data, you will see a new variable has been created. You can run descriptive statistics (e.g., calculate the median) on the new variable.

Very well written. I have trouble making people believe that using the average or weighted average for a Likert scale is not appropriate. They just mention that their university lecturers taught them to do it this way. I wonder whether I should write to the university and tell them that they are wrong.

Hi Achilleas,

I did a survey about retargeting and want to investigate the influence of cookie knowledge and privacy concern on the attitude toward retargeting. I used one true/false scale (about cookie knowledge) and 4 Likert scales: attitude toward retargeting (6 questions, 5-point items), privacy concern (5 questions, 7-point items), attitude toward advertising in general (9 questions, 5-point items) and persuasion knowledge (6 questions, 5-point items).

I have already entered them into SPSS and calculated the median of each scale. Now I would like to investigate whether higher cookie knowledge means a more positive attitude toward retargeting, and whether age has an influence on this (e.g. do younger people have higher cookie knowledge and thus a more positive or negative attitude toward retargeting?).

I don’t know how to look into this the right way and what kind of analysis I need to use to get the right information. I hope you are able to help me.

Thanks in advance.

Greetings, Karen A.

Hi Karen,

One way to approach this problem would be to run a cross-tabulation. This would compare the actual responses of people in the ‘yes’ and ‘no’ categories, against what they might have responded if cookie knowledge did not influence responses. You can find instructions on how to do this here. You can confirm whether the difference is statistically significant by running a test called ‘chi-square’. To do this, you need to check the ‘chi-square’ box in the ‘Cross-tabs: statistics’ dialogue box (the second one in the webpage that I linked). Best of luck with your project.
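For readers working outside SPSS, the logic of the cross-tabulation and chi-square test can be sketched in plain Python. This is a minimal sketch under stated assumptions: the counts are invented, the rows stand for two cookie-knowledge groups and the columns for a collapsed attitude variable, and only the Pearson chi-square statistic and its degrees of freedom are computed. In practice a library routine such as `scipy.stats.chi2_contingency` would also return the p-value.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a cross-tab.

    table: list of rows (e.g. cookie knowledge 'no' / 'yes'),
    each a list of counts per response category.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect if the rows had no influence on the columns
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts: rows = cookie knowledge (no, yes);
# columns = attitude toward retargeting (negative, neutral, positive)
observed = [
    [30, 20, 10],
    [15, 20, 25],
]
stat, dof = chi_square(observed)
print(round(stat, 2), dof)  # 11.43 2
```

With 2 degrees of freedom the 5% critical value is roughly 5.99, so these invented counts would indicate a statistically significant difference. The test is usually considered unreliable when expected counts are small (a common rule of thumb is below 5).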

Hello, thank you for your life-saving post! I tried to combine data for some questionnaire questions and summarize my data using the information above, but the median value is something like 2.5 or 1.5 for some items. My data are quantitative, from a 5-point Likert scale. What should I do now? How should I interpret my data?

Thanks for your nice comments. Having a decimal in the median is not unusual, and you shouldn’t worry about that. Now, as for interpreting the data, it’s difficult to give advice without knowing more about your project and dataset.

The 2.5 median seems to suggest a balanced set of opinions. This could be because most people answered near the ‘centre’ or because responses were evenly split between very positive and very negative ones. The IQR could give you a clue about that.
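The point about the IQR can be illustrated with two invented sets of responses that share a median of 2.5 but tell very different stories. The sketch uses Python's `statistics` module; note that `statistics.quantiles` uses the 'exclusive' method by default, and other software (SPSS included) may compute quartiles slightly differently.

```python
from statistics import median, quantiles

def median_and_iqr(scores):
    """Return the median and the interquartile range of a list of scores."""
    q1, _, q3 = quantiles(scores, n=4)  # default method='exclusive'
    return median(scores), q3 - q1

# Hypothetical responses on a 1-5 scale (an even number of responses,
# so the median falls half-way between two values)
clustered = [2, 2, 2, 3, 3, 3]   # most people answered near the same point
polarised = [1, 1, 1, 4, 5, 5]   # responses split between the extremes

print(median_and_iqr(clustered))  # (2.5, 1.0)
print(median_and_iqr(polarised))  # (2.5, 4.0)
```

The same median, but an IQR four times larger, flags the second set as divided rather than lukewarm.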

Thank you so much!

Hi Achilleas!

You’re a great help to all of us seeking answers to our questions.

I need your ideas on the tools to be used in an experimental study entitled: ANTI-PRURITIC ACTIVITY OF PLUMERIA ACUMINATA (KALACHUCHI) BARK LATEX IN ALBINO MICE WITH INDUCED LOCAL PRURITUS.

Five treatment groups (six mice per group) are compared: two control groups (positive control and negative control) and three experimental groups (50%, 75% and 100% solution). I want to find out whether there is a significant difference between 2 groups and also among the 5 groups. Should I go for parametric or non-parametric statistics?

A 4-point Likert scale is used (0 – no pruritus, 1 – mild but not causing impairment, 2 – moderate, causing impairment, 3 – severe, causing sleepless nights), which is ordinal.

Another 4-point scale is used (0 – complete relief, 1 – significant improvement, 2 – mild improvement, 3 – no improvement).

Please enlighten me.

Thank you in advance, and more power!

Hi! I just wanted to point out that the scale you are describing is not a Likert scale, so the caveats I’ve discussed in my post do not necessarily apply. Rather, this is an ordinal (or arguably interval) scale.

Your choice of statistical methods (parametric or non-parametric) will depend on whether you consider the scale to be ordinal or interval; it will also depend on the distribution of your data. In case of doubt, go for non-parametric procedures, just to be on the safe side.
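If the non-parametric route is taken, one common choice for comparing all five groups at once is the Kruskal-Wallis H test, a rank-based alternative to one-way ANOVA. The sketch below computes H with the usual correction for ties, using invented pruritus scores on the 0–3 scale; the group names and data are hypothetical. `scipy.stats.kruskal` performs the same calculation and also returns a p-value.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run of tied values occupying sorted positions i..j
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic, corrected for ties."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n), t = size of each tied group
    tie_sizes = [pooled.count(v) for v in set(pooled)]
    correction = 1 - sum(t ** 3 - t for t in tie_sizes) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


# Hypothetical pruritus scores (0-3) for six mice per group
negative_control = [3, 3, 2, 3, 3, 2]
positive_control = [0, 1, 0, 1, 0, 0]
dose_50 = [2, 2, 1, 2, 3, 2]
dose_75 = [1, 2, 1, 1, 2, 1]
dose_100 = [1, 0, 1, 1, 0, 1]

h = kruskal_wallis_h(negative_control, positive_control, dose_50, dose_75, dose_100)
print(round(h, 2))
```

H is normally referred to a chi-square distribution with k - 1 degrees of freedom (4 for five groups); with groups as small as six animals, some texts recommend exact tables instead.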

Hi Achilleas,

Please, I need your opinion on this. I conducted research involving 668 respondents, using a questionnaire with a modified 4-point Likert scale. The questionnaire contained 50 items arranged into 10 sections. I analysed my data using frequency counts and percentages, which to me give more clarity. But an argument came up that the mean and standard deviation were the appropriate technique for the kind of data I collected. What’s your opinion? Also, what do you think about 4-point Likert scales? Thanks.

1. The mean and standard deviation could be argued to be appropriate, as long as your ten scales are internally consistent (i.e., have a high Cronbach alpha), and, ideally, if their responses are distributed normally. In such a case, these measures would give you a more refined picture of your data, compared to modes and IQRs. I would still argue that, on theoretical grounds, it’s wrong to use such measures with ordinal data, but there are convincing arguments (including Likert’s own thesis) that a well-crafted scale (i.e., a composite of multiple items) will produce data that behave as if they are interval/continuous, so it makes practical sense.

2. Personally, I prefer using 4-point scales, because they make a person choose a positive or negative response (they are sometimes called forced-choice scales). As a result, your data are less likely to show the effect of ‘central tendency bias’, i.e., the tendency among respondents to select the ‘neutral’ response. A possible counter-argument is that doing so will lose some of the granularity of the scale. So, I guess, this is not a question of choosing a ‘right’ or ‘wrong’ method, but rather the method that is a better fit to your research needs.
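Internal consistency of this kind is usually checked with Cronbach's alpha, which compares the variances of the individual items with the variance of their summed score. A minimal sketch, assuming complete responses (no missing values) and invented data for one four-item section of a 4-point questionnaire:

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a multi-item scale.

    rows: one list per respondent, one score per item, no missing values.
    """
    k = len(rows[0])                                  # number of items
    items = list(zip(*rows))                          # scores grouped by item
    sum_item_var = sum(variance(item) for item in items)
    total_var = variance([sum(r) for r in rows])      # variance of summed scores
    return k / (k - 1) * (1 - sum_item_var / total_var)

# Hypothetical answers from five respondents to a four-item, 4-point section
section = [
    [4, 4, 3, 4],
    [2, 2, 2, 1],
    [3, 3, 4, 3],
    [1, 2, 1, 1],
    [3, 4, 3, 3],
]
print(round(cronbach_alpha(section), 2))  # 0.95
```

The closer alpha is to 1, the more consistently the items behave as a single scale.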

Thanks for your prompt response. Based on your answer, theoretically speaking it is better to stick to the rules for analysing ordinal data. My worry is that I might not be able to present my report as clearly if it is analysed with the mean and standard deviation as I can when using percentages and frequencies. In conclusion, what am I going to lose if I stick with frequencies and percentages?

Thank you.

Dear Achilleas

You are absolutely amazing; you have already saved me so much time with some of your answers.

I have one more small question left, though. I ran an employee engagement survey containing 12 factors, each measured with 3 items (36 items overall), on a 1-to-6-point scale. I am conducting a regression analysis and would like to know how to find the value for each factor. For instance, Vigor was measured with 3 items, e.g. “I feel strong and vigorous when I’m studying” (Vigor1, Vigor2 and Vigor3). What is the best way to calculate the value for Vigor?

Thank you so much for your time.

Kind regards

Thanks for your message and kind words, Jack. I am afraid I can’t really answer this question without knowing more about your data. But do have a look at this post, which may be of help: http://achilleaskostoulas.com/2014/12/15/how-to-summarise-likert-scale-data-using-spss/

Hi

I have gathered information on a 7-point Likert scale. Now I would like to combine a few questions under one of the subcategories. When I did that by averaging, I got results like 2.2, 3.6, 4.21, etc. Moreover, the descriptive results for such a subcategory do not correspond to the 1–7 point satisfaction scale.

I would be grateful if anyone could help me with this.

Hello Iftikhar. I have taken the liberty of editing the all caps in your question, which some people might construe as a sign of rudeness or, at the very least, indifference to readers. I cannot help you with your question, because I do not understand what you are asking. Sorry.

Hello Achilleas…

Please help me find a solution to a problem I have faced while doing my research on the impact of employee engagement on turnover intention. There are positive and negative statements in my questionnaire under the turnover intention variable, which measure responses using a five-point Likert scale.

How could I use these data when analysing them in SPSS?

Hello Erandi,

There are several suggestions in this comment thread, which might give you some ideas about how to go about analysing your data. You may also want to look at the following posts:

http://achilleaskostoulas.com/2014/02/23/how-to-interpret-ordinal-data/

http://achilleaskostoulas.com/2014/12/15/how-to-summarise-likert-scale-data-using-spss/

I am afraid that I cannot be more specific, as I do not have information about your research questions or the dataset. However, I hope it is of some help.

How will I compute my mean if my Likert scale has ranges?

For example, my Likert scale is:

II. Level of Entrepreneurial Experience

| Point value | Statistical range | Descriptive rating |
|---|---|---|
| 5 | 4.20 – 5.00 | Excellent |
| 4 | 3.40 – 4.19 | Very Good |
| 3 | 2.60 – 3.39 | Fair |
| 2 | 1.80 – 2.59 | Poor |
| 1 | 1.00 – 1.79 | Needs Improvement |

Here is my data:

| Frequency | Proportion |
|---|---|
| 41 | 0.1496 |
| 105 | 0.3832 |
| 124 | 0.4526 |
| 3 | 0.0109 |
| 1 | 0.0036 |

Total: 274

How will I compute the mean for these data? Thanks

If you must compute the mean, which I strongly recommend against, you can use the formula described in the post.

Hello :)

My purpose in using a 5-point Likert scale (5 = strongly agree, 4 = agree, 3 = not sure, 2 = disagree, 1 = strongly disagree) is to find out which students have high motivation and which have low motivation. According to the scale description, the higher the score, the higher the motivation. It consists of 30 items.

If I follow the nature of the 5-point Likert scale, the ranges for qualitative interpretation are:

1.00-1.80 very low

1.81-2.60 low

2.61-3.40 average

3.41-4.20 high

4.21-5.00 very high

Since I used an established inventory, I followed the 5-point Likert scale instead of changing it to a 4-point Likert scale, which could have made this easier, since I would not have a middle or average range.

For my study, I am more interested in classifying the students into 2 groups, not 5. Here is my question: would it be best to just follow the 5 groupings, or is it OK if I make 2 groups, with ranges 1.00–3.00 as low and 3.01–5.00 as high?

My study only involves 14 students and this step of classifying them into 2 groups based on their level of motivation is just the first step in my data analysis. Thanks

If you are using an established inventory, it should come with instructions suggesting possible ways to interpret the data. Prima facie, I see no statistical reason not to use two groupings rather than five, if it makes more sense in your study. The obvious downside is that it will make it harder to compare results with other researchers who have used the scale.

My main concern is that I am not at all sure you should be calculating a mean. My take is that these are ordinal data, and should not be subjected to this kind of analysis.

Please help me compute the Pearson correlation between a four-point Likert scale variable and students’ final grades in mathematics. Thank you.

There is some information on Pearson correlations here.

Hi, I want to find out whether there is a significant relationship between English language teachers’ methodology and students’ performance. What statistical tests should I use for it?

None. It is a meaningless research question; and if it could be answered by anything as crude as statistics, it would have been answered anyway.

It is sad how you misinform people. First, calculating the mean of ordinal variables is not as problematic as you think; you can even run parametric tests on ordinal Likert scales. Please check the literature (one place to start, with some nice references, is Norman 2010). Second, the research question by Nazir Bano is perfectly valid. If you want to study the effect in real life, I would use mixed-effects modelling and data from standardised tests (like PISA), combined with observations and self-reports by the teachers.

Hello, and thank you for your views.

You may find that I have referred to Norman (2010) elsewhere in this blog, and while I think it is a reasonably well-argued line of thinking, I am not sure how it contradicts what I have written here. Norman argues that parametric procedures are robust enough to withstand violations of their assumptions. This does not mean that such procedures are optimal practice, and Norman does not make this argument anyway.

As regards the second part of your comment, you seem to be missing that the question has no operational definitions of ‘effectiveness’ or ‘method’, it assumes that methods are applied consistently across time, and presupposes somehow controlling for any learning that takes place outside the classroom, among other problems. If you think you can answer such a question in any meaningful way, I suggest that you put your money where your mouth is and answer it.

Hi Sir, can I use the mean with this scale?

5 – Very Much Significant

4 – Significant

3 – Somewhat Significant

2 – Less Significant

1 – Not Significant

No.

What can you suggest, Sir?

Can’t I use the mean even though I will also use descriptions?

No, calculating the mean in this ordinal scale is statistically wrong. You can use the median and mode, if you want a measure of central tendency; you can also present frequency distributions for every response.

Hi, Sir. I had no difficulty understanding your discussion of Likert scales. I just want to leave this comment because you are cool: you answer the queries of my fellow readers and researchers. Thank you, Sir. May God bless you, and may you help many more people from now on.

Hi Achilleas,

Thank you for your tutorial on calculating mean of ordinal data. I was looking for literature to support my advice to my student (that the mean score and standard deviation he calculated for a Likert scale had no meaning in the context of his study) when I came across your blog.

He surveyed 192 patients concerning the quality of hospital service. The responses were: Strongly agree = 42; Agree = 104; Neutral = 32; Disagree = 14; Strongly disagree = 0. He calculated the mean of the responses as 2.09 and the standard deviation as 0.82.

Obviously, the mean and SD are meaningless for interpreting the results. As suggested in your blog posts, I will advise the student to use the median and mode to describe the central tendency of the responses.
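Using the counts given in the comment above, the median and mode can be read directly from the frequency distribution. A minimal sketch, assuming the student coded the responses 1 = Strongly agree to 5 = Strongly disagree; this coding is an assumption, but it reproduces his reported mean of 2.09 and (sample) standard deviation of 0.82:

```python
from statistics import mean, median, mode, stdev

# Counts from the comment above, assuming the student coded
# 1 = Strongly agree, 2 = Agree, 3 = Neutral, 4 = Disagree, 5 = Strongly disagree
counts = {1: 42, 2: 104, 3: 32, 4: 14, 5: 0}

# Expand the frequency table into one value per respondent
responses = [code for code, k in counts.items() for _ in range(k)]

print(len(responses))                                          # 192
print(round(mean(responses), 2), round(stdev(responses), 2))   # 2.09 0.82
print(median(responses), mode(responses))                      # 2.0 2, i.e. 'Agree'
```

Both the median and the mode land on ‘Agree’, which can be reported directly without implying anything about the distances between response options.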

Hi! You have stated that “The mathematical model needs these assumptions in order to work, but they are simply not in the questionnaire design. And even if we forced them into the questionnaire, that would constitute a gross distortion of psychological attitudes and the social world to fit our statistical mould.” Can you kindly elaborate on this or direct me to the appropriate resources on which these statements are based? Thanks.

Hi! If you follow the links to the posts provided, I believe you will find the information you need. This post in particular (https://achilleaskostoulas.com/2013/09/09/four-things-you-probably-didnt-know-about-likert-scales/) lists a reasonably good collection of literature that you can consult.

Say I have 31 responses, distributed (5, 1, 8, 14, 3) across a scale of 1–5:

1- very poor

2- poor

3- good

4- very good

5- excellent

How do I do mean score ranking?

You’d only do mean score ranking if you thought that the anchor points are equidistant, i.e., something ‘very poor’ is 20% the quality of something ‘excellent’. If you make this assumption, and I don’t think you should make it, then you can use the formula described in the post, which produces a mean value of 3.29.
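For completeness, the arithmetic is sketched below, assuming the formula in the post is the usual frequency-weighted mean and that the five counts are listed in scale order (1 = very poor to 5 = excellent). It is shown only to make the calculation explicit; the median is printed alongside for comparison.

```python
from statistics import median

# Counts from the comment above, assumed to be listed in scale order:
# 1 = very poor, 2 = poor, 3 = good, 4 = very good, 5 = excellent
counts = {1: 5, 2: 1, 3: 8, 4: 14, 5: 3}

n = sum(counts.values())                                     # 31 responses
weighted_sum = sum(code * k for code, k in counts.items())   # 5 + 2 + 24 + 56 + 15 = 102
print(round(weighted_sum / n, 2))                            # 3.29

# For comparison, the median: the 16th of the 31 ordered responses
responses = [code for code, k in counts.items() for _ in range(k)]
print(median(responses))                                     # 4, i.e. 'very good'
```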

Can I ask where to find the explanation of these scales?

I am not sure I understand what you are asking. However, the following post has some resources and bibliography which may be useful. https://achilleaskostoulas.com/2013/09/09/four-things-you-probably-didnt-know-about-likert-scales/

What sample size would I need for a population of 550?

That depends on the confidence interval and confidence level you need. It could be anything, really, from 15 (±25% margin of error, 95% confidence) to 532 (±1%, 99% confidence).
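The two figures in the reply are consistent with Cochran's sample-size formula with a finite population correction, assuming maximum variability (p = 0.5). The sketch below reproduces them; the formula and the z-values (1.96 for 95%, 2.576 for 99%) are assumptions of this sketch, not something stated in the reply.

```python
def sample_size(population, margin, z, p=0.5):
    """Cochran's formula with a finite population correction.

    margin: margin of error as a proportion (0.05 means +/-5%)
    z:      z-score for the confidence level (1.96 ~ 95%, 2.576 ~ 99%)
    p:      expected proportion; 0.5 gives the most conservative estimate
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2      # size for an infinite population
    return n0 / (1 + (n0 - 1) / population)      # adjust for a finite population

print(round(sample_size(550, 0.25, 1.96)))   # 15
print(round(sample_size(550, 0.01, 2.576)))  # 532
```

Rounding to the nearest whole number reproduces 15 and 532; rounding up, the more cautious convention, gives 533 for the second case.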

hi.

I used the median to describe responses to a 4-point scale. What if I get a decimal number?

Thanks for the response.

You might get a decimal number (x.5) if you have an even number of responses. In that case, you just report that number, that’s not considered a problem.

Hello, I’m testing students’ attitudes towards English. I have 205 respondents and 22 items using a Likert scale:

5 – strongly agree

4 – agree

3 – don’t know

2 – disagree

1 – strongly disagree

I’m grouping these statements according to 4 hypotheses. What is the best way to analyse the data? Should I just describe the distribution of each category for each item, combining SA/A vs SD/D plus ‘don’t know’, or should I group the statements for each hypothesis together and calculate how many people agree with all of these questions, etc.?

Well, that would depend a lot on what you’re trying to find out, wouldn’t it? I don’t think I can answer your question without knowing your research questions, but maybe this page can get you started: https://achilleaskostoulas.com/2014/02/23/how-to-interpret-ordinal-data/

Thank you so much for your quick answer. I will be dividing the statements according to 4 hypotheses. My variables will be age, field of study and year of study.

Yes, I understand that, but it is still unclear what you are trying to do: Are you trying to describe a population or test whether something is true? It’s difficult to be specific without knowing more about your research questions or sample, and I really think your supervisor should be able to give you better advice than I can.

Very broadly, if you’re trying to confirm/disprove a hypothesis, then you need to crosscheck how the responses to the Likert items (dependent variables) map out against the other variables like age, field of study and year of study (independent variables). You can do this by grouping your items into one or more multi-item scales (assuming they are cohesive enough) and running a t-test. Alternatively you could run a cross-tab and chi square test with your independent variables and each item.

I’m trying to find out whether the attitudes of the respondents are positive or negative, and whether the three variables I mentioned have an effect on that. What do you mean by ‘cohesive enough’? I tried the alpha test for internal reliability, but for some subscales I get an alpha < 0.7.

That’s unfortunate, as it means that the scale items do not really measure the same thing. Perhaps you can try again, after you remove selected items from the scales.

You could condense the responses to ‘positive’/’negative’ and do item-by-item cross-tabs, but I suspect that the low number of respondents and the large number of variables will prevent you from getting anything really conclusive.

I have 205 respondents in total; is that low? Do you think it is better if I combine more than one variable for comparison, say field of study and gender, or field of study and year of study?

Is an alpha of 0.6 good?

Not really.

Kind of, depending on what statistical procedures you plan to use. If you do cross-tabs and a chi-square test with multiple values for each variable, it may throw your statistics off. In that case you’re probably better off combining values (not variables!). If you’re just doing descriptives, it should be OK.

I am sorry, I did not quite understand you here. What do you mean by combining values?

What I ended up doing is a standard descriptive analysis for each statement, showing the frequency of every answer. When commenting on the results, I combine SA with A vs SD with D, and I have a third, neutral category. I’ll treat every variable separately for each statement.

Sounds OK. Good luck with your project!

Hello sir,

I am currently drafting my graduate thesis on knowledge management (KM) and climate resilience. The main objective of my research is to look at the relationship between these two concepts/constructs.

I have identified four variables for KM: gathering, storing, sharing, use; and 3 variables for resilience: buffer capacity, self-org, and learning capacity.

The items for KM are constructed using Likert Scale:

1 – strongly disagree; 2 – disagree; 3 – neither agree nor disagree; 4 – agree; 5 – strongly agree

I understand that the above is at the ordinal level, and therefore I should use only ordinal-level analysis, such as the mode, median, etc.

My question now is regarding the items I have for resilience. Each item/question is stated this way:

For buffer capacity:

1.) Educational attainment

At least college graduate – 5

College level – 4

High school graduate – 3

High school level – 2

Elementary level – 1

No formal education – 0

2.) Access to basic services:

Basic services are very accessible – 5

Basic services are accessible – 4

Basic services are moderately accessible – 3

Basic services are slightly accessible – 2

Basic services are not accessible – 1

Is this still at the ordinal level?

Also, what if I ask them to rank the climate risks in the community from 1 to 5 (5 being the most frequent climate risk in the community, 1 being the lowest climate risk):

a. Drought

b. Flood

c. Landslide

d. Earthquake

e. Storm Surge

Is this ordinal or interval/ratio?

Thank you sir.

Hi! Scales (1) and (2) are ordinal, because you can meaningfully place the values in a row from ‘least’ to ‘most’.

The final scale seems different: I assume that a drought is not ‘more’ or ‘less’ of a risk than an earthquake, just different. These scales are called ‘nominal’ or ‘categorical’. In this case, your analysis options are more limited: you can only use frequency counts and the mode as a measure of central tendency.
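For a nominal variable, the analysis mostly comes down to counting how often each category occurs and reporting the most frequent one (the mode). A minimal sketch, assuming hypothetical data recording which hazard each respondent ranked as the most frequent risk:

```python
from collections import Counter

# Hypothetical data: the hazard each respondent ranked as the most frequent risk
top_risk = ["Flood", "Drought", "Flood", "Storm Surge", "Flood",
            "Landslide", "Drought", "Flood", "Earthquake", "Drought"]

counts = Counter(top_risk)
for hazard, k in counts.most_common():
    # Frequency count and share of respondents for each category
    print(f"{hazard}: {k} ({k / len(top_risk):.0%})")

mode_hazard, _ = counts.most_common(1)[0]
print("Mode:", mode_hazard)  # Flood
```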

Thank you very much for the prompt response, sir. I just have a follow-up question: is it okay to have 0 in the scale? I have read in some sources that 0 means “missing data”; if so, can I use 1 instead of 0 for “no formal education”?

And should the scales for both KM and climate resilience be equal? Should they both have a 5-point scale, or can I use a 5-point scale for KM and a 6-point scale for the climate resilience statements?

Thank you very much.

Sure, you can use 0 to code missing data, as long as you make sure you don’t accidentally use it in your calculations.

Using similar scales with similar structures is not really necessary, but it will probably make your life easier.

Best of luck with your project!

Thank you very much, sir! I appreciate your response. I’d probably have more questions as I go on writing the paper, but thank you so much!

I have a sample size of 355 and want to compare the reasons for visiting a fast food outlet. I have taken 10 reasons, e.g. while travelling, for dinner, to spend time with friends and family, etc. I collected the data on a 5-point Likert scale, with 1 as never, 2 as rarely, 3 as sometimes, 4 as often and 5 as very often. What statistical tool should I apply? Should I compare the means or the mean ranks of the 10 reasons for visiting a fast food outlet?

I’m afraid that I cannot help you with this question because you have not told me what your research question is, i.e., what you’re trying to find out. Your supervisor may be a more appropriate person to seek advice from. They may also be able to advise you on the pragmatics of asking for assistance. Good luck with your project.

Hi Achilleas,

I am so glad to have found your blog and have spent all of my Saturday night reading it :) You have helped so many people, I hope you can help me.

I have data on an ordinal scale: 0, Never; 1, Rarely; 2, Sometimes; and 3 Often. Five questions ask variants of the same thing, and the scores are summed. The final score can have a value between 0 and 15. Is it valid to divide by 5 to get the score back on the original scale? If so, is this a form of normalization? Also, since we are dividing by N, would this be considered a mean?

Thanks

Yes, it’s all good, provided the scales measure the same thing. You can make sure by calculating a statistic called Cronbach’s alpha. Thanks also for the kind comments!

Hi [Dr] Kostoulas, I have conducted a quantitative survey entitled ‘Customers’ satisfaction survey by the office of the registrar of St. Mary’s University’. There are 268 respondents, and four to five Likert-scaled questions. All in all, I am trying to determine the average satisfaction level of the respondents, and for this I prefer the mean. How can I analyse the data?

You should not use mean values (averages) with Likert-type questions. You could either report the median value, if you need it, report each value separately, or group positive, neutral and negative responses and report those.
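Grouping responses into positive, neutral and negative categories is a simple recoding step. A minimal sketch, assuming a five-point item coded 1 = very dissatisfied to 5 = very satisfied; the responses are invented, and the median is shown as the alternative summary mentioned above:

```python
from collections import Counter
from statistics import median

# Hypothetical responses to one satisfaction item (1 = very dissatisfied, 5 = very satisfied)
responses = [5, 4, 4, 3, 2, 4, 5, 1, 3, 4, 4, 2]

def group(code):
    """Collapse a five-point response into three broad categories."""
    if code >= 4:
        return "positive"
    if code == 3:
        return "neutral"
    return "negative"

grouped = Counter(group(r) for r in responses)
print(grouped["positive"], grouped["neutral"], grouped["negative"])  # 7 2 3
print(median(responses))                                             # 4.0
```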

Hi Dr. Achilleas,

It’s amazing to me that you have stayed on this discussion thread since 2013! I have learnt so much from reading all your comments. Scholars working with the University of Manchester are usually amazing and very tolerant. I am not sure if your responses to the questions that were asked by two other persons on 26 March 2014 @ 01.49 and 16 March 2015 @ 17.58 apply to the question I have asked below.

My question is this:

Is it theoretically acceptable for me to calculate and use the mean for the following 7-point scales that I used to collect survey data from 41 respondents across all government Ministries in a developing country?

Highly Ineffective 1 2 3 4 5 6 7 Highly Effective

Weak feedback 1 2 3 4 5 6 7 Strong Feedback

My confusion stems from the fact that I did not use verbal descriptors for the middle values, apart from the two extreme descriptors, where 1 is highly ineffective/weak feedback and 7 is highly effective/strong feedback. But clearly, the 7-point scale is also designed as a ‘Likert-type’ scale.

Second, I intended using the above 7-point scale to produce interval-level data analysis rather than simply ordinal level data analysis. Does this 7-point scale pass the theoretical test of interval-level statistical analysis?

Thank you.

Thanks for your kind words, they are really appreciated. With regard to your first question, I am rather sceptical about using means, in this case, especially as the Likert format and the low number of respondents makes it unlikely that your responses are normally distributed. But medians are a good alternative. The second question involves something of a judgement call: if you think that for your respondents, the ‘distance’ between 1 and 2 is the same as the distance between 4 and 5, then you can treat this as an interval scale. But, I am not sure this is a warranted assumption, so again, it may be best to take the safe option. Hope that helps and good luck with your project.

Hello Dr. Achilleas, Thanks very much for your prompt response and very useful suggestions.

Which method should I use to present the Mean of a 5-point Likert scale?

First method:

To determine the minimum and the maximum length of the 5-point Likert type scale, the range is calculated by (5 − 1 = 4) then divided by five as it is the greatest value of the scale (4 ÷ 5 = 0.80). Afterwards, number one which is the least value in the scale was added in order to identify the maximum of this cell. The length of the cells is determined below:

From 1 to 1.80 represents (strongly disagree).

From 1.81 until 2.60 represents (do not agree).

From 2.61 until 3.40 represents (true to some extent).

From 3.41 until 4.20 represents (agree).

From 4.21 until 5.00 represents (strongly agree).

Second method is the traditional way:

mean score from 0.01 to 1.00 is (strongly disagree);

from 1.01 to 2.00 is (disagree);

from 2.01 until 3.00 is (neutral);

from 3.01 until 4.00 is (agree);

mean score from 4.01 until 5.00 is (strongly agree)

My questions are:

1. Which method should I use to present my findings?

2. When and why is the first method used?

Hello,

The problem with both methods that you’re proposing is that you are making a somewhat bold assumption about the data. You are assuming that the ‘distance’ between ‘strongly agree’ and ‘agree’ is the same as the one between ‘agree’ and ‘neutral’ (also that it is half the distance between ‘strongly agree’ and ‘neutral’, and so on…). Many people, including myself, would argue that this is an abuse of statistics.

To avoid this, it is usually safer to calculate the median, not the mean, of the data. You can find some information about calculating medians in this post: https://achilleaskostoulas.com/2014/02/23/how-to-interpret-ordinal-data/
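For readers who want to try the median route, here is a minimal sketch using Python's standard `statistics` module. The 1-to-5 coding mirrors the example questionnaire at the top of the post, the responses are invented, and `describe_median` is a hypothetical helper.

```python
from statistics import median, median_low

# Coding assumed from the example questionnaire at the top of the post.
LABELS = {
    1: "strongly agree",
    2: "agree",
    3: "neither agree nor disagree",
    4: "disagree",
    5: "strongly disagree",
}


def describe_median(responses):
    """Return the median and the lower median expressed as a response label."""
    mid = median(responses)      # can fall between two categories if len is even
    low = median_low(responses)  # always one of the observed codes
    return mid, LABELS[low]


# Hypothetical answers from eight participants to one item.
answers = [1, 2, 2, 3, 4, 4, 4, 5]
print(describe_median(answers))  # (3.5, 'neither agree nor disagree')
```

With an even number of responses the ordinary median can land halfway between two categories (3.5 here), which has no verbal label; reporting the lower (or upper) median instead keeps the result on the scale, but which one to use is a judgement call worth stating in the write-up.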

Good luck with your project!

Hello Dr Kostoulas! Congratulations on your good work helping people. I would like to ask you a question about my data.

I’ve got a three-level rubric (Low = 1, Medium = 2, High = 3) to assess my students’ skills in doing a task. The rubric consists of four questions that measure different aspects of the same skill.

1) May I sum the grades from the four questions (maximum of 3 × 4 = 12) and then divide by 4? Would that calculation violate what we can do with ordinal data?

2) I would like to check whether changes occur across three time points. Should I conduct a Friedman test (for ordinal data) or a repeated-measures ANOVA (for continuous variables)?

3) What is the best value for the lowest level of the rubric, 0 or 1? Does it make a difference?

Thank you in advance!

Hi Giovanni,

Thanks for your kind words, they are much appreciated!

About your questions: 1) Strictly speaking, your rubric still produces ordinal data. However, unlike a Likert item, it has more or less identifiable start and end points, so it’s less of a stretch to argue that the anchor points are equidistant (spaced at equal intervals). So I guess you could go ahead as you described and, if this is some kind of student project you’re doing, add some remarks in the methodology section explaining why you are doing what you are doing.

2) The Friedman test looks like a safer choice here, I imagine. All statistical tests have a set of assumptions that need to be true for them to produce good results, so perhaps you’d like to check what these are and see that no other assumption is violated. Just to be on the safe side :)

3) I don’t think it makes much of a difference. Starting with a value of 0 would be more intuitive, but involves unnecessary work… either way, make sure you explain what you did in the methods section, and maybe add some reminders when you are describing the results.
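To make the Friedman test a little less of a black box, here is a minimal standard-library sketch, assuming one row of scores per student and exactly three time points. The function names and data are hypothetical; in practice a statistics package (for example SciPy's `friedmanchisquare`) would be the usual route. The p-value relies on the chi-square approximation, which is rough for very small groups.

```python
import math


def average_ranks(values):
    """Rank the values in one row from 1 to k, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1  # ranks are 1-based
        i = j + 1
    return ranks


def friedman_three_timepoints(rows):
    """Friedman test for n students (rows) measured at 3 time points (columns).

    Returns (statistic, p_value). The statistic includes the usual tie
    correction; the p-value uses the chi-square approximation with
    k - 1 = 2 degrees of freedom, whose upper tail is exactly exp(-x / 2).
    """
    k = 3
    n = len(rows)
    if n == 0 or any(len(row) != k for row in rows):
        raise ValueError("each row must hold exactly three scores")
    rank_sums = [0.0] * k
    tie_term = 0
    for row in rows:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
        for value in set(row):
            t = row.count(value)
            tie_term += t ** 3 - t
    stat = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    correction = 1 - tie_term / (n * k * (k * k - 1))
    if correction == 0:
        raise ValueError("every student has the same score at all time points")
    stat = max(stat / correction, 0.0)  # guard against tiny negative rounding errors
    return stat, math.exp(-stat / 2)


# Hypothetical rubric means for four students at time points 1, 2 and 3.
scores = [
    [1.25, 2.00, 2.50],
    [1.50, 1.75, 2.25],
    [2.00, 2.00, 2.75],
    [1.00, 1.50, 1.25],
]
stat, p = friedman_three_timepoints(scores)
print(f"Friedman chi-square = {stat:.3f}, p = {p:.3f}")
```

Because the test only uses each student's ranking of their own three scores, it does not need the equal-distance assumption that a repeated-measures ANOVA relies on.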

I hope that was of some help, and good luck with your work!

Dr Kostoulas, thank you very much! I really appreciate your help.

To sum up, I will calculate the mean for the variable ((score1+score2+score3+score4)/4), given that “the anchor points are equidistant (spaced at equal intervals)”, as you said.
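The calculation described here can be sketched as follows, assuming four rubric scores per student coded Low = 1, Medium = 2, High = 3. The student identifiers, scores and the helper name `rubric_mean` are hypothetical.

```python
# Hypothetical rubric scores (Low = 1, Medium = 2, High = 3), one list per student.
students = {
    "student_a": [2, 3, 2, 2],
    "student_b": [1, 1, 2, 1],
}


def rubric_mean(scores):
    """(score1 + score2 + score3 + score4) / 4, treating the anchors as equidistant."""
    if len(scores) != 4 or any(s not in (1, 2, 3) for s in scores):
        raise ValueError("expected four scores coded 1, 2 or 3")
    return sum(scores) / 4


for name, scores in students.items():
    # Print the total (out of 12) alongside the mean (1 to 3).
    print(name, sum(scores), rubric_mean(scores))
```

Reporting the total out of 12 next to the mean makes it easy for readers to see that the two carry the same information; either way, the equidistance assumption is what licenses the arithmetic.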

But I found that a basic requirement for running a one-way repeated-measures ANOVA is to have one dependent variable (my students’ skill) that is measured at the continuous level (interval or ratio). So would it be right to start with that test and then, if the other assumptions are not met (outliers, normal distribution, equal variances between the combinations of time levels), switch to the Friedman test?

I’m sorry for the rapid-fire questions. It’s part of a school project, as you said. Thank you so much.

You’d be assuming that the data are interval because the anchor points are equidistant. So, if you make that assumption, you can try an ANOVA. But, like I said, a Friedman test might be a safer choice.

Thank you again, Dr Kostoulas! One last question, and then I’ll leave you and your topic alone! :) How could I justify using the Friedman test instead of ANOVA, or, in other words, what makes you argue that the Friedman test is a safer choice?

No worries! One way to approach this is to take a look at the assumptions for each test, and see which one fits the data best. Depending on how much space you have, you might write a paragraph or two comparing the methods, and saying which one makes more sense for your data. Does that make any sense?

That’s a perfect idea! I will do that. Thank you Dr! Your students are very lucky!

Thanks Giovanni, that’s very kind! Best of luck with your work!

Hi! I’m sorry, but I’m quite confused… When you illustrate the formula to calculate the mean, the title of that paragraph says “THE WRONG WAY TO DO IT”. Does this mean that the formula illustrated is the wrong one? Is there another formula that should be used?

Thank you, Greta

Yes, in most cases the formula for calculating a weighted average is not the best way forward. A safer way to proceed with this kind of data would be to calculate the median.

Hi :)

We used a Likert questionnaire to investigate the hypothesis that staff satisfaction with a certain product is high (10 questions, 5 possible answers from strongly disagree to strongly agree, n = 100).

In your opinion, is it possible to analyse it as follows?

1. Sum each respondent’s answers to obtain their overall questionnaire score.

2. Compare the median of the respondents’ questionnaire scores (possible range: 10 to 50 points) to the neutral score of the questionnaire ((10 + 50)/2 = 30) using a Wilcoxon signed-rank test.
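The two steps could be sketched roughly as below, using only the standard library. This is a minimal illustration under stated assumptions: the alternative hypothesis is one-sided (scores tend to lie above 30), the p-value comes from the normal approximation with a tie correction and no continuity correction, and scores exactly equal to 30 are dropped. The function names and answer sheets are hypothetical; SciPy's `scipy.stats.wilcoxon` with `alternative='greater'` is the usual off-the-shelf option.

```python
import math

NEUTRAL = 30  # (10 + 50) / 2 for ten items scored 1 to 5


def total_scores(answer_sheets):
    """Step 1: sum each respondent's ten answers (possible range 10 to 50)."""
    return [sum(sheet) for sheet in answer_sheets]


def wilcoxon_one_sample(scores, reference=NEUTRAL):
    """Step 2: one-sided Wilcoxon signed-rank test of 'scores tend to exceed reference'.

    Returns (W_plus, z, p_one_sided) using the normal approximation.
    """
    diffs = [s - reference for s in scores if s != reference]  # drop zero differences
    n = len(diffs)
    if n == 0:
        raise ValueError("every score equals the reference value")
    # Rank the absolute differences, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z)
    return w_plus, z, p_one_sided


# Hypothetical answer sheets for five respondents (ten items each, coded 1 to 5).
sheets = [
    [4, 4, 3, 3, 3, 3, 3, 3, 3, 3],
    [4, 4, 4, 4, 4, 3, 3, 3, 3, 3],
    [3, 3, 3, 3, 3, 3, 3, 3, 2, 2],
    [3] * 10,
    [4] * 10,
]
w, z, p = wilcoxon_one_sample(total_scores(sheets))
print(f"W+ = {w}, z = {z:.3f}, one-sided p = {p:.3f}")
```

With only five respondents the normal approximation is crude; with n = 100, as in the study described, it is much more reasonable.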

Thank you!

Dani

Sounds just fine :)

Hi there, I have a question about the mean of a Likert scale. I use five questions on intrinsic motivation to recycle. Since the questions relate to one concept, is it possible to use the mean? It makes a lot of sense to me that an individual receives, for example, a score of 3.4 on intrinsic motivation.

Hi Judith :) You’re right, it’s probably OK. However, you probably need to run a Cronbach alpha test to make sure that the scale is internally consistent, i.e. the respondents view the questions as similar.
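For anyone curious about what the Cronbach’s alpha check involves, here is a minimal sketch of the standard formula using only the standard library. The data are invented and the function name is hypothetical; in practice most people run this in a statistics package. The last line shows the per-respondent mean score that Judith describes.

```python
from statistics import fmean, variance


def cronbach_alpha(table):
    """Cronbach's alpha for a table with one row per respondent and one column per item.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    Sample variances (n - 1 denominator) are used for every term.
    """
    if len(table) < 2 or len(table[0]) < 2:
        raise ValueError("need at least two respondents and two items")
    k = len(table[0])
    item_variances = [variance(column) for column in zip(*table)]
    total_variance = variance([sum(row) for row in table])
    if total_variance == 0:
        raise ValueError("total scores do not vary, so alpha is undefined")
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers (1 to 5) of six respondents to five motivation items.
answers = [
    [4, 4, 5, 4, 4],
    [2, 3, 2, 2, 3],
    [3, 3, 4, 3, 3],
    [5, 4, 5, 5, 4],
    [1, 2, 2, 1, 2],
    [3, 4, 3, 3, 3],
]
print(f"alpha = {cronbach_alpha(answers):.3f}")
print([round(fmean(row), 1) for row in answers])  # one mean score per respondent
```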

Thank you for your very quick reply! I have already calculated Cronbach’s alpha and found α = .818. But thank you for the confirmation. Sometimes I get a little ‘paranoid’ about the decisions I make when it comes to statistics!

Sounds great, then! Good luck with your project :)

Hi Sir,

I have conducted research […] and I have five questions under each independent variable. I have three independent variables in total, […]. I created individual bar graphs and found that for [one] question […] the largest share of participants (38%) opted for ‘Disagree’ (I use a five-point Likert scale from Strongly Agree to Strongly Disagree), whereas the mean is 3.19. Is there a connection between the mean and the bar graph percentage, and can I link the two?

Some of the other means are over 3.0, but the bar graphs related to them show neutral responses as well. I hope you understand, as I’m trying my best to explain and it’s a real issue for me. I feel like you’re the right person to ask about this.

Yours Sincerely,

Daniel

Hi Daniel,

Thanks for your question. It looks like you are doing some interesting work, but maybe the statistics are somewhat above your current level of competence. I think that what you should do, under the circumstances, is forget about the mean altogether. Besides, as you will read in the blog post above, and in many statistics manuals, calculating the mean for Likert scales is dangerous statistical territory. What you can do is describe the distribution of answers for every question, by showing the bar chart, and a table with the absolute number and percentage for each response option. You might also want to note which is the most popular response – we call that the ‘mode’.
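The descriptive approach suggested here (counts, percentages and the mode for each question) can be sketched with the standard library as follows. The response labels assume a scale running from Strongly Agree to Strongly Disagree with "Neutral" in the middle; the answers and the helper name `frequency_table` are hypothetical.

```python
from collections import Counter
from statistics import multimode

OPTIONS = ["Strongly Agree", "Agree", "Neutral", "Disagree", "Strongly Disagree"]


def frequency_table(answers):
    """Return (option, count, percentage) rows in scale order, plus the mode(s)."""
    if not answers:
        raise ValueError("no answers to tabulate")
    counts = Counter(answers)
    total = len(answers)
    rows = [(opt, counts[opt], round(100 * counts[opt] / total, 1)) for opt in OPTIONS]
    return rows, multimode(answers)


# Hypothetical answers to one question from ten participants.
answers = ["Agree", "Disagree", "Disagree", "Neutral", "Disagree",
           "Strongly Agree", "Agree", "Disagree", "Neutral", "Strongly Disagree"]
rows, modes = frequency_table(answers)
for option, count, pct in rows:
    print(f"{option:<18}{count:>3}{pct:>7.1f}%")
print("Mode:", ", ".join(modes))
```

`multimode` returns every most frequent option, so a question with a tie for first place is reported honestly instead of silently picking one answer.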

If you are absolutely sure that the questions under each independent variable are measuring the same construct, then you can also add up the responses, and present the combined data for each independent variable. Ideally, we don’t do this unless we run a statistical test first, which is called Cronbach’s alpha. If you can’t do it, that’s sort of ok, but you will want to note that in your report, maybe by writing something along the following lines: ‘the internal consistency of the scale was not statistically measured, but it can be inferred from the semantic similarity of the questions’ (this means that the wording of the questions makes you reasonably confident that the scales can be combined).

Hope that helps, and good luck with your project.