Achilleas Kostoulas

Applied Linguistics & Language Teacher Education

How to interpret ordinal data

In this post, I discuss how you can interpret ordinal data. These are data that have a clear rank order, such as the data produced by Likert scales. In the paragraphs that follow, I will discuss how to find out the median and interquartile range, and I will also offer some suggestions for reporting your findings.

Some assumptions

When writing this post, I will make some assumptions about you. I will assume, for example, that you are doing research in the humanities and social sciences. This means that I will avoid finer statistical nuance. This level of analysis should be enough if you’re doing a term paper or MA dissertation. However, if you are in a structured learning programme, you should clarify expectations with your tutors. I will also assume that you do not have access to a statistical package, such as SPSS, so I will not give you detailed instructions on how to input and process data; rather, I will just focus on the concepts involved. Lastly, I will be using Likert scale data to illustrate my points, but what I write is transferable to all kinds of ordinal data.

How to analyse Likert-scale data

What are ordinal data

Let’s assume that you have prepared a questionnaire, where respondents had to select among responses ranging from “strongly agree” to “strongly disagree”. For convenience, you have probably followed the established practice of replacing these responses with numbers: “1” for “strongly disagree”, “2” for “disagree” and so on. This questionnaire produces what we call ordinal data. Ordinal data can be ranked: we can say that “strongly agree” indicates more agreement than “agree” or “undecided”. But the distances between the points are not known to be equal, so we cannot claim that four “strongly disagree” responses equal an “agree”. We need to be careful about which calculations make sense for such data, and which do not.

When analysing ordinal data, not all types of analysis work well.

Calculating central tendency and spread for ordinal data

So what kinds of analysis should we do? There are two types of statistical analysis: descriptive and inferential. For most small-scale projects, where you just want to find out what respondents believe about a topic, descriptive statistics are enough. This involves, for example, finding the central tendency (what most respondents believe) and the spread / dispersion of the responses (how strongly respondents agree with each other). The table below shows which measures to use for each type of data.

Type of data    Central tendency    Dispersion
Ordinal         Median              IQR
Continuous      Mean                Standard deviation

Because Likert scales produce ordinal data, I suggest that you calculate the median and Inter-Quartile Range (IQR) of each item.

  • The median (i.e., the number found exactly in the middle of the distribution) is a measure of central tendency: very roughly speaking, it shows what the ‘average’ respondent might think, or the ‘likeliest’ response, in a way that makes sense for this type of data.
  • The IQR is a measure of spread: it shows whether the responses are clustered together or scattered across the range of possible responses. It is not as precise as the standard deviation, which you may have heard about, but it is good enough.

You can find some instructions on how to calculate these metrics with SPSS in this page (the procedure is the same for both). If you only have access to Excel, here are links to a couple of videos demonstrating how to calculate the median and the IQR. For small datasets, it is easy to calculate the median and IQR manually. In the next two sections, I shall show how this can be done, using the example data. If you don’t want to read these, you can skip to the bottom, for some advice about how to report the findings.

Calculating the median

First, you arrange the numbers in order from smallest to largest, like this:


To compute the median, you then delete one number from each end of the line, and repeat until you are left with just one number (or two that are the same). This ‘middle’ number is your median. If you are left with two different numbers in the end, the median is half-way between them. This will produce a decimal (e.g., 2.5), which might seem odd, but that’s ok. Using the data you provided, the median is 3, and I have marked it with red to make it stand out.
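If you are comfortable with a little Python, the trimming procedure can be sketched as follows. This is only a minimal illustration: the response counts are those of the example dataset used in this post (60 responses, listed in full in the next section), and the function name `median_by_trimming` is my own choice, not part of any library.

```python
import statistics

# Example responses from this post: two 1s, nine 2s, twenty-four 3s,
# eighteen 4s and seven 5s (60 responses in total).
responses = [1] * 2 + [2] * 9 + [3] * 24 + [4] * 18 + [5] * 7


def median_by_trimming(values):
    """Drop one value from each end of the sorted line until one or two remain."""
    line = sorted(values)
    while len(line) > 2:
        line = line[1:-1]
    # One value left: that is the median. Two left: take the half-way point.
    return sum(line) / len(line)


print(median_by_trimming(responses))   # 3.0
print(statistics.median(responses))    # same result from the standard library
```

In practice you would simply call `statistics.median`; the manual version is only there to mirror the steps described above.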

Calculating the IQR

The IQR is slightly more complicated, but not too hard. Your starting point will be the same arrangement of responses that we used above. When you divide this line into four equal parts, the ‘cut-off’ points are called quartiles. I have used red to indicate quartiles in the dataset.

[1,1,2,2,2,2,2,2,2,2,2,3,3,3,3] [3,3,3,3,3,3,3,3,3,3,3,3,3,3,3] [3,3,3,3,3,4,4,4,4,4,4,4,4,4,4] [4,4,4,4,4,4,4,4,5,5,5,5,5,5,5]

The IQR is the difference between the first and third quartile. In the example, this is: Q3 – Q1 = 4 – 3 = 1.
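The same calculation can be sketched in Python. The sketch below assumes that the cut-off point is the last value of each quarter, and that the number of responses divides evenly by four (as it does for the 60 example responses); the function name `quartiles_by_splitting` is purely illustrative.

```python
import statistics

# The 60 example responses used in this post.
responses = [1] * 2 + [2] * 9 + [3] * 24 + [4] * 18 + [5] * 7


def quartiles_by_splitting(values):
    """Split the sorted line into four equal parts; return the last value of the first three."""
    line = sorted(values)
    part = len(line) // 4   # assumes the number of responses divides evenly by four
    return line[part - 1], line[2 * part - 1], line[3 * part - 1]


q1, q2, q3 = quartiles_by_splitting(responses)
print(q1, q2, q3, q3 - q1)   # 3 3 4 1

# The standard library interpolates between neighbouring values, so on other
# datasets it can give slightly different quartiles from the manual method.
lib_q1, lib_q2, lib_q3 = statistics.quantiles(responses, n=4)
print(lib_q3 - lib_q1)   # 1.0
```

Statistical packages use several slightly different conventions for quartiles, so small discrepancies between tools are normal and usually not worth worrying about at this level of analysis.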

A relatively small IQR, as was the case above, is an indication of consensus. By contrast, larger IQRs might suggest that opinion is polarised, i.e., that respondents tend to hold strong opinions either for or against this topic.

Reporting your findings

When your findings suggest consensus, your write-up should focus on describing the median (i.e., what most respondents seem to believe). One way to describe this is by writing something like the following: 

Most respondents indicated agreement with the idea that… (Mdn=4, IQR=0).

By contrast, when opinion is polarised, your write-up should emphasise the dissonance of opinion: the median is perhaps not so important. To help you understand this, consider a hypothetical case where half of your respondents hate a new textbook, and half love it. If you were to simply report that the respondents are, on average, undecided, that would be a statistical distortion of the data. Here’s a possible way to report the data more accurately:

Opinion seems to be divided about… . Many respondents (n=28, 47%) expressed strong disagreement or disagreement, but a roughly equal number (n=26, 43%) indicated that they agreed or strongly agreed (Mdn=3, IQR=3).
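The figures needed for a write-up like this can be produced with a few lines of Python. The responses below are hypothetical: I invented them so that they match the numbers in the sample sentence above, and the 1 to 5 coding (1 = strongly disagree, 5 = strongly agree) is an assumption.

```python
import statistics
from collections import Counter

# Hypothetical responses (1 = strongly disagree ... 5 = strongly agree),
# invented to match the figures in the sample write-up above.
responses = [1] * 12 + [2] * 16 + [3] * 6 + [4] * 10 + [5] * 16

counts = Counter(responses)
total = len(responses)
disagree = counts[1] + counts[2]
agree = counts[4] + counts[5]

q1, mdn, q3 = statistics.quantiles(responses, n=4)

print(f"Disagree: n={disagree}, {disagree / total:.0%}")   # Disagree: n=28, 47%
print(f"Agree: n={agree}, {agree / total:.0%}")            # Agree: n=26, 43%
print(f"Mdn={mdn:g}, IQR={q3 - q1:g}")                     # Mdn=3, IQR=3
```

Note how the median of 3 on its own would suggest that respondents are undecided, whereas the counts show that very few of them actually chose the middle option.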

A final caveat

One last thing: I would caution you against placing too much faith in findings that were generated from a single Likert-type item. Individual items are very sensitive to factors such as wording, sequencing and more, so you cannot be sure what they really show.

If at all possible, I’d try to cluster similar items together and compare / merge their results. Such groupings of items are called Likert Scales, and they tend to be more robust. If the responses in a scale are broadly consistent, that should give you confidence that you are measuring something reliably. If they are not, it might mean that one of the items is not functioning properly (e.g., respondents may have been confused by the wording), and you may have to discard it from the dataset.

More to read

The following presentation contains some more detailed information about doing statistical research in applied linguistics and language education.

You may also want to check out some more posts I have written on quantitative research for language teaching, including:

Dependent and independent variables, using SPSS, and minding one’s manners
This post will teach you three things: how to tell dependent and …
Doing survey research: A collection of resources
Over time, I have written a number of posts on doing survey …
On Likert scales, levels of measurement and nuanced understandings
I have found it moderately bemusing that my post on Likert scales …
How to use Likert scales effectively
Many questionnaires use Likert items & scales to elicit information about language …

Before you go

If you arrived at this page while preparing for one of your student projects, I wish you all the best with your work. There’s a range of social sharing buttons below, in case you feel like sharing this information among fellow students who might also find it useful. Also feel free to ask any other questions you may have, using the contact form.

Achilleas Kostoulas

Achilleas Kostoulas is an applied linguist and language teacher educator. He teaches at the Department of Primary Education at the University of Thessaly, Greece. Previous academic affiliations include the University of Graz, Austria, and the University of Manchester, UK (which is also where he was awarded his PhD). He has extensive experience teaching research methods in the context of language teacher education.

About this post: This post was originally written on 23 February 2013, in response to a question a student asked me. It was last revised on 26 April 2023.


63 responses to “How to interpret ordinal data”

  1. George P.

    Dear Achilleas,

    How would you go with Likert (Ordinal!) style questions which belong to one construct and the comparison of medians across groups. For instance, imagine I measure some construct by means of multiple questions, and I want to compare respondent’s answers to constructs of two different kind of groups.

    For instance, let’s say I am measuring the construct ‘happiness’ with the recent decisions made by a political party, and I am measuring happiness by means of 10 questions each.

    I interview 20 respondents in total, 10 of which appear to like party A and 10 of which like party B. I want to provide an answer to the question whether followers of party A are more happy with their political party’s decisions than the followers of party B are. In other words, I want to know whether there exists a difference between the construct happiness among respondents of two different political parties.

    In total I get [20 respondents]*[10 questions per respondent] = 200 answers.

    How would you analyse this dataset in terms of medians?

    Would it be fine to calculate the median of all the respondent’s answers related to one construct in one political party’s followers sample, say I calculate the median of 10*10=100 values belonging to the construct ‘happiness’ of party A followers (or B, whatever you choose)?

    1. Achilleas

      Hi George,

      Let me see if I understand your question: You are saying that you have set of ten questions (I’ll call these variables from now on), and a sample of twenty participants, who are divided into two groups (supporters of different political parties). You have a hunch that the responses given by your participants are different depending on which group they belong to, and you want to test this statistically. Am I getting this right?

      One way to do this would be to conduct a cross-tabulation and chi-squared (χ²) test for each variable. This will show you how participants from each group responded, compare the results against what might be expected if the responses were random, and tell you if the difference is statistically significant. However, the test might be skewed due to your small sample size (if you are using SPSS, it will flag possibly skewed results).
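To make the idea of comparing observed responses against expected ones more concrete, here is a rough Python sketch of how the chi-squared statistic is computed from a cross-tabulation. The counts are hypothetical, and the sketch stops at the statistic and the degrees of freedom: for the p value you would still need a statistical table or a statistics package.

```python
def chi_squared(table):
    """Pearson chi-squared statistic for a table of observed counts (rows = groups)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count we would expect in this cell if group made no difference.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


# Hypothetical cross-tabulation for one question:
# columns = disagree / undecided / agree, rows = party A / party B.
observed = [
    [2, 3, 5],
    [5, 3, 2],
]
statistic, dof = chi_squared(observed)
print(round(statistic, 2), dof)   # 2.57 2
```

With ten respondents per group, every expected count in this table is below 5, which is exactly the kind of situation in which the test becomes unreliable.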

      Another way would be to merge the ten variables into one super-variable (‘happiness’), and calculate each participant’s ‘average’ response. This can only work if the ten questions elicit similar responses: e.g., participants who responded with a ‘strongly agree’ in question 1 should, ideally, respond with ‘strongly agree’ in question 2, 3 and so on. To check if this is true, you should calculate Cronbach’s alpha score for these variables. This is a metric ranging from 0 to 1.0, and the higher it is the more homogeneous your composite scale. If the Cronbach alpha is low, then you can try removing one or more of the questions from the composite variable, and see whether it is improved. Again, SPSS will calculate this metric for you, and it will tell you what the alpha would be if you removed any one question. (There are also more sophisticated methods for establishing whether the ten variables ‘cluster’ in one or more groups, but let’s not go into that now).
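If you do not have SPSS, Cronbach’s alpha can also be computed by hand. The sketch below uses the usual textbook formula (the number of items, the variance of each item, and the variance of the respondents’ total scores); the answers are hypothetical and the function name `cronbach_alpha` is my own.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per question, same respondent order."""
    k = len(items)
    item_variances = sum(statistics.variance(scores) for scores in items)
    # Total score of each respondent across all questions.
    totals = [sum(answers) for answers in zip(*items)]
    return k / (k - 1) * (1 - item_variances / statistics.variance(totals))


# Hypothetical answers from five respondents to three questions.
q1 = [4, 5, 2, 3, 4]
q2 = [4, 4, 2, 3, 5]
q3 = [5, 5, 1, 3, 4]
print(round(cronbach_alpha([q1, q2, q3]), 2))   # 0.93
```

To see what alpha would be if one question were removed, you can simply call the function again with that question left out of the list.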

      Once you have created the new composite variable, and calculated the central tendency for each respondent, you’d need to compare the responses of the two groups. Although each variable produces ordinal data, it has been argued that the composite variable may have properties of ‘interval’ data. This is controversial, and depending on where you stand on this question you’d work in slightly different ways. If you treat the data as interval, and the distribution of responses is normal, you could use an independent t-test to compare the responses of the two groups. This will tell you the difference between the ‘average’ response in each group, and whether it is statistically significant. If you still treat the data as ordinal, you’d run the Mann-Whitney U-test, which does pretty much the same thing.
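For the curious, this is roughly what the Mann-Whitney U-test counts under the hood. The sketch only computes the U statistic for each group, from hypothetical composite scores; it does not produce a p value, for which you would need a table of critical values or a statistics package.

```python
def mann_whitney_u(group_a, group_b):
    """U statistics for both groups: pairs where a > b count 1, ties count 0.5."""
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1
            elif a == b:
                u_a += 0.5
    # The two U values always add up to the number of possible pairs.
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b


# Hypothetical composite 'happiness' scores, one per respondent.
party_a = [4, 4, 5, 3, 4, 5, 4, 3, 5, 4]
party_b = [2, 3, 3, 4, 2, 3, 1, 3, 4, 2]
u_a, u_b = mann_whitney_u(party_a, party_b)
print(u_a, u_b)
```

The more lopsided the two values are, the more consistently one group scores above the other; the smaller of the two is the one normally compared against the critical value.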

      1. George P.

        Dear Achilleas,

        Thank you for your comprehensive reply.

        To answer your first question: you are right about the set of ten questions and the sample of twenty participants, who are divided into two groups (supporters of different political parties). Furthermore, I do have a hunch that the responses given by my participants are different depending on which group they belong to.
        However, since the sample size is really small, I think that testing this statistically does not make any sense. For this reason, I am thinking of ‘just’ comparing the medians of the ‘happiness’ constructs.
        However, in light of using median values, I do not know how to deal with the fact that the ‘happiness’ construct is measured by means of 10 variables (questions). I forgot to mention that a factor analysis already proved, convincingly, that this set of 10 questions indeed load on one factor.

        One way to deal with the fact that ‘happiness’ is measured by means of 10 variables (questions) is, as you mentioned, creating one super-variable (‘happiness’). However, I am reluctant to create this variable because there is an extensive discussion about whether you can create this variable in an ordinal-scale setting.

        My solution, therefore, would be to rank the responses (of one group) to all 10 questions, and, subsequently, calculate the median value (out of 10×10=100 values). The next step would be to calculate the median value of the other group’s responses. In the final step, I would compare the median values of both groups (knowing that I cannot say anything about significant differences because I am not applying any statistical tests; the median comparisons would merely serve as an indication that a difference might exist).

        What do you think?

        In my last post I forgot to thank you in advance, so, hereby, many thanks in advance.

        Kind regards,


      2. Achilleas

        Hi again,

        You can certainly just compare the medians of the two groups to gain some insights about your sample. What a statistical test such as the chi-squared would tell you is whether these insights can be projected from the sample to the population. Of course, you are right in pointing out that the modest size of your sample would make such projections difficult.

        Another thing to bear in mind is that calculating a median condenses, and to a certain degree distorts, data. So, for instance, if one group was polarised (lots of extreme views) and the respondents in the other group were all clustered around a central value, a comparison of the medians would mask this difference.

        So, I guess that what I am trying to say is that what you are suggesting does make sense, but I recommend that you also consider the Mann-Whitney U-test.

  2. Arthur

    Hi Kostoulas,

    One simple question: to verify how young people see social and political participation, using a Likert scale, what kind of analysis do you think will give more reliability: a rating average (X1W1+…+XnWN / TOTAL) or a Mode?

    I mean, sometimes I think the Mode, or the highest concentration of answers, will be clearer, but most sites, like SurveyMonkey, suggest the use of the rating average… And in my research these two give different answers for every variable.

    I curse myself every time I remember that I slept in statistics class. LOL

    1. Achilleas

      I think the median is your best choice for individual items.

      1. Arthur

        But, what kind of factor do you attribute to the difference between the rating average and the Mode? Why the two methods give different results for the same data?

      2. Achilleas

        Each set of data can have up to three different types of ‘average’ (measures of central tendency, to be more technical): the mean, the median and the mode. These are calculated in different ways, and are likely to be different. Sometimes, the data are evenly distributed, and in such a case it happens that the three measures coincide. On other occasions there may be unusual features in the data, such as an outlying value, which result in large differences across the mean, mode and median. It all depends on your data.
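A quick way to see this for yourself is to compute all three measures on the same data. The responses below are hypothetical, chosen only because they make the three ‘averages’ come apart.

```python
import statistics

# Hypothetical responses to one Likert item (1 = strongly disagree ... 5 = strongly agree).
responses = [1, 1, 1, 2, 3, 4, 5, 5, 5, 5]

print(statistics.mean(responses))     # 3.2
print(statistics.median(responses))   # 3.5
print(statistics.mode(responses))     # 5
```

Here the mode picks up the largest single cluster (the 5s), while the mean and median are pulled towards the middle by the cluster of 1s at the other end.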

  3. Jacks


    I am just trying to determine if there is a difference in community perception based on the geography of where they reside. Then relate that data to theories that correspond to different government administration practices.

    Thanks for the advice. It is much appreciated! I feel less lost!

  4. Achilleas

    As stated in the post above, you can calculate the median and interquartile range for each response. You can also display the results in a bar chart. The types of analysis you use depend on your research question.

  5. EVE

    Tons of thanks for ALL these juicy information about Likert, Spearman, and ordinal data.

    My Problem is this:
    My study ought to determine the EFFECTIVENESS of a particular radio program (let it be : “Program X”) in ENFORCING its ADVOCACY.
    Now, by that, I mean to test it via “relationship” of:

    (A) effectiveness of advocacy integration in terms of its radio programming, and;
    (B) effectiveness of advocacy promotion in terms of the radio DJs’ performance.

    I used a 3-point Likert Scale “3-agree” “2-disagree” and “1-undecided”.
    (my survey questionnaire validator highly advised it to be changed into 3-point from my original 5-point scale “5-strongly agree” “4- agree” “3-undecided” “2- disagree” “1-strongly disagree” because my questions are highly specific already.)

    Would you be so kind Sir to help me out with an appropriate statistical treatment and interpretation technique if possible?
    I plan to test the effectiveness of each variable A and B first by determining the MODE and RANGE (?) [or inter-quartile range?]
    Then, I plan to use SPEARMAN rho for the correlation of A and B.

    would these work? could you please enlighten me more?
    Please bear with me T_T
    Great Thanks!

    1. Achilleas

      What you propose sounds ok, as far as I can tell. The range and mode of each item can give you some insights into individual variables (you can also use the median, instead of/in addition to the mode). Spearman’s rho will then tell you if the two variables (A & B) are linked.
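In case it helps to see what Spearman’s rho actually does: it ranks the responses to each variable (tied responses share the average of their ranks) and then correlates the two sets of ranks. A minimal Python sketch, with hypothetical scores and function names of my own choosing, might look like this:

```python
def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two sets of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return covariance / (spread_x * spread_y)


# Hypothetical scores for variables A and B from six respondents (3-point scale).
variable_a = [3, 3, 2, 1, 2, 3]
variable_b = [3, 2, 2, 1, 1, 3]
print(round(spearman_rho(variable_a, variable_b), 2))
```

A value close to +1 or -1 indicates a strong link, and a value near 0 indicates little or none. With a 3-point scale there will be many ties, which is why the average-rank step matters. The sketch does not handle the case where all responses to a variable are identical.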

      1. EVE

        Thank you for the quick response Sir!
        Can I ask few more things if you don’t mind? :)

        1. My respondents are only 30 in total and I am afraid this may result to unreliability due to very small sample size. With a 5% margin of error, do you think this QUANTITATIVE DESCRIPTIVE type of study would glean reliability?
        2. Can you “teach” or at least provide some tips on how to establish or make a good data interpretation for my updated 3-point scale? My old version of 5-point scale is like this with 0.79 range:

        Average Scale Interpretation
        4.20-5.00 Very High Effectiveness
        3.40-4.19 High Effectiveness
        2.60-3.39 Average Effectiveness
        1.80-2.59 Low Effectiveness
        1.00-1.79 Very Low Effectiveness

        The truth may sound d*mb but I actually don’t really know how to ADJUST the range of data interpretation scale with my updated 3-point scale. How do I adjust the range? I’m sorry could you help me out more?:(.

      2. Achilleas

        With just 30 participants, you will have to be very careful about the kinds of claims you make. Of course, that depends on the size of the actual population you are studying. In general, you should be able to estimate the statistical significance of the findings (the p value) using SPSS or by looking it up in a statistical table (many statistics manuals have these in the appendices, or at least the older editions used to). Whether or not quantitative methods are appropriate is now a moot point, since you have already done your study. You just have to report your findings, and comment on whether they are statistically significant or not.

        I cannot comment on how to transform the scale, because I do not know how it was constructed, or whether it was validated. At first sight, it appears that the range from 1-5 was divided into five equal segments, each covering .8 (i.e., 1/5th of the range). So, I guess that you could just do the same with a range from 1-3 (i.e., 2 x 1/5 = .4, hence 1.00-1.39, 1.40-1.79 and so on). I am not sure whether you need such a fine-grained breakdown though. I would also remind you that your Likert-type items produce ordinal data, so normally you should not calculate mean values (a.k.a. ‘average’). You should use the median instead.
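Purely as an illustration of the arithmetic (and bearing in mind that averaging ordinal responses is itself questionable), the equal-width bands can be generated like this. The function name `equal_bands` and the convention of stopping each band 0.01 short of the next are my assumptions, based on the table in the question above.

```python
def equal_bands(lowest, highest, segments=5):
    """Split the range of possible averages into equal-width bands."""
    width = (highest - lowest) / segments
    bands = []
    for i in range(segments):
        start = lowest + i * width
        # Every band stops 0.01 short of the next one; the last ends at the top of the scale.
        end = highest if i == segments - 1 else start + width - 0.01
        bands.append(f"{start:.2f}-{end:.2f}")
    return bands


print(equal_bands(1, 5))   # reproduces the bands of the original 5-point version
print(equal_bands(1, 3))   # ['1.00-1.39', '1.40-1.79', '1.80-2.19', '2.20-2.59', '2.60-3.00']
```

Passing a smaller number of segments (e.g., `equal_bands(1, 3, segments=3)`) gives a coarser breakdown, which may be more sensible for a 3-point scale.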

  6. Maxim

    Thank you for the great article!
    Sorry for my English, first.
    Now I try to research methods used in schools of my country for parent satisfaction level estimation and find the best. I see, that common procedure is:
    1. Parents fills form with 20-30 questions. The answer on each question is a number from 0 (bad) to 4 (excellent).
    2. They make new composite variable. They calculate sum of all answers and divide by total count of answers (for example: 20 questions, 100 respondents: they make sum of all numbers in a table and divide by 2 000)
    3. This number is “parent satisfaction level estimation”, they said.
    I think, that this is some kind of mean and couldn’t be used. Is it right?
    In this case, when we must aggregate few ordinal variables and have as result one number, what method of aggregation will be more appropriate? Some authors says, that “11 distinct points on a scale is sufficient to approximate an interval scale”, some said that the best way is the cluster analysis and centroid analysis (mean of centroid coordinates, may be).
    More, I think that every question has a weight, so we must used it. But how?
    Thank you.

    1. Achilleas

      1. Your scale is ordinal, so you can calculate a median and a mode, but not a mean. There is advice on how to do this here.

      2. I am not sure what you mean by ‘every question has weight’.

      3. It is true that an ordinal scale with many points (e.g., eleven or more) might approximate an interval scale, but since your scale only has five points, this is not relevant to your case. Cluster analysis might be a better option in this case, depending on what you are trying to find out.

  7. Adilah

    Hello Sir. Thank you for the thorough explanation and the linked articles. But I have a question. What if I have a total of 55 Likert-scale items which are divided into 5 subsections and I have to use all items to determine general attitude, whether it is positive or negative? Should I analyse each item one by one for its median and IQR? It is going to be messy. And how am I supposed to discuss it in my study and say/conclude, based on all items, that the attitude is therefore positive/negative? It is a 4-point Likert scale. Another statistician told me to analyse descriptively and find the mean. Then see whether the mean is above or below the midway point. If it is above, then I can say it is a positive attitude, and if it is below, then I can say it is a negative attitude. Could you please help me because I am confused. Thank you Sir.

    1. Achilleas

      Thanks for your kind comment. I would use the words ‘thorough’ and ‘transparent’ rather than ‘messy’ to describe this kind of detailed analysis. If you prefer to summarise many items into a single scale, you can find some instructions here. Good luck with your project!

  8. babyruthbeer

    This is very helpful Mr. Kostoulas. Thanks for this. I have a question though.

    My case is very similar to the one on your article. Our survey is also composed of items with a 5 point scale ranging from 1-strongly disagree to 5-strongly agree. I already got the mean and the IQR of my first Likert-type item, 3 and 2 respectively. I’m having problems with interpreting them. I don’t really understand IQR and how to explain it. What does getting 2 means? Or if I get 3, how do I explain that?

    I have total of 100 respondents. And the items are statements (e.g. I often watch gay-themed films). I’m looking forward to your response.

    1. Achilleas

      Thanks for your kind remarks! Usually, the kind of dispersion you are describing is a sign that the sample is polarised. This means that respondents tend to have very strong views either for or against the statement. A bar chart can be useful in visualising this: it’s likely that there will be concentrations of responses at the edges of the scale, and few responses in the centre.

      A high IQR also alerts you to the fact that the median is likely a misleading metric (e.g., if you have 50 respondents who strongly agree and another 50 who strongly disagree with a statement, this doesn’t mean that, on average, they are undecided/indifferent).

      So, when reporting your data, you may want to foreground the dispersion of responses, rather than the median. One way to do this is: “Item x indicates that opinion is divided regarding issue y (Mdn=…, IQR=…). Most respondents stated that they disagree or strongly disagree (n=…), but a large number of respondents indicated agreement or strong agreement (n=…).”

      1. babyruthbeer

        Thank you, Sir!

  9. George

    Dear Achilleas,

    First of all i would like to thank you for the great article and all the useful information that you provide.

    I have a question regarding my survey. I use a standardised Likert-type questionnaire in order to measure the usefulness, ease of use, ease of learning and satisfaction for my website.

    Link to the questionnaire:

    For each variable (usefulness etc.) there is a set of questions. I also have a number of participants (N) who answered the questionnaire. So what I am thinking to do in order to gain insight about the perceived usefulness, ease of use, ease of learning and satisfaction is to:
    calculate the mean for every single question in every group and then to calculate the mean of the means in order to have a single value for every group (usefulness etc). So if this mean is above a threshold, it means that the perceived usefulness of the website is high.

    Do you think that this is a good practice?

    Thank you very much :)

    1. Achilleas

      Thanks for the kind words George! What you’re describing seems ok, although I would prefer to use the median rather than the mean.

      Many statisticians strongly believe that Likert items produce ordinal data, so it would be wrong to calculate a mean value for them. This is a fairly contentious point, and you might want to stay on the safe side by calculating medians.

      Here’s some advice on how to do this using SPSS:

  10. Andrew

    I’m learning how to use the Likert scale and would really appreciate the assistance of a specialist. I’m in the process of completing a case study where I must use the Likert scale to analyse my information. The topic that I am dealing with is misbehaviour within the classroom. I am trying to find information on teachers’ perceptions of this topic at grade eight level in secondary schools.

    1. Achilleas

      I am not sure I understand your question: do you want help with your statistics, or guidance compiling a literature review?

  11. Eki

    Hi my name is Eki, student from Indonesia

    Im conducting a research titled “Critical Success Factor of Stakeholder Management in procurement phase in EPC projects”

    I collected 48 variables from various sources such as Journal and Thesis. I designed the research with questionnaire containing likert scale 1 (not important) to 5 (most important) and asked 30 respondents to give their views on each variables.

    What i want to ask is, how could i analyze which variable can be categorized as CSF? If using median, what is the parameter of the median?

    Thx in Advance

    1. Achilleas

      Hi Eki! I don’t know enough about management to be able to advise you. The scale you’re describing seems like a generic interval scale, rather than a Likert item.

  12. Shereen Adam

    I would be grateful if anyone can help me with my analysis. I conducted a study on the relationship between low income (independent variable) and stress level (dependent variable). I used 3 instruments to collect data: 1. the biofeedback stress card, 2. the perceived stress scale, 3. a six-item scale to measure financial stress. I have 134 respondents. The six-item and the four-item self-report questionnaires provided ordinal data; the biofeedback reading gave nominal data. I used Spearman’s rank correlation to look at the significance level between general stress and financial stress. I also used chi-square to look at each item within the questionnaire separately. My question is: can I conduct descriptive statistics on the six-item and the four-item scale?
    I would be grateful if you can help

    1. Achilleas

      Depends on what you mean by ‘descriptive statistics’. You should use measures appropriate to the level of measurement.

  13. Kasturi

    I am doing a satisfaction study based on ordinal data. Since my study deals with many variables and the links among those variables, i want to know how to treat the ordinal data. If i use median, will the median values be still considered as ordinal? And if that is the case, then i cant use one way MANOVA or any other techniques which is based on continuous data assumption. Kindly Suggest.

    1. Achilleas

      No, ordinal data do not yield means, nor do they support statistical techniques that assume continuous measurement.

  14. Hannah


  15. Jenny

    Dear Achilleas,
    It is me again – you offered very kind help to me in some comments on another of your posts.
    I have a couple of other questions as I draw to the end of my write-up. I’ve tried searching for the answer but it’s tricky to know what search terms to use.

    1) The estimated population I’m working with (i.e. UK-based yoga teachers) is 10,000. I put this figure into an online sample size calculator along with a margin of error of 5% and a 95% confidence level. The resulting target number of participants for my questionnaire was 370, assuming a 20% response rate from 1850 invites. I achieved this, so that is no problem. My question concerns the next steps I took. I have interpreted my Likert-type scale data as ordinal and used the Kruskal-Wallis test and the post-hoc Dunn-Bonferroni test to determine whether there were any significant differences in the respondents’ median attitude scores between different dietary groups and, if so, between which dietary groups these differences lay. SPSS states the results of the Kruskal-Wallis test and the post-hoc Dunn-Bonferroni test to be true at the 0.05 level. But can I assume that, just because my sample size is sufficient for the overall UK-based yoga teacher population, the number of individuals representing each dietary group within my sample is sufficient or representative of the wider population? My sample was sourced from the Yoga Teachers UK Facebook group. I’m thinking that because my source is unbiased (i.e., the aforementioned Facebook group is a general group and not specific to any dietary group amongst yoga teachers), I can assume that the proportion of my sample representing each dietary group could (a) be considered representative of the wider UK-based yoga teacher population and (b) be considered of an adequate size. Do you agree? Is my interpretation correct?

    2) Secondly, as I assumed a 95% confidence level when using the online sample size calculator, I thought that all statistical tests I performed would be valid at the 0.05 level, which was the case for the tests I detailed in point one above. However, I used Spearman’s rho to test for a correlation between a set of beliefs and a set of attitudes, and SPSS told me the results were true at the 0.01 level. Should I ignore this and still interpret the result as true only at the 0.05 level (as SPSS does not know the confidence level that I set on the sample size calculator)?

    So many thanks to you for any advice you can give 😊

    Jenny 😊

    1. Achilleas Kostoulas avatar

      Hi Jenny,

      Good to know that your project is moving along so well. To start with your second question, a p value of 0.01 is actually even better than 0.05 (the lower the value, the stronger the evidence against the null hypothesis). This means that if a result is significant at the 0.01 level, it is also significant at the 0.05 level. So no problems there.

      The answer to the other question is slightly more complicated. Strictly speaking, your sample was self-selected, i.e. not truly random, and this means that most claims about generalisability have to be made with a certain degree of caution. This doesn’t detract from the value of what you did and found; it’s just a feature of most questionnaire-based studies. What you want to do in the write-up is demonstrate an awareness of the limitations and, at the same time, enough confidence in the value of what you did.

      One way to do this in the write-up is to repeat all the information that you just shared, explaining why you think that the sample gives insights into each dietary group, and list all the steps you took to ensure generalisability / external validity. Towards the end of the thesis, you will likely add a couple of paragraphs discussing the limitations of the study, and there you can say something along the lines of ‘of course these findings need to be interpreted with caution, because of the sampling strategy which… Although the findings convincingly show that… we cannot assume that they prove… however, they do have the potential to inform…’

      Hope that helps!
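
      For readers curious about where a target such as 370 comes from, here is a minimal sketch. It assumes that the online calculator uses Cochran’s formula with a finite population correction and the most conservative proportion (p = 0.5); the calculator’s actual method is not stated in the comment, so the function name and its defaults are illustrative only.

```python
import math


def sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Target sample size for estimating a proportion (assumed method).

    z = 1.96 corresponds to a 95% confidence level; p = 0.5 is the
    most conservative guess about the proportion being estimated.
    """
    # Cochran's formula for a very large population
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Finite population correction for a known population size
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


target = sample_size(10_000)
# Invitations needed if 20% of the people invited respond
invites = math.ceil(target * 100 / 20)
print(target, invites)
```

      Note that the confidence level entered here only describes the precision of estimates for the sample as a whole. It says nothing about whether each subgroup is large enough, and it is separate from the significance level reported for any later test.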

  16. Jenny avatar

    Dear Achilleas,
    Thank you so very much. That is super helpful.
    I wish I could offer you some help in return :)
    Many, many thanks :)
    Jenny :)

    1. Achilleas Kostoulas avatar

      I’m glad that was useful! Best of luck with your project :)

  17. Caress avatar

    Hi Sir! I would like to measure the relationship/association between self-handicapping (5-point Likert scale) and type of implicit belief about intelligence, which may be Entity, Incremental or Both (nominal variable). What statistical tool would you recommend?

    1. Achilleas Kostoulas avatar

      Crosstabs and chi-square, if your sample’s large enough.
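
      As a rough illustration of what this involves, the sketch below builds a crosstab from made-up counts and computes the chi-square statistic by hand. The data, the function name and the reading of “large enough” as expected counts of at least 5 in every cell (a common rule of thumb) are all assumptions; a statistical package would also report the p value, which this sketch does not.

```python
from collections import Counter


def chi_square(pairs):
    """pairs: (row_label, column_label) tuples, one per respondent.

    Returns the chi-square statistic, the degrees of freedom and the
    smallest expected cell count.
    """
    pairs = list(pairs)
    observed = Counter(pairs)
    row_totals = Counter(row for row, _ in pairs)
    col_totals = Counter(col for _, col in pairs)
    n = len(pairs)
    statistic = 0.0
    min_expected = float("inf")
    for row in row_totals:
        for col in col_totals:
            # Count expected in this cell if the two variables were unrelated
            expected = row_totals[row] * col_totals[col] / n
            min_expected = min(min_expected, expected)
            statistic += (observed[(row, col)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, min_expected


# Hypothetical counts of self-handicapping scores 1..5 per belief type
table = {
    "Entity": [2, 4, 6, 10, 8],
    "Incremental": [9, 8, 6, 4, 3],
    "Both": [5, 6, 8, 6, 5],
}
pairs = [
    (score, belief)
    for belief, row in table.items()
    for score, count in enumerate(row, start=1)
    for _ in range(count)
]
statistic, dof, min_expected = chi_square(pairs)
print(round(statistic, 2), dof, round(min_expected, 2))
```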

  18. Esther avatar

    Hi Achilleas,

    My research is on the people’s perceptions and knowledge on the use of fire in savanna woodlands in six districts. I have a question on whether the season and frequency of burn is 1. Very important 2. Important 3. Not important. How do I analyse the data from this? Thanks in advance.

    1. Achilleas Kostoulas avatar

      Depends a lot on what you’re trying to find out, I guess. If you just want to do descriptive statistics, analysing the frequency and percentage of each response, plus median, range and IQR should be more than enough.
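
      A minimal sketch of these descriptive statistics, using only Python’s standard library, might look as follows. The responses are invented, and the choice of `median_low` (which always returns a value that was actually observed) and of the “inclusive” quartile method are my assumptions; other packages may compute quartiles slightly differently.

```python
from collections import Counter
from statistics import median_low, quantiles

LABELS = {1: "Very important", 2: "Important", 3: "Not important"}


def describe(responses):
    """Frequency, percentage, median, range and IQR of coded responses."""
    counts = Counter(responses)
    n = len(responses)
    q1, _, q3 = quantiles(responses, n=4, method="inclusive")
    return {
        "counts": {LABELS[k]: counts[k] for k in LABELS},
        "percent": {LABELS[k]: round(100 * counts[k] / n, 1) for k in LABELS},
        "median": median_low(responses),
        "range": max(responses) - min(responses),
        "iqr": q3 - q1,
    }


# Hypothetical answers from 20 respondents
responses = [1] * 12 + [2] * 5 + [3] * 3
summary = describe(responses)
for name, value in summary.items():
    print(name, value)
```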

  19. Rashid avatar

    Hi Achilleas!

    I am writing my thesis right now and we are using a Likert scale (strongly disagree = 1, to strongly agree = 5).
    We are using 4 categories that include about 5 questions each, and we are focusing on descriptive statistics as the method to present and analyze the data. More specifically, we are using the median and quartiles, which are fairly easy to obtain in SPSS for the individual questions, but the problem we run into is when we need to summarize the median and quartiles for our 4 categories. I can’t seem to find a way to combine the questions/items in order to work out the median and quartiles for the categories in which they are included.

    Your help is appreciated, thank you!

    1. Achilleas Kostoulas avatar

      Hi! If each group of items is fairly cohesive, how about adding the scores (to create a scale from 5 to 25) and calculating the median and IQR for that?
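
      In case it helps to see the arithmetic, here is a small sketch with invented data for one category of five items. The variable names and the “inclusive” quartile method are assumptions; in SPSS the same step would be done by computing a new variable.

```python
from statistics import median, quantiles

# Hypothetical answers: one row per respondent, one column per item (1 to 5)
category_a = [
    [4, 5, 4, 4, 5],
    [2, 3, 2, 3, 2],
    [5, 5, 4, 5, 5],
    [3, 3, 4, 3, 3],
    [4, 4, 4, 5, 4],
    [1, 2, 2, 1, 2],
    [3, 4, 3, 4, 4],
]


def composite_summary(rows):
    """Add up each respondent's items, then summarise the totals."""
    totals = [sum(row) for row in rows]  # each total lies between 5 and 25
    q1, _, q3 = quantiles(totals, n=4, method="inclusive")
    return {"totals": totals, "median": median(totals), "iqr": q3 - q1}


result = composite_summary(category_a)
print(result)
```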

  20. Ene Ameh avatar

    Hi Achilleas,
    Your blog has been so helpful. Your last response applies to me, but it raises some questions that I would be grateful for your help with:
    Is it preferable to calculate the median and IQR for individual items or per section (I have 6-8 items in each section)?
    If I work with a group of items, could you please clarify what you mean when you say add the scores to create a scale?

    My data represent teachers’ responses to questions about factors affecting effectiveness. For each factor, I have 6-8 questions which they answer using a Likert scale (strongly agree to strongly disagree).

    I am grateful for your help.

    1. Achilleas Kostoulas avatar

      Thanks, that’s very kind of you to say. Regarding your question, I think the answer probably depends a lot on what your ‘groups of items’ (i.e., Likert scales) look like.

      If the scales are cohesive (i.e., they measure the same underlying construct), then you can add up the responses. So for instance, suppose you have a scale comprising 3 five-point items, and a respondent has selected (1) for Item 1, (1) for Item 2 and (2) for Item 3, then the composite score for this “group of items” is 4 (on a scale from 3 to 15).

      This method only works when the responses in your items are similar. In the example above, if respondents tended to choose (1/strongly disagree) for Items 1 and 2, and (5/strongly agree) for Item 3, that would suggest a measurement inconsistency. You may want to read up on how to use Cronbach’s alpha to measure how cohesive your scales are.
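
      For the curious, here is a rough sketch of how Cronbach’s alpha can be computed by hand, using the standard formula (number of items, sum of item variances, variance of the composite scores). Both data sets are invented, and the function name is mine; statistical packages such as SPSS will do this for you. By convention, values of about 0.7 or higher are usually read as acceptable.

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of item responses per respondent."""
    k = len(rows[0])              # number of items in the scale
    items = list(zip(*rows))      # regroup the responses by item
    item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variances / total_variance)


# Hypothetical three-item scales, five respondents each
cohesive = [[1, 1, 2], [2, 2, 2], [4, 5, 4], [3, 3, 4], [5, 4, 5]]
inconsistent = [[1, 1, 5], [2, 1, 5], [1, 2, 4], [1, 1, 5], [2, 2, 4]]

print(sum(cohesive[0]))  # composite score of the first respondent
print(round(cronbach_alpha(cohesive), 2))
print(round(cronbach_alpha(inconsistent), 2))
```

      In the second data set, respondents who score low on Items 1 and 2 score high on Item 3, and alpha drops below zero, which signals that the items should not simply be added together.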

  21. Frieda avatar

    Hello Sir, one of my research questions is ‘how current residential valuation practices in Namibia relate to international standards on competence and code of conduct framework’. I have used a 5-point Likert scale (Strongly disagree to Strongly agree). Apart from the median, which other descriptive statistics should I use?

    1. Achilleas Kostoulas avatar

      Cross-tabs and chi-square, I suppose

  22. Jessie avatar

    Hi, thank you for your useful information and explanation. Just like Rashid, I need your help / opinion on this, please. I’m measuring the quality of the program I made:
    Let’s say I have five categories, and each category has two questions answered on a scale from 1-poor to 5-excellent by my 40 respondents.
    Category A: 5 | 4 | 3 | 2 | 1 |
    Category B:
    Category C:
    I calculated the Median and IQR of each question, but I’m wondering …
    How can I get the Median and IQR of Category A, B, C, D, E, F ?
    I’m lost. Thank you in advance for your help.

    1. Achilleas Kostoulas avatar

      You could add the values for A1 and A2, and so on. This will give you a new scale, ranging from 2 to 10. You can either do your calculations on that, or divide it by 2 to return to the original 1 to 5 range.

  23. jojobo avatar

    Sorry but the median says close to nothing about what the “average” respondent might think. That’s just plain wrong and a pretty bad misinterpretation of data. It tells you something about spread together with the other quantiles. But if you want to say anything about the “average” (this is not really correct because you can’t construct averages from opinions) respondent you would use the mode: the value that most people assigned to their attitude regarding the statement in question.

    For example, ask people about climate change in the United States these days. What would the distribution look like? Probably a lot of people wouldn’t give a shit and assign a “1” or “not concerned at all”. Probably about the same or slightly more people would respond with “5”, which would correspond to “strongly concerned”. So the median is found somewhere in the middle, at “3” or “neutral”. Would you say that is an accurate description of the attitudes of the US public? No, of course not. You would argue that most people are strongly concerned about climate change; however, almost as many don’t give a shit. This reflects the deeply polarized political debate in the US. Using the median you would not reach this conclusion, because you would think attitudes are relatively normally distributed. Looking at the mode, however, you can.

    1. Achilleas Kostoulas avatar

      Thanks for this comment. You are, of course, correct; in fact, the section on reporting findings says that ‘averaging’ polarised opinions would be a distortion of the data, so I think that we are in essential agreement here. The mode would be a useful alternative in such a case, but I think that an even better approach would be to describe the distribution in more detail (there’s an example in the post, which I hope is helpful).
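
      To make the point concrete, here is a small sketch with invented counts for a polarised five-point item (they are not real survey data). It prints the median, the mode and the full distribution side by side.

```python
from collections import Counter
from statistics import median_low, mode

# Hypothetical counts: 1 = not concerned at all, 5 = strongly concerned
counts = {1: 38, 2: 6, 3: 10, 4: 4, 5: 42}
responses = [score for score, n in counts.items() for _ in range(n)]

print("median:", median_low(responses))
print("mode:", mode(responses))

# A simple text bar chart of the whole distribution
for score, n in sorted(Counter(responses).items()):
    print(score, "#" * n, f"{100 * n / len(responses):.0f}%")
```

      With these numbers the median lands on the middle option, which only a tenth of respondents chose, while the mode captures just one of the two camps. The bar chart is the only one of the three that shows both.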

  24. Vonn Jireh Corro avatar

    My classmates and I are doing a research project to find out whether the urban planning in our city is efficient from the perspective of our sample (students who have lived in that city). We want to use a 5-point Likert scale for the questionnaire. How do we get the mean? For each question, or for each category?

    1. Achilleas Kostoulas avatar

      Hi, thanks for your question. Most of the time, when working with these questions, it is safer to report the median rather than the mean. This is especially so with single questions. You could perhaps use the mean if you’re confident that the questions in each category measure the same thing (the same latent variable, to be technical), but that would require more statistical work than is usual in a school project. Hope that helps and good luck with your project!

  25. MOSES avatar

    How can I calculate the mean and standard deviation from a 5-point Likert scale?

    1. Achilleas Kostoulas avatar

      You can’t. Likert scales produce ordinal data, and ordinal data don’t have means/standard deviations.

    2. Achilleas Kostoulas avatar

      You really shouldn’t. A Likert scale produces ordinal data, so calculating means & standard deviations is problematic (some people say it’s completely wrong). A more appropriate way forward is to calculate the median and interquartile range.

  26. Nurjannah avatar

    Greetings Mr. Kostoulas,
    For a 5-point Likert scale, the interquartile range will be either 0, 1, 2, 3 or 4, won’t it? Which of these is considered a high interquartile range?

    1. Achilleas Kostoulas avatar

      Hi there! I’m not sure you should think of ranges as being either “high” or “low”. As with most things in statistics, it’s a matter of degree. That said, I would be inclined to think that anything >2 suggests that answers are not clustering around the median. In such a case, you might want to take a close look at the bar chart, to see how the answers are distributed.

  27. kumy avatar

    Hi Achilleas, the main objective of my research is to find the sources of language anxiety (LA) among English language learners. To collect data, I’m going to administer a questionnaire which uses a 5-point Likert scale ranging from strongly agree to strongly disagree. From this questionnaire, I will be able to get many sources of LA. From that lot, how can I identify the main sources of LA, maybe about 10 main sources? Can I use descriptive statistics to do that, or is there another way?

    1. Achilleas Kostoulas avatar

      Hi there. The questions you should be asking are: (a) why are you using a new questionnaire, rather than one from the literature, and (b) is this an exploratory study, that aims to find new sources of LA? If so, what’s wrong with existing ones?

      Descriptive statistics will only tell you things that are already in the questionnaire. They will not tell you how these relate to language anxiety. For this, you will probably need to read up on inferential statistics.

  28. MELODY HERB avatar

    Greetings Dr. Kostoulas

    I have learned a great deal from you today about Likert scales. Thank you. I am in the process of writing Ch. 4 of my doctoral proposal/manuscript. The data is in and I am analyzing. I used the Academic Motivation Scale survey (AMS, Vallerand et al., 1992). […] The survey is a 28-question, 7-point Likert scale meant to measure the motivation (self-determination; Deci and Ryan, 2000) constructs of Intrinsic, Extrinsic, and Amotivation. There are 7 subscales, each with 4 questions related to it.

    Here are the instructions I received from Dr. Vallerand:

    To calculate a person’s score on the AMS, you need to find the mean response for each of the subscales. These means will vary between 1 and 7. You then insert these means in the following formula which will enable you to calculate a self-determination index in Excel: […]

    I am leaning toward this analysis type; however, I’ve read your info on Likert scales (and many others) and I’m wondering if I should use median and interquartile range instead of mean and st. dev. I have 2 independent groups (home school and public school), and I am analyzing which group may have higher levels of motivation.

    My question is which method should I use (they both have lots of controversy surrounding them), and what statistical tests would you recommend?

    Thank you very much!

    Melody Herb

    1. Achilleas Kostoulas avatar

      Hi Melody,

      This looks like an interesting project, thanks for sharing the details! You are, of course, right in pointing out that there is much controversy surrounding the use of Likert scales. However, I would suggest that in this case, you follow the instructions given by the creators of the scale. Among other things, this will ensure that your results are comparable to those reported by others who have used the same scale. Also note that many of the objections to the use of means stem from the fact that the scales are not very carefully created, and the items do not always measure the same construct. With a properly constructed and thoroughly tested scale such as the one you are describing, any irregularities are likely to cancel each other out, and the remaining objections are mostly theoretical. You may (or may not) want to add a few comments in your methodology chapter outlining the controversy, and then go on to say how you have worked and why you have confidence in this analysis.

      Hope that helps somewhat, and good luck with your work!

  29. Drista avatar

    Hello sir! I have a question.
    I am doing qualitative research, and the goal of my research is to figure out the level of agreement of the participants on a certain topic. Can I use the median as a way to determine the level of agreement of my participants?

    1. Achilleas Kostoulas avatar

      Hello! Qualitative research involves the analysis of textual data. It is unclear to me how you plan to use medians, as this would involve quantifying the data – which is a valid thing to do in some research traditions, but doesn’t seem to be in line with qualitative work…
