How to summarise Likert scale data using SPSS

Elsewhere in this blog, I have written that a Likert scale might consist of several overlapping items. For instance, if I want to measure subjects’ attitudes towards sweets, I might ask them to record how they feel about the following statements:

Response options: Strongly Agree, Agree, Disagree, Strongly Disagree

1.  I like chocolate.
2.  I like cookies.
3.  I like whipped cream.

In order to interpret these data, we need to summarise the data in the scale. The safest way to do this is by calculating the median value of all the items. Using the same example as above, I need to create a new ‘super-variable’, which shows the median of items (1), (2) and (3) for each respondent.
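To make the idea of the ‘super-variable’ concrete, here is a minimal sketch in Python rather than SPSS. The numeric coding (1 = strongly agree to 4 = strongly disagree), the three respondents and the name `scale_median` are all assumptions made for the illustration.

```python
from statistics import median

# Assumed coding: 1 = strongly agree ... 4 = strongly disagree.
# Each row holds one (hypothetical) respondent's answers to items (1), (2) and (3).
responses = [
    (1, 2, 1),  # respondent A
    (3, 4, 4),  # respondent B
    (2, 2, 3),  # respondent C
]

def scale_median(items):
    """Return the median of one respondent's item scores."""
    return median(items)

# One summary value per respondent: this is the new 'super-variable'.
sweets = [scale_median(row) for row in responses]
print(sweets)  # [1, 4, 2]
```

With an odd number of items, as here, the median is always one of the original response codes, so it can still be read as a verbal descriptor.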

In the paragraphs that follow, I will show how to do this, using SPSS.

Assumptions

I assume that you already know how to define variables and values, how to toggle between the numerical expression and verbal descriptor of the values (i.e., you can make SPSS show responses as “strongly agree/agree/disagree/strongly disagree” or as “1/2/3/4”), and how to key in data. I will also assume that you have already established that the scale is internally consistent, so I will focus only on the technical aspects of merging the variables.

Starting out

Your starting point will be a dataset similar to Figure 1 below.

Figure 1 SPSS screenshot showing responses to Likert-type items

When you have typed in your data, and tested for the internal consistency of the scale (use Cronbach’s α), it’s time to create a new variable.
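For readers who want to see what Cronbach’s α actually measures, the sketch below computes it by hand in Python using the standard formula α = k/(k−1) × (1 − sum of item variances / variance of total scores). The data and the function name `cronbach_alpha` are hypothetical; in practice you would let SPSS do this for you.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a tuple of item scores."""
    k = len(rows[0])                      # number of items in the scale
    items = list(zip(*rows))              # regroup the scores item by item
    item_variances = sum(variance(scores) for scores in items)
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variances / total_variance)

# Hypothetical answers from five respondents to three items.
rows = [(1, 2, 1), (3, 4, 4), (2, 2, 3), (4, 4, 3), (1, 1, 2)]
alpha = cronbach_alpha(rows)
print(round(alpha, 2))  # 0.91
```

The sketch uses sample variances (dividing by n − 1) throughout; the important point is that the same kind of variance is used for the items and for the totals.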

Merging the variables

From the top menu bar, select Transform -> Compute variable. You should now see the following dialogue box.

Figure 2 Steps for combining Likert-type responses
  1. Assign a name to the new variable (e.g., Sweets)
  2. Scroll down the Function Group, and select Statistical
  3. From the functions that appear, select Median. [NB: it is possible to select the mean, but I don’t recommend it.] At this point, the following formula should appear in the numerical expression box: Median ( , )
  4. Place the cursor in the brackets, select the variables you want to merge, and click on the arrow. Repeat with all the variables, separating them with commas.
  5. Click on OK.

Result

SPSS will automatically generate a new variable, which will appear at the end of your dataset. This will be in numerical form (1, 2, 3, …), but you can change it to a verbal descriptor for consistency (Figure 3). You can use this variable for descriptive statistics (e.g., estimate the central tendency and dispersion), cross-tabulations, correlations and so on…

Screenshot showing the combined scale
Figure 3 The new variable
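As a rough illustration of the descriptive statistics mentioned above, here is a Python sketch that tabulates the new variable and reports its median and interquartile range. The values, the labels and the variable names are hypothetical, and statistical packages use slightly different rules for quartiles, so the spread reported by SPSS may not match this exactly.

```python
from collections import Counter
from statistics import median, quantiles

# Assumed value labels, mirroring the coding of the original items.
labels = {1: "strongly agree", 2: "agree", 3: "disagree", 4: "strongly disagree"}

# Hypothetical values of the new 'Sweets' variable, one per respondent.
sweets = [1, 4, 2, 2, 1, 3, 2, 4]

# Frequency table, shown with verbal descriptors.
frequencies = Counter(sweets)
for value in sorted(frequencies):
    print(labels[value], frequencies[value])

# Central tendency and dispersion suited to ordinal data.
centre = median(sweets)
q1, _, q3 = quantiles(sweets, n=4)
print("median:", centre)
print("interquartile range:", q3 - q1)
```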

Now wasn’t that very easy?


More to read

I hope that this information was helpful, but if there’s anything that was not clear, feel free to drop a line in the comments below. You may also want to check out some of the other posts I have written on quantitative research.

If you landed on this page while preparing for one of your student projects, I wish you all the best with your work. There’s a range of social sharing buttons below in case you feel like sharing this information among fellow students who might also find it useful. Also feel free to ask any other questions you may have, using the contact form.

77 Replies to “How to summarise Likert scale data using SPSS”

  1. Hi,
    I am working with Likert scale questions. In the situation you showed above, there are 3 questions under one category, so the new variable can be calculated using the median. In my case, however, I have only 2 questions under one category, so if I use the median, the result will have decimal places. What should I do? Please help me. Thank you so much.

  2. Hi, in the situation you showed above, you are using three questions for one variable, so you calculate the median. In my situation, however, I am using two questions for one variable. Would calculating the median be inappropriate because of the decimal places? Or should I calculate the mode? I really do not know how I should do the analysis.

  3. Hi! Thanks for the post. It’s been really helpful. But a quick question. How do I create verbal descriptions if I have a decimal value?

    1. I’m glad you found it helpful. The reason why you have decimal values is because you have calculated the mean (or ‘average’). As you have intuitively found out, the mean doesn’t make much sense in Likert scales. That’s because the data that these scales produce are ordinal. There are researchers who claim that this is okay, but many people think that this is statistically wrong, and as you’ve found out, the results are hard to interpret. A less controversial thing to do is to calculate the median of the scale.

  4. I have 5 different questions for identifying the mood or response of 1 independent variable. The rating scale is
    5-Strongly Agree
    4-Agree
    3-Neutral
    2-Disagree
    1-Strongly Disagree
    The internal consistency test gave an alpha of 8.45. Now I want to combine the 5 questions, which currently have 5 tables, into 1 table for my independent variable. All questions are ordinal measures. Some people suggest that I take the average of these 5 questions, but I see you used the MEDIAN. I am confused as to which I should use.

    1. I prefer using the median, for reasons that I have explained elsewhere in this blog. Many people use the mean (average), but I think this is not, strictly speaking, sound statistical practice.

  5. OK, thanks. After I computed the MEDIAN, the measure of the resulting variable changed from Ordinal to Nominal in SPSS. The original data type was Ordinal. Should I leave it as Nominal, or change it to Ordinal, as in the questions?

    1. I am not sure why it changed. Your scale is Ordinal, as you said. It doesn’t really matter what the SPSS table reads – this is more of a reminder for you. If you are of an obsessive predisposition, you can change it back, but even if you don’t correct it, it shouldn’t affect results.

  6. Hi,

    I’ve a dataset with likert scale questions, in which participants weren’t required to answer all questions. I showed them 4 advertisements in total. I had 2 versions of each advertisement. The questionnaire tool I used assigned each participant randomly to a condition. So for example, participant 1 saw ad 1, version 2, ad 2, version 2, ad3, version 1 and ad 4, version 2. Participant 2 saw ad 1, version 2, ad 2, version 1, ad 3, version 1 and ad 4, version 2 etc. etc.

    I asked the same set of questions (7 questions in total) after having shown each advertisement. These 7 questions combined measured the dependent variable likeability.

    I wanted to combine the answers of the 7 questions, since it measures one dependent variable.

    I used your technique, and SPSS does combine the questions and makes new variables. Now I get: Likeabilityad1version1, Likeabilityad1version2, Likeabilityad2version1, Likeabilityad2version2, Likeabilityad3version1, Likeabilityad3version2, Likeabilityad4version1 and Likeabilityad4version2.

    However, when I want to create a new variable that gives the general likeability of version 1 (so with all the version 1 ads combined) and the general likeability of version 2 (so with all the version 2 ads combined), I get an error. The reason for this is, that SPSS only calculates a result for each participant that either answered the questions for all versions 1 or all versions 2.

    My question is, how can I create a variable with which I can measure if people liked version 1 or version 2 better?

    I’m sorry if this sounds all way too complicated. Please let me know if I need to provide more details. I would really appreciate your help.

    Many thanks in advance!

  7. Hello, I am writing my thesis about employee satisfaction and I have 50 questions with 5 options: strongly agree, agree, neutral, disagree, strongly disagree. It will be very confusing and long if I analyse each question on its own. How can I give a clear overview of the results, keeping in mind that I have 350 questionnaires?

    1. I think that despite the risk of seeming confusing and long, the data should be presented in full, for reasons of transparency.

      That said, I imagine that the 50 questions form ‘groups’ of similar questions, with each group measuring one underlying construct (or ‘latent variable’). You may want to summarise the information in these groups, and here’s some advice on how to do that: https://achilleaskostoulas.com/2014/12/15/how-to-summarise-likert-scale-data-using-spss/

      Best of luck with your project!

  8. Dear Sir, I am looking at the impact of three IVs (Perception of Police Fairness, Perception of Police Effectiveness and Perception of Police Moral Solidarity with Community) on the DV, Police Legitimacy, moderated by Perception of Judiciary. The Likert scale questionnaires measuring three of the variables were taken from one source, and the other two were taken from a different source. As a result, some scales run from 1 = Strongly Agree to 5 = Strongly Disagree, whereas one runs from 1 = Strongly Disagree to 6 = Strongly Agree. This means that, for one variable, values suggesting a good situation fall close to 1, whereas for another variable ‘goodness’ is valued close to 5 or 6. The data have been collected and entered into SPSS. Would this create a problem in the analysis, or should I reverse entire variable question groups? Please help.

    1. That shouldn’t be a problem, unless you plan to combine the data in some way. However in the interest of making the data interpretation and display clearer, you might want to reverse the codes in one of the scales.
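Reversing the codes is simple arithmetic: on a scale that runs from `low` to `high`, the reversed score is `low + high - score`. Here is a minimal Python sketch of that idea; the function name `reverse_code` and the example answers are hypothetical. Note that reversing only flips the direction of a scale. It does not make a 5-point scale comparable with a 6-point one.

```python
def reverse_code(score, low=1, high=5):
    """Flip a score on a low..high scale, so that low becomes high and vice versa."""
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside the range {low}..{high}")
    return low + high - score

# Hypothetical answers on a scale where 1 = strongly agree ... 5 = strongly disagree.
fairness = [1, 2, 5, 3, 4]

# After reversing, 5 = strongly agree ... 1 = strongly disagree.
fairness_reversed = [reverse_code(score) for score in fairness]
print(fairness_reversed)  # [5, 4, 1, 3, 2]
```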

  9. I am conducting an employee engagement survey with 28 questions on a 5-point Likert scale. No demographics, such as age or male/female, were asked. What tests would you suggest running in SPSS to get the best results and show whether or not the data are significant?

    Thanks

  10. I followed the steps, but I couldn’t find the MEDIAN within the Statistical function group. What can I do? I also tried the mean for a 5-point Likert scale, and the descriptives came out as decimals (2, 2.4, 2.6, 2.8, up to 5). How can I deal with this?
    Thank you

    1. The median should be there (have a look at the screenshot). If it isn’t, you may want to take that up with the IT services at your university or IBM customer support. But do look carefully before contacting them.

  11. Hello, I would like to ask what to do if I have two variables (practices and awareness) with 9 questions each. Later, they would be used to compute whether there is a significant relationship. Thank you very much…

  12. Hi Achilea,

    I have a 10-point Likert scale (1 = not at all, 10 = very much). I will combine the subcategories of my questionnaire in order to get the mean. Why do you suggest the median instead of the mean? Would you suggest the same for my case?

    Thanks a lot in advance!

    Giorgos

    1. Hi! I wouldn’t call that a Likert scale as such. The scale you’re using produces interval data, and a mean is an appropriate measure of central tendency in this case.

  13. Thanks for the info… I have been looking for discussions about merging Likert scales, and also YouTube videos, but they all use means and do not explain why. This is the first time I have heard of using the median, which makes a lot of sense.

  14. hey,
    I have a similar question.
    I have 3 questions, all on this scale:
    1 – strongly support,
    2 – support,
    3 – do not support,
    4 – strongly do not support,
    5 – no answer.

    So maybe you can help me figure out how I can group the answers to those 3 questions to get the 25% of respondents who are supporters and the 25% who are not?

  15. Hello,

    I have three questions for each independent variable. I have three independent variables.
    I need to perform Multiple Regression Analysis to find out the relationship between the dependent variable and the independent variable. How do I do this?

  16. Dear sir,
    I need help.
    My questionnaire is like this [description of questionnaire redacted]. I need to do hypothesis testing in SPSS. How do I do it? I really need help. Please help me…

    1. Hi Bijay,

      Your hypotheses will derive from your research questions. It is hard to see how you can confirm or disprove them, without knowing what the research questions are, or any other information about your research project. Your advisor will be able to help you more than I can.

  17. Good day to you Achilleas.

    I have 29 items/survey questionnaire results measured using a 5 point Likert Scale. I’ve created my dependent variable for analysis, by calculating the median for these 29 items. When I run an ordinal regression or factor analysis, none of the data seems usable. To be clear, my dependent variable is ‘perceived effectiveness’, to be influenced by the categories of financial management, data collection, etc. with the 29 survey questions falling into these categories. Should I enter the category titles somewhere? I’ve just been using the questions results. Is there something else I should do for the dependent variable. Thank you!!! Cass.

  18. Good evening!

    My co-author and I have just finished collecting data for our master’s thesis through a questionnaire. We have used a five-point Likert scale for most of our statements. The scale ranges from 1 = Totally disagree to 5 = Totally agree. However, we have an issue now that we are supposed to start analysing the data in SPSS.

    The problem is that we have one independent variable and the dependent variable combined in the same statement. So, we don’t know how SPSS can understand what is the independent variable and what is the dependent variable.

    I will give an example to clarify:

    The dependent variable is: Intention to not choose the accounting profession
    One independent variable is: Personal interest in accounting
    Another independent variable is: Job opportunities in different occupations

    Based on the variables, statement 1 in our questionnaire is:

    I will not choose the accounting profession because I do not have a personal interest in accounting (1 = Totally disagree, 5 = Totally agree)

    Statement 2 in our questionnaire is:

    I will not choose the accounting profession because I believe that other occupations offer higher job opportunities (1 = Totally disagree, 5 = Totally agree)
    ———————————

    So, as you see, each statement consists of both one independent variable and the dependent variable. How can we tell SPSS what is the independent variable and what is the dependent variable? Do we have to create the dependent variable, or what should we do?

    Thanks in advance!

    Regards,

    Per Karlsson

    1. Hi, I am not sure I really understand what the problem seems to be, so let me try to see if I get it.

      You have some respondents who want to choose the profession, and some who don’t. This is your dependent variable. You also have a number of independent variables like ‘personal interest in accounting’ etc. Your questionnaire contains some items that make an assumption about one variable (they will not choose the profession) and ask about the other (it is a boring profession). Does this sound about right?

      If that is the case, it seems to me that we must disentangle the two variables. To do this you must have (or create) a variable in SPSS to index ‘choice of profession’ or something along those lines. The values for this variable could be ‘yes’, ‘no’ and, possibly, ‘unknown’. Ideally, there should be a question along those lines somewhere in your questionnaire, and you get the values from there. If not, you are on shaky ground, but maybe you can infer values from the answers given to other questions. This is less than ideal, but still viable if you can transparently document what you did and why.

      So you create this variable, enter the values for each respondent, and use this as the dependent variable for your analysis.

  19. hi i have a question.

    The study used a five-point scale (1 = very low to 5 = very high) to determine the level of self-efficacy. It also recorded socio-economic status (1 = poor to 6 = rich), and the aim is to find the relationship between the two variables (level of self-efficacy and status). What I did was take the median of the self-efficacy items, so that each respondent’s data would be combined in one variable (self-efficacy), and then cross-tabulate self-efficacy and status. Is this correct? The chi-square result said there is no significant relationship between self-efficacy and status. Kindly advise. Thanks.

  20. thanks for replying immediately.

    There are decimals in the median. I rounded off the decimals so that there would only be integers. Is this correct?

  21. one more question please :)
    The measure of the rounded-off median automatically changed to Scale. Is it OK if I change it to Ordinal and enter the 5-point scale (1 = very low to 5 = very high) again in the Values column?

    thanks a bunch.

  22. Hi there,

    This was a very helpful post! I was just curious as to why you do not recommend grouping the Likert scales as means and you recommend using medians?

    1. Hi, thanks for saying so.

      The data produced by Likert type items are, strictly speaking, ordinal data. That means that they can tell us how to rank responses (‘strongly agree’ is more agreement than ‘agree’) , but they do not give us information about the distance between them (‘strongly agree’ is not twice as much agreement as ‘agree’). Think of the medals in the Olympics: they can tell you if an athlete came first, second or third, but you cannot use them to calculate average speed.

      The median is a cruder statistic than the mean, because it does not take into account the distance or weighting of responses. In this case though, where the distance or weighting is unknown, it is the best statistic we can legitimately use.

  23. Dear Achilleas
    I appreciate your endeavour and your support for novices like me with ordinal data. Some scholars convert the 5-point Likert scale to continuous data and analyse it using a t-test. So, is it possible to convert it in that way and analyse it using a t-test? Second question: is it possible to use the t-test and ANOVA with ordinal data?

  24. Hi there Achilleas, thanks so much for your posts and helpful responses. I have a question in relation to your recent posts above regarding decimal results in the medians. If we accept the decimal results, does this not negate the reasons why we are using medians (rather than mean) for the ordinal data? The decimal median results are assuming the distance between each number is 1 (rather than being unknown and potentially variable).

    Also, the only decimal I appear to have after computing the median is a .5 one; that seems a big jump to me to round it up to the next whole number (so as to make the data ordinal)!

    Finally, I note that “mode” is not listed within the statistical functions in SPSS, I guess because it is not a calculation. I think I would feel most comfortable working with a new variable that holds the mode of a set of Likert-scale responses for each respondent. Do you know of a way to do this?

    I would be grateful for your thoughts on these three matters.

    Jenny :)

    1. Hi Jenny,

      Thanks for your comment. You’re quite right, the only decimal you can get when calculating the median is .5 , which is going to happen occasionally if you have an even number of responses. In this case, you just report the decimal, and you should not round it up. As you correctly point out, this is not really a decimal in the same sense as it would be with the mean; rather it’s just a conventional way of showing where the central tendency lies.

      I’m not sure which version of SPSS you’re using, but have you checked under ‘frequencies’? I think that you should be able to find the mode there.

      Hope that helps, and good luck with your project!
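The two points in this reply can be illustrated with a short Python sketch, using hypothetical respondents. The first part shows that the median of an even number of whole-number responses is either a whole number or ends in .5. The second part computes a mode for each respondent, which is what the question asked for; as far as I know, the ‘frequencies’ route in SPSS reports the mode of a variable across respondents, so this is a different calculation.

```python
from statistics import median, multimode

# Hypothetical respondents who answered an even number of items.
two_items = (2, 3)
four_items = (1, 2, 2, 4)

print(median(two_items))   # 2.5
print(median(four_items))  # 2.0

# multimode returns every most-frequent value, so ties remain visible
# instead of being resolved arbitrarily.
print(multimode(four_items))    # [2]
print(multimode((1, 1, 4, 4)))  # [1, 4]
```

A respondent whose answers tie, as in the last line, has no single mode, which is one reason why the mode is awkward as a summary of a short scale.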

      1. Thanks so much for this help and advice Achilleas :) Any idea of a source I could reference to back-up this advice? It’s like looking for the proverbial needle in a haystack when searching for such specific details in a big book (and even online). As much as I’d love to be able to cite your blog, I’m not sure how well my supervisor will mark me on that :///// sorry!

        P.S. I’m using SPSS 25.

        Thank you so much! :)
        Jenny

      2. No worries about that, and I’d love to point you to some literature, but I’m out of office and don’t have access to my books, so I can’t be as specific as I would otherwise have been. I’m quite sure there will be something helpful in Andy Field’s Discovering Statistics, and in Daniel Muijs’ Doing Quantitative Research in Education, for what it’s worth.

      3. Dear Achilleas,

        Thanks very much indeed. A library near to me has those books, so I shall go fetch them. If I struggle to find a reference, I’ll come back to you. (I’m not filled with hope at the ease of finding back-up for such specific advice from looking briefly at the books’ contents from Amazon’s “look inside” feature (at least without reading the whole book) but hopefully when the whole book is in front of me, I will be proved wrong :)

        In the meantime…. please can you confirm that you think it is safe for me to proceed with non-parametric tests of my ordinal Likert data using a new MEDIAN Likert variable? I also plan to use the MEDIAN Likert variables from two different ordinal Likert scales to test any correlation between these two scales (i.e., using Spearman’s Rho).

        Many thanks once again Achilleas :)

        Jenny :)

  25. Hi,
    How do I test the impact of ten different entrepreneurship education teaching pedagogies (categorical data) on entrepreneurial intention (5-point Likert data)?

    1. It’s hard to give specific advice without knowing about your data set. Crosstabs, followed by chi-square to test for statistical significance, is a relatively safe choice. However, it’s likely to work best if you have a large number of participants.

  26. Dear Achilleas,
    It is Jenny here again; we exchanged some messages recently. I now have Andy Field’s Discovering Statistics book and some other quantitative analysis books; however, I cannot find an explicit mention (for reference/citation purposes in my dissertation) that it is OK or advisable to compute a new median score variable for Likert data and then to use this in non-parametric tests. They are big books though, so without reading each one cover-to-cover, I cannot guarantee that I’m not missing anything.
    Are you with your books yet and able to confirm your source for this recommendation?
    Many thanks indeed and sorry to bother you again :)
    Jenny :)

  27. Dear Achilleas,

    Thanks for the great blog above.

    I have a question. I’m doing a dissertation about the different learning styles of Western and Asian students. My questionnaire consists of 62 5-point questions covering 4 learning styles, and each learning style is defined by 14-18 questions. I divided the data into 2 groups (Asian and Western) to run statistics before doing the comparison. However, when I compute the median for each learning style of each student, I get many identical results. For instance, a student can have the same median for 2 or 3 of the 4 learning styles. Hence, I’m not sure which tests I should use to find out the learning styles of the Asian and Western students in my research.

    I really need your advice on this as I’m quite new to SPSS and statistics.

    Hope to receive your reply soon!

    Best wishes,
    Nguyet

    1. One thing you could do, if you need more granular information, is to add the scores from the questions that correspond to each style. This might show some differentiation that is lost when averaging.

      Once you have those scores, you can run a cross-tab and chi square to see if there is any statistically significant difference between the Asian and Western groups.

      Hope that helps!

  28. hello Sir,

    I am preparing my thesis on the topic “EFFECTIVENESS OF FACEBOOK MARKETING ON CONSUMERS BUYING BEHAVIOR”, and one of my hypotheses is “There is no significant association between brand image and buying behavior of consumers.” I have Likert scale data on brand image and consumer behaviour, each with 4 statements, and the data have been collected.

    My questions are:
    1. How do I use Likert scale data in an analysis in SPSS?
    2. Which test is suitable for my study?

    Please, i really need your advice…as soon as possible

    1. The safest choice is to do crosstabs and chi-square, if you have enough data. If you are confident that your data function like interval data, you could also try a correlational analysis. I am sorry I cannot give you more specific advice without knowing more about your data. Your advisor should be able to give you more specific guidance.

  29. I have 167 respondents in total. My advisor is out of reach, so I need your advice. I have computed the data as you have shown above, and run a crosstab between consumer behaviour and brand image.

    The SPSS output is as follows:

    Consumerbehaviour * Brandimage Crosstabulation

                          Brandimage
    Consumerbehaviour       SA     A     N     D    SD   Total
    SA                       5     8     0     0     0      13
    A                        5    18    30     0     2      55
    N                        5    32    42     2     0      81
    D                        0     0     1     6     0       7
    SD                       0     5     0     2     4      11
    Total                   15    63    73    10     6     167

    SA = Strongly Agree, A = Agree, N = Neutral, D = Disagree, SD = Strongly Disagree

    Chi-Square Tests
                                      Value    df   Asymp. Sig. (2-sided)
    Pearson Chi-Square             153.983a    16   .000
    Likelihood Ratio                95.235     16   .000
    Linear-by-Linear Association    26.810      1   .000
    N of Valid Cases               167

    a. 19 cells (76.0%) have expected count less than 5. The minimum expected count is .25.
    

    is this Okay? please help me sir…

    1. No, it doesn’t look good. You have too many categories and not enough data.

      If you notice, under the crosstab there’s a line that says that 76% of your cells have an expected count <5. This is too high and it skews your statistics.

      There are three options, now: (a) report the data as they are, and suggest that readers exercise caution in the interpretation; (b) get more data; or (c) consolidate your categories, by merging values like ‘agree’ and ‘strongly agree’.

      I am sorry your advisor is out of reach, but you should really talk to him or her, if they are statistically competent – this looks like it needs lots of help.
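To show where the ‘expected count’ warning comes from, the Python sketch below recomputes it from the counts in the crosstab in the comment above, and then tries option (c). The function names are my own, and the particular grouping (SA with A, D with SD) is only one possible way of consolidating the categories, chosen for illustration.

```python
# Observed counts copied from the crosstab in the comment above
# (rows: Consumerbehaviour, columns: Brandimage; order SA, A, N, D, SD).
observed = [
    [5, 8, 0, 0, 0],
    [5, 18, 30, 0, 2],
    [5, 32, 42, 2, 0],
    [0, 0, 1, 6, 0],
    [0, 5, 0, 2, 4],
]

def expected_counts(table):
    """Expected count per cell under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_chi_square(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

def small_cells(table, threshold=5):
    """Number of cells whose expected count falls below the threshold."""
    return sum(e < threshold for row in expected_counts(table) for e in row)

def collapse(table, groups):
    """Merge categories by summing the row and column indices in each group."""
    width = len(table[0])
    rows = [[sum(table[i][j] for i in g) for j in range(width)] for g in groups]
    return [[sum(row[j] for j in g) for g in groups] for row in rows]

print(small_cells(observed), "of 25 cells have an expected count below 5")
print("Pearson chi-square:", round(pearson_chi_square(observed), 3))

# Option (c), with an illustrative grouping: SA + A, N, D + SD.
merged = collapse(observed, [(0, 1), (2,), (3, 4)])
print(merged)
print(small_cells(merged), "of 9 cells have an expected count below 5")
```

The first count reproduces the 19 cells reported by SPSS. After merging, only one of the nine cells still has an expected count below 5, which is why consolidating categories is often the most practical of the three options.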

  30. Hello sir, I have data in which there are 5 categories for Work Life Balance: Management, Role Conflict, Role Ambiguity, Work Overload and Peer Support. Under each of these categories there are 5 questions. I have merged the questions in each category using the median. What further analysis can I do using the median data for each category?
    There are a lot of missing values under each category. Is it correct to calculate the median, given that the number of missing values differs for each question?

    1. As I have repeatedly written in this blog, the statistical tests you should do depend on what you are trying to find out, i.e., your research questions. As I do not know these, or the particularities of your dataset, it would be inappropriate for me to give you specific advice. If this study is part of a student project, you should be asking your supervisor, who may be able to provide specific tips or literature.
