Elsewhere in this blog, I have written that a Likert scale might consist of several overlapping items. For instance, if I want to measure subjects’ attitudes towards sweets, I might ask them to record how they feel about the following statements:
Strongly Agree  Agree  Disagree  Strongly Disagree

1. I like chocolate.  
2. I like cookies.
3. I like whipped cream.
In order to interpret these data, we need to summarise the data in the scale. The safest way to do this is by calculating the median value of all the items. Using the same example as above, I need to create a new ‘supervariable’, which shows the median of items (1), (2) and (3) for each respondent.
In the paragraphs that follow, I will show how to do this, using SPSS. I assume that you already know how to define variables and values, how to toggle between the numerical expression and verbal descriptor of the values (i.e., you can make SPSS show responses as “strongly agree/agree/disagree/strongly disagree” or as “1/2/3/4”), and how to key in data. I will also assume that you have already established that the scale is internally consistent, so I will focus only on the technical aspects of merging the variables.
Starting out
Your starting point will be a dataset similar to Figure 1 below.
When you have typed in your data, and tested for the internal consistency of the scale (use Cronbach’s α), it’s time to create a new variable.
Merging the variables
From the top menu bar, select Transform > Compute variable. You should now see the following dialogue box.
 Assign a name to the new variable (e.g., Sweets)
 Scroll down the Function Group, and select Statistical
 From the functions that appear, select Median. [NB it is possible to select the mean, but I don’t recommend it]. At this point, the following formula should appear in the numerical expression box: Median ( , )
 Place the cursor in the brackets, select the variables you want to merge, and click on the arrow. Repeat with all the variables, separating them with commas.
 Click on OK.
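If you want to check the result outside SPSS, the same row-by-row calculation can be sketched in a few lines of Python. This is only an illustration: the item names, the responses and the coding (1 = strongly agree to 4 = strongly disagree) are made up for the example.

```python
from statistics import median

# Hypothetical responses: one dict per respondent, coded 1-4
responses = [
    {"chocolate": 1, "cookies": 2, "cream": 2},
    {"chocolate": 4, "cookies": 3, "cream": 4},
    {"chocolate": 2, "cookies": 2, "cream": 3},
]
items = ["chocolate", "cookies", "cream"]


def add_scale_median(rows, item_names, new_name="Sweets"):
    """Return copies of the rows with a new variable: the median of the items."""
    return [
        {**row, new_name: median(row[name] for name in item_names)}
        for row in rows
    ]


merged = add_scale_median(responses, items)
for row in merged:
    print(row)
```

Each respondent gets one new value, just like the new column that SPSS appends to the dataset.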
Result
SPSS will automatically generate a new variable, which will appear at the end of your dataset. This will be in numerical form (1, 2, 3, …), but you can change it to a verbal descriptor for consistency (Figure 3). You can use this variable for descriptive statistics (e.g., estimate the central tendency and dispersion), crosstabulations, correlations and so on…
Now wasn’t that very easy?
Featured Image by Michael Kwan [CC BY-NC-ND]
Hi,
I am doing likert scale questions. However, I use 2 questions under one category as the situation you shown above, there are 3 questions under one category, which is able to calculate the new variable through median. However, for my situation, if I use median, there will be decimal places for the number. How should I do? Please help me. Thank you so much
Not a problem – when there is an even number of data points, there may be decimals in the median.
Hi, As the situation you shown above, you are using three questions for one variable, therefore you calculate median, however for my situation, I am using two questions for one variable, if i calculate median will be not approprate?because demical places.. or should I calculate by mode? I am really do not know how should I do the analysis?
It’s ok to have a decimal in the median.
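A quick way to see where the decimals come from: with an even number of items, the median is the midpoint between the two middle values. The sketch below uses made-up answers. The `median_low` and `median_high` functions are simply alternatives that Python's `statistics` module offers for returning an actual scale point; they are not something SPSS does for you in the steps above.

```python
from statistics import median, median_high, median_low

two_items = [2, 3]       # hypothetical answers to a two-item scale
three_items = [2, 3, 3]  # hypothetical answers to a three-item scale

# Even number of items: the midpoint of the two middle values
print(median(two_items))

# Odd number of items: the single middle value
print(median(three_items))

# Alternatives that always return one of the observed values
print(median_low(two_items), median_high(two_items))
```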
Hi! Thanks for the post. It’s been really helpful. But a quick question. How do I create verbal descriptions if I gave a decimal value?
I’m glad you found it helpful. The reason why you have decimal values is because you have calculated the mean (or ‘average’). As you have intuitively found out, the mean doesn’t make much sense in Likert scales. That’s because the data that these scales produce are ordinal. There are researchers who claim that this is okay, but many people think that this is statistically wrong, and as you’ve found out, the results are hard to interpret. A less controversial thing to do is to calculate the median of the scale.
I have 5 different questions for identifying the mood or response of 1 independent variable. The rating scale is
5 = Strongly Agree
4 = Agree
3 = Neutral
2 = Disagree
1 = Strongly Disagree
The result of tested internal consistency is alpha 8.45. Now I want to compute the 5 questions having 5 tables into 1 table for my independent variable All questions are in ordinal measure. Some people suggest me to take the average of these 5 question but i see you used MEDIAN. Here im confused as to what should i take ???
I prefer using the median, for reasons that I have explained elsewhere in this blog. Many people use the mean (average), but I think this is not, strictly speaking, sound statistical practice.
ok thanks and after the MEDIAN result table, their measure changed from ordinal to Nominal in SPSS. The original data type was ordinal. Should i leave it nominal or as by question data type, Ordinal ? what should i select between nominal and ordinal ?
I am not sure why it changed. Your scale is Ordinal, as you said. It doesn’t really matter what the SPSS table reads – this is more of a reminder for you. If you are of an obsessive predisposition, you can change it back, but even if you don’t correct it, it shouldn’t affect results.
Hi,
I’ve a dataset with likert scale questions, in which participants weren’t required to answer all questions. I showed them 4 advertisements in total. I had 2 versions of each advertisement. The questionnaire tool I used assigned each participant randomly to a condition. So for example, participant 1 saw ad 1, version 2, ad 2, version 2, ad3, version 1 and ad 4, version 2. Participant 2 saw ad 1, version 2, ad 2, version 1, ad 3, version 1 and ad 4, version 2 etc. etc.
I asked the same set of questions (7 questions in total) after having shown each advertisement. These 7 questions combined measured the dependent variable likeability.
I wanted to combine the answers of the 7 questions, since it measures one dependent variable.
I used your technique, and SPSS does combine the questions and makes new variables. Now I get: Likeabilityad1version1, Likeabilityad1version2, Likeabilityad2version1, Likeabilityad2version2, Likeabilityad3version1, Likeabilityad3version2, Likeabilityad4version1 and Likeabilityad4version2.
However, when I want to create a new variable that gives the general likeability of version 1 (so with all the version 1 ads combined) and the general likeability of version 2 (so with all the version 2 ads combined), I get an error. The reason for this is, that SPSS only calculates a result for each participant that either answered the questions for all versions 1 or all versions 2.
My question is, how can I create a variable with which I can measure if people liked version 1 or version 2 better?
I’m sorry if this sounds all way too complicated. Please let me know if I need to provide more details. I would really appreciate your help.
Many thanks in advance!
Interesting situation you have there :)
When inputting data, how did you deal with missing values?
Thanks aloooooooot for ur help
Hello, I am writing my Thesis about employee satisfaction and I have 50 questions with 5 options: Agree, strongly agree, neutral, disgaree, strongly disagree. It will be very confusing and long if I analyse each question on its own. How can I give a clear overview on the results keeping in mind that I have 350 questionnaires?
I think that despite the risk of seeming confusing and long, the data should be presented in full, for reasons of transparency.
That said, I imagine that the 50 questions form ‘groups’ of similar questions, with each group measuring one underlying construct (or ‘latent variable’). You may want to summarise the information in these groups, and here’s some advice on how to do that: https://achilleaskostoulas.com/2014/12/15/howtosummariselikertscaledatausingspss/
Best of luck with your project!
Dear Sir, I am looking at the impact of three IVs (Perception of Police Fairness, Perception of Police Effectiveness and Perception of Police Moral Solidarity with Community) on the DV, Police Legitimacy, moderated by Perception of Judiciary. The Likert scale questionnaires to measure 3 variables were taken from one source and the other two were taken from a different source. As such, some scales go (1 = Strongly Agree to 5 = Strongly Disagree) whereas one goes (1 = Strongly Disagree to 6 = Strongly Agree). This implies that, for one variable, Likert values suggesting a good situation fall close to 1, whereas for the second variable ‘goodness’ ought to be valued close to 6 or 5. Data has been collected and entered into SPSS. Would this create a problem in analysis, or should I reverse entire variable question groups? Please help.
That shouldn’t be a problem, unless you plan to combine the data in some way. However in the interest of making the data interpretation and display clearer, you might want to reverse the codes in one of the scales.
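Reversing the codes is simple arithmetic: subtract each value from the sum of the lowest and highest points of the scale. Here is a minimal sketch in Python; the function name and the default 1 to 5 range are assumptions made for the example.

```python
def reverse_code(value, low=1, high=5):
    """Flip a response on a low..high scale, e.g. 1 <-> 5 and 2 <-> 4."""
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the {low}-{high} scale")
    return low + high - value


original = [1, 2, 3, 4, 5]
print([reverse_code(v) for v in original])

# The same idea on a six-point scale
print(reverse_code(1, low=1, high=6))
```

The midpoint of an odd-numbered scale stays where it is, and reversing twice gives back the original value.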
I am conducting an employee engagement survey with 28 questions on a 5 point Likert scale. No demographics such as age, male or female etc. were asked. What tests would you suggest running in SPSS to get the best results and show whether the data is or is not significant?
Thanks
Depends on your research questions. Median and IQR seem like reasonable places to start.
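To make this concrete, here is a small Python sketch of the median and the interquartile range (IQR) for one item. The answers are invented, and the quartiles use the default method of Python's `statistics.quantiles`; quartile conventions differ between packages, so another program may report slightly different values for the same data.

```python
from statistics import median, quantiles

# Hypothetical answers to one item, coded 1-5
answers = [3, 1, 4, 2, 5, 3, 2, 4, 3]


def median_and_iqr(values):
    """Return the median and the interquartile range (Q3 - Q1)."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' method
    return median(values), q3 - q1


mid, iqr = median_and_iqr(answers)
print(f"median = {mid}, IQR = {iqr}")
```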
i followed the steps and i couldn’t find the MEDIAN within the statistical function group. what can i do? and i tried the mean for a 5 likert scale and i found the descriptives in decimals (2, 2.4, 2.6, 2.8 till 5). how can i deal with it?
thank you
The median should be there (have a look at the screenshot). If it isn’t, you may want to take that up with the IT services at your university or IBM customer support. But do look carefully before contacting them.
hello i would like to ask how to do if i have two variables (practices and awareness) with 9 questions each… late it would be used to compute for the significant relationship. thank you very much…
I am sorry May, I do not understand your question: are you asking me how to test if there is some kind of relationship between the two variables?
WHAT DOES IT MEAN IF I USE MEAN SCORES TO INTERPRET MY DATA
If your data is ordinal it means that your analysis is meaningless
Hi Achilea,
I have a 10 likert scale 1= not at all, 10= very much. I will the subcategories of my questionnaire in order to get the mean as you suggest. Why do you suggest Median instead of Mean? Would you suggest the same for my case?
Thanks a lot in advance!
Giorgos
Hi! I wouldn’t call that a Likert scale as such. The scale you’re using produces interval data, and a mean is an appropriate measure of central tendency in this case.
Thanks for the info… I have been looking for discussions about merging likert scales, and also youtube videos but they all use means but do not explain why, this is the first time I heard using median, which makes a lot of sense.
i did reliability test but the value of cronbach alpha is -0.65, the value contains a negative sign. what should i do ??
It means that the relations between the different items are very weak: they seem to be measuring different things. Do not combine them in a single scale!
hey,
I have the similar question.
I have 3 question all in scale –
1 – strongly support,
2 – support,
3 – do not support
4 – strongly do not support
5 – no answer.
So maybe you can help figure out, how can I group those 3 questions answers to get 25% respondents who are supporters and 25% who are not?
I am sorry, I do not understand what you are asking.
Hello,
I have three questions for each independent variable. I have three independent variables.
I need to perform Multiple Regression Analysis to find out the relationship between the dependent variable and the independent variable. How do I do this?
I am afraid that I can’t answer this question in the space of a comment or a blog post. I suggest reading Chapter 8 in this book, if you can find it.
can you please help on how to analyze ranking questions ?
I am sorry, I do not understand what you are asking me.
dear sir..
i need help
my questionnaire is like this [description of questionnaire redacted] i need to do hypothesis testing in spss…. how do i do it…. i really need help…. please help me…
Hi Bijay,
Your hypotheses will derive from your research questions. It is hard to see how you can confirm or disprove them, without knowing what the research questions are, or any other information about your research project. Your advisor will be able to help you more than I can.
Hi i would like to know which version of SPSS is the median option available in the compute function.
All of them. You can also use Excel or any other spreadsheet to calculate it.
Good day to you Achilleas.
I have 29 items/survey questionnaire results measured using a 5 point Likert Scale. I’ve created my dependent variable for analysis, by calculating the median for these 29 items. When I run an ordinal regression or factor analysis, none of the data seems usable. To be clear, my dependent variable is ‘perceived effectiveness’, to be influenced by the categories of financial management, data collection, etc. with the 29 survey questions falling into these categories. Should I enter the category titles somewhere? I’ve just been using the questions results. Is there something else I should do for the dependent variable. Thank you!!! Cass.
I do not think I understand what you are describing: why are the data not usable?
thanks for posting comments Achilleas
Very useful ideas especially for post graduate students. Many thanks Achilleas.
Good evening!
Me and my coauthor for our master thesis have just finished the collection of data through a questionnaire. We have used the five-point Likert scale for most of our statements. The scale ranges from 1 = Totally disagree to 5 = Totally agree. However, we have an issue now when we are supposed to start analysing the data in SPSS.
The problem is that we have one independent variable and the dependent variable combined in the same statement. So, we don’t know how SPSS can understand what is the independent variable and what is the dependent variable.
I will give an example to clarify:
The dependent variable is: Intention to not choose the accounting profession
One independent variable is: Personal interest in accounting
Another independent variable is: Job opportunities in different occupations
Based on the variables, statement 1 in our questionnaire is:
I will not choose the accounting profession because I do not have a personal interest in accounting (1 = Totally disagree, 5 = Totally agree)
Statement 2 in our questionnaire is:
I will not choose the accounting profession because I believe that other occupations offer higher job opportunities (1 = Totally disagree, 5 = Totally agree)
———————————
So, as you see, each statement consists both of one independent variable and the dependent variable as well. How can we tell SPSS what is the independent variable and what is the dependent variable? Do we have to create the dependent variable or how do we do?
Thanks in advance!
Regards,
Per Karlsson
Hi, I am not sure I really understand what the problem seems to be, so let me try to see if I get it.
You have some respondents who want to choose the profession, and some who don’t. This is your dependent variable. You also have a number of independent variables like ‘personal interest in accounting’ etc. Your questionnaire contains some items that make an assumption about one variable (they will not choose the profession) and ask about the other (it is a boring profession). Does this sound about right?
If that is the case, it seems to me that we must disentangle the two variables. To do this you must have (or create) a variable in SPSS to index ‘choice of profession’ or something along those lines. The values for this variable could be ‘yes’, ‘no’ and, possibly, ‘unknown’. Ideally, there should be a question along those lines somewhere in your questionnaire, and you get the values from there. If not, you are on shaky ground, but maybe you can infer values from the answers given to other questions. This is less than ideal, but still viable if you can transparently document what you did and why.
So you create this variable, enter the values for each respondent, and use this as the dependent variable for your analysis.
hi i have a question.
the study used a five-point scale (1=very low to 5=very high) to determine the level of self-efficacy. it also used the socio-economic status (1=poor to 6=rich) and wants to get the relationship between the two variables (level of self-efficacy and status). What I did is I got the median of levels of self-efficacy so that all of the data of the respondents would be combined in one variable (self-efficacy) and then crosstabulated the self-efficacy and status. Is this correct? The Chi-square result said there is no significant relationship between self-efficacy and status. Kindly advise. Thanks.
The procedure sounds very reasonable. I am not sure what the question is though?
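For anyone curious about what the crosstabulation and Chi-square test compute, here is a rough Python sketch of the Pearson chi-square statistic. The categories and counts are invented and simplified to a 2×2 table. The sketch stops at the statistic and the degrees of freedom; a p-value still needs a chi-square table or a statistics package.

```python
from collections import Counter

# Hypothetical crosstab: (self-efficacy, status) -> number of respondents
counts = {
    ("low", "poor"): 10, ("low", "rich"): 20,
    ("high", "poor"): 20, ("high", "rich"): 10,
}


def chi_square(table):
    """Return the Pearson chi-square statistic and degrees of freedom."""
    n = sum(table.values())
    row_totals, col_totals = Counter(), Counter()
    for (row, col), count in table.items():
        row_totals[row] += count
        col_totals[col] += count
    statistic = 0.0
    for row in row_totals:
        for col in col_totals:
            # Count we would expect if the two variables were unrelated
            expected = row_totals[row] * col_totals[col] / n
            observed = table.get((row, col), 0)
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


statistic, dof = chi_square(counts)
print(f"chi-square = {statistic:.2f}, df = {dof}")
```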
thanks for replying immediately.
in the median, there are decimals. i rounded off the decimals so that there will only be integers. Is this correct?
Yes, that might happen if you have an even number of responses. Not a problem
one more question please :)
the measure of the rounded-off median automatically changed to scale. Is it ok if i changed it to ordinal and entered the 5-point scale (1=very low to 5 very high) again on the values column?
thanks a bunch.
You mean in SPSS? Yes, it’s just trying to be helpful, but it isn’t :)
Hi there,
This was a very helpful post! I was just curious as to why you do not recommend grouping the Likert scales as means and you recommend using medians?
Hi, thanks for saying so.
The data produced by Likert type items are, strictly speaking, ordinal data. That means that they can tell us how to rank responses (‘strongly agree’ is more agreement than ‘agree’) , but they do not give us information about the distance between them (‘strongly agree’ is not twice as much agreement as ‘agree’). Think of the medals in the Olympics: they can tell you if an athlete came first, second or third, but you cannot use them to calculate average speed.
The median is a cruder statistic than the mean, because it does not take into account the distance or weighting of responses. In this case though, where the distance or weighting is unknown, it is the best statistic we can legitimately use.
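Here is a small illustration of the point, sketched in Python with invented data. Both codings below respect the same rank order of the four responses, and only the assumed distance to the top category changes. The comparison of means flips between the two codings, while the comparison of medians does not.

```python
from statistics import mean, median

# Hypothetical responses from two groups, coded 1-4
group_a = [1, 1, 1, 4, 4]
group_b = [2, 2, 2, 3, 3]

# Two codings with the same rank order but different assumed distances
codings = {
    "original": {1: 1, 2: 2, 3: 3, 4: 4},
    "stretched": {1: 1, 2: 2, 3: 3, 4: 10},
}

results = {}
for label, coding in codings.items():
    a = [coding[v] for v in group_a]
    b = [coding[v] for v in group_b]
    results[label] = {
        "mean_a_higher": mean(a) > mean(b),
        "median_a_higher": median(a) > median(b),
    }
    print(label, results[label])
```

Since we do not know the real distances between the anchor points, a conclusion that depends on them is hard to defend.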
Dear Achilleas
I appreciate your endeavor and support to novices like me about ordinal data. Some scholars changed the 5 point Likert scale to continuous data and analyzed it using a t-test. So, is it possible to change it in that way and analyze it using a t-test? Second question, is it possible to use the t-test and ANOVA on ordinal data?
You could, if you can convincingly argue that the distance between the anchor points is equal. This link might help: http://www.statisticshowto.com/likertscaledefinitionandexamples/