Designing better questionnaires: Using Scales

This is the fourth in a series of five introductory blog posts on questionnaire design. In previous posts, I talked about question wording, possible bias, responses and sequencing of questions, and I discussed the demographics section of questionnaires. In this post, I will look into how Likert items, Likert scales, and forced choice can be used to elicit information from respondents.

What are Likert items?

Likert items (and scales; see below) are very good at measuring constructs like beliefs and attitudes. A Likert item consists of a statement followed by a list of possible responses. The list is bivalent and symmetrical, and the responses are often anchored to numerical descriptors. Here’s an example:

Example 1
The next Doctor Who should be cast as a female role.
1=Strongly Agree
2=Agree
3=Not sure
4=Disagree
5=Strongly Disagree

In this example, the item consists of a statement and five response options. The list of options is bivalent: it extends in the directions of both ‘agreement’ and ‘disagreement’. The responses are symmetrically arranged around a neutral value (‘not sure’). In other Likert items, the neutral value might not be explicitly stated (scroll down to see why), but the list is still symmetrical. Note, however, that scales with percentages, or scales ranging from ‘never’ to ‘always’, are not Likert scales.

Some statisticians argue that response options such as the ones shown above are evenly spaced (equidistant), or that we can at least pretend that they are. This makes intuitive sense, and it is necessary for running a number of useful statistical tests. For instance, we can then calculate the mean, or weighted average, of the responses.

However, to do this you would need to make three assumptions. First, you must assume that psychological constructs can be measured with precision; secondly, that such precision can be linguistically mapped; and thirdly that all respondents will interpret the descriptors in a similar way. This seems a lot like wishful thinking to me, and I would rather adjust research methods to reality, rather than vice versa.

I will therefore treat the information these scales produce as ordinal data for the purposes of analysis. This means that we can calculate the frequency of each response (e.g., how many people strongly agree), add responses together (e.g., how many people express some form of agreement), and calculate percentages. We can also calculate the central tendency using the median, and the spread of responses using the total and interquartile ranges. I think that is quite enough.
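To make this concrete, here is a minimal sketch of such an ordinal summary in Python. The list of responses is invented purely for illustration, and the variable names are my own; the coding follows Example 1 (1=Strongly Agree to 5=Strongly Disagree).

```python
from collections import Counter
from statistics import median, quantiles

# Hypothetical responses to the item in Example 1 (made-up data).
# Coding: 1 = Strongly Agree, 2 = Agree, 3 = Not sure,
#         4 = Disagree, 5 = Strongly Disagree
responses = [1, 2, 2, 1, 3, 4, 2, 5, 1, 2, 3, 2]
codes = [1, 2, 3, 4, 5]

counts = Counter(responses)
n = len(responses)

# Frequency and percentage of each response option
frequencies = {code: counts.get(code, 0) for code in codes}
percentages = {code: 100 * frequencies[code] / n for code in codes}

# Adding responses: how many people express some form of agreement
agreement = frequencies[1] + frequencies[2]

# Central tendency (median) and spread (total and interquartile range)
centre = median(responses)
total_range = (min(responses), max(responses))
q1, _, q3 = quantiles(responses, n=4, method="inclusive")

print("Frequencies:", frequencies)
print("Some form of agreement:", agreement, "of", n)
print("Median:", centre)
print("Total range:", total_range)
print("Interquartile range:", (q1, q3))
```

Note that the quartiles are interpolated, so they may fall between two response codes; with ordinal data they are best read as a rough indication of where the middle half of the responses lies.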

Likert items work best in groups

Like most quantitative methods, Likert items can efficiently generate lots of data; on the other hand, these data can be misleading, because the scales are very sensitive to the wording of the items. For example, there is strong empirical evidence showing that support for free speech in the US is much higher when the questions contain the word ‘forbid’ rather than ‘not allow’ (a phenomenon known as the ‘forbid/allow asymmetry’). Even though the two phrasings are logically equivalent, they elicit different responses: participants generally object to ‘forbidding’ free speech, but they are less strongly opposed to ‘not allowing’ some forms of expression. To moderate the effect of item wording, it is best to use several variants of the same item in a questionnaire, and derive a composite score from the responses. Here’s one way to do this:

Example 2
4. I enjoy science fiction shows.
1=Strongly Agree, 2=Agree, 3=Disagree, 4=Strongly Disagree
12. I watch science fiction shows whenever I can.
1=Strongly Agree, 2=Agree, 3=Disagree, 4=Strongly Disagree
16. I am a science fiction fan.
1=Strongly Agree, 2=Agree, 3=Disagree, 4=Strongly Disagree
17. I dislike science fiction.
1=Strongly Agree, 2=Agree, 3=Disagree, 4=Strongly Disagree

A cluster of such related items, which probe the same underlying construct, produces a Likert scale. The items that make up a Likert scale, by the way, don’t need to be presented together in your questionnaire. In fact, it may be advantageous to spread them out across the questionnaire, so that their sequencing does not influence responses. In the example above, I have used random numbers before each item, to simulate spread in a larger questionnaire.

To ensure that all the items in the Likert scale measure the same underlying construct, it is necessary to pilot the scale with a relatively large number of participants. We can then calculate the internal consistency of the scale (using a metric called Cronbach’s alpha), and eliminate any items that do not work well. Looking above, item 12 is suspect, because it seems to measure behaviour rather than attitudes. If piloting also suggests that the scale works better without it, then we would have to remove the item from the final version of the questionnaire (or, depending on when we found out, we might have to remove the data it produced from our calculations).
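For readers who want to see the calculation, here is a rough sketch of Cronbach’s alpha, using the standard formula alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The pilot data are invented, the function names are my own, and I am assuming that item 17 has already been reverse-coded. A real pilot would need far more than six respondents.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents' item scores.

    rows: one list of item scores per respondent, all of equal length.
    """
    k = len(rows[0])
    # Sample variance of each item (column) across respondents
    item_variances = [variance(column) for column in zip(*rows)]
    # Sample variance of the respondents' total scores
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

def alpha_if_deleted(rows, index):
    """Alpha recalculated with the item at position `index` left out."""
    reduced = [[v for i, v in enumerate(row) if i != index] for row in rows]
    return cronbach_alpha(reduced)

# Hypothetical pilot data (made up for illustration).
# Columns: items 4, 12, 16 and 17 (item 17 already reverse-coded).
pilot = [
    [1, 3, 1, 1],
    [2, 1, 2, 2],
    [2, 4, 2, 1],
    [3, 2, 3, 3],
    [4, 3, 4, 4],
    [1, 2, 2, 1],
]

print("Alpha, all items:", round(cronbach_alpha(pilot), 2))
for position, item in enumerate([4, 12, 16, 17]):
    print(f"Alpha without item {item}:",
          round(alpha_if_deleted(pilot, position), 2))
```

If alpha rises noticeably when an item is left out, as it does for item 12 in these invented data, that item is a candidate for removal.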

Deriving the score of a Likert scale involves three steps. First, we reverse any negatively worded items. Number 17, above, is negatively worded, so we will code ‘Strongly Disagree’ as 1, ‘Disagree’ as 2, and so on. Next, we remove from the scale any item that systematically generates different responses from the others (see previous paragraph). Finally, we add up the item scores to produce the scale score: in this case, if all four items are retained, a number ranging from 4 to 16. Alternative techniques, like assigning each response a value from 0 to 3, or from -2 to +2, are also fine, but it is important to be transparent about what you did, so make sure you document every step of the process and report it in your ‘methods’ section.
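The three steps can be sketched in a few lines of Python. This is only an illustration: the respondent’s answers are invented, the function names are my own, and I am assuming the coding printed in Example 2 (1=Strongly Agree to 4=Strongly Disagree), so that reversing item 17 turns a ‘Strongly Disagree’ (4) into a 1. With this coding, a low total indicates a strong science fiction fan.

```python
# Assumed coding, as printed in Example 2:
# 1 = Strongly Agree, 2 = Agree, 3 = Disagree, 4 = Strongly Disagree
MIN_CODE = 1
MAX_CODE = 4

def reverse(code):
    """Reverse-code a response: 1 becomes 4, 2 becomes 3, and so on."""
    return MIN_CODE + MAX_CODE - code

def scale_score(answers, reversed_items=(), dropped_items=()):
    """Sum a respondent's item scores into one Likert scale score.

    answers: dict mapping item number to the response code given.
    reversed_items: negatively worded items to reverse before adding.
    dropped_items: items removed from the scale after piloting.
    """
    total = 0
    for item, code in answers.items():
        if item in dropped_items:
            continue  # step 2: leave out items that did not work well
        if item in reversed_items:
            code = reverse(code)  # step 1: reverse negatively worded items
        total += code  # step 3: add up the item scores
    return total

# Hypothetical respondent (made-up answers to items 4, 12, 16 and 17)
respondent = {4: 1, 12: 2, 16: 1, 17: 4}

score_all = scale_score(respondent, reversed_items={17})
score_without_12 = scale_score(respondent, reversed_items={17},
                               dropped_items={12})

print("Score on all four items (range 4 to 16):", score_all)
print("Score without item 12 (range 3 to 12):", score_without_12)
```

Keeping the reversed and dropped items as explicit arguments also makes it easier to report exactly what was done.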

Using forced choice

Most commonly, Likert items contain five (or seven) options, which are arranged around a neutral response such as ‘neither agree nor disagree’. This beautifully symmetric format can give rise to the ‘central tendency bias’, which is what happens when participants systematically select the uncontroversial middle option. This might happen because of respondent fatigue, or sometimes it is a deliberate strategy to avoid committing to an opinion. Either way, such responses give very few usable insights, so we need to discourage them.

One of the simplest ways to counteract the central tendency bias is to use scales with an even number of responses. In Example 2, above, I used forced-choice (or ipsative) items. These are items from which the ‘neutral’ option has been removed, leaving an even number of options. This forces participants to either agree or disagree with the statement. Forced-choice items with a small number of responses are very effective in eliciting attitudes that participants might otherwise feel inclined to suppress.

In the next and final post in this series, I will discuss ways to make your questionnaire layout more effective. Till then!

