As the results for the University Entrance Exams were announced today, I suppose it’s as good an opportunity as any to put down thoughts about the examination process. I will begin this post by laying out some groundwork about what makes a good test, and then move on to note my reservations about the way the University Entrance Exams are conducted.
Validity and Reliability
Two of the most important criteria that make an examination effective and fair are validity and reliability. Validity is a subjective measure of the overlap between the real-life skills that the exam purports to measure and the skills that are actually tested. To illustrate by means of an example: let’s assume that we need to measure how effective trainee receptionists are at dealing with English-speaking hotel guests. An examination task that simulates a front-desk encounter with a guest would be a fairly valid test. Having the trainees translate an extract from a tourist guide would be less valid for this particular purpose.
Reliability is a technical term that reflects how accurate an examination measurement is. A perfectly reliable test would be one that could consistently produce the same result, regardless of how well the candidate was feeling on that particular day, who examined the candidate, who marked the papers and so on. It would also be a test in which scores would be comparable across multiple sittings, so that candidates with similar ability would get similar scores no matter when or where they took the exams. In practice, there is no such thing as a perfectly reliable test, but reliability can be mathematically calculated so that we might take it into account when interpreting scores.
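One common way of putting a number on reliability, though by no means the only one, is Cronbach's alpha, which estimates how consistently the items of a test rank the same candidates. What follows is a minimal sketch rather than a recommended procedure: the function name and the marks are invented purely for illustration.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Estimate internal-consistency reliability (Cronbach's alpha).

    scores: one row per candidate, one column per test item.
    """
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # transpose: one tuple per item
    item_variance = sum(pvariance(item) for item in items)
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance / total_variance)


# Hypothetical marks: four candidates, three items scored 0 or 1.
marks = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(marks)
print(round(alpha, 2))
```

Values closer to 1 suggest that the items agree with each other; what counts as "reliable enough" depends on what the scores will be used for.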
Just to be perfectly clear, reliability and validity are independent criteria: this means that it is possible for any measurement to be very reliable but not valid for some uses, or vice versa. The weighing scales in my bathroom, for instance, are reasonably reliable (they will not give wildly different readings if I step down and climb back on), but I know for a fact that they produce a figure that is two kilograms below my actual weight, so the measurement is not very valid. The converse, i.e., validity without reliability, is also possible. I’ve been told that bus conductors in England used to ask children to put their right arm over their head and touch their left ear in order to determine if they were eligible for a reduced rate ticket: because young children tend to have proportionately larger heads than older ones, they would not be able to do this, which provided the conductor with evidence about their age. This test was valid enough for the needs of the conductor, but clearly its reliability was compromised by differences in individual anatomy, clothing and so on. In fact, in fields like education, where the most important qualities are hard to convert to numbers, it is often the case that the more reliable a test is, the less valid it tends to be.
Just how effective are the University Entrance Exams?
Moving on to the topic of the University Entrance Exams, much has been made of their reliability, or rather more specifically of the fact that results are not vulnerable to undue influence. It is my contention that this claim is problematic in two ways: first, because it’s vacuous in the absence of evidence, and secondly because reliability is a necessary but not a sufficient condition for test effectiveness. The first strand of my argument is fairly straightforward: unless the Ministry of Education release information about the facility and discriminatory value of each individual test item and the correlations between the marks of different raters, there is simply no way to confirm or disprove such a claim. If the competent authorities are incapable of calculating these statistics on their own, or unwilling to do so, they could, at minimum, make the anonymised raw data publicly and freely available. One is not normally of a suspicious or cynical mind, but when the information that would provide us with indisputable proof of the Exams’ reliability is inexplicably withheld from scrutiny, questions are bound to be raised.
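None of these statistics is difficult to compute once the raw marks are available. The sketch below shows one conventional way of calculating each: facility as the proportion of candidates who answered an item correctly, discrimination as the difference in facility between the strongest and weakest thirds of candidates, and rater agreement as a Pearson correlation. The function names, the choice of thirds and all the figures are my own assumptions for the sake of illustration; they are not drawn from any actual exam data.

```python
from math import sqrt


def facility(item_responses):
    """Proportion of candidates who got the item right (1 = right, 0 = wrong)."""
    return sum(item_responses) / len(item_responses)


def discrimination(item_responses, total_scores):
    """Facility in the top third minus facility in the bottom third."""
    # Order the item responses by each candidate's overall score.
    ranked = [r for _, r in sorted(zip(total_scores, item_responses),
                                   key=lambda pair: pair[0])]
    n = max(1, len(ranked) // 3)
    return sum(ranked[-n:]) / n - sum(ranked[:n]) / n


def pearson(x, y):
    """Pearson correlation between two raters' marks for the same scripts."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical data: six candidates, one item, and two raters' essay marks.
item = [1, 1, 1, 0, 0, 0]
totals = [9, 8, 7, 3, 2, 1]
rater_a = [10, 12, 14, 9, 16, 11]
rater_b = [11, 13, 15, 10, 17, 12]

print(facility(item), discrimination(item, totals), pearson(rater_a, rater_b))
```

An item that almost everyone gets right (or wrong), or one that weak candidates answer as well as strong ones, tells us little about the candidates; a low correlation between raters suggests that the mark depends on who happens to read the script.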
More importantly, though, I believe that the validity of the Exams as a placement test for tertiary education needs to be brought into question. Validity, in this case, is a measure of the match between the skills required for success at University and the skills that are actually tested. The premise underpinning the Exams is that the requisite academic skills are identical, whether one wants to become a lawyer, a teacher or a military officer. This is, in itself, a problematic assumption.
Moreover, because of the high premium placed on reliability, in the Exams we tend to test only what can be reliably measured. To illustrate: a hypothetical candidate who can ponder why the textbooks and the children’s literature of my generation referred to Basil II as the ‘Slayer of Bulgarians’ (Βουλγαροκτόνος), whereas contemporary textbooks call him ‘the Macedonian’, would likely make a better-than-average historian. But because such inquisitiveness cannot be measured reliably, candidates are tested on their ability to recall trivia that can be unambiguously marked, such as the start and end of said Emperor’s reign or his place of birth. The construct that is tested, it would seem, is considerably reduced in scope compared to the construct that needs to be measured.
To recap: the public confidence in the reliability of the University Entrance Exams is misplaced. Even conceding that the Exams produce reliable data about the candidates, it seems that it’s the wrong kind of data. In this sense, we are not unlike the proverbial drunkard who lost his keys in a dark alley but went to look for them under a lamp post, because “that’s where the light is”. As a test for entry into tertiary education, the exams should select those candidates who can go beyond the given truth, but we insist on testing them on what they’ve been taught, because we like to think it’s more ‘reliable’.
In writing these lines it was never my intention (I hasten to clarify) to trivialize the well-deserved success of those candidates who did pass the exams. For the most part, I think that they are intelligent and hard-working individuals, and it’s been a privilege to work with some of them before the exams and after they had entered University. What I mean is that if our vision of tertiary education includes inquisitive insights and social leadership, if we expect from our university entrants uncompromising personal ethics coupled with compassion and tolerance, and if our universities aim to foster the ability to unsettle and renew the intellectual and social status quo, then a major overhaul of the examination system is long overdue.