A key feature of good science, especially if you happen to adhere to a positivist world-view, is its replicability. Simply put, it means that if our methods are sound, other researchers working on the same question should reach the same (or at least very similar) results. In the social sciences, however, perfect agreement between studies is rare, sometimes suspicious, and at the very least mundane. It seems, in fact, that the most interesting aspects of overlapping studies are the points where they differ.
It was such a clash of findings that prompted this note. A colleague and I recently presented a conference paper, in which we talked about the pilot phase of a Teaching English for Young Learners (TEYL) after-school programme. In our presentation, we noted, among other things, that pupils and parents in a certain school seemed sceptical about introducing foreign language teaching in the early education curriculum. This particular finding is displayed in Figure 1, which shows that most participants in our study believed it was more appropriate to introduce English lessons in the 3rd Form or later, i.e., when children were at least 9 years old.
In the discussion that followed, Prof. Thomais Alexiou (University of Thessaloniki) was kind enough to share unpublished findings from a much larger project in which she was involved, which seem to be at variance with our conclusions. In their study, which evaluated the effectiveness of PEAP, a TEYL programme involving 800 schools across Greece, they found that parents were initially ambivalent towards such programmes, but developed positive attitudes later on.
As I said above, such discrepancies are not uncommon, and they need not be seen as a threat to the validity of either study. Rather, by taking these differences into consideration, and by attempting to account for them, we should be able to end up with more robust interpretations, and this is what I will try to do in the present post.
So, why did our findings differ? Here are four hypothetical explanations for this discrepancy.
They are just more effective than we were
The most obvious explanation that might account for the differences in our findings is that the PEAP project simply manages to influence parental attitudes more than our pilot programme did. This is hardly surprising, given the amount of resources and institutional support available to them, and the fact that their project now draws on the accumulated experience of three years of implementation. If that is the case, it may prove fruitful to further investigate which features of the PEAP project contributed most to their success, and build upon them.
Their study was more sensitive than ours
A second interpretation, which complements the one above, is that any TEYL programme that is implemented well can counter parental scepticism, and that our study somehow masked this effect. Indeed, there is some evidence in our data that this may be the case: by cross-tabulating parental attitudes against the age of their children, we found that parents of first- and second-form pupils (among whom our pilot project was implemented) tended to hold more positive views towards TEYL, compared to parents of older pupils. A standard test of statistical significance (chi-square) indicated that the probability of such a distribution arising by chance alone, assuming no real association, was a little higher than one in twenty (p = .051). Unfortunately, this is slightly above the generally accepted threshold of statistical significance (p < .05), and we therefore felt reluctant to draw conclusions from it. It is possible that the small size of our sample limited the statistical power of the test, but short of repeating the study with more participants, there is little one can do to confirm or disprove this hypothesis.
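The role of sample size here can be made concrete with a quick sketch. The counts below are invented for illustration (our actual cross-tabulation is not reproduced in this post); the point is only that a 2×2 association yielding a borderline p-value can clear the conventional p < .05 threshold once the same proportions are observed in a sample twice the size.

```python
# Illustrative chi-square test of independence (2x2 table, with Yates'
# continuity correction) between pupils' form and parental attitude
# towards early English teaching. The counts are HYPOTHETICAL; the study
# reported only the resulting p-value (p = .051).
from math import erfc, sqrt

def chi2_2x2(table):
    """Chi-square statistic and p-value (df = 1) for a 2x2 table,
    using Yates' continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, observed_row in enumerate(table):
        for j, o in enumerate(observed_row):
            e = row_totals[i] * col_totals[j] / n  # expected count
            chi2 += (abs(o - e) - 0.5) ** 2 / e    # Yates-corrected term
    # For df = 1 the chi-square survival function reduces to erfc:
    p = erfc(sqrt(chi2 / 2))
    return chi2, p

# Rows: parents of Form 1-2 pupils / parents of Form 3+ pupils
# Columns: positive towards early TEYL / sceptical (invented counts)
observed = [[18, 9], [14, 21]]
chi2, p_small = chi2_2x2(observed)

# Same proportions, twice the sample size: the association is now significant
doubled = [[2 * x for x in row] for row in observed]
chi2_d, p_large = chi2_2x2(doubled)

print(f"n = 62:  chi2 = {chi2:.2f}, p = {p_small:.3f}")   # just above .05
print(f"n = 124: chi2 = {chi2_d:.2f}, p = {p_large:.3f}")  # below .05
```

In other words, a non-significant result in a small sample is weak evidence of absence: with the same underlying pattern and more participants, the test may well have reached significance.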
I would argue, however, that if there is merit to this interpretation, i.e., that any well-structured school-based intervention can have an immediate measurable effect on societal attitudes, then perhaps discussion should move towards the political and ideological implications of Teaching English to Young Learners. In other words, if we can empirically establish that we can change the way stakeholders think, we need to give some serious thought as to what we are making them believe and why.
Differences in demographics
It may also be the case that the discrepancy in findings is a product of different demographics. The PEAP findings have yet to be published in full, so much of the discussion in this section is speculative [update: A summary of their published findings can be read here]. However, it stands to reason that their data, taken from 800 schools across Greece, constitute a reasonably representative cross-section of Greek society. This was not the case at the school we worked with: although the school was not officially selective, it seemed to attract students from the upper strata of local society. For instance, nearly all our respondents were university educated, and almost a third of our sample held a postgraduate degree (M-level or PhD). That being the case, it would be very surprising if the knowledge and skills commensurate with such education did not influence the views proffered by our respondents.
| Educational attainment | n (%) |
| --- | --- |
| Secondary education | 5 (7.1%) |
| Four to six years of undergraduate education | 41 (58.6%) |
| Advanced university education (M-level, Doctorate) | 23 (32.9%) |
It is also plausible that the dynamics between highly-educated parents and the school may have resulted in the former being more forthright with their opinions compared to the general population. Social scientists often have to contend with what has been termed the social desirability bias, i.e., the tendency among survey participants to respond in ways that enhance their status, conform to mainstream ideology, and protect their self-image, even at the expense of factual accuracy (Consider, for instance, how you might respond to the question “How many books did you read this year?”).
I would intuitively think that the respondents in our sample were relatively less concerned about constructing socially acceptable identities, and that they would not feel much pressure to provide us with positive feedback only. Conversely, I would argue that the PEAP sample must have included more than a few respondents who were accustomed to dealing with authority from a disenfranchised position. It would hardly be surprising if they provided the PEAP researchers with the data they wanted to hear. This seems even likelier if the respondents were concerned, however wrongly, that critical feedback might result in fallout from the school system against their children.
Uneven levels of quality control
Linked to this, one final hypothesis that might partially explain the difference between our findings and those of the PEAP study pertains to the methodological design of the latter evaluation. Once again, most of the details in this section are anecdotal, but they will have to do, until more details about PEAP become available to academic scrutiny.
Unless I am mistaken, the PEAP evaluation was conducted by means of an anonymous questionnaire survey, which was administered and collected by the school teachers who participated in the project. Although questionnaire responses do offer a degree of anonymity, this seems to have been compromised by the small size of the participant groups (each teacher was responsible for collecting questionnaires from a small number of parents), the personal connections and power differentials between survey administrators and respondents, and the fact that standard precautions (e.g., sealed envelopes) were not used, presumably in an attempt to cut costs. Such arrangements must have amplified the social desirability bias, especially among participants with low socioeconomic status (see above), and it is unclear –for the time being at least– how the PEAP team controlled for them.
Another reason why this arrangement is problematic is that the teachers who administered the survey had a professional stake in the future of the PEAP project, in that their continued employment depended on its success. To the best of my knowledge, there were no quality control safeguards in place, and the research team relied on the good faith of the PEAP teachers to prevent research malpractice. Moreover, none of the teachers who collected the data appear to have undergone training in data collection or research ethics, so the quality of the data must have been conditional on the integrity of the 800+ participating teachers. Under the circumstances, it is not inconceivable that at least some teachers prioritised their professional future over considerations of academic integrity, and that the PEAP data are –to some unknowable extent– aspirational.
Although there is no way of ascertaining to what extent the social desirability bias and –hopefully infrequent– research malpractice impacted the validity of the PEAP data, the credibility of their findings would be greatly enhanced if they could be independently confirmed.
In brief, the variance between our findings and those of the PEAP team is intriguing, although it seems impossible to tell whether it is a product of the different methods and goals of our studies, or whether it corresponds to actual differences in society at large. A number of hypothetical interpretations have been suggested, and in each case a possible direction for further investigation has been put forward. These are summarised in Table 2.
| Interpretation | Suggested further investigation |
| --- | --- |
| The PEAP programme is more successful | Identify which aspects contribute to its success |
| Any well-designed TEYL intervention can change societal attitudes | (a) Replicate our study with a larger sample; (b) problematise why attitudes should be changed and what attitudes are desirable |
| Differences in demographics (→ differentiation of attitudes? different levels of social desirability bias?) | Consider replicating the PEAP evaluation by non-stakeholders |
| Effect of quality safeguards on the PEAP study | Consider replicating the PEAP evaluation by non-stakeholders |
To conclude, I am very grateful to Prof. Alexiou for having pointed out how the PEAP data differ from our own. Reflection on these differences has been very helpful in highlighting the limitations of our own study and in moving towards a more valid synthesis. I have no doubt that the PEAP team would feel the same.
About this post: This post is based on my notes from the 1st Model/Experimental Schools conference in Thessaloniki, and it was written shortly after the event in May 2013. The post was last updated in June 2020, at which time it was re-formatted, but no changes have been made to its content.