When I was at university studying Psychology, I became very bored with the constant focus on research methods. I was young and fascinated by the subject, but I wanted to discuss ideas and conclusions, and I became petulantly frustrated by the insistence that the methods section was the first place that had to be analysed in detail: before we even knew what the study was about, we were picking apart the sampling technique! It took a few years of teaching research methods myself, and a number of subsequent scandals in different areas of Psychology, to bring me round to understanding just how crucial research methods are to a thorough understanding of Psychology. Far from being that annoying and boring extra bit that gets tagged onto most Year 12 courses, they are really the central recurring idea at the heart of everything you do as a psychologist. With the aid of a few famous examples, allow me to explain.
In 1996 John Bargh and colleagues published a famous study into ‘social priming’, the idea that our behaviour can be unconsciously affected by cues in our environment. In the study, participants had to create sentences from lists of words that they were given. Half of the participants had words associated with age and the elderly in their lists (such as old, lonely, grey, elderly and wise). The other half had neutral words, with no age-related connotations. After thanking them for making the sentences, the researcher told the participants that the elevator was down the hallway and let them leave... at which point the real test started. The basic procedure was replicated by the BBC in the video on the right.
What Bargh et al found was that the group ‘primed’ with stereotypes about the elderly walked more slowly down the corridor than the control group. They weren’t aware of moving more slowly when asked about it afterwards, but the times of the two groups showed a significant difference.
As you can imagine, such fascinating and thought-provoking results very quickly became famous within Psychology. Rather more surprising, however, was the fact that they also seemed to be accepted almost immediately. As all good Psychology students know, one key aspect of Psychology research is that the results should be reliable, and the way to show this is to repeat the experiment and try to replicate the results. Replication is not perfect (Milgram’s experiment is a great example of a very reliable procedure where people still don’t agree on exactly what the results mean - more on that later), but it does at least indicate to us that we are dealing with a real phenomenon. The more unusual or surprising the findings, the more important it becomes to replicate them before we can have real confidence in the conclusions. It seems especially odd, then, that no one really tried to replicate Bargh et al’s experiment immediately after it was published. Two studies did partially replicate the findings, but both of them changed significant aspects of the method, so they weren’t really true replications. It wasn’t until 2012 that real attempts were made to check Bargh et al’s findings, but before we get onto this study, a slight digression on why it might take 16 years for anyone to check that a famous experiment actually worked.
File drawers and quiet failures
One of the most interesting aspects of looking at research methods in Psychology is that they reveal a lot about the psychology of psychologists (and scientists in general)! Imagine that you are a researcher, in your first proper position at a university and ready to do your first piece of real research. Would you rather a) try to find out something totally new (which is far more likely to get published in a research journal than a replication study)... or b) try to repeat an experiment that someone has already done, to check its reliability? In addition, would you rather a) produce a positive finding (which is also much more likely to get published and to attract publicity)... or b) produce a negative finding, where you have to report that there isn’t actually anything interesting going on?
Like most people, you probably answered ‘a’ to both of those questions, and science researchers are no different (it is their career and livelihood, after all, so they can be forgiven for trying to do the thing which will benefit them the most). The problem is that this leads to a huge bias in the way that research is done. Answering ‘a’ to the first scenario means that replication studies are far less likely to be done, and answering ‘a’ to the second means that even when they are done, negative results are often not published. They are just filed away at the bottom of a drawer somewhere and no-one ever knows about them - this is the famous ‘File-Drawer Problem’, which leads to unsuccessful research in all sciences being criminally underreported. Perhaps, then, it isn’t that surprising that it took so long for anyone to check whether Bargh et al’s results replicated, or to raise their voice when they didn’t.
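The File-Drawer Problem can be made concrete with a small simulation. The Python sketch below assumes a hypothetical priming effect of exactly zero, runs many small two-group studies, and ‘publishes’ only those that reach p < .05, using a z-test with a known standard deviation as a simplifying assumption. The group size, standard deviation, number of studies and function names are illustrative choices of mine, not figures or methods from Bargh et al or Doyen et al.

```python
import math
import random
import statistics


def run_study(n, true_effect, sigma, rng):
    """Simulate one two-group study and return (difference in means, significant?)."""
    control = [rng.gauss(0.0, sigma) for _ in range(n)]
    primed = [rng.gauss(true_effect, sigma) for _ in range(n)]
    diff = statistics.mean(primed) - statistics.mean(control)
    # Two-sided z-test, assuming the population SD (sigma) is known: |z| > 1.96 means p < .05
    z = diff / (sigma * math.sqrt(2.0 / n))
    return diff, abs(z) > 1.96


def simulate_file_drawer(num_studies=2000, n=15, true_effect=0.0, sigma=1.0, seed=1):
    """Run many studies of the same hypothetical effect; only significant ones get 'published'."""
    rng = random.Random(seed)
    results = [run_study(n, true_effect, sigma, rng) for _ in range(num_studies)]
    all_effects = [abs(diff) for diff, _ in results]
    published = [abs(diff) for diff, significant in results if significant]
    return {
        "studies_run": num_studies,
        "studies_published": len(published),
        # average size of the group difference across every study actually run
        "mean_abs_effect_all": statistics.mean(all_effects),
        # average size of the group difference in the studies that escaped the file drawer
        "mean_abs_effect_published": statistics.mean(published) if published else 0.0,
    }


if __name__ == "__main__":
    for key, value in simulate_file_drawer().items():
        print(key, value)
```

Roughly 5% of the studies clear the significance bar by chance alone, and because only those leave the drawer, the ‘published’ literature reports a sizeable average difference between the groups even though the true difference is zero by construction.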
In 2012 Doyen et al repeated the study as closely as they could (though with a larger sample), but instead of having experimenters time the walk, they used infrared sensors. They found no difference in the times of the two groups. They also used experimenters who were ‘blind’ to the experimental condition (they didn’t know whether to expect slow or fast walks); again, no difference. Eventually, in the words of science writer Ed Yong:
“they found that the volunteers moved more slowly only when they were tested by experimenters who expected them to move slowly… Let that sink in: the only way Doyen could repeat Bargh’s results was to deliberately tell the experimenters to expect those results.”
Maze-bright rats
We have known for a long time that the expectations of the experimenter can have a powerful effect on the results of an experiment. Either the experimenter unconsciously conveys to participants how they should behave, or measures the results slightly differently for different groups. The experimenter may be totally unaware of the influence which s/he is exerting, and the cues may be very subtle indeed, but they have an influence nevertheless.
Rosenthal (1966) is famous for demonstrating how powerful these experimenter effects can be. He used several hundred of his students as experimenters and told half of them that they would be studying a strain of ‘maze-bright’ rats (bred from intelligent stock) and the other half that they would be studying ‘maze-dull’ rats. In fact there was no great difference between the rats, which had been randomly assigned to the groups. Nevertheless, the supposedly brainy rats did learn to run the maze more quickly, according to the students’ results! In another of his studies Rosenthal found that male researchers were far more likely to smile at female participants than at male ones. Since a smile is likely to elicit a smile in return, any study of sex differences in co-operation or friendliness is liable to be spoilt.
In order to reduce these effects, carefully designed studies are often ‘blinded’. In a ‘single-blind’ design, the participants do not know which condition they are being tested under. This guards against biased responding from participants (so-called ‘demand characteristics’), but doesn’t rule out bias from experimenters. Even better, then, to use a ‘double-blind’ design, where the experimenters do not know which condition each participant is in either. When Doyen et al double-blinded the Bargh et al design, the differences between the groups seemed to disappear. Sadly, the problems for Social Psychology don’t seem to end there.
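As a rough illustration of how the allocation in a double-blind study might be organised, here is a Python sketch in which a third party randomly assigns participants to conditions and hands the experimenters only opaque packet codes; the key linking codes to conditions stays sealed until all the timings are in. The function names, the ‘P1234’-style codes and the two condition labels are hypothetical, chosen to echo the Bargh et al design rather than taken from any actual study protocol.

```python
import random


def double_blind_allocation(participant_ids, conditions=("elderly", "neutral"), seed=None):
    """Return (experimenter_sheet, sealed_key) for a hypothetical double-blind study.

    experimenter_sheet maps each participant to an opaque packet code, so neither the
    participant nor the person running the session can tell which condition it is.
    sealed_key maps packet codes to conditions and is held back until analysis.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    # Balanced assignment: conditions cycle evenly, then the order is shuffled at random
    assignments = [conditions[i % len(conditions)] for i in range(len(ids))]
    rng.shuffle(assignments)
    # Unique four-digit codes that reveal nothing about the condition
    codes = rng.sample(range(1000, 10000), len(ids))
    experimenter_sheet = {}
    sealed_key = {}
    for pid, condition, code in zip(ids, assignments, codes):
        packet = f"P{code}"
        experimenter_sheet[pid] = packet
        sealed_key[packet] = condition
    return experimenter_sheet, sealed_key


def unblind(experimenter_sheet, sealed_key, walking_times):
    """After all timings are recorded, group each participant's walking time by condition."""
    by_condition = {}
    for pid, seconds in walking_times.items():
        condition = sealed_key[experimenter_sheet[pid]]
        by_condition.setdefault(condition, []).append(seconds)
    return by_condition
```

In use, the experimenters would only ever see the packet codes in the first dictionary; the second is opened once every walk has been timed, and `unblind` then sorts the times by condition for analysis.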
In the last few years, with the increased focus on the methods used and the replicability of results from famous findings, more and more Social Psychology studies have come under fire. It started in fields similar to Bargh’s social priming - such as the finding that Dijksterhuis et al’s (1998) ‘intelligence priming’ experiment (where thinking about either a professor or a football hooligan before taking a general-knowledge test sent people’s scores up or down) didn’t replicate. It wasn’t helped by the sad case of Diederik Stapel, who was found to have invented or manipulated data for at least 55 of his published research papers, without anyone noticing. Slowly, though, the net widened. In the last year, this has come to include arguably the two most famous Psychology studies of all time, the Stanford Prison Experiment and Milgram’s shock experiments. Zimbardo’s experimenter bias was far more obvious than Bargh et al’s - he actively encouraged the behaviour he expected of the guards (this in addition to a host of other problems with the methods)! Milgram, for so long the other staple of first-year Psychology courses, has been accused of finding...well... nothing at all, due to his lack of controls, experimenter effects and flawed data analysis. The list goes on too... Asch’s ‘conformity studies’, Little Albert, the Hawthorne studies, the bystander effect - yet more sacred cows of Psychology being mercilessly slain on the altar of research methodology. One begins to wonder when it will all end, and indeed whether we will ever be able to have confidence in any Psychology findings again!
Yet if a little existential doubt about our subject is the price we have to pay to move forward, then so be it. Much better to have a big spring clean and clear out all of the old, cobwebbed research results than to spend another twenty years tripping over them. Although progress is slow, the increased focus on replication studies and methodological checking of procedures in the last few years - including appeals from Nobel Prize-winning Psychology-celebrities (if such a thing exists) and the formation of the 'Many Labs Replication Project' - offers real hope of a new dawn of Social Psychology findings that we can have confidence in. Nor are methodological problems unique to Social Psych (fMRI studies have famously taken a bashing for their statistical methods recently), and the epidemiologist John Ioannidis has even gone so far as to suggest that ‘most published research findings are false’, due largely to small samples, biased reporting and the faulty statistics used in the analysis of many papers.
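Ioannidis’s claim rests on a surprisingly simple calculation: of all the ‘significant’ findings a field produces, what fraction reflect real effects? In his 2005 paper, ignoring the extra bias term he also models, this positive predictive value is PPV = (1 - β)R / (R - βR + α), where R is the pre-study odds that a tested idea is true, 1 - β is the statistical power of the study, and α is the significance threshold. Here is a minimal Python sketch of that no-bias formula; the function name and the example inputs are my own illustrative assumptions, not figures for any particular field.

```python
def positive_predictive_value(prior_odds, power, alpha=0.05):
    """Share of 'significant' findings that reflect a true effect (no-bias case).

    prior_odds: R, the ratio of true to false hypotheses being tested
    power:      1 - beta, the chance a real effect reaches significance
    alpha:      the chance a non-existent effect reaches significance anyway
    """
    true_positives = power * prior_odds  # real effects that come out significant
    false_positives = alpha              # null effects that come out significant
    # Algebraically identical to (1 - beta) * R / (R - beta * R + alpha)
    return true_positives / (true_positives + false_positives)


# Illustrative inputs only: one true idea for every four false ones
for power in (0.8, 0.2):
    print(power, round(positive_predictive_value(prior_odds=0.25, power=power), 2))
```

With these made-up inputs, cutting power from 0.8 to 0.2 drops the share of true ‘significant’ findings from 0.8 to 0.5, and longer-shot hypotheses push it lower still, which is the arithmetic behind the idea that small, underpowered studies fill the journals with false positives.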
Should we give up now, then? Is there any point in even following Psychology research any further, given how flawed many of the results increasingly seem? Counter-intuitively perhaps, I see this as more of a cause for celebration than despair. It may seem pessimistic and depressing to be constantly re-evaluating research findings, but this process is exactly what science is supposed to go through! Far better this than credulously accepting every theory that is put before us, swallowing whatever ideas are currently fashionable without ever checking the facts (the ‘paleo diet’, acupuncture or homeopathy spring to mind here). This is precisely the process which makes scientific results, whilst far from perfect, the best means that we have for generating knowledge about the world. As I’ve mentioned before on this blog, human beings are unbelievably complex organisms, and are therefore unbelievably difficult to study. All the more important, then, that when we do try to study them we do it in the best way that we possibly can, and check our results as carefully as possible. And that, of course, all comes down eventually to our research methods - the most important thing in science!