Preregistration as a guide to reproducibility and scientific competence

This is a long post written for both professionals and curious lay people; the links below allow you to jump among the post’s sections. The links in all CAPS represent the portions of this post I view as its unique intellectual contributions.

Navigation: Prelude | Reproducibility | Conflicts of Interest | SCIENTIFIC COMPETENCE | Model | TEMPLATE

The Preregistration Knights who say Ni require a shrubbery instead of a garden of forking paths.

Preregistration: prelude, problems addressed, and concerns

Psychology is beset with ways to find things that are untrue. Many famous and influential findings in the field are not standing up to closer scrutiny with tightly controlled designs and methods for analyzing data. For instance, a registered replication report in which my lab was involved found that holding a pen between your lips in a smiling pose does not, in fact, make cartoons funnier. Indeed, fewer than half of 100 studies published in top-tier psychology journals replicated.

But it’s not only psychology that has this problem. Only 6 out of 53 “landmark” cancer therapy studies replicated. An attempt to induce other labs to reproduce findings in cancer research has been scaled back substantially in the face of technical and logistical difficulties. Nearly two thirds of relatively recent economics papers failed to replicate, though this improved to about half when the researchers had help from the original teams. In fact, some argue that most published research findings are false due to the myriad ways researchers can find statistically significant results from their data.

One proposal for solving these problems is preregistration. Preregistration refers to making available – in an accessible repository – a detailed plan about how researchers will conduct a study and analyze its results. Any report that is subsequently written on the study would ideally refer to this plan and hew closely to it in its initial methods and results descriptions. Preregistration can help mitigate a host of questionable research practices that take advantage of researcher degrees of freedom, or the hidden steps behind the scenes that researchers can take to influence their results. This garden of forking paths can transmute data from almost any study into something statistically significant that could be written up somewhere; preregistration prunes this garden into a single, well-defined shrub for any set of studies.

Yet prominent figures doubt the benefits of preregistration. Some even deny there’s a replication crisis that would require these kinds of corrections. And to be sure, there are other steps to take to solve the reproducibility crisis. However, I argue that preregistration has three virtues, which I describe below. In addition to enhancing reproducibility of scientific findings, it provides a method for managing conflicts of interest in a transparent way above and beyond required institutional disclosures. Furthermore, I also believe preregistration permits a lab to demonstrate its increasing competence and a field’s cumulative knowledge. 


Enhancing reproducibility

Chief among the proposed benefits of preregistration is the ability of science to know what actually happened in a study. Preregistration is one part of a larger open science movement that aims to make science more transparent to everyone – fellow researchers and the public alike. Preregistration is probably more useful for people on the inside, though, as it helps people knowledgeable in the field assess how a study was done and what the boundaries were on the initial design and analysis. Nevertheless, letting the general public see how science is conducted would hopefully foster trust in the research enterprise, even if it may be challenging to understand the particulars without formal training.

Here are some of the problems preregistration promises to solve:

  • Hypothesizing After the Results are Known (HARKing): You can’t say you thought all along something you found in your data if it’s not described in your preregistration.
  • Altering sample sizes to stop data collection prematurely (if you find the effect you want) or prolong it (to increase power, that is, the likelihood of detecting an effect): You said how many observations you were going to make, so you have a preregistered point to stop. Ideally, this stopping point would be determined from a power analysis using reasonable assumptions from the literature or basic study design about the expected effect sizes (e.g., differences between conditions or strengths of relationships between variables).
  • Eliminating participants or data points that don’t yield the effect you want: There are many reasons to drop participants after you’ve seen the data, but preregistering reasons for eliminating any participants or data from your analyses stops you from doing so to “jazz up” your results.
  • Dropping variables that were analyzed: If you collect lots of measures, you’ve got lots of ways to avoid putting your hypotheses to rigorous tests; preregistration forces you to specify which variables are focal tests of your hypothesis beforehand. It also ensures you think about making appropriate corrections for making lots of tests. If you run 20 different analyses, each with a 5% chance (or .05 probability) of yielding a significant result when no real effect exists (a typical setup in psychology), then you should expect about 1 significant result by chance alone!
  • Dropping conditions or groups that “didn’t work”: Though it may be convenient to collect some conditions “just to see what happens”, preregistering your conditions and groups makes you consider them when you write them up.
  • Invoking hidden moderators to explain group differences: Preregistering all the things you believe might change your results ensures you won’t pull an analytic rabbit out of your hat.
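The arithmetic behind the multiple-testing point in the list above can be checked in a few lines. What follows is a minimal sketch, assuming the 20 tests are independent and that no true effects exist; the function names are illustrative, and the Bonferroni correction is only one common choice of correction, not one this post prescribes.

```python
def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Average number of significant results when every null hypothesis is true."""
    return n_tests * alpha


def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    n = 20  # the number of analyses used in the example above
    print("Expected false positives:", expected_false_positives(n))
    print("Chance of at least one:", round(familywise_error_rate(n), 3))
    corrected = bonferroni_alpha(n)
    print("Bonferroni per-test alpha:", corrected)
    print("Chance of at least one after correction:",
          round(familywise_error_rate(n, corrected), 3))
```

Under these assumptions, running 20 uncorrected tests gives roughly a two-in-three chance of at least one spurious "finding", which is why a preregistration that names the focal tests and the correction in advance matters.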

Many of these solutions can be summed up in 21 words. Ultimately, rather than having lots of hidden “lab secrets” about how to get an effect to work or a multitude of unknown ingredients working their way into the fruit of the garden of forking paths, research will be cleanly defined and obvious, with bright and shiny fruit from its shrubbery.


Managing conflicts of interest

As I was renewing my CITI training (the stuff we researchers have to refresh every 4 years to ensure we keep up to date on performing research ethically and responsibly), I also realized that preregistration of analytic plans creates a conflict of interest management plan. Preregistered methods and data analytic plans require researchers to describe exactly what they’re going to do in a study. Those plans can be reviewed by experts, including officials at an individual’s university, at a funding agency, or in a journal’s editorial processes, to detect ways in which researchers’ own interests might be put ahead of the integrity of the data or analyses in the study. Conscientious researchers can also scrutinize their own plans to see how their own best interests might have crept ahead of the most scientifically justifiable procedures to follow in a study.

These considerations led the clinical trials field to adopt a set of guidelines to prevent conflicts of interest from altering the scientific record. Far more than institutional disclosure forms, these guidelines force scientists to show their work and stick to the script of their initial study design. Since adopting these guidelines, the number of clinical trials showing null outcomes has increased dramatically. This pattern suggests that conflicts of interest may have guided some of the positive findings for various therapies rather than scientific evidence analyzed according to best practices. The preregistered shrub may not bear as much fruit as the garden of forking paths, but the fruit preregistered science bears is less likely to be poisonous to the consumer of the research literature.


Demonstrating scientific competence and cumulative knowledge

One underappreciated benefit of preregistration is the way it allows researchers to demonstrate their increasing competence in an area of study. When we start out exploring something totally new, we have ideas about basic things to consider in designing, implementing, and analyzing our studies. However, we often don’t think of all the probable ways that data might not comport with our assumptions, the procedural shifts that might be needed to make things work better, or the optimal analytic paths to follow.

When you run a first study, loads of these issues crop up. For example, I didn’t realize how hard it was going to be to recruit depressed patients from our clinic for my grant work on depression (especially after changing institutions right as the grant started), so I had to switch recruitment strategies. Right as we were starting to recruit participants, there was also a conference talk in 2013 that totally changed the way I wanted to analyze our data, as the mood reactivity item was better for what we wanted to look at than an entire set of diagnostic subtypes. In dealing with those challenges, you learn a lot for the second time you run a similar study. Now I know how to specify my recruitment population, and I can point to that talk as a reason for doing things a different way than my grant described. Over time, I’ll know more and more about this topic and the experimental methods in it, plugging additional things into my preregistrations to reflect my increased mastery of the domain.

Ideally, the transition from less detailed exploratory analyses to more detailed confirmatory work is a marker of a lab’s competence with a specific set of techniques. One could even judge a lab’s technical proficiency by the number of considerations advanced in their preregistrations. Surveying preregistered projects for various studies might let you know who the really skilled scientists in an area are. That information could be useful to graduate students wanting to know with whom they’d like to work – or potential collaborators seeking out expertise in a particular topic. Ideally, a set of techniques would be well-established enough within a lab to develop a standard operating procedure (SOP) for analyzing data, just as many labs have SOPs for collecting data.

In this way, the fruits of research become clearer and more readily picked. Rather than taking fruitless dead ends down the garden of forking paths with hidden practices and ad hoc revisions to study designs, the well-manicured shrubbery of preregistered research and SOPs gives everyone a way to evaluate the soundness of a lab’s methods without ever having to visit. Indeed, some journals take preregistration so seriously now that they are willing to provisionally pre-accept papers with sound, rigorous, and preregistered methodology. Tenure committees can likewise peek under the hood of the studies you’ve conducted, which could alleviate a bit of the publish-or-perish culture in academia. A university’s standards could even reward an investigator’s rigor of research beyond a publication history (which may be more like a lottery than a meritocracy).


A model for confirmatory and exploratory reporting and review

In my ideal world, results sections would be divided into confirmatory and exploratory sections. Literally. Whether written as RESULTS: CONFIRMATORY and RESULTS: EXPLORATORY, PREREGISTERED RESULTS and EXPLORATORY RESULTS, or some other set of headings, it should be glaringly obvious to the reader which is which. The confirmatory section contains all the stuff in the preregistered plan; the exploratory section contains all the stuff that came after. Right now, I would prefer that details about the exploratory analyses be kept in that exploratory results section to make it clear it came after the fact and to create a narrative of the process of discovery. However, similar Data Analysis: Confirmatory and Data Analysis: Exploratory or Preregistered Data Analysis and Exploratory Data Analysis sections might make it easier to separate the data analytics from the meat of the results.

It’s also important to recognize that exploratory analyses shouldn’t be pooh-poohed. Curious scientists who didn’t find what they expected could systematically explore a number of questions in their data subsequent to its collection and preliminary analysis. However, it is critical that all deviations from the preregistration be reported in full detail and with sufficient justification to convince the skeptical reader that the extra analyses were reasonable to perform. Much of the problem with our existing literature is that we haven’t reported these details and justifications; in my view, we just need to make them explicit to bolster confidence in exploratory findings.

Reviewers should ask about those justifications if they’re not present, but exploratory analyses should be held to essentially the same standards as we hold current results sections. After all, without preregistration, we’re all basically doing exploratory analyses! As time passes, confirmatory analyses will likely hold more weight with reviewers. However, for the next 5-10 years, we should all recall that we came from an exploratory framework, and to an exploratory framework we may return when justified. When considering an article, reviewers should also look carefully at the confirmatory plan (which should be provided as an appendix to a reviewed article if a link that would not compromise reviewer anonymity cannot be provided). If the researchers deviated from their preregistered plan, call them on it and make them run their preregistered analyses! In any case, preregistration’s goals can fail if reviewers don’t exercise due diligence in following up the correspondence between the preregistration and the final report.

The broad strokes of a paper I’m working on right now demonstrate the value of preregistration in correcting mistakes and the ways exploratory results might be described. I was showing a graduate student a dataset I’d collected years before, and there were three primary dependent variables I planned on analyzing. To my chagrin, when the student looked through the data, that student pointed out one of those three variables had never been computed! Had I preregistered my data analytic plan, I would have remembered to compute that variable before conducting all of my analyses. When that variable turned out to be the only one with interesting effects, we also thought of ways to drill down and better understand the conditions under which the effect we found held true. We found these breakdowns were justifiable in the literature but were not part of our original analytic plan. Preregistration would have given us a cleaner way to separate these exploratory analyses from the original confirmatory analyses.

In any future work with the experimental paradigm, we’ll preregister both our original and follow-up analyses so there’s no confusion. Such preregistration also acts as a signal of our growing competence with this paradigm. We’ll be able to give sample sizes based on power analyses from the original work, prespecify criteria for excluding data and methods of dealing with missing values, and more precisely articulate how we will conduct our analyses.
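To make the idea of giving sample sizes from a power analysis concrete, here is a minimal sketch of one way such a number could be computed for a simple two-group comparison. It assumes a two-sided test, equal group sizes, and a normal approximation (dedicated power software based on the t distribution will usually ask for slightly more participants). The function name and the effect sizes shown are placeholders for illustration, not values from our studies.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size_d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per group to detect a standardized
    mean difference (Cohen's d) in a two-sided, two-group comparison."""
    if effect_size_d <= 0:
        raise ValueError("effect_size_d must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = standard_normal.inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)


if __name__ == "__main__":
    # Hypothetical effect sizes; a real preregistration would justify its own.
    for d in (0.2, 0.5, 0.8):
        print(f"d = {d}: about {n_per_group(d)} participants per group")
```

The useful part for a preregistration is less the specific number than the fact that the assumed effect size, alpha, and power are written down before data collection begins.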


My template

Many people talk about the difficulties of preregistering studies, so I advance a template I’ve been working on. In it, I pose a bunch of questions in a format structured like a journal article to guide researchers through questions I’d like to have answered as I start a study. It’s a work in progress, and I hope to add to it as my own thoughts on what all could be preregistered grow. I also hope we can publish some data analytic SOPs along with our psychophysiological SOPs that we use in the lab (a shortened version of which we have available for participants to view). I hope it’s useful in considering your own work and the way you’d preregister. If this seems too daunting, a simplified version of preregistration that hosts the registration for you can get you started!

Home safety and child welfare

A messy home

As the heat of summer washes over the country, basic home safety becomes a concern. Sometimes, parents become worried that their messy houses might cause Child Protective Services to view them as unfit parents. A new paper from my research collaborators and me has shown that even in homes with genuine safety concerns, the beauty of a home (or lack thereof) isn’t associated with child abuse potential or socioeconomic status. Thus, it doesn’t appear that messy homes come from abusive parenting environments, and unattractive or unsafe homes are just as likely to be found in richer neighborhoods as in poorer ones.

We found that trained assessors and people inhabiting homes had reasonable agreement about the beauty of the homes, but they didn’t agree on the safety risks present in the home. Part of that may have been because the trained assessors had checklists with over 50 items to check over in each room to assess safety and appearance, whereas the occupants of the homes only provided summary ratings of room safety and appearance on a 1-6 scale. It’s probably easier to give an overall judgment of the attractiveness of a room than to summarize in your mind all the possible safety risks that exist.

Because it’s so hard to notice these safety risks without a detailed guide, the assessment we developed can also be used as a way to point parents to specific things to fix in the home to make their children’s environment safer. We didn’t want people overwhelmed when thinking about what to clean up or make safer – rather, we wanted to give people specific things to address. We’ll be interested to see if people are better able to make their homes cleaner and safer places with the help of that assessment.

#BlackLivesMatter, #BlueLivesMatter, and radical empathy

Empathic conundrums

Empathy is a multifaceted beast, and it can get us into trouble when social upheavals strike.

There are a bunch of measures of empathy, but many of them make a distinction between cognitive empathy (being able to think like someone else) and emotional or affective empathy (being able to feel like someone else). Within cognitive empathy, we often speak of perspective taking (the ability to put yourself in, and potentially adopt, another’s mindset) as a critical skill. In contrast, we often talk about empathic concern (feeling sympathy with or concern for those less fortunate) as an important part of emotional empathy.

When confronted with tragedy, we often extend empathic concern toward those most like us. This concern is associated with experiencing the same patterns of brain activity when seeing someone else (who is similar to you, or part of your ingroup) feeling sad as when feeling sad yourself. However, this isn’t typically the case for people who aren’t similar to you, or those who are part of your outgroup. Even chimpanzees have a hard time empathizing with other primates who aren’t chimpanzees. In fact, it’s often the opposite.

Specifically, people in your outgroup who also seem to have the capability to harm your ingroup are more likely to elicit smiles when they’re hurt and more likely to be volunteered to receive electric shocks. But how do we know who’s likely to harm people like you – your ingroup? Though researchers have sometimes used culturally normative definitions of such people in their work, I would argue it’s important to examine people’s own beliefs to assess this notion. For instance, the rising notion of “black privilege” suggests that whites have myriad opportunities stripped from them on account of racial preference. Conversely, lists of ways to avoid being killed by police circulate in the black community.

With such threats to different kinds of ingroups believed to be posed by specific outgroups, it’s extraordinarily hard to engage in emotional empathy with “the other side”, let alone engage in cognitive empathy. Going through the work of taking another person’s perspective isn’t likely when that person feels like a threat instead of someone with whom you might cooperate. To empathize with people who are different from us, we may have to take a view of all humanity as our ingroup. However, there are large individual differences in the ability to do this, and when one feels under threat, such radical empathy poses even bigger challenges. Even if such things could be taught, there appear to be interactions between genes and hormones related to empathy. Those who are more likely to empathize with their ingroups are more likely to be receptive to oxytocin (which is a hormone that’s more associated with ingroup bonding than universal connectedness), whereas those less receptive to oxytocin are willing to harm members of the outgroup to the degree their brains “want” to harm the outgroup.

So, what can we do to help empathy build between groups who view each other as threatening? Shared suffering may be the answer, coming together over shared tragedies to let pain bind closed the wounds of humanity. Failing that, empathic listening to both sides may also help, allowing people to express their pain or fears without judgment or defensiveness. In the wake of last week’s tragic shootings of two black men and five police officers, a black man offered free hugs outside the Dallas police department headquarters. The Dallas police themselves guarded the people’s right to protest peacefully. Perhaps emotional empathy can give rise to cognitive empathy.

Empathy is a challenge to us all, and it may have untoward consequences if we only exercise it toward those we perceive to be like us. In my own experience, I grew up playing the Police Quest series of games, and the narratives that Jim Walls spun affected me viscerally, allowing me to peer inside a cop’s life in a way that sticks with me still. Conversely, working at the Walk-In Counseling Center allowed me to hear the stories of people who grew up with very different racial and socioeconomic backgrounds in ways I’d never experienced before and moderated my political beliefs. But each of these took years of work to fully set in for me, to let me see what both sides might be thinking – yet recognizing that my own empathy will be forever incomplete as a result of living outside of both black and police worlds. Society will not heal quickly from these wounds, and more than just emotional empathy will be necessary to do so.

Hannibal, Bates Motel, and trait absorption

Hannibal

In 2013, two remarkable TV shows hit the air- and cable-waves that provide backstories of two of cinema’s most notable villains. Hannibal features a retelling of the story of Hannibal Lecter and Will Graham that surprises even the most die-hard connoisseurs of Thomas Harris’s original novels and the movies that have been made from them. Bates Motel fills in the history of Norman Bates, tracing his descent from a gawky teenager into the Psycho murderer.

The personality trait of absorption is strongly evident in a character in each series. Absorption is a strange trait in the Giant 3 model of personality that doesn’t fit cleanly anywhere. It was originally designed as a measure of hypnotic susceptibility, but it’s been refined over the decades to emphasize getting lost in one’s own experiences, whether those experiences be enthralling external stimuli or deeply engaging thoughts and images that come to a person’s mind. Absorption relates equally to the superfactors of Positive Emotionality and Negative Emotionality, indicating that it predisposes people to strong emotional experiences. Within the Big Five model of personality, it’s associated with the fantasy proneness and emotionality facets of Openness to Experience, not the parts of Openness that are associated with playing with ideas or political liberalism.

Some of my recent work has examined how absorption is related to initial attention to emotional pictures and subsequent attention to noise probes. We found that people high in absorption had more emotional attention to emotional pictures (both pleasant and aversive) compared to neutral pictures. Thus, people high in absorption get wrapped up in what they’re seeing when it’s emotionally evocative. Furthermore, we found that people high in absorption show less attention to a loud noise probe during all pictures. It’s as if they’re so wrapped up in processing the pictures that they don’t have as strong an ability to disengage attention to process something else coming in a different channel (that is, hearing as opposed to sight).

How does this apply to our two fictional characters? Both of them get really absorbed in the imaginal part of their internal experience, which wreaks havoc on their emotional lives. Will Graham’s unique perceptual gifts entail mentally reconstructing a crime from the residues left at the crime scene. He may be a perceptive person, but his genius lies in absorbing himself in what he sees and piecing people’s last moments together through the eyes of a killer. This kind of perspective taking is rare in individuals on the autism spectrum, as Graham claims himself to be. Therefore, I would argue that absorption is the key trait allowing Graham to get inside killers’ heads; his inability to disengage from the disturbing images that run through his head confuses him and creates untoward consequences that demonstrate the perils as well as the promise of high levels of absorption.

Bates Motel

Norman Bates is a more purely maladaptive face of high absorption. Absorption is also associated with dissociation, which refers either to the feeling that one’s self or surroundings aren’t real or to the experience of having done something without recalling having done it. As the seasons progress, Norman’s increasing absorption in his fantasies about his mother propels him from committing murders of women he desires to taking on his mother’s identity without recalling having done it in the morning. Norman’s emotions overwhelm him, and he uses his absorption to retreat into a mental world that’s safer for him, that’s anchored by his mother. It’s this fantasy component of openness and absorption that’s related to psychoticism, which represents a vulnerability to experiencing odd and unusual perceptual experiences consistent with schizotypal personality disorder and certain forms of schizophrenia. In essence, Norman Bates isn’t a psychopathic killer; he’s one of the rare serial murderers with psychotic experiences – in this case, that may be underpinned by absorption. Will Graham exhibits a form of dissociation that might superficially seem related to absorption as well, but instead (SPOILER ALERT) is more likely due to encephalitis than his personality.

Robin Williams and anhedonia

The Angriest Man in Brooklyn

Warning: Ahead be spoilers for the little-seen movie The Angriest Man in Brooklyn and a frank discussion of suicide.

Robin Williams’ death hit me hard when it was first reported. I spent about a week watching his movies, comedy shows, television appearances, and even some old Mork and Mindy episodes to remember the depth and breadth of his talent. From the Captain of Dead Poets Society and the unorthodox therapist in Good Will Hunting to the madcap comic hijinks of Mrs. Doubtfire and Good Morning Vietnam to the sublimely creepy photo technician in One Hour Photo and malign Milgram clone in an episode of Law and Order: Special Victims Unit, Williams’ acting talents were uniquely diverse.

His death’s lingering impact struck me when I watched The Angriest Man in Brooklyn tonight. It was the last movie of his released while he was still alive, and it features a scene in which his character jumps off the Brooklyn Bridge to attempt suicide. At that moment, the remainder of the movie didn’t matter to me. That scene was an eerie reminder of the nature of his death; his character’s survival of the attempt rendered even more poignant the death of his portrayer through similar means. Early attempts to explain his suicide focused on his history of depression and substance use, both of which are predictors of suicide.

Williams was reported to be suffering from the early stages of Parkinson’s disease at the time of his death (though later reports suggest he may have suffered from Lewy body dementia instead). Assuming that the Parkinson’s disease diagnosis was correct, Williams becomes one of the most striking exemplars of the anhedonia that frequently accompanies Parkinson’s disease. Specifically, because Parkinson’s disease entails reduced levels of dopamine, it’s reasonable to assume that the anhedonia in Parkinson’s disease relates to a decrease in “wanting”, the part of reward processing that’s involved in yearning for and approaching something that’s desirable. If very little seems truly desirable in your life, it’s difficult to make yourself get out of bed, do the potentially hard work in front of you, and keep going through obstacles that rise up.

One might assume that near the end of his life, Williams’s emotional life was the opposite of that of his character in The Angriest Man in Brooklyn. Anger is an approach-related emotion, one that’s related to dopamine binding. Rather than being angry, Williams was described as depressed, anxious, and paranoid toward the end of his life. Clearly, finding good assessments of the anhedonia of Parkinson’s disease patients is critical, particularly to the degree it may share features with the anhedonia in other disorders like depression. If that’s the case, treatments for one form of anhedonia may be applied to other forms, saving the lives of thousands of people – including some of our most creative members of society.