Exercise - File Drawer Bias and Publication Bias in Academic Research

This exercise draws on two published papers: Annie Franco, Neil Malhotra, and Gabor Simonovits (2014), “Publication bias in the social sciences: Unlocking the file drawer,” Science, vol. 345, no. 6203, pp. 1502–1505; and Annie Franco, Neil Malhotra, and Gabor Simonovits (2015), “Underreporting in political science survey experiments: Comparing questionnaires to published results,” Political Analysis, vol. 23, no. 2, pp. 306–312. You will not need to consult these papers to answer this question, although you may find them interesting to read for more background.

You may have heard something about the “replication crisis” in social science. In broad strokes, concern has grown in recent years that a substantial fraction of published work in empirical social science may be unreliable. The tools of classical frequentist inference (hypothesis testing and confidence intervals) were designed to protect researchers against drawing erroneous conclusions from data. These tools can be, and have been, misused, but that’s only half of the story. Even if every individual researcher does things by the book, science is a social enterprise, and the mechanisms through which research is filtered and disseminated among researchers are tremendously important in determining which claims come to be accepted as reliable within a field.

It is widely thought that journal editors and referees are more willing to accept a paper for publication if it contains statistically significant results, an effect called publication bias. This practice may seem natural, but its consequences are perverse. It could easily lead to a situation in which most published research is false. Things only get worse if individual researchers, aware of journals’ proclivities, tend to self-censor projects with statistically insignificant results, never submitting them for peer review and potential publication. This second effect is called file drawer bias.

Below you will look for evidence of publication and file drawer bias using two datasets: filedrawer.csv and published.csv. You can download these from the data directory of my website: https://ditraglia.com/data/. Both datasets contain information about experiments funded through the “Time-Sharing Experiments in the Social Sciences” (TESS) program. Whereas published.csv contains information about 53 published papers that resulted from TESS-funded experiments, filedrawer.csv contains information about 221 TESS-funded projects that may or may not have yielded a published paper.
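If you prefer not to download the files by hand, a sketch like the following may work; note that the exact file paths below are an assumption based on the directory given above, so adjust them (or download the files manually) if they fail.

```r
# Read both files directly from the web (paths assumed; see note above)
library(tidyverse)

filedrawer <- read_csv('https://ditraglia.com/data/filedrawer.csv')
published  <- read_csv('https://ditraglia.com/data/published.csv')
```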

The columns contained in filedrawer.csv are as follows:

| Name    | Description                                                                                 |
|---------|---------------------------------------------------------------------------------------------|
| DV      | Publication status (“Unwritten” / “Unpublished” / “Published, non top” / “Published, top”)   |
| IV      | Statistical significance of main findings (“Null” / “Weak” / “Strong”)                       |
| max.h   | Highest H-index among authors                                                                 |
| journal | Discipline of journal for published articles                                                  |

Two notes on the preceding. First, “top” and “non top” refer to journal rankings, with top indicating a highly-ranked journal and non top indicating a lower-ranked journal. Second, the H-index is a measure of scholarly productivity and influence computed from citation counts: it is the largest number \(h\) such that you have published at least \(h\) papers, each of which has received at least \(h\) citations.
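For concreteness, here is one way you might compute an H-index in R; the vector of citation counts below is made up purely for illustration.

```r
# The H-index is the largest h such that at least h papers have at least h citations
h_index <- function(citations) {
  sorted <- sort(citations, decreasing = TRUE)
  # sorted[i] >= i holds for i = 1, ..., h and fails thereafter
  sum(sorted >= seq_along(sorted))
}

h_index(c(10, 8, 5, 4, 3, 0))  # 4: four papers each have at least 4 citations
```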

The columns contained in published.csv are as follows:

| Name   | Description                                                    |
|--------|-----------------------------------------------------------------|
| id.p   | Paper identifier                                                |
| cond.s | # of experimental conditions in the study                       |
| cond.p | # of experimental conditions presented in the published paper   |
| out.s  | # of outcome variables in the study                             |
| out.p  | # of outcome variables used in the published paper              |
  1. Patterns in filedrawer.csv
    1. Load filedrawer.csv and store it in a tibble called filedrawer.
    2. For each value of IV, count the number of papers with each publication status. Suggestion: try datasummary_crosstab() from modelsummary. (A rough code sketch covering parts 1–4 appears after this list.)
    3. Suppose we wanted to test the null hypothesis that there’s no difference in publication rates (in any journal) across projects with “Null” and “Weak” results compared to “Strong” results. Carry out appropriate data manipulation steps, and then run a regression that would allow us to accomplish this.
    4. Suppose we were worried that “researcher quality,” as measured by H-index, causes both publications and strong results: e.g. better researchers are more likely to think up experimental interventions with large treatment effects and write better papers. This might confound our comparisons from above. Carry out additional analyses that could help us address this concern.
    5. Interpret your results from above.
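The sketch below shows one possible approach to parts 1–4, assuming the tidyverse and modelsummary packages are installed and that filedrawer.csv is in your working directory (or has been read from the web as sketched earlier). The variable names created here (published, strong) are illustrative choices rather than the only reasonable ones, and you should check that the category labels match those in your copy of the data.

```r
library(tidyverse)
library(modelsummary)

# Part 1: load the data
filedrawer <- read_csv('filedrawer.csv')

# Part 2: cross-tabulate significance of results (IV) against publication status (DV)
datasummary_crosstab(IV ~ DV, data = filedrawer)

# Part 3: an indicator for "published anywhere" and one for "strong results",
# then a linear probability model of the former on the latter
filedrawer <- filedrawer |>
  mutate(published = as.numeric(DV %in% c('Published, non top', 'Published, top')),
         strong = (IV == 'Strong'))

lm(published ~ strong, data = filedrawer) |> summary()

# Part 4: include max.h as a control to address the "researcher quality" concern
lm(published ~ strong + max.h, data = filedrawer) |> summary()
```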
  2. Patterns in published.csv:
    1. Load published.csv and store it in a tibble called published.
    2. Make a scatterplot with cond.s on the x-axis and cond.p on the y-axis. Use jittering to avoid the problem of overplotting. (A code sketch for this part and the next appears after this list.)
    3. Repeat the preceding with out.s and out.p in place of cond.s and cond.p.
    4. Interpret your results.
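A minimal sketch for the two plots above, assuming ggplot2 (loaded via the tidyverse) and that published.csv is available. The dashed 45-degree line is an optional addition, not something the exercise asks for, but it makes points below the diagonal easier to spot.

```r
library(tidyverse)

published <- read_csv('published.csv')

# Part 2: conditions in the study (x) versus conditions in the paper (y), jittered
ggplot(published, aes(x = cond.s, y = cond.p)) +
  geom_jitter(width = 0.2, height = 0.2) +
  geom_abline(intercept = 0, slope = 1, linetype = 'dashed')

# Part 3: the same idea for outcome variables
ggplot(published, aes(x = out.s, y = out.p)) +
  geom_jitter(width = 0.2, height = 0.2) +
  geom_abline(intercept = 0, slope = 1, linetype = 'dashed')
```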
  3. If a study has \(p\) experimental conditions and measures a total of \(k\) outcomes for each, then we might imagine that it generates \(kp\) distinct null hypotheses that one could test (no effect of any particular condition on any particular outcome). Suppose that every null hypothesis in every study in published is TRUE. Based on the values of cond.s and out.s, answer the following. (A sketch of one way to carry out these calculations appears after this list.)
    1. What is the average (per paper) probability that we will reject at least one null hypothesis at the 5% level?
    2. Repeat the preceding for at least two null hypotheses.
    3. Repeat the preceding for at least three null hypotheses.
    4. Discuss your findings in light of those from the other parts of this question.
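One way to carry out these calculations is sketched below. It assumes that the \(kp\) tests within a paper are independent, so that when every null is true the number of false rejections in a paper with \(n\) = cond.s × out.s hypotheses is Binomial(\(n\), 0.05); if the tests are correlated the numbers will differ. The column names created below are illustrative.

```r
library(tidyverse)

published <- read_csv('published.csv')

# Number of hypotheses per paper, then P(at least 1, 2, or 3 false rejections)
# under independence: rejections ~ Binomial(n_tests, 0.05)
rejection_probs <- published |>
  mutate(n_tests = cond.s * out.s,
         at_least_one   = 1 - pbinom(0, n_tests, 0.05),
         at_least_two   = 1 - pbinom(1, n_tests, 0.05),
         at_least_three = 1 - pbinom(2, n_tests, 0.05))

# Average each probability across papers
rejection_probs |>
  summarise(across(starts_with('at_least'), mean))
```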