This page is an attempt to encourage a dialogue about reproducibility in econometric research. Like the weather, reproducibility of research is a topic of frequent discussion, but little action. Experience suggests that, mandates by journals and the NSF notwithstanding, research in both econometric theory and applications remains difficult to reproduce. This is particularly true for students or new Ph.D.s in the field who, quite properly, are reluctant to approach senior colleagues about "filling in the gory details" of what they may have done a decade ago.
An entertaining case study in reproducibility is the recent paper by Brian Kernighan and others about recreating an early Bell Labs memo on computer typesetting: archaeology. I particularly liked the comment: "But computer archaeology has its problems. To paraphrase George Santayana, 'Those who do not archive the past are condemned to recreate it.'"
A related issue is reproducibility in mathematics. This might seem to be more straightforward: one just needs to read the proofs and decide whether they hold together. However, there are often gaps that memory fails to fill, and references that are overlooked. I've written a two-page tutorial, intended for graduate students in econometrics, on how to write mathematics. It is somewhat idiosyncratic in that it advocates a more modular approach that mimics structured programming for software. The note is available here.
Central archives for data and programs have not met with widespread acceptance in econometrics. There is no archive in econometrics that plays the important role that StatLib used to play in statistics. But no central archive can ever serve the full function required: providing complete details of published work in a transparent form, easily accessible by a worldwide audience. If this is to be realized, it seems it must happen in a decentralized manner. Individual researchers must be convinced that it is in their own interest to provide details as part of the effort to encourage the dissemination of their ideas.
There are many impediments, not the least of which is the "Tower of Babel" of econometric software. But this should not prevent us from making a start. In this spirit we suggest adopting the following general principles taken from recent work by David Donoho:
All the code underlying figures and tables is made available,
together with the underlying software environment necessary to execute that code,
together with documentation of both the tools and the environment,
using standard internet methods (ftp, www) for anonymous access.
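As one way to make the first three principles more operational, here is a minimal sketch in Python of a manifest that records the code files behind a set of results together with the software environment that ran them. It is an illustration only, not a method proposed by Donoho or used on this page: the function names (sha256_of, build_manifest, write_manifest) and the file names in the usage note are hypothetical.

```python
import hashlib
import json
import platform
import sys
from importlib import metadata
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file so readers can check they hold the same code."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def build_manifest(code_files, packages=()):
    """Describe the code and the environment behind a set of figures and tables."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the absence explicitly rather than guessing a version.
            versions[name] = None
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
        "code": {str(p): sha256_of(p) for p in code_files},
    }


def write_manifest(path, code_files, packages=()):
    """Write the manifest as JSON next to the results and return it."""
    manifest = build_manifest(code_files, packages)
    Path(path).write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest
```

A call such as write_manifest("MANIFEST.json", ["make_tables.py"], packages=["numpy"]) would then produce a small file that can be posted alongside the code itself. The sketch covers only the recording step; making the files available for anonymous access is a separate matter.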
A new review essay on the subject, written with Achim Zeileis, is available here. Some notes along similar lines are available in Protocol for Simulations in R. These notes include some suggestions for using R on our clusters. A file used as an example in those notes can be downloaded from this page as plink.R.
I would very much like to have comments on all of this. I would particularly like to encourage others to suggest further www links that provide examples of this sort. We would also welcome comments on further elucidation of the principles proposed above and on ways to make them more operational.
Last revised in August 2015 by Roger Koenker