\documentclass[10pt,a4paper,twoside]{article}

%% additional packages
\usepackage[latin1]{inputenc}
\usepackage{a4wide,graphicx,color,thumbpdf}
\usepackage{hyperref}
\usepackage{amsmath}

%% BibTeX settings
\usepackage[authoryear,round]{natbib}
\bibliographystyle{jae}
\bibpunct{(}{)}{,}{a}{,}{,}
\newcommand{\doi}[1]{\href{http://dx.doi.org/#1}{\normalfont\texttt{doi:#1}}}

%% markup commands for code/software
\let\code=\texttt
\let\pkg=\textbf
\let\proglang=\textsf
\newcommand{\file}[1]{`\code{#1}'}
\newcommand{\email}[1]{\href{mailto:#1}{\normalfont\texttt{#1}}}

%% paragraph formatting
\renewcommand{\baselinestretch}{1}

%% \usepackage{Sweave} is essentially
\RequirePackage[T1]{fontenc}
\RequirePackage{ae,fancyvrb}
\DefineVerbatimEnvironment{Sinput}{Verbatim}{fontshape=sl}
\DefineVerbatimEnvironment{Soutput}{Verbatim}{}
\DefineVerbatimEnvironment{Scode}{Verbatim}{fontshape=sl}
\newenvironment{Schunk}{}{}
\SweaveOpts{engine=R, eps=FALSE, keep.source = TRUE}

%% mimic JAE style
\renewcommand{\section}{\secdef \mysec \mysecnn}
\newcommand{\mysec}[2][default]{\vspace{1.7\baselineskip}%
  \pdfbookmark[1]{#1}{Section.#1}%
  \refstepcounter{section}%
  \centerline{\large \thesection. \uppercase{#1}} \vspace{.5\baselineskip}}
\newcommand{\mysecnn}[1]{\vspace{1.7\baselineskip}%
  \centerline{\large #1} \vspace{.5\baselineskip}}
\renewcommand{\abstractname}{\normalfont SUMMARY}
\renewcommand{\refname}{REFERENCES}
\renewcommand{\thetable}{\Roman{table}}

%% page header (and currently no footer)
\usepackage{fancyhdr}
\setlength{\headheight}{15pt}
\renewcommand{\headrulewidth}{0pt}
\pagestyle{fancy}
\thispagestyle{plain}
\fancyhf{}
\fancyhead[LE,RO]{\thepage}
\fancyhead[CE]{{\normalsize \uppercase{R.~Koenker and A.~Zeileis}}}
\fancyhead[CO]{{\normalsize \uppercase{On Reproducible Econometric Research}}} 
\fancyfoot[LO,LE]{\small Copyright {\copyright} 2009 John Wiley \& Sons, Ltd.}
\fancypagestyle{plain}{
\fancyhf{}
\fancyfoot[LO,LE]{\small This is a preprint of an article published in %
  \textit{Journal of Applied Econometrics}, \textbf{24}(5), 833--847. \\%
  Copyright {\copyright} 2009 John Wiley \& Sons, Ltd. \doi{10.1002/jae.1083} }
}

%% title information
\title{\bf On Reproducible Econometric Research}
\author{\hfill Roger Koenker$^a$ \hfill Achim Zeileis$^b$\thanks{Correspondence
to: Achim Zeileis, Department of Statistics and Mathematics, WU Wirtschaftsuniversit\"at Wien,
Augasse 2--6, 1090 Wien, Austria. Tel: +43/1/31336-5053. Fax: +43/1/31336-774.
E-mail: \email{Achim.Zeileis@R-project.org}} \hfill \hfill \\
{\small \it $^a$ Department of Economics, University of Illinois at Urbana-Champaign, United States of America} \\
{\small \it $^b$ Department of Statistics and Mathematics, WU Wirtschaftsuniversit\"at Wien, Austria}}
\date{}

% hyperref setup
\definecolor{Red}{rgb}{0.5,0,0}
\definecolor{Blue}{rgb}{0,0,0.5}
\hypersetup{%
  pdftitle = {On Reproducible Econometric Research},
  pdfsubject = {},
  pdfkeywords = {reproducibility, replication, software, literate programming},
  pdfauthor = {Roger Koenker, Achim Zeileis},
  %% change colorlinks to false for pretty printing
  colorlinks = {true},
  linkcolor = {Blue},
  citecolor = {Blue},
  urlcolor = {Red},
  hyperindex = {true},
  linktocpage = {true},
}

\begin{document}

\maketitle

\begin{abstract}
Recent software developments are reviewed from the vantage point of reproducible
econometric research.  We argue that the emergence of new tools, particularly in
the open-source community, have greatly eased the burden of documenting and archiving
both empirical and simulation work in econometrics.  Some of these tools are highlighted
in the discussion of two small replication exercises.
\end{abstract}

\noindent {\bf Keywords:} reproducibility, replication, software,
  literate programming.

<<preliminaries, echo=FALSE, results=hide>>=
library("lattice")
library("MASS")
library("lmtest")
library("sandwich")
options(prompt = "R> ", continue = "+  ", digits = 4, show.signif.stars = FALSE)
cleanup <- FALSE
@


\section{Introduction} \label{sec:intro}

The renowned dispute between Gauss and Legendre over priority for the
invention of the method of least squares might have been resolved by
\cite{repro:Stigler:1981}.  A calculation of the earth's ellipticity
reported by Gauss in 1799 alluded to the use of {\it meine Methode};
had Stigler been able to show that Gauss's estimate  was consistent
with the least squares solution using the four observations available
to Gauss, his claim that he had been using the method since 1795 would
have been strongly vindicated.  Unfortunately, no such computation
could be reproduced leaving the dispute in that limbo all too familiar
in the computational sciences.  

The question that we would like to address here is this:  200
years after Gauss, can we do better?  What can be done to
improve our ability to reproduce computational results in econometrics
and related fields?  Our main contention is that recent software developments, 
notably in the open-source community, make it much easier to achieve 
and distribute reproducible research.  

What do we mean by reproducible
research?  \cite{repro:Buckheit+Donoho:1995} have defined what
\cite{repro:deLeeuw:2001} has called \emph{Claerbout's Principle}:
``An article about computational science in a scientific publication
is \emph{not} the scholarship itself, it is merely \emph{advertising}
of the scholarship. The actual scholarship is the complete software
development environment and the complete set of instructions which
generated the figures.''  We view this as a desirable objective for
econometric research.  See \cite{repro:Schwab+Karrenbach+Claerbout:2000} 
for further elaboration of this viewpoint. 

The transition of econometrics from a handicraft industry
\citep{repro:Wilson:1973,repro:Goldberger:2004}
to the modern sweatshop of globally interconnected computers
has been a boon to productivity and innovation, but sometimes
seems to be a curse.  Who among us expected to be in the ``software
development'' business?  And yet many of us find ourselves  precisely
in this position, and those who are not, probably should be.  As
we will argue below, software development is no longer something  that
should be left to specialized commercial developers, but instead should
be an integral part of the artisanal econometric research process.  Effective
communication of research depends crucially on documentation and
distribution of related software and data.

Some journals, such as the \emph{Journal of Applied Econometrics}
(JAE, \url{http://JAE.Wiley.com/}), support authors in this task
by providing data archives \citep{repro:MacKinnon:2007}. However,
a more integrated approach encompassing data, software, empirical
analysis, and documentation is often desirable to facilitate
replication of results. 

\section{Software Tools} \label{sec:software}

In this section, we review some recent software developments that facilitate a more 
reproducible approach to econometric research. The tools discussed encompass
the most common components of econometric practice: from data handling, over
data analysis in some programming environment, to preparing a document describing
the results of the analysis. Additionally, we provide information on literate
programming tools and techniques that enable an integrated approach to these
three steps, and on software for version control of all components.

\subsection{Version Control} \label{sec:version}

Econometric research on a given project is often carried out over an extended
period, by several authors, on several computers, so it is hardly surprising
that we often have difficulty reconstructing exactly what was done.  Files
proliferate with inconsistent naming conventions, get overwritten or deleted
or are ultimately archived in a jumble with a sigh of relief when papers are finally
accepted for publication.  Sometimes, as is the case with JAE papers, a part of the 
archive is submitted to a central repository, but more often the archive resides
peacefully on one or more authors' computers until the next disk crash or
system upgrade.

Such difficulties have long been recognized in software development
efforts, but it is only relatively recently that practical version control systems
have emerged that are well adapted to econometric research.  Many
systems have grown out of the open-source software community where rigorous
archive cataloging of development efforts involving many participants is
absolutely essential.  We briefly describe one such system, \pkg{Subversion} 
\citep[SVN, \url{http://Subversion.Tigris.org/}, see][]{repro:Pilato+Collins-Sussman+Fitzpatrick:2004}
that we have used for this project.
Cross-platform compatibility is an important consideration in many projects;
SVN's basic command-line interface will feel comfortable to most 
Linux/Unix users, but the graphical clients \pkg{TortoiseSVN}
(\url{http://TortoiseSVN.Tigris.org/}), embedded 
into the Windows \pkg{Explorer}, or  \pkg{svnX} for 
Mac (\url{http://www.Apple.com/downloads/macosx/development_tools/svnx.html})
will prove convenient for others.

The first stage of any SVN project involves the creation of a central repository
for the project.  Once  the repository is created, any of the authorized collaborators
can ``checkout'' the current version of the project files, update the local copies
of the files, and make changes, additions or deletions to the files.  At any point,
the modified  file structure  can be ``committed'' thereby creating a new version
of the project.  When changes are committed they are recorded in the repository
as incremental modifications so that all prior versions are still fully accessible
(even if a file was deleted from the current revision).
In rare cases that more than one author has modified the same passage of the same
file, the system prompts authors to reconcile the changes before accepting the
commitment.  A complete historical record of the project is available at any point in 
the development process for inspection, search and possible restoration.
Since modifications are stored as increments, file ``diffs'' in Unix jargon,
storage requirements for the repository are generally far less severe than for
prior improvised archiving strategies.  Version numbering is fully automatic,
so one consistent set of naming conventions for project files can be adhered
to, thereby avoiding the typical m\'elange of ad hoc files resulting from impromptu
version control strategies.

If an SVN server is available, setting up an SVN project/repository is very easy.
Only if no SVN server is available, the setup requires some technical work.
It is not dissimilar to setting up a Web server but can be a lot easier
\citep[see][Chapter~6]{repro:Pilato+Collins-Sussman+Fitzpatrick:2004}. Also,
the individual researcher can often avoid setting up a server alone, especially if the university/department
already provides such a server, typically along with other Web services.
Some Web pages also host free SVN repositories, esepecially for software projects, e.g.,
\url{http://SourceForge.net/} or \url{http://Code.Google.com/},
and \url{http://R-Forge.R-project.org/} among others.


\subsection{Data Technologies and Data Archiving} \label{sec:data}

%% 1. flat data
In the beginning data was flat, representable as matrices, and easily storable in
text files.  Easily stored need not imply easily retrieved, as anyone with 
with a drawer full of floppy disks or a collection of magnetic tapes 
is undoubtedly aware. As data has become more plentiful it has also become
more structurally complex.
%% 2. DBMS
Relational database management systems (DBMSs) are still
quite exotic in econometrics, but seem likely to become more commonplace.
Stable, well-documented databases with proper version control are critical to
the reproducibility goal. 
%% 3. Web-based formats
In many cases, data archives for individual projects can be most efficiently
represented by software describing selection rules from publicly available
data sources, like the Panel Study on Income Dynamics (PSID) and the US
Census Bureau. Such data sources often provide their contents via the World
Wide Web (WWW), either using one of the data technologies above or new standards
such as \proglang{XML} or \proglang{PHP} that emerged with the WWW.
However, as we stress in Section~\ref{sec:growth} regarding the Penn World Tables,
reproducibility of such data retrieval strategies relies heavily on future access to
current versions of the database.

%% 1. flat data
If the data to be stored is not too complex and/or large, a flat plain text file
is often ideal for reproducibility (and hence used for most publications in the
JAE data archive): it is portable across platforms, many software
packages can read and write text files and they can be efficiently archived, e.g.,
in an SVN repository.
For larger data sets, it may be useful or even necessary to store the data
in a DBMS. Many candidates are available here, but most use
some form of the \proglang{SQL} (structured query language).
Two popular open-source candidates, available across many platforms, are
\pkg{PostgreSQL} (\url{http://www.PostgreSQL.org/}) and \pkg{MySQL} (\url{http://MySQL.com/}).

%% 3. Web-based formats
In addition to the data technologies above, other new standards have emerged,
especially for sharing data across the WWW. These include 
\proglang{XML} \citep[extensible markup language,][]
{repro:Bray+Paoli+SperbergMcQueen:2008}: a text-based format,
well-suited for storing both data and meta-information, which is therefore used
as the basis for many other data and text formats. A key feature of \proglang{XML}
is that it is an open standard, maintained by an international consortium,
in contrast to many proprietary formats which are often altered periodically.
Data formats based on open standards have a higher likelihood of remaining accessible in the future. 
Further discussion of data technologies for statistical/scientific applications may
be found in \cite{repro:Murrell:2009}. 


\subsection{Programming Environments} \label{sec:programming}


Econometrics has always been a Tower of Babel with many languages, or programming
environments,  
competing for attention in the never-ending struggle to communicate effectively
with machines. Over the course of our disparate careers we have had
opportunities to explore many of these, including, in approximate chronological order:  
\proglang{TSP}, \proglang{MIDAS}, \proglang{SAS}, \proglang{Statlib},
\proglang{GLIM}, \proglang{S}, \proglang{LIMDEP}, \proglang{SHAZAM}, \proglang{GAUSS},
\proglang{S-PLUS}, \proglang{Lisp-Stat}, \proglang{SPSS}, \proglang{Minitab},
\proglang{Stata}, \proglang{Ox}, \proglang{MATLAB}, \proglang{Mathematica}, and 
\proglang{R}. Clearly, these vary considerably in their level of ``programmability'' and
degree of specialization. Lower-level languages such as \proglang{C} or \proglang{Fortran}
also play an important role. A historical survey of the early development of econometric software
may be found in \cite{repro:Renfro:2004}, see also \cite{repro:Ooms+Doornik:2006}. 
There is also an extensive literature on the individual merits and comparative 
performance of various forms of statistical software;  see \cite{repro:Baiocchi:2007} 
for further references, and a more proscriptive review of reproducibility issues.
Here, we will try to restrict our attention quite narrowly to properties of
programming environments that might facilitate reproducible research.

\paragraph{Language Structure}
%% 1. readability
Software reviews of programming environments often stress speed
of execution, rather than accuracy, breadth of coverage, or other less tangible
attributes. This is unfortunate since the exertions of the machine are rarely a
matter concern, or sympathy. Of more direct relevance in our view is ease of
writing software (for avoiding errors) and reading software
(for replication of results).  For those who believe that computer
programming should be, like prayer, a private language between the individual
supplicant and the \emph{deus ex machina}, we recommend 
\cite{repro:Kernighan+Plauger:1974} who emphasize the need to write programs 
that other \emph{people} are capable of reading.  The programming language
should support the user in turning theory into software that reflects how
we think about the underlying method conceptually. Several language features are 
helpful in obtaining this goal: functional languages, object orientation, and
modular programs with re-usable components.
%% functioncal languages
Environments for statistical
computing are gradually moving away from the ``canned soup model'' toward 
functional languages  that permit fresh ingredients to be assembled in a more flexible
manner. Formulae specification for linear and nonlinear models, and unified  
interfaces for general linear models constitute two developments that have
brought software and theoretical specification of models closer together.  
%% object orientation
Combining such an approach with object orientation allows one to encapsulate
complex structures (such as fitted regression models) and define typical
tasks for them such as inference or visualization. Programs or analyses
written in such a way are more intelligible and hence easier to reproduce.
%% re-usability
Furthermore, it is highly desirable to have a single environment within which
to conduct empirical analyses and simulation studies. Re-using the same functionality 
across different tasks assures better consistency and avoids duplication of programming effort.
But environments designed for the convenience of empirical analysis may not provide
good simulation environments, and vice-versa.  To assure reproducibility
and re-usability by other authors, the structural features of a language should
facilitate (and not suppress) the ability to build on innovations of prior authors.  
%% further comments
Exploiting common structure in problems
often leads to general software solutions that facilitate critical comparison
of various methods as illustrated for structural change testing
in \cite{repro:Zeileis:2006a}. 
The modern trend toward functional
languages and object-orientation is visible in several of the currently 
successful econometrics environments.
\proglang{MATLAB} (\url{http://www.MathWorks.com/}),
\proglang{Ox} (\url{http://www.doornik.com/ox/}) and
\proglang{R} (\url{http://www.R-project.org/})
are notable in this respect.  

\paragraph{Open Sources}
Languages are acquired by mimicry so it is extremely valuable to have access to a
large body of text written by those proficient in the language.  This is one of
the prime advantages of open-source software projects. 
When we first embark on a new econometric project, we have at our fingertips an 
extensive body of existing code, some of which will be directly useful as building 
blocks---provided that the language is structured to facilitate such modularity---and
some code will offer only stylistic hints.  In either case, adapting prior code
when available from reliable sources is almost always preferable to reinventing
the wheel---and, as argued above, increases the readability and intelligibility 
of the resulting programs. Most languages now offer some sort of forum for sharing code,
and thus provide access to an existing body of tested code. An
exemplary approach is the Comprehensive \proglang{R} Archive Network
(CRAN, \url{http://CRAN.R-project.org/}) that hosts about 1,700 software
packages (containing code, data, and documentation) which are checked on 
a daily basis on several platforms.
In addition to such open platforms for user-contributed code,
there are also several journals publishing peer-reviewed software, e.g.,
\emph{The Stata Journal} (\url{http://www.Stata-Journal.com/}) and the
\emph{Journal of Statistical Software} (\url{http://www.JStatSoft.org/}).
Furthermore, even commercial software producers usually provide some functionality written in
the language itself and therefore open to inspection and emulation for the individual user.  
\proglang{MATLAB} and \proglang{Stata} (\url{http://www.Stata.com/})
are notable in this respect. However, even in these favorable circumstances 
one may eventually wish to dig below the surface only to discover that crucial
elements of the story are accessible only in a binary form readable only
by machines.  At this point the computations become a black box visible only
to the select few, and scientific trust becomes a leap of faith.
Source code is itself the ultimate form of documentation for computational
science. When it is well written it can serve as an inspiration for subsequent
work. When it is flawed, but accessible, then it can be much more easily 
subjected to necessary critical scrutiny.
With computational methods, gradual refinement is seriously impeded
unless source code is open.

\paragraph{Language Interfaces}
To combine the convenience of high-level programming environments with 
the efficiency of compiled languages, it is desirable to either (byte-)compile
program code directly or to be able to link code written in lower-level
languages like \proglang{Fortran} or \proglang{C} and their dialects.
Most modern languages employed in econometrics---including \proglang{MATLAB},
\proglang{Ox}, \proglang{R}, and \proglang{Stata}---offer some means
for ordinary users to accomplish this sort of sorcery. Doing so in ways that
are platform independent is a considerable challenge. From a reproducibility
perspective, it should be assured that even code interfacing lower level languages
behaves consistently across platforms with differing hardware and operating systems.
This problem is particularly acute in simulation settings where it is often
desirable to be able to reproduce sequences pseudo-random numbers across machines
and operating systems. 
 
\paragraph{Environmental Stability}
Since hardware and software environments for econometric research are constantly
evolving, a major challenge for reproducibility involves proper documentation of 
the environment used to conduct the computational component of the project.  Ideally,
in our view, authors and journals should be expected to provide a section similar to
Section~\ref{sec:computational} (``Computational Details'') of the present paper
describing in some detail the software and hardware environment employed. 
Even when these environments are completely specified, it is likely to be difficult
several years later to restore
a prior version of the environment.  In this respect, again, the open-source
community is exemplary, since prior versions are typically archived and easily
downloadable. For commercial software prior versions are sometimes difficult to obtain due to
licensing and distribution constraints. 
\cite{repro:Zeileis+Kleiber:2005} describe 
an exercise in ``forensic econometrics'' exploring the evolution of the evaluation
of the Gaussian distribution function in successive versions of
\proglang{GAUSS}. 
Such investigations are obviously handicapped by the unavailability of older versions
and lack of adequate documentation of algorithms and their evolution.

\paragraph{User Interfaces}
In the paragraphs above, we have focused on reproducibility using source code
as typically used in programs with a command line interface (CLI). In practice,
however, many analyses are carried out using a point-and-click approach
based on a graphical user interface (GUI). Although easier to learn and often
easier to apply, this procedure poses a challenge to reproducibility because
it is much harder to reconstruct the ``click history'' for a certain analysis.
A useful compromise are log files offered by several GUI-based systems that
contain the source code associated with the point-and-click analysis. Thus,
an author who wants to assure reproducibility, probably cannot avoid the contact
with source code alltogether, but the transition to reading/writing source
code is often made easier in this way.

\subsection{Document Preparation Systems} \label{sec:document}

{\TeX} and {\LaTeX} have become the \emph{lingua franca} for the composition 
of mathematical text in general and econometric text in particular.
{\TeX} (\url{http://www.TUG.org/}) developed by \cite{repro:Knuth:1984}
beginning in the late 1970s constitutes an exceptional case study in 
software development and the effectiveness of the gradual refinement process 
of open-source projects.  
{\LaTeX} (\url{http://www.LaTeX-project.org/}), originally written by  \cite{repro:Lamport:1994},
still constitutes an ongoing development effort to build a higher level markup
language on the foundation provided by {\TeX}.  Nevertheless, {\LaTeX} is also
a remarkably stable environment and serves as a convenient and portable format
across many platforms for a wide variety of documents. 

Although these systems are clearly state of the art in econometric document
preparation, casual users of document preparation systems often prefer to avoid
the somewhat steeper learning curve of {\LaTeX} in favor of \emph{WYSIWYG}
(what you see is what you get) text processors such as Microsoft's \pkg{Word}
(\url{http://www.Office2007.com/}) or \pkg{OpenOffice.org} (\url{http://www.OpenOffice.org/}).
To avoid a general
discussion about the relative merits of {\LaTeX} over WYSIWYG processors, we focus
on the perspective of reproducibility: Especially \pkg{Word}'s proprietary binary document
format is problematic in this respect as its documents are less stable across platforms or
versions of \pkg{Word}. With the advent of the open-source suite \pkg{OpenOffice.org}
the situation improved considerably as it not only introduced an \proglang{XML}-based
open document file format (ODF) for WYSIWYG documents but also supports export to 
a number of older proprietary file formats.

\subsection{Literate Programming} \label{sec:literate}

The idea of merging text, documentation and various forms of computer code
arose from the structured programming discussions of the 1970s and was
championed by \cite{repro:Knuth:1992} following his initial development of
{\TeX}.  Literate programming, as this movement has come to be called, 
encourages a more readable programming style by making the code itself an
integral part of the documentation. The basic operations on documents 
containing both code chunks and documentation chunks are known as \emph{tangling} and
\emph{weaving}: the former strips off the documentation chunks and extracts only
the code chunks while the latter weaves the code chunks into the documentation,
typically by adding appropriate markup for display of the code. Following
\cite{repro:Knuth:1992}, the initial literate programming systems were
\pkg{WEB} and \pkg{CWEB} for combining \proglang{Pascal} and \proglang{C}
code, respectively, with {\TeX} documentation. In order to provide a leaner
and more flexible literate programming system, \cite{repro:Ramsey:1994}
developed \pkg{noweb}, a set of open-source tools for combining code in
arbitrary languages and {\LaTeX} documentation. 

Literate programming is an important first step in supporting reproducibility in
econometric practice but one would like to carry the idea a step further and include not only
the code but also its results dynamically in a document \citep{repro:Leisch+Rossini:2003}. 
This idea of \emph{literate econometric practice} seems especially well-suited to statistical
computing applications where models, data, algorithms and interpretation all coalesce
to produce scientific documents. Directly linking text with computational procedures
reduces the likelihood of inconsistencies and facilitates reproducing the same type of
analysis with an extentended/modified data set or different parameter
settings.  Proximity is, of course, no guarantee that text and code will agree; we have 
all read and probably written commented code for which the comments contradicted the 
unintended consequences of the code. 
\proglang{Mathematica}'s concept of ``notebooks'' is an approach to documents of this type
in which the displayed document can be easily altered by changing the associated \proglang{Mathematica}
code. Another implementation, more closely modelled after the literate programming ideas
discussed above, is the \code{Sweave} system of \cite{repro:Leisch:2002} for \proglang{R}.
Re-using the markup commands of \pkg{noweb}, it allows authors to create ``revivable'' documents
containing a tightly coupled bundle of code and documentation. To illustrate this,
and to try to practice what we preach, the archived version of this paper includes
a source file \file{JAE-RER.Rnw}
that includes all of the text as well as all of the code used to generate
the statistical analyses presented in the next section. The input file is structured
into code chunks written in \proglang{R} and text chunks written in {\LaTeX} and it
can be processed in \proglang{R} with the command \verb|Sweave("JAE-RER.Rnw")|.
This extracts the code chunks and executes them in \proglang{R}, produces the associated
output (either in numerical or graphical form) and weaves them into a 
{\LaTeX} file. This resulting {\LaTeX} file can
then be processed to PDF by pdf{\LaTeX}. The choice of pdf{\LaTeX} is specific to our document.
Many other flavours of \code{Sweave} are available, including {\LaTeX},
HTML \citep[\pkg{R2HTML},][]{repro:Lecoutre:2003}, and ODF \citep[\pkg{odfWeave},][]{repro:Kuhn:2006}.
There are also extensions to mixing \proglang{SAS} code with {\LaTeX} documentation
\citep{repro:Lenth+Hojsgaard:2007}.


\section{Replication Case Studies} \label{sec:replication}

In this section, we describe briefly two replication exercises chosen
to illustrate several aspects of the reproducibility problems discussed above.
The software tools used are \pkg{Subversion} for version control, flat text files
for data storage, \proglang{R} \citep{repro:R:2007} for programming and empirical analysis, and {\LaTeX}
for document preparation. As mentioned above, a literate data analysis approach is
adopted based on the \code{Sweave} tools. The SVN archive, containing the
full sources of the document, can be checked out anonymously from
\url{svn://svn-statmath.wu-wien.ac.at/repro/}. For convenience of non-SVN users,
there is also an online archive available at \url{http://www.econ.uiuc.edu/~roger/research/repro/}.
An extended version \citep{repro:Koenker+Zeileis:2007} with many more details is also available
from the same locations.

\subsection{An Empirical Example: Cross-Country Growth Regression} \label{sec:growth}

<<packages, echo=FALSE, results=hide>>=
library("lmtest")
library("sandwich")
@

<<dj-data1, echo=FALSE, results=hide>>=
dj <- read.table("data.dj", header = TRUE, na.strings = c("-999.0", "-999.00"))
@

\citet[henceforth DJ]{repro:Durlauf+Johnson:1995} investigate cross-country
growth behavior based on an extended Solow model suggested in
\citet[henceforth MRW]{repro:Mankiw+Romer+Weil:1992}. The variables in the growth regression
model are real GDP per member of working-age population, $Y/L$
(separately for 1960 and 1985); fraction of real GDP devoted to investment, $I/Y$
(annual average 1960--1985); growth rate of working-age population, $n$ (annual average 1960--1985);
fraction of working-age population enrolled in secondary school, $\mathit{SCHOOL}$
(annual average 1960--1985); and the adult literacy rate, $\mathit{LR}$ in 1960.
Data for these variables (except $\mathit{LR}$) for \Sexpr{nrow(dj)} countries is printed
in MRW and provided in electronic form in the JAE data archive along
with DJ (who added $\mathit{LR}$).

The unconstrained extended Solow model suggested by MRW in their Table~V and given by DJ in their
Equation~7 regresses GDP growth $\log(Y/L)_{1985} - \log(Y/L)_{1960}$ on initial GDP
$\log(Y/L)_{1960}$ as well as $\log(I/Y)$, $\log(n + g + \delta)$ (assuming $g + \delta = 0.05$),
and $\log(\mathit{SCHOOL})$. DJ first fit the model by least squares for all
\Sexpr{nrow(subset(dj, NONOIL == 1))} non-oil-producing countries (reproducing the 
results of MRW) and to various subsets of countries---with subset selection based
on initial output $(Y/L)_{1960}$ and/or literacy $\mathit{LR}_{1960}$---finding multiple
regimes rather than a single stable set of coefficients.

Here, we aim to reproduce the unconstrained regression results given by DJ in their Tables~II and V
given the sample splits described in the paper. Although this seems to be a rather modest task, it turned
out to be surprisingly difficult, illustrating some typical pitfalls. We first read the data,
provided in file \file{data.dj} in the JAE data archive, into \proglang{R}, coding missing values
as described in the accompanying documentation and subsequently selecting the non-oil-producing
countries.
%
<<dj-data>>=
<<dj-data1>>
dj <- subset(dj, NONOIL == 1)
@
%
The relevant columns in the resulting data set \code{dj} are \code{GDP85},
\code{GDP60}, \code{IONY}, \code{POPGRO}, \code{SCHOOL}, and \code{LIT60}.
The last four are described as ratios/fractions, but it was not clear from the
documentation that they were given in percent. Of course, this is quickly revealed by
a look at the actual data; and MRW probably used this scaling because it is easier to
read when printed. However, it remains unclear which scaling of the variables
was used for model fitting. After first attempting to use the variables as printed, it
turned out that MRW had scaled them to the unit interval. Thus,
the model employed by MRW and DJ (Table~II, first column)
can be written in \proglang{R} as the formula
%
<<dj-mrwmodel>>=
mrw_model <- I(log(GDP85) - log(GDP60)) ~ log(GDP60) + log(IONY/100) +
  log(POPGRO/100 + 0.05) + log(SCHOOL/100)
@
%
This model can now be fit with \proglang{R}'s linear model
function \code{lm()} to the \code{dj} data set, producing the following regression
summary:
%
<<dj-mrw>>=
dj_mrw <- lm(mrw_model, data = dj)
summary(dj_mrw)
@
%
reproducing coefficient estimates, standard errors, $\bar R^2$, and the
residual standard error $\sigma_\varepsilon$ provided in the first column of Table~II 
in DJ (identical to the first column in Table~V of MRW) with only small deviations 
for some of the entries.  

<<dj-select-false, echo=FALSE, results=hide>>=
dj_select_false <-c(nrow(subset(dj, GDP60 <  1950 & LIT60 <  54)),
  nrow(subset(dj, GDP60 >= 1950 & LIT60 >= 54)))
@

<<dj-select-correct, echo=FALSE, results=hide>>=
dj_select_correct <- c(nrow(subset(dj, GDP60 <  1800 & LIT60 <  50)),
  nrow(subset(dj, GDP60 >= 1800 & LIT60 >= 50)))
@

In the next step we want to fit the two other columns of DJ's Table~II, pertaining to
the same model fitted to two different subsets: DJ employ a low-output/low-literacy
sample of 42 countries with $(Y/L)_{1960} < 1950$ and $\mathit{LR}_{1960} < 54\%$ and
the corresponding high-output/high-literacy sample, also consisting of 42 countries. 
However, when selecting these sub-samples and computing their size we obtain
sample sizes of \Sexpr{dj_select_false[1]} and \Sexpr{dj_select_false[2]}, respectively,
showing that either the subset definitions or the sample sizes are misstated in 
DJ's Table~II.  Some brute-force search (via all-subsets regression) coupled 
with some educated guessing, yielded revised 
cut-offs for model fitting of 1800 and 50\%,
respectively, yielding consistent sample sizes, 
and also consistent model fits as we will see below.

Fitting the same model to these two subsets yields DJ's slope
coefficients, but the wrong intercepts. After further combinatorial search
using different logarithm bases and variable scalings it turned out that DJ 
appear to have employed the model: 
%
<<dj-model>>=
dj_model <- I(log(GDP85) - log(GDP60)) ~ log(GDP60) + log(IONY) +
  log(POPGRO/100 + 0.05) + log(SCHOOL)
@
%
i.e., only scaling \code{POPGRO} to the original unit interval but keeping \code{IONY}
and \code{SCHOOL} in percent. Thus, the intercepts in their Table~II are not comparable
between the first and the other two columns, while the coefficients of interest are
unaffected by the scaling. Using the model formula \code{dj\_model}, we can then go on
and fit the model to both sub-samples:
%
<<dj-sub1, results=hide>>=
dj_sub1 <- lm(dj_model, data = dj, subset = GDP60 <  1800 & LIT60 <  50)
dj_sub2 <- lm(dj_model, data = dj, subset = GDP60 >= 1800 & LIT60 >= 50)
@
%
<<dj-sub1-test, echo=FALSE, results=hide>>=
coeftest(dj_sub1, vcov = sandwich)
@
%
<<dj-sub2-test, echo = FALSE, results=hide>>=
coeftest(dj_sub2, vcov = sandwich)
@
%
and perform the Wald test for all coefficients, via \code{coeftest(dj\_sub1, vcov = sandwich)},
which reproduces the results from the second column in their Table~II (with only
minor deviations). Note that heteroskedasticity-consistent sandwich
standard errors are used
\citep[provided by package \pkg{sandwich} in \proglang{R}, see][]{repro:Zeileis:2004,repro:Zeileis:2006}
whereas conventional standard errors are used for the first MRW column.
Results for \code{dj\_sub2} can be obtained analogously.
All \proglang{R} output is provided in condensed form
in our Table~\ref{tab:dj} (generated using \code{Sweave()}
with the models fitted above). 


<<lm-summary-table, echo=FALSE, results=hide>>=
## convenience function for extracting from "lm" objects:
##   coef, se, R^2, sigma
lmtab <- function(obj, vcov. = vcov, ...) {
  x <- t(coeftest(obj, vcov. = vcov., ...)[,1:2])
  x <- structure(as.vector(x), .Names = as.vector(rbind(colnames(x),
    paste(colnames(x), "SD", sep = "_"))))  
  c(x, unlist(summary(obj)[c("adj.r.squared", "sigma")]))
}

## turn summary table into LaTeX representation
tab2latex <- function(x, nam = NULL, last = 2) {
  n <- nrow(x) - last
  rval <- rbind(
    structure(ifelse(is.na(x[1:n,]), "",
      paste(c("$", "$("), format(round(x[1:n,], digits = 3), trim = TRUE, nsmall = 3),
      c("\\phantom{)}$", ")$"), sep = "")), .Dim = dim(x) - c(last, 0)),
    structure(paste("$", ifelse(is.na(x[-(1:n),]), "",
      format(round(x[-(1:n),], digits = 2), trim = TRUE, nsmall = 2)),
      "\\phantom{0)}$", sep = ""), .Dim = c(last, ncol(x))))
  if(is.null(nam)) nam <- rownames(x)
  rval <- paste(nam, "&", apply(rval, 1, paste, collapse = " & "), "\\\\")
  rval[c(n, nrow(x))] <- paste(rval[c(n, nrow(x))], "\\hline")
  rval
}
@

<<dj-summary-table, echo=FALSE, results=hide>>=
dj_tab <- cbind("mrw" = lmtab(dj_mrw),
  "sub1" = lmtab(dj_sub1, vcov = sandwich),
  "sub2" = lmtab(dj_sub2, vcov = sandwich)
)
@

\begin{table}[t]
\caption{\label{tab:dj}
  Replication of cross-country growth regressions, corresponding to
  models \code{dj\_mrw}, \code{dj\_sub1}, \code{dj\_sub2} with dependent
  variable $\log(Y/L)_{1985} - \log(Y/L)_{1960}$. Conventional standard errors
  are used in the first column and sandwich standard errors in the other two.}
\begin{center}
\begin{tabular}{lrrr} \hline
& & $(Y/L)_{1960} < 1800$ & $(Y/L)_{1960} \ge 1800$ \\
& MRW & $\mathit{LR}_{1960} < 50\%$ & $\mathit{LR}_{1960} \ge 50\%$ \\ \hline
<<dj-table, echo=FALSE, results=tex>>=
writeLines(tab2latex(dj_tab, nam = c(
  c("Constant", "", "$\\log(Y/L)_{1960}$", "", "$\\log(I/L)$", "",
  "$\\log(n + g + \\delta)$", "", "$\\log(\\mathit{SCHOOL})$", "",
  "$\\bar R^2$", "$\\sigma_\\varepsilon$"))))
@
\end{tabular}
\end{center}
\end{table}

Thus, we have reproduced DJ's Table~II but only after some effort: cut-offs
for subset selection are different than those displayed in the table, variable scaling is different
leading to different intercepts, and the standard errors used vary across columns (although
DJ state briefly in a footnote that they would use sandwich standard errors). None of this
changes their results qualitatively: all conclusions drawn from the analysis of DJ are supported. 
Nevertheless, it would be desirable to avoid such uncertainty both from the author's
and reader's point of view. Organizing both code and documentation in a single revivable document
as we suggest in the previous section will assist in this but can, of
course, not prevent human error. It will, however, be much easier for the authors and readers---provided
the code is made publicly available---to find the sources of such confusions and untie the knots.

\subsection{Another Empirical Example: Wage-Equation Meta-Model} \label{sec:wage}

<<rk-preprocessing, echo=FALSE, results=hide>>=
## read in raw information
rk <- readLines("rk.raw")
n <- length(rk)/2

## process information on model dimension
mdim <- strsplit(rk[(1:n) * 2], "\t")
neq <- sapply(mdim, length)
mdim <- sapply(strsplit(unlist(mdim), ","), rbind)

## process information on publication
info <- rep(rk[(1:n) * 2 - 1], neq)
info <- sapply(strsplit(info, "\t"), rbind)

## combine and turn to data.frame
rk <- data.frame(id = rep(1:n, neq), neq = rep(neq, neq),
  nreg = as.numeric(mdim[1,]), nobs = as.numeric(mdim[2,]),
  author = info[1,], journal = info[2,], year = as.numeric(info[3,]),
  page = as.numeric(gsub("s", "", info[4,])),  subject = factor(info[5,],
  levels = c("d", "h", "g", "u", "w", "b", "m", "c")))
levels(rk$subject) <- c("discrimination", "human capital",
  "general", "unionism", "women")[c(1:5, 1, 3, 3)]
rk$collection <- factor(rk$journal %in% c("ir", "jpe", "jhr", "ilrr",
  "aer", "ej", "restat"), levels = c(TRUE, FALSE), labels = c("no", "yes"))

## write as CSV
write.table(rk, file = "rk.csv", sep = ",", quote = FALSE)
@

Our second case study in replication reports our experience trying to reproduce
the empirical results in one of our own papers.  \cite{repro:Koenker:1988} describes
a meta-analysis of published wage equations intended to explore how parametric
dimension of econometric models depends upon sample size.  
The data for the study consists of \Sexpr{nrow(rk)} wage equations from  
\Sexpr{max(rk$id)} published papers
in mainstream journals and essay colllections.  In addition to citation information
and a topic indicator, the sample size and parametric dimension of the model was
recorded for each equation.  

The models used for the analysis are standard count data models.  Observations 
on each wage equation are weighted by the reciprocal of the number of equations, $m$,
appearing in the paper, so the effective unit of analysis is really the paper not 
the equation.  The first, and simplest, version of the model, appears in the paper as
\begin{equation}
  \log \lambda \quad = \quad  \underset{(0.149)}{1.336} ~+~ \underset {(0.017)}{0.235} \log z.
\end{equation}
where $\lambda$ is Poisson rate parameter, the expected number of parameters $y$ in
the equation, and $z$ is the sample size.  The parameter of interest is the
elasticity of parsimony, or \emph{parsity}, the log derivative of $\lambda$ with respect
to $z$.   In this case, parsity is constant and equal to about $1/4$ implying
that parametric dimension increases roughly like the fourth root of the
sample size.  The paper dutifully reports that the computations were conducted
using release~3 of the \proglang{GLIM} system \citep{repro:Baker+Nelder:1978}.
Our first attempt to replicate these results in \proglang{R} invoked the incantation:
%
<<rk-data+equation1>>=
rk <- read.csv("rk.csv")
names(rk)[2:4] <- c("m", "y", "z")
rk1 <- glm(y ~ log(z), weights = 1/m, family = quasipoisson, data = rk)
coeftest(rk1)
@
%
<<dispersion, echo=FALSE>>=
dispersion <- function(object, type = "deviance")
  sum(residuals(object, type = type)^2)/df.residual(object)
@
%
The reader can imagine our relief seeing that the estimated coefficients
agreed perfectly with the published results, turning to dismay as we observed
that the standard errors differed by a factor of 
\Sexpr{round(sqrt(dispersion(rk1, type = "deviance")/dispersion(rk1, type = "pearson")),digits = 3)}.  
Covariance matrices for the parameters in quasi-Poisson models require 
an estimated dispersion parameter:
typically some generalization of a residual sum of squares scaled by the
residual degrees of freedom. In the text of the paper, this was claimed
to be the scaled Pearson statistic
$\hat \sigma_P^2 = (n-p)^{-1} \sum w \cdot (y - \hat y)^2 / \hat y$,
following the advice offered by \citet[pp.~172--173]{repro:McCullagh+Nelder:1983}.
Unfortunately, however, the authors of the \proglang{GLIM} system, in their wisdom,  
chose to use the corresponding \emph{deviance} statistic instead of the 
Pearson statistic, that is
they scaled the variance-covariance matrix of the estimated coefficients by
$\hat \sigma_D^2 = (n-p)^{-1} 2 \sum w \cdot (y \log(y/\hat y) - (y - \hat y))$.
\proglang{R} adopts the Pearson-based estimate as its default and hence the
dispersion of \Sexpr{round(dispersion(rk1, type = "pearson"),digits = 2)}
reported in the summary above does not conform with the estimate of
\Sexpr{round(dispersion(rk1, type = "deviance"), digits = 2)} reported in the 
original paper.  Recalculating the \proglang{R} results using the
deviance-based estimate
%
<<rk1-dispersion, results=hide>>=
<<dispersion>>
summary(rk1, dispersion = dispersion(rk1))
@
%
reproduces the published results; see the first column of Table~\ref{tab:rk} for condensed output.
The lesson of this ``conflict of defaults'' is that it is dangerous  to make assumptions about
what software is doing; more careful reading of the \proglang{GLIM} manual would have
revealed that deviance residuals were being used and this error could have been
avoided.

In an effort to explore the sensitivity of the parsity estimate to the specification of the
functional form of the model, several other models were reported in 
\cite{repro:Koenker:1988}.  Table~\ref{tab:rk} summarizes our attempt to replicate
these results.  For each of the four Poisson models we report, in addition to
coefficients and standard errors, both the Pearson and deviance dispersion estimates
and an estimate of the parsity parameter  $\pi$ evaluated at $z = 1000$ which is
roughly the geometric mean of the observations on sample size.  The quadratic-in-logs
model was very sensitive to a few outlying $z$ observations and two versions of the
estimates were reported, one for the full sample as Equation~2 and one for a reduced
sample excluding points with $z > 250,000$, as Equation~3.  Again, replication 
revealed an error:  the text of the paper  reports that this cut-off was $500,000$,
but a yellowing page in the project file folder and the printed \proglang{GLIM}
output in the same folder clearly reveal that $250,000$ was used.  
This is a clear case in which a more
literate style of programming might have prevented the error by enforcing consistency
between the text and the econometric estimation.  Our use of \code{Sweave}  is one
way to accomplish this.
%
<<rk2-3>>=
rk2 <- glm(y ~ log(z) + I(log(z)^2), weights = 1/m, family = quasipoisson,
  data = rk)
rk3 <- glm(y ~ log(z) + I(log(z)^2), weights = 1/m, family = quasipoisson,
  data = rk, subset = z <= 250000)
@
%
The results reported in Table~\ref{tab:rk}  for Equations~1 through 4 agree with the
published results to the precision reported, except for the coefficient on $\log z$
in Equation~2 which appears to be an old-fashioned typo (because the published parsity
estimate is consistent with our replicated coefficient rather than the published coefficient).
Mistakes like this could again be more easily avoided by better integration of text and data analysis.

Two final models employed $\log \log z$ as the only covariate; 
first in the quasi-Poisson specification (Equation~4) and then using the negative binomial
likelihood. The former can again be reproduced via
%
<<rk-equation4>>=
rk4 <- glm(y ~ log(log(z)), weights = 1/m, family = quasipoisson, data = rk)
@
%
using the same specification of dispersion as above. 
The negative binomial results in \cite{repro:Koenker:1988} were computed, not
in \proglang{GLIM}, but with general code written by Richard Spady for maximum
likelihood estimation and linked to \proglang{S}. The same model can now
be estimated in \proglang{R} using the \code{glm.nb()} function from the 
\pkg{MASS} package \citep{repro:Venables+Ripley:2002}.
%
<<rk-negbin>>=
rk_nb <- glm.nb(y ~ log(log(z)), data = rk, weights = 1/m)
@
%
<<glm-summary-table, echo=FALSE, results=hide>>=
## convenience function for extracting from "glm" objects:
##   coef, se, deviance/pearson dispersion
glmtab <- function(obj, dispersion. = dispersion(obj), ...) {
  x <- t(summary(obj, dispersion = dispersion.)$coefficients[,1:2])
  x <- structure(as.vector(x), .Names = as.vector(rbind(colnames(x), 
    paste(colnames(x), "SD", sep = "_"))))
  c(x, "s2_deviance" = dispersion(obj, type = "deviance"),
    "s2_pearson" = dispersion(obj, type = "pearson"))
}
@
%
<<rk-summary-table, echo=FALSE, results=hide>>=
## call glmtab(), combine with parsity estimate, match rows
rk_ltab <- list(
  "rk1" = c(glmtab(rk1), "pi1000" = as.vector(coef(rk1)[2])),
  "rk2" = c(glmtab(rk2), "pi1000" = as.vector(coef(rk2)[2] + 2 * coef(rk2)[3] * log(1000))),
  "rk3" = c(glmtab(rk3), "pi1000" = as.vector(coef(rk3)[2] + 2 * coef(rk3)[3] * log(1000))),
  "rk4" = c(glmtab(rk4), "pi1000" = as.vector(coef(rk4)[2]/log(1000))),
  "rk_nb" = lmtab(rk_nb, vcov = sandwich))

rk_tab <- matrix(NA, ncol = 5, nrow = 11)
rownames(rk_tab) <- unique(unlist(sapply(rk_ltab, names)))[c(1:4, 8:11, 5:7)]
colnames(rk_tab) <- names(rk_ltab)
for(i in seq_along(rk_ltab)) rk_tab[names(rk_ltab[[i]]), i] <- rk_ltab[[i]]
@
%
\begin{table}[t]
\caption{\label{tab:rk}
  Replication of wage-equation meta-models, corresponding
  to \code{rk1}--\code{rk4} and \code{rk\_nb} with dependent variable $y$.  
  Deviance-based dispersion estimates are used for the quasi-Poisson models
  and sandwich standard errors for the negative binomial model.}
\begin{center}
\begin{tabular}{lrrrrr} \hline
& \multicolumn{4}{c}{Quasi-Poisson} & Neg-Bin \\
& \multicolumn{1}{c}{1} & \multicolumn{1}{c}{2} & \multicolumn{1}{c}{3}
& \multicolumn{1}{c}{4} & \\ \hline
<<rk-table, echo=FALSE, results=tex>>=
writeLines(tab2latex(rk_tab, last = 3, nam = c(
  "Constant", "", "$\\log z$", "", "$(\\log z)^2$", "", "$\\log \\log z$", "",
  "$\\sigma^2_D$", "$\\sigma^2_P$", "$\\pi(1000)$")))
@
\end{tabular}
\end{center}
\end{table}
%
For this model, sandwich standard errors were reported based on Spady's 
estimation of the Hessian and outer-product matrix of the gradient.  Our
attempt to replicate these results  employed the \proglang{R} \pkg{sandwich}
package as for the growth regressions in Section~\ref{sec:growth}.
The results are again shown in our Table~\ref{tab:rk}. 
Comparing these results with the published estimates we find somewhat larger 
discrepancies than for the quasi-Poisson regressions, but the results are
remarkably consistent.  We would conjecture that this degree of agreement 
would be rare in most instances where independent software was used to do
non-linear maximum likelihood estimation.


\section{Challenges and Conclusions} \label{sec:challenges}

From an economic perspective, the real challenge of reproducible econometric
research lies in restructuring incentives to encourage better archiving and
distribution of the gory details of computationally oriented research.
Technical progress in software and computer networking have dramatically lowered
the cost of reproducibility, but without stronger incentives from journals and
research funding agencies, further progress can be expected to be slow.
The JAE is exemplary in this respect since it has strongly encouraged data/software
archiving as well as replication studies.  It would be excellent if other journals
followed this lead.  Authors ultimately need to be convinced that it is in their
interest to provide detailed protocols for the computational aspects of their work.
This may require a sea change in attitudes about acknowledgment and citation 
practices.

Further technical progress can be expected in all of the realms we have reviewed:
more convenient archiving and version control, better tools for literate programming,
improved algorithms and user interfaces for statistical computing are all under active
development.  More rapid  diffusion of this new technology is what is really needed.

Web-based ``electronic appendices'' are increasingly common and this too is a welcome
development. However, further pressure by the journals, a better understanding
of the corresponding required quality standards by the scientific community, as well as
simplified automatic access would be very valuable.  
We are still far away from the Claerbout Principle, but good models do exist.  
WaveLab of \cite{repro:Buckheit+Donoho:1995} is a relatively early example,
the concept of a \emph{data compendium} suggested by \cite{repro:Gentleman+TempleLang:2007}
is another. Within economics there has been some discussion and evaluation of the
archival policies of the \emph{American Economic Review} and \emph{Journal of Money, Credit and Banking}
\citep[see][]{repro:McCullough+Vinod:2003,repro:McCullough+McGeary+Harrison:2006},
but much more is needed.

\section{Computational Details} \label{sec:computational}

Our results were obtained using 
\proglang{R}~\Sexpr{paste(R.Version()[6:7], collapse = ".")}---with the packages
\pkg{lmtest}~\Sexpr{packageDescription("lmtest")["Version"]}, \pkg{sandwich}~\Sexpr{packageDescription("sandwich")["Version"]},
and \pkg{MASS}~\Sexpr{packageDescription("MASS")["Version"]}---and
were identical on various platforms including Debian GNU/Linux (with a
2.6.26 kernel) and Mac OS X, version~10.4.10. {\TeX} Live 2007 was used for the original typesetting.
The full sources for this document (including data, \proglang{R} code and {\LaTeX} sources)
are available from \url{http://www.econ.uiuc.edu/~roger/research/repro/}. Hence, readers
can do as we do, not as we say, and fully reproduce our analyses.


\bibliography{repro}

\end{document}