Stein's (1956) celebrated shrinkage results imply that one can improve upon the least squares estimator in Gaussian regression problems with parametric dimension $p \geq 3$ by shrinking toward some fixed point, thereby trading bias for variance reduction. Judge and Bock (1978) treat this subject in some detail from an econometric standpoint. One might characterize this as statistical stoicism: through restraint, self-discipline, and temperance we achieve the noble purpose of reduced mean squared error. For others, it may be interpreted as a form of Bayesian parsimony.
We have no quarrel with such philosophies. They are fine for those who, like La Fontaine's ant, prefer to toil all summer to prepare for the hardships of winter. But who speaks out for the profligacy of the grasshopper, for gluttony and reckless abandon? Can such ideas have a place in the dismal annals of econometrics? We beg the gentle reader's momentary indulgence to consider the following foolishly profligate estimator:
Augment the $p$ columns of the initial design matrix $X$ by $q$ randomly generated columns, $D$. Let $Z = [X \vdots D]$, and consider estimating the augmented model $y = Z\theta + u$ by ordinary least squares,
\[
\hat\theta = (Z'Z)^{-1} Z'y ,
\]
and denote the familiar Eicker-White covariance matrix estimator for $\hat\theta$ by
\[
\hat V_n = (Z'Z)^{-1} Z' \hat\Omega Z (Z'Z)^{-1},
\]
where $\hat\Omega = \mathrm{diag}(\hat u_i^2)$ and $\hat u = y - Z\hat\theta$. Finally, let $R = [I_p \vdots 0]$, so $\beta = R\theta$, and define the GMM estimator of $\beta$, $\hat\beta$, by solving
\[
\min_{\beta} \; (\hat\theta - R'\beta)' \hat V_n^{-1} (\hat\theta - R'\beta),
\]
yielding,
\[
\hat\beta = (R \hat V_n^{-1} R')^{-1} R \hat V_n^{-1} \hat\theta .
\]
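The whole recipe — random augmentation, OLS on the augmented model, Eicker-White weighting, and the minimum-distance step — can be sketched numerically. The following is our own minimal illustration, not code from the paper; the function name `falstaff` and the Gaussian augmentation columns are our choices:

```python
import numpy as np

def falstaff(y, X, D):
    """Falstaff estimator: OLS on the augmented design Z = [X : D],
    then a GMM (minimum-distance) step weighted by the inverse of the
    Eicker-White covariance matrix estimator."""
    n, p = X.shape
    q = D.shape[1]
    Z = np.hstack([X, D])                      # augmented design
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    theta_hat = ZtZ_inv @ Z.T @ y              # OLS on the augmented model
    u_hat = y - Z @ theta_hat                  # OLS residuals
    # Eicker-White sandwich: (Z'Z)^{-1} Z' diag(u_i^2) Z (Z'Z)^{-1}
    Vn = ZtZ_inv @ (Z.T @ (u_hat[:, None] ** 2 * Z)) @ ZtZ_inv
    R = np.hstack([np.eye(p), np.zeros((p, q))])   # beta = R theta
    Vi = np.linalg.inv(Vn)
    # minimum-distance step: (R Vn^{-1} R')^{-1} R Vn^{-1} theta_hat
    return np.linalg.solve(R @ Vi @ R.T, R @ Vi @ theta_hat)

rng = np.random.default_rng(0)
n, p, q = 200, 2, 5
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
beta = np.array([1.0, 2.0])
y = X @ beta + rng.standard_normal(n)
D = rng.standard_normal((n, q))                # profligate augmentation
print(falstaff(y, X, D))                       # roughly [1, 2]
```

With iid errors the augmentation is pure noise, so the estimate stays close to the least squares estimate; the interest lies entirely in the second-order behavior discussed below.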
This may appear to be the recipe of some demented sack-guzzler, but there is method in the madness. Our first result shows that we are not trading bias for variance reduction as in Stein estimation.
Proof: The argument is essentially that of Kakwani (1967); see also the treatment in Schmidt (1976). Fix $Z$ and write
\[
\hat\beta - \beta = (R \hat V_n^{-1} R')^{-1} R \hat V_n^{-1} (Z'Z)^{-1} Z'u .
\]
Observe that $\hat V_n$ is an even function of $u$; that is, $u$ and $-u$ yield the same $\hat V_n$. Since by assumption $u$ and $-u$ have the same distribution, it follows that $\hat\beta - \beta$ and $-(\hat\beta - \beta)$ have the same distribution, so $E(\hat\beta - \beta \mid Z) = 0$ whenever the expectation exists. The result then follows by unconditioning on $Z$.
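The evenness argument lends itself to a mechanical check: for fixed $Z$, replacing $u$ by $-u$ leaves the Eicker-White matrix unchanged and reverses the sign of the estimation error exactly, not merely in distribution. A small self-contained verification (our own construction, with hypothetical names):

```python
import numpy as np

def falstaff_dev(u, Z, p, beta):
    """Return beta_hat - beta for error vector u, with Z = [X : D] fixed
    and the true coefficients on D equal to zero."""
    y = Z[:, :p] @ beta + u
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    theta_hat = ZtZ_inv @ Z.T @ y
    u_hat = y - Z @ theta_hat                  # flips sign with u ...
    Vn = ZtZ_inv @ (Z.T @ (u_hat[:, None] ** 2 * Z)) @ ZtZ_inv  # ... so Vn is even in u
    R = np.hstack([np.eye(p), np.zeros((p, Z.shape[1] - p))])
    Vi = np.linalg.inv(Vn)
    beta_hat = np.linalg.solve(R @ Vi @ R.T, R @ Vi @ theta_hat)
    return beta_hat - beta

rng = np.random.default_rng(1)
n, p, q = 50, 2, 3
Z = np.hstack([np.column_stack([np.ones(n), rng.standard_normal(n)]),
               rng.standard_normal((n, q))])
beta = np.array([0.5, -1.0])
u = rng.standard_normal(n)
# u and -u yield estimation errors of equal size and opposite sign:
print(np.allclose(falstaff_dev(u, Z, p, beta),
                  -falstaff_dev(-u, Z, p, beta)))   # True
```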
We can conclude from this result that any improvement in mean squared error achieved by $\hat\beta$ must be purely a matter of variance reduction. Since for Gaussian $F$ it is well known that the least squares estimator is minimum variance unbiased, we obviously must narrow the class of $F$'s to exclude this case. Our next result, which is an immediate corollary of Theorem 2.2 of Koenker, Machado, Skeels, and Welsh (1994), henceforth KMSW, specifies the class of distributions for which we may expect an improvement.
The first term in this variance expansion is familiar: it is the variance that would result had we used the true error covariance matrix. The second term, which is of order $n^{-1}$, may be attributed to the ``heteroscedasticity correction'' of the GMM estimator, and is probably less familiar. It is easy to see that the matrix appearing in this second term is positive definite, and consequently, for distributions with kurtosis greater than 5, the Falstaff estimator, $\hat\beta$, has a strictly smaller asymptotic covariance matrix, to order $n^{-1}$, than the Gauss-Markov estimator.
Of course, for Gaussian $F$ and other distributions with modest kurtosis, the second term contributes a positive component, and consequently the ``correction for heteroscedasticity'' is counterproductive. This degradation in performance at the Gaussian model is hardly surprising, since classical sufficiency arguments, as in Rothenberg (1984), imply that such a loss is inevitable.
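As a rough numerical illustration of the kurtosis condition, one might compare Monte Carlo variances of the Falstaff estimator and the sample mean in the simplest design, $X = 1_n$, with Student $t_5$ errors (kurtosis 9, above the threshold of 5). This is our own sketch, with arbitrary choices of $n$, $q$, and replication count, not an experiment from the paper:

```python
import numpy as np

def falstaff_loc(y, D):
    """Falstaff estimator in the simplest design, X = 1_n (location model)."""
    n = y.shape[0]
    Z = np.hstack([np.ones((n, 1)), D])
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    theta_hat = ZtZ_inv @ Z.T @ y
    u_hat = y - Z @ theta_hat
    Vn = ZtZ_inv @ (Z.T @ (u_hat[:, None] ** 2 * Z)) @ ZtZ_inv
    Vi = np.linalg.inv(Vn)
    return (Vi[0] @ theta_hat) / Vi[0, 0]   # (R Vi R')^{-1} R Vi theta, R = e_1'

rng = np.random.default_rng(2)
n, q, reps = 100, 10, 1000
est_f = np.empty(reps)
est_m = np.empty(reps)
for r in range(reps):
    y = rng.standard_t(df=5, size=n)        # Student t(5): kurtosis 9 > 5
    D = rng.standard_normal((n, q))
    est_f[r] = falstaff_loc(y, D)
    est_m[r] = y.mean()                     # the Gauss-Markov estimator here
print(np.var(est_f) / np.var(est_m))        # variance ratio, Falstaff / mean
```

Replacing the $t_5$ draws with Gaussian ones should push the ratio in the opposite direction, in keeping with the sufficiency argument.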
Intuitively, we would expect that ignoring the fact that our observations are homoscedastic couldn't help us. We should be punished for ignoring relevant information. Shouldn't we? How, then, do we gain from the profligate behavior of the Falstaff estimator? How can estimating an artificially expanded model and then correcting for heteroscedasticity in an iid error model conceivably increase the precision of our estimates? To explore these questions we begin by considering the particularly simple special case of estimating a scalar location parameter. Since in this case $X = 1_n$, an $n$-vector of ones, the form of the second-order term is especially simple, and the relevant trace reduces to $q$, the number of augmented columns in $D$. Thus our expansion reduces in this case to
In the next section we report on a small Monte Carlo experiment designed to evaluate the accuracy of this expansion for moderate sample sizes. What is the Falstaff estimator doing in this simple location context? The Falstaff estimator of location may be expressed as
\[
\hat\beta = a'\hat\theta , \qquad a' = (R \hat V_n^{-1} R')^{-1} R \hat V_n^{-1} .
\]
Obviously, if $\hat V_n$ is proportional to the identity matrix and $D$ is orthogonal to $X$, then $\hat\beta = \bar y$. In general, however, all the coordinates of $\hat\theta$ contribute to $\hat\beta$. If $\hat V_n$ converges in probability to a nonstochastic matrix, the iid error assumption ensures that the limit is proportional to the identity. This simply restates the obvious fact that if $\hat V_n$ is consistent, as it would be in the present circumstances if $q$ were fixed, then the Falstaff improvement vanishes as $n \rightarrow \infty$. Whether there may be some scope for asymptotic improvement if the sequence of matrices $D_n$ could be chosen to preserve a stochastic contribution from $\hat V_n$ remains an open question.
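One structural feature of the location case is easy to verify numerically: the residuals, and hence $\hat V_n$, are invariant to location shifts of $y$, so the Falstaff estimator of location is exactly location-equivariant even though all coordinates of $\hat\theta$ contribute to it. A sketch of our own construction:

```python
import numpy as np

def falstaff_loc(y, D):
    """Falstaff estimator of a scalar location parameter (X = 1_n)."""
    n = y.shape[0]
    Z = np.hstack([np.ones((n, 1)), D])
    ZtZ_inv = np.linalg.inv(Z.T @ Z)
    theta_hat = ZtZ_inv @ Z.T @ y
    u_hat = y - Z @ theta_hat               # unchanged by shifts y -> y + c
    Vn = ZtZ_inv @ (Z.T @ (u_hat[:, None] ** 2 * Z)) @ ZtZ_inv
    Vi = np.linalg.inv(Vn)
    return (Vi[0] @ theta_hat) / Vi[0, 0]   # all coordinates of theta_hat enter

rng = np.random.default_rng(3)
n, q = 60, 4
y = rng.standard_normal(n)
D = rng.standard_normal((n, q))
# Since V_n is shift-invariant, the estimator is exactly equivariant:
print(np.isclose(falstaff_loc(y + 7.0, D), falstaff_loc(y, D) + 7.0))  # True
```

The same invariance underlies the unbiasedness result above: conditional on the residual-based weights, the estimator is a weighted average of the observations with weights summing to one.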