Refuting Criticism

Contents

11 Refuting Potential Criticisms of Gets
  11.1 Introduction
  11.2 Data-based model selection
  11.3 Measurement without theory
  11.4 Data mining
  11.5 Pre-test biases
  11.6 Ignoring selection effects
  11.7 Spurious significance from repeated testing
  11.8 Arbitrary choices of significance levels
  11.9 Lack of identification
  11.10 Path dependence of selection
  11.11 Implications
  11.12 What are the alternatives?
  11.12.1 The problems of simple-to-general modelling
  11.12.2 Retaining the initial general model
  11.12.3 Selecting models by minimizing an information criterion
  11.12.4 Testing theory models
  11.12.5 Other model-simplification approaches
  11.12.6 Gets
  11.12.7 Non-nested hypothesis tests and encompassing
  11.12.8 Bayesian model comparisons

Chapter 11 Refuting Potential Criticisms of Gets

11.1 Introduction

In this chapter, we will review and refute most of the extant criticisms of general-to-specific modelling. Critics of Gets have pointed to a number of potential difficulties, including the problems of:

  1. `data-based model selection', see, inter alia, Keynes (1939, 1940);

  2. `measurement without theory', see, inter alia, Koopmans (1947);

  3. `data mining', see, inter alia, Lovell (1983);

  4. `pre-test biases', see, inter alia, Judge and Bock (1978);

  5. `ignoring selection effects', see, inter alia, Leamer (1978);

  6. `repeated testing', see, inter alia, Leamer (1983a);

  7. `arbitrary choice of significance levels', see, inter alia, Hendry, Leamer and Poirier (1990);

  8. `lack of identification', see, inter alia, Faust and Whiteman (1997); and

  9. `potential path dependence of any selection', see, inter alia, Pagan (1987).

Despite the plethora of apparent problems which this list suggests afflicts any form of data modelling, almost all of these inter-related criticisms are refutable, as we now show. The following discussion draws on Hendry and Morgan (1995), Hendry (1995a) and Hendry (2000a).

As will be seen, much of the criticism has been based on theoretical arguments, without any examination of the actual operational characteristics of Gets: insights gleaned in analyzing and developing PcGets now allow theoretical refutations of these criticisms. Only a few studies have investigated how well general-to-specific (Gets) modelling actually achieves its objective of discovering the LDGP. However, Hoover and Perez (1999) offer important evidence in a major Monte Carlo, reconsidering the Lovell (1983) experiments. Lovell formed a databank of 20 macro-economic variables; generated one (denoted y) as a function of 0 to 5 others; regressed y on all 20 plus all lags thereof and an intercept, then examined how well some selection methods performed -- none did even reasonably. By way of contrast, Hoover and Perez (1999) develop a Gets-type approach, and let their algorithm simplify the GUM till it found a congruent, irreducible result. They checked up to 10 different search paths, testing at every stage for mis-specification, collected the results from each, then selected one choice from the remainder. By exploring many paths, the algorithm was protected against following a false route by chance, so generally delivered an undominated congruent model. Hendry and Krolzig (1999b) improved on their algorithm in several important respects; and PcGets further advances the efficiency of computer automation.

11.2 Data-based model selection

The earliest attack on data-based model selection is by Keynes (1939, 1940) on Tinbergen (1940a, 1940b): see Hendry (1980). Keynes asserts that several `pre-conditions' are needed to sustain inference from economic data including: a complete prior theoretical analysis; that all relevant phenomena are measurable; explanatory variables should represent `independent' influences; the choice of variables and lags should be pre-specified; and parameters should be constant over time. In Keynes's view, Tinbergen's work did not satisfy such pre-conditions, so he dismissed it. Thus, Keynes implicitly argued that statistical work in economics is impossible without complete knowledge of the final specification in advance. However, if partial explanations were in general an inadequate starting point, then empirical sciences could never discover anything new. Consequently, no science could have progressed -- a proposition that is strongly refuted by the historical record. The second fallacy in Keynes's argument is that theoretical models are unfortunately neither complete nor correct, so an econometrics that was forced to use such theories as the only permissible starting point for data analysis could contribute little useful knowledge -- except perhaps by rejecting such theories. When invariant features of reality exist, progressive research can discover them in part without prior knowledge of the whole: see Hendry (1995b).

11.3 Measurement without theory

The critique in Koopmans (1947) of the study by Burns and Mitchell (1946) followed up the previous debate, and set the scene for doubting all econometric analyses that failed to commence from pre-specified models. The catch-phrase `measurement without theory' became an insult capable of dismissing empirical work from further consideration -- without any empirical evaluation. But a similar critical analysis to that in the previous section also applies to Koopmans' claims: he relies on the (unstated) assumption that only one sort of economic theory is applicable, that it is correct, and that it is immutable (see Hendry and Morgan, 1995). Yet economic theory was progressing rapidly in his time (and still is), sometimes radically altering its views. Slavishly implementing current theories in econometric models without careful data analysis risks the problem of `theory dependence', namely, when theories are altered, `evidence' has to be discarded. That is not a good recipe for progress.

11.4 Data mining

As noted above, Lovell (1983) investigated the data-based selection of a `small' relation (zero to five regressors) hidden in a `large' model (40 variables), using a number of selection algorithms. In these experiments, one variable ($y_t$) was generated by simulation as a function of the observed values ($x_{i,t}$, $i=1,\ldots,k$) of a few other macro-economic variables ($k=0,\ldots,5$ depending on the experiment) using the estimated parameter values from the observed data as the vector $\beta$, plus a set of random numbers $\{\epsilon_t\}$:

$$y_t = \sum_{i=1}^{k} \beta_i x_{i,t} + \epsilon_t \quad \text{where } \epsilon_t \sim \mathsf{IN}[0, \sigma_\epsilon^2].$$
(eq:11.1)

The $\{x_{i,t}\}$ are fixed across replications, and the error processes are mutually and serially independent, so $x_t$ is strongly exogenous for $\beta$ (see Engle, Hendry and Richard, 1983). The generated $\{y_t\}$ was then regressed on all the variables in the database plus their first lags, commencing from the GUM:

$$y_t = \sum_{j=1}^{4} \alpha_j y_{t-j} + \sum_{i=1}^{18} \sum_{j=0}^{1} \gamma_{i,j} x_{i,t-j} + w_t.$$
(eq:11.2)

The `methodology' was tested by seeing how often (eq:11.1) was found when commencing from (eq:11.2). Lovell achieved a low success rate, thereby suggesting that search procedures had high costs, and supporting the already adverse view of data-based model selection. Indeed, `data mining' also became a pejorative phrase, easily used to dismiss evidence without refutation (but see e.g., Sargan, 1973, Leamer, 1978, Hendry, 1995a, and Campos and Ericsson, 1999, who distinguish various senses of `data mining', and propose remedies). Some `modelling' activities seem unlikely to produce useful results -- including many of those considered by Lovell -- whereas others could still have excellent properties. For example, searching through `literally hundreds of regressions' (as in Friedman and Schwartz, 1982) for a result that corroborates a priori beliefs seems almost bound to deliver spurious findings. That version of data mining is revealed when conflicting evidence exists within the estimates that were scanned for support; or more generally, when rival models of the same phenomena cannot be encompassed -- because if they could be encompassed, then an undominated model would result even if an inappropriate selection procedure had been used (many scientific discoveries have actually resulted from mis-conceived approaches: see e.g., Hendry, 1995b, for a summary and bibliographic perspective). Thus, stringent critical evaluation renders any `data mining' criticism otiose: Gilbert (1986) suggests separating all relevant output into two groups, where the first contains only redundant results (those parsimoniously encompassed by the finally-selected model), and the second contains all other findings. If the second group is non-null, then there has been data mining. On such a characterization, Gets cannot involve data mining, despite depending heavily on data basing.

11.5 Pre-test biases

The fourth criticism concerns the use of significance tests to select variables in regression models. To understand this `pre-test' problem, we first distinguish between costs of inference and costs of search. Statistical tests have non-degenerate null distributions, and hence have non-zero size, and (generally) non-unit power. Consequently, even if the LDGP was correctly specified a priori from economic theory, when an investigator did not know that the resulting model was `true' -- so sought to test hypotheses about its coefficients -- then inferential mistakes can occur, the seriousness of which depend on the characteristics of the LDGP and the sample drawn. Should the selected model thereby differ from the LDGP, it will deliver biased coefficient estimates: this is called the `pre-test' problem, since unbiased estimates could have been obtained from the unrestricted model by conducting no tests. As no `search' is involved, we refer to the associated costs as those of inference: they are inevitable when the truth is unknown and tests must be conducted. The costs of search are any additional mistakes induced by commencing from an initial model that is larger than the LDGP. Thus, search costs comprise retaining any irrelevant variables (that chance to be significant), plus omitting relevant variables more often than an investigator who began from the LDGP. Surprisingly, search costs transpire to be fairly small, as shown in Chapter 10: PcGets retains relevant variables with probabilities close to those achieved when testing the LDGP, and omits irrelevant variables the anticipated percentage of the time. Moreover, the potentially huge costs of starting from too small a model -- which Gets helps protect against -- must not be forgotten.
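The costs of inference described above can be illustrated with a small simulation. The sketch below is not taken from PcGets: it assumes a one-regressor LDGP $y_t = \beta x_t + \epsilon_t$ with hypothetical settings ($\beta = 0.3$, $T = 50$, unit error variance, a critical value of 2) and compares the unrestricted OLS estimate with the `pre-test' estimator that sets the coefficient to zero whenever its t-ratio is insignificant. No search is involved, since the analysis starts from the LDGP itself.

```python
import random
import statistics


def simulate_pretest(beta=0.3, T=50, reps=2000, crit=2.0, seed=0):
    """Monte Carlo for a one-variable LDGP y_t = beta * x_t + e_t (all values hypothetical)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(T)]  # regressor held fixed across replications
    sxx = sum(v * v for v in x)
    unrestricted, pretest, retained = [], [], []
    for _ in range(reps):
        y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]
        b = sum(xi * yi for xi, yi in zip(x, y)) / sxx  # OLS slope, no intercept
        rss = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
        se = (rss / (T - 1) / sxx) ** 0.5  # estimated standard error of b
        keep = abs(b / se) > crit  # the 'pre-test' decision
        unrestricted.append(b)
        pretest.append(b if keep else 0.0)
        if keep:
            retained.append(b)
    return {
        "mean_unrestricted": statistics.mean(unrestricted),
        "mean_pretest": statistics.mean(pretest),
        "mean_if_retained": statistics.mean(retained),
        "retention_rate": len(retained) / reps,
    }


if __name__ == "__main__":
    for name, value in simulate_pretest().items():
        print(f"{name}: {value:.3f}")
```

Under these assumed settings the unrestricted mean should lie close to 0.3, whereas the mean conditional on retention is pulled upwards and the mean of the pre-test estimator downwards; the sizes of these effects depend on the chosen design.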

11.6 Ignoring selection effects

The fifth critique concerns potential (downward) biases in reported coefficient standard errors from treating a data-selected model as if there were no uncertainty involved in its choice. If many different models could have been reported when applying a selection procedure to different samples from a given DGP, then results based on a single sample would seem to understate the `true' uncertainty. With his claim that `the mapping is the message', Leamer (1983a) even suggests that how a model is selected affects its `credibility', thereby emphasizing the selection process over the properties of the final choice. Certainly, conventionally-reported coefficient standard errors can only reflect sampling variation conditional on a fixed specification, and do not reflect changes in that specification from adventitious selection effects across samples (see e.g., Leamer, 1978, and Chatfield, 1995). Indeed, such reported standard errors are computed under the hypothesis of a congruent specification, and can be highly mis-leading when that hypothesis is false.

[Note: Using `autocorrelation and heteroscedasticity-consistent' standard errors (as in White, 1980, and Andrews, 1991) simply alters the maintained hypothesis to one in which the observed behaviour of the residuals is due to those characteristics in the errors, rather than other sources of mis-specification.]

Since few investigators `inflate' reported uncertainty to reflect the fact that models were selected from data evidence, which might be called `model uncertainty', Leamer (1983a, 1990) proposed investigating such sensitivity formally: his `extreme bounds analysis' (EBA) usually suggests considerable uncertainty. However, at no stage does EBA check whether any model is congruent, nor whether one representation provides an undominated contender, even though the consequences of deleting the most significant variable are frequently disastrous (see e.g., McAleer, Pagan and Volker, 1985, Breusch, 1990, Ericsson and Irons, 1994, and Hendry and Mizon, 1990, for criticisms).

Reported measures of uncertainty in PcGets are conditional on the selected equation being a good approximation to the LDGP: undominated (i.e., encompassing) congruent models have a strong claim to provide such an approximation. Consequently, the remaining questions concern the `reliability' with which the same selection will be made in repeated samples and the impact on uncertainty when different specifications are selected.

It is surprisingly difficult to determine the importance of `model uncertainty' theoretically, as it is highly dependent on the purpose of modelling. For example, even when PcGets correctly omits an irrelevant variable, there is a sense in which that `certainty' (of a zero standard error) is overstated since in a different sample drawing from that LDGP, such a variable could have been adventitiously significant! Generally, the outcome depends on the probabilities of omitting relevant, and including irrelevant, variables relative to the LDGP. The former depends on the unknown significance of each variable in the population (of samples of the given size), and the latter on the null-rejection frequencies of the selection procedures. Moreover, `collinearity' further confounds the analysis.

Perfect collinearity is an exact linear dependence between variables, whereas perfect orthogonality entails a diagonal estimated-coefficient covariance matrix. However, any other state depends on which `version' of a model is inspected, since many econometric models are invariant under linear transformations, whereas measures of collinearity are not. When the correlation between two I(0) variables $x_t$ and $z_t$ is $r_{xz}=0.9999$, eliminating one of the two variables in any sample seems inevitable. Which will be dropped depends on chance, and oscillating between retaining $x_t$ or $z_t$, as would happen in a Monte Carlo study where different variables were retained in different replications, might be thought to indicate considerable `model uncertainty'. However, a more appropriate metric would be to see how well the `correct' combination, say $\beta x_t + \gamma z_t$, was captured, since each variable individually is a near-perfect proxy for any linear combination of them: here, selecting either variable alone, or any combination, does not greatly increase the uncertainty -- so long as the relation remains constant. That comment remains true even when one of the variables is irrelevant, although then PcGets' multiple-path search is more likely to select the correct equation. Conversely, if the system is not constant, the `collinearity' will be broken, albeit by the model suffering forecast failure. For example, when models are estimated to implement economic policy, changing only one of the two variables in such a `collinear' setting will not have the anticipated outcome -- although it will end the `collinearity' and thereby allow precise estimates of the separate effects.
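The point about proxies can be made concrete with a short numerical sketch. It is only an illustration under assumed values (both coefficients equal to one, $T = 100$, unit error variance, and a second variable built as the first plus small noise so that their correlation is about 0.9999): one variable is dropped, and the fitted relation is compared with the `correct' combination.

```python
import random


def collinear_demo(beta=1.0, gamma=1.0, T=100, seed=1):
    """Regress y on x alone when z is almost perfectly correlated with x (hypothetical design)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(T)]
    # z = x + small noise gives a population correlation of about 0.9999.
    z = [xi + 0.01414 * rng.gauss(0.0, 1.0) for xi in x]
    signal = [beta * xi + gamma * zi for xi, zi in zip(x, z)]  # the 'correct' combination
    y = [s + rng.gauss(0.0, 1.0) for s in signal]

    # Sample correlation between x and z.
    mx, mz = sum(x) / T, sum(z) / T
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    sxx_c = sum((a - mx) ** 2 for a in x)
    szz_c = sum((b - mz) ** 2 for b in z)
    corr = sxz / (sxx_c * szz_c) ** 0.5

    # OLS of y on x only (no intercept), i.e. z has been 'dropped'.
    b_x = sum(a * c for a, c in zip(x, y)) / sum(a * a for a in x)
    # Root mean squared gap between the fitted relation and the true combination.
    rmse = (sum((b_x * a - s) ** 2 for a, s in zip(x, signal)) / T) ** 0.5
    return corr, b_x, rmse


if __name__ == "__main__":
    corr, b_x, rmse = collinear_demo()
    print(f"corr(x, z)={corr:.5f}  slope on x alone={b_x:.3f}  rmse vs combination={rmse:.3f}")
```

In this assumed design the coefficient on the retained variable absorbs roughly the sum of the two effects, and the gap between the fitted relation and the true combination is small relative to the unit error standard deviation, provided the relation between the two variables stays constant.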

The remainder of the chapter assumes that variables have been transformed to a `near orthogonal' representation before modelling. By having a high probability of selecting the LDGP in such an orthogonal setting, the reported uncertainties (such as estimated coefficient standard errors) in PcGets are not much distorted by selection effects. Thus, we prefer to consider the issue in terms of the operating characteristics of PcGets.

11.7 Spurious significance from repeated testing

The next critique in our list argues that the probability of retaining `irrelevant' variables (those that should not enter a relationship) will be high when a multitude of selection tests is applied, as some must deliver `significant' outcomes by chance despite the null being true.

The theory of repeated testing is easily understood: the probability $p$ that none of $n$ independent tests rejects at a significance level of $100\alpha$% is:

$$p_{\alpha,n} = (1-\alpha)^n.$$

When $n=40$ tests of correct null hypotheses are conducted at (say) $\alpha=0.05$, then $p_{0.05,40} \simeq 0.13$, so there is an 87% chance of retaining at least one irrelevant variable, which might seem an unacceptably-large type-I error. In fact, the average number of irrelevant variables retained is 2 ($=n\alpha$). However, $p_{0.01,40} \simeq 0.67$ and now $n\alpha=0.4$, so on more than two-thirds of the occasions that 40 tests are conducted under the null, none will reject. It is difficult to obtain spurious t-test values much in excess of three despite repeated testing: as Sargan (1981) pointed out, the t-distribution is `thin tailed', so the 0.5% critical value is less than three even for 50 degrees of freedom. Unfortunately, a stringent criterion for avoiding rejections when the null is true lowers the power of rejection when it is false, although this effect is unimportant for large t-values (e.g., in excess of five in absolute value).

The logic of repeated testing is accurate as a description of the statistical properties of mis-specification testing: conducting four independent diagnostic tests at 5% will lead to about 19% false rejections. Nevertheless, even in that context, there are possible solutions -- such as using a single combined test -- which can substantially lower the size without too great a power loss (see e.g., Godfrey and Veale, 1999). Alternatively, set $\alpha=0.01$ as $p_{0.01,4} \simeq 0.96$, which delivers a 4% chance of `false rejection' overall.
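The arithmetic in this section can be checked directly. The sketch below simply evaluates $(1-\alpha)^n$ and $n\alpha$ under the assumption of $n$ independent tests of true nulls; the function names are illustrative.

```python
def prob_no_rejection(alpha: float, n: int) -> float:
    """P(none of n independent tests of true nulls rejects at level alpha)."""
    return (1.0 - alpha) ** n


def expected_false_rejections(alpha: float, n: int) -> float:
    """Average number of true nulls rejected, n * alpha."""
    return n * alpha


if __name__ == "__main__":
    # (alpha, n) pairs discussed in the text: 40 selection tests, 4 diagnostic tests.
    for alpha, n in [(0.05, 40), (0.01, 40), (0.05, 4), (0.01, 4)]:
        p = prob_no_rejection(alpha, n)
        print(f"alpha={alpha:.2f} n={n:2d}  P(no rejection)={p:.2f}  "
              f"P(at least one)={1 - p:.2f}  "
              f"expected={expected_false_rejections(alpha, n):.1f}")
```

The independence assumption matters: with correlated test statistics the product formula no longer holds exactly, so these figures are best read as a benchmark.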

It is less clear that the analysis is a valid characterization of selection procedures in general. First, as discussed in greater detail below, block tests can radically alter null selection probabilities: for example, when all 40 variables are irrelevant in the Monte Carlo experiments designed by Hoover and Perez (1999), PcGets actually determines that outcome more than 97% of the time (see Hendry and Krolzig, 1999b). Secondly, the operational characteristics of model-selection algorithms are dramatically improved when more than one path is searched, so there is `error correction' for invalid inclusions or reductions: see Hoover and Perez (1999). Searching only one path -- as in `step-wise' regression, which Leamer (1983a) parodies as `unwise' -- is generally detrimental because an early inappropriate decision can lead to a large final model. Finally, it transpires from the theoretical analyses presented below that the serious practical difficulty is not one of avoiding `spuriously significant' regressors because of repeated testing, it is retaining all the variables that genuinely matter -- even if the analysis commenced from the LDGP: see section 10.4. In general, our models will be too small, omitting relevant factors that happen not to be `significant' in the given sample, rather than too large by retaining adventitiously-significant variables.

11.8 Arbitrary choices of significance levels

Investigators often select a conventional nominal null-rejection frequency of $\alpha=5$%, and set the critical values for test decisions accordingly. This controls the type-I error (when the actual rejection frequency is correctly calibrated), but ignores the corresponding impact on test power, and the relation of both to sample size (see Hendry, Leamer and Poirier, 1990). Since no theory is precisely true, when $T$ is large overwhelming evidence can accrue against a null even when it is an excellent approximation. Conversely, if $\alpha$ does not gradually converge on zero, spurious variables will be retained even asymptotically. Thus, it does not seem sensible to fix $\alpha$ independently of $T$: the balance between Type I and II errors should alter with the weight of evidence in a progressive research strategy: as information accumulates, more precise statements should be possible.

Rules of the form $\alpha = 1.6 \times T^{-0.9}$ match conventional choices at most sample sizes, namely, 10 per cent at $T=20$, 5 per cent at $T=50$, 2.5 per cent at about $T=100$ and 1 per cent around $T=300$, dropping to under 0.2 per cent at $T=2000$ (see Hendry, 1995a). Such rules do not affect the consistency of the test strategy, and seem to offer a reasonable balance between Type I and Type II errors when the actual cost of either mistake cannot be assigned. PcGets embodies rules of this form, which are in fact similar to those implicit in the Schwarz criterion.
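A quick way to see how such a rule behaves is to tabulate it. The snippet below evaluates $\alpha = 1.6 \times T^{-0.9}$ at the sample sizes mentioned above; the function and argument names are illustrative and are not part of PcGets.

```python
def significance_level(T: int, scale: float = 1.6, exponent: float = 0.9) -> float:
    """Nominal significance level alpha = scale * T**(-exponent) for sample size T."""
    return scale * T ** (-exponent)


if __name__ == "__main__":
    for T in (20, 50, 100, 300, 2000):
        print(f"T={T:5d}  alpha={100 * significance_level(T):.2f} per cent")
```

Because the exponent is negative, the implied significance level falls steadily as $T$ grows, which is the property the text relies on for consistency.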

11.9 Lack of identification

In econometrics, `identification' has three attributes, namely `uniqueness', `satisfying the required interpretation', and `correspondence to the desired entity' (see Hendry, 1997). A non-unique result is clearly not identified, so the first attribute is necessary: see Koopmans, Rubin and Leipnik (1950) for conditions ensuring the uniqueness of coefficients in simultaneous systems. Equally, uniqueness is insufficient, since it can be achieved by arbitrary restrictions (criticized by Sims, 1980, inter alia). There can exist a unique combination of several relationships which is incorrectly interpreted as one of those equations: e.g., a reduced form that has a positive price effect, wrongly interpreted as a supply relation. Finally, a unique, interpretable model of (say) a money-demand relation may in fact correspond to a Central Bank's response function, and this too is sometimes called `a failure to identify the demand relation'.

Most of these ideas have deep historical origins, as discussed in Hendry and Morgan (1995). In his critical review of Moore (1914), Wright (1915) introduced three notions of identification. First, he used correlation analysis to `identify' (i.e., discover) empirical regularities in noisy data and thereby cast doubt on cycles `identified' by Moore in his approach: this sense has persisted in the time-series literature (see Box and Jenkins, 1976). Secondly, Wright reacted to Moore's interpretation of an upward-sloping price-quantity equation for pig-iron as a demand relationship, and argued that the curve had been wrongly `identified' (i.e., mis-interpreted) as demand rather than supply. Finally, he demonstrated diagrammatically a third aspect of identification (i.e., uniqueness), by fixing one relationship and shifting the other to trace out the first. However, the problem that none of the relations so obtained need correspond to reality was not raised.

Simultaneity was once a potentially serious hindrance to identification, since economies are highly interdependent, but higher-frequency observations have attenuated that problem. However, a substantial role for `agents' expectations' in economic behaviour might also pose identification problems when no expectations data are available, even if the extreme informational and processing assumptions of `rational expectation' suggest it has little empirical relevance in a non-stationary world (see e.g., Pesaran, 1987, and Hendry and Mizon, 2000). The regular occurrence of structural breaks both `identifies' any remaining constant relations (those not combined with shifting relations), and highlights mis-matches between models and their corresponding LDGPs. Succinctly, `uniqueness' can always be achieved, perhaps by arbitrary restrictions; `interpretation' is in the eye of the beholder; but `correspondence to reality' requires data basing and rigorous evaluation, so identification need not be a problem for Gets.

11.10 Path dependence of selection

The final criticism of `path dependence' is that the results obtained in a modelling exercise might depend on the particular simplification sequence adopted by an investigator. Since the `quality' of a model is intrinsic to it, and progressive research induces a sequence of mutually-encompassing congruent models, proponents of Gets consider that the path adopted is unlikely to matter. As Hendry and Mizon (1990) expressed the matter: `the model is the message'. Nevertheless, some simplifications will lead to poorer representations than others. Hoover and Perez (1999) turned this problem on its head by proposing to search many feasible reduction paths. Since different outcomes might eventuate from different paths searched, this suggestion initially leads to a proliferation of choices. However, all of these are congruent models, so encompassing tests can be used to select the dominant equation. Consequently, a unique outcome results, with the property that it is congruent and undominated, resolving any `path dependence' critique: since PcGets ensures a unique outcome, the path does not matter.

11.11 Implications

Thus, every criticism of Gets has been refuted. Nevertheless, the outcome of these many attacks on data-based empirical research has been that almost all econometric studies had to commence from pre-specified models (or pretend they did). Summers (1991) failed to notice that this was the source of his claimed `scientific illusion': econometric evidence had become theory dependent, with little value added, and a strong propensity to be discarded when fashions in theory changed. Much empirical evidence only depends on low-level theories, which are part of the background knowledge base -- not subject to scrutiny in any current analysis -- so a data-based approach to studying the economy is feasible. Since theory dependence has at least as many drawbacks as sample dependence, data modelling procedures are essential: see Hendry (1995a).

On the positive side, productive data mining requires a procedure that has a high probability of locating the LDGP. In Monte Carlo, such an attribute can be checked, since the DGP itself is known: methods which do badly in that setting seem unlikely to do well in empirical research. Structured searches, such as those embodied in PcGets, perform well in simulation experiments, suggesting low search costs. At the theoretical level, White (1990) showed that, with sufficiently-rigorous testing, the selected model will converge to the DGP. Thus, `overfitting' (and mis-specification) problems are primarily finite sample. Also, Mayo (1981) emphasized that diagnostic test information is effectively independent of the sufficient statistics from which parameter estimates are derived, and protects against serious mis-specifications. While `never testing' is a brilliant strategy when a model coincides with the LDGP, it is usually exceptionally poor in the more likely situation that an a priori model is badly mis-specified. Thus, there is theoretical support for a Gets approach. Moreover, Hoover and Perez (1999) show how much better Gets is than any method Lovell considered, suggesting that modelling per se need not be bad. Indeed, the overall size of their selection procedure is close to that expected, and the power is reasonable. Re-running their experiments using PcGets delivers substantively better outcomes (see Hendry and Krolzig, 1999b).

11.12 What are the alternatives?

Many investigators in econometrics have worried about the consequences of selecting models from data evidence, pre-dating even the Cowles Commission, as noted above. Eight literature strands can be delineated, which comprise distinct research strategies, if somewhat overlapping at the margins. We briefly consider these in turn, as alternatives to Gets:

  1. simple-to-general modelling (see Anderson, 1971, Hendry and Mizon, 1978, and Hendry, 1979, for critiques);

  2. retaining the general model (see, e.g., Yancey and Judge, 1976, and Judge and Bock, 1978);

  3. model selection purely by information criteria: see e.g., Schwarz (1978), Hannan and Quinn (1979), Amemiya (1980), Shibata (1980), Chow (1981), and Akaike (1985);

  4. testing theory models (see e.g., Hendry and Mizon, 2000, for a critical appraisal);

  5. other `rules' for model selection, such as step-wise regression (see e.g., Leamer, 1983a, for a critical appraisal), and `optimal' regression (see e.g., Coen, Gomme and Kendall, 1969, and the following discussion);

  6. Gets: see e.g., Anderson (1971), Sargan (1973, 1981), Mizon (1977a, 1977b), Hendry (1979), and White (1990), with specific examples such as COMFAC (see Sargan, 1980), as well as the related literature on multiple hypothesis testing (well reviewed by Savin, 1984);

  7. model comparisons, often based on non-nested hypothesis tests or encompassing: see e.g., Cox (1961, 1962), Pesaran (1974), Deaton (1982), Kent (1986), Vuong (1989), and Gourieroux and Monfort (1995) for the former, and Hendry and Richard (1982), Mizon (1984), Mizon and Richard (1986), for the latter, surveyed in Hendry and Richard (1989);

  8. Bayesian model comparisons: see e.g., Leamer (1978), and Clayton, Geisser and Jennings (1986).

PcGets blends aspects of all but the last of these, usually with substantive modifications. For example, `top down' searches which retain the most significant variables, and block eliminate all others, partly resemble simple-to-general in the order of testing, but with the fundamental difference that the whole inference procedure is conducted within a congruent GUM. If the GUM cannot be reduced by the criteria set by the user, then it will be retained, though not weighted as suggested in the literature noted. The ability to fix some variables as selected while evaluating the roles of others provides a viable alternative to simply including only the theoretically-relevant variables. Next, key problems with `stepwise' regression are that it only explores one path, so can get stuck, and does not check the congruence of either the starting model or reductions thereof. Clearly, PcGets is a member of the Gets class, but also implements many pre-search reduction tests, and emphasizes block tests over individual tests wherever that is feasible. Further, minimizing a model-selection criterion by itself does not check the congruence of the selection, which could therefore be rather poor. However, such criteria are applicable to select between mutually-encompassing congruent reductions. Finally, parsimonious encompassing is used to select between congruent simplifications within a common GUM, once contending terminal models have been chosen.

The only empirical `test' of the efficacy of alternative approaches to econometric modelling is reported in Magnus and Morgan (1999). Unfortunately, their attempt to have a researcher implement some of the approaches was decidedly unsuccessful in practical terms (although it did highlight the role of `tacit' knowledge of expert modellers): see (e.g.) the critique in Hendry (1999).

11.12.1 The problems of simple-to-general modelling

The paradigm of postulating a simple model and seeking to generalize it in the light of test rejections or anomalies is suspect for a number of reasons. First, there is no clear stopping point to an unordered search: stopping at the first non-rejection is obviously a bad strategy (see Anderson, 1971). Further, no control is offered over the significance level of testing, as it is not clear how many tests will be conducted.

Secondly, even if a general model is postulated at the outset as a definitive ending point, there remain difficulties with S-to-G. Often, simple models are non-congruent, inducing multiple test rejections. When two or more statistics reject, which (if either) has caused the problem? Should both, or only one, be `corrected'? Or should other factors be sought? If several tests are computed seriatim, and a `later' one rejects, then that invalidates all the earlier inferences, inducing a very inefficient research strategy. Indeed, until a model adequately characterizes the data, conventional tests are invalid: and it is obviously not sensible to skip testing in the hope that a model is valid.

Thirdly, alternative routes begin to multiply because simple-to-general is a divergent branching process -- there are many possible generalizations, and the selected model evolves differently depending on which is selected, and in what order. Thus, genuine path dependence can be created by such a search strategy.

Fourthly, once a problem is revealed by a test, it is unclear how to proceed. It is a potentially dangerous non sequitur to adopt the alternative hypothesis of the test which rejected: e.g., assuming residual autocorrelation is error autoregression.

Finally, if a model displays symptoms of mis-specification, there is little point in imposing further restrictions on it.

11.12.2 Retaining the initial general model

Another alternative is to keep every variable in the GUM, but `shrink' the estimates (see, e.g., Yancey and Judge, 1976, and Judge and Bock, 1978). Shrinkage entails weighting coefficient estimates in relation to their significance, so using a `smooth' discount rather than the zero-one weight of retain/delete. This approach has also been suggested as a solution to the `pre-test' problem. Naturally, shrinkage delivers biased estimators, so does not resolve that aspect, but is argued to have a lower `risk' than `pre-test' estimators. However, such a claim has no theoretical underpinnings in relationships linking processes which are subject to intermittent deterministic shifts: retained irrelevant variables can then precipitate predictive failure (see e.g., Clements and Hendry, 1999b). Moreover, progress entails explaining `more by less', which such an approach hardly facilitates. Notice that mis-specification testing is still essential to check the congruence of the GUM, which leads straight back to a testing approach.
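The contrast between a zero-one weight and a `smooth' discount can be sketched as follows. The hard rule retains a coefficient only when its absolute t-ratio exceeds a critical value c; the smooth rule shown is a positive-part weight of the form max(0, 1 - c^2/t^2), chosen here purely for illustration. It is an assumption of this sketch and is not claimed to be the estimator studied by Yancey and Judge (1976) or Judge and Bock (1978); both function names are hypothetical.

```python
def pretest_weight(t: float, c: float = 2.0) -> float:
    """Zero-one weight: keep the estimate if |t| exceeds c, else delete it."""
    return 1.0 if abs(t) > c else 0.0


def shrinkage_weight(t: float, c: float = 2.0) -> float:
    """Illustrative smooth discount: rises gradually from 0 towards 1
    as |t| increases beyond c (positive-part form, assumed for this sketch)."""
    if t == 0.0:
        return 0.0
    return max(0.0, 1.0 - (c / t) ** 2)


# Compare the two weights applied to the same t-ratios.
for t in (1.0, 2.1, 3.0, 4.0, 8.0):
    print(t, pretest_weight(t), round(shrinkage_weight(t), 3))
```

Just above the critical value the hard rule jumps from 0 to 1, whereas the smooth rule applies only a small weight, so marginally significant estimates are heavily discounted rather than retained in full.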

The other area which often eschews reduction is VAR modelling (see e.g., Sims, 1980), but its lack of success has motivated a variety of `solutions' including the `Minnesota prior' for restricting the parameterization without checking the congruence of doing so (see e.g., Doan, Litterman and Sims, 1984). The apparent success of such a procedure is explained in Clements and Hendry (1998) by its `shrinking' towards a forecasting device that is `robust' to deterministic shifts, rather than its reflecting attributes of reality.

11.12.3 Selecting models by minimizing an information criterion

Another route would be to select models by minimizing an information criterion, especially a criterion which can be shown to lead to a consistent model selection (see e.g., Akaike, 1985, Schwarz, 1978, and Hannan and Quinn, 1979). These three `selection' rules are denoted AIC (for the Akaike information criterion), HQ (for Hannan--Quinn) and SC (for the Schwarz criterion). The associated information criteria are defined as follows:

AIC = -2 ln(L)/T + 2n/T,
HQ = -2 ln(L)/T + 2n ln(ln(T))/T,
SC = -2 ln(L)/T + n ln(T)/T,

where L is the maximized likelihood, n is the number of estimated parameters and T is the sample size. The last two criteria ensure a consistent model selection: see e.g., Sawa (1978), Judge, Griffiths, Hill, Lütkepohl and Lee (1985), and Chow (1981). In practice, however, without checking that both the GUM and the selected model are congruent, the model which minimizes any information criterion has little to commend it (see Bontemps and Mizon, 2001).
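As a minimal sketch of how these definitions are applied, the function below evaluates the three criteria from a maximized log-likelihood, the number of estimated parameters n and the sample size T, exactly as defined above. The function name and the numerical inputs are hypothetical values chosen for illustration only.

```python
import math


def information_criteria(loglik: float, n: int, T: int) -> dict:
    """Return AIC, HQ and SC as defined in the text.

    loglik is ln L (the maximized log-likelihood), n the number of
    estimated parameters and T the sample size.
    """
    if T < 3:
        raise ValueError("T must be at least 3 so that ln(ln(T)) is positive")
    if n < 0:
        raise ValueError("n must be non-negative")
    fit = -2.0 * loglik / T
    return {
        "AIC": fit + 2.0 * n / T,
        "HQ": fit + 2.0 * n * math.log(math.log(T)) / T,
        "SC": fit + n * math.log(T) / T,
    }


# Two hypothetical candidate models fitted to the same T = 100 observations.
small = information_criteria(loglik=-120.0, n=3, T=100)
large = information_criteria(loglik=-118.5, n=6, T=100)
for name in ("AIC", "HQ", "SC"):
    preferred = "small" if small[name] < large[name] else "large"
    print(name, round(small[name], 4), round(large[name], 4), preferred)
```

At T = 100 the penalty per parameter is smallest for AIC and largest for SC, so SC favours the more parsimonious model most strongly. The comparison says nothing about whether either candidate is congruent, which is the point made in the text.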

11.12.4 Testing theory models

Finally, conventional econometric testing of economic theories is often conducted without first ascertaining the congruence of the models used: the dangers of doing so are discussed in Hendry and Mizon (2000). For example, when the data are not carefully analyzed, but a prior-specified theory-model is simply imposed, unmodelled structural breaks can induce the opposite outcome to that which should be reported, namely accepting invalid theories and, conversely, rejecting valid ones. By protecting against such serious problems, PcGets may help `data mining' to become a compliment.

11.12.5 Other model-simplification approaches

There is probably an almost uncountable number of ways in which models could be selected from data evidence (or otherwise). The only other `rules' for model selection we consider here are stepwise regression (see e.g., Leamer, 1983a, for a critical appraisal), and `optimal' regression (see e.g., Coen, Gomme and Kendall, 1969, and the following discussion). As noted above, there are three key problems with stepwise regression. First, it does not check the congruence of the starting model, so cannot be sure the inference rules adopted have the operational characteristics believed of them (e.g., residual autocorrelation will distort estimated standard errors). Secondly, there are no checks on the congruence of reductions of the GUM, so again inference can become unreliable. Neither of these omissions is intrinsic to stepwise or `optimal' regression (the latter tries almost every combination of variables, so borders on an information-criterion approach), and such checks could be added to those approaches. However, the key problem with stepwise regression is that only one simplification path is explored, usually based on successively eliminating the least significant variable. Consequently, if a relevant variable is inadvertently eliminated early in the simplification, many others may be retained later to `proxy' its role, so the algorithm can get stuck and select far too large a model: Hoover and Perez (1999) found that this was an important problem in early versions of their algorithm for Gets, and hence implemented multiple-path searches.
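The single-path character of stepwise elimination can be made concrete with a minimal sketch, which repeatedly deletes the regressor with the smallest absolute t-ratio until every remaining t-ratio reaches a critical value. It is an illustration written for this discussion under simplifying assumptions (no intercept, a fixed critical value of 2, simulated data with a hypothetical data generation process), and is not the algorithm of PcGets or of Hoover and Perez (1999). All function and variable names are hypothetical. In line with the criticisms above, it performs no congruence checks and explores only one path.

```python
import math
import random


def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(a)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def ols_t_values(X, y):
    """OLS t-ratios for the columns of X (a list of rows), without an intercept."""
    T, k = len(X), len(X[0])
    xtx = [[sum(X[t][i] * X[t][j] for t in range(T)) for j in range(k)]
           for i in range(k)]
    xty = [sum(X[t][i] * y[t] for t in range(T)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [y[t] - sum(X[t][i] * beta[i] for i in range(k)) for t in range(T)]
    s2 = sum(e * e for e in resid) / (T - k)
    return [beta[i] / math.sqrt(s2 * inv[i][i]) for i in range(k)]


def backward_stepwise(X, y, names, crit=2.0):
    """Follow one path: drop the least significant regressor until all |t| >= crit."""
    keep = list(range(len(names)))
    while keep:
        sub = [[row[i] for i in keep] for row in X]
        tvals = ols_t_values(sub, y)
        j = min(range(len(keep)), key=lambda i: abs(tvals[i]))
        if abs(tvals[j]) >= crit:
            break
        del keep[j]  # once deleted, a variable is never reconsidered
    return [names[i] for i in keep]


# Simulated example: only x0 and x1 enter the data generation process.
random.seed(0)
T = 200
names = ["x0", "x1", "x2", "x3", "x4"]
X = [[random.gauss(0.0, 1.0) for _ in names] for _ in range(T)]
y = [1.0 * row[0] + 0.8 * row[1] + random.gauss(0.0, 1.0) for row in X]
selected = backward_stepwise(X, y, names)
print(selected)
```

Because a deleted variable is never reconsidered, an early mistaken deletion cannot be undone on this path, which is the motivation for the multiple-path searches mentioned above.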

11.12.6 Gets

General-to-simple approaches have a long pedigree: see inter alia, Anderson (1971), Sargan (1973, 1981), Mizon (1977a, 1977b), Hendry (1979), and White (1990). Important practical examples include COMFAC (see Sargan, 1980, and Hendry and Mizon, 1978), where the GUM must be sufficiently general to allow dynamic common factors to be detected; and cointegration (see e.g., Johansen, 1988), where the key feature is that all the inferences take place within the GUM (usually a vector autoregression). There is a very large related literature on testing multiple hypotheses (well reviewed by Savin, 1984), where the consequences of sequential tests are discussed, but most of it implicitly assumes a constant underlying stochastic process.

11.12.7 Non-nested hypothesis tests and encompassing

Model comparisons can be based on non-nested hypothesis tests or encompassing: see e.g., Cox (1961, 1962), Pesaran (1974), Deaton (1982), Kent (1986), and Vuong (1989) for the former, and Hendry and Richard (1982), Mizon (1984), Mizon and Richard (1986), and the survey in Hendry and Richard (1989) for the latter. However, as pointed out by Gourieroux and Monfort (1995), it is important that the models be congruent if inferences are to be valid: see in particular the discussion in Bontemps and Mizon (2001). Consequently, the role of this aspect in PcGets seems fully appropriate.

11.12.8 Bayesian model comparisons

Again, there are important overlaps in approach: for example, the Schwarz (1978) criterion (SC) is also known as the Bayesian information criterion (BIC). The most enthusiastic proponent of a Bayesian approach to model selection is Leamer (1978) (also see Leamer, 1983b), leading to his `practical' extreme-bounds analysis (see Leamer, 1983a, 1984, 1990). However, this in turn has been heavily criticized: see inter alia, McAleer, Pagan and Volker (1985), Breusch (1990), and Hendry and Mizon (1990). If there were a good empirical basis for `prior' information, it would seem sensible to use it; but since the claims in Leamer's writings are that previous empirical research is seriously flawed, it is difficult to see where such priors might originate. Thus, we conclude that Gets is the most useful of the available approaches.

References

Akaike, H. (1973). "Information theory and an extension of the maximum likelihood principle" In Petrov, B. N., and Csáki, F.(eds.), Second International Symposium on Information Theory. Budapest.

Akaike, H. (1985). "Prediction and entropy" In Atkinson, A. C., and Fienberg, S. E.(eds.), A Celebration of Statistics, pp. 1--24. New York: Springer-Verlag.

Akerlof, G. A. (1979). "Irving Fisher on his head: The consequences of constant target-threshold monitoring of money holdings" Quarterly Journal of Economics, 93, 169--188.

Amemiya, T. (1980). "Selection of regressors" International Economic Review, 21, 331--354.

Anderson, T. W. (1971). The Statistical Analysis of Time Series. New York: John Wiley & Sons.

Andrews, D. W. K. (1991). "Heteroskedasticity and autocorrelation consistent covariance matrix estimation" Econometrica, 59, 817--858.

Banerjee, A., Dolado, J. J., Galbraith, J. W., and Hendry, D. F. (1993). Co-integration, Error Correction and the Econometric Analysis of Non-Stationary Data. Oxford: Oxford University Press.

Bårdsen, G. (1989). "The estimation of long run coefficients from error correction models" Oxford Bulletin of Economics and Statistics, 50.

Bean, C. R. (1977). "More consumers' expenditure equations" Academic panel paper (77)35, H.M. Treasury, London.

Bean, C. R. (1978). "The determination of consumers' expenditure in the UK" Government economic service working paper 4, H.M. Treasury, London.

Bontemps, C., and Mizon, G. E. (2001). "Congruence and encompassing" In Stigum, B.(ed.), Studies in Economic Methodology. Cambridge, Mass.: MIT Press.

Boswijk, H. P. (1992). Cointegration, Identification and Exogeneity, Vol. 37 of Tinbergen Institute Research Series. Amsterdam: Thesis Publishers.

Bowman, K. O., and Shenton, L. R. (1975). "Omnibus test contours for departures from normality based on √b1 and b2" Biometrika, 62, 243--250.

Box, G. E. P., and Jenkins, G. M. (1976). Time Series Analysis, Forecasting and Control. San Francisco: Holden-Day. First published, 1970.

Box, G. E. P., and Pierce, D. A. (1970). "Distribution of residual autocorrelations in autoregressive-integrated moving average time series models" Journal of the American Statistical Association, 65, 1509--1526.

Breusch, T. S. (1990). "Simplified extreme bounds" in Granger 1990, pp. 72--81.

Brown, R. L., Durbin, J., and Evans, J. M. (1975). "Techniques for testing the constancy of regression relationships over time (with discussion)" Journal of the Royal Statistical Society B, 37, 149--192.

Burns, A. F., and Mitchell, W. C. (1946). Measuring Business Cycles. New York: NBER.

Campos, J., and Ericsson, N. R. (1999). "Constructive data mining: Modeling consumers' expenditure in Venezuela" Econometrics Journal, 2, 226--240.

Carruth, A., and Henley, A. (1990). "Can existing consumption functions forecast consumer spending in the late 1980s?" Oxford Bulletin of Economics and Statistics, 52, 211--222.

Chatfield, C. (1995). "Model uncertainty, data mining and statistical inference" Journal of the Royal Statistical Society, A, 158, 419--466. With discussion.

Chow, G. C. (1960). "Tests of equality between sets of coefficients in two linear regressions" Econometrica, 28, 591--605.

Chow, G. C. (1981). "Selection of econometric models by the information criteria" In Charatsis, E. G.(ed.), Proceedings of the Econometric Society European Meeting 1979, Ch. 8. Amsterdam: North-Holland.

Clayton, M. K., Geisser, S., and Jennings, D. E. (1986). "A comparison of several model selection procedures" In Goel, P., and Zellner, A.(eds.), Bayesian Inference and Decision Techniques: Elsevier Science.

Clements, M. P., and Hendry, D. F. (1998). Forecasting Economic Time Series. Cambridge: Cambridge University Press.

Clements, M. P., and Hendry, D. F. (1999a). Forecasting Non-stationary Economic Time Series. Cambridge, Mass.: MIT Press.

Clements, M. P., and Hendry, D. F. (1999b). "Modelling methodology and forecast failure" Unpublished typescript, Economics Department, University of Oxford.

Coen, P. G., Gomme, E. D., and Kendall, M. G. (1969). "Lagged relationships in economic forecasting" Journal of the Royal Statistical Society A, 132, 133--163.

Cox, D. R. (1961). "Tests of separate families of hypotheses" In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1, pp. 105--123 Berkeley: University of California Press.

Cox, D. R. (1962). "Further results on tests of separate families of hypotheses" Journal of the Royal Statistical Society B, 24, 406--424.

Cran, G. W., Martin, K. J., and Thomas, G. E. (1977). "A remark on algorithms. AS 63: The incomplete beta integral. AS 64: Inverse of the incomplete beta function ratio" Applied Statistics, 26, 111--112.

D'Agostino, R. B. (1970). "Transformation to normality of the null distribution of g1" Biometrika, 57, 679--681.

Davidson, J. E. H., and Hendry, D. F. (1981). "Interpreting econometric evidence: The behaviour of consumers' expenditure in the UK" European Economic Review, 16, 177--192. Reprinted in Hendry, D. F., op. cit., (1993) and (2000).

Davidson, J. E. H., Hendry, D. F., Srba, F., and Yeo, J. S. (1978). "Econometric modelling of the aggregate time-series relationship between consumers' expenditure and income in the United Kingdom" Economic Journal, 88, 661--692. Reprinted in Hendry, D. F., op. cit., (1993) and (2000).

Deaton, A. S. (1982). "Model selection procedures or, does the consumption function exist" In Chow, G. C., and Corsi, P.(eds.), Evaluating the Reliability of Macro-Economic Models, Ch. 5. New York: John Wiley.

Doan, T., Litterman, R., and Sims, C. A. (1984). "Forecasting and conditional projection using realistic prior distributions" Econometric Reviews, 3, 1--100.

Doornik, J. A. (1999). Object-Oriented Matrix Programming using Ox, 3rd ed. London: Timberlake Consultants Press.

Doornik, J. A., and Hansen, H. (1994). "A practical test for univariate and multivariate normality" Discussion paper, Nuffield College.

Doornik, J. A., and Hendry, D. F. (1996). GiveWin: An Interactive Empirical Modelling Program. London: Timberlake Consultants Press.

Doornik, J. A., and Hendry, D. F. (2001a). Econometric Modelling using PcGive 10, Volume II. London: Timberlake Consultants Press.

Doornik, J. A., and Hendry, D. F. (2001b). Interactive Monte Carlo Experimentation in Econometrics using PcNaive. London: Timberlake Consultants Press.

Engle, R. F. (1982a). "Autoregressive conditional heteroscedasticity, with estimates of the variance of United Kingdom inflation" Econometrica, 50, 987--1007.

Engle, R. F. (1982b). "Autoregressive conditional heteroskedasticity with estimates of the variance of UK inflation" Econometrica, 50, 987--1008.

Engle, R. F., Hendry, D. F., and Richard, J.-F. (1983). "Exogeneity" Econometrica, 51, 277--304. Reprinted in Hendry, D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers, 1993, and Oxford University Press, 2000; and in Ericsson, N. R. and Irons, J. S. (eds.) Testing Exogeneity, Oxford: Oxford University Press, 1994.

Engle, R. F., Hendry, D. F., and Trumbull, D. (1985). "Small sample properties of ARCH estimators and tests" Canadian Journal of Economics, 43, 66--93.

Engle, R. F., and White, H.(eds.)(1999). Cointegration, Causality and Forecasting. Oxford: Oxford University Press.

Ericsson, N. R. (1983). "Asymptotic properties of instrumental variables statistics for testing non-nested hypotheses" Review of Economic Studies, 50, 287--303.

Ericsson, N. R., Campos, J., and Tran, H.-A. (1990). "PC-GIVE and David Hendry's econometric methodology" Revista De Econometria, 10, 7--117.

Ericsson, N. R., Hendry, D. F., and Prestwich, K. M. (1998). "The demand for broad money in the United Kingdom, 1878--1993" Scandinavian Journal of Economics, 100, 289--324.

Ericsson, N. R., and Irons, J. S.(eds.)(1994). Testing Exogeneity. Oxford: Oxford University Press.

Faust, J., and Whiteman, C. H. (1997). "General-to-specific procedures for fitting a data-admissible, theory-inspired, congruent, parsimonious, encompassing, weakly-exogenous, identified, structural model of the DGP: A translation and critique" Carnegie--Rochester Conference Series on Public Policy, 47, 121--161.

Friedman, M., and Schwartz, A. J. (1982). Monetary Trends in the United States and the United Kingdom: Their Relation to Income, Prices, and Interest Rates, 1867--1975. Chicago: University of Chicago Press.

Frisch, R., and Waugh, F. V. (1933). "Partial time regression as compared with individual trends" Econometrica, 1, 221--223.

Gilbert, C. L. (1986). "Professor Hendry's econometric methodology" Oxford Bulletin of Economics and Statistics, 48, 283--307. Reprinted in Granger, C. W. J. (ed.) (1990), Modelling Economic Series. Oxford: Clarendon Press.

Godfrey, L. G. (1978). "Testing for higher order serial correlation in regression equations when the regressors include lagged dependent variables" Econometrica, 46, 1303--1313.

Godfrey, L. G., and Orme, C. D. (1994). "The sensitivity of some general checks to omitted variables in the linear model" International Economic Review, 35, 489--506.

Godfrey, L. G., and Veale, M. R. (1999). "Alternative approaches to testing by variable addition" Mimeo, York University, UK.

Gourieroux, C., and Monfort, A. (1995). "Testing, encompassing, and simulating dynamic econometric models" Econometric Theory, 11, 195--228.

Granger, C. W. J. (1969). "Investigating causal relations by econometric models and cross-spectral methods" Econometrica, 37, 424--438.

Granger, C. W. J.(ed.)(1990). Modelling Economic Series. Oxford: Clarendon Press.

Haavelmo, T. (1944). "The probability approach in econometrics" Econometrica, 12, 1--118. Supplement.

Hannan, E. J., and Quinn, B. G. (1979). "The determination of the order of an autoregression" Journal of the Royal Statistical Society, B, 41, 190--195.

Harris, R. I. D. (1995). Using Cointegration Analysis in Econometric Modelling. London: Prentice Hall.

Harvey, A. C. (1981). The Econometric Analysis of Time Series. Deddington: Philip Allan.

Harvey, A. C. (1990). The Econometric Analysis of Time Series, 2nd ed. Hemel Hempstead: Philip Allan.

Hendry, D. F. (1976). "The structure of simultaneous equations estimators" Journal of Econometrics, 4, 51--88. Reprinted in Hendry, D. F., op. cit., (1993) and (2000).

Hendry, D. F. (1979). "Predictive failure and econometric modelling in macro-economics: The transactions demand for money" In Ormerod, P.(ed.), Economic Modelling, pp. 217--242. London: Heinemann. Reprinted in Hendry, D. F., op. cit., (1993) and (2000).

Hendry, D. F. (1980). "Econometrics: Alchemy or science?" Economica, 47, 387--406. Reprinted in Hendry, D. F., Econometrics: Alchemy or Science? Oxford: Blackwell Publishers, 1993, and Oxford University Press, 2000.

Hendry, D. F. (1985). "Monetary economic myth and econometric reality" Oxford Review of Economic Policy, 1, 72--84. Reprinted in Hendry, D. F., op. cit., (1993) and (2000).

Hendry, D. F. (1994). "HUS revisited" Oxford Review of Economic Policy, 10, 86--106.

Hendry, D. F. (1995a). Dynamic Econometrics. Oxford: Oxford University Press.

Hendry, D. F. (1995b). "Econometrics and business cycle empirics" Economic Journal, 105, 1622--1636.

Hendry, D. F. (1996). "On the constancy of time-series econometric equations" Economic and Social Review, 27, 401--422.

Hendry, D. F. (1997). "On congruent econometric relations: A comment" Carnegie--Rochester Conference Series on Public Policy, 47, 163--190.

Hendry, D. F. (1999). "An econometric analysis of US food expenditure, 1931--1989" in Magnus, and Morgan 1999, pp. 341--361.

Hendry, D. F. (2000a). Econometrics: Alchemy or Science? Oxford: Oxford University Press. New Edition.

Hendry, D. F. (2000b). "Epilogue: The success of general-to-specific model selection" In Econometrics: Alchemy or Science?, pp. 467--490. Oxford: Oxford University Press. New Edition.

Hendry, D. F., and Doornik, J. A. (1994). "Modelling linear dynamic econometric systems" Scottish Journal of Political Economy, 41, 1--33.

Hendry, D. F., and Doornik, J. A. (1999). "The impact of computational tools on time-series econometrics" In Coppock, T.(ed.), Information Technology and Scholarship, pp. 257--269. Oxford: Oxford University Press.

Hendry, D. F., and Doornik, J. A. (2001). Econometric Modelling using PcGive 10: Volume I. London: Timberlake Consultants Press.

Hendry, D. F., and Ericsson, N. R. (1991a). "An econometric analysis of UK money demand in `Monetary Trends in the United States and the United Kingdom' by Milton Friedman and Anna J. Schwartz" American Economic Review, 81, 8--38.

Hendry, D. F., and Ericsson, N. R. (1991b). "Modeling the demand for narrow money in the United Kingdom and the United States" European Economic Review, 35, 833--886.

Hendry, D. F., and Krolzig, H.-M. (1999a). "General-to-specific model selection using PcGets for Ox" Unpublished paper, Economics Department, Oxford University.

Hendry, D. F., and Krolzig, H.-M. (1999b). "Improving on `Data mining reconsidered' by K.D. Hoover and S.J. Perez" Econometrics Journal, 2, 202--219.

Hendry, D. F., and Krolzig, H.-M. (2000). "The econometrics of general-to-simple modelling" Mimeo, Economics Department, Oxford University.

Hendry, D. F., Leamer, E. E., and Poirier, D. J. (1990). "A conversation on econometric methodology" Econometric Theory, 6, 171--261.

Hendry, D. F., and Mizon, G. E. (1978). "Serial correlation as a convenient simplification, not a nuisance: A comment on a study of the demand for money by the Bank of England" Economic Journal, 88, 549--563. Reprinted in Hendry, D. F., op. cit., (1993) and (2000).

Hendry, D. F., and Mizon, G. E. (1990). "Procrustean econometrics: or stretching and squeezing data" in Granger 1990, pp. 121--136.

Hendry, D. F., and Mizon, G. E. (1993). "Evaluating dynamic econometric models by encompassing the VAR" In Phillips, P. C. B.(ed.), Models, Methods and Applications of Econometrics, pp. 272--300. Oxford: Basil Blackwell.

Hendry, D. F., and Mizon, G. E. (1999). "The pervasiveness of Granger causality in econometrics" in Engle, and White 1999.

Hendry, D. F., and Mizon, G. E. (2000). "Reformulating empirical macro-econometric modelling" Oxford Review of Economic Policy, 16, 138--159.

Hendry, D. F., and Morgan, M. S. (1995). The Foundations of Econometric Analysis. Cambridge: Cambridge University Press.

Hendry, D. F., Muellbauer, J. N. J., and Murphy, T. A. (1990). "The econometrics of DHSY" In Hey, J. D., and Winch, D.(eds.), A Century of Economics, pp. 298--334. Oxford: Basil Blackwell.

Hendry, D. F., and Neale, A. J. (1987). "Monte Carlo experimentation using PC-NAIVE" In Fomby, T., and Rhodes, G. F.(eds.), Advances in Econometrics, Vol. 6, pp. 91--125. Greenwich, Connecticut: Jai Press Inc.

Hendry, D. F., and Richard, J.-F. (1982). "On the formulation of empirical models in dynamic econometrics" Journal of Econometrics, 20, 3--33. Reprinted in Granger, C. W. J. (ed.) (1990), Modelling Economic Series. Oxford: Clarendon Press and in Hendry D. F., op. cit., (1993) and (2000).

Hendry, D. F., and Richard, J.-F. (1989). "Recent developments in the theory of encompassing" In Cornet, B., and Tulkens, H.(eds.), Contributions to Operations Research and Economics. The XXth Anniversary of CORE, pp. 393--440. Cambridge, MA: MIT Press.

Hendry, D. F., and von Ungern-Sternberg, T. (1981). "Liquidity and inflation effects on consumers' expenditure" In Deaton, A. S.(ed.), Essays in the Theory and Measurement of Consumers' Behaviour, pp. 237--261. Cambridge: Cambridge University Press. Reprinted in Hendry, D. F., op. cit., (1993) and (2000).

Hendry, D. F., and Wallis, K. F.(eds.)(1984). Econometrics and Quantitative Economics. Oxford: Basil Blackwell.

Hoover, K. D., and Perez, S. J. (1999). "Data mining reconsidered: Encompassing and the general-to-specific approach to specification search" Econometrics Journal, 2, 167--191.

Jarque, C. M., and Bera, A. K. (1980). "Efficient tests for normality, homoscedasticity and serial independence of regression residuals" Economics Letters, 6, 255--259.

Johansen, S. (1988). "Statistical analysis of cointegration vectors" Journal of Economic Dynamics and Control, 12, 231--254. Reprinted in R.F. Engle and C.W.J. Granger (eds), Long-Run Economic Relationships, Oxford: Oxford University Press, 1991, 131--52.

Johansen, S. (1992). "Testing weak exogeneity and the order of cointegration in UK money demand" Journal of Policy Modeling, 14, 313--334.

Judge, G. G., and Bock, M. E. (1978). The Statistical Implications of Pre-Test and Stein-Rule Estimators in Econometrics. Amsterdam: North Holland Publishing Company.

Judge, G. G., Griffiths, W. E., Hill, R. C., Lütkepohl, H., and Lee, T.-C. (1985). The Theory and Practice of Econometrics, 2nd ed. New York: John Wiley.

Kent, J. T. (1986). "The underlying nature of nonnested hypothesis tests" Biometrika, 73, 333--343.

Keynes, J. M. (1939). "Professor Tinbergen's method" Economic Journal, 44, 558--568.

Keynes, J. M. (1940). "Comment" Economic Journal, 50, 154--156.

Kiviet, J. F. (1985). "Model selection test procedures in a single linear equation of a dynamic simultaneous system and their defects in small samples" Journal of Econometrics, 28, 327--362.

Kiviet, J. F. (1986). "On the rigor of some mis-specification tests for modelling dynamic relationships" Review of Economic Studies, 53, 241--261.

Koopmans, T. C. (1947). "Measurement without theory" Review of Economics and Statistics, 29, 161--179.

Koopmans, T. C., Rubin, H., and Leipnik, R. B. (1950). "Measuring the equation systems of dynamic economics" In Koopmans, T. C.(ed.), Statistical Inference in Dynamic Economic Models, No. 10 in Cowles Commission Monograph, Ch. 2. New York: John Wiley & Sons.

Krolzig, H.-M. (2001). "General-to-specific reductions of vector autoregressive processes" Economics discussion paper 2000-W34, Nuffield College, Oxford.

Krolzig, H.-M., and Hendry, D. F. (2001). "Computer automation of general-to-specific model selection procedures" Journal of Economic Dynamics and Control, 25, 831--866.

Leamer, E. E. (1978). Specification Searches. Ad-Hoc Inference with Non-Experimental Data. New York: John Wiley.

Leamer, E. E. (1983a). "Let's take the con out of econometrics" American Economic Review, 73, 31--43. Reprinted in Granger, C. W. J. (ed.) (1990), Modelling Economic Series. Oxford: Clarendon Press.

Leamer, E. E. (1983b). "Model choice and specification analysis" In Griliches, Z., and Intriligator, M. D.(eds.), Handbook of Econometrics, Vol. 1, Ch. 5. Amsterdam: North-Holland.

Leamer, E. E. (1984). "Global sensitivity results for generalized least squares estimates" Journal of the American Statistical Association, 79, 867--870.

Leamer, E. E. (1990). "Sensitivity analyses would help" in Granger 1990, pp. 88--96.

Ljung, G. M., and Box, G. E. P. (1978). "On a measure of lack of fit in time series models" Biometrika, 65, 297--303.

Lovell, M. C. (1983). "Data mining" Review of Economics and Statistics, 65, 1--12.

Lütkepohl, H. (1991). Introduction to Multiple Time Series Analysis. Berlin: Springer.

Magnus, J. R., and Morgan, M. S.(eds.)(1999). Methodology and Tacit Knowledge: Two Experiments in Econometrics. Chichester: John Wiley and Sons.

Majumder, K. L., and Bhattacharjee, G. P. (1973a). "Algorithm AS 63. The incomplete beta integral" Applied Statistics, 22, 409--411.

Majumder, K. L., and Bhattacharjee, G. P. (1973b). "Algorithm AS 64. Inverse of the incomplete beta function ratio" Applied Statistics, 22, 411--414.

Mayo, D. (1981). "Testing statistical testing" In Pitt, J. C.(ed.), Philosophy in Economics, pp. 175--230: D. Reidel Publishing Co. Reprinted as pp. 45--73 in Caldwell B. J. (1993), The Philosophy and Methodology of Economics, Vol. 2, Aldershot: Edward Elgar.

McAleer, M., Pagan, A. R., and Volker, P. A. (1985). "What will take the con out of econometrics?" American Economic Review, 75, 293--301. Reprinted in Granger, C. W. J. (ed.) (1990), Modelling Economic Series. Oxford: Clarendon Press.

Mizon, G. E. (1977a). "Inferential procedures in nonlinear models: An application in a UK industrial cross section study of factor substitution and returns to scale" Econometrica, 45, 1221--1242.

Mizon, G. E. (1977b). "Model selection procedures" In Artis, M. J., and Nobay, A. R.(eds.), Studies in Modern Economic Analysis, pp. 97--120. Oxford: Basil Blackwell.

Mizon, G. E. (1984). "The encompassing approach in econometrics" in Hendry, and Wallis 1984, pp. 135--172.

Mizon, G. E. (1995). "Progressive modelling of macroeconomic time series: the LSE methodology" In Hoover, K. D.(ed.), Macroeconometrics: Developments, Tensions and Prospects, pp. 107--169. Dordrecht: Kluwer Academic Press.

Mizon, G. E., and Richard, J.-F. (1986). "The encompassing principle and its application to non-nested hypothesis tests" Econometrica, 54, 657--678.

Moore, H. L. (1914). Economic Cycles -- Their Law and Cause. New York: MacMillan.

Muellbauer, J. N. J. (1994). "The assessment: Consumer expenditure" Oxford Review of Economic Policy, 10, 1--41.

Nicholls, D. F., and Pagan, A. R. (1983). "Heteroscedasticity in models with lagged dependent variables" Econometrica, 51, 1233--1242.

Pagan, A. R. (1984). "Model evaluation by variable addition" in Hendry, and Wallis 1984, pp. 103--135.

Pagan, A. R. (1987). "Three econometric methodologies: A critical appraisal" Journal of Economic Surveys, 1, 3--24. Reprinted in Granger, C. W. J. (ed.) (1990), Modelling Economic Series. Oxford: Clarendon Press.

Paruolo, P. (1996). "On the determination of integration indices in I(2) systems" Journal of Econometrics, 72, 313--356.

Pesaran, M. H. (1974). "On the general problem of model selection" Review of Economic Studies, 41, 153--171.

Pesaran, M. H.(ed.)(1987). The Limits of Rational Expectations. Oxford: Basil Blackwell.

Pike, M. C., and Hill, I. D. (1966). "Logarithm of the gamma function" Communications of the ACM, 9, 684.

Rahbek, A., Kongsted, H. C., and Jørgensen, C. (1999). "Trend-stationarity in the I(2) cointegration model" Journal of Econometrics, 90, 265--289.

Sargan, J. D. (1964). "Wages and prices in the United Kingdom: A study in econometric methodology (with discussion)" In Hart, P. E., Mills, G., and Whitaker, J. K.(eds.), Econometric Analysis for National Economic Planning, Vol. 16 of Colston Papers, pp. 25--63. London: Butterworth Co. Reprinted as pp. 275--314 in Hendry D. F. and Wallis K. F. (eds.) (1984). Econometrics and Quantitative Economics. Oxford: Basil Blackwell, and as pp. 124--169 in Sargan J. D. (1988), Contributions to Econometrics, Vol. 1, Cambridge: Cambridge University Press.

Sargan, J. D. (1973). "Model building and data mining" Discussion paper, London School of Economics. Presented to the Association of University Teachers of Economics, Meeting, Manchester, April 1973.

Sargan, J. D. (1980). "Some tests of dynamic specification for a single equation" Econometrica, 48, 879--897. Reprinted as pp. 191--212 in Sargan J. D. (1988), Contributions to Econometrics, Vol. 1, Cambridge: Cambridge University Press.

Sargan, J. D. (1981). "The choice between sets of regressors" Mimeo, Economics Department, London School of Economics.

Savin, N. E. (1984). "Multiple hypothesis testing" In Griliches, Z., and Intriligator, M. D.(eds.), Handbook of Econometrics, Vol. 2--3, Ch. 14. Amsterdam: North-Holland.

Sawa, T. (1978). "Information criteria for discriminating among alternative regression models" Econometrica, 46, 1273--1292.

Schwarz, G. (1978). "Estimating the dimension of a model" Annals of Statistics, 6, 461--464.

Shea, B. L. (1988). "Algorithm AS 239: Chi-squared and incomplete gamma integral" Applied Statistics, 37, 466--473.

Shenton, L. R., and Bowman, K. O. (1977). "A bivariate model for the distribution of √b1 and b2" Journal of the American Statistical Association, 72, 206--211.

Shibata, R. (1980). "Asymptotically efficient selection of the order of the model for estimating parameters of a linear process" Annals of Statistics, 8, 147--164.

Sims, C. A. (1980). "Macroeconomics and reality" Econometrica, 48, 1--48. Reprinted in Granger, C. W. J. (ed.) (1990), Modelling Economic Series. Oxford: Clarendon Press.

Sims, C. A., Stock, J. H., and Watson, M. W. (1990). "Inference in linear time series models with some unit roots" Econometrica, 58, 113--144.

Smith, G. W. (1986). "A dynamic Baumol-Tobin model of money demand" Review of Economic Studies, 53, 465--469.

Spanos, A. (1989). "On re-reading Haavelmo: A retrospective view of econometric modeling" Econometric Theory, 5, 405--429.

Sullivan, R., Timmermann, A., and White, H. (1998). "Dangers of data-driven inference: The case of calendar effects in stock returns" Mimeo, Economics Department, University of California at San Diego.

Summers, L. H. (1991). "The scientific illusion in empirical macroeconomics" Scandinavian Journal of Economics, 93, 129--148.

Teräsvirta, T. (1976). "Effect of feedback on the distribution of the portmanteau statistic" Manuscript, London School of Economics.

Theil, H. (1971). Principles of Econometrics. London: John Wiley.

Tinbergen, J. (1940a). Statistical Testing of Business-Cycle Theories. Geneva: League of Nations. Vol. I: A Method and its application to Investment Activity.

Tinbergen, J. (1940b). Statistical Testing of Business-Cycle Theories. Geneva: League of Nations. Vol. II: Business Cycles in the United States of America, 1919--1932.

Vuong, Q. H. (1989). "Likelihood ratio tests for model selection and nonnested hypotheses" Econometrica, 57, 307--333.

White, H. (1980). "A heteroskedastic-consistent covariance matrix estimator and a direct test for heteroskedasticity" Econometrica, 48, 817--838.

White, H. (1984). Asymptotic Theory for Econometricians. London: Academic Press.

White, H. (1990). "A consistent model selection" in Granger 1990, pp. 369--383.

Wooldridge, J. M. (1999). "Asymptotic properties of some specification tests in linear models with integrated processes" in Engle, and White 1999, pp. 366--384.

Wright, P. G. (1915). "Moore's economic cycles" Quarterly Journal of Economics, 29, 631--641.

Yancey, T. A., and Judge, G. G. (1976). "A Monte Carlo comparison of traditional and stein-rule estimators under squared error loss" Journal of Econometrics, 4, 285--294.