Variable Selection: Introduction
Sometimes a large number of independent variables, X_{i},
is available for a given modeling problem, and not all of these predictor
variables may contribute equally well to the explanation of the predicted
variable Y. Some of the independent variables may not contribute at all
to the model. Thus we have to select from these variables to obtain a model
which contains as few variables as possible while still being the "best"
model. In principle, all possible combinations of independent variables
should be tried in order to find a suitable model. With p candidate variables
there are 2^{p} - 1 non-empty subsets, so this can turn out to
be a formidable task, even if high-performance computers are available.
Besides the practicability of this approach, there are also several theoretical
considerations which should be taken into account:

trying all possible combinations may lead to chance
correlations

the contribution of a single variable to the explanation of Y may not easily
be assessed if only a small number of observations is available

a simple criterion, like the goodness of fit, r^{2}, may lead to
wrong conclusions if the number of selected variables approaches the number
of observations

for more complicated models (e.g. artificial neural networks) the calculation
of a single model may be so time-consuming that it is practically impossible
to find the "best" combination of independent variables

the selection of combinations is guided by the available data; thus the
resulting final selection reflects the "best" model for the given data
set, and not the "best" subset for the population

some of the selection methods are specifically tailored to linear (regression)
models; they are unusable with nonlinear methods such as neural networks.
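
The point about chance correlations in the list above can be made concrete with a small simulation. The following is only a sketch under stated assumptions: the response and all candidate variables are independent random noise, and the sample size (15), the number of candidates (200), the seed, and the helper pearson_r are chosen purely for illustration. Because nothing is truly related here, any sizeable correlation found by picking the best-fitting candidate is a product of the selection itself.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)                      # illustrative seed, any value will do
n_obs, n_candidates = 15, 200       # few observations, many candidates

# Response and candidate predictors are independent noise,
# so the true correlation between Y and every X_i is zero.
y = [random.gauss(0, 1) for _ in range(n_obs)]
candidates = [[random.gauss(0, 1) for _ in range(n_obs)]
              for _ in range(n_candidates)]

# "Select" the candidate that fits the available data best.
r_values = [pearson_r(x, y) for x in candidates]
best = max(range(n_candidates), key=lambda i: abs(r_values[i]))
r_best = r_values[best]

# Compare the selected variable with a typical (median) candidate.
median_abs_r = sorted(abs(r) for r in r_values)[n_candidates // 2]

print(f"largest |r| among {n_candidates} noise variables: {abs(r_best):.2f}")
print(f"median  |r| among {n_candidates} noise variables: {median_abs_r:.2f}")
```

With so few observations the largest |r| is typically far above the median, although no candidate has any real relationship to Y. This is why a selection that looks "best" for the given data set need not be the best subset for the population.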

Depending on the type of model being used, there are several strategies
to (partially) solve the problem:
Using all possible subsets of variables:
Stepwise procedures:
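
To give a feeling for why the first strategy, trying all possible subsets, quickly becomes impractical, the following sketch counts the candidate models. The variable names X1 to X4 and the helper count_subsets are hypothetical and serve only as an illustration; the count itself follows from the fact that each of p variables is either included or left out, with the empty subset excluded.

```python
from itertools import combinations

def count_subsets(p):
    """Number of non-empty subsets of p candidate variables: 2^p - 1."""
    return 2 ** p - 1

# Explicit enumeration for a small, hypothetical set of four variables.
variables = ["X1", "X2", "X3", "X4"]
subsets = [combo
           for k in range(1, len(variables) + 1)
           for combo in combinations(variables, k)]
print(len(subsets), count_subsets(len(variables)))   # 15 15

# The count doubles with every additional candidate variable.
for p in (10, 20, 30):
    print(p, count_subsets(p))   # 1023, 1048575, 1073741823
```

Assuming, for the sake of argument, that fitting one model takes a millisecond, 30 candidate variables would already require roughly twelve days of computing time for an exhaustive search.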
