cov.rob(x, cor=FALSE, quantile.used=floor((n + p + 1)/2),
        method=c("mve", "mcd", "classical"), nsamp="best", seed)
cov.mve(x, cor=FALSE, quantile.used=floor((n + p + 1)/2), nsamp="best", seed)
cov.mcd(x, cor=FALSE, quantile.used=floor((n + p + 1)/2), nsamp="best", seed)
x
| a matrix or data frame. |
cor
| should the returned result include a correlation matrix? |
quantile.used
|
the minimum number of the data points regarded as good points.
|
method
|
the method to be used: minimum volume ellipsoid, minimum
covariance determinant or classical product-moment. Using
cov.mve or cov.mcd forces "mve" or "mcd"
respectively.
|
nsamp
|
the number of samples, or "best", "exact" or "sample".
If "sample", the number chosen is min(5*p, 3000), taken
from Rousseeuw and Hubert (1997). If "best", exhaustive
enumeration is done up to 5000 samples; if "exact",
exhaustive enumeration will be attempted however many samples are needed.
|
seed
|
the seed to be used for random sampling: see RNGkind. The
current value of .Random.seed will be preserved if it is set.
|
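As a small illustration of the default for quantile.used, the sketch below evaluates floor((n + p + 1)/2) for the stackloss data used in the examples. It uses base R only; the names x, n, p and h are chosen here for illustration and are not part of the function's interface.

```r
# Default number of points regarded as good: floor((n + p + 1)/2).
x <- as.matrix(stackloss)    # built-in data set: 21 observations, 4 variables
n <- nrow(x)                 # number of observations
p <- ncol(x)                 # number of variables
h <- floor((n + p + 1) / 2)  # default value of quantile.used
h
```

With n = 21 and p = 4 this gives 13, so just over half of the observations are treated as good by default.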
These functions estimate the location and scatter of the "good"
part of the data. cov.mve and cov.mcd are compatibility wrappers.
For method "mve", an approximate search is made of a subset of
size quantile.used with an enclosing ellipsoid of smallest volume; in
method "mcd" it is the volume of the Gaussian confidence
ellipsoid, equivalently the determinant of the classical covariance
matrix, that is minimized. The mean of the subset provides a first
estimate of the location, and the rescaled covariance matrix a first
estimate of scatter. The Mahalanobis distances of all the points from
the location estimate for this covariance matrix are calculated, and
those points within the 97.5% point under Gaussian assumptions are
declared to be good. The final estimates are the mean and rescaled
covariance of the good points.
The rescaling is by the appropriate percentile under Gaussian data; in addition the first covariance matrix has an ad hoc finite-sample correction given by Marazzi.
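The reweighting step described above can be sketched in base R. This is an illustration only, not the code used by cov.rob: the first estimate here is simply the classical mean and covariance (a stand-in assumption), the 97.5% point is taken from the chi-squared distribution with p degrees of freedom (the usual Gaussian reference for squared Mahalanobis distances), and both the rescaling and the finite-sample correction are omitted.

```r
x <- as.matrix(stackloss)
p <- ncol(x)

# Stand-in first estimate (assumption): classical mean and covariance.
# cov.rob would start from the rescaled estimate of the best subset instead.
loc0 <- colMeans(x)
cov0 <- cov(x)

# Squared Mahalanobis distances of all points from the first estimate.
d2 <- mahalanobis(x, center = loc0, cov = cov0)

# Points within the 97.5% point under Gaussian assumptions are "good".
good <- d2 <= qchisq(0.975, df = p)

# Final estimates (before any rescaling): mean and covariance of good points.
loc1 <- colMeans(x[good, , drop = FALSE])
cov1 <- cov(x[good, , drop = FALSE])
```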
For method "mve" the search is made over ellipsoids determined
by the covariance matrix of p of the data points. For method
"mcd" an additional improvement step suggested by Rousseeuw and
van Driessen (1997) is used, in which once a subset of size
quantile.used is selected, an ellipsoid based on its covariance
is tested (as this will have no larger a determinant, and may be smaller).
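The idea behind the improvement step can be illustrated with a single concentration step: fit the mean and covariance of the current subset, then keep the quantile.used points closest to that fit. The sketch below is a base R illustration under that reading, not the internal code of cov.mcd; the helper name improve and the random starting subset are hypothetical.

```r
x <- as.matrix(stackloss)
n <- nrow(x)
p <- ncol(x)
h <- floor((n + p + 1) / 2)

# One improvement step (hypothetical helper): refit on the current subset and
# return the h points with the smallest Mahalanobis distances to that fit.
improve <- function(x, subset, h) {
  xs <- x[subset, , drop = FALSE]
  d2 <- mahalanobis(x, colMeans(xs), cov(xs))
  sort(order(d2)[seq_len(h)])
}

set.seed(1)
s0 <- sort(sample(n, h))   # arbitrary starting subset of size h
s1 <- improve(x, s0, h)    # subset after one improvement step

# Covariance determinants before and after the step.
det0 <- det(cov(x[s0, , drop = FALSE]))
det1 <- det(cov(x[s1, , drop = FALSE]))
c(before = det0, after = det1)
```

The determinant after the step should be no larger than before, which is why the new subset is worth testing against the current best.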
center
| the final estimate of location. |
cov
| the final estimate of scatter. |
cor
|
(only if cor=TRUE) the estimate of the correlation
matrix.
|
sing
| message giving number of singular samples out of total |
crit
| the value of the criterion on log scale. For MCD this is the determinant, and for MVE it is proportional to the volume. |
best
|
the subset used. For MVE the best sample, for MCD the best
set of size quantile.used.
|
n.obs
| total number of observations. |
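A short sketch of how the returned components might be inspected. It assumes the function is available from the MASS package, as in current R distributions (this page itself does not name the package), and it seeds the generator with set.seed rather than the seed argument.

```r
library(MASS)   # assumption: cov.rob is provided by MASS in current R

set.seed(123)   # make the random subsampling reproducible
fit <- cov.rob(stackloss, method = "mcd", cor = TRUE)

fit$center      # final estimate of location
fit$cov         # final estimate of scatter
fit$cor         # correlation matrix, present because cor = TRUE
fit$best        # indices of the subset used
fit$n.obs       # total number of observations
```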
A. Marazzi (1993) Algorithms, Routines and S Functions for Robust Statistics. Wadsworth and Brooks/Cole.
P. J. Rousseeuw and B. C. van Zomeren (1990) Unmasking multivariate outliers and leverage points, Journal of the American Statistical Association, 85, 633-639.
P. J. Rousseeuw and K. van Driessen (1997) A fast algorithm for the minimum covariance determinant estimator. Technical Report, Department of Mathematics and Computer Science, Universitaire Instelling Antwerpen.
lqs
data(stackloss)
set.seed(123)  # replaces .Random.seed <- 1:4, which is not a valid seed in current R
cov.rob(stackloss)
cov.rob(stack.x, method="mcd", nsamp="exact")