R: Kernel Density Estimation

Kernel Density Estimation

Usage

density(x, bw, adjust = 1, kernel="gaussian", window = kernel,
        n = 512, width, from, to, cut = 3, na.rm = FALSE)
print(dobj)
plot(dobj, main = NULL, xlab = NULL, ylab = "Density", type = "l",
     zero.line = TRUE, ...)

Arguments

`x`	the data from which the estimate is to be computed.
`n`	the number of equally spaced points at which the density is to be estimated. When `n > 512`, it is rounded up to the next power of 2 for efficiency reasons (`fft`).
`kernel,window`	a character string giving the smoothing kernel to be used. This must be one of `"gaussian"`, `"rectangular"`, `"triangular"`, or `"cosine"`, and may be abbreviated to a single letter.
`bw`	the smoothing bandwidth to be used. This is the standard deviation of the smoothing kernel. It defaults to 0.9 times the minimum of the standard deviation and the interquartile range divided by 1.34, multiplied by the sample size to the negative one-fifth power, i.e. `0.9 * min(sd, IQR/1.34) * n^(-1/5)` (Silverman's "rule of thumb"). The specified value of `bw` is multiplied by `adjust`.
`adjust`	the bandwidth actually used is `adjust*bw`. This makes it easy to specify values like "half the default" bandwidth.
`width`	this exists for compatibility with S.
`from,to`	the left and right-most points of the grid at which the density is to be estimated.
`cut`	by default, the values of `from` and `to` are `cut` bandwidths beyond the extremes of the data. This allows the estimated density to drop to approximately zero at the extremes.
`na.rm`	logical; if `TRUE`, missing values are removed from `x` before any further computation.
`dobj`	a "density" object.
`main, xlab, ylab, type`	plotting parameters with useful defaults.
`...`	further plotting parameters.
`zero.line`	logical; if `TRUE`, add a base line at y = 0.
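
The interplay of `bw`, `adjust`, and `cut` described above can be checked by hand. The following is a minimal sketch, assuming that the default bandwidth follows Silverman's rule exactly as worded above and that the `bw` component of the result already includes `adjust`. The names `rule_bw`, `d_default`, and `d_half` are illustrative only.

```r
data(faithful)
x <- faithful$eruptions
n_obs <- length(x)

# Silverman's rule of thumb as worded above (assumed to match the default)
rule_bw <- 0.9 * min(sd(x), IQR(x) / 1.34) * n_obs^(-1/5)

d_default <- density(x)                 # default bandwidth, adjust = 1
d_half    <- density(x, adjust = 0.5)   # "half the default" bandwidth

# Compare the bandwidths actually used with the hand-computed values
bw_matches_rule <- isTRUE(all.equal(d_default$bw, rule_bw))
adjust_scales   <- isTRUE(all.equal(d_half$bw, 0.5 * d_default$bw))

# With cut = 3, the grid should extend 3 bandwidths beyond the data range
grid_matches_cut <- isTRUE(all.equal(range(d_default$x),
                                     range(x) + c(-3, 3) * d_default$bw))
```

If the assumptions hold for the installed version of R, all three logical flags should come out `TRUE`.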

Description

The function `density` computes kernel density estimates with the given kernel and bandwidth.

The generic functions `plot` and `print` have methods for density objects.

Details

The algorithm used in `density` disperses the mass of the empirical distribution function over a regular grid of at least 512 points. It then uses the fast Fourier transform to convolve this approximation with a discretized version of the kernel, and finally uses linear approximation to evaluate the density at the specified points.
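
The three steps above (binning, FFT convolution, linear interpolation) can be sketched in a few lines of R. This is an illustration under stated assumptions, not the internal code of `density`: the grid size `N`, the padding of 7 bandwidths, the linear binning scheme, and all variable names are choices made here for the example. The result is compared against a direct summation of Gaussian kernels.

```r
data(faithful)
x  <- faithful$eruptions
bw <- 0.15      # standard deviation of the Gaussian kernel
N  <- 1024      # working grid size (assumed for this sketch)

# Working grid, padded by 7 bandwidths so circular wrap-around is negligible
lo    <- min(x) - 7 * bw
up    <- max(x) + 7 * bw
grid  <- seq(lo, up, length.out = N)
delta <- grid[2] - grid[1]

# Linear binning: split each observation's mass between its two
# neighbouring grid points
pos  <- (x - lo) / delta    # fractional, 0-based grid position
left <- floor(pos)
frac <- pos - left
mass <- numeric(N)
for (i in seq_along(x)) {
  mass[left[i] + 1] <- mass[left[i] + 1] + (1 - frac[i])
  mass[left[i] + 2] <- mass[left[i] + 2] + frac[i]
}
mass <- mass / length(x)

# Discretized kernel at lags 0, 1, ..., N/2, -(N/2 - 1), ..., -1 (wrapped order)
lags <- c(0:(N / 2), -(N / 2 - 1):-1) * delta
kern <- dnorm(lags, sd = bw)

# Circular convolution via FFT; R's inverse fft is unnormalized, so divide by N
smooth <- Re(fft(fft(mass) * fft(kern), inverse = TRUE)) / N

# Linear interpolation to the requested evaluation points
out     <- seq(min(x) - 3 * bw, max(x) + 3 * bw, length.out = 512)
fft_est <- approx(grid, smooth, xout = out)$y

# Reference: direct summation of Gaussian kernels at the same points
direct  <- sapply(out, function(t) mean(dnorm(t, mean = x, sd = bw)))
max_err <- max(abs(fft_est - direct))
```

The direct summation costs one kernel evaluation per pair of data point and output point, whereas the binned version costs a pass over the data plus a few FFTs of length `N`. `max_err` gives the price paid for the approximation on this data set.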

Value

An object with class "density". The underlying structure is a list containing the following components.

`x`	the `n` coordinates of the points where the density is estimated.
`y`	the estimated density values.
`bw`	the bandwidth used.
`N`	the sample size `length(x)`.
`call`	the call which produced the result.
`data.name`	the deparsed name of the `x` argument.
`has.na`	logical, indicating if there were `NA`s in the sample and `na.rm == FALSE`.
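
Because the result is an ordinary list, its components can be used directly in further computation. The sketch below uses only the `x`, `y`, and `bw` components and checks that the estimate integrates to roughly one. It assumes the evaluation grid is equally spaced and wide enough to cover nearly all of the mass; the name `area` is illustrative.

```r
data(faithful)
d <- density(faithful$eruptions, bw = 0.15)

class(d)       # class of the returned object
length(d$x)    # number of evaluation points
d$bw           # bandwidth actually used

# Trapezoidal rule over the evaluation grid
ny   <- length(d$y)
area <- sum(diff(d$x) * (d$y[-1] + d$y[-ny]) / 2)
area
```

The area falls slightly short of one because the grid stops `cut` bandwidths beyond the extremes of the data.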

References

Silverman, B. W. (1986). Density Estimation. London: Chapman and Hall.

Venables, W. N. and B. D. Ripley (1994). Modern Applied Statistics with S-Plus. New York: Springer.

Scott, D. W. (1992). Multivariate Density Estimation. Theory, Practice and Visualization. New York: Wiley.

Sheather, S. J. and M. C. Jones (1991). "A reliable data-based bandwidth selection method for kernel density estimation." J. Roy. Statist. Soc. B, 53, 683-690.

Examples

# The Old Faithful geyser data
data(faithful)
d <- density(faithful$eruptions, bw=0.15)
d
plot(d)

plot(d, type="n")
polygon(d, col="wheat")

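## Missing values:
x <- xx <- faithful$eruptions
# Set 10 randomly chosen observations to NA
x[i.out <- sample(length(x), 10)] <- NA
doR <- density(x, bw=0.15, na.rm = TRUE)   # missing values removed first
doN <- density(x, bw=0.15, na.rm = FALSE)  # missing values left in the sample
lines(doR, col="blue")
lines(doN, col="red")
# Mark the observations that were set to NA
points(xx[i.out], rep(.01,10))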
