Implements a fast and efficient Modal EM (MEM) algorithm for locating the modes of a Gaussian mixture density.

Usage

GaussianMixtureMEM(data, pro, mu, sigma,
                   control = list(eps = 1e-5, 
                                  maxiter = 1e3, 
                                  stepsize = function(t) 1-exp(-0.1*t),
                                  denoise = TRUE,
                                  alpha = 0.01,
                                  keep.path = FALSE),
                   ...)

Arguments

data

A numeric vector, matrix, or data frame of observations. Categorical variables are not allowed. If a matrix or data frame, rows correspond to observations (\(n\)) and columns correspond to variables (\(d\)).

pro

A \((G \times 1)\) vector of mixing probabilities for a Gaussian mixture of \(G\) components.

mu

A \((d \times G)\) matrix of component means for a \(d\)-variate Gaussian mixture of \(G\) components.

sigma

A \((d \times d \times G)\) array of component covariance matrices for a \(d\)-variate Gaussian mixture of \(G\) components.

control

A list of control parameters:

eps, maxiter

numerical values setting the tolerance and the maximum number of iterations of the MEM algorithm;

stepsize

a function controlling the step size of the MEM algorithm;

denoise

a logical; if TRUE, a denoising procedure is used when \(d > 1\) to discard all modes whose density is negligible;

alpha

a numerical value used when denoise = TRUE for computing the hypervolume of the central \((1-\alpha)100\%\) region of a multivariate Gaussian;

keep.path

a logical controlling whether the full paths to the modes are returned.

...

Further arguments passed to or from other methods.
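The shapes expected for pro, mu, and sigma, and the entries of control, can be sketched in base R as follows. This is a minimal illustration, not package code: the two-component bivariate mixture and all of its numbers are made up, and log_central_volume() is a hypothetical helper that applies the standard ellipsoid-volume formula to a single Gaussian, which is only an assumption about how alpha enters the denoising step.

```r
# Hypothetical two-component bivariate mixture (d = 2, G = 2);
# every number here is invented for illustration.
d <- 2
G <- 2

pro <- c(0.4, 0.6)                 # G mixing probabilities, summing to 1
mu  <- cbind(c(0, 0), c(4, 4))     # d x G: one column of means per component

sigma <- array(NA_real_, dim = c(d, d, G))
sigma[, , 1] <- diag(d)            # d x d covariance of component 1
sigma[, , 2] <- matrix(c(1.0, 0.5,
                         0.5, 2.0), nrow = d)   # covariance of component 2

# Control list spelling out every documented entry at its default value.
stepsize <- function(t) 1 - exp(-0.1 * t)
control <- list(eps = 1e-5, maxiter = 1e3, stepsize = stepsize,
                denoise = TRUE, alpha = 0.01, keep.path = FALSE)

# The default step size increases with the iteration counter t towards 1.
round(stepsize(c(1, 10, 50)), 3)   # 0.095 0.632 0.993

# Hypothetical helper: log-volume of the central (1 - alpha)100% ellipsoid
# of ONE d-variate Gaussian with covariance S (standard formula; an
# assumption about what alpha feeds into, not the package's own code).
log_central_volume <- function(S, alpha = 0.01) {
  d  <- nrow(S)
  r2 <- qchisq(1 - alpha, df = d)  # squared Mahalanobis radius of the region
  logdet <- as.numeric(determinant(S, logarithm = TRUE)$modulus)
  (d / 2) * log(pi) - lgamma(d / 2 + 1) + (d / 2) * log(r2) + 0.5 * logdet
}

log_central_volume(sigma[, , 1], alpha = control$alpha)
```

Smaller values of alpha enlarge the central region and therefore its volume.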

Value

Returns a list containing the following elements:

n

The number of input data points.

d

The number of variables/features.

parameters

The Gaussian mixture parameters.

iter

The number of iterations of the MEM algorithm.

nmodes

The number of modes estimated by the MEM algorithm.

modes

The coordinates of the modes estimated by the MEM algorithm.

path

If requested, the coordinates of full paths to modes for each data point.

logdens

The log-density at the estimated modes.

logvol

The log-volume used for denoising (if requested).

classification

The modal clustering classification of input data points.
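A call and a look at the returned list might be sketched as follows. The sketch assumes that GaussianMixtureMEM() is the function exported by the mclust package, that mclust is installed, and that the mixture parameters are taken from a model fitted with Mclust() to the built-in faithful data; none of these choices come from this page.

```r
# Sketch only: runs when the mclust package (an assumption) is available.
if (requireNamespace("mclust", quietly = TRUE)) {
  library(mclust)

  # Fit a Gaussian mixture to the built-in faithful data (n = 272, d = 2).
  gmm <- Mclust(faithful, verbose = FALSE)

  # Mixture parameters, extracted in the shapes described under Arguments.
  pro   <- gmm$parameters$pro             # length-G mixing probabilities
  mu    <- gmm$parameters$mean            # d x G matrix of means
  sigma <- gmm$parameters$variance$sigma  # d x d x G array of covariances

  # Run the MEM algorithm with the default control settings.
  mem <- GaussianMixtureMEM(faithful, pro = pro, mu = mu, sigma = sigma)

  mem$nmodes                  # number of estimated modes
  mem$modes                   # coordinates of the modes
  mem$logdens                 # log-density at the modes
  table(mem$classification)   # cluster sizes from the modal classification
}
```

Because several mixture components can share one mode, nmodes may be smaller than the number of components \(G\) in the fitted mixture.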

See also

Author

Luca Scrucca

References

Scrucca L. (2021) A fast and efficient Modal EM algorithm for Gaussian mixtures. Statistical Analysis and Data Mining, 14:4, 305–314. https://doi.org/10.1002/sam.11527