An R package implementing Variable Selection for Gaussian Model-Based Clustering.
This package performs variable selection for Gaussian model-based clustering as implemented in the mclust package. The method searches for the (locally) optimal subset of variables in a data set that carry group/cluster information. Either a greedy or a headlong search can be used, in a forward-backward or backward-forward direction, with or without sub-sampling at the hierarchical clustering stage used to initialise the mclust models. By default the search runs sequentially, but parallelisation is also available.
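The search options described above map onto arguments of the main function. The following is a minimal sketch, assuming the `clustvarsel()` arguments `search`, `direction`, `samp`, and `sampsize` behave as documented in the package help page. The `iris` data, the range `G = 1:5`, and the sample size are illustrative choices only.

```r
library(clustvarsel)  # depends on mclust, which is attached as well

# Illustrative data: the four numeric measurements from iris
X <- iris[, 1:4]

# Greedy search in the forward-backward direction (sequential)
out_greedy <- clustvarsel(X, G = 1:5,
                          search = "greedy", direction = "forward")

# Headlong search in the backward-forward direction, using a random
# sub-sample at the hierarchical clustering stage
set.seed(1)  # the sub-sample is random, so fix the seed for reproducibility
out_headlong <- clustvarsel(X, G = 1:5,
                            search = "headlong", direction = "backward",
                            samp = TRUE, sampsize = 100)

# Column positions of the selected variables in each run
out_greedy$subset
out_headlong$subset
```

The two runs need not select the same subset, since each search only reaches a local optimum. Passing `parallel = TRUE` should switch to a parallel search, assuming a suitable parallel backend is available on the machine.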
Installation
You can install the released version of clustvarsel from CRAN using:
install.packages("clustvarsel")
Usage
Usage of the main functions and several examples are included in the papers shown in the references section below.
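As a quick orientation before turning to the papers, the sketch below runs the selection with default settings and then refits a mixture model on the chosen variables. It assumes that the returned object stores the selected column positions in `subset`. The `iris` data and the comparison with the species labels are purely illustrative.

```r
library(mclust)
library(clustvarsel)

# Illustrative data: the four numeric measurements from iris
X <- iris[, 1:4]

# Run the selection with default search settings
out <- clustvarsel(X, G = 1:5)
print(out)  # summary of the search steps and the selected subset

# Refit a Gaussian mixture model on the selected variables only
selected <- out$subset
fit <- Mclust(X[, selected, drop = FALSE], G = 1:5)
summary(fit)

# Cross-tabulate the estimated clusters against the known species
table(iris$Species, fit$classification)
```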
For an introduction, see the vignette "A quick tour of clustvarsel", which can be opened with:
vignette("clustvarsel")
The vignette is also available in the Vignette section of the navigation bar at the top of the package’s web page.
References
Raftery, A. E. and Dean, N. (2006) Variable Selection for Model-Based Clustering. Journal of the American Statistical Association, 101(473), 168-178.
Maugis, C., Celeux, G., and Martin-Magniette, M.-L. (2009) Variable Selection for Clustering With Gaussian Mixture Models. Biometrics, 65(3), 701-709.
Scrucca, L. and Raftery, A. E. (2018) clustvarsel: A Package Implementing Variable Selection for Gaussian Model-based Clustering in R. Journal of Statistical Software, 84(1), 1-28.