sessionInfo()
## R version 4.4.0 (2024-04-24)
## Platform: x86_64-apple-darwin20
## Running under: macOS Ventura 13.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: Europe/Rome
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics utils datasets grDevices methods base
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.5 jsonlite_1.8.8 dplyr_1.1.4
## [4] compiler_4.4.0 tidyselect_1.2.1 Rcpp_1.0.13
## [7] parallel_4.4.0 gridExtra_2.3 scales_1.3.0
## [10] fastmap_1.2.0 ggplot2_3.5.1 R6_2.5.1
## [13] generics_0.1.3 curl_5.2.1 knitr_1.48
## [16] htmlwidgets_1.6.4 tibble_3.2.1 munsell_0.5.1
## [19] pillar_1.9.0 rlang_1.1.4 utf8_1.2.4
## [22] V8_4.4.2 inline_0.3.19 xfun_0.46
## [25] rstan_2.32.6 RcppParallel_5.1.8 cli_3.6.3
## [28] magrittr_2.0.3 digest_0.6.36 grid_4.4.0
## [31] rstudioapi_0.16.0 lifecycle_1.0.4 StanHeaders_2.32.10
## [34] vctrs_0.6.5 evaluate_0.24.0 glue_1.7.0
## [37] QuickJSR_1.3.1 codetools_0.2-20 stats4_4.4.0
## [40] pkgbuild_1.4.4 fansi_1.0.6 colorspace_2.1-1
## [43] rmarkdown_2.27 tools_4.4.0 matrixStats_1.3.0
## [46] loo_2.8.0 pkgconfig_2.0.3 htmltools_0.5.8.1
Preface
Model-based clustering and classification methods provide a systematic statistical modeling framework for cluster analysis and classification. The model-based approach has gained in popularity because it allows the problems of choosing or developing an appropriate clustering or classification method to be understood within the context of statistical modeling.
mclust is a widely used software package for the statistical environment R. It provides functionality for model-based clustering, classification, and density estimation, including methods for summarizing and visualizing the estimated models.
This book aims to give a detailed overview of mclust and its features. A description of the modeling underpinning the software is provided, along with examples of its usage. In addition to serving as a reference manual for mclust, the book will be particularly useful to readers who plan to employ these model-based techniques in their research or applications.
Who is this book for?
The book is written to appeal to quantitatively trained readers from a wide range of backgrounds. An understanding of basic statistical methods, including statistical inference and statistical computing, is required. Throughout the book, examples and code are used extensively in an expository style to demonstrate the use of mclust for model-based clustering, classification, and density estimation.
Additionally, the book can serve as a reference for courses in multivariate analysis, statistical learning, machine learning, and data mining. It would also be a useful reference for advanced quantitative courses in application areas, including social sciences, physical sciences, and business.
Companion website
A companion website for this book is available at https://mclust-org.github.io/book
The website contains the R code to reproduce the examples and figures presented in the book, errata, and various supplementary materials.
Software information and conventions
The R session information at the time this book was compiled is the sessionInfo() output shown at the start of this preface.
Every R input command starts on a new line without any additional prompt (such as > or +). The corresponding output is shown on lines starting with two hashes (##), as can be seen from the R session information above. Package names are in bold text (e.g., mclust), and inline code and file names are formatted in a typewriter font (e.g., data("iris", package = "datasets")). Function names are followed by parentheses (e.g., Mclust()).
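As a brief illustration of these conventions, the following sketch uses only base R and the iris data mentioned above. It is not taken from the book's examples; the particular commands were chosen here simply because their output is short.

```r
# Load the iris data from the datasets package; this command prints nothing
data("iris", package = "datasets")

# An input command appears without a prompt;
# its printed output follows on a line starting with ##
dim(iris)
## [1] 150   5

nlevels(iris$Species)
## [1] 3
```

Because output lines begin with ##, R treats them as comments, so a block like this can be copied and run as is.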
Acknowledgments
The idea for writing this book arose during one of the yearly meetings of the Working Group on Model-Based Clustering, a small but very active forum for scholars from all over the world interested in mixture modeling. We thank all of the participants for providing the stimulating environment in which we started this project.
We are also fortunate to have benefited from a thorough review contributed by Bettina Grün, a leading expert in mixture modeling.
We have many others to thank for their contributions to mclust as users, collaborators, and developers. Thanks also to the R core team, and to those responsible for the many packages we have leveraged.
The development of the mclust package was supported over many years by the U.S. Office of Naval Research (ONR), and we acknowledge the encouragement and enthusiasm of our successive ONR program officers, Julia Abrahams and Wendy Martinez.
Chris Fraley is indebted to Tableau for supporting her efforts as co-author.
Brendan Murphy’s research was supported by the Science Foundation Ireland (SFI) Insight Research Centre (SFI/12/RC/2289_P2), the Vistamilk Research Centre (16/RC/3835), and the Collegium de Lyon – Institut d’Études Avancées, Université de Lyon.
Adrian Raftery’s research was supported by the Eunice Kennedy Shriver National Institute for Child Health and Human Development (NICHD) under grant number R01 HD070936, by the Blumstein-Jordan and Boeing International Professorships at the University of Washington, and by the Fondation des Sciences Mathématiques de Paris (FSMP) and Université Paris-Cité.
Finally, special thanks to Rob Calver, Senior Publisher at Chapman & Hall/CRC, for his encouragement and enthusiastic support for this book.