Model-based mixture density estimation for bounded data
Density estimation for bounded data via a transformation-based approach for Gaussian mixtures.
Usage
densityMclustBounded(data,
G = NULL, modelNames = NULL,
lbound = NULL,
ubound = NULL,
lambda = c(-3, 3),
prior = NULL,
noise = NULL,
nstart = 25,
parallel = FALSE,
seed = NULL,
...)
# S3 method for densityMclustBounded
print(x, digits = getOption("digits"), ...)
# S3 method for densityMclustBounded
summary(object, parameters = FALSE, classification = FALSE, ...)
Arguments
- data
A numeric vector, matrix, or data frame of observations. If a matrix or data frame, rows correspond to observations and columns correspond to variables.
- G
An integer vector specifying the numbers of mixture components. By default G = 1:3.
- modelNames
A vector of character strings indicating the Gaussian mixture models to be fitted on the transformed-data space. See mclustModelNames for a description of available models.
- lbound
Numeric vector providing lower bounds for variables.
- ubound
Numeric vector providing upper bounds for variables.
- lambda
A numeric vector providing the range of values searched for the transformation parameter(s).
- prior
A function specifying a prior for Bayesian regularization of Gaussian mixtures. See priorControl for details.
- noise
A specification for the noise component. Currently not available.
- nstart
An integer value specifying the number of replications of k-means clustering to be used for initializing the EM algorithm. See kmeans.
- parallel
An optional argument specifying whether the search over all possible models should be run sequentially (default) or in parallel.
For a single machine with multiple cores, possible values are:
a logical value specifying whether parallel computing should be used (TRUE) or not (FALSE, default) for evaluating the fitness function;
a numerical value giving the number of cores to employ. By default, this is obtained from the function detectCores;
a character string specifying the type of parallelisation to use. This depends on the operating system: on Windows only "snow" type functionality is available, while on Unix/Linux/Mac OS X both "snow" and "multicore" (default) functionalities are available.
In all the cases described above, at the end of the search the cluster is automatically stopped by shutting down the workers.
If a cluster of multiple machines is available, evaluation of the fitness function can be executed in parallel using all, or a subset of, the cores available to the machines belonging to the cluster. However, this option requires more work from the user, who needs to set up and register a parallel back end. In this case the cluster must be explicitly stopped with stopCluster.
- seed
An integer value containing the random number generator state. This argument can be used to reproduce the results of the k-means initialisation strategy. Note that if parallel computing is required, the doRNG package must be installed.
- x, object
An object of class "densityMclustBounded".
- digits
The number of significant digits to use for printing.
- parameters
Logical; if TRUE, the parameters of mixture components are printed.
- classification
Logical; if TRUE, the MAP classification/clustering of observations is printed.
- ...
Further arguments passed to or from other methods.
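As a rough illustration of what the transformation parameter(s) searched via lambda act on, the following base R sketch implements a range-power transformation in the Box-Cox-like form described in Scrucca (2019). The helper name range_power and its signature are hypothetical and not part of the package; the formula is stated as an assumption about the method, not taken from the package source.

```r
# Hypothetical helper (not part of the package API): a sketch of the
# range-power transformation, assuming the form
#   ((x - l)^lambda - 1) / lambda              with a lower bound only,
#   (((x - l)/(u - x))^lambda - 1) / lambda    with lower and upper bounds,
# and the logarithm of the same ratio when lambda is (numerically) zero.
range_power <- function(x, lambda, lbound, ubound = Inf) {
  stopifnot(is.finite(lbound), all(x > lbound), all(x < ubound))
  # Map the bounded support onto the positive half-line
  z <- if (is.finite(ubound)) (x - lbound) / (ubound - x) else x - lbound
  # Box-Cox-type power transform onto (a subset of) the real line
  if (abs(lambda) < 1e-8) log(z) else (z^lambda - 1) / lambda
}

x <- c(0.5, 1, 2, 4)
range_power(x, lambda = 0.5, lbound = 0)   # lower bound only
range_power(x, lambda = 0, lbound = 0)     # reduces to log(x)

u <- c(0.1, 0.5, 0.9)
range_power(u, lambda = 0, lbound = 0, ubound = 1)  # reduces to the logit
```

Under this assumed form, lambda = 0 gives the log transformation for lower-bounded data and the logit transformation for data bounded on both sides, so the search range in lambda spans a family of transformations around those two special cases.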
References
Scrucca L. (2019) A transformation-based approach to Gaussian mixture density estimation for bounded data. Biometrical Journal, 61:4, 873–888. https://doi.org/10.1002/bimj.201800174
Examples
# \donttest{
# univariate case with lower bound
x <- rchisq(200, 3)
xgrid <- seq(-2, max(x), length.out = 1000)
f <- dchisq(xgrid, 3) # true density
dens <- densityMclustBounded(x, lbound = 0)
summary(dens)
#> ── Density estimation for bounded data via GMMs ───────────
#>
#> Boundaries: x
#> lower 0
#> upper Inf
#>
#> Model E (univariate, equal variance) model with 1 component
#> on the transformation scale:
#>
#> log-likelihood n df BIC ICL
#> -407.3154 200 3 -830.5257 -830.5257
#>
#> x
#> Range-power transformation: 0.2730149
summary(dens, parameters = TRUE)
#> ── Density estimation for bounded data via GMMs ───────────
#>
#> Boundaries: x
#> lower 0
#> upper Inf
#>
#> Model E (univariate, equal variance) model with 1 component
#> on the transformation scale:
#>
#> log-likelihood n df BIC ICL
#> -407.3154 200 3 -830.5257 -830.5257
#>
#> x
#> Range-power transformation: 0.2730149
#>
#> Mixing probabilities:
#> 1
#> 1
#>
#> Means:
#> 1
#> 0.9492891
#>
#> Variances:
#> 1
#> 1.183863
plot(dens, what = "BIC")
plot(dens, what = "density")
lines(xgrid, f, lty = 2)
plot(dens, what = "density", data = x, breaks = 15)
# univariate case with lower & upper bounds
x <- rbeta(200, 5, 1.5)
xgrid <- seq(-0.1, 1.1, length.out = 1000)
f <- dbeta(xgrid, 5, 1.5) # true density
dens <- densityMclustBounded(x, lbound = 0, ubound = 1)
summary(dens)
#> ── Density estimation for bounded data via GMMs ───────────
#>
#> Boundaries: x
#> lower 0
#> upper 1
#>
#> Model E (univariate, equal variance) model with 1 component
#> on the transformation scale:
#>
#> log-likelihood n df BIC ICL
#> 121.6621 200 3 227.4293 227.4293
#>
#> x
#> Range-power transformation: -0.227797
plot(dens, what = "BIC")
plot(dens, what = "density")
plot(dens, what = "density", data = x, breaks = 9)
# bivariate case with lower bounds
x1 <- rchisq(200, 3)
x2 <- 0.5*x1 + sqrt(1-0.5^2)*rchisq(200, 5)
x <- cbind(x1, x2)
plot(x)
dens <- densityMclustBounded(x, lbound = c(0,0))
summary(dens, parameters = TRUE)
#> ── Density estimation for bounded data via GMMs ───────────
#>
#> Boundaries: x1 x2
#> lower 0 0
#> upper Inf Inf
#>
#> Model EEE (ellipsoidal, equal volume, shape and orientation) model with 2 components
#> on the transformation scale:
#>
#> log-likelihood n df BIC ICL
#> -848.3453 200 10 -1749.674 -1777.231
#>
#> x1 x2
#> Range-power transformation: 0.2439339 0.3001117
#>
#> Mixing probabilities:
#> 1 2
#> 0.2186624 0.7813376
#>
#> Means:
#> [,1] [,2]
#> [1,] 0.06442665 1.080146
#> [2,] 2.78470340 1.815451
#>
#> Variances:
#> [,,1]
#> [,1] [,2]
#> [1,] 0.9661706 0.5514568
#> [2,] 0.5514568 0.6529538
#> [,,2]
#> [,1] [,2]
#> [1,] 0.9661706 0.5514568
#> [2,] 0.5514568 0.6529538
plot(dens, what = "BIC")
plot(dens, what = "density")
plot(dens, what = "density", type = "hdr")
plot(dens, what = "density", type = "persp")
# }
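To make the phrase "on the transformation scale" in the summaries more concrete, here is a minimal base R sketch that maps the single-component fit from the first univariate example back to the original scale by a change of variables. It assumes the range-power transformation with lower bound 0 has the Box-Cox-like form (x^lambda - 1)/lambda, as described in Scrucca (2019). The function name dens_original is hypothetical, and the numeric values are copied from the printed summary for that example, so they will differ for another random sample.

```r
# Values copied from the printed summary of the first univariate example
lambda <- 0.2730149   # range-power transformation parameter
mu     <- 0.9492891   # component mean on the transformation scale
sigma2 <- 1.183863    # component variance on the transformation scale

# Hypothetical helper: density on the original scale (x > 0), obtained as
# the Gaussian density at t(x) times the Jacobian dt/dx.
# Assumes t(x) = (x^lambda - 1) / lambda for data with lower bound 0.
dens_original <- function(x) {
  t   <- (x^lambda - 1) / lambda
  jac <- x^(lambda - 1)
  dnorm(t, mean = mu, sd = sqrt(sigma2)) * jac
}

# Numerical check that the back-transformed density has total mass close
# to 1 (integration split at 1 to keep both pieces well behaved)
total <- integrate(dens_original, 0, 1)$value +
  integrate(dens_original, 1, Inf)$value
total

# Compare with the true chi-squared(3) density at a few points
xs <- c(0.5, 1, 3, 6)
cbind(x = xs, sketch = dens_original(xs), true = dchisq(xs, 3))
```

Because the Gaussian on the transformation scale puts a small amount of mass below the lower limit of the transformed range when lambda > 0, the total mass in this sketch is only approximately 1.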