Model-based Sliced Inverse Regression (MSIR)
A dimension reduction method based on Gaussian finite mixture models which provides an extension to sliced inverse regression (SIR). The basis of the subspace is estimated by modeling the inverse distribution within each slice using Gaussian finite mixtures, with the number of components and the covariance matrix parameterization either selected by BIC or defined by the user.
Usage
msir(x, y, nslices = msir.nslices, slice.function = msir.slices,
modelNames = NULL, G = NULL, cov = c("mle", "regularized"), ...)
Arguments
- x
A \((n \times p)\) design matrix containing the predictor values.
- y
A \((n \times 1)\) vector of data values for the response variable. It can be a numeric vector (regression) but also a factor (classification). In the latter case, the levels of the factor define the slices used.
- nslices
The number of slices used, unless y is a factor. By default, the value returned by msir.nslices.
- slice.function
The slice function to be used, by default msir.slices, but the user can provide a different slicing function.
- modelNames
A vector of character strings indicating the Gaussian mixture models to be fitted as described in mclustModelNames. If a vector of strings is given they are used for all the slices. If a list of vectors is provided then each vector refers to a single slice.
- G
An integer vector specifying the numbers of mixture components used in fitting Gaussian mixture models. If a list of vectors is provided then each vector refers to a single slice.
- cov
The predictors marginal covariance matrix. Possible choices are:
"mle": for the maximum likelihood estimate
"regularized": for a regularized estimate of the covariance matrix (see msir.regularizedSigma)
R matrix: a \((p \times p)\) user defined covariance matrix
- ...
other arguments passed to msir.compute.
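The arguments above can be combined as in the following sketch. It assumes the msir package (and mclust, which it relies on) is installed, so the calls are guarded; the simulated data, the object names fit1 to fit4, and the three-level factor yf are illustrative choices, not part of the package.

```r
set.seed(123)
n <- 200
p <- 5
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- exp(0.5 * (x[, 1] - x[, 2])) + 0.1 * rnorm(n)

# Illustrative factor response: three classes obtained by cutting y at its terciles
yf <- cut(y, breaks = quantile(y, c(0, 1/3, 2/3, 1)),
          include.lowest = TRUE, labels = c("low", "mid", "high"))

# Assumption: msir is installed; otherwise the calls below are skipped
if (requireNamespace("msir", quietly = TRUE)) {
  library(msir)

  # Fixed number of slices, two candidate covariance parameterizations,
  # and 1 to 3 mixture components within every slice
  fit1 <- msir(x, y, nslices = 4, modelNames = c("EEE", "VVV"), G = 1:3)

  # Regularized estimate of the marginal covariance matrix
  fit2 <- msir(x, y, cov = "regularized")

  # User-defined (p x p) covariance matrix
  fit3 <- msir(x, y, cov = cov(x))

  # Factor response: the levels of yf define the slices
  fit4 <- msir(x, yf)
}
```

When modelNames and G are left as NULL, the selection is left to BIC as described above.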
Value
Returns an object of class 'msir' with attributes:
- call
the function call.
- x
the design matrix.
- y
the response vector.
- slice.info
output from slicing function.
- mixmod
a list of finite mixture model objects as described in mclustModel.
- loglik
the log-likelihood for the mixture models.
- f
a vector of length equal to the total number of mixture components containing the fraction of observations in each fitted component within slices.
- mu
a matrix of predictor means for each mixture component within slices.
- sigma
the marginal predictors covariance matrix.
- M
the msir kernel matrix.
- evalues
the eigenvalues from the generalized eigen-decomposition of M.
- evectors
the raw eigenvectors from the generalized eigen-decomposition of M, ordered according to the eigenvalues.
- basis
the normalized eigenvectors from the generalized eigen-decomposition of M, ordered according to the eigenvalues.
- std.basis
standardized basis vectors obtained by multiplying each coefficient of the eigenvectors by the standard deviation of the corresponding predictor. The resulting coefficients are scaled such that all predictors have unit standard deviation.
- numdir
the maximal number of directions estimated.
- dir
the estimated MSIR directions from mean-centered predictors.
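To make the relationship between M, sigma, evalues, evectors, basis, std.basis and dir more concrete, here is a minimal base R sketch. It is not the package's implementation: it builds the kernel matrix from plain slice means (classical SIR, i.e. one component per slice) rather than from fitted mixture components, and it assumes that the generalized eigen-decomposition solves M v = lambda Sigma v and that "normalized" means unit Euclidean length. The helper unit() and the other local names are hypothetical.

```r
set.seed(1)
n <- 200; p <- 5; H <- 6
b <- c(1, -1, rep(0, p - 2))
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- exp(0.5 * drop(x %*% b)) + 0.1 * rnorm(n)

# Assign observations to H slices of roughly equal size along the ordered response
slice <- ceiling(rank(y, ties.method = "first") * H / n)
n_h <- as.vector(table(slice))
f <- n_h / n                                   # fraction of observations per slice

xc <- scale(x, center = TRUE, scale = FALSE)   # mean-centered predictors
mu <- rowsum(xc, slice) / n_h                  # H x p matrix of centered slice means
Sigma <- crossprod(xc) / n                     # MLE of the marginal covariance
M <- crossprod(mu * sqrt(f))                   # kernel: sum_h f_h mu_h mu_h'

# Generalized eigen-decomposition M v = lambda Sigma v (assumed form),
# computed through the symmetric matrix Sigma^{-1/2} M Sigma^{-1/2}
es <- eigen(Sigma, symmetric = TRUE)
Sigma_inv_sqrt <- es$vectors %*% diag(1 / sqrt(es$values)) %*% t(es$vectors)
ed <- eigen(Sigma_inv_sqrt %*% M %*% Sigma_inv_sqrt, symmetric = TRUE)
evalues <- ed$values
evectors <- Sigma_inv_sqrt %*% ed$vectors      # raw eigenvectors

unit <- function(v) v / sqrt(sum(v^2))         # hypothetical helper: unit length
basis <- apply(evectors, 2, unit)              # normalized eigenvectors

# Multiply each coefficient by the predictor's standard deviation, then
# renormalize (the renormalization step is an assumption of this sketch)
std.basis <- apply(basis * sqrt(diag(Sigma)), 2, unit)

dir <- xc %*% basis                            # directions from centered predictors
```

In this simulated setting the first column of dir should track the true index x %*% b closely, up to sign.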
References
Scrucca, L. (2011) Model-based SIR for dimension reduction. Computational Statistics & Data Analysis, 55(11), 3010-3026.
Author
Luca Scrucca luca.scrucca@unipg.it
Examples
# 1-dimensional simple regression
n <- 200
p <- 5
b <- as.matrix(c(1,-1,rep(0,p-2)))
x <- matrix(rnorm(n*p), nrow = n, ncol = p)
y <- exp(0.5 * x%*%b) + 0.1*rnorm(n)
MSIR <- msir(x, y)
summary(MSIR)
#> --------------------------------------------------
#> Model-based SIR
#> --------------------------------------------------
#>
#> Slices:
#> 1 2 3 4 5 6
#> GMM XII EEE XXX EEI XXX VII
#> Num.comp. 1 2 1 3 1 2
#> Num.obs. 33 26|7 33 15|6|12 33 28|7
#>
#> Estimated basis vectors:
#> Dir1 Dir2 Dir3 Dir4 Dir5
#> x1 0.7097864 -0.48270 0.53769 0.12034 0.0022288
#> x2 -0.7041481 -0.42937 0.45878 0.13769 -0.1827196
#> x3 0.0111170 0.71477 0.68775 -0.22556 -0.0522356
#> x4 0.0052141 0.16238 0.16504 0.60551 0.7581749
#> x5 -0.0150997 -0.21300 -0.01333 -0.74097 0.6237394
#>
#> Dir1 Dir2 Dir3 Dir4 Dir5
#> Eigenvalues 0.91016 0.26585 0.15332 0.060237 0.012936
#> Cum. % 64.89545 83.85069 94.78262 99.077612 100.000000
plot(MSIR, type = "2Dplot")
# 1-dimensional symmetric response curve
n <- 200
p <- 5
b <- as.matrix(c(1,-1,rep(0,p-2)))
x <- matrix(rnorm(n*p), nrow = n, ncol = p)
y <- (0.5 * x%*%b)^2 + 0.1*rnorm(n)
MSIR <- msir(x, y)
summary(MSIR)
#> --------------------------------------------------
#> Model-based SIR
#> --------------------------------------------------
#>
#> Slices:
#> 1 2 3 4 5 6
#> GMM VVE XII XII XII EEE EEE
#> Num.comp. 2 1 1 1 2 2
#> Num.obs. 20|13 33 33 33 14|19 22|13
#>
#> Estimated basis vectors:
#> Dir1 Dir2 Dir3 Dir4 Dir5
#> x1 -0.740396 -0.39489 -0.247911 -0.45435 -0.37276
#> x2 0.666642 -0.41304 -0.083044 -0.49875 -0.29250
#> x3 0.068615 -0.65422 0.176128 0.67171 -0.33899
#> x4 -0.022374 -0.42329 0.596660 -0.23544 0.62061
#> x5 0.046850 -0.25746 -0.737983 0.19544 0.52481
#>
#> Dir1 Dir2 Dir3 Dir4 Dir5
#> Eigenvalues 0.76871 0.053058 0.031793 0.0088756 6.346e-03
#> Cum. % 88.48121 94.588434 98.247933 99.2695486 1.000e+02
plot(MSIR, type = "2Dplot")
plot(MSIR, type = "coefficients")
# 2-dimensional response curve
n <- 300
p <- 5
b1 <- c(1, 1, 1, rep(0, p-3))
b2 <- c(1,-1,-1, rep(0, p-3))
b <- cbind(b1,b2)
x <- matrix(rnorm(n*p), nrow = n, ncol = p)
y <- x %*% b1 + (x %*% b1)^3 + 4*(x %*% b2)^2 + rnorm(n)
MSIR <- msir(x, y)
summary(MSIR)
#> --------------------------------------------------
#> Model-based SIR
#> --------------------------------------------------
#>
#> Slices:
#> 1 2 3 4 5 6 7 8
#> GMM EEI EVE VEI XII EEV VEV EEV XXI
#> Num.comp. 2 2 2 1 2 2 2 1
#> Num.obs. 20|22 26|16 9|33 42 18|24 18|24 28|14 6
#>
#> Estimated basis vectors:
#> Dir1 Dir2 Dir3 Dir4 Dir5
#> x1 -0.3884439 0.939737 -0.101742 0.147305 0.026227
#> x2 0.6952440 0.215864 -0.613699 0.048259 0.063465
#> x3 0.6036119 0.209895 0.776560 0.059747 -0.091979
#> x4 0.0369820 -0.161043 -0.093403 0.985688 -0.065283
#> x5 0.0056592 0.017512 -0.035387 0.028667 -0.991243
#>
#> Dir1 Dir2 Dir3 Dir4 Dir5
#> Eigenvalues 0.71463 0.49623 0.29359 0.071781 0.047476
#> Cum. % 44.01218 74.57402 92.65524 97.076059 100.000000
plot(MSIR, which = 1:2)
if (FALSE) plot(MSIR, type = "spinplot")
plot(MSIR, which = 1, type = "2Dplot", span = 0.7)
plot(MSIR, which = 2, type = "2Dplot", span = 0.7)
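# Comparing the estimated basis with the true directions
Because the simulated examples use known coefficient vectors, one way to check an estimate is to compare the subspace spanned by the first estimated directions with the subspace spanned by the true ones. The sketch below uses canonical correlations from stats::cancor for that purpose; values close to 1 indicate close agreement. The helper subspace_agreement() is hypothetical and not part of msir, and the sketch assumes the fitted object's basis component can be accessed with `$`. The msir call is guarded in case the package is not installed.

```r
set.seed(2)
n <- 300
p <- 5
b1 <- c(1, 1, 1, rep(0, p - 3))
b2 <- c(1, -1, -1, rep(0, p - 3))
b <- cbind(b1, b2)
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- x %*% b1 + (x %*% b1)^3 + 4 * (x %*% b2)^2 + rnorm(n)

# Hypothetical helper: canonical correlations between the projections of x
# onto the true directions and onto the estimated directions
subspace_agreement <- function(B_true, B_est, x) {
  stats::cancor(x %*% B_true, x %*% B_est)$cor
}

# Sanity check: a basis compared with itself
subspace_agreement(b, b, x)

# Assumption: msir is installed; otherwise this part is skipped
if (requireNamespace("msir", quietly = TRUE)) {
  library(msir)
  MSIR <- msir(x, y)
  # Compare the first two estimated directions with the true ones
  print(subspace_agreement(b, MSIR$basis[, 1:2], x))
}
```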