Title: | Easily Carry Out Latent Profile Analysis (LPA) Using Open-Source or Commercial Software |
---|---|
Description: | Easily carry out latent profile analysis ("LPA"), determine the correct number of classes based on best practices, and tabulate and plot the results. Provides functionality to estimate commonly-specified models with free means, variances, and covariances for each profile. Follows a tidy approach, in that output is in the form of a data frame that can subsequently be computed on. Models can be estimated using the free open source 'R' packages 'mclust' and 'OpenMx', or using the commercial program 'Mplus', via the 'MplusAutomation' package. |
Authors: | Joshua M Rosenberg [aut, cre], Caspar van Lissa [aut], Jennifer A Schmidt [ctb], Patrick N Beymer [ctb], Daniel Anderson [ctb], Matthew J. Schell [ctb] |
Maintainer: | Joshua M Rosenberg <[email protected]> |
License: | MIT + file LICENSE |
Version: | 2.0.0 |
Built: | 2024-11-17 05:43:17 UTC |
Source: | https://github.com/cjvanlissa/tidyLPA |
tidyLPA suggests using the pipe operator, %>%, from the magrittr package (imported here from the dplyr package).
lhs, rhs |
An object and a function to apply to it |
# Instead of
subset(iris, select = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"))
# you can write
iris %>%
  subset(select = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width"))
Integrates information from several fit indices, and selects the best model.
AHP( fitindices, relative_importance = c(AIC = 0.2323, AWE = 0.1129, BIC = 0.2525, CLC = 0.0922, KIC = 0.3101) )
fitindices |
A matrix or data.frame of fit indices, with colnames
corresponding to the indices named in |
relative_importance |
A named numeric vector. Names should correspond to
columns in |
Many fit indices are available for model selection. Following the procedure developed by Akogul and Erisoglu (2017), this function integrates information from several fit indices, and selects the best model, using Saaty's (1990) Analytic Hierarchy Process (AHP). Conceptually, the process consists of the following steps:
For each fit index, calculate the amount of support provided for each model, relative to the other models.
From these comparisons, obtain a "priority vector" of the amount of support for each model.
Compute a weighted average of the priority vectors for all fit indices, with weights based on a simulation study examining each fit index's ability to recover the correct number of clusters (Akogul & Erisoglu, 2016).
Select the model with the highest weighted average priority.
Numeric.
Caspar J. van Lissa
iris[,1:4] %>% estimate_profiles(1:4) %>% get_fit() %>% AHP()
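The four conceptual steps described for AHP() can be sketched in base R. This is a simplified illustration, not the package's implementation: the fit values are invented, and the priority vectors are obtained with a shortcut that assumes lower values indicate better fit and that the pairwise comparison matrix is a consistent ratio matrix (in which case the priority vector is proportional to the reciprocals of the index values). AHP() itself may compute the comparisons differently.

```r
# Hypothetical fit indices for three candidate models (lower = better fit).
# These numbers are made up for illustration only.
fit <- rbind(
  model_1 = c(AIC = 1500, AWE = 1620, BIC = 1540, CLC = 1480, KIC = 1515),
  model_2 = c(AIC = 1200, AWE = 1350, BIC = 1260, CLC = 1170, KIC = 1220),
  model_3 = c(AIC = 1210, AWE = 1400, BIC = 1290, CLC = 1165, KIC = 1235)
)

# Weights taken from the default relative_importance argument of AHP().
weights <- c(AIC = 0.2323, AWE = 0.1129, BIC = 0.2525, CLC = 0.0922, KIC = 0.3101)

# Steps 1-2 (simplifying assumption): for a consistent pairwise ratio matrix
# A[i, j] = value_j / value_i, the priority vector is proportional to
# 1 / value, so normalise the reciprocals within each fit index.
priority <- apply(fit, 2, function(v) (1 / v) / sum(1 / v))

# Step 3: weighted average of the priority vectors across fit indices.
support <- drop(priority[, names(weights)] %*% weights)

# Step 4: select the model with the highest weighted average priority.
best <- names(which.max(support))
best
```

Because the default weights sum to 1 and each priority vector sums to 1, the weighted priorities in `support` also sum to 1, which makes them easy to read as relative support for each model.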
Implements the ad-hoc adjusted likelihood ratio test (LRT) described in Formula 15 of Lo, Mendell, & Rubin (2001), or LMR LRT.
calc_lrt(n, null_ll, null_param, null_classes, alt_ll, alt_param, alt_classes)
n |
Integer. Sample size |
null_ll |
Numeric. Log-likelihood of the null model. |
null_param |
Integer. Number of parameters of the null model. |
null_classes |
Integer. Number of classes of the null model. |
alt_ll |
Numeric. Log-likelihood of the alternative model. |
alt_param |
Integer. Number of parameters of the alternative model. |
alt_classes |
Integer. Number of classes of the alternative model. |
A numeric vector containing the likelihood ratio LR, the ad-hoc corrected LMR, degrees of freedom, and the LMR p-value.
Lo Y, Mendell NR, Rubin DB. Testing the number of components in a normal mixture. Biometrika. 2001;88(3):767–778. doi:10.1093/biomet/88.3.767
calc_lrt(150L, -741.02, 8, 1, -488.91, 13, 2)
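As a rough orientation for the inputs in the example above, the unadjusted likelihood ratio can be computed by hand in base R. This sketch deliberately does not reproduce the ad-hoc LMR correction from Formula 15, and it assumes the degrees of freedom equal the difference in parameter counts; the naive chi-square p-value is shown only for contrast, since the usual regularity conditions do not hold when comparing mixture models with different numbers of classes.

```r
# Inputs copied from the calc_lrt() example.
null_ll    <- -741.02
null_param <- 8
alt_ll     <- -488.91
alt_param  <- 13

# Plain likelihood ratio statistic: twice the gain in log-likelihood.
lr <- 2 * (alt_ll - null_ll)

# Assumed degrees of freedom: difference in the number of free parameters.
df <- alt_param - null_param

# Naive chi-square p-value for the UNADJUSTED statistic (illustration only;
# this is not the LMR p-value that calc_lrt() reports).
p_naive <- pchisq(lr, df = df, lower.tail = FALSE)

c(LR = lr, df = df, p_naive = p_naive)
```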
Takes an object of class 'tidyLPA', containing multiple latent profile models with different numbers of classes or model specifications, and helps select the optimal number of classes and model specification.
compare_solutions(x, statistics = "BIC")
x |
An object of class 'tidyLPA'. |
statistics |
Character vector. Which statistics to examine for determining the optimal model. Defaults to 'BIC'. |
An object of class 'bestLPA' and 'list', containing a tibble of fits 'fits', a named vector 'best', indicating which model fit best according to each fit index, a numeric vector 'AHP' indicating the best model according to the AHP, an object 'plot' of class 'ggplot', and a numeric vector 'statistics' corresponding to the argument of the same name.
Caspar J. van Lissa
iris_subset <- sample(nrow(iris), 20) # so examples execute quickly
results <- iris %>%
  subset(select = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")) %>%
  estimate_profiles(1:3) %>%
  compare_solutions()
This simulated dataset, based on Curry et al., 2019, contains data on moral relevance and judgment across the seven domains of the Morality As Cooperation scale.
data(curry_mac)
A data.frame with 1392 rows and 42 variables.
sex | factor |
Self-identified sex of participants, Male, Female, or Transgendered. |
age_years | numeric |
Participants' age in years. |
KinshipR | numeric |
Mean score of moral relevance, kinship subscale. |
MutualismR | numeric |
Mean score of moral relevance, mutualism subscale. |
ExchangeR | numeric |
Mean score of moral relevance, exchange subscale. |
HawkR | numeric |
Mean score of moral relevance, hawk subscale. |
DoveR | numeric |
Mean score of moral relevance, dove subscale. |
DivisionR | numeric |
Mean score of moral relevance, division subscale. |
PossessionR | numeric |
Mean score of moral relevance, possession subscale. |
KinshipJ | numeric |
Mean score of moral judgment, kinship subscale. |
MutualismJ | numeric |
Mean score of moral judgment, mutualism subscale. |
ExchangeJ | numeric |
Mean score of moral judgment, exchange subscale. |
HawkJ | numeric |
Mean score of moral judgment, hawk subscale. |
DoveJ | numeric |
Mean score of moral judgment, dove subscale. |
DivisionJ | numeric |
Mean score of moral judgment, division subscale. |
PossessionJ | numeric |
Mean score of moral judgment, possession subscale. |
Curry, O. S., Jones Chesters, M., & Van Lissa, C. J. (2019). Mapping morality with a compass: Testing the theory of ‘morality-as-cooperation’ with a new questionnaire. Journal of Research in Personality, 78, 106–124. doi:10.1016/j.jrp.2018.10.008
This simulated dataset, based on Van Lissa et al., 2014, contains six annual assessments of adolescents' mean scores on the empathic concern and perspective taking subscales of the Interpersonal Reactivity Index (Davis, 1983). The first measurement wave occurred when adolescents were, on average, 13 years old, and the last one when they were 18 years old.
data(empathy)
A data frame with 467 rows and 13 variables.
ec1 | numeric |
Mean score of empathic concern in wave 1 |
ec2 | numeric |
Mean score of empathic concern in wave 2 |
ec3 | numeric |
Mean score of empathic concern in wave 3 |
ec4 | numeric |
Mean score of empathic concern in wave 4 |
ec5 | numeric |
Mean score of empathic concern in wave 5 |
ec6 | numeric |
Mean score of empathic concern in wave 6 |
pt1 | numeric |
Mean score of perspective taking in wave 1 |
pt2 | numeric |
Mean score of perspective taking in wave 2 |
pt3 | numeric |
Mean score of perspective taking in wave 3 |
pt4 | numeric |
Mean score of perspective taking in wave 4 |
pt5 | numeric |
Mean score of perspective taking in wave 5 |
pt6 | numeric |
Mean score of perspective taking in wave 6 |
sex | factor |
Adolescent sex; M = male, F = female. |
Van Lissa, C. J., Hawk, S. T., Branje, S. J., Koot, H. M., Van Lier, P. A., & Meeus, W. H. (2014). Divergence Between Adolescent and Parental Perceptions of Conflict in Relationship to Adolescent Empathy Development. Journal of Youth and Adolescence, 1–14. doi:10.1007/s10964-014-0152-5
Estimates latent profiles (finite mixture models) using either the open source packages mclust or OpenMx, or the commercial program Mplus (using the R-interface of MplusAutomation).
estimate_profiles( df, n_profiles, models = NULL, variances = "equal", covariances = "zero", package = "mclust", select_vars = NULL, ... )
df |
data.frame of numeric data; continuous indicators are required for mixture modeling. |
n_profiles |
Integer vector of the number of profiles (or mixture components) to be estimated. |
models |
Integer vector. Set to |
variances |
Character vector. Specifies which variance components to estimate. Defaults to "equal" (constrain variances across profiles); the other option is "varying" (estimate variances freely across profiles). Each element of this vector refers to one of the models you wish to run. |
covariances |
Character vector. Specifies which covariance components to estimate. Defaults to "zero" (do not estimate covariances; this corresponds to an assumption of conditional independence of the indicators); other options are "equal" (estimate covariances between items, constrained across profiles), and "varying" (free covariances across profiles). |
package |
Character. Which package to use; 'OpenMx', 'mclust', or 'MplusAutomation' (requires Mplus to be installed). Default: 'mclust'. |
select_vars |
Character. Optional vector of variable names in |
... |
Additional arguments are passed to the estimating function; i.e.,
|
Six models are currently available in tidyLPA, corresponding to the most common requirements. All models estimate the observed variable means for each class. The remaining parameters are:
Equal variances across classes; no covariances between observed variables
Varying variances across classes; no covariances between observed variables
Equal variances and equal covariances across classes
Varying variances and equal covariances (not available for package = 'mclust')
Equal variances and varying covariances (not available for package = 'mclust')
Varying variances and varying covariances
Two interfaces are available to estimate these models; specify their numbers in the models argument (e.g., models = 1, or models = c(1, 2, 3)), or specify the variances/covariances to be estimated (e.g., variances = c("equal", "varying"), covariances = c("zero", "equal")). Note that when package = 'mclust' is used, models = c(4, 5) are not available. Use package = 'OpenMx' or package = 'Mplus' to estimate these models.
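The correspondence between the two interfaces can be written down as a small lookup table. The sketch below is built only from the six model descriptions listed above; `model_specs` and `spec_to_model()` are hypothetical names used for illustration and are not part of tidyLPA.

```r
# Lookup table built from the six model descriptions listed above.
model_specs <- data.frame(
  model       = 1:6,
  variances   = c("equal", "varying", "equal", "varying", "equal", "varying"),
  covariances = c("zero", "zero", "equal", "equal", "varying", "varying"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of tidyLPA): translate a variances/covariances
# pair into the corresponding model number.
spec_to_model <- function(variances, covariances) {
  hit <- model_specs$variances == variances &
    model_specs$covariances == covariances
  model_specs$model[hit]
}

spec_to_model("equal", "zero")     # the default specification
spec_to_model("varying", "equal")  # one of the models mclust cannot estimate
```

Read this way, variances = c("equal", "varying") combined with covariances = c("zero", "equal") requests models 1 and 4, because the two vectors are matched element by element.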
A list of class 'tidyLPA'.
# to make example run more quickly
iris_sample <- iris[c(1:10, 51:60, 101:114), ]

# Example 1:
iris_sample %>%
  subset(select = c("Sepal.Length", "Sepal.Width", "Petal.Length")) %>%
  estimate_profiles(3)

# Example 2:
iris %>%
  subset(select = c("Sepal.Length", "Sepal.Width", "Petal.Length")) %>%
  estimate_profiles(n_profiles = 1:4, models = 1:3)

# Example 3:
iris_sample %>%
  subset(select = c("Sepal.Length", "Sepal.Width", "Petal.Length")) %>%
  estimate_profiles(n_profiles = 1:4, variances = c("equal", "varying"),
                    covariances = c("zero", "zero"))
Estimates latent profiles (finite mixture models) using the open source package mclust.
estimate_profiles_mclust(df, n_profiles, model_numbers, select_vars, ...)
df |
data.frame with two or more columns with continuous variables |
n_profiles |
Numeric vector. The number of profiles (or mixture components) to be estimated. Each number in the vector corresponds to an analysis with that many mixture components. |
model_numbers |
Numeric vector. Numbers of the models to be estimated.
See |
select_vars |
Character. Optional vector of variable names in |
... |
Parameters passed directly to |
An object of class 'tidyLPA' and 'list'
Caspar J. van Lissa
Estimates latent profiles (finite mixture models) using the commercial program Mplus, through the R-interface of MplusAutomation.
estimate_profiles_mplus2( df, n_profiles, model_numbers, select_vars, ..., keepfiles = FALSE )
df |
data.frame with two or more columns with continuous variables |
n_profiles |
Numeric vector. The number of profiles (or mixture components) to be estimated. Each number in the vector corresponds to an analysis with that many mixture components. |
model_numbers |
Numeric vector. Numbers of the models to be estimated.
See |
select_vars |
Character. Optional vector of variable names in |
... |
Parameters passed directly to
|
keepfiles |
Logical. Whether to retain the files created by
|
An object of class 'tidyLPA' and 'list'
Caspar J. van Lissa
Estimates latent profiles (finite mixture models) using the R-package OpenMx.
estimate_profiles_openmx(df, n_profiles, model_numbers, select_vars, ...)
df |
data.frame with two or more columns with continuous variables |
n_profiles |
Numeric vector. The number of profiles (or mixture components) to be estimated. Each number in the vector corresponds to an analysis with that many mixture components. |
model_numbers |
Numeric vector. Numbers of the models to be estimated.
See |
select_vars |
Character. Optional vector of variable names in |
... |
Parameters passed to and from functions. |
An object of class 'tidyLPA' and 'list'
Caspar J. van Lissa
Get data from objects generated by tidyLPA.
get_data(x, ...)

## S3 method for class 'tidyLPA'
get_data(x, ...)

## S3 method for class 'tidyProfile'
get_data(x, ...)
x |
An object generated by tidyLPA. |
... |
further arguments to be passed to or from other methods. They are ignored in this function. |
If one model is fit, the data is returned in wide format as a tibble. If more than one model is fit, the data is returned in long form. See the examples.
tidyLPA: Get data for a latent profile analysis with multiple numbers of classes and models, of class 'tidyLPA'.
tidyProfile: Get data for a single latent profile analysis object, of class 'tidyProfile'.
Caspar J. van Lissa
## Not run:
if(interactive()){
  library(dplyr)
  # the data is returned in wide form
  results <- iris %>%
    select(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) %>%
    estimate_profiles(3)
  get_data(results)

  # note that if more than one model is fit, the data is returned in long form
  results1 <- iris %>%
    select(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) %>%
    estimate_profiles(c(3, 4))
  get_data(results1)
}
## End(Not run)
Get estimates from objects generated by tidyLPA.
get_estimates(x, ...)

## S3 method for class 'tidyLPA'
get_estimates(x, ...)

## S3 method for class 'tidyProfile'
get_estimates(x, ...)
x |
An object generated by tidyLPA. |
... |
further arguments to be passed to or from other methods. They are ignored in this function. |
A tibble.
tidyLPA: Get estimates for a latent profile analysis with multiple numbers of classes and models, of class 'tidyLPA'.
tidyProfile: Get estimates for a single latent profile analysis object, of class 'tidyProfile'.
Caspar J. van Lissa
## Not run:
if(interactive()){
  library(dplyr) # provides select()
  results <- iris %>%
    select(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) %>%
    estimate_profiles(3)
  get_estimates(results)
  get_estimates(results[[1]])
}
## End(Not run)
Get fit indices from objects generated by tidyLPA.
get_fit(x, ...)

## S3 method for class 'tidyLPA'
get_fit(x, ...)

## S3 method for class 'tidyProfile'
get_fit(x, ...)
x |
An object generated by tidyLPA. |
... |
further arguments to be passed to or from other methods. They are ignored in this function. |
A tibble. Learn more at https://data-edu.github.io/tidyLPA/articles/Introduction_to_tidyLPA.html#getting-fit-statistics
tidyLPA: Get fit indices for a latent profile analysis with multiple numbers of classes and models, of class 'tidyLPA'.
tidyProfile: Get fit indices for a single latent profile analysis object, of class 'tidyProfile'.
Caspar J. van Lissa
## Not run:
if(interactive()){
  library(dplyr) # provides select()
  results <- iris %>%
    select(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) %>%
    estimate_profiles(3)
  get_fit(results)
  get_fit(results[[1]])
}
## End(Not run)
This simulated dataset, based on Crocetti et al., 2013, contains five annual assessments of adolescents' mean scores on the commitment, exploration (in depth), and reconsideration subscales of the Utrecht-Management of Identity Commitments Scale (Crocetti et al., 2008). The scores reported here reflect the educational identity subscales of this instrument. The first measurement wave occurred when adolescents were, on average, 14 years old, and the last one when they were 18 years old.
data(id_edu)
A data frame with 443 rows and 16 variables.
com1 | numeric |
Mean score of educational commitment in wave 1 |
exp1 | numeric |
Mean score of educational exploration in wave 1 |
rec1 | numeric |
Mean score of educational reconsideration in wave 1 |
com2 | numeric |
Mean score of educational commitment in wave 2 |
exp2 | numeric |
Mean score of educational exploration in wave 2 |
rec2 | numeric |
Mean score of educational reconsideration in wave 2 |
com3 | numeric |
Mean score of educational commitment in wave 3 |
exp3 | numeric |
Mean score of educational exploration in wave 3 |
rec3 | numeric |
Mean score of educational reconsideration in wave 3 |
com4 | numeric |
Mean score of educational commitment in wave 4 |
exp4 | numeric |
Mean score of educational exploration in wave 4 |
rec4 | numeric |
Mean score of educational reconsideration in wave 4 |
com5 | numeric |
Mean score of educational commitment in wave 5 |
exp5 | numeric |
Mean score of educational exploration in wave 5 |
rec5 | numeric |
Mean score of educational reconsideration in wave 5 |
sex | factor |
Adolescent sex; M = male, F = female. |
Crocetti, E., Klimstra, T. A., Hale, W. W., Koot, H. M., & Meeus, W. (2013). Impact of early adolescent externalizing problem behaviors on identity development in middle to late adolescence: A prospective 7-year longitudinal study. Journal of Youth and Adolescence, 42(11), 1745-1758. doi:10.1007/s10964-013-9924-6
Student questionnaire data with four variables from the 2015 PISA for students in the United States.
pisaUSA15
Data frame with columns
composite measure of students' self reported broad interest
composite measure of students' self reported enjoyment
composite measure of students' self reported instrumental motivation
composite measure of students' self reported self efficacy
...
http://www.oecd.org/pisa/data/
Takes in a data.frame, and applies POMS (proportion of maximum)-coding to the numeric columns.
poms(data)
data |
A data.frame. |
A data.frame.
Caspar J. van Lissa
data <- data.frame(a = c(1, 2, 2, 4, 1, 6),
                   b = c(6, 6, 3, 5, 3, 4),
                   c = c("a", "b", "b", "t", "f", "g"))
poms(data)
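To make the idea of POMS-coding concrete, the following base R sketch rescales each numeric column so that its observed minimum maps to 0 and its observed maximum maps to 1, leaving non-numeric columns untouched. `poms_sketch()` is a hypothetical re-implementation for illustration; it assumes the observed range is used, and tidyLPA's poms() may handle edge cases such as constant columns differently.

```r
# Hypothetical re-implementation for illustration only; not tidyLPA's poms().
poms_sketch <- function(data) {
  numeric_cols <- vapply(data, is.numeric, logical(1))
  data[numeric_cols] <- lapply(data[numeric_cols], function(x) {
    lo <- min(x, na.rm = TRUE)
    hi <- max(x, na.rm = TRUE)
    # Proportion of maximum: 0 at the observed minimum, 1 at the maximum.
    (x - lo) / (hi - lo)
  })
  data
}

df <- data.frame(a = c(1, 2, 2, 4, 1, 6),
                 b = c(6, 6, 3, 5, 3, 4),
                 c = c("a", "b", "b", "t", "f", "g"))
scaled <- poms_sketch(df)
scaled$a  # 0.0 0.2 0.2 0.6 0.0 1.0
```

Putting indicators on a common 0 to 1 range can make profile plots easier to compare when the original variables use different response scales.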
S3 method 'print' for class 'tidyLPA'.
## S3 method for class 'tidyLPA'
print(
  x,
  stats = c("AIC", "BIC", "Entropy", "prob_min", "prob_max", "n_min", "n_max", "BLRT_p"),
  digits = 2,
  na.print = "",
  ...
)
x |
An object of class 'tidyLPA'. |
stats |
Character vector. Statistics to be printed. Default: c("AIC", "BIC", "Entropy", "prob_min", "prob_max", "n_min", "n_max", "BLRT_p" ). |
digits |
minimal number of significant digits, see
|
na.print |
a character string which is used to indicate NA values in
printed output, or NULL. See |
... |
further arguments to be passed to or from other methods. They are ignored in this function. |
Caspar J. van Lissa
## Not run:
if(interactive()){
  library(dplyr) # provides select()
  iris %>%
    select(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) %>%
    estimate_profiles(3)
}
## End(Not run)
S3 method 'print' for class 'tidyProfile'.
## S3 method for class 'tidyProfile'
print(x, digits = 2, na.print = "", ...)
x |
An object of class 'tidyProfile'. |
digits |
minimal number of significant digits, see
|
na.print |
a character string which is used to indicate NA values in
printed output, or NULL. See |
... |
further arguments to be passed to or from other methods. They are ignored in this function. |
Caspar J. van Lissa
## Not run:
if(interactive()){
  library(dplyr) # provides select()
  tmp <- iris %>%
    select(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) %>%
    estimate_profiles(3)
  # only one model is estimated here, so print its single 'tidyProfile' element
  tmp[[1]]
}
## End(Not run)
This function accommodates several methods for single imputation of data. Currently, the following methods are defined:
"imputeData": Applies the mclust native imputation function imputeData.
"missForest": Applies non-parametric, random-forest based data imputation using missForest. Random forests can accommodate complex interactions and non-linear relations in the data. My simulation studies indicate that this method is preferable to mclust's imputeData (see examples).
single_imputation(x, method = "imputeData")
x |
A data.frame or matrix. |
method |
Character. Imputation method to apply, Default: 'imputeData' |
A data.frame
Caspar J. van Lissa
## Not run:
library(ggplot2)
library(missForest)
library(mclust)

dm <- 2
k <- 3
n <- 100
V <- 4

# Example of one simulation
class <- sample.int(k, n, replace = TRUE)
dat <- matrix(rnorm(n*V, mean = (rep(class, each = V)-1)*dm),
              nrow = n, ncol = V, byrow = TRUE)
results <- estimate_profiles(data.frame(dat), 1:5)
plot_profiles(results)
compare_solutions(results)

# Simulation for parametric data (i.e., all assumptions of latent profile
# analysis met)
simulation <- replicate(100, {
  class <- sample.int(k, n, replace = TRUE)
  dat <- matrix(rnorm(n*V, mean = (rep(class, each = V)-1)*dm),
                nrow = n, ncol = V, byrow = TRUE)
  d <- prodNA(dat)

  d_mf <- missForest(d)$ximp
  m_mf <- Mclust(d_mf, G = 3, "EEI")
  d_im <- imputeData(d, verbose = FALSE)
  m_im <- Mclust(d_im, G = 3, "EEI")

  class_tabl_mf <- sort(prop.table(table(class, m_mf$classification)),
                        decreasing = TRUE)[1:3]
  class_tabl_im <- sort(prop.table(table(class, m_im$classification)),
                        decreasing = TRUE)[1:3]
  c(sum(class_tabl_mf), sum(class_tabl_im))
})

# Performance on average
rowMeans(simulation)
# Performance SD (base R equivalent of the original colSD(t(simulation)))
apply(simulation, 1, sd)
# Plot shows slight advantage for missForest
plotdat <- data.frame(accuracy = as.vector(simulation),
                      model = rep(c("mf", "im"), n))
ggplot(plotdat, aes(x = accuracy, colour = model))+geom_density()

# Simulation for real data (i.e., unknown whether assumptions are met)
simulation <- replicate(100, {
  d <- prodNA(iris[,1:4])

  d_mf <- missForest(d)$ximp
  m_mf <- Mclust(d_mf, G = 3, "EEI")
  d_im <- imputeData(d, verbose = FALSE)
  m_im <- Mclust(d_im, G = 3, "EEI")

  class_tabl_mf <- sort(prop.table(table(iris$Species, m_mf$classification)),
                        decreasing = TRUE)[1:3]
  class_tabl_im <- sort(prop.table(table(iris$Species, m_im$classification)),
                        decreasing = TRUE)[1:3]
  c(sum(class_tabl_mf), sum(class_tabl_im))
})

# Performance on average
rowMeans(simulation)
# Performance SD (base R equivalent of the original colSD(t(simulation)))
apply(simulation, 1, sd)
# Plot shows slight advantage for missForest
# (uses 'simulation'; the original referenced an undefined object 'tmp')
plotdat <- data.frame(accuracy = as.vector(simulation),
                      model = rep(c("mf", "im"), n))
ggplot(plotdat, aes(x = accuracy, colour = model))+geom_density()
## End(Not run)
Latent Profile Analysis (LPA) is a statistical modeling approach for estimating distinct profiles, or groups, of variables. In the social sciences and in educational research, these profiles could represent, for example, how different youth experience dimensions of being engaged (i.e., cognitively, behaviorally, and affectively) at the same time.
tidyLPA provides the functionality to carry out LPA in R. In particular, tidyLPA provides functionality to specify different models that determine whether and how different parameters (i.e., means, variances, and covariances) are estimated and to specify (and compare solutions for) the number of profiles to estimate.