Title: | Helper Functions for Linear Model Analysis |
---|---|
Description: | A collection of helper functions for multiple regression models fitted by lm(). Most of them are simple functions for simple tasks which can be done with coding, but may not be easy for occasional users of R. Most of the tasks addressed are those sometimes needed when using the 'manymome' package (Cheung and Cheung, 2023, <doi:10.3758/s13428-023-02224-z>) and 'stdmod' package (Cheung, Cheung, Lau, Hui, and Vong, 2022, <doi:10.1037/hea0001188>). However, they can also be used in other scenarios. |
Authors: | Shu Fai Cheung [aut, cre] |
Maintainer: | Shu Fai Cheung <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.4.0 |
Built: | 2024-11-14 05:30:41 UTC |
Source: | https://github.com/sfcheung/lmhelprs |
A eight-variable dataset with 100 cases.
data_test1
data_test1
A data frame with 100 rows and 8 variables:
Predictor. Numeric.
Predictor. Numeric.
Predictor. Numeric.
Predictor. Numeric.
Predictor. Numeric.
Outcome. Numeric.
Predictor. String. Values: "Alpha", "Beta", "Gamma"
Predictor. String. Values: "North", "South", "East", "West"
data(data_test1) lm(y ~ x1 + cat2 + cat1 + cat2:cat1, data_test1)
data(data_test1) lm(y ~ x1 + cat2 + cat1 + cat2:cat1, data_test1)
Check a list of 'lm' objects to see whether they are be ordered in a way for doing hierarchical regression analysis.
hierarchical(...)
hierarchical(...)
... |
The outputs of |
Two models can be compared by hierarchical regression analysis if one model can be formed by adding one or more terms to the other model.
This function checks whether a list
of lm
outputs can be ordered from
the simplest model to the most
complex model, with a more complex
model formed by adding one or more
terms to a simpler model.
It extracts the terms in each model
by stats::terms()
and then extracts
the labels of the terms by
labels()
. The labels are then used
to determine the hierarchical order.
Therefore, in principle, this
function can be used for the outputs
of other model fitting functions as
long as their outputs support the
stats::terms()
and the labels can
be used to determine hierarchical
order of two models.
If the models can be ordered in a
hierarchical way, the output is a
list of the original lm
outputs, sorted from the model with
the smallest number of terms to the
model with the largest number of
terms. If the models cannot be
ordered this way, NA
is returned.
Shu Fai Cheung https://orcid.org/0000-0002-9871-9448
dat <- data_test1 lm1 <- lm(y ~ x1 + x2, dat) lm2 <- lm(y ~ x1 + x2 + x3 + x4, dat) lm3 <- lm(y ~ x1 + cat1 + cat2 + x2 + x3 + x4, dat) lm4 <- lm(y ~ x1 + x2*x3 + x4, dat) # The order of entry does not matter hierarchical(lm1, lm4, lm2) # The following three models yield NA hierarchical(lm3, lm4, lm2)
dat <- data_test1 lm1 <- lm(y ~ x1 + x2, dat) lm2 <- lm(y ~ x1 + x2 + x3 + x4, dat) lm3 <- lm(y ~ x1 + cat1 + cat2 + x2 + x3 + x4, dat) lm4 <- lm(y ~ x1 + x2*x3 + x4, dat) # The order of entry does not matter hierarchical(lm1, lm4, lm2) # The following three models yield NA hierarchical(lm3, lm4, lm2)
Do hierarchical regression analysis on two or more models fitted by 'lm()'.
hierarchical_lm(...)
hierarchical_lm(...)
... |
The outputs of |
It conducted hierarchical
regression analysis on two or more
models fitted by stats::lm()
.
The models must be able to be ordered
from the simplest to the most complex,
with each more complex model formed
by adding one or more terms to the
simpler model.
ANOVA will be conducted to compare each model with the next more complex model in the order, with R-squared change computed.
If the models can be ordered in a
hierarchical way, the output is an
ANOVA table with the R-squared
estimate of each model, and the
R-squared change of each model
compared to the simpler model
preceding this model in the order.
The class of the output is
hierarchical_lm
, with a print
method. If the models cannot be
ordered this way, NA
is returned.
It call hierarchical()
firsts to
order the outputs for stats::lm()
,
If they can be ordered in a
hierarchical way, they will be passed
to stats::anova()
. R-squared and
R-squared change will be computed
if they are available in the
summary()
method applied to each
model.
Therefore, in principle, this
function can also be used for the
outputs of other model fitting
functions if their outputs have
stats::anova()
and summary()
methods.
The comparison is meaningful only
if all models are fitted to the
same datasets. There is not way
to guarantee this is the case, given
only the output of lm()
. However,
there are necessary conditions to
claim that the same datasets are used:
the number of cases are the same,
the means, variances, and covariances
of numerical variables, and the
frequency distributions of variables
common to two models are identical.
If at least one of these conditions
is not met, then two models must have
been fitted to two different datasets.
The function will check these conditions and raise an error if any of these necessary conditions are not met.
Shu Fai Cheung https://orcid.org/0000-0002-9871-9448
dat <- data_test1 lm1 <- lm(y ~ x1 + x2, dat) lm2 <- lm(y ~ x1 + x2 + x3 + x4, dat) lm3 <- lm(y ~ x1 + cat1 + cat2 + x2 + x3 + x4, dat) lm4 <- lm(y ~ x1 + x2*x3 + x4, dat) hierarchical_lm(lm1, lm3, lm2) hierarchical_lm(lm1, lm2, lm4) # The following models will yield an error message: tryCatch(hierarchical_lm(lm1, lm3, lm2, lm4), error = function(e) e)
dat <- data_test1 lm1 <- lm(y ~ x1 + x2, dat) lm2 <- lm(y ~ x1 + x2 + x3 + x4, dat) lm3 <- lm(y ~ x1 + cat1 + cat2 + x2 + x3 + x4, dat) lm4 <- lm(y ~ x1 + x2*x3 + x4, dat) hierarchical_lm(lm1, lm3, lm2) hierarchical_lm(lm1, lm2, lm4) # The following models will yield an error message: tryCatch(hierarchical_lm(lm1, lm3, lm2, lm4), error = function(e) e)
Convert the output of
many_lm()
to a lavann
-style
parameter table.
lm_list_to_partable( object, keep_intercepts = FALSE, vcov_args = list(), pvalue_fun = NULL, rsquare = FALSE )
lm_list_to_partable( object, keep_intercepts = FALSE, vcov_args = list(), pvalue_fun = NULL, rsquare = FALSE )
object |
The output of
|
keep_intercepts |
Logical. If
|
vcov_args |
A named list of
arguments to be passed to |
pvalue_fun |
The function to be used to compute the p-values of regression coefficients. Ignored for now. Included for adding this feature in the future. |
rsquare |
Logical. Whether
R-squares will be included in the
output, with |
This function convert a a
lit of lm
objects, such as the
output of many_lm()
or
manymome::lm2list()
, to a table of
parameter estimates similar to the
output of lavaan::parameterTable.
The output is designed to be used by
semPlot::semPaths()
and so contains
only information necessary for the
plot.
The output of stats::lm()
is
already supported by
semPlot::semPaths()
, and it can
also combine a list of regression
models into on single plot. However,
it will convert interaction terms to
knots. Moreover, if two interaction
terms in two different models share
the a variable, it will be incorrectly
combined to become a single knot
(Version 1.1.6). Therefore, this
function was developed to let users
to draw the model as if it were a
path model in structural equation
modeling.
A data frame object with columns such
as lhs
, op
, rhs
, and est
,
major columns of the output of
lavaan::parameterTable()
necessary
for plotting the model using
semPlot::semPaths()
.
Shu Fai Cheung https://orcid.org/0000-0002-9871-9448
many_lm()
and manymome::lm2list()
.
data(data_test1) mod <- "x3 ~ x2*x1 x4 ~ x3 x5 ~ x4 + x3" out <- many_lm(mod, data_test1) out_ptable <- lm_list_to_partable(out) out_ptable m <- matrix(c("x1", "x2", "x2:x1", NA, "x3", NA, "x4", NA, NA, NA, "x5", NA), nrow = 3, ncol = 4) m # The output can be used by semPlot::semPaths() if (requireNamespace("semPlot", quietly = TRUE)) { library(semPlot) p <- semPaths(out_ptable, what = "paths", whatLabels = "est", nCharNodes = 0, style = "ram", layout = m, exoCov = FALSE, DoNotPlot = TRUE) plot(p) # If it is desired to use knots to # denote interaction terms, then, # the output of many_lm() can be used # directly. m2 <- matrix(c("x1", NA, "x2", NA, "x3", NA, "x4", NA, NA, NA, "x5", NA), nrow = 3, ncol = 4) p2 <- semPaths(out, what = "paths", whatLabels = "est", nCharNodes = 0, style = "ram", layout = m2, exoCov = FALSE, intercepts = FALSE, DoNotPlot = TRUE) plot(p2) # This illustrates the problem with using # the list of lm-outputs directly when # a variable is involved in the interaction terms # of two or more models. m3 <- matrix(c("x2", NA, "x1", NA, "x3", NA, NA, NA, NA, NA, NA, "x4", NA, "x5", NA), nrow = 5, ncol = 3) mod3 <- "x4 ~ x2*x1 x5 ~ x3*x1" out3 <- many_lm(mod3, data_test1) p3 <- semPaths(out3, what = "paths", whatLabels = "est", nCharNodes = 0, style = "ram", layout = m3, exoCov = FALSE, intercepts = FALSE, DoNotPlot = TRUE) plot(p3) }
data(data_test1) mod <- "x3 ~ x2*x1 x4 ~ x3 x5 ~ x4 + x3" out <- many_lm(mod, data_test1) out_ptable <- lm_list_to_partable(out) out_ptable m <- matrix(c("x1", "x2", "x2:x1", NA, "x3", NA, "x4", NA, NA, NA, "x5", NA), nrow = 3, ncol = 4) m # The output can be used by semPlot::semPaths() if (requireNamespace("semPlot", quietly = TRUE)) { library(semPlot) p <- semPaths(out_ptable, what = "paths", whatLabels = "est", nCharNodes = 0, style = "ram", layout = m, exoCov = FALSE, DoNotPlot = TRUE) plot(p) # If it is desired to use knots to # denote interaction terms, then, # the output of many_lm() can be used # directly. m2 <- matrix(c("x1", NA, "x2", NA, "x3", NA, "x4", NA, NA, NA, "x5", NA), nrow = 3, ncol = 4) p2 <- semPaths(out, what = "paths", whatLabels = "est", nCharNodes = 0, style = "ram", layout = m2, exoCov = FALSE, intercepts = FALSE, DoNotPlot = TRUE) plot(p2) # This illustrates the problem with using # the list of lm-outputs directly when # a variable is involved in the interaction terms # of two or more models. m3 <- matrix(c("x2", NA, "x1", NA, "x3", NA, NA, NA, NA, NA, NA, "x4", NA, "x5", NA), nrow = 5, ncol = 3) mod3 <- "x4 ~ x2*x1 x5 ~ x3*x1" out3 <- many_lm(mod3, data_test1) p3 <- semPaths(out3, what = "paths", whatLabels = "est", nCharNodes = 0, style = "ram", layout = m3, exoCov = FALSE, intercepts = FALSE, DoNotPlot = TRUE) plot(p3) }
Fit a list of linear models defined by model syntax.
many_lm(models, data, na_omit_all = TRUE, ...)
many_lm(models, data, na_omit_all = TRUE, ...)
models |
Character. Model syntax. See Details. |
data |
The data frame. Must be
supplied if |
na_omit_all |
How missing data
is handled across models. If |
... |
Additional arguments. To
be passed to |
This function extracts
linear model formulas from a
model syntax (a character vector),
fits each of them by lm()
, and
stores the results in a list.
Lines with the first non-whitespace
character "#"
are treated as comments
and ignored.
Each line must be a valid formula
for lm()
.
If na_omit_all
is TRUE
, the
default, then cases with missing
data on at least one of the variables
used in the model will be removed.
Each call to lm()
will have subset
set to an integer vector of cases
not removed (i.e., cases retained)
subset
argumentIf subset
is used when calling this
function, it will also be used to
select cases.
Note that the subset
argument in
the call in each model will be replaced
by a numeric vector of cases retained,
determined by both missing data and
the original value of the subset
.
A list of the output of lm()
. The
class is lm_list_lmhelprs
.
Shu Fai Cheung https://orcid.org/0000-0002-9871-9448
data(data_test1) mod <- "x3 ~ x2 + x1 x4 ~ x3 x5 ~ x4*x1" out <- many_lm(mod, data_test1) summary(out)
data(data_test1) mod <- "x3 ~ x2 + x1 x4 ~ x3 x5 ~ x4*x1" out <- many_lm(mod, data_test1) summary(out)
hierarchial_lm
Class ObjectPrint the content of a 'hierarchical_lm'-class object.
## S3 method for class 'hierarchical_lm' print( x, digits = 4, signif.stars = getOption("show.signif.stars"), eps.Pvalue = 0.001, ... )
## S3 method for class 'hierarchical_lm' print( x, digits = 4, signif.stars = getOption("show.signif.stars"), eps.Pvalue = 0.001, ... )
x |
A |
digits |
The minimum number of
significant digits to be used for
most numbers. To be used by the print
method of |
signif.stars |
Logical. To be
used by the print method of
|
eps.Pvalue |
To be passed to
|
... |
Optional arguments. To
be passed to the print method of
|
The printout is very similar
to that of the print method of
an anova
object. It simply
overrides the default values for
some arguments, notably esp.Pvalue
to prevent small p-values to be
presented in scientific notation.
x
is returned invisibly.
Called for its side effect.
Shu Fai Cheung https://orcid.org/0000-0002-9871-9448
dat <- data_test1 lm1 <- lm(y ~ x1 + x2, dat) lm2 <- lm(y ~ x1 + x2 + x3 + x4, dat) lm3 <- lm(y ~ x1 + cat1 + cat2 + x2 + x3 + x4, dat) lm4 <- lm(y ~ x1 + x2*x3 + x4, dat) hierarchical_lm(lm1, lm3, lm2) hierarchical_lm(lm1, lm2, lm4)
dat <- data_test1 lm1 <- lm(y ~ x1 + x2, dat) lm2 <- lm(y ~ x1 + x2 + x3 + x4, dat) lm3 <- lm(y ~ x1 + cat1 + cat2 + x2 + x3 + x4, dat) lm4 <- lm(y ~ x1 + x2*x3 + x4, dat) hierarchical_lm(lm1, lm3, lm2) hierarchical_lm(lm1, lm2, lm4)
lm_list_lmhelprs
-Class
ObjectPrint the content of the
output of many_lm()
.
## S3 method for class 'lm_list_lmhelprs' print(x, ...)
## S3 method for class 'lm_list_lmhelprs' print(x, ...)
x |
The output of |
... |
Other arguments. Not used. |
Adapted from the package manymome
such that many_lm()
can be used
with manymome
.
x
is returned invisibly.
Called for its side effect.
data(data_test1) mod <- "x3 ~ x2 + x1 x4 ~ x3 x5 ~ x4*x1" out <- many_lm(mod, data_test1) out
data(data_test1) mod <- "x3 ~ x2 + x1 x4 ~ x3 x5 ~ x4*x1" out <- many_lm(mod, data_test1) out
lm_list_lmhelprs
-Class
ObjectThe summary of content
of the output of many_lm()
.
## S3 method for class 'lm_list_lmhelprs' summary(object, ...) ## S3 method for class 'summary_lm_list_lmhelprs' print(x, digits = 3, ...)
## S3 method for class 'lm_list_lmhelprs' summary(object, ...) ## S3 method for class 'summary_lm_list_lmhelprs' print(x, digits = 3, ...)
object |
The output of
|
... |
Other arguments. Not used. |
x |
An object of class
|
digits |
The number of significant digits in printing numerical results. |
summary.lm_list_lmhelprs()
returns a
summary_lm_list_lmhelprs
-class object, which
is a list of the summary()
outputs
of the lm()
outputs stored.
print.summary_lm_list_lmhelprs()
returns x
invisibly. Called for its side
effect.
Adapted from the package manymome
such that many_lm()
can be used
without manymome
.
print(summary_lm_list_lmhelprs)
: Print
method for output of summary for
lm_list_lmhelprs.
data(data_test1) mod <- "x3 ~ x2 + x1 x4 ~ x3 x5 ~ x4*x1" out <- many_lm(mod, data_test1) summary(out)
data(data_test1) mod <- "x3 ~ x2 + x1 x4 ~ x3 x5 ~ x4*x1" out <- many_lm(mod, data_test1) summary(out)
Identify the highest order terms in a model fitted by 'lm()', and compare this model to a model with this term removed using ANOVA.
test_highest(lm_out) highest_order(lm_out)
test_highest(lm_out) highest_order(lm_out)
lm_out |
The output of
|
The function test_highest()
first check if a model fitted by
stats::lm()
has a unique highest
order term (e.g., the term x1:x2
,
in the model y ~ x1 + x2 + x1:x2
).
If yes, it will fit a model with this
term removed, and then call
hierarchical_lm()
to compare the
original model with this reduced
model.
If the model does not have a unique highest order term, an error will be raised.
A hierarchical_lm
-class
object, which is the output of
hierarchical_lm()
. Two models
are compared, the original model and
the model with the unique highest
order term removed.
test_highest()
: Test the highest order term.
highest_order()
: Find the highest order term.
It relies on terms created by
stats::lm()
to determine the order
of each term. If a higher order term
is created manually (e.g.,
I(x1 * x2)
), then it cannot know
that this term is a second order
term.
Shu Fai Cheung https://orcid.org/0000-0002-9871-9448
dat <- data_test1 lm1 <- lm(y ~ x1 + x2 + cat1*x3, dat) lm2 <- lm(y ~ x1 + x2*x3 + x4, dat) test_highest(lm1) test_highest(lm2) highest_order(lm1) highest_order(lm2) # The followings will yield an error lm3 <- lm(y ~ x1 + x2 + x3, dat) summary(lm3) tryCatch(test_highest(lm3), error = function(e) e) tryCatch(highest_order(lm3), error = function(e) e) lm4 <- lm(y ~ x1 + x2*x3 + x4*x5, dat) summary(lm4) tryCatch(test_highest(lm4), error = function(e) e) tryCatch(highest_order(lm4), error = function(e) e)
dat <- data_test1 lm1 <- lm(y ~ x1 + x2 + cat1*x3, dat) lm2 <- lm(y ~ x1 + x2*x3 + x4, dat) test_highest(lm1) test_highest(lm2) highest_order(lm1) highest_order(lm2) # The followings will yield an error lm3 <- lm(y ~ x1 + x2 + x3, dat) summary(lm3) tryCatch(test_highest(lm3), error = function(e) e) tryCatch(highest_order(lm3), error = function(e) e) lm4 <- lm(y ~ x1 + x2*x3 + x4*x5, dat) summary(lm4) tryCatch(test_highest(lm4), error = function(e) e) tryCatch(highest_order(lm4), error = function(e) e)