Title: | Maximally Selected Rank Statistics |
---|---|
Description: | Maximally selected rank statistics with several p-value approximations. |
Authors: | Torsten Hothorn |
Maintainer: | Torsten Hothorn <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.7-25 |
Built: | 2025-01-21 03:23:54 UTC |
Source: | https://github.com/cran/maxstat |
Correlation matrix of maximally selected rank statistics.
corrmsrs(X, minprop=0.1, maxprop=0.9)
corrmsrs(X, minprop=0.1, maxprop=0.9)
X |
the vector, matrix or data.frame of prognostic factors under test. |
minprop |
at least |
maxprop |
not more than |
The correlations between all two-sample rank statistics induced by all
possible cutpoints in X
are computed.
The correlation matrix with dimension depending on ties in X
is
returned.
Hothorn, T. and Lausen, B. (2003). On the Exact Distribution of Maximally Selected Rank Statistics. Computational Statistics & Data Analysis, 43, 121–137.
Lausen, B., Hothorn, T., Bretz, F. and Schmacher, M. (2004). Assessment of Optimally Selected Prognostic Factors. Biometrical Journal, 46(3), 364–374.
set.seed(29) # matrix of hypothetical prognostic factors X <- matrix(rnorm(30), ncol=3) # this function a <- corrmsrs(X, minprop=0, maxprop=0.999) # coded by just typing the definition of the correlation testcorr <- function(X) { wh <- function(cut, x) which(x <= cut) index <- function(x) { ux <- unique(x) ux <- ux[ux < max(ux)] lapply(ux, wh, x = x) } a <- unlist(test <- apply(X, 2, index), recursive=FALSE) cnull <- rep(0, nrow(X)) mycorr <- diag(length(a)) for (i in 1:(length(a)-1)) { for (j in (i+1):length(a)) { cone <- cnull cone[a[[i]]] <- 1 ctwo <- cnull ctwo[a[[j]]] <- 1 sone <- sqrt(sum((cone - mean(cone))^2)) stwo <- sqrt(sum((ctwo - mean(ctwo))^2)) tcorr <- sum((cone - mean(cone))*(ctwo - mean(ctwo))) tcorr <- tcorr/(sone * stwo) mycorr[i,j] <- tcorr } } mycorr } tc <- testcorr(X) tc <- tc + t(tc) diag(tc) <- 1 stopifnot(all.equal(tc, a))
set.seed(29) # matrix of hypothetical prognostic factors X <- matrix(rnorm(30), ncol=3) # this function a <- corrmsrs(X, minprop=0, maxprop=0.999) # coded by just typing the definition of the correlation testcorr <- function(X) { wh <- function(cut, x) which(x <= cut) index <- function(x) { ux <- unique(x) ux <- ux[ux < max(ux)] lapply(ux, wh, x = x) } a <- unlist(test <- apply(X, 2, index), recursive=FALSE) cnull <- rep(0, nrow(X)) mycorr <- diag(length(a)) for (i in 1:(length(a)-1)) { for (j in (i+1):length(a)) { cone <- cnull cone[a[[i]]] <- 1 ctwo <- cnull ctwo[a[[j]]] <- 1 sone <- sqrt(sum((cone - mean(cone))^2)) stwo <- sqrt(sum((ctwo - mean(ctwo))^2)) tcorr <- sum((cone - mean(cone))*(ctwo - mean(ctwo))) tcorr <- tcorr/(sone * stwo) mycorr[i,j] <- tcorr } } mycorr } tc <- testcorr(X) tc <- tc + t(tc) diag(tc) <- 1 stopifnot(all.equal(tc, a))
A data frame with gene expression data from DLBCL (diffuse large B-cell lymphoma) patients.
data("DLBCL")
data("DLBCL")
DLCLid
DLBCL identifier
GEG
Gene Expression Group
time
survival time in month
cens
censoring: 0 cencored, 1 dead
IPI
International Prognostic Index
MGE
Mean Gene Expression
Except of MGE
, the data is published at
http://llmpp.nih.gov/lymphoma/data.shtml. MGE
is the mean of
the gene expression.
Ash A. Alizadeh et. al (2000), Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature, 403, 504–509
library("survival") set.seed(29) # compute the cutpoint and plot the empirical process mod <- maxstat.test(Surv(time, cens) ~ MGE, data=DLBCL, smethod="LogRank") print(mod) ## Not run: # postscript("statDLBCL.ps", horizontal=F, width=8, height=8) pdf("statDLBCL.pdf", width=8, height=8) ## End(Not run) par(mai=c(1.0196235, 1.0196235, 0.8196973, 0.4198450)) plot(mod, cex.lab=1.6, cex.axis=1.6, xlab="Mean gene expression",lwd=2) ## Not run: dev.off() ## End(Not run) # significance of the cutpoint # limiting distribution maxstat.test(Surv(time, cens) ~ MGE, data=DLBCL, smethod="LogRank", pmethod="Lau92", iscores=TRUE) # improved Bonferroni inequality, plot with significance bound maxstat.test(Surv(time, cens) ~ MGE, data=DLBCL, smethod="LogRank", pmethod="Lau94", iscores=TRUE) mod <- maxstat.test(Surv(time, cens) ~ MGE, data=DLBCL, smethod="LogRank", pmethod="Lau94", alpha=0.05) plot(mod, xlab="Mean gene expression") ## Not run: # postscript(file="RNewsStat.ps",horizontal=F, width=8, height=8) pdf("RNewsStat.pdf", width=8, height=8) ## End(Not run) par(mai=c(1.0196235, 1.0196235, 0.8196973, 0.4198450)) plot(mod, xlab="Mean gene expression", cex.lab=1.6, cex.axis=1.6) ## Not run: dev.off() ## End(Not run) # small sample solution Hothorn & Lausen maxstat.test(Surv(time, cens) ~ MGE, data=DLBCL, smethod="LogRank", pmethod="HL") # normal approximation maxstat.test(Surv(time, cens) ~ MGE, data=DLBCL, smethod="LogRank", pmethod="exactGauss", iscores=TRUE, abseps=0.01) # conditional Monte-Carlo maxstat.test(Surv(time, cens) ~ MGE, data=DLBCL, smethod="LogRank", pmethod="condMC", B = 9999) # survival analysis and plotting like in Alizadeh et al. (2000) splitGEG <- rep(1, nrow(DLBCL)) DLBCL <- cbind(DLBCL, splitGEG) DLBCL$splitGEG[DLBCL$GEG == "Activated B-like"] <- 0 plot(survfit(Surv(time, cens) ~ splitGEG, data=DLBCL), xlab="Survival time in month", ylab="Probability") text(90, 0.7, "GC B-like") text(60, 0.3, "Activated B-like") splitIPI <- rep(1, nrow(DLBCL)) DLBCL <- cbind(DLBCL, splitIPI) DLBCL$splitIPI[DLBCL$IPI <= 2] <- 0 plot(survfit(Surv(time, cens) ~ splitIPI, data=DLBCL), xlab="Survival time in month", ylab="Probability") text(90, 0.7, "Low clinical risk") text(60, 0.25, "High clinical risk") # survival analysis using the cutpoint splitMGE <- rep(1, nrow(DLBCL)) DLBCL <- cbind(DLBCL, splitMGE) DLBCL$splitMGE[DLBCL$MGE <= mod$estimate] <- 0 ## Not run: # postscript("survDLBCL.ps",horizontal=F, width=8, height=8) pdf("survDLBCL.pdf", width=8, height=8) ## End(Not run) par(mai=c(1.0196235, 1.0196235, 0.8196973, 0.4198450)) plot(survfit(Surv(time, cens) ~ splitMGE, data=DLBCL), xlab = "Survival time in month", ylab="Probability", cex.lab=1.6, cex.axis=1.6, lwd=2) text(90, 0.9, expression("Mean gene expression" > 0.186), cex=1.6) text(90, 0.45, expression("Mean gene expression" <= 0.186 ), cex=1.6) ## Not run: dev.off() ## End(Not run)
library("survival") set.seed(29) # compute the cutpoint and plot the empirical process mod <- maxstat.test(Surv(time, cens) ~ MGE, data=DLBCL, smethod="LogRank") print(mod) ## Not run: # postscript("statDLBCL.ps", horizontal=F, width=8, height=8) pdf("statDLBCL.pdf", width=8, height=8) ## End(Not run) par(mai=c(1.0196235, 1.0196235, 0.8196973, 0.4198450)) plot(mod, cex.lab=1.6, cex.axis=1.6, xlab="Mean gene expression",lwd=2) ## Not run: dev.off() ## End(Not run) # significance of the cutpoint # limiting distribution maxstat.test(Surv(time, cens) ~ MGE, data=DLBCL, smethod="LogRank", pmethod="Lau92", iscores=TRUE) # improved Bonferroni inequality, plot with significance bound maxstat.test(Surv(time, cens) ~ MGE, data=DLBCL, smethod="LogRank", pmethod="Lau94", iscores=TRUE) mod <- maxstat.test(Surv(time, cens) ~ MGE, data=DLBCL, smethod="LogRank", pmethod="Lau94", alpha=0.05) plot(mod, xlab="Mean gene expression") ## Not run: # postscript(file="RNewsStat.ps",horizontal=F, width=8, height=8) pdf("RNewsStat.pdf", width=8, height=8) ## End(Not run) par(mai=c(1.0196235, 1.0196235, 0.8196973, 0.4198450)) plot(mod, xlab="Mean gene expression", cex.lab=1.6, cex.axis=1.6) ## Not run: dev.off() ## End(Not run) # small sample solution Hothorn & Lausen maxstat.test(Surv(time, cens) ~ MGE, data=DLBCL, smethod="LogRank", pmethod="HL") # normal approximation maxstat.test(Surv(time, cens) ~ MGE, data=DLBCL, smethod="LogRank", pmethod="exactGauss", iscores=TRUE, abseps=0.01) # conditional Monte-Carlo maxstat.test(Surv(time, cens) ~ MGE, data=DLBCL, smethod="LogRank", pmethod="condMC", B = 9999) # survival analysis and plotting like in Alizadeh et al. (2000) splitGEG <- rep(1, nrow(DLBCL)) DLBCL <- cbind(DLBCL, splitGEG) DLBCL$splitGEG[DLBCL$GEG == "Activated B-like"] <- 0 plot(survfit(Surv(time, cens) ~ splitGEG, data=DLBCL), xlab="Survival time in month", ylab="Probability") text(90, 0.7, "GC B-like") text(60, 0.3, "Activated B-like") splitIPI <- rep(1, nrow(DLBCL)) DLBCL <- cbind(DLBCL, splitIPI) DLBCL$splitIPI[DLBCL$IPI <= 2] <- 0 plot(survfit(Surv(time, cens) ~ splitIPI, data=DLBCL), xlab="Survival time in month", ylab="Probability") text(90, 0.7, "Low clinical risk") text(60, 0.25, "High clinical risk") # survival analysis using the cutpoint splitMGE <- rep(1, nrow(DLBCL)) DLBCL <- cbind(DLBCL, splitMGE) DLBCL$splitMGE[DLBCL$MGE <= mod$estimate] <- 0 ## Not run: # postscript("survDLBCL.ps",horizontal=F, width=8, height=8) pdf("survDLBCL.pdf", width=8, height=8) ## End(Not run) par(mai=c(1.0196235, 1.0196235, 0.8196973, 0.4198450)) plot(survfit(Surv(time, cens) ~ splitMGE, data=DLBCL), xlab = "Survival time in month", ylab="Probability", cex.lab=1.6, cex.axis=1.6, lwd=2) text(90, 0.9, expression("Mean gene expression" > 0.186), cex=1.6) text(90, 0.45, expression("Mean gene expression" <= 0.186 ), cex=1.6) ## Not run: dev.off() ## End(Not run)
A data frame with the left ventricular ejection fraction of patients with malignant ventricular tachyarrhythmias including recurrence-free month and censoring.
data("hohnloser")
data("hohnloser")
EF
left ventricular ejection in percent
month
recurrence-free month
cens
censoring: 0 cencored, 1 not censored
The data used here is published in Table 1 of Lausen and Schumacher (1992).
The data was first published by Hohnloser et al. (1987), the data used here is published in Table 1 of Lausen and Schumacher (1992).
Hohnloser, S.H., Raeder, E.A., Podrid, P.J., Graboys, T.B. and Lown, B. (1987), Predictors of antiarrhythmic drug efficacy in patients with malignant ventricular tachyarrhythmias. American Heart Journal 114, 1–7
Lausen, B. and Schumacher, M. (1992), Maximally Selected Rank Statistics. Biometrics 48, 73–85
set.seed(29) library("survival") # limiting distribution maxstat.test(Surv(month, cens) ~ EF, data=hohnloser, smethod="LogRank", pmethod="Lau92") # with integer valued scores for comparison maxstat.test(Surv(month, cens) ~ EF, data=hohnloser, smethod="LogRank", pmethod="Lau92", iscores=TRUE) # improved Bonferroni inequality maxstat.test(Surv(month, cens) ~ EF, data=hohnloser, smethod="LogRank", pmethod="Lau94") maxstat.test(Surv(month, cens) ~ EF, data=hohnloser, smethod="LogRank", pmethod="Lau94", iscores=TRUE) # small sample solution by Hothorn & Lausen maxstat.test(Surv(month, cens) ~ EF, data=hohnloser, smethod="LogRank", pmethod="HL") # normal approximation maxstat.test(Surv(month, cens) ~ EF, data=hohnloser, smethod="LogRank", pmethod="exactGauss") maxstat.test(Surv(month, cens) ~ EF, data=hohnloser, smethod="LogRank", pmethod="exactGauss", iscores=TRUE) # conditional Monte-Carlo maxstat.test(Surv(month, cens) ~ EF, data=hohnloser, smethod="LogRank", pmethod="condMC", B = 9999)
set.seed(29) library("survival") # limiting distribution maxstat.test(Surv(month, cens) ~ EF, data=hohnloser, smethod="LogRank", pmethod="Lau92") # with integer valued scores for comparison maxstat.test(Surv(month, cens) ~ EF, data=hohnloser, smethod="LogRank", pmethod="Lau92", iscores=TRUE) # improved Bonferroni inequality maxstat.test(Surv(month, cens) ~ EF, data=hohnloser, smethod="LogRank", pmethod="Lau94") maxstat.test(Surv(month, cens) ~ EF, data=hohnloser, smethod="LogRank", pmethod="Lau94", iscores=TRUE) # small sample solution by Hothorn & Lausen maxstat.test(Surv(month, cens) ~ EF, data=hohnloser, smethod="LogRank", pmethod="HL") # normal approximation maxstat.test(Surv(month, cens) ~ EF, data=hohnloser, smethod="LogRank", pmethod="exactGauss") maxstat.test(Surv(month, cens) ~ EF, data=hohnloser, smethod="LogRank", pmethod="exactGauss", iscores=TRUE) # conditional Monte-Carlo maxstat.test(Surv(month, cens) ~ EF, data=hohnloser, smethod="LogRank", pmethod="condMC", B = 9999)
Performs a test of independence of a response and one or more covariables using maximally selected rank statistics.
## S3 method for class 'data.frame' maxstat.test(formula, data, subset, na.action, ...) maxstat(y, x=NULL, weights = NULL, smethod=c("Wilcoxon", "Median", "NormalQuantil","LogRank", "Data"), pmethod=c("none", "Lau92", "Lau94", "exactGauss", "HL", "condMC", "min"), iscores=(pmethod=="HL"), minprop = 0.1, maxprop=0.9, alpha = NULL, keepxy=TRUE, ...)
## S3 method for class 'data.frame' maxstat.test(formula, data, subset, na.action, ...) maxstat(y, x=NULL, weights = NULL, smethod=c("Wilcoxon", "Median", "NormalQuantil","LogRank", "Data"), pmethod=c("none", "Lau92", "Lau94", "exactGauss", "HL", "condMC", "min"), iscores=(pmethod=="HL"), minprop = 0.1, maxprop=0.9, alpha = NULL, keepxy=TRUE, ...)
y |
numeric vector of data values, dependent variable. |
x |
numeric vector of data values, independent variable. |
weights |
an optional numeric vector of non-negative weights, summing to the number of observations. |
smethod |
kind of statistic to be computed, i.e. defines the scores to be used for computing the statistic. |
pmethod |
kind of p-value approximation to be used. |
iscores |
logical: should the scores be mapped into integers
|
minprop |
at least |
maxprop |
not more than |
alpha |
significance niveau, the appropriate quantile is computed if
|
keepxy |
logical: return |
formula |
a formula describing the model to be tested of the form
|
data |
an data frame containing the variables in the
model formula. |
subset |
an optional vector specifying a subset of observations to be used. |
na.action |
a function which indicates what should happen when
the data contain |
... |
additional parameters to be passed to
|
The assessment of the predictive power of a variable x
for a
dependent variable y
can be determined by a maximally selected rank
statistic.
smethod
determines the kind of statistic to be used.
Wilcoxon
and Median
denote maximally selected
Wilcoxon and Median statistics. NormalQuantile
and
LogRank
denote v.d. Waerden and log-rank
scores.
pmethod
specifies which kind of approximation of the p-value should
be used. Lau92
is the limiting distribution by a Brownian bridge
(see Lausen and Schumacher, 1992, and pLausen92
),
Lau94
the approximation based on an improved Bonferroni
inequality (see Lausen, Sauerbrei and Schumacher, 1994, and pLausen94
).
exactGauss
returns the exact p-value for a maximally selected Gauss
statistic, see Hothorn and Lausen (2003).
HL
is a small sample approximation based on the Streitberg-R\"ohmel
algorithm (see pperm
) introduced by Hothorn and
Lausen (2003). This requires integer
valued scores. For v. d. Waerden and Log-rank scores we try to find
integer valued scores having the same shape. This results in slightly
different scores (and a different test), the procedure is described in
Hothorn (2001) and Hothorn and Lausen (2003).
All the approximations are known to be conservative, so min
gives
the minimum p-value of all procedures.
condMC
simulates the distribution via conditional Monte-Carlo.
For survival problems, i.e. using a maximally selected log-rank statistic,
the interface is similar to survfit
. The depended
variable is a survival object Surv(time, event)
. The argument
event
may be a numeric vector of 0
(alive) and 1
(dead) or a vector of logicals with TRUE
indicating death.
If more than one covariable is specified in the right hand side of
formula
(or if x
is a matrix or data frame), the variable with
smallest p-value is selected and the p-value for the global test problem of
independence of y
and every variable on the right hand side is
returned (see Lausen et al., 2002).
An object of class maxtest
or mmaxtest
(if more than one
covariable was specified) containing the following components
is returned:
statistic |
the value of the test statistic. |
p.value |
the p-value for the test. |
smethod |
the type of test applied. |
pmethod |
the type of p-value approximation applied. |
estimate |
the estimated cutpoint (of |
maxstats |
a list of |
whichmin |
an integer specifying the element of |
p.value |
the p-value of the global test. |
univp.values |
the p-values for each of the variables under test. |
cm |
the correlation matrix the p-value is based on. |
plot.maxtest
and print.maxtest
can be used for
plotting and printing. If keepxy = TRUE
, there are elements y
and x
giving the response and independent variable.
Hothorn, T. and Lausen, B. (2003). On the Exact Distribution of Maximally Selected Rank Statistics. Computational Statistics & Data Analysis, 43, 121–137.
Lausen, B. and Schumacher, M. (1992). Maximally Selected Rank Statistics. Biometrics, 48, 73–85
Lausen, B., Sauerbrei, W. and Schumacher, M. (1994). Classification and Regression Trees (CART) used for the exploration of prognostic factors measured on different scales. in: P. Dirschedl and R. Ostermann (Eds), Computational Statistics, Heidelberg, Physica-Verlag, 483–496
Hothorn, T. (2001). On Exact Rank Tests in R. R News, 1, 11–12
Lausen, B., Hothorn, T., Bretz, F. and Schmacher, M. (2004). Assessment of Optimally Selected Prognostic Factors. Biometrical Journal, 46(3), 364–374.
set.seed(29) x <- sort(runif(20)) y <- c(rnorm(10), rnorm(10, 2)) mydata <- data.frame(cbind(x,y)) mod <- maxstat.test(y ~ x, data=mydata, smethod="Wilcoxon", pmethod="HL", minprop=0.25, maxprop=0.75, alpha=0.05) print(mod) plot(mod) # adjusted for more than one prognostic factor. library("survival") mstat <- maxstat.test(Surv(time, cens) ~ IPI + MGE, data=DLBCL, smethod="LogRank", pmethod="exactGauss", abseps=0.01) plot(mstat) ### sphase set.seed(29) data("sphase", package = "TH.data") maxstat.test(Surv(RFS, event) ~ SPF, data=sphase, smethod="LogRank", pmethod="Lau94") maxstat.test(Surv(RFS, event) ~ SPF, data=sphase, smethod="LogRank", pmethod="Lau94", iscores=TRUE) maxstat.test(Surv(RFS, event) ~ SPF, data=sphase, smethod="LogRank", pmethod="HL") maxstat.test(Surv(RFS, event) ~ SPF, data=sphase, smethod="LogRank", pmethod="condMC", B = 9999) plot(maxstat.test(Surv(RFS, event) ~ SPF, data=sphase, smethod="LogRank"))
set.seed(29) x <- sort(runif(20)) y <- c(rnorm(10), rnorm(10, 2)) mydata <- data.frame(cbind(x,y)) mod <- maxstat.test(y ~ x, data=mydata, smethod="Wilcoxon", pmethod="HL", minprop=0.25, maxprop=0.75, alpha=0.05) print(mod) plot(mod) # adjusted for more than one prognostic factor. library("survival") mstat <- maxstat.test(Surv(time, cens) ~ IPI + MGE, data=DLBCL, smethod="LogRank", pmethod="exactGauss", abseps=0.01) plot(mstat) ### sphase set.seed(29) data("sphase", package = "TH.data") maxstat.test(Surv(RFS, event) ~ SPF, data=sphase, smethod="LogRank", pmethod="Lau94") maxstat.test(Surv(RFS, event) ~ SPF, data=sphase, smethod="LogRank", pmethod="Lau94", iscores=TRUE) maxstat.test(Surv(RFS, event) ~ SPF, data=sphase, smethod="LogRank", pmethod="HL") maxstat.test(Surv(RFS, event) ~ SPF, data=sphase, smethod="LogRank", pmethod="condMC", B = 9999) plot(maxstat.test(Surv(RFS, event) ~ SPF, data=sphase, smethod="LogRank"))
Computes the exact probability that a maximally selected gauss statistic is
greater or equal to b
.
pexactgauss(b, x, minprop=0.1, maxprop=0.9, ...) qexactgauss(p, x, minprop=0.1, maxprop=0.9, ...)
pexactgauss(b, x, minprop=0.1, maxprop=0.9, ...) qexactgauss(p, x, minprop=0.1, maxprop=0.9, ...)
b |
quantile. |
p |
probability. |
x |
the prognostic factor(s) under test. |
minprop |
at least |
maxprop |
not more than |
... |
additional parameters to be passed to
|
This is the exact distribution of a maximally selected Gauss statistic and
the asymptotic distribution for maximally selected rank statistics.
Normal probabilities are derived from the procedures by Genz/Bretz (see
pmvnorm
for details).
The probability that, under the hypothesis of independence, a maximally
selected gauss statistic greater equal b
is observed.
Genz, A. (1992). Numerical computation of multivariate normal probabilities. Journal of Computational and Graphical Statistics, 1, 141–150
Genz, A. (1993). Comparison of methods for the computation of multivariate normal probabilities. Computing Science and Statistics, 25, 400–405
set.seed(29) x <- rnorm(20) pexact <- pexactgauss(2.5, x, abseps=0.01)
set.seed(29) x <- rnorm(20) pexact <- pexactgauss(2.5, x, abseps=0.01)
Approximates the probability that a maximally selected rank statistic is
greater or equal to b
.
pLausen92(b, minprop=0.1, maxprop=0.9) qLausen92(p, minprop=0.1, maxprop=0.9)
pLausen92(b, minprop=0.1, maxprop=0.9) qLausen92(p, minprop=0.1, maxprop=0.9)
b |
quantile. |
p |
probability. |
minprop |
at least |
maxprop |
not more than |
Large sample approximation by Miller and Siegmund (1982) based on a Brownian bridge, cf. Lausen and Schumacher (1992).
The probability that, under the hypothesis of independence, a maximally
selected statistic greater equal b
is observed.
Miller, R. and Siegmund, D. (1982), Maximally Selected Chi Square Statistics. Biometrics, 38, 1011–1016
Lausen, B. and Schumacher, M. (1992), Maximally Selected Rank Statistics. Biometrics, 48, 73–85
# Compute quantiles. Should be equal to Table 2 in Lausen and Schumacher load(file.path(system.file(package = "maxstat"), "results", "LausenTab2.rda")) a <- rev(c(0.01, 0.025, 0.05, 0.1)) prop <- rbind(c(0.25, 0.75), c(0.4, 0.6), c(0.1, 0.9), c(0.4, 0.9)) Quant <- matrix(rep(0, length(a)*nrow(prop)), nrow=length(a)) for (i in 1:length(a)) { for (j in 1:nrow(prop)) { Quant[i,j] <- qLausen92(a[i], minprop=prop[j,1], maxprop=prop[j,2]) } } Quant <- round(Quant, 3) rownames(Quant) <- a colnames(Quant) <- c("A2575", "A46", "A19", "A49") Quant <- as.data.frame(Quant) rownames(LausenTab2) <- a Quant LausenTab2 if(!all.equal(LausenTab2, Quant)) stop("error checking pLausen92")
# Compute quantiles. Should be equal to Table 2 in Lausen and Schumacher load(file.path(system.file(package = "maxstat"), "results", "LausenTab2.rda")) a <- rev(c(0.01, 0.025, 0.05, 0.1)) prop <- rbind(c(0.25, 0.75), c(0.4, 0.6), c(0.1, 0.9), c(0.4, 0.9)) Quant <- matrix(rep(0, length(a)*nrow(prop)), nrow=length(a)) for (i in 1:length(a)) { for (j in 1:nrow(prop)) { Quant[i,j] <- qLausen92(a[i], minprop=prop[j,1], maxprop=prop[j,2]) } } Quant <- round(Quant, 3) rownames(Quant) <- a colnames(Quant) <- c("A2575", "A46", "A19", "A49") Quant <- as.data.frame(Quant) rownames(LausenTab2) <- a Quant LausenTab2 if(!all.equal(LausenTab2, Quant)) stop("error checking pLausen92")
Approximates the probability that a maximally selected rank statistic is
greater or equal to b
.
pLausen94(b, N, minprop=0.1, maxprop=0.9, m=NULL) qLausen94(p, N, minprop=0.1, maxprop=0.9, m=NULL)
pLausen94(b, N, minprop=0.1, maxprop=0.9, m=NULL) qLausen94(p, N, minprop=0.1, maxprop=0.9, m=NULL)
b |
quantile. |
p |
probability. |
N |
number of observations. |
minprop |
at least |
maxprop |
not more than |
m |
a integer vector containing the sample sizes in the first groups
for each cutpoint considered. If |
Approximation based on an improved Bonferroni inequality.
The probability that, under the hypothesis of independence, a maximally
selected statistic greater equal b
is observed.
Worsley, K.J. (1982), An Improved Bonferroni Inequality and Applications. Biometrika, 69, 297–302
Lausen, B. (1990), Maximal Selektierte Rangstatistiken. Dissertation. Universit\"at Dortmund
Lausen, B., Sauerbrei, W. & Schumacher, M. (1994). Classification and Regression Trees (CART) used for the exploration of prognostic factors measured on different scales. in: P. Dirschedl & R. Ostermann (Eds), Computational Statistics, Heidelberg, Physica-Verlag, 483–496
p <- pLausen94(2.5, 20, 0.25, 0.75) # Lausen 94, page 489 if (round(p, 3) != 0.073) stop("error checking pLausen94") # the same p2 <- pLausen94(2.5, 200, 0.25, 0.75, m=seq(from=50, to=150, by=10)) stopifnot(all.equal(round(p,3), round(p2,3)))
p <- pLausen94(2.5, 20, 0.25, 0.75) # Lausen 94, page 489 if (round(p, 3) != 0.073) stop("error checking pLausen94") # the same p2 <- pLausen94(2.5, 200, 0.25, 0.75, m=seq(from=50, to=150, by=10)) stopifnot(all.equal(round(p,3), round(p2,3)))
Printing and ploting method of objects of class maxtest
## S3 method for class 'maxtest' plot(x, xlab=NULL, ylab=NULL, ...) ## S3 method for class 'maxtest' print(x, digits = getOption("digits"), ...) ## S3 method for class 'mmaxtest' plot(x, xlab=NULL, ylab=NULL, nrow=2, ...) ## S3 method for class 'mmaxtest' print(x, digits = getOption("digits"), ...)
## S3 method for class 'maxtest' plot(x, xlab=NULL, ylab=NULL, ...) ## S3 method for class 'maxtest' print(x, digits = getOption("digits"), ...) ## S3 method for class 'mmaxtest' plot(x, xlab=NULL, ylab=NULL, nrow=2, ...) ## S3 method for class 'mmaxtest' print(x, digits = getOption("digits"), ...)
x |
an object of class |
xlab |
label of x-axis. |
ylab |
label of y-axis. |
nrow |
number of rows for multiple plots at one page. |
digits |
number of significant digits to be printed. |
... |
additional arguments to |
The standardized statistics are plotted. If alpha
was given in
maxstat.test
the appropriate significance bound is plotted as
a red line. print.maxtest
is just a wrapper to print.htest
.
set.seed(29) x <- sort(runif(20)) y <- rbinom(20, 1, 0.5) mydata <- data.frame(c(x,y)) mod <- maxstat.test(y ~ x, data=mydata, smethod="Median", pmethod="HL", alpha=0.05) print(mod) plot(mod)
set.seed(29) x <- sort(runif(20)) y <- rbinom(20, 1, 0.5) mydata <- data.frame(c(x,y)) mod <- maxstat.test(y ~ x, data=mydata, smethod="Median", pmethod="HL", alpha=0.05) print(mod) plot(mod)
Approximates the probability that a maximally selected rank
statistic is greater or equal to b
.
pmaxstat(b, scores, msample, quant=FALSE) qmaxstat(p, scores, msample)
pmaxstat(b, scores, msample, quant=FALSE) qmaxstat(p, scores, msample)
b |
quantile. |
p |
propability. |
scores |
integer valued scores assigned to the observations. |
msample |
all possible splitpoints. |
quant |
logical. Returns the results of SR instead of P-values. Only
to be used in |
Small sample approximation by Hothorn and Lausen (2003).
An upper limit for the probability that, under the hypothesis of
independence, a maximally selected statistic greater equal b
is observed. qmaxstat
needs optimization.
Hothorn, T. and Lausen, B. (2003). On the Exact Distribution of Maximally Selected Rank Statistics. Computational Statistics & Data Analysis, 43, 121–137.
pmaxstat(2.5, 1:20, 5:15)
pmaxstat(2.5, 1:20, 5:15)
Counts of tree pipits at 86 raster points in oak forests.
data("treepipit")
data("treepipit")
A data frame with 86 observations on the following 2 variables.
number of tree pipits counted.
canopy overstorey in percent.
The influence of canopy overstorey on the number of bird individuals is of special interest.
Data collected and kindly provided by Joerg Mueller <[email protected]>.
Mueller J. and Hothorn T. (2004), Maximally Selected Two-sample Statistics as a New Tool for the Identification and Assessment of Habitat Factors with an Application to Breeding-bird Communities in Oak Forests, European Journal of Forest Research, 123(3), 219–228.
mod <- maxstat.test(counts ~ coverstorey, data = treepipit, smethod = "Data", pmethod = "HL", minprop = 0.2, maxprop = 0.8) print(mod) plot(mod)
mod <- maxstat.test(counts ~ coverstorey, data = treepipit, smethod = "Data", pmethod = "HL", minprop = 0.2, maxprop = 0.8) print(mod) plot(mod)