For a worked example, the SCAS 95% interval for the proportion 1/29
is obtained with scaspci()
, using closed-form calculation
(Laud 2017, Appendix
A.4):
The Jeffreys interval can also incorporate prior information about p
for an approximate Bayesian confidence interval. For example, a pilot
study estimate of 1/10 would suggest a Beta(1,9) prior for p:
If more conservative coverage is required, a “continuity adjustment”
may be deployed with cc
, as follows, giving
continuity-adjusted SCAS or Jeffreys, and (if cc
is TRUE or
0.5), the Clopper-Pearson method. Note intermediate adjustments smaller
than 0.5 give a more refined adjustment(Laud 2017, Appendix S2).
Stratified and clustered datasets
For stratified datasets (e.g. data collected in different subgroups
or collated across different studies), use scoreci()
with
contrast = "p"
and stratified = TRUE
(again,
the skewness correction is recommended, but may be omitted).
By default, a “fixed effects” analysis is produced, i.e. one which
assumes a common true parameter p across strata. The default stratum
weighting uses the inverse variance of the score underlying the
SCAS/Wilson method, evaluated at the maximum likelihood estimate for the
pooled proportion (thus avoiding infinite weights for boundary cases).
Alternative weights are sample size (weighting = "MH"
) or
custom user-specified weights supplied via the wt
argument.
For example, population weighting would be applied via a vector
(e.g. wt = Ni
) containing the true population size (or true
population proportion Ni/N
) represented by each stratum.
(Note this is not divided by the sample size/proportion per stratum,
because the weighting is applied at the group level, not the case
level.)
Below is an illustration using data from the control arm of 9 trials
studying postoperative deep vein thrombosis (DVT). This may be a
somewhat unrealistic example, as the main focus of such studies is to
estimate the effect of a treatment, rather than to estimate the
underlying risk of an event, but used here to illustrate the usage of
scoreci()
for a stratified dataset:
data(compress, package = "ratesci")
strat_p <- scoreci(x1 = compress$event.control,
n1 = compress$n.control,
contrast = "p",
stratified = TRUE)
strat_p$estimates
#> lower est upper level p1hat p1mle
#> [1,] 0.181 0.212 0.245 0.95 0.212 0.212
The function also outputs p-values for a 2-sided hypothesis test
against a default null hypothesis p = 0.5, and one-sided tests against a
user-specified value of theta0:
strat_p$pval
#> chisq pval2sided theta0 scorenull pval_left pval_right
#> [1,] 208 4.05e-47 0.5 -14.4 2.02e-47 1
The Qtest
output object provides a heterogeneity test
and related quantities. In this instance, there appears to be
significant variability between studies in the underlying event rate for
the control group, which may reflect different characteristics of the
populations in each study leading to different underlying risk. (Note
this need not prevent the evaluation of stratified treatment
comparisons):
strat_p$Qtest
#> Q Q_df pval_het I2 tau2 Qc
#> 6.37e+01 8.00e+00 8.84e-11 8.74e+01 1.74e-02 0.00e+00
#> pval_qualhet
#> 9.96e-01
Per-stratum estimates are produced, including stratum weights and
contributions to the Q-statistic. Here the studies contributing most are
the 3rd and 8th study, with estimated proportions of 0.48 (95% CI: 0.34
to 0.62) and 0.04 (95% CI: 0.01 to 0.10) respectively:
strat_p$stratdata
#> x1j n1j p1hatj wt_fixed wtpct_fixed wtpct_rand theta_j lower_j upper_j
#> [1,] 37 103 0.3592 616.0 16.43 16.43 0.3597 0.2713 0.455
#> [2,] 5 10 0.5000 59.8 1.59 1.59 0.5000 0.2175 0.782
#> [3,] 23 48 0.4792 287.1 7.66 7.66 0.4793 0.3419 0.619
#> [4,] 16 110 0.1455 657.8 17.54 17.54 0.1465 0.0888 0.221
#> [5,] 7 32 0.2188 191.4 5.10 5.10 0.2216 0.1021 0.384
#> [6,] 8 25 0.3200 149.5 3.99 3.99 0.3224 0.1626 0.517
#> [7,] 17 126 0.1349 753.5 20.10 20.10 0.1359 0.0835 0.203
#> [8,] 4 92 0.0435 550.2 14.67 14.67 0.0451 0.0143 0.101
#> [9,] 16 81 0.1975 484.4 12.92 12.92 0.1988 0.1219 0.294
#> V_j Stheta_j Q_j
#> [1,] 0.00162 0.14695 13.30145
#> [2,] 0.01672 0.28773 4.95092
#> [3,] 0.00348 0.26689 20.44757
#> [4,] 0.00152 -0.06682 2.93717
#> [5,] 0.00523 0.00648 0.00803
#> [6,] 0.00669 0.10773 1.73503
#> [7,] 0.00133 -0.07735 4.50878
#> [8,] 0.00182 -0.16880 15.67615
#> [9,] 0.00206 -0.01474 0.10529
For a “random effects” analysis, use random = TRUE
.
(This may not give a meaningful estimate of stratum variation if the
number of strata is small.)
strat_p_rand <- scoreci(x1 = compress$event.control,
n1 = compress$n.control,
contrast = "p",
stratified = TRUE,
random = TRUE)
strat_p_rand$estimates
#> lower est upper level p1hat p1mle
#> [1,] 0.136 0.25 0.364 0.95 0.25 0.25
strat_p_rand$pval
#> chisq pval2sided theta0 scorenull pval_left pval_right
#> [1,] 25.2 0.00102 0.5 -5.02 0.000511 0.999
For clustered data, use clusterpci()
, which applies the
Wilson-based method proposed by (Saha, Miller, and Wang 2015), and a
skewness-corrected version:
# Data from Liang 1992
x <- c(rep(c(0, 1), c(36, 12)),
rep(c(0, 1, 2), c(15, 7, 1)),
rep(c(0, 1, 2, 3), c(5, 7, 3, 2)),
rep(c(0, 1, 2), c(3, 3, 1)),
c(0, 2, 3, 4, 6))
n <- c(rep(1, 48),
rep(2, 23),
rep(3, 17),
rep(4, 7),
rep(6, 5))
# Wilson-based interval as per Saha et al.
clusterpci(x, n, skew = FALSE)
#> lower est upper totx totn xihat icc
#> [1,] 0.2285 0.2956 0.3728 60 203 1.349 0.1855
# Skewness-corrected version
clusterpci(x, n, skew = TRUE)
#> lower est upper totx totn xihat icc
#> [1,] 0.2276 0.2958 0.3724 60 203 1.349 0.1855
All of the above (except clusterpci()
) can also handle
Poisson exposure-adjusted rates, using distrib = "poi"
,
where n
represents the exposure time.