1: Single rate

library(ratesci)

Introduction to `rateci()` and related functions for estimation of a single binomial or Poisson rate

To calculate a confidence interval for a single binomial proportion (p̂ = x/n), the Skewness-Corrected Asymptotic Score (SCAS) method is recommended, as one that succeeds, on average, at containing the true proportion p with the appropriate nominal probability (e.g. 95%), and has evenly distributed tail probabilities (Laud 2017, Appendix S3.5). It is a modified version of the Wilson Score method. The plot below illustrates the interval non-coverage probability (i.e. 1 minus the actual probability that the interval contains the true value of p) achieved by SCAS compared to some other popular methods, using moving average smoothing:

For a worked example, the SCAS 95% interval for the proportion 1/29 is obtained with scaspci(), using closed-form calculation (Laud 2017, Appendix A.4):

scaspci(x = 1, n = 29)
#>        lower    est upper
#> [1,] 0.00199 0.0398 0.155

rateci() also provides two other methods (Jeffreys and mid-p) with similar coverage properties (Laud 2018):

rateci(x = 1, n = 29)
#> $scas
#>        lower    est upper
#> [1,] 0.00199 0.0398 0.155
#> 
#> $jeff
#>        lower    est upper
#> [1,] 0.00375 0.0403  0.15
#> 
#> $midp
#>        lower    est upper
#> [1,] 0.00172 0.0391 0.159

The Jeffreys interval can also incorporate prior information about p for an approximate Bayesian confidence interval. For example, a pilot study estimate of 1/10 would suggest a Beta(1,9) prior for p:

jeffreysci(x = 1, n = 29, ai = 1, bi = 9)
#>        lower    est upper
#> [1,] 0.00644 0.0438 0.138

If more conservative coverage is required, a “continuity adjustment” may be deployed with cc, as follows, giving continuity-adjusted SCAS or Jeffreys, and (if cc is TRUE or 0.5), the Clopper-Pearson method. Note intermediate adjustments smaller than 0.5 give a more refined adjustment(Laud 2017, Appendix S2).

rateci(x = 1, n = 29, cc = TRUE)
#> $scas_cc
#>         lower    est upper
#> [1,] 6.19e-06 0.0398 0.182
#> 
#> $jeff_cc
#>         lower    est upper
#> [1,] 0.000873 0.0403 0.178
#> 
#> $cp
#>         lower    est upper
#> [1,] 0.000873 0.0391 0.178

Stratified and clustered datasets

For stratified datasets (e.g. data collected in different subgroups or collated across different studies), use scoreci() with contrast = "p" and stratified = TRUE (again, the skewness correction is recommended, but may be omitted).

By default, a “fixed effects” analysis is produced, i.e. one which assumes a common true parameter p across strata. The default stratum weighting uses the inverse variance of the score underlying the SCAS/Wilson method, evaluated at the maximum likelihood estimate for the pooled proportion (thus avoiding infinite weights for boundary cases). Alternative weights are sample size (weighting = "MH") or custom user-specified weights supplied via the wt argument. For example, population weighting would be applied via a vector (e.g. wt = Ni) containing the true population size (or true population proportion Ni/N) represented by each stratum. (Note this is not divided by the sample size/proportion per stratum, because the weighting is applied at the group level, not the case level.)

Below is an illustration using data from the control arm of 9 trials studying postoperative deep vein thrombosis (DVT). This may be a somewhat unrealistic example, as the main focus of such studies is to estimate the effect of a treatment, rather than to estimate the underlying risk of an event, but used here to illustrate the usage of scoreci() for a stratified dataset:

data(compress, package = "ratesci")
strat_p <- scoreci(x1 = compress$event.control, 
                   n1 = compress$n.control, 
                   contrast = "p", 
                   stratified = TRUE)
strat_p$estimates
#>      lower   est upper level p1hat p1mle
#> [1,] 0.181 0.212 0.245  0.95 0.212 0.212

The function also outputs p-values for a 2-sided hypothesis test against a default null hypothesis p = 0.5, and one-sided tests against a user-specified value of theta0:

strat_p$pval
#>      chisq pval2sided theta0 scorenull pval_left pval_right
#> [1,]   208   4.05e-47    0.5     -14.4  2.02e-47          1

The Qtest output object provides a heterogeneity test and related quantities. In this instance, there appears to be significant variability between studies in the underlying event rate for the control group, which may reflect different characteristics of the populations in each study leading to different underlying risk. (Note this need not prevent the evaluation of stratified treatment comparisons):

strat_p$Qtest
#>            Q         Q_df     pval_het           I2         tau2           Qc 
#>     6.37e+01     8.00e+00     8.84e-11     8.74e+01     1.74e-02     0.00e+00 
#> pval_qualhet 
#>     9.96e-01

Per-stratum estimates are produced, including stratum weights and contributions to the Q-statistic. Here the studies contributing most are the 3rd and 8th study, with estimated proportions of 0.48 (95% CI: 0.34 to 0.62) and 0.04 (95% CI: 0.01 to 0.10) respectively:

strat_p$stratdata
#>       x1j n1j p1hatj wt_fixed wtpct_fixed wtpct_rand theta_j lower_j upper_j
#>  [1,]  37 103 0.3592    616.0       16.43      16.43  0.3597  0.2713   0.455
#>  [2,]   5  10 0.5000     59.8        1.59       1.59  0.5000  0.2175   0.782
#>  [3,]  23  48 0.4792    287.1        7.66       7.66  0.4793  0.3419   0.619
#>  [4,]  16 110 0.1455    657.8       17.54      17.54  0.1465  0.0888   0.221
#>  [5,]   7  32 0.2188    191.4        5.10       5.10  0.2216  0.1021   0.384
#>  [6,]   8  25 0.3200    149.5        3.99       3.99  0.3224  0.1626   0.517
#>  [7,]  17 126 0.1349    753.5       20.10      20.10  0.1359  0.0835   0.203
#>  [8,]   4  92 0.0435    550.2       14.67      14.67  0.0451  0.0143   0.101
#>  [9,]  16  81 0.1975    484.4       12.92      12.92  0.1988  0.1219   0.294
#>           V_j Stheta_j      Q_j
#>  [1,] 0.00162  0.14695 13.30145
#>  [2,] 0.01672  0.28773  4.95092
#>  [3,] 0.00348  0.26689 20.44757
#>  [4,] 0.00152 -0.06682  2.93717
#>  [5,] 0.00523  0.00648  0.00803
#>  [6,] 0.00669  0.10773  1.73503
#>  [7,] 0.00133 -0.07735  4.50878
#>  [8,] 0.00182 -0.16880 15.67615
#>  [9,] 0.00206 -0.01474  0.10529

For a “random effects” analysis, use random = TRUE. (This may not give a meaningful estimate of stratum variation if the number of strata is small.)

strat_p_rand <- scoreci(x1 = compress$event.control, 
                        n1 = compress$n.control, 
                        contrast = "p", 
                        stratified = TRUE, 
                        random = TRUE)
strat_p_rand$estimates
#>      lower  est upper level p1hat p1mle
#> [1,] 0.136 0.25 0.364  0.95  0.25  0.25
strat_p_rand$pval
#>      chisq pval2sided theta0 scorenull pval_left pval_right
#> [1,]  25.2    0.00102    0.5     -5.02  0.000511      0.999

For clustered data, use clusterpci(), which applies the Wilson-based method proposed by (Saha, Miller, and Wang 2015), and a skewness-corrected version:

  # Data from Liang 1992
  x <- c(rep(c(0, 1), c(36, 12)),
          rep(c(0, 1, 2), c(15, 7, 1)),
          rep(c(0, 1, 2, 3), c(5, 7, 3, 2)),
          rep(c(0, 1, 2), c(3, 3, 1)),
          c(0, 2, 3, 4, 6))
  n <- c(rep(1, 48),
          rep(2, 23),
          rep(3, 17),
          rep(4, 7),
          rep(6, 5))
  # Wilson-based interval as per Saha et al.
  clusterpci(x, n, skew = FALSE)
#>       lower    est  upper totx totn xihat    icc
#> [1,] 0.2285 0.2956 0.3728   60  203 1.349 0.1855
  # Skewness-corrected version
  clusterpci(x, n, skew = TRUE)
#>       lower    est  upper totx totn xihat    icc
#> [1,] 0.2276 0.2958 0.3724   60  203 1.349 0.1855

All of the above (except clusterpci()) can also handle Poisson exposure-adjusted rates, using distrib = "poi", where n represents the exposure time.

Technical details

The SCAS method is an extension of the Wilson Score method, using the same score function S(p) = x/n − p, where x is the observed number of events from n trials, and p is the true proportion. The variance of S(p) is V = p(1 − p)/n, and the 3rd central moment is μ₃ = p(1 − p)(1 − 2p)/n². The 100(1 − α)% confidence interval is found as the two solutions of the following equation, where z is the 1 − α/2 percentile of the standard normal distribution:

S(p)/V^1/2 − (z² − 1)μ₃/6V^3/2 = ±z

For unstratified datasets, this has a closed-form solution. The formula is extended in (Laud 2017) to incorporate stratification using inverse variance weights, w_i = 1/V, or other weighting schemes as required, with the solution being found by iteration over p ∈ [0, 1].

The Jeffreys interval is obtained as α/2 and 1 − α/2 quantiles of the Beta(x + 0.5, n − x + 0.5) distribution, with boundary modifications when x = 0 or x = n.

A Clopper-Pearson interval may also be obtained as quantiles of a beta distribution, using Beta(x, n − x + 1) for the lower confidence limit, and Beta(x + 1, n − x) for the upper limit.

Laud, Peter J. 2017. “Equal-Tailed Confidence Intervals for Comparison of Rates.” Pharmaceutical Statistics 16 (5): 334–48. https://doi.org/10.1002/pst.1813.

———. 2018. “Equal-Tailed Confidence Intervals for Comparison of Rates.” Pharmaceutical Statistics 17 (3): 290–93. https://doi.org/10.1002/pst.1855.

Saha, Krishna K., Daniel Miller, and Suojin Wang. 2015. “A Comparison of Some Approximate Confidence Intervals for a Single Proportion for Clustered Binary Outcome Data.” The International Journal of Biostatistics 12 (2). https://doi.org/10.1515/ijb-2015-0024.

- Introduction to rateci() and related functions for estimation of a single binomial or Poisson rate
- Technical details

1: Single rate

Introduction to rateci() and related functions for estimation of a single binomial or Poisson rate

Stratified and clustered datasets

Technical details

Introduction to `rateci()` and related functions for estimation of a single binomial or Poisson rate