Contingency Tables

Classes for statistics based on contingency tables for categorical data.

MulticlassContingencyStats runs summary stats and \(\chi^2\) test stats for a contingency table with any number of IV & DV levels.

BooleanContingencyStats inherits from MulticlassContingencyStats, and is a special case for a 2x2 contingency table, also implementing Fisher’s & Boschloo’s exact tests.

class unistat.contingency.MulticlassContingencyStats(table_rows: Series, table_cols: Series, row_title: str | None = None, row_names: list[str] | None = None, col_title: str | None = None, col_names: list[str] | None = None)

Bases: _ContingencyStats

Compute contingency table stats for 3+ IV and/or DV levels.

Takes two pandas Series representing the row and column variables, computes a contingency table, and provides methods for summary stats, chi-squared tests of independence, and post hoc testing.

Parameters:
  • table_rows (pd.Series) – Series representing the row variable (typically predictor).

  • table_cols (pd.Series) – Series representing the column variable (typically outcome).

  • row_title (str, optional) – Title for the row index. Defaults to the name of table_rows.

  • row_names (list[str], optional) – Custom names for the row levels.

  • col_title (str, optional) – Title for the column index. Defaults to the name of table_cols.

  • col_names (list[str], optional) – Custom names for the column levels.

idx_series

The row variable series.

Type:

pd.Series

col_series

The column variable series.

Type:

pd.Series

row_title

Title for rows.

Type:

str

row_names

Names for row levels.

Type:

list[str]

col_title

Title for columns.

Type:

str

col_names

Names for column levels.

Type:

list[str]

matrix

Crosstabulated frequency counts, with levels of idx_series and col_series as the index and columns, respectively. matrix does not include marginal row/column totals.

Type:

pd.DataFrame
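
As a rough illustration of what "crosstabulated frequency counts" means here, the following dependency-free sketch counts paired observations into a rows-by-columns grid without marginal totals. The function name `crosstab_counts` and the toy data are hypothetical; the class itself presumably relies on pandas for this step, which is an assumption rather than something this page states.

```python
from collections import Counter


def crosstab_counts(row_values, col_values):
    """Count co-occurrences of row and column levels (no marginal totals)."""
    counts = Counter(zip(row_values, col_values))
    row_levels = sorted(set(row_values))
    col_levels = sorted(set(col_values))
    # Counter returns 0 for pairs that never occur together
    matrix = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, matrix


# Hypothetical toy data: one predictor value and one outcome value per subject
predictor = ["iv0", "iv0", "iv1", "iv1", "iv1", "iv0"]
outcome = ["dv0", "dv1", "dv0", "dv0", "dv1", "dv0"]
row_levels, col_levels, matrix = crosstab_counts(predictor, outcome)
```

Each inner list of `matrix` corresponds to one row level, in the order given by `row_levels`.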

exp_freq

Crosstabulated expected frequency counts. Format mirrors matrix.

Type:

pd.DataFrame
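
Expected frequencies under independence follow the usual formula, row total times column total divided by N. The sketch below applies it to the counts of the four_by_two table used in the Examples on this page; the helper name `expected_frequencies` is hypothetical and is not part of unistat.

```python
def expected_frequencies(observed):
    """Expected counts under independence: E[i][j] = row_i * col_j / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Counts copied from the four_by_two example (rows iv0..iv3, columns dv0, dv1)
four_by_two = [[102, 63], [20, 8], [3, 7], [1, 10]]
expected = expected_frequencies(four_by_two)

# Cells whose expected frequency falls below the conventional threshold of 5
low_cells = [
    (i, j)
    for i, row in enumerate(expected)
    for j, e in enumerate(row)
    if e < 5
]
```

The expected counts share the observed table's grand total, and `low_cells` flags the cells where a chi-squared approximation is usually considered doubtful.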

get_table(as_pct=False, axis='rows')

Crosstabulated frequency counts with marginal row/column totals. Format otherwise mirrors matrix.

Notes

This class assumes categorical data in the input series. For better structure, consider passing intervention and outcome series explicitly in future versions.

pairwise_post_hoc(alpha: float = 0.05, p_corr_method: PCorrectionMethod = 'holm')

Perform pairwise chi-square or Boschloo exact post hoc tests.

Appropriate if either rows or columns are binary.

By default, Holm-Bonferroni correction is used, but any correction method accepted by statsmodels.stats.multitest.multipletests() can be used.

Parameters:
  • alpha (float, default .05) – Significance level to be used for p-value correction.

  • p_corr_method (str, default 'holm') – p-Value correction method.

Returns:

Pairwise results condensed to a single DataFrame. Only returned when a pairwise test is performed.

Return type:

pd.DataFrame

Raises:

NotImplementedError – If both rows & columns have 3+ levels. Pairwise tests are technically possible in that case, but currently only adjusted standardized residuals are supported for it.

Examples

>>> four_by_two.matrix
Outcome    dv0  dv1
Predictor
iv0        102   63
iv1         20    8
iv2          3    7
iv3          1   10
>>> four_by_two.pairwise_post_hoc()
               test  test_stat   p-value    p_corr
iv0          X^2(1)   2.572025  0.108768  0.272463
iv1          X^2(1)   2.095681  0.147716  0.272463
iv2  Boschloo exact   0.090821  0.095690  0.272463
iv3  Boschloo exact   0.000983  0.000722  0.003931

The binary axis is automatically detected, and tested against each level of the axis with 3+ levels:

>>> two_by_four.matrix
Predictor  iv0  iv1  iv2  iv3
Outcome
dv0        102   20    3    1
dv1         63    8    7   10
>>> two_by_four.pairwise_post_hoc()
               test  test_stat   p-value    p_corr
iv0          X^2(1)   2.572025  0.108768  0.272463
iv1          X^2(1)   2.095681  0.147716  0.272463
iv2  Boschloo exact   0.090821  0.095690  0.272463
iv3  Boschloo exact   0.000983  0.000722  0.003931

residuals_post_hoc(alpha: float = 0.05, p_corr_method: PCorrectionMethod = 'holm')

Perform post hoc tests using adjusted standardized residuals.

Preferred post hoc method if both rows & columns have 3+ levels (see Notes).

Uses adjusted standardized residuals (a/k/a adjusted Pearson residuals; cell-wise Z-scores), which are converted to p-values and corrected.
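
A minimal sketch of that computation, assuming the standard definition of the adjusted standardized residual, z = (O - E) / sqrt(E (1 - row/N)(1 - col/N)), and a two-sided normal p-value. The helper names are hypothetical and this is not necessarily how unistat implements the method internally.

```python
from math import erfc, sqrt


def adjusted_residuals(observed):
    """Cell-wise adjusted standardized (adjusted Pearson) residuals."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    residuals = []
    for obs_row, r in zip(observed, row_totals):
        out = []
        for o, c in zip(obs_row, col_totals):
            e = r * c / n  # expected count under independence
            # Assumes no empty row/column and no margin equal to N (se > 0)
            se = sqrt(e * (1 - r / n) * (1 - c / n))
            out.append((o - e) / se)
        residuals.append(out)
    return residuals


def two_sided_p(z):
    """Two-sided p-value for a standard normal Z-score."""
    return erfc(abs(z) / sqrt(2))


# Counts copied from the four_by_two example on this page
four_by_two = [[102, 63], [20, 8], [3, 7], [1, 10]]
z = adjusted_residuals(four_by_two)
p = [[two_sided_p(v) for v in row] for row in z]
```

The sign of each residual gives the direction of the deviation from the expected count; the p-values here are uncorrected.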

By default, Holm-Bonferroni correction is used, but any correction method accepted by statsmodels.stats.multitest.multipletests() can be used.

Parameters:
  • alpha (float, default .05) – Significance level to be used for p-value correction.

  • p_corr_method (str, default 'holm') – p-Value correction method.

Returns:

  • residuals (pd.DataFrame) – Adjusted standardized residuals in same format as self.matrix. Shows direction of effect in each cell.

  • p_values (pd.DataFrame) – Uncorrected p-values in same format as self.matrix.

  • p_corr (pd.DataFrame) – Corrected p-values in same format as self.matrix.

Raises:

NotImplementedError – If both rows & columns have 3+ levels, which is technically possible, but currently, use of adjusted standardized residuals is mandatory for such cases.

See also

pairwise_post_hoc

Pairwise tests if either variable is binary.

Notes

We prefer post hoc testing via adjusted standardized residuals only in cases where both rows & columns have 3+ levels.

If either rows or columns are binary, p-values are identical for both levels of the binary factor (see Examples). When comparing a \(k \times 2\) table (where \(k \geq 3\)), there are only \(k\) distinct p-values, each appearing twice. This method detects these cases and corrects for only \(k\) p-values, to avoid inflating the type II error rate by naively correcting \(2k\) p-values.

However, this approach is effectively equivalent to running \(k\) pairwise \(\chi^2\) tests without Yates correction. Use of the pairwise_post_hoc() method is preferred, since this will automatically use Boschloo exact tests when expected frequencies are <5.
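
To make the "correct only \(k\) p-values" point concrete, here is a small pure-Python sketch of the Holm-Bonferroni step-down adjustment, applied to the four distinct uncorrected p-values of the four_by_two table shown in this method's Examples. The function `holm_correction` is a hypothetical stand-in written for illustration; the documented method delegates correction to statsmodels.

```python
def holm_correction(p_values):
    """Holm-Bonferroni adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    corrected = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on
        adjusted = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p-value never drops below an earlier one
        running_max = max(running_max, adjusted)
        corrected[idx] = running_max
    return corrected


# One uncorrected p-value per predictor level (k = 4), not 2k = 8
p_unique = [0.108768, 0.147716, 0.057318, 0.000570]
p_corr = holm_correction(p_unique)
```

Correcting all eight duplicated cells instead would multiply the smallest p-value by 8 rather than 4, which is the loss of power the Notes describe.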

Examples

>>> four_by_four.matrix
Outcome    dv0  dv1  dv2  dv3
Predictor
iv0         70    0    0    0
iv1         53    4    3    0
iv2         28   17    4    2
iv3         14    7    3    9

Adjusted standardized residuals (a/k/a adjusted Pearson residuals, equivalent to Z-scores) indicate whether each cell was less (negative) or more (positive) frequent than expected by chance. p-Values indicate whether this difference is statistically significant.

>>> four_by_four.residuals_post_hoc().residuals
Outcome          dv0             dv1              dv2          dv3
Predictor
iv0         5.558156       -3.957284        -2.258185    -2.374231
iv1         2.440597       -1.737651         0.141516    -2.125546
iv2        -4.323559        4.913430         1.229105    -0.451581
iv3        -5.155365        1.505522         1.307525     6.260734
>>> four_by_four.residuals_post_hoc().p_values
Outcome             dv0           dv1       dv2           dv3
Predictor
iv0        2.726397e-08  7.580683e-05  0.023934  1.758554e-02
iv1        1.466299e-02  8.227228e-02  0.887462  3.354109e-02
iv2        1.535320e-05  8.949665e-07  0.219032  6.515711e-01
iv3        2.531376e-07  1.321900e-01  0.191034  3.831699e-10
>>> four_by_four.residuals_post_hoc().p_corr
Outcome             dv0       dv1       dv2           dv3
Predictor
iv0        4.089596e-07  0.000834  0.191473  1.582699e-01
iv1        1.466299e-01  0.493634  1.000000  2.347877e-01
iv2        1.842384e-04  0.000012  0.764137  1.000000e+00
iv3        3.543927e-06  0.660950  0.764137  6.130718e-09

When one axis is binary, compare the results of pairwise_post_hoc() with those of residuals_post_hoc():

>>> four_by_two.matrix
Outcome    dv0  dv1
Predictor
iv0        102   63
iv1         20    8
iv2          3    7
iv3          1   10

pairwise_post_hoc() uses either \(\chi^2\) or Boschloo exact tests, depending on expected frequencies for each comparison.

>>> four_by_two.pairwise_post_hoc()
               test  test_stat   p-value    p_corr
iv0          X^2(1)   2.572025  0.108768  0.272463
iv1          X^2(1)   2.095681  0.147716  0.272463
iv2  Boschloo exact   0.090821  0.095690  0.272463
iv3  Boschloo exact   0.000983  0.000722  0.003931

residuals_post_hoc() gives similar results to pairwise \(\chi^2\) tests (cf. uncorrected p-values), but cannot use Boschloo (or Fisher) exact tests when expected frequencies are low (iv2 & iv3):

>>> four_by_two.residuals_post_hoc().p_values
Outcome         dv0       dv1
Predictor
iv0        0.108768  0.108768
iv1        0.147716  0.147716
iv2        0.057318  0.057318
iv3        0.000570  0.000570
>>> four_by_two.residuals_post_hoc().p_corr
Outcome         dv0       dv1
Predictor
iv0        0.217537  0.217537
iv1        0.217537  0.217537
iv2        0.171955  0.171955
iv3        0.002279  0.002279

post_hoc(alpha: float = 0.05, p_corr_method: PCorrectionMethod = 'holm')

Perform appropriate post hoc test, based on IV/DV levels.

Convenience method to choose type of post hoc test.

If either rows or columns are binary, results of pairwise \(\chi^2\) or Boschloo exact tests (as appropriate for expected frequencies) are returned with uncorrected & corrected p-values.

If rows & columns both have 3+ levels, adjusted standardized residuals (a/k/a adjusted Pearson residuals; cell-wise Z-scores) are converted to p-values and corrected.

By default, Holm-Bonferroni correction is used, but any correction method accepted by statsmodels.stats.multitest.multipletests() can be used.

Parameters:
  • alpha (float, default .05) – Significance level to be used for p-value correction.

  • p_corr_method (str, default 'holm') – p-Value correction method.

Returns:

  • pd.DataFrame – Pairwise results condensed to a single DataFrame. Only returned when a pairwise test is performed.

  • residuals (pd.DataFrame) – Adjusted standardized residuals in same format as self.matrix. Shows direction of effect in each cell.

  • p_values (pd.DataFrame) – Uncorrected p-values in same format as self.matrix.

  • p_corr (pd.DataFrame) – Corrected p-values in same format as self.matrix.

See also

pairwise_post_hoc

When either rows or columns are binary.

residuals_post_hoc

Preferred when rows & columns have 3+ levels.

print_results()

Print contingency tables and Chi-squared results.

class unistat.contingency.BooleanContingencyStats(table_rows: Series, table_cols: Series, row_title: str | None = None, row_names: list[str] | None = None, col_title: str | None = None, col_names: list[str] | None = None)

Bases: _ContingencyStats

Perform contingency statistics on boolean (2x2) tables.

Extends MulticlassContingencyStats with methods specific to 2x2 tables, such as odds ratio and Fisher’s exact test.

Parameters:
  • table_rows (pd.Series) – Series representing the row variable (typically predictor), with dtype collapsible to Boolean (True/False, 1/0, 1.0/0.0).

  • table_cols (pd.Series) – Series representing the column variable (typically outcome), with dtype collapsible to Boolean (True/False, 1/0, 1.0/0.0).

  • row_title (str, optional) – Title for the row index.

  • row_names (list[str], optional) – Custom names for the row levels.

  • col_title (str, optional) – Title for the column index.

  • col_names (list[str], optional) – Custom names for the column levels.

Return type:

BooleanContingencyStats object

Warns:

ExpectedFrequencyWarning – If any cell-wise expected frequency is < 5.

Notes

Assumes the contingency table is 2x2.

p-values are displayed for both the chi-square test of independence (ToI) and an exact test like Fisher’s. Deciding which test to report can follow Cochran’s rule-of-thumb criteria [1] [2], which include (but are not limited to) the following indications for using an exact test (Fisher’s exact test in the original 1952 article) over chi-squared:

  • Any cell-wise expected frequency < 5
    • The actual rule is that fewer than 20% of cells may have an expected frequency < 5; since each cell makes up 25% of a 2x2 table, no cell may have a low expected frequency.

  • N < 20

  • Cochran (1952) [1] recommends using Yates’ correction if N > 40 but any expected frequency < 500; unistat does not implement this by default.
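
The first two criteria above can be sketched as a simple check on a 2x2 table. This is a simplified reading of the rule of thumb, not Cochran's full set of criteria and not unistat's own warning logic; the function name and the default thresholds passed as parameters are illustrative assumptions.

```python
def prefer_exact_test(table, min_expected=5, min_n=20):
    """Return True if an exact test looks preferable for this 2x2 table.

    Simplified rule of thumb: small total N, or any expected count
    below ``min_expected``.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    smallest_expected = min(r * c / n for r in row_totals for c in col_totals)
    return n < min_n or smallest_expected < min_expected


# Hypothetical small-sample table: N = 8
small_sample = prefer_exact_test([[3, 1], [1, 3]])
# Larger table with comfortably large expected counts
large_sample = prefer_exact_test([[102, 63], [20, 8]])
```

The Yates-correction criterion from the last bullet is deliberately left out of this sketch.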

By default, unistat never implements Yates’ correction factor. Hasselblad & Lokhnygina (2007) [3] found that in all cases, Yates-corrected chi-squared is inferior to Fisher’s exact test. Furthermore, they found that even Fisher’s exact test is too conservative, and that, depending on sample size, Fisher’s mid-p test or Barnard’s exact test offer better power while maintaining target Type I error control.

Alternative exact test(s) will be implemented in later releases; expect that at a minimum, this will include Boschloo’s exact test.

Lydersen et al. (2009) [4] compared multiple different exact tests, and noted the following:

  • Standard Fisher’s exact test is near-uniformly too conservative, though it always maintains Type I error rate

  • Fisher’s mid-p generally improves power, but occasionally violates Type I error rate.

  • Barnard’s exact test is an excellent performer, but is computationally intensive to a prohibitive degree (exponential time complexity).

  • Boschloo’s exact test (aka Fisher-Boschloo test) is an extension of Fisher’s exact, and was considered the gold standard by Lydersen et al.; it is universally more powerful than traditional Fisher’s exact & mid-p, and in trials did not violate target Type I error rate.

    • Can be further improved using the Berger-Boos correction, particularly for unbalanced designs (e.g. if survival occurs much more often than mortality) [4] [5]

    • Standard Berger-Boos correction factor is \(\gamma = 0.001\) [4]

      • Not implemented by SciPy, though included in R Exact package

odds_ratio(kind: str = 'sample')

Compute the odds ratio.

Parameters:

kind (str, optional) – Type of odds ratio: ‘sample’ (default), ‘conditional’, or ‘unconditional’.

Returns:

Result object with statistic and confidence interval.

Return type:

scipy.stats._result_classes.OddsRatioResult
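
For orientation, the sample odds ratio of a 2x2 table [[a, b], [c, d]] is (a·d)/(b·c), and a common confidence interval treats the log odds ratio as approximately normal with standard error sqrt(1/a + 1/b + 1/c + 1/d). The sketch below is a standalone illustration under those assumptions; the function name and the table are hypothetical, and the documented method returns a SciPy result object rather than a tuple.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def sample_odds_ratio(table, confidence_level=0.95):
    """Sample odds ratio with a normal-approximation (Wald) interval."""
    (a, b), (c, d) = table
    # Assumes all four cells are non-zero; zero cells need special handling
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    low = exp(log(odds_ratio) - z * se)
    high = exp(log(odds_ratio) + z * se)
    return odds_ratio, (low, high)


# Hypothetical 2x2 table
estimate, (ci_low, ci_high) = sample_odds_ratio([[3, 1], [1, 3]])
```

With counts this small the interval is very wide, which is one reason exact or conditional estimates are often reported alongside the sample estimate.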

fisher_exact(alternative: Literal['two-sided', 'less', 'greater'] = 'two-sided')

Perform Fisher’s exact test.

Parameters:

alternative (str, default 'two-sided') – Alternative hypothesis: ‘two-sided’, ‘less’, or ‘greater’.

Returns:

p-value of the test.

Return type:

float
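
To show what the three alternatives mean, here is a self-contained sketch of Fisher's exact test based on the hypergeometric distribution of the top-left cell with all margins fixed. It is an illustration only: the name `fisher_exact_p` is hypothetical, the two-sided rule used (summing all tables no more probable than the observed one) is one common convention, and the documented method presumably wraps SciPy rather than this code.

```python
from math import comb


def fisher_exact_p(table, alternative="two-sided"):
    """Fisher's exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, col1 - (c + d)), min(row1, col1)
    denom = comb(n, col1)
    # Hypergeometric probability of each possible top-left count k
    pmf = {
        k: comb(row1, k) * comb(n - row1, col1 - k) / denom
        for k in range(lo, hi + 1)
    }
    if alternative == "less":
        return sum(p for k, p in pmf.items() if k <= a)
    if alternative == "greater":
        return sum(p for k, p in pmf.items() if k >= a)
    if alternative == "two-sided":
        # Small relative tolerance guards against floating-point ties
        cutoff = pmf[a] * (1 + 1e-7)
        return min(1.0, sum(p for p in pmf.values() if p <= cutoff))
    raise ValueError(f"unknown alternative: {alternative!r}")


# Hypothetical 2x2 table
tea = [[3, 1], [1, 3]]
p_two_sided = fisher_exact_p(tea)
p_greater = fisher_exact_p(tea, alternative="greater")
```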

boschloo_exact(alternative: Literal['two-sided', 'less', 'greater'] = 'two-sided', n_sampling_points: int = 32)

Perform Boschloo’s exact test.

Parameters:
  • alternative (str, default 'two-sided') – Alternative hypothesis: ‘two-sided’, ‘less’, or ‘greater’.

  • n_sampling_points (int, default 32) – Number of sampling points used in the construction of the sampling method. See scipy.stats.boschloo_exact() documentation for further detail.

Returns:

  • statistic (float) – Test statistic for Boschloo’s exact test, which is the lesser of the p-values given by the two one-sided Fisher exact tests.

  • pvalue (float) – Boschloo’s exact p-value.
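
The relationship between the two returned values can be illustrated with a simplified one-sided version of the test. In this sketch the two rows are treated as independent binomial groups, the one-sided Fisher p-value serves as the test statistic, and the p-value is the largest probability, over a common success probability, of observing a table with a statistic at least as small. The plain grid search over that probability is a stand-in chosen for brevity, not SciPy's sampling scheme, and all function names and the example counts are hypothetical.

```python
from math import comb


def fisher_less(x1, n1, x2, n2):
    """One-sided Fisher p-value: P(X1 <= x1 | X1 + X2 = x1 + x2)."""
    m = x1 + x2
    lo = max(0, m - n2)
    favourable = sum(comb(n1, k) * comb(n2, m - k) for k in range(lo, x1 + 1))
    return favourable / comb(n1 + n2, m)


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def boschloo_less(x1, n1, x2, n2, grid_size=200):
    """Simplified one-sided Boschloo test via grid search (illustration only)."""
    statistic = fisher_less(x1, n1, x2, n2)
    # All possible tables whose Fisher p-value is at least as extreme as observed
    region = [
        (a, b)
        for a in range(n1 + 1)
        for b in range(n2 + 1)
        if fisher_less(a, n1, b, n2) <= statistic * (1 + 1e-9)
    ]
    # Maximize the region's probability over the shared success probability
    pvalue = max(
        sum(binom_pmf(a, n1, pi) * binom_pmf(b, n2, pi) for a, b in region)
        for pi in (i / grid_size for i in range(1, grid_size))
    )
    return statistic, pvalue


# Hypothetical data: 1 of 7 successes in group 1, 5 of 7 in group 2
statistic, pvalue = boschloo_less(1, 7, 5, 7)
```

Because the statistic is itself a valid conditional p-value, the resulting Boschloo p-value cannot exceed it, which is the sense in which the test is at least as powerful as Fisher's.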

Notes

Lydersen et al. (2009) [4] compared multiple different exact tests, and found Boschloo’s exact test to be universally more powerful than both traditional and mid-p Fisher’s exact tests. Boschloo’s exact test can be further improved using the Berger-Boos correction, particularly for unbalanced designs (e.g. if survival occurs much more often than mortality) [4] [5].

  • Standard Berger-Boos correction factor is \(\gamma = 0.001\) [4]

    • Not implemented by SciPy, though included in R Exact package

    • May be implemented here in future update

print_results()

Print tables, odds ratio, Chi-squared, and Boschloo exact results.

Overrides the parent method to include 2x2-specific statistics.