Contingency Tables#

Classes for statistics based on contingency tables for categorical data.

MulticlassContingencyStats runs summary stats and \(\chi^2\) test stats for a contingency table with any number of independent-variable (IV) and dependent-variable (DV) levels.

BooleanContingencyStats handles the special case of a 2x2 contingency table, also implementing Fisher’s & Boschloo’s exact tests. Both classes derive from the shared _ContingencyStats base.

class unistat.contingency.MulticlassContingencyStats(table_rows: Series, table_cols: Series, row_title: str | None = None, row_names: list[str] | None = None, col_title: str | None = None, col_names: list[str] | None = None)[source]#

Bases: _ContingencyStats

Compute contingency table stats for 3+ IV and/or DV levels.

Takes 2 pandas Series representing the row and column variables, computes a contingency table, and provides methods for summary stats, chi-squared tests of independence, and post hoc testing.

Parameters:

table_rows (pd.Series) – Series representing the row variable (typically predictor).
table_cols (pd.Series) – Series representing the column variable (typically outcome).
row_title (str, optional) – Title for the row index. Defaults to the name of table_rows.
row_names (list[str], optional) – Custom names for the row levels.
col_title (str, optional) – Title for the column index. Defaults to the name of table_cols.
col_names (list[str], optional) – Custom names for the column levels.

idx_series#

The row variable series.

Type:: pd.Series

col_series#

The column variable series.

Type:: pd.Series

row_title#

Title for rows.

Type:: str

row_names#

Names for row levels.

Type:: list[str]

col_title#

Title for columns.

Type:: str

col_names#

Names for column levels.

Type:: list[str]

matrix#

Crosstabulated frequency counts, with levels of idx_series and col_series as the index and columns, respectively. matrix does not include marginal row/column totals.

Type:: pd.DataFrame

exp_freq#

Crosstabulated expected frequency counts. Format mirrors matrix.

Type:: pd.DataFrame
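
As a rough, dependency-free illustration of how matrix and exp_freq relate, the sketch below tabulates paired observations and derives the expected count of each cell as row total × column total / N. It is not unistat's own code (which works on pandas objects); the helper names crosstab and expected_frequencies and the toy data are hypothetical.

```python
from collections import Counter


def crosstab(rows, cols):
    """Count co-occurrences of row and column levels (no marginal totals)."""
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    return {r: {c: counts[(r, c)] for c in col_levels} for r in row_levels}


def expected_frequencies(matrix):
    """Expected count per cell under independence: row_total * col_total / N."""
    row_totals = {r: sum(cells.values()) for r, cells in matrix.items()}
    col_totals = Counter()
    for cells in matrix.values():
        col_totals.update(cells)  # adds each column's count to its running total
    n = sum(row_totals.values())
    return {
        r: {c: row_totals[r] * col_totals[c] / n for c in cells}
        for r, cells in matrix.items()
    }


# Toy data: one predictor level and one outcome level per observation.
predictor = ["iv0", "iv0", "iv0", "iv1", "iv1", "iv2"]
outcome = ["dv0", "dv1", "dv0", "dv1", "dv1", "dv0"]

matrix = crosstab(predictor, outcome)
exp_freq = expected_frequencies(matrix)
```

The expected counts always sum to the same N as the observed counts, which is a quick sanity check on any hand-rolled version.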

get_table(as_pct=False, axis='rows')#: Crosstabulated frequency counts with marginal row/column totals. Format otherwise mirrors matrix.

See also

BooleanContingencyStats

Notes

This class assumes categorical data in the input series. For better structure, consider passing intervention and outcome series explicitly in future versions.

pairwise_post_hoc(alpha: float = 0.05, p_corr_method: PCorrectionMethod = 'holm')[source]#

Perform pairwise chi-square or Boschloo exact post hoc tests.

Appropriate if either rows or columns are binary.

By default, Holm-Bonferroni correction is used, but any correction method supported by statsmodels.stats.multitest.multipletests() can be selected.

Parameters:

alpha (float, default .05) – Significance level to be used for p-value correction.
p_corr_method (str, default 'holm') – p-Value correction method.

Returns:

Pairwise results condensed to a single DataFrame. Only returned when a pairwise test is performed.

Return type:

pd.DataFrame

Raises:

NotImplementedError – If both rows & columns have 3+ levels. Pairwise testing is technically possible in that case, but currently only adjusted standardized residuals are supported for it (see residuals_post_hoc()).

Examples

>>> four_by_two.matrix
Outcome    dv0  dv1
Predictor
iv0        102   63
iv1         20    8
iv2          3    7
iv3          1   10
>>> four_by_two.pairwise_post_hoc()
               test  test_stat   p-value    p_corr
iv0          X^2(1)   2.572025  0.108768  0.272463
iv1          X^2(1)   2.095681  0.147716  0.272463
iv2  Boschloo exact   0.090821  0.095690  0.272463
iv3  Boschloo exact   0.000983  0.000722  0.003931

The binary axis is automatically detected, and tested against each level of the axis with 3+ levels:

>>> two_by_four.matrix
Predictor  iv0  iv1  iv2  iv3
Outcome
dv0        102   20    3    1
dv1         63    8    7   10
>>> two_by_four.pairwise_post_hoc()
               test  test_stat   p-value    p_corr
iv0          X^2(1)   2.572025  0.108768  0.272463
iv1          X^2(1)   2.095681  0.147716  0.272463
iv2  Boschloo exact   0.090821  0.095690  0.272463
iv3  Boschloo exact   0.000983  0.000722  0.003931
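
The example output is consistent with each level of the multi-level axis being collapsed against all other levels combined, followed by a choice of test based on expected frequencies. The sketch below reproduces that reading in plain Python. It is an inference from the output, not unistat's implementation; the helper name one_vs_rest_tests and the min_expected threshold of 5 are assumptions, and the exact-test branch is only labelled, not computed.

```python
def one_vs_rest_tests(matrix, min_expected=5):
    """For a k x 2 table, test each row against all other rows combined.

    Returns one (label, test_name, statistic) tuple per row. The statistic is
    the uncorrected Pearson chi-squared (no Yates correction), or None when
    low expected counts suggest an exact test instead.
    """
    col_totals = [sum(col) for col in zip(*matrix.values())]
    n = sum(col_totals)
    results = []
    for label, row in matrix.items():
        rest = [total - obs for total, obs in zip(col_totals, row)]
        table = [list(row), rest]
        row_sums = [sum(r) for r in table]
        expected = [[rs * ct / n for ct in col_totals] for rs in row_sums]
        if min(min(e) for e in expected) < min_expected:
            results.append((label, "exact", None))
            continue
        chi2 = sum(
            (o - e) ** 2 / e
            for obs_row, exp_row in zip(table, expected)
            for o, e in zip(obs_row, exp_row)
        )
        results.append((label, "chi2", chi2))
    return results


four_by_two = {"iv0": (102, 63), "iv1": (20, 8), "iv2": (3, 7), "iv3": (1, 10)}
results = one_vs_rest_tests(four_by_two)
```

Under these assumptions, iv0 and iv1 fall into the chi-squared branch and iv2 and iv3 into the exact branch, matching the test column above.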

residuals_post_hoc(alpha: float = 0.05, p_corr_method: PCorrectionMethod = 'holm')[source]#

Perform post hoc tests using adjusted standardized residuals.

Preferred post hoc method if both rows & columns have 3+ levels (see Notes).

Uses adjusted standardized residuals (a/k/a adjusted Pearson residuals; cell-wise Z-scores), which are converted to p-values and corrected.

By default, Holm-Bonferroni correction is used, but any correction method supported by statsmodels.stats.multitest.multipletests() can be selected.

Parameters:

alpha (float, default .05) – Significance level to be used for p-value correction.
p_corr_method (str, default 'holm') – p-Value correction method.

Returns:

residuals (pd.DataFrame) – Adjusted standardized residuals in same format as self.matrix. Shows direction of effect in each cell.
p_values (pd.DataFrame) – Uncorrected p-values in same format as self.matrix.
p_corr (pd.DataFrame) – Corrected p-values in same format as self.matrix.

Raises:

NotImplementedError – If both rows & columns have 3+ levels, which is technically possible, but currently, use of adjusted standardized residuals is mandatory for such cases.

See also

pairwise_post_hoc: Pairwise tests if either variable is binary.

Notes

We prefer post hoc testing via adjusted standardized residuals only in cases where both rows & columns have 3+ levels.

If either rows or columns are binary, p-values are identical for both levels of the binary factor (see Examples). For a \(k \times 2\) or \(2 \times k\) table (where \(k \geq 3\)), there are therefore only \(k\) distinct p-values, each appearing twice. This method detects these cases and corrects for only \(k\) p-values, to avoid inflating the type-II error rate by naively correcting \(2k\) p-values.

However, this approach is effectively equivalent to running \(k\) pairwise \(\chi^2\) tests without Yates correction. Use of the pairwise_post_hoc() method is preferred, since this will automatically use Boschloo exact tests when expected frequencies are <5.
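
For reference, the adjusted standardized residual of a cell is commonly defined as \((O - E) / \sqrt{E (1 - r/N)(1 - c/N)}\), where \(O\) and \(E\) are the observed and expected counts and \(r\), \(c\), \(N\) are the row, column, and grand totals. The following pure-Python sketch applies that textbook definition and converts each residual to a two-sided normal p-value. It is meant as an illustration rather than unistat's code, and the function name adjusted_residuals is hypothetical.

```python
from math import erfc, sqrt


def adjusted_residuals(matrix):
    """Adjusted standardized residuals and two-sided normal p-values per cell."""
    labels = list(matrix)
    rows = [matrix[label] for label in labels]
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    residuals, p_values = {}, {}
    for label, row, rt in zip(labels, rows, row_totals):
        z_row, p_row = [], []
        for obs, ct in zip(row, col_totals):
            expected = rt * ct / n
            variance = expected * (1 - rt / n) * (1 - ct / n)
            z = (obs - expected) / sqrt(variance)
            z_row.append(z)
            # Two-sided tail probability of a standard normal variable.
            p_row.append(erfc(abs(z) / sqrt(2)))
        residuals[label], p_values[label] = z_row, p_row
    return residuals, p_values


four_by_two = {"iv0": (102, 63), "iv1": (20, 8), "iv2": (3, 7), "iv3": (1, 10)}
z, p = adjusted_residuals(four_by_two)
```

For a table with a binary axis, the squared residual of a row equals the uncorrected \(\chi^2\) statistic of that row tested against all others, which is the equivalence described in the note above.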

Examples

>>> four_by_four.matrix
Outcome    dv0  dv1  dv2  dv3
Predictor
iv0         70    0    0    0
iv1         53    4    3    0
iv2         28   17    4    2
iv3         14    7    3    9

Adjusted standardized residuals (a/k/a adjusted Pearson residuals, equivalent to Z-scores) indicate whether each cell was less (negative) or more (positive) frequent than expected by chance. p-Values indicate whether this difference is statistically significant.

>>> four_by_four.residuals_post_hoc().residuals
Outcome          dv0             dv1              dv2          dv3
Predictor
iv0         5.558156       -3.957284        -2.258185    -2.374231
iv1         2.440597       -1.737651         0.141516    -2.125546
iv2        -4.323559        4.913430         1.229105    -0.451581
iv3        -5.155365        1.505522         1.307525     6.260734
>>> four_by_four.residuals_post_hoc().p_values
Outcome             dv0           dv1       dv2           dv3
Predictor
iv0        2.726397e-08  7.580683e-05  0.023934  1.758554e-02
iv1        1.466299e-02  8.227228e-02  0.887462  3.354109e-02
iv2        1.535320e-05  8.949665e-07  0.219032  6.515711e-01
iv3        2.531376e-07  1.321900e-01  0.191034  3.831699e-10
>>> four_by_four.residuals_post_hoc().p_corr
Outcome             dv0       dv1       dv2           dv3
Predictor
iv0        4.089596e-07  0.000834  0.191473  1.582699e-01
iv1        1.466299e-01  0.493634  1.000000  2.347877e-01
iv2        1.842384e-04  0.000012  0.764137  1.000000e+00
iv3        3.543927e-06  0.660950  0.764137  6.130718e-09

When one axis is binary, compare the results of pairwise_post_hoc() to those of residuals_post_hoc():

>>> four_by_two.matrix
Outcome    dv0  dv1
Predictor
iv0        102   63
iv1         20    8
iv2          3    7
iv3          1   10

pairwise_post_hoc() uses either \(\chi^2\) or Boschloo exact tests, depending on expected frequencies for each comparison.

>>> four_by_two.pairwise_post_hoc()
               test  test_stat   p-value    p_corr
iv0          X^2(1)   2.572025  0.108768  0.272463
iv1          X^2(1)   2.095681  0.147716  0.272463
iv2  Boschloo exact   0.090821  0.095690  0.272463
iv3  Boschloo exact   0.000983  0.000722  0.003931

residuals_post_hoc() gives the same uncorrected p-values as the pairwise \(\chi^2\) tests (cf. iv0 & iv1), but cannot switch to Boschloo (or Fisher) exact tests when expected frequencies are low (iv2 & iv3):

>>> four_by_two.residuals_post_hoc().p_values
Outcome         dv0       dv1
Predictor
iv0        0.108768  0.108768
iv1        0.147716  0.147716
iv2        0.057318  0.057318
iv3        0.000570  0.000570
>>> four_by_two.residuals_post_hoc().p_corr
Outcome         dv0       dv1
Predictor
iv0        0.217537  0.217537
iv1        0.217537  0.217537
iv2        0.171955  0.171955
iv3        0.002279  0.002279
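
The effect of correcting \(k\) rather than \(2k\) p-values can be reproduced with a minimal Holm-Bonferroni step-down procedure. This is a stdlib-only sketch for illustration; the documentation above refers to statsmodels for the actual correction, and the function name holm is hypothetical. The inputs are the rounded uncorrected p-values printed above.

```python
def holm(p_values):
    """Holm-Bonferroni step-down adjustment; returns p-values in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted


# One uncorrected residual p-value per row of four_by_two (both columns match).
row_p = [0.108768, 0.147716, 0.057318, 0.000570]

corrected_k = holm(row_p)           # corrects for k = 4 distinct p-values
corrected_2k = holm(row_p + row_p)  # naive: treats the duplicates as 2k = 8 tests
```

With \(k = 4\), the result agrees with the p_corr table above up to rounding of the inputs, while the naive \(2k\) version yields a larger corrected p-value for every row.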

post_hoc(alpha: float = 0.05, p_corr_method: PCorrectionMethod = 'holm')[source]#

Perform appropriate post hoc test, based on IV/DV levels.

Convenience method to choose type of post hoc test.

If either rows or columns are binary, results of pairwise \(\chi^2\) or Boschloo exact tests (as appropriate for expected frequencies) are returned with uncorrected & corrected p-values.

If rows & columns both have 3+ levels, adjusted standardized residuals (a/k/a adjusted Pearson residuals; cell-wise Z-scores) are converted to p-values and corrected.

By default, Holm-Bonferroni correction is used, but any correction method supported by statsmodels.stats.multitest.multipletests() can be selected.

Parameters:

alpha (float, default .05) – Significance level to be used for p-value correction.
p_corr_method (str, default 'holm') – p-Value correction method.

Returns:

pd.DataFrame – Pairwise results condensed to a single DataFrame. Only returned when a pairwise test is performed, i.e. when either rows or columns are binary.
residuals (pd.DataFrame) – Adjusted standardized residuals in same format as self.matrix. Shows direction of effect in each cell. Returned, together with p_values and p_corr, when rows & columns both have 3+ levels.
p_values (pd.DataFrame) – Uncorrected p-values in same format as self.matrix.
p_corr (pd.DataFrame) – Corrected p-values in same format as self.matrix.

See also

pairwise_post_hoc: When either rows or columns are binary.
residuals_post_hoc: Preferred when rows & columns have 3+ levels.

print_results()[source]#: Print contingency tables and Chi-squared results.

class unistat.contingency.BooleanContingencyStats(table_rows: Series, table_cols: Series, row_title: str | None = None, row_names: list[str] | None = None, col_title: str | None = None, col_names: list[str] | None = None)[source]#

Bases: _ContingencyStats

Perform contingency statistics on boolean (2x2) tables.

Provides methods specific to 2x2 tables, such as odds ratio and Fisher’s exact test, alongside the summary and chi-squared statistics also offered by MulticlassContingencyStats.

Parameters:

table_rows (pd.Series) – Series representing the row variable (typically predictor), with dtype collapsible to Boolean (True/False, 1/0, 1.0/0.0).
table_cols (pd.Series) – Series representing the column variable (typically outcome), with dtype collapsible to Boolean (True/False, 1/0, 1.0/0.0).
row_title (str, optional) – Title for the row index.
row_names (list[str], optional) – Custom names for the row levels.
col_title (str, optional) – Title for the column index.
col_names (list[str], optional) – Custom names for the column levels.

Return type:

BooleanContingencyStats object

Warns:

ExpectedFrequencyWarning – If any cell-wise expected frequency is < 5.

See also

MulticlassContingencyStats

Notes

Assumes the contingency table is 2x2.

p-values are displayed for both the chi-square test of independence (ToI) and an exact test such as Fisher’s. The choice of which test to report can follow Cochran’s rule-of-thumb criteria ¹ ², which include (but are not limited to) the following indications for an exact test (Fisher’s exact test in the original 1952 article) over chi-squared:

Any cell-wise expected frequency < 5
- The actual rule is that fewer than 20% of cells may have an expected frequency < 5. Each cell of a 2x2 table makes up 25% of the cells, so no cell can have a low expected frequency.
N < 20

Cochran (1952) ¹ also recommends using Yates’ correction if N > 40 but any expected frequency < 500; unistat does not implement this by default.
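
The two criteria listed above can be expressed as a small check on a 2x2 table of counts. This is a sketch of the rule of thumb only, not the logic unistat uses to raise ExpectedFrequencyWarning; the function name prefer_exact_test and its keyword arguments are hypothetical.

```python
def prefer_exact_test(table, min_expected=5, min_n=20):
    """Return True if the rule-of-thumb criteria point to an exact test."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Smallest expected count under independence: row_total * col_total / N.
    smallest = min(rt * ct / n for rt in row_totals for ct in col_totals)
    return n < min_n or smallest < min_expected


low_expected = prefer_exact_test([[1, 10], [125, 78]])   # smallest expected is about 4.5
large_sample = prefer_exact_test([[102, 63], [24, 25]])  # all expected counts exceed 5
small_sample = prefer_exact_test([[5, 5], [4, 5]])       # N = 19
```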

By default, unistat never implements Yates’ correction factor. Hasselblad & Lokhnygina (2007) ³ found that in all cases, Yates-corrected chi-squared is inferior to Fisher’s exact test. Furthermore, they found that even Fisher’s exact test is too conservative, and that, depending on sample size, Fisher’s mid-p test or Barnard’s exact test offer better power while maintaining target Type I error control.

Boschloo’s exact test is available via boschloo_exact(); further alternative exact tests may be implemented in later releases.

Lydersen et al. (2009) ⁴ compared multiple different exact tests, and noted the following:

Standard Fisher’s exact test is near-uniformly too conservative, though it always maintains Type I error rate.
Fisher’s mid-p generally improves power, but occasionally violates Type I error rate.
Barnard’s exact test is an excellent performer, but is computationally intensive to a prohibitive degree (exponential time complexity).
Boschloo’s exact test (aka Fisher-Boschloo test) is an extension of Fisher’s exact, and was considered the gold standard by Lydersen et al.; it is universally more powerful than traditional Fisher’s exact & mid-p, and in trials did not violate target Type I error rate.
- Further improved using the Berger-Boos correction, particularly for unbalanced designs (e.g. if survival occurs much more often than mortality) ⁴ ⁵
- Standard Berger-Boos correction factor is \(\gamma = 0.001\) ⁴
  - Not implemented by SciPy, though included in the R Exact package

odds_ratio(kind: str = 'sample')[source]#

Compute the odds ratio.

Parameters:: kind (str, optional) – Type of odds ratio: ‘sample’ (default), ‘conditional’, or ‘unconditional’.
Returns:: Result object with statistic and confidence interval.
Return type:: scipy.stats._result_classes.OddsRatioResult
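
To make the 'sample' kind concrete: for a table [[a, b], [c, d]], the sample odds ratio is \(ad / bc\). The sketch below computes it together with a Wald-type confidence interval on the log scale. The interval method, the function name sample_odds_ratio, and the example counts are assumptions of this sketch and may differ from what the returned result object reports.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def sample_odds_ratio(table, confidence_level=0.95):
    """Sample odds ratio a*d / (b*c) with a Wald interval on the log scale."""
    (a, b), (c, d) = table
    if 0 in (a, b, c, d):
        raise ValueError("this sketch does not handle zero cells")
    estimate = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    low = exp(log(estimate) - z * se_log)
    high = exp(log(estimate) + z * se_log)
    return estimate, (low, high)


estimate, (low, high) = sample_odds_ratio([[8, 2], [1, 5]])
```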

fisher_exact(alternative: Literal['two-sided', 'less', 'greater'] = 'two-sided')[source]#

Perform Fisher’s exact test.

Parameters:: alternative (str, default 'two-sided') – Alternative hypothesis: ‘two-sided’, ‘less’, or ‘greater’.
Returns:: p-value of the test.
Return type:: float
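
For intuition, Fisher’s exact test conditions on all margins of the 2x2 table and sums hypergeometric probabilities. The sketch below does this with the standard library only; it illustrates the test itself rather than unistat's implementation, and the function name fisher_exact_p and the example counts are hypothetical. The two-sided p-value here sums every table whose probability does not exceed that of the observed table.

```python
from math import comb


def fisher_exact_p(table, alternative="two-sided"):
    """Fisher exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Unnormalised hypergeometric weights for each possible top-left cell.
    weights = {x: comb(r1, x) * comb(r2, c1 - x) for x in range(lo, hi + 1)}
    total = sum(weights.values())
    if alternative == "less":
        numerator = sum(w for x, w in weights.items() if x <= a)
    elif alternative == "greater":
        numerator = sum(w for x, w in weights.items() if x >= a)
    elif alternative == "two-sided":
        # Integer comparison avoids floating-point ties between tables.
        numerator = sum(w for w in weights.values() if w <= weights[a])
    else:
        raise ValueError("alternative must be 'two-sided', 'less', or 'greater'")
    return numerator / total


p_two_sided = fisher_exact_p([[8, 2], [1, 5]])
p_less = fisher_exact_p([[8, 2], [1, 5]], alternative="less")
p_greater = fisher_exact_p([[8, 2], [1, 5]], alternative="greater")
```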

boschloo_exact(alternative: Literal['two-sided', 'less', 'greater'] = 'two-sided', n_sampling_points: int = 32)[source]#

Perform Boschloo’s exact test.

Parameters:

alternative (str, default 'two-sided') – Alternative hypothesis: ‘two-sided’, ‘less’, or ‘greater’.
n_sampling_points (int, default 32) – Number of sampling points used in the construction of the sampling method. See scipy.stats.boschloo_exact() documentation for further detail.

Returns:

statistic (float) – Test statistic for Boschloo’s exact test, which is the lesser of the p-values given by two one-sided Fisher exact tests.
pvalue (float) – Boschloo’s exact p-value.

Notes

Lydersen et al. (2009) ⁴ compared multiple different exact tests, and found Boschloo’s exact test to be universally more powerful than both traditional and mid-p Fisher’s exact tests. Boschloo’s exact test can be further improved using the Berger-Boos correction, particularly for unbalanced designs (e.g. if survival occurs much more often than mortality) ⁴ ⁵.

Standard Berger-Boos correction factor is \(\gamma = 0.001\) ⁴
- Not implemented by SciPy, though included in R Exact package
- May be implemented here in future update
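
The idea behind Boschloo’s test can be sketched for the one-sided case: the one-sided Fisher p-value serves as the test statistic, and the p-value is the largest probability, over the unknown common success rate \(\pi\), of seeing a table with a statistic at least as small. The code below is a simplified illustration, assuming the two rows are independent binomial samples and replacing the sampling and optimisation used by SciPy with a plain grid over \(\pi\). The names fisher_less, boschloo_less, and n_grid are hypothetical, and no Berger-Boos correction is applied.

```python
from math import comb


def fisher_less(x1, n1, x2, n2):
    """One-sided Fisher p-value P(X <= x1), conditional on all margins."""
    c1 = x1 + x2
    lo = max(0, c1 - n2)
    numerator = sum(comb(n1, x) * comb(n2, c1 - x) for x in range(lo, x1 + 1))
    return numerator / comb(n1 + n2, c1)


def boschloo_less(table, n_grid=99):
    """One-sided ('less') Boschloo statistic and grid-approximated p-value."""
    (a, b), (c, d) = table
    n1, n2 = a + b, c + d
    t_obs = fisher_less(a, n1, c, n2)
    # Tables at least as extreme as the observed one, ranked by Fisher's p-value.
    region = [
        (x1, x2)
        for x1 in range(n1 + 1)
        for x2 in range(n2 + 1)
        if fisher_less(x1, n1, x2, n2) <= t_obs * (1 + 1e-7)
    ]
    # Binomial coefficient product and total successes for each such table.
    weights = [(comb(n1, x1) * comb(n2, x2), x1 + x2) for x1, x2 in region]
    n = n1 + n2
    best = 0.0
    for i in range(1, n_grid + 1):
        pi = i / (n_grid + 1)  # candidate common success probability
        prob = sum(w * pi**s * (1 - pi) ** (n - s) for w, s in weights)
        best = max(best, prob)
    return t_obs, best


statistic, pvalue = boschloo_less([[1, 9], [6, 4]])
```

Because the statistic is a valid conditional p-value, the resulting p-value never exceeds the one-sided Fisher p-value, which is the source of the power gain described above. A coarse grid can slightly underestimate the maximum, so this sketch should not be used for reporting.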

print_results()[source]#

Print tables, odds ratio, Chi-squared, and Boschloo exact results.

Overrides the parent method to include 2x2-specific statistics.
