Simple Regression
Module for regression statistics.
This module provides an abstract base class and concrete implementations for performing regression analyses using statsmodels. It supports linear regression, logistic regression, and log-binomial regression, with features like variance inflation factor (VIF) calculation, standardized regressions, odds/risk ratios, and formatted output.
Assumes input data are pandas Series/DataFrames. Boolean columns are not subject to standardization.
- class unistat.regression.RegressionStats(X, y, bool_col_names: list | str | None = None)
Bases: ABC
Abstract base class for regression statistics.
Provides common functionality for regression models, including data preparation, standardization, VIF calculation, and properties for regression results.
- Parameters:
- X
Feature DataFrame. NaN values are removed and all columns are converted to float64.
- Type:
pd.DataFrame
- y
Target DataFrame. NaN values are removed and all columns are converted to float64.
- Type:
pd.DataFrame
- reg
Fitted regression model.
- Type:
statsmodels regression result
- X_std
X, with all non-Boolean columns transformed to Z-scores.
- Type:
pd.DataFrame
- std_reg
Fitted standardized regression model.
- Type:
statsmodels regression result
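The Z-scoring described for X_std can be sketched in plain Python. This is only an illustration of the idea, not unistat's implementation: the helper name zscore_non_boolean, the dict-of-lists input, and the use of the sample standard deviation (ddof=1) are all assumptions made for the example.

```python
from statistics import mean, stdev


def zscore_non_boolean(columns, bool_col_names=()):
    """Z-score every column except those listed in bool_col_names.

    Hypothetical helper: `columns` maps column name -> list of floats.
    """
    out = {}
    for name, values in columns.items():
        if name in bool_col_names:
            # Boolean columns stay on their 0/1 scale.
            out[name] = [float(v) for v in values]
            continue
        # Sample SD (ddof=1) is an assumption; the library may differ.
        mu, sd = mean(values), stdev(values)
        out[name] = [(v - mu) / sd for v in values]
    return out


# Made-up data for illustration.
X = {"age": [30.0, 40.0, 50.0], "smoker": [1.0, 0.0, 1.0]}
X_std = zscore_non_boolean(X, bool_col_names=["smoker"])
```

Here "age" is rescaled to mean 0 and unit standard deviation, while "smoker" is passed through unchanged.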
Notes
Observations with any missing data in either X or y are dropped.
- vif_matrix()
Calculate variance inflation factors (VIF) for the feature DataFrame.
- Returns:
VIF values for each feature.
- Return type:
pd.Series
- Raises:
ValueError – If X has fewer than 2 columns.
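For readers unfamiliar with VIF, the sketch below computes the textbook quantity in plain Python using the identity that the VIF of predictor j is the j-th diagonal entry of the inverse of the predictors' correlation matrix. It is an independent illustration, not the code behind vif_matrix(): the function names and the dict-of-lists input are hypothetical, and the library's own computation (for example, how it treats the intercept) may differ.

```python
from statistics import mean


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment with the identity matrix: [A | I] -> [I | A^-1].
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """Return {column name: VIF} from the inverse correlation matrix."""
    names = list(columns)
    if len(names) < 2:
        raise ValueError("VIF needs at least 2 columns")
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


# Made-up data: the two columns have correlation 0.8, so VIF = 1 / (1 - 0.8**2).
values = vif({"x1": [1.0, 2.0, 3.0, 4.0], "x2": [1.0, 3.0, 2.0, 4.0]})
```

With two predictors the result reduces to 1 / (1 - r²), which is a convenient check on the general routine.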
- class unistat.regression.LogitStats(X, y, bool_col_names: list | str | None = None)
Bases: RegressionStats
Class for logistic regression statistics.
Extends RegressionStats for logistic regression using a Logit model.
- logit_or(standardize: bool = False) -> DataFrame
Calculate odds ratios by predictor, with 95% confidence intervals.
- Parameters:
standardize (bool, optional) – Use standardized model. Defaults to False.
- Returns:
Odds ratios with 95% CI.
- Return type:
pd.DataFrame
- Raises:
ValueError – If all columns are boolean and standardize is True.
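As background on what an odds ratio with a 95% confidence interval means numerically, the sketch below exponentiates a logit coefficient and the endpoints of its Wald interval. It assumes the conventional normal-approximation interval; whether logit_or() computes its interval exactly this way is not stated here. The function name ratio_with_ci and the input numbers are hypothetical.

```python
import math


def ratio_with_ci(coef, se, z=1.959964):
    """Exponentiate a log-scale coefficient and its Wald interval.

    Returns (ratio, lower, upper). z = 1.959964 is the 97.5th percentile
    of the standard normal, giving a two-sided 95% interval.
    """
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


# Illustrative numbers only, not output from any fitted model.
odds_ratio, lower, upper = ratio_with_ci(coef=math.log(2.0), se=0.25)
```

Because the interval is symmetric on the log scale, it is asymmetric around the odds ratio itself: the upper bound lies farther from the estimate than the lower bound.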
- class unistat.regression.LinRegStats(X, y, bool_col_names: list | str | None = None)
Bases: RegressionStats
Class for linear regression statistics.
Extends RegressionStats for ordinary least squares (OLS) regression.
Notes
unistat does NOT standardize the values of y for linear regression. Typically, “standardized regression” refers to a transformation of \(y \sim X\) such that \(\text{SD}\left( y \right) \sim \text{SD}\left( X \right)\); a coefficient \(\beta\) is thus interpreted as a 1-S.D. increase in \(X\) conferring a \(\beta\)-S.D. increase in \(y\). We find this to be difficult to interpret, with no benefit beyond adherence to convention.
Instead, unistat opts for “X-standardized regression”. That is, since only \(X\) is Z-scored, \(y \sim X\) is transformed such that \(y \sim \text{SD}\left( X \right)\). Here, a coefficient \(\beta\) is interpreted as a 1-S.D. increase in \(X\) conferring an absolute increase of \(\beta\) units in \(y\). This is more easily interpretable, while still allowing comparison of the relative strengths of all \(X\) predictors.
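The interpretation of an X-standardized coefficient can be checked with a one-predictor example. The sketch below is plain Python on made-up data and does not call unistat: ols_slope is a hypothetical helper, and the sample standard deviation (ddof=1) is an assumption. It shows that Z-scoring only the predictor multiplies the raw slope by SD(X), so the coefficient stays in the original units of y.

```python
from statistics import mean, stdev


def ols_slope(x, y):
    """Slope of the least-squares line of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Made-up data for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

raw_slope = ols_slope(x, y)  # change in y per 1-unit increase in x

# Z-score the predictor only; y is left untouched.
x_z = [(a - mean(x)) / stdev(x) for a in x]
x_std_slope = ols_slope(x_z, y)  # change in y (original units) per 1-S.D. increase in x
```

Dividing x_std_slope by stdev(y) would give the conventional fully standardized coefficient, which is the quantity the note above chooses not to report.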
- class unistat.regression.LogBinStats(X, y, bool_col_names: list | str | None = None)
Bases: RegressionStats
Class for log-binomial regression statistics (experimental).
Extends RegressionStats for a generalized linear model with a binomial family and log link.
Warning
This class is experimental; use with caution and verify results.
- logbin_rr(standardize: bool = False) -> DataFrame
Calculate risk ratios by predictor with 95% confidence intervals.
- Parameters:
standardize (bool, optional) – Use standardized model. Defaults to False.
- Returns:
Risk ratios with 95% CI.
- Return type:
pd.DataFrame
- Raises:
ValueError – If all columns are boolean and standardize is True.
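To show how a risk ratio differs from an odds ratio, the sketch below computes both from a 2x2 table in plain Python. With a single binary predictor, the log-link binomial coefficient equals the log of this risk ratio and the logit coefficient equals the log of this odds ratio, so the table gives the same point estimates a model would. The function name and the counts are hypothetical, and nothing here calls unistat.

```python
import math


def risk_and_odds_ratio(exposed_events, exposed_total, unexposed_events, unexposed_total):
    """Return (risk ratio, odds ratio) for exposed vs. unexposed groups."""
    p1 = exposed_events / exposed_total      # risk in the exposed group
    p0 = unexposed_events / unexposed_total  # risk in the unexposed group
    rr = p1 / p0
    odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))
    return rr, odds_ratio


# Made-up counts: 30/100 events among exposed, 10/100 among unexposed.
rr, odds_ratio = risk_and_odds_ratio(30, 100, 10, 100)

# Coefficients a single-binary-predictor model would report on the log scale.
log_link_coef = math.log(rr)       # log-binomial (log link)
logit_coef = math.log(odds_ratio)  # logistic (logit link)
```

When the outcome is common, as in this example, the odds ratio overstates the risk ratio, which is one reason to report risk ratios from a log-binomial model.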