All notes
Logistic Regression

Intro

Pre-requisites

Advantages

Usage

Quora: comparison among classification methods.
L2-regularized LR can serve as a baseline for fancier classification approaches.

Categories

Logistic functions

Odds Ratio

Odds

Odds
$$\frac{P(A)}{P(\neg A)} = \frac{a}{b} = \frac{p}{1-p}, \quad p=\frac{a}{a+b}.$$
Odds Ratio
$$ R = \frac{\frac{p_1}{1-p_1}}{\frac{p_2}{1-p_2}} $$
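A quick numeric sketch of these two definitions (function and variable names are mine, just for illustration):

```python
def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1 - p)

def odds_ratio(p1, p2):
    """Ratio of the odds in two groups with event probabilities p1 and p2."""
    return odds(p1) / odds(p2)

# a = 3 successes, b = 1 failure -> p = a / (a + b) = 0.75, odds = 3
p = 3 / (3 + 1)
print(odds(p))                # 3.0
print(odds_ratio(0.75, 0.5))  # 3.0, since odds(0.5) = 1.0
```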

Logit

For $0 < p < 1$: $$ \operatorname{logit}(p) = \log\left(\frac{p}{1-p}\right) $$

base 2 - bit
base e - nat
base 10 - ban
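The same log-odds expressed in the three units above, a small sketch using only the standard library (the `base` parameter is my own convenience):

```python
import math

def logit(p, base=math.e):
    """Log-odds of p in the given base (2 -> bits, e -> nats, 10 -> bans)."""
    return math.log(p / (1 - p), base)

p = 0.9  # odds = 9
print(logit(p, 2))   # about 3.17 bits
print(logit(p))      # about 2.20 nats
print(logit(p, 10))  # about 0.95 bans
```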

Logistic function

The logistic function is the inverse of the logit: $$ \operatorname{logit}^{-1}(\alpha) = \frac{1}{1+e^{-\alpha}} $$
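A round-trip check of the inverse relationship (names are mine):

```python
import math

def logit(p):
    """Log-odds of p (natural log)."""
    return math.log(p / (1 - p))

def inv_logit(a):
    """Logistic function: maps any real a into (0, 1)."""
    return 1 / (1 + math.exp(-a))

# inv_logit undoes logit, and inv_logit(0) sits at the midpoint
for p in (0.1, 0.5, 0.9):
    assert abs(inv_logit(logit(p)) - p) < 1e-12
print(inv_logit(0))  # 0.5
```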

SVM

Difference with Logistic Regression

Disadvantages

Hard to train, especially with many training examples.

FAQ

Sample size requirement

StackExchange.com.

There are (at least) two different kinds of instability:
  • The model parameters vary a lot with only slight changes in the training data.
  • The predictions (for the same case) of models trained with slight changes in the training data vary a lot.
The best approach is to check both kinds of instability directly; relying on the 1 to 10 rule alone is insufficient (see below).
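Both instabilities can be probed by refitting on bootstrap resamples of the training data and looking at the spread of the coefficients and of the prediction at a fixed query point. A minimal sketch assuming NumPy; the fitting routine is plain gradient descent on simulated data, not a production solver:

```python
import numpy as np

def fit_logreg(X, y, lr=0.1, steps=2000):
    """Fit logistic regression weights by plain gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))      # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)  # gradient of the log-loss
    return w

# Simulated data: intercept + one predictor, known true weights
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_w = np.array([-0.5, 1.0])
y = (rng.random(n) < 1 / (1 + np.exp(-X @ true_w))).astype(float)

# Refit on bootstrap resamples; track both kinds of instability
x_new = np.array([1.0, 0.5])  # one fixed query point
ws, preds = [], []
for _ in range(100):
    idx = rng.integers(0, n, n)
    w = fit_logreg(X[idx], y[idx])
    ws.append(w)
    preds.append(1 / (1 + np.exp(-x_new @ w)))

print("coef std:", np.std(ws, axis=0))  # parameter instability
print("pred std:", np.std(preds))       # prediction instability
```

Large coefficient spread with small prediction spread is common when predictors are correlated; large prediction spread is the more worrying sign.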

1 to 10 rule

Basically, as the ratio of estimated parameters to the number of data points approaches 1, your model becomes saturated and will necessarily overfit (unless there is, in fact, no randomness in the system). The 1 to 10 rule of thumb comes from this perspective.

The 1 to 10 rule comes from the linear regression world, however, and it's important to recognize that logistic regression has additional complexities.
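In the logistic regression setting the rule is usually stated as events per variable (EPV): count the rarer outcome class, not all observations. A hedged sketch of that arithmetic (the function is my own illustration of the rule of thumb, not a formal criterion):

```python
def max_predictors(n_events, n_nonevents, epv=10):
    """Rough cap on predictor count under the 1-to-10 (EPV) rule of thumb.

    Uses the rarer outcome class, as is conventional for logistic regression.
    """
    return min(n_events, n_nonevents) // epv

# 500 observations, but only 40 events -> about 4 predictors at most
print(max_predictors(40, 460))  # 4
```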