
Logistic Regression is a form of **Discriminative Classifier**. Discriminative classifiers focus only on P(Y|X): we make an assumption about the probability distribution of the labels (Y) and learn a direct mapping from the input (X) to it. So, essentially, what Logistic Regression computes is:

$$ f_{LR}(x)=argmax_{c \in \mathcal{Y}}\,P(Y=c|x) $$
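As a minimal sketch of this decision rule in Python (the probability values below are hypothetical, purely for illustration):

```python
import numpy as np

# Hypothetical posterior probabilities P(Y=c | x) for classes 0 and 1
probs = np.array([0.3, 0.7])

# The classifier simply returns the class with the highest posterior
predicted_class = int(np.argmax(probs))
print(predicted_class)  # 1
```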

Here, since Logistic Regression is a classification model, we need to define a distribution over something of a categorical nature. For binary classification, we can assume a **Bernoulli Distribution**, whereas for multi-class classification it would be a **Multinoulli Distribution**. To keep the derivation simple, we will assume we are building a binary classification model, which means the probability of a class is Θ if y = 1, or 1 − Θ if y = 0. The likelihood of a single label is then:

$$ P(y) = \theta^{y}\,(1-\theta)^{(1-y)} $$
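As a quick sketch of this likelihood in Python (the helper name `bernoulli_likelihood` is my own, not from any library):

```python
def bernoulli_likelihood(theta, y):
    """P(y) under a Bernoulli with success probability theta: theta if y=1, 1-theta if y=0."""
    return theta ** y * (1 - theta) ** (1 - y)

print(bernoulli_likelihood(0.8, 1))  # 0.8
print(bernoulli_likelihood(0.8, 0))  # 0.2
```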

Here `Θ` is the sigmoid function:

$$ \theta=\frac{1}{1+\exp(-wx-b)} $$

Rearranging the sigmoid function, we get:

$$ \theta\,(1+\exp(-wx-b)) = 1 $$

$$ \exp(-wx-b) = \frac{1-\theta}{\theta} $$

Taking the `log` on both sides, we get:

$$ wx+b = \log\left(\frac{\theta}{1-\theta}\right) $$

So the linear score wx + b is exactly the log-odds of the positive class.
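We can verify this log-odds identity numerically; the parameter and input values below are arbitrary, chosen only for the check:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b, x = 0.7, -0.3, 2.0           # arbitrary parameters and input
theta = sigmoid(w * x + b)
print(np.isclose(np.log(theta / (1 - theta)), w * x + b))  # True
```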

**What algorithm should we use to find the optimal parameters of Logistic Regression?**

In theory, Logistic Regression is optimized via Maximum Likelihood Estimation. But Maximum Likelihood Estimation provides no closed-form solution here, so in practice the parameters are found numerically with Gradient Descent (or quasi-Newton methods such as L-BFGS). We can show this theoretically. The likelihood of the data is:

$$ L(w,b) = \prod_{i=1}^{n} \theta_{i}^{y_i}\,(1-\theta_i)^{(1-y_i)} $$

Taking the log, which turns the product into a sum and preserves the maximizer, we get the log-likelihood:

$$ \log L(w,b) = \sum_{i=1}^{n}\log(1-\theta_i)+y_i\log\left(\frac{\theta_i}{1-\theta_i}\right) $$

Substituting the definition of `Θ` (note that $1-\theta_i = \frac{1}{1+\exp(wx_i+b)}$), we get:

$$ \log L(w,b) = \sum_{i=1}^{n} -\log(1+\exp(wx_i+b)) + y_i\,(wx_i+b) $$
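A small numerical check that this substitution is an identity (the toy data below are randomly generated, purely for verification):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10)            # toy 1-D inputs
y = rng.integers(0, 2, size=10)    # toy binary labels
w, b = 0.5, -0.1                   # arbitrary parameters

z = w * x + b
theta = 1.0 / (1.0 + np.exp(-z))

form1 = np.sum(np.log(1 - theta) + y * np.log(theta / (1 - theta)))
form2 = np.sum(-np.log1p(np.exp(z)) + y * z)
print(np.isclose(form1, form2))    # True
```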

Now, to find the optimal parameters, we differentiate with respect to `w` and `b`. But when we differentiate, what we observe is:

$$ \frac{\partial \log L}{\partial w} = \sum_{i=1}^{n} -\frac{\exp(wx_i+b)\,x_i}{1+\exp(wx_i+b)}+y_i\,x_i = \sum_{i=1}^{n}(y_i-\theta_i)\,x_i $$

We can't simply set this to 0 to obtain the optimal parameters: it is a transcendental equation, and there is no closed-form solution for it. We can, however, solve it approximately with numerical methods such as Gradient Descent, using Cross-Entropy (the negative of the log-likelihood above) as the loss function.
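As a minimal sketch of that numerical solution (a from-scratch batch gradient descent; the function name, hyperparameters, and toy data are my own choices, not a reference implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit w, b by gradient descent on the cross-entropy (negative log-likelihood)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        theta = sigmoid(X @ w + b)            # current P(y=1 | x)
        # The gradient of the log-likelihood is sum_i (y_i - theta_i) x_i,
        # so we ascend it (equivalently, descend the cross-entropy loss).
        w += lr * X.T @ (y - theta) / len(y)
        b += lr * np.mean(y - theta)
    return w, b

# Toy usage on synthetic linearly separable data
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X @ np.array([1.5, -2.0]) > 0).astype(float)
w, b = fit_logistic(X, y)
```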

For details on how to solve Logistic Regression, please check out the Logistic Regression From Scratch Coding Exercise.
