Saturday, November 17, 2012

How do I wrap my head around this concept of logistic regression?

Logistic regression is used when you have a classification problem. A classification problem has a dependent variable that can take only the values 0 or 1. For example, a dealership may collect data on the sales process, and the dependent variable will be either a sale is made (1) or it is not made (0).

Of course, this type of data does not work very well with traditional linear regression, because the distribution of the dependent variable is not normal (and nothing keeps the predictions between 0 and 1). But linear regression is a good place to start this discussion because it gives me a reference point to show you what I understood about linear regression that I didn't understand about logistic regression.

When you run a linear regression on a set of data, you get a regression equation. The general form of a regression equation with one independent variable is

y = a0 + a1x

where a0 and a1 are the coefficients. (This is just a different form of the slope-intercept form, where a0 is the y intercept and a1 is the slope.) It is intuitively obvious how to use this equation: if your model is good, you can substitute in an x value and get back a prediction for y.
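To make that concrete, here is a minimal sketch (my own illustration, not from the class) of fitting and using such a line with numpy:

```python
import numpy as np

# Toy data: x values and roughly linear y values.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Fit y = a0 + a1*x by least squares; polyfit returns the
# coefficients highest power first, so [a1, a0].
a1, a0 = np.polyfit(x, y, 1)

# Predict y for a new x by substituting into the equation.
x_new = 6.0
y_pred = a0 + a1 * x_new
print(a0, a1, y_pred)
```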

The first problem in PS#1 of the machine learning class that requires a program is this:

Implement Newton’s method for optimizing ℓ(θ), and apply it to fit a logistic regression model to the data. Initialize Newton’s method with θ = 0 (the vector of all zeros). What are the coefficients θ resulting from your fit? (Remember to include the intercept term.)

And here is the generalization of Newton's method in the notes:

θ := θ − H⁻¹∇ℓ(θ)

where the thetas are the coefficients, H is the Hessian matrix, and ∇ℓ(θ) is the vector of derivatives of the log likelihood function. This is not my tidy little regression equation where I can put in x values and get back a y value. And really, how could it be, since the y value is just 0 or 1? We are not in Kansas anymore, Toto. Not to mention that, while in theory I understood what the H matrix and the log likelihood derivative vector are, in practice it was very difficult to generate the concrete equations to use in the program.
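To show what those pieces can look like in code, here is a minimal sketch of the update loop with numpy. This is my own illustration, assuming a design matrix X whose first column is all ones (the intercept term) and a vector y of 0/1 labels; the real code for the problem set will come in a later post.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, n_iter=10):
    """Fit logistic regression by Newton's method.

    X is the (m, n) design matrix with a first column of ones for
    the intercept; y is the (m,) vector of 0/1 labels.
    """
    theta = np.zeros(X.shape[1])          # start at the zero vector
    for _ in range(n_iter):
        h = sigmoid(X @ theta)            # predicted probabilities
        grad = X.T @ (y - h)              # derivative of the log likelihood
        S = np.diag(h * (1.0 - h))        # diagonal weight matrix
        H = -X.T @ S @ X                  # Hessian of the log likelihood
        theta = theta - np.linalg.solve(H, grad)  # theta := theta - H^-1 * grad
    return theta
```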

I am embarrassed to say that it took me an incredibly long time to answer these questions. In my defense, the resources on the web are really hard to understand. Did you scroll down far enough to see where part of the information is written as a debate? But here is what I figured out. Once you get the coefficient values (code will come in another post), you can calculate the value of the sigmoid function, h:

h = 1 / (1 + e^−(θ0 + θ1x1 + θ2x2))
This is a cumulative probability function: its value always lands between 0 and 1, and you can read it as the probability that y = 1. If the value of this function is less than 0.5, the output value is 0. If the value is greater than 0.5, the output value is 1. See how simple that is?
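In code, the prediction rule is nothing more than that threshold. A tiny sketch (the feature vector x is assumed to include the leading 1 for the intercept):

```python
import numpy as np

def predict(theta, x):
    # Sigmoid of theta-transpose-x: probability that y = 1.
    h = 1.0 / (1.0 + np.exp(-(x @ theta)))
    return 1 if h > 0.5 else 0
```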
This thresholding also generates a nice plot when there are two independent variables, because h = 0.5 exactly when the exponent on e is 0. Setting that exponent to zero, your equation for theta reduces to

θ0 + θ1x1 + θ2x2 = 0

which can be solved for x2:

x2 = −(θ0 + θ1x1) / θ2
Now you can plot this line, the decision boundary, on top of your data and see where the model separates the 0s from the 1s.
Not sure how to plot something like this in Python? Don't worry, I'll reveal all my code.
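In the meantime, here is a rough sketch of that kind of plot with matplotlib. The theta values and the data below are made-up placeholders, not the problem-set answers:

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder coefficients; in practice they come from the Newton fit.
theta0, theta1, theta2 = -6.0, 0.05, 1.0

# Placeholder data: 50 points with two features, labeled by the boundary.
rng = np.random.default_rng(0)
X = rng.random((50, 2)) * np.array([100.0, 10.0])
y = (theta0 + X @ np.array([theta1, theta2]) > 0).astype(int)

# Scatter the two classes with different markers.
plt.scatter(X[y == 0, 0], X[y == 0, 1], marker='o', label='y = 0')
plt.scatter(X[y == 1, 0], X[y == 1, 1], marker='x', label='y = 1')

# Decision boundary: x2 = -(theta0 + theta1*x1) / theta2.
x1 = np.linspace(X[:, 0].min(), X[:, 0].max(), 100)
x2 = -(theta0 + theta1 * x1) / theta2
plt.plot(x1, x2, label='decision boundary')

plt.xlabel('x1')
plt.ylabel('x2')
plt.legend()
plt.show()
```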




