Sunday, December 16, 2012

Using matplotlib to plot the answer to ps1

Now that you know how to solve the first problem in Problem set 1, you have to graph the answer. I used matplotlib. The website for this plotting module can be found here.

Here's a copy of the code to plot the data for ps 1:

#Plots x1 vs x2 for Problem 1 of the Stanford machine learning class

from pylab import *
import numpy as np
import numpy.linalg
from math import *

x1,x2=np.loadtxt('q1x.dat',unpack=True)
y=np.loadtxt('q1y.dat',unpack=True)

# Draw the line where h=.5, which separates the two classes. h=.5 when
# theta transpose x = 0, so 0 = theta0 + theta1*x1 + theta2*x2.
# Solving for x2 gives x2 = -theta0/theta2 - theta1/theta2*x1

a=np.array([min(x1),max(x1),1,0.01])
b=-(-2.6205)/1.1719-0.7604/1.1719*a
plot(a,b)
# Use different colors and markers to plot x1 vs x2 if y=0 or y=1

for i in range(len(y)):
    if y[i]==0:
        plot(x1[i],x2[i],'ro')
    if y[i]==1:
        plot(x1[i],x2[i],'bx')

xlabel('x1')
ylabel('x2')

show()

I used a for x1 and b as the calculated value for x2. Plotting a vs b gives that beautiful straight line across the plot.
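As a rough check on that line, here is a minimal sketch that wraps the formula from the code comments in a function and confirms that h comes out as .5 everywhere along it. The theta values are read off the line that computes b, matched against the formula in the comments. The function names `boundary_x2` and `h` and the three sample x1 values are made up for illustration.

```python
import numpy as np

# theta0, theta1, theta2 as they appear in the line that computes b
theta = np.array([-2.6205, 0.7604, 1.1719])

def boundary_x2(theta, x1):
    """x2 on the separating line, where theta0 + theta1*x1 + theta2*x2 = 0."""
    return -theta[0] / theta[2] - theta[1] / theta[2] * x1

def h(theta, x1, x2):
    """Logistic hypothesis 1 / (1 + exp(-theta^T x))."""
    z = theta[0] + theta[1] * x1 + theta[2] * x2
    return 1.0 / (1.0 + np.exp(-z))

# Made-up x1 values; a straight line only needs two points to draw
a = np.array([0.5, 4.0, 7.5])
b = boundary_x2(theta, a)

# Every point (a, b) should sit where the hypothesis equals 0.5
print(h(theta, a, b))
```

Since every (a, b) pair lies on the same straight line, the plot looks the same no matter how many x1 values go into a.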

One of my previous posts talked about expecting the unexpected in numpy. It happens again in this little short program. This line:
a=np.array([min(x1),max(x1),1,0.01])
was not what I thought it was. I thought this gave me a list of values between the minimum of x1 and the maximum of x1. I was wrong. Here is what this actually gives:
  print a
[ 0.57079941  7.7054006   1.          0.01      ]

If you want a list of values, you need to use this code:

 a=np.linspace(min(x1),max(x1),25)

This gives 25 evenly spaced values from the minimum to the maximum of your data set, with both endpoints included.
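To see the difference side by side, here is a small sketch. The three x1 values are made up as a stand-in for the real column in q1x.dat, and the names `a_wrong` and `a_right` are just for illustration.

```python
import numpy as np

# Made-up stand-in for the x1 column
x1 = np.array([0.5, 2.0, 7.5])

# np.array simply wraps the four numbers it is handed
a_wrong = np.array([x1.min(), x1.max(), 1, 0.01])

# np.linspace(start, stop, num) returns num evenly spaced values,
# including both start and stop
a_right = np.linspace(x1.min(), x1.max(), 25)

print(a_wrong.shape)   # four elements
print(a_right.shape)   # twenty-five elements
```

If you would rather pick a step size than a number of points, np.arange(start, stop, step) is the numpy call for that. Note that it leaves out the stop value.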

The rest of the code is very straightforward. Just set up a loop. If y=0, the data plots as a red circle. If y=1, the data plots as a blue x.
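The loop works fine, but numpy can also pick out each class in one step with a boolean mask. This is only a sketch of that alternative, not what the code above does. The four data points are made up, the names `neg` and `pos` are mine, and the Agg backend line is there only so the sketch runs without opening a window.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove this line to get a window
import matplotlib.pyplot as plt

# Made-up stand-in data; the real values come from q1x.dat and q1y.dat
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([0.5, 1.5, -0.5, 2.5])
y = np.array([0, 1, 0, 1])

# Boolean masks: True where the label matches
neg = (y == 0)
pos = (y == 1)

fig, ax = plt.subplots()
ax.plot(x1[neg], x2[neg], 'ro')  # y=0 as red circles
ax.plot(x1[pos], x2[pos], 'bx')  # y=1 as blue x's
ax.set_xlabel('x1')
ax.set_ylabel('x2')
```

With the Agg line removed, a call to plt.show() at the end displays the plot. The masks also avoid hard-coding the number of rows in the data file.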

I know I have posted this graph before but it is so pretty, I'm going to post it again:

