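import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC
# Coarse log-spaced grids: C from 1e-2 to 1e8, gamma from 1e-5 to 1e3
C_range = 10.0 ** np.arange(-2, 9)
gamma_range = 10.0 ** np.arange(-5, 4)
param_grid = dict(gamma=gamma_range, C=C_range)
# Current scikit-learn takes n_splits here; the labels (cl) are passed to grid.fit instead
cv = StratifiedKFold(n_splits=3)
grid = GridSearchCV(SVC(), param_grid=param_grid, cv=cv)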
This code should search for the best values for C in the range .01 to 100,000,000 and gamma in the range .00001 to 1000. Unfortunately, it just made my computer crash and I never got an answer.
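To get a feel for how much work that search asks for, it helps to count the fits. The sketch below uses only the standard library and rebuilds the same exponents as the code above; the names n_folds, n_candidates and n_fits are illustrative and not part of the original code.

```python
# Rebuild the same grids as above without NumPy: exponents -2..8 and -5..3.
C_range = [10.0 ** e for e in range(-2, 9)]
gamma_range = [10.0 ** e for e in range(-5, 4)]
n_folds = 3  # matches the 3-fold stratified cross-validation above

n_candidates = len(C_range) * len(gamma_range)  # every (C, gamma) pair
n_fits = n_candidates * n_folds                 # one SVC fit per pair per fold

print(len(C_range), len(gamma_range))  # 11 9
print(n_candidates, n_fits)            # 99 297
print(max(C_range), max(gamma_range))  # 100000000.0 1000.0
```

That is 297 SVC fits, plus one more because GridSearchCV refits the best model on all the data by default. The fits at the very large C values are probably the slowest ones, though that depends on the data.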
But I found a reference (listed at the end of this post) that says an exhaustive grid search is time consuming. It suggests using a coarse search first and then a fine search once you are in the right region.
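A rough sketch of that coarse-then-fine idea with scikit-learn follows. It is not the code used for this post: the data is a small synthetic stand-in from make_classification, and the grid ranges and the half-decade window for the fine stage are assumptions chosen so the example runs quickly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Stand-in data (an assumption): 200 synthetic samples, not the data from this post.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
cv = StratifiedKFold(n_splits=3)

# Stage 1: coarse search over a few powers of ten.
coarse_grid = {"C": 10.0 ** np.arange(-1, 3), "gamma": 10.0 ** np.arange(-3, 1)}
coarse = GridSearchCV(SVC(), param_grid=coarse_grid, cv=cv).fit(X, y)
best_C = coarse.best_params_["C"]
best_gamma = coarse.best_params_["gamma"]

# Stage 2: fine search, half a decade on either side of the coarse winner.
# The middle factor is 10**0 = 1, so the coarse winner itself stays in the grid.
fine_grid = {
    "C": best_C * 10.0 ** np.linspace(-0.5, 0.5, 5),
    "gamma": best_gamma * 10.0 ** np.linspace(-0.5, 0.5, 5),
}
fine = GridSearchCV(SVC(), param_grid=fine_grid, cv=cv).fit(X, y)

print("coarse:", coarse.best_params_, round(coarse.best_score_, 3))
print("fine:  ", fine.best_params_, round(fine.best_score_, 3))
```

Each stage here fits only a few dozen models instead of the full 11 by 9 grid, which is the point of the two-stage approach. Because the fine grid contains the coarse winner and the folds are identical, the fine score cannot come out lower than the coarse one.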
So starting from C=10 and gamma = .01, I refined my search and here are the values that I got:
C      gamma    score
10     .01      .898
9      .0095    .898
8      .009     .901
7      .0085    .901
6      .0085    .901
5.5    .0085    .901
5.4    .0085    .901
5.3    .0086    .901
5.28   .0086    .901
You can see that I started by using dictionary values for C with a step of 1 on each side of 10 and a step of .005 on each side of gamma. As the grid search stabilized, I narrowed the step on C to .1 and on gamma to .0005. Gamma was very stable, so I left its step size alone from there. I arbitrarily stopped refining C when its step size reached 0.01.
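The manual procedure above can be sketched as a small loop: score the point one step on each side of the current C and gamma, move to the best neighbor, and narrow the C step once the center wins. This is only an illustration under stated assumptions. The functions neighbors, refine and toy_score are hypothetical, the gamma step of 0.0005 is read off the first rows of the table, and toy_score is a made-up smooth surface standing in for a real cross-validation score.

```python
def neighbors(center, step):
    """One step below, the center itself, and one step above."""
    return [center - step, center, center + step]

def refine(score, C, gamma, C_steps=(1.0, 0.1, 0.01), gamma_step=0.0005, max_moves=1000):
    """Climb on a 3x3 grid around (C, gamma), narrowing the C step when the center wins."""
    for C_step in C_steps:
        for _ in range(max_moves):
            scores = {
                (c, g): score(c, g)
                for c in neighbors(C, C_step)
                for g in neighbors(gamma, gamma_step)
                if c > 0 and g > 0  # SVC needs positive C; gamma is kept positive too
            }
            best = max(scores, key=scores.get)
            if scores[best] <= scores[(C, gamma)]:
                break  # no neighbor is strictly better: move on to the next, smaller C step
            C, gamma = best
    return C, gamma

def toy_score(C, gamma):
    # Made-up surface with a single peak; in practice this would be a mean CV accuracy.
    return -((C - 5.28) ** 2) - 1e6 * (gamma - 0.0086) ** 2

C, gamma = refine(toy_score, C=10.0, gamma=0.01)
print(round(C, 2), round(gamma, 4))  # ends near C=5.28, gamma=0.0085 on this toy surface
```

With a real model, score would wrap something like a cross-validated accuracy for SVC(C=C, gamma=gamma). Each call is then expensive, which is why the sketch stores every score in a dictionary instead of recomputing it.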
When I ran these parameters using my 70/30 split on the data, I got a score of .9166. This is a 0.36% relative improvement (0.33 percentage points) over the previous score of .9133.
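The gap between the two scores can be expressed either in percentage points or relative to the previous score. A quick check of the arithmetic, using only the two scores quoted above (the variable names are illustrative):

```python
previous, tuned = 0.9133, 0.9166

absolute_gain = tuned - previous          # difference in accuracy
relative_gain = absolute_gain / previous  # as a fraction of the previous score

print(f"{absolute_gain * 100:.2f} percentage points")  # 0.33 percentage points
print(f"{relative_gain * 100:.2f}% relative")          # 0.36% relative
```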
Reference: A Practical Guide to Support Vector Classification retrieved from http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf