I figured that only an ensemble method would get me to a higher score, so I started to experiment with these methods. I never managed to come up with an ensemble that even matched my original submission. While I did this, I noticed some things about scikit-learn in Python that made me start to think about looking for other tools.
I decided to try R and RapidMiner. RapidMiner has not been a successful experience. I can't seem to get past the set up repository/import data stage. I have had much more success with R. Most of this is due to a wonderful set of videos by David Mease. If you are interested in learning R for data mining and machine learning, his videos are pure gold. There are 13 videos on YouTube. Not only does he show you how to use R, but he has all the example data sets online so that you can play along. He also does a wonderful job of explaining what benchmarks to use.
David uses a subset of a well-known sonar data set. He uses 130 observations in the training set and 78 observations in the test set. There are 60 features. He goes over several methods with the same data set. I still have one more video to watch, but so far he has covered decision trees, SVM and k nearest neighbors. He uses k nearest neighbors with k=1 as a benchmark. This is the default in R. For this data set, it gives a misclassification rate of 21%. This is better than the decision tree misclassification rate, which is about 30%. But the SVM should be able to beat the untuned k nearest neighbors.
I used this same sonar data set to compare results in R and Python.
K Nearest Neighbors
Misclassification rate for R: 21%
Misclassification rate for Python: could not get this. I set n_neighbors=1, but I got this warning:
C:\Python27\lib\site-packages\sklearn\neighbors\classification.py:131: NeighborsWarning: kneighbors: neighbor k+1 and neighbor k have the same distance: results will be dependent on data order.
  neigh_dist, neigh_ind = self.kneighbors(X)
The default distance in k nearest neighbors is the Euclidean distance, so the data should be scaled so that the variance of each variable is equal. R does this automatically. Python requires you to scale the data yourself.
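To make the scaling point concrete, here is a minimal sketch in plain Python of standardizing each feature and then running a 1-nearest-neighbor lookup. It is not the code used for the results in this post: the toy data and the helper names `standardize` and `predict_1nn` are made up for illustration, and in scikit-learn the same two steps would normally be done with `StandardScaler` and `KNeighborsClassifier`.

```python
import math
import statistics


def standardize(train, test):
    """Scale every column to zero mean and unit variance using training statistics only."""
    columns = list(zip(*train))
    means = [statistics.mean(col) for col in columns]
    # Guard against constant columns, whose standard deviation is zero.
    stdevs = [statistics.pstdev(col) or 1.0 for col in columns]

    def scale(rows):
        return [[(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in rows]

    return scale(train), scale(test)


def predict_1nn(train_X, train_y, query):
    """Return the label of the training row closest to `query` (Euclidean distance)."""
    distances = [math.dist(row, query) for row in train_X]
    return train_y[distances.index(min(distances))]


# Made-up toy data: the second feature has a much larger spread than the first.
train_X = [[0.10, 1000.0], [0.20, 5000.0], [0.90, 1200.0], [0.80, 5200.0]]
train_y = ["rock", "rock", "mine", "mine"]
test_X = [[0.85, 4900.0]]

# Without scaling, the large-valued second feature dominates the distance.
raw_prediction = predict_1nn(train_X, train_y, test_X[0])

# After scaling, both features contribute on the same footing.
scaled_train, scaled_test = standardize(train_X, test_X)
scaled_prediction = predict_1nn(scaled_train, train_y, scaled_test[0])

print("unscaled:", raw_prediction)
print("scaled:  ", scaled_prediction)
```

On this toy data the two predictions differ, which is the kind of effect that makes an unscaled comparison between two tools hard to interpret.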
Decision Tree
The following table shows the results I got:
Depth | R training accuracy | R test accuracy | Python training accuracy | Python test accuracy
1 | .7769 | .7179 | .7769 | .7179
2 | .80 | .7051 | .8077 | .7051
3 | .8615 | .6538 | .8923 | .6667
4 | .8846 | .6923 | .9385 | .7179
5 | .8846 | .6923 | .9846 | .7436
6 | N/A | N/A | 1.0 | .7308
Note that the results are essentially the same for a max depth of 1 and 2 (the test accuracies are identical). As the max depth increases, it looks like scikit-learn gives the better results. However, the test accuracy stays fairly flat for both models while the Python model's training accuracy increases to 1.0. It certainly looks like max depth 4 and 5 in Python have overfit the data. It would be nice to compare a picture of the two trees. The tree in R is quite easy to generate. Python requires some graphics modules that are fairly involved to use. At least, they were for me. I couldn't get either one to work. The R model won't fit max depth 6 because of overfitting issues.
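One rough way to read the overfitting out of the table is to look at the gap between training and test accuracy at each depth. The sketch below only does arithmetic on the numbers reported in the table; the helper name `accuracy_gaps` is made up for illustration.

```python
# Accuracies copied from the table above: depth -> (training, test).
r_tree = {1: (0.7769, 0.7179), 2: (0.80, 0.7051), 3: (0.8615, 0.6538),
          4: (0.8846, 0.6923), 5: (0.8846, 0.6923)}
python_tree = {1: (0.7769, 0.7179), 2: (0.8077, 0.7051), 3: (0.8923, 0.6667),
               4: (0.9385, 0.7179), 5: (0.9846, 0.7436), 6: (1.0, 0.7308)}


def accuracy_gaps(results):
    """Return depth -> (training accuracy - test accuracy), rounded to 4 places."""
    return {depth: round(train - test, 4) for depth, (train, test) in results.items()}


r_gaps = accuracy_gaps(r_tree)
python_gaps = accuracy_gaps(python_tree)

for depth in sorted(python_gaps):
    r_gap = r_gaps.get(depth)  # R has no result for depth 6
    r_text = "N/A" if r_gap is None else f"{r_gap:.4f}"
    print(f"depth {depth}: R gap {r_text}, Python gap {python_gaps[depth]:.4f}")
```

A gap that keeps growing while the test accuracy stays flat is the usual sign that the extra depth is fitting noise in the training set.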
Support Vector Machines
The first thing I did was run a default support vector machine in R and Python. Both programs use an rbf kernel as default.
R scales the data and uses cost=1 and gamma=1/number of features as default values. The untuned SVM gives a misclassification error of 1.5% for the training data and about 13% for the test data.
Python doesn't scale the data and neither did I. (Maybe this is not a fair comparison, but it is an extra step in scikit-learn that isn't required in R.) The Python default values are C=1 (cost=1) and gamma=0. This untuned SVM gives a misclassification error of about 30% for the training data and about 36% for the test data.
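To see why scaling interacts with gamma, it helps to write the rbf kernel out: K(x, y) = exp(-gamma * ||x - y||^2). The sketch below is plain Python with made-up vectors, not the sonar data, and uses gamma = 1/60 to match the 60 features. One caveat, stated as an assumption to check against the documentation for your installed version: in older scikit-learn releases, gamma=0 appears to have been a placeholder meaning "use 1/number of features", in which case the two defaults are the same and the main difference between the runs is the scaling.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and y)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


n_features = 60             # the sonar data set has 60 features
gamma = 1.0 / n_features    # the default the post describes for R

# Hypothetical vectors that differ by the same amount in every feature.
x = [0.2] * n_features
y = [0.3] * n_features
similarity_small = rbf_kernel(x, y, gamma)

# Multiply every feature by 10, as if the data were measured on a wider scale.
x_wide = [v * 10 for v in x]
y_wide = [v * 10 for v in y]
similarity_wide = rbf_kernel(x_wide, y_wide, gamma)

print(f"small-scale features: {similarity_small:.4f}")
print(f"wide-scale features:  {similarity_wide:.4f}")
```

With gamma held fixed, the same pair of points looks almost identical at one scale and clearly different at another, so an SVM fit on unscaled data is effectively using a different kernel width than one fit on scaled data.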
In addition to the questions I have about how scikit-learn models fit the data, there is the additional problem of categorical data. R usually recognizes categorical data. If it doesn't, you can set a variable to be categorical and R will know how to handle it. Python requires you to transform your own categorical data, and it is a kludgy process. There is a class called OneHotEncoder, but you can't run it unless you first transform all of your text data to numeric values.
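The two-step process described here (text to integer codes, then codes to indicator columns) can be sketched in plain Python. The column values and the helper names `fit_categories` and `one_hot` are hypothetical and only illustrate the idea. More recent scikit-learn releases (0.20 and later, as far as I know) let `OneHotEncoder` take string columns directly, which removes the first step.

```python
def fit_categories(values):
    """Map each distinct text value to an integer code, in sorted order."""
    return {value: code for code, value in enumerate(sorted(set(values)))}


def one_hot(values, codes):
    """Turn each value into a list of 0/1 indicators, one per known category."""
    rows = []
    for value in values:
        row = [0] * len(codes)
        row[codes[value]] = 1
        rows.append(row)
    return rows


# Hypothetical categorical column.
colors = ["red", "green", "blue", "green"]

codes = fit_categories(colors)    # step 1: text -> integer codes
encoded = one_hot(colors, codes)  # step 2: codes -> indicator columns

for color, row in zip(colors, encoded):
    print(color, row)
```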
I still have a lot to learn about machine learning in R. But from what I've seen so far, I think I'll stick to R when I want to run a machine learning algorithm.