11/14/2022

Kaggle competition spelling corrector

This week I participated in the Porto Seguro Kaggle competition. Basically, you're asked to predict a binary variable - whether or not an insurance claim will be filed - based on a bunch of numerical and categorical variables. With over 5000 teams entering, it was the largest Kaggle competition ever. I guess this is because it's a fairly well-understood problem (binary classification) with a reasonably sized dataset, making it accessible to beginning data scientists.

Kaggle is a machine learning competition platform filled with thousands of smart data scientists, machine learning experts, and statistics PhDs, and I am not one of them. Still, I was curious to see how my relatively simple tools would fare against the sophisticated techniques on the leaderboard.

The first thing I tried was logistic regression. All you had to do was load the data into memory, invoke the glm() function in R, and output the predictions. Initially my logistic regression wasn't working properly and I got a negative score. It took a day or so to figure out how to do logistic regression properly, which got me a score of 0.259 on the public leaderboard.

Next, I tried gradient boosted decision trees, which I had learned about in a stats class but never actually used before. In R, this is simple - I just needed to change the glm() call to gbm() and fit the model again. It was near the end of the competition, so I stopped here.

At this point, the top submission had a score of 0.291, and 0.288 was enough to get a gold medal. Yet despite being within 10% of the top submission's score, I was still in the bottom half of the leaderboard, ranking in the 30th percentile.

Figure: Public leaderboard of the Porto Seguro Kaggle competition two days before the deadline. The green line is my submission, scoring 0.265.

This graph illustrates the nature of this competition. At first, progress is easy, and pretty much anyone who submitted anything other than "predict all zeros" got over 0.200. From there, you make steady, incremental progress until about 0.280 or so, but beyond that, any further improvement is limited.
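A note on how a score can go negative. The post does not name the scoring metric, but the Porto Seguro competition was evaluated with the normalized Gini coefficient, which ranges from about -1 (ranking exactly backwards) through 0 (random) to 1 (perfect ranking). The sketch below is a minimal Python version of that metric, not the R code used in the post; the function names `gini` and `normalized_gini` and the toy data are illustrative assumptions. It shows one plausible way to end up below zero: submitting scores that rank policyholders in the wrong direction, for example the probability of "no claim" instead of "claim".

```python
def gini(actual, predicted):
    """Unnormalized Gini: rank rows by predicted score (highest first)
    and measure how quickly the actual positives accumulate."""
    n = len(actual)
    total = sum(actual)
    if n == 0 or total == 0:
        raise ValueError("need at least one row and one positive label")
    # Stable sort: rows with equal scores keep their original order.
    order = sorted(range(n), key=lambda i: -predicted[i])
    cumulative = 0.0
    area = 0.0
    for i in order:
        cumulative += actual[i]
        area += cumulative / total
    # Subtract the area expected from a random ordering.
    area -= (n + 1) / 2.0
    return area / n


def normalized_gini(actual, predicted):
    """Scale so that a perfect ranking scores 1.0."""
    return gini(actual, predicted) / gini(actual, actual)


# Toy data (hypothetical): 1 = claim filed, 0 = no claim.
labels = [0, 0, 1, 0, 1]
claim_prob = [0.1, 0.2, 0.9, 0.3, 0.8]      # ranks both claims first
flipped = [1.0 - p for p in claim_prob]     # same model, wrong direction

print(normalized_gini(labels, claim_prob))  # 1.0
print(normalized_gini(labels, flipped))     # -1.0
```

Because the metric depends only on the ordering of the predictions, rescaling the scores changes nothing, while reversing their order flips the sign. Whether that was the actual cause of the negative score in the post is not stated.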