DSGeek

February 28, 2019

Kaggle Toxic Comments Competition

One of my goals for 2019 is to resume competing on Kaggle. Thinking about this made me realize that I never posted about my solution for last year’s Toxic Comment Challenge. The competition’s goal was to train a model to detect toxic comments such as threats, obscenity, insults, and identity-based hate. The data set consisted of comments from Wikipedia’s talk page edits. The training data set had ~500K examples, each with one or more of the following labels:

Tags: NLP, Kaggle, TensorFlow, DeepLearning
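Because a comment can carry more than one label at once, this is a multi-label problem rather than a single-choice classification. The sketch below shows one common way to set that up, not necessarily the approach used in the solution. Each comment's tags become a 0/1 ("multi-hot") vector, and every label is scored as an independent yes/no decision with binary cross-entropy. The label names in `LABELS` are hypothetical and only mirror the toxicity types listed above. `multi_hot` and `multilabel_bce` are illustrative helpers, not a library API.

```python
import math

# Hypothetical label names mirroring the toxicity types described in the post;
# the competition's actual label set may differ.
LABELS = ["threat", "obscene", "insult", "identity_hate"]

def multi_hot(tags):
    """Encode a comment's set of tags as a 0/1 vector, one slot per label."""
    return [1.0 if label in tags else 0.0 for label in LABELS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def multilabel_bce(logits, targets, eps=1e-7):
    """Mean binary cross-entropy over labels. Each label gets its own sigmoid,
    so one comment can be predicted positive for several labels at once."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = min(max(sigmoid(z), eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(targets)

y = multi_hot({"obscene", "insult"})
print(y)  # [0.0, 1.0, 1.0, 0.0]
# An uninformative model (all logits 0, i.e. p = 0.5) scores ln 2 per label.
print(round(multilabel_bce([0.0, 0.0, 0.0, 0.0], y), 4))  # 0.6931
```

Using per-label sigmoids instead of a single softmax is what allows the outputs to overlap; a softmax would force the label probabilities to compete and sum to one.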

February 19, 2018

Creating TF-IDF Weighted Word Embeddings

Although I got a late start, I’ve been working on the Kaggle Toxic Comment Challenge. The dataset contains only about 560K comments. Before trying a deep learning model, I was curious to see how well a relatively simple approach might do. At the same time, I still wanted to use word embeddings to help the model generalize to unseen text. The Coursera Deep Learning Sequence Models class described summarizing a document by averaging the embedding vectors of its words.

Tags: NLP, Kaggle
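A plain average treats every word equally, so common words like "you" or "are" pull the document vector as hard as rare, informative ones. Weighting each word's embedding by its TF-IDF score before averaging is one way to counter that. The sketch below is a minimal illustration of the idea under stated assumptions, not the post's actual code. It assumes tokenized documents and a `dict` of pre-trained embeddings, and it uses the smoothed IDF variant `ln((1 + n) / (1 + df)) + 1` (the one scikit-learn uses by default); other IDF formulas work just as well. `idf_weights` and `tfidf_doc_vector` are hypothetical helper names.

```python
import math
from collections import Counter

def idf_weights(docs):
    """Smoothed inverse document frequency for every token in the corpus."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}

def tfidf_doc_vector(doc, embeddings, idf):
    """Average a document's word embeddings, weighting each word by
    term frequency x IDF. Words without an embedding or IDF are skipped."""
    dim = len(next(iter(embeddings.values())))
    acc = [0.0] * dim
    total = 0.0
    for tok, tf in Counter(doc).items():
        if tok not in embeddings or tok not in idf:
            continue
        w = tf * idf[tok]
        for i, x in enumerate(embeddings[tok]):
            acc[i] += w * x
        total += w
    if total == 0.0:
        return acc  # no known words: fall back to the zero vector
    return [a / total for a in acc]

# Toy corpus and made-up 2-d embeddings, purely for illustration.
docs = [["you", "are", "great"], ["you", "are", "awful"], ["awful", "awful", "day"]]
emb = {"you": [1.0, 0.0], "are": [0.0, 1.0], "great": [1.0, 1.0],
       "awful": [-1.0, -1.0], "day": [0.5, 0.5]}
idf = idf_weights(docs)
print(tfidf_doc_vector(docs[0], emb, idf))
```

In this toy corpus "great" appears in only one document, so it gets a larger IDF than "you" and "are". The resulting vector (about 0.698 in each component) therefore leans further toward "great" than the plain average (0.667) does. The fixed-length vectors can then be fed to any simple classifier.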

February 8, 2018

Kaggle TensorFlow Speech Challenge

Over the holidays, I competed in the Kaggle TensorFlow Speech Recognition Challenge. The competition’s goal was to train a model to recognize ten simple spoken words using Google’s Speech Commands dataset. This is essentially the trigger-word detection problem: deciding when a voice-activated personal assistant should start paying attention. This post briefly documents my solution, written in Python and TensorFlow. As usual, the sources are available on my GitHub repo.

October 31, 2017

Silent but not Idle

It’s been quite some time since my last post. My silence hasn’t been because of idleness. Quite the opposite. Most of my time this year has been dedicated to gaining a strong understanding of deep learning. I built a solid theoretical foundation by reading Ian Goodfellow’s wonderful book Deep Learning and watching lecture videos from Stanford’s CS231n, Convolutional Neural Networks for Visual Recognition. I rounded out my knowledge by reading various papers, watching talks on YouTube, etc.

Tags: Meta, Coursera, DeepLearning, PGMs

January 29, 2016

Oh Holiday Spherical Quad Tree

This past holiday season, I once again spent my “time off” working on a solution to a Kaggle holiday challenge. This year, Santa needed help after his magic sleigh was stolen! Instead of delivering all the presents in one trip, Santa was forced to make multiple trips using his non-magical sleigh. The sleigh had a weight limit, and the poor reindeer were in danger of exhaustion from all the trips. Santa needed a plan for delivering the presents that minimized reindeer weariness.

Tags: Kaggle, DataScience
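Any plan for the trips needs a way to score a single trip. Below is a minimal sketch of one such cost function. It rests on assumptions that the excerpt does not state: gift locations are latitude/longitude points on a spherical Earth, so legs are measured with the haversine great-circle distance. Every trip starts and ends at the North Pole. Each leg costs its distance times the weight carried on that leg, meaning the undelivered gifts plus the empty sleigh. The constants `SLEIGH_WEIGHT` and `WEIGHT_LIMIT` and the helper names are assumptions for illustration. Check the competition's own rules for the exact scoring.

```python
import math

EARTH_RADIUS_KM = 6371.0   # assumed mean Earth radius
NORTH_POLE = (90.0, 0.0)   # (latitude, longitude) in degrees
SLEIGH_WEIGHT = 10.0       # assumed weight of the empty sleigh
WEIGHT_LIMIT = 1000.0      # assumed maximum gift weight per trip

def haversine(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def trip_weariness(stops):
    """Cost of one trip: leave the North Pole, deliver gifts in the given
    order, then fly home. Each leg costs distance x weight on board.
    `stops` is a list of ((lat, lon), gift_weight) pairs."""
    load = sum(w for _, w in stops)
    if load > WEIGHT_LIMIT:
        raise ValueError("trip exceeds the sleigh's weight limit")
    total, here = 0.0, NORTH_POLE
    carried = load + SLEIGH_WEIGHT
    for loc, weight in stops:
        total += haversine(here, loc) * carried
        carried -= weight  # gift delivered, sleigh gets lighter
        here = loc
    # Only the empty sleigh makes the return leg.
    total += haversine(here, NORTH_POLE) * SLEIGH_WEIGHT
    return total

# One 5-unit gift on the equator: 15 units out, 10 units back.
print(trip_weariness([((0.0, 0.0), 5.0)]))
```

Because weight is charged on every leg, the order of stops matters even within a single trip. Dropping heavy gifts early lightens the remaining legs, and grouping nearby gifts into the same trip avoids long, heavily loaded flights from the pole.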