June 20, 2014

Learning About the Higgs

One of my favorite things about Data Science is the opportunity it provides to learn new things. When I work on a problem, I like having the domain knowledge needed to understand the data, create new features, and identify unexpected insights. Diving into a new topic makes the whole experience more exciting, and more rewarding when the results turn out to be useful.

So, when I learned about the Kaggle Higgs Boson Machine Learning Challenge, I nearly leaped out of my chair with excitement. I’ve always been a physics geek and eagerly watch shows on quantum physics, relativity, and the like. But I’ve never had a reason to learn about these topics in greater depth. Not only does the Kaggle challenge provide compelling motivation, but I also get to play with data!

I started with the technical documentation provided by the challenge’s sponsors from the ATLAS project. I studied the paper during a two-hour flight to Chicago and came away with a rough understanding of the Higgs decay channels involved in the challenge.

Next, I found a series of Introduction to Particle Physics YouTube videos by Professor Frank Close that really improved my understanding. During a recent trip to the bookstore, I picked up a copy of Brian R. Martin’s book, Particle Physics: A Beginner’s Guide. It is well written and goes into more depth on relevant topics. I’m now at a point where I feel comfortable with the data’s features and ready to explore them.

Some might ask if all this learning is necessary. An ensemble of models could perhaps find the key predictive relationships without requiring domain knowledge. But I have concerns about this “kitchen sink” approach to Data Science. I think human creativity is still necessary in the data science process to uncover truly new and valuable insights, and that creativity, in turn, requires some domain knowledge.

Regardless, my love of learning new things will always motivate me to obtain a degree of domain knowledge while working on data science projects. Only time will tell if this increases or decreases my productivity and effectiveness. Either way, I’m sure I’ll have fun.

Tags: Kaggle, DataScience