January 29, 2016

Oh Holiday Spherical Quad Tree

This past holiday season, I once again spent my “time off” working on a solution to a Kaggle holiday challenge. This year, Santa needed help after his magic sleigh was stolen! Instead of delivering all the presents in one trip, Santa was forced to make multiple trips using his non-magical sleigh. The sleigh had a weight limit and the poor reindeer were in danger of exhaustion from all the trips. Santa needed a plan for delivering the presents that minimized reindeer weariness.
Read More…

Tags: Kaggle, DataScience

November 3, 2015

Programmatically visualizing fall

For the Whenology project, I needed to visually describe key concepts related to leaf color change during fall. The obvious answer was to simply draw some trees! But how? I knew that D3js could import SVG files, so the first step was to draw a tree. I started with a public domain picture of a leaf and used Adobe Illustrator’s Image Trace feature to turn it into an SVG outline.
Read More…

Tags: D3js, Visualization, SocialGood

October 2, 2015

Having too much fun with climate analytics

I like to solve hard problems. So much so that when I get a particularly good one, it’s often difficult to do anything else. This personality quirk has been great for my engineering career but not so good for consistent blogging. Maybe you’ve noticed. The good news is that I’m finally coming up for air and have lots to blog about. Since February, I’ve been collaborating with scientists from Acadia National Park, the Earthwatch Institute, and the Schoodic Institute to study how climate change may be altering the life cycles of, and interactions between, species at the park.
Read More…

Tags: DataScience, SocialGood, R, Python, Projects, Spark, Visualization

January 29, 2015

Skewing Around, Part 3

In parts 1 and 2, I described how I used the AMS and HyperLogLog online algorithms to estimate the frequency distribution of values in multiple integer sequences. I was doing this for a friend who needed a compute- and space-efficient way to analyze thousands of such streams. Although the solution I came up with in Part 2 worked, it didn’t express the “skew” in human-understandable terms like “20% of the values accounted for 80% of the occurrences”.
Read More…

Tags: MachineLearning, Streams

December 12, 2014

Skewing Around, Part 2

Where did November go? Or the first half of December, for that matter? In Part 1, I began describing a project to estimate the skew of many integer streams in real time. I explained how the second frequency moment, also called the surprise number, can be used for this purpose and demonstrated using the AMS algorithm to estimate it with minimal compute and memory resources. However, the post concluded with the observation that it is difficult to compare second frequency moments between streams.
Read More…

Tags: MachineLearning, Streams, Julia