January 29, 2016

Oh Holiday Spherical Quad Tree

This past holiday season, I once again spent my “time off” working on a solution to a Kaggle holiday challenge. This year, Santa needed help after his magic sleigh was stolen!

Instead of delivering all the presents in one trip, Santa was forced to make multiple trips using his non-magical sleigh. The sleigh had a weight limit and the poor reindeer were in danger of exhaustion from all the trips. Santa needed a plan for delivering the presents that minimized reindeer weariness. Poor little guys. You can find out more about the challenge in this post by the designer.

This was essentially a capacitated vehicle routing problem. I approached it by clustering presents based on geographic distance. To avoid an \(O(N^2)\) computation, I needed a way to efficiently determine a present’s closest neighbors. I ended up using a form of spherical quad-tree (SQT), or quaternary triangular mesh, based on a coordinate transformation method described in this paper.

Given latitude and longitude locations, the SQT maps each of them to an octahedron face that serves as an (initially poor) approximation of the globe. The octahedron is then tessellated and the points mapped to the appropriate triangles. When the tessellation is done, locations close to each other get mapped to the same triangle. This substantially reduces the number of locations to consider when clustering. My SQT implementation is extremely simple - all points must be added before the tessellation is done and it doesn’t support hierarchical querying. But I think the implementation can be extended to be a general purpose SQT without much effort.

To make sure I got the SQT math right, I wrote a small prototype in Processing. The following animation visualizes the initial octahedron, two tessellations, and the successive mapping of a test point. Debugging the math visually made getting the SQT right much, much easier.

In prior Kaggle competitions, I ran into run-time issues with solutions implemented in R, Python, and Julia. This time I decided to try Go. I was very happy with this choice. Execution time was very short and I liked Go’s minimalist syntax. In fact, I suspect it will become my new “system” programming language of choice.

I’ve posted the Go and Processing source code to my GitHub account.

My ego wishes I had done better on the leader board. But, as is usually the case, I learned a lot from working on this challenge. One of the things I like best about data science is that it offers opportunities to learn new things like SQTs. It’s hard to put into words how much I enjoy that.

Now, onto the next competition…

Tags:  Kaggle , DataScience