DSGeek

July 15, 2013

Coursera Machine Learning Class

I completed the Coursera Machine Learning class a couple of weeks ago. It was good. I’m looking forward to taking the full Stanford class this fall to complete the Mining Massive Data Sets Graduate Certificate program.

I found the workload very manageable. The programming assignments required some thought but often ended up being single lines of Octave code. The review questions were well done and reinforced the concepts.

Although I had seen much of the material before, I got three valuable insights from the class:

Neural networks that use sigmoid activation functions are essentially logistic regression models that find their own features. This simple explanation instantly clarified NNs for me.
Learning curves can be used to determine whether a model suffers from high bias or high variance. I’ve read a lot about the bias/variance tradeoff, but this was the first time I’d seen a method for determining a model’s disposition.
Only high-variance models benefit from adding more data. I’ve often read in “Big Data” circles that adding more data improves prediction accuracy. This only works, however, with high-variance models; a high-bias model underfits no matter how much data it sees. Essentially, the large volume of data drives the model far enough down the learning curve to prevent overfitting. This is obvious in retrospect but probably would have taken some thought and experimentation to realize on my own.
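
The last two points can be illustrated with a small sketch. This is not code from the class (the assignments were in Octave); it is a plain-Python toy under my own assumptions: a made-up dataset of noisy samples of sin(2πx), a deliberately high-bias model that always predicts the training mean, and a deliberately high-variance model (1-nearest-neighbor). All function names here are hypothetical. It prints training and validation error as the training set grows, which is all a learning curve is.

```python
import math
import random

def make_data(n, rng):
    """Noisy samples of sin(2*pi*x) on [0, 1] (a made-up toy dataset)."""
    xs = [rng.random() for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.1) for x in xs]
    return xs, ys

def mse(predict, xs, ys):
    """Mean squared error of a prediction function on (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_mean(xs, ys):
    """High-bias model: always predicts the mean of the training labels."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_1nn(xs, ys):
    """High-variance model: copies the label of the nearest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def learning_curve(fit, sizes, seed=0):
    """Return (m, training error, validation error) for each training size m."""
    rng = random.Random(seed)
    train_x, train_y = make_data(max(sizes), rng)
    val_x, val_y = make_data(500, rng)
    rows = []
    for m in sizes:
        model = fit(train_x[:m], train_y[:m])
        train_err = mse(model, train_x[:m], train_y[:m])
        val_err = mse(model, val_x, val_y)
        rows.append((m, train_err, val_err))
    return rows

SIZES = [10, 50, 200, 800]
mean_curve = learning_curve(fit_mean, SIZES)
nn_curve = learning_curve(fit_1nn, SIZES)

for name, curve in (("mean (high bias)", mean_curve),
                    ("1-NN (high variance)", nn_curve)):
    print(name)
    for m, train_err, val_err in curve:
        print(f"  m={m:4d}  train={train_err:.3f}  val={val_err:.3f}")
```

In this toy setup the mean predictor's training and validation errors should sit close together at a high value, the signature of high bias, and more data should not move them much. The 1-NN model has zero training error with a gap to its validation error, the signature of high variance, and that gap should narrow as the training set grows.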

These insights represent the real value of MOOCs to me: they are convenient and free opportunities to see familiar material presented in different ways, which often leads to new insights and deeper understanding. Although they are time-consuming, the promise of such insights motivates me to keep taking data science MOOCs.

Tags: Coursera, MachineLearning