InsightaaS: Anyone who has read the Big Data coverage on this site or my book on cloud will know that I have a lot of time for Paul Lewis. Now CTO of HDS Canada, Lewis combines deep insight into Big Data and related issues with a genuine and humorous way of structuring his point of view. Today's featured post displays both the insight and the humour: working from a scenario that parents and consumers will appreciate (the way in which children suddenly exceed their parents in understanding virtually everything, the mystery inherent in the recommendation engines found in sites like Amazon and Netflix), Lewis is able to illustrate some key machine learning concepts.
In the post, Lewis describes four important inputs to machine learning. The first is training data, the information set used to provide the machine learning system with a framework for interpreting future inputs. The selection of this training data is important; Lewis says here that "We hope that by choosing training data that is sufficiently representative of all of the possible data that may be encountered that we will be able to train a program to make useful predictions or decisions about data outside of the training set."
From this point, Lewis describes three ways of using data: regression (broadly, 'given all previous answers, what is the most likely next answer to whatever it is we're evaluating?'), classification (assigning all previous points to groups, and trying to accurately connect the next data point to one of those groups), and decision trees, which follow patterns that humans can understand. That readability matters in scenarios where the outcome needs to be explained (Lewis uses loan approvals as an example).
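As a rough illustration of the first two ideas (this sketch is not taken from Lewis's post), the Python below fits a straight line to past answers to guess the next one, then assigns a new data point to whichever group has the closest average. The function names `fit_line` and `nearest_centroid` and all of the numbers are hypothetical, chosen only to make the example small.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar

def nearest_centroid(groups, point):
    """Assign point to the group whose average value is closest to it."""
    centroids = {label: mean(values) for label, values in groups.items()}
    return min(centroids, key=lambda label: abs(centroids[label] - point))

# Regression: made-up hours watched in weeks 1-4, used to guess week 5.
slope, intercept = fit_line([1, 2, 3, 4], [2.0, 4.0, 6.0, 8.0])
next_week = slope * 5 + intercept

# Classification: made-up running times (minutes) already sorted into groups.
groups = {"short": [20, 25, 30], "feature": [90, 110, 120]}
label = nearest_centroid(groups, 100)

print(next_week, label)
```

Real systems use far more data and richer models, but the division of labour is the same: regression predicts a number, classification predicts a group.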
In the post, Lewis writes, "Remember that 'data is your greatest asset' story you keep telling your clients? Much like an oil refinery turns crude oil rich in potential energy into all sorts of useful products, machine learning (a data refinery perhaps?) turns data rich in potential value into insights." To extend the analogy, most of us likely understand a refinery's inputs and outputs better than we understand the details of the complex process that occurs within the plant. The same is true of machine learning, but Lewis has at least provided a synopsis of the process behind the recommendations that spill from our screens.
It was much easier when I was smarter than my kids.
I could rattle off a number of inaccurate responses to common childhood questions like “why is the sky blue?”: “Because I said so, now get out of the bathroom”, and so on.
Unfortunately schooling and the Internet and their mother have taken away the vast majority of my enjoyment by providing correct and accurate answers, and even the cognitive means to deduce the answers for themselves. They have turned their new-found wisdom into insolent sarcasm, mostly directed at me. Damn them.
Now I get responses like these:
- When demonstrating a complex Minecraft world to me: “I can explain it to you but I can’t understand it for you”
- When I make a mistake but say “I meant to do that”: “What he lacks in common sense, he makes up for in self-esteem”
- When I explain what I do for a living to a family member: “Don’t fall for it <insert name>, he’s just making up words”
- When someone asks why they are so quiet: “I’m not shy, I’m just holding back my awesomeness so I don’t intimidate you”
Even recently, when I (and by “I”, I mean an external paid professional) installed a new whole-home cable system, which started providing content “recommendations” within the first few days, they were the first to recognize its sophistication. My conclusion that Eve (our Elf on the Shelf) evaluated our watching habits during the day and then called the cable company to inform them our family should be watching more educational programming was summarily dismissed. It seemed like a reasonable assessment of the situation to me.
In unison, I was corrected: “The new feature is actually an implementation of a Recommender System, which is just one example of a Machine Learning algorithm. Obviously. Considering we didn’t fill out an online profile of the types of programming we enjoy, or our watching habits, or even just simple demographics, we should conclude that they are using a Latent Factors or Collaborative Filtering system to learn about our viewing habits. Duh.”
My eyes, equal parts glazed and teary, forced upon them a requirement to explain in further detail.
Out came the whiteboard. Apparently I was about to get schooled. Imagine a series of eye rolls while I’m given this detailed explanation:
Machine Learning is how computers learn to use data to make (hopefully) good predictions or decisions. In the case of movie recommendations, a machine learning program comes to learn our preferences based on the movies we watch, buy, and rate. The program then uses this accumulated knowledge to make recommendations about content we haven’t yet watched. In fact, this is also how sites like Amazon.com make their “customers like you also bought/read/looked at” recommendations.
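For the whiteboard-averse, here is a minimal sketch of the collaborative filtering idea described above, assuming a tiny set of made-up viewers and ratings. It is not how any particular cable company, Amazon, or Netflix does it; the names `ratings`, `cosine_similarity`, and `recommend` are hypothetical. Viewers who rated the same titles similarly count as "like you", and their ratings of titles you have not seen become your recommendations.

```python
from math import sqrt

# Hypothetical ratings (1-5) keyed by viewer, then by title; all names are made up.
ratings = {
    "ana":   {"Space Docs": 5, "Robot Wars": 4, "Baking Show": 1},
    "ben":   {"Space Docs": 4, "Robot Wars": 5, "Nature Hour": 5},
    "chloe": {"Space Docs": 1, "Baking Show": 5, "Garden Tips": 4, "Nature Hour": 2},
}

def cosine_similarity(a, b):
    """Cosine similarity computed over the titles both viewers have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = sqrt(sum(a[t] ** 2 for t in shared))
    norm_b = sqrt(sum(b[t] ** 2 for t in shared))
    return dot / (norm_a * norm_b)

def recommend(viewer, ratings, top_n=3):
    """Rank unseen titles by the similarity-weighted average of other viewers' ratings."""
    seen = ratings[viewer]
    weighted_sum, weight_total = {}, {}
    for other, other_ratings in ratings.items():
        if other == viewer:
            continue
        sim = cosine_similarity(seen, other_ratings)
        if sim <= 0:
            continue  # no overlap with this viewer, so nothing to learn from them
        for title, score in other_ratings.items():
            if title in seen:
                continue  # only recommend content not yet watched
            weighted_sum[title] = weighted_sum.get(title, 0.0) + sim * score
            weight_total[title] = weight_total.get(title, 0.0) + sim
    predicted = {t: weighted_sum[t] / weight_total[t] for t in weighted_sum}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]

print(recommend("ana", ratings))  # ['Nature Hour', 'Garden Tips']
```

Ben's tastes overlap with Ana's far more than Chloe's do, so his enthusiasm for "Nature Hour" outweighs Chloe's lukewarm rating. Notice that nobody filled out a profile: the preferences are inferred entirely from the ratings themselves.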
In a Big Data world, you can think of Machine Learning as the forward-looking cousin to Data Mining...