There has been quite a lot of talk about machine learning and artificial intelligence in the media, a trend that is all but guaranteed to continue. Machine learning implementations are everywhere: from consumer-facing examples like Google searches, Siri speech recognition, and spam filtering, to sophisticated commercial technologies like unmanned drones and self-driving cars. Machine learning has infiltrated our daily lives, although not quite in the ways portrayed by Hollywood.

People have very different feelings about the imminence of artificial intelligence (AI). An Observer article reported that technology cognoscenti Stephen Hawking, Bill Gates, and Elon Musk are deeply concerned about the risks of artificial intelligence, with Musk calling AI “our greatest existential threat”. At the same time, machine learning is revolutionizing the diagnosis and treatment of disease. Check out this paper from the University of Sao Paulo, in which a machine learning algorithm is used to diagnose Alzheimer's disease from electroencephalography (EEG) patterns with both high accuracy and high sensitivity!

Personally, I think Spider-Man said it best when it comes to the development of AI: "With great power comes great responsibility." Specifically, there has got to be some regulatory oversight, both domestic and international, of how we program machines to act intelligently. But rather than launch into an ethical discussion, today I'd simply like to demystify some of the basic concepts of machine learning.

A few data science terms:

- machine learning: a subfield of computer science that involves "teaching" computers to recognize patterns in data and apply those patterns to make decisions.
- supervised learning: the computer "learns" from labeled input data (inputs paired with known outputs) and builds a model that can be used to make predictions on similar, new data.
- unsupervised learning: the computer has no labels for the input data, so it must discover structure in the data on its own (for example, by grouping similar data points).
- cognitive computing: a computer system that seeks to mimic the human thought process. Cognitive computing uses machine learning algorithms, natural language processing, and data mining to continually learn, adapt to new problems, and model solutions.
- artificial intelligence: a broad field of computer science concerned with the intelligence exhibited by computers, encompassing learning, representation, reasoning, and abstract thinking.

Some common machine learning tasks, by output:

- classification: the computer assigns new data to one of a set of known categories (e.g., is this incoming email spam? yes or no), usually after learning from labeled input data.
- clustering: the computer divides data into groups without knowing in advance what the groups are.
- regression: a statistical technique for determining the relationship between a dependent variable and one or more independent variables.
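To make the regression definition a little more concrete, here is a minimal Python sketch of its simplest form: one independent variable fit with ordinary least squares, where the slope is the covariance of x and y divided by the variance of x. Ordinary least squares is a standard choice rather than something this post specifies, and the function name and the spend/sales numbers are made up purely for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (one independent variable)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared x-deviations; their ratio equals
    # cov(x, y) / var(x), because the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: average monthly ad spend (x) vs. sales (y), invented for illustration.
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]
slope, intercept = fit_simple_linear_regression(spend, sales)
print(f"sales = {intercept:.2f} + {slope:.2f} * spend")  # prints: sales = 1.09 + 1.97 * spend
```

Here sales is the dependent variable and ad spend the single independent variable; with more independent variables the same idea generalizes to multiple regression.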
To start, let's zoom out and see where machine learning sits within data science. The Venn diagram above breaks data science down into a few easy-to-digest concepts. Computer science can be viewed as one broad skill set within data science, alongside math & statistics and subject matter expertise. At the intersections of these skill sets are subfields. For example, my graduate research in electrophysiology falls under traditional research: I used my subject matter expertise in the heart to design and perform experiments, and by applying statistical techniques to the results, I was able to show that my findings were significant. Machine learning, by contrast, lies at the intersection of computer science and math & statistics: it relies on a computer program's ability to apply statistics to problems. (Side note: a "unicorn" data scientist is the holy grail of data scientists, with expertise across all three areas, and is rumored to earn over $200k on average.)
Machine learning certainly sounds like an intimidating term, but let's start with a very simple example: k-means clustering. In this example, let's pretend a business wants to group its online customers into cohorts with similar behavior in order to deliver targeted marketing at scale. In practice, a multitude of variables would be gathered on each customer (demographics, online behavior, affiliate connections), but to keep things simple, let's look at just two inputs: how much the customer spends on average, and how environmentally conscious the customer is.
The following infographic, adapted from a post by Ben McRedmond, walks through the basic steps of k-means clustering.
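For readers who prefer code to pictures, here is a minimal Python sketch of the basic k-means loop, assuming the standard version of the algorithm: place k starting centers at random (here, by picking k random customers, which is one common choice), assign each customer to the nearest center, move each center to the average of its customers, and repeat until no assignment changes. The function name, the customer data, and the 0-10 scores for spending and environmental consciousness are all hypothetical.

```python
import math
import random


def kmeans(points, k, seed=0, max_iters=100):
    """Basic k-means on 2-D points; returns (centers, labels), where labels[i] is point i's cluster."""
    rng = random.Random(seed)
    # Random starting placement: k distinct customers chosen at random serve as the first centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins the cluster whose center is nearest (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # no assignment changed, so further iterations would do nothing
        labels = new_labels
        # Update step: move each center to the average of the points now assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # if a cluster ends up empty, leave its center where it was
                centers[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers, labels


# Hypothetical customers: (average spend score, environmental-consciousness score), both on a 0-10 scale.
customers = [
    (1.0, 8.0), (1.5, 9.0), (2.0, 8.5),  # spend little, very eco-conscious
    (8.0, 2.0), (9.0, 1.0), (8.5, 3.0),  # spend a lot, less eco-conscious
]
centers, labels = kmeans(customers, k=2)
print("cluster centers:", centers)
print("cluster labels:", labels)
```

Because k-means works with straight-line distances, both inputs are kept on the same made-up 0-10 scale here; with raw dollar amounts, spending would dominate the distance calculation.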
In the last step, we can see that further iterations will not change the cluster assignments: we have found the clusters! Now the business can market to all of the customers in cluster 1 one way and to all of the customers in cluster 2 in a different, more targeted way.
Some notes about k-means clustering: while k-means clustering will always terminate, the initial random placement of the green points (the starting cluster centers) affects the final clustering. This means we could end up with a suboptimal set of cluster assignments because of that first step. To minimize this risk, k-means is often run several times from different random starting points, keeping the best resulting cluster definitions. Also, as I pointed out above, each cluster center is the average of the points assigned to it, so outliers can skew a cluster by pulling its center toward them.
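The repeat-and-keep-the-best strategy can be sketched in Python as follows. This sketch assumes "best" means the lowest within-cluster sum of squared distances (often called inertia), a common criterion that the post does not name explicitly; the function names and the customer data are hypothetical.

```python
import math
import random


def kmeans_once(points, k, rng, max_iters=100):
    """One run of basic k-means from a random start; returns (centers, labels)."""
    centers = rng.sample(points, k)  # random starting centers drawn from the data
    labels = None
    for _ in range(max_iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so this run has converged
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # leave an empty cluster's center where it was
                centers[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers, labels


def inertia(points, centers, labels):
    """Within-cluster sum of squared distances to each point's center (lower = tighter clusters)."""
    return sum(math.dist(p, centers[label]) ** 2 for p, label in zip(points, labels))


def kmeans_best_of(points, k, n_restarts=10, seed=0):
    """Run k-means n_restarts times from random starts; keep the lowest-inertia run."""
    rng = random.Random(seed)  # one shared generator, so each restart draws a new random start
    best = None
    for _ in range(n_restarts):
        centers, labels = kmeans_once(points, k, rng)
        score = inertia(points, centers, labels)
        if best is None or score < best[0]:
            best = (score, centers, labels)
    return best


# Hypothetical customers forming three loose groups; with k=3, an unlucky start can
# put two centers in one group and leave a single center covering the other two.
customers = [
    (0.0, 0.0), (0.5, 0.3), (0.2, 0.6),
    (5.0, 5.0), (5.4, 4.7), (4.8, 5.3),
    (10.0, 0.0), (9.6, 0.4), (10.3, 0.5),
]
score, centers, labels = kmeans_best_of(customers, k=3)
print(f"best inertia: {score:.2f}")
print("cluster labels:", labels)
```

Libraries typically build this in; scikit-learn's `KMeans`, for example, exposes the number of restarts through its `n_init` parameter. Restarts do not fix the outlier issue, though, since every run still averages whatever points land in a cluster.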
Clustering is used in a wide variety of other fields and applications. In biology, for example, cluster analysis can be used to group genes with similar expression patterns, to make spatial and temporal comparisons of communities, and to distinguish between types of blood and tissue in a three-dimensional image. K-means is just one simple machine learning technique, and machine learning algorithms can get much more complicated, but I hope this example illustrates the basic premise of machine learning: giving computers the ability to solve problems without being explicitly programmed to do so.
Sources:
1. The Observer: Stephen Hawking, Elon Musk, and Bill Gates Warn About Artificial Intelligence
2. Med City News: 4 ways healthcare is putting artificial intelligence, machine learning to use
3. Forbes: The Hunt For Unicorn Data Scientists Lifts Salaries For All Data Analytics Professionals
4. Wikipedia: Machine Learning
5. Blog Intercom: Machine Learning is Way Easier than it Looks
6. Wikipedia: Cluster Analysis
7. Clinical EEG and Neuroscience: Improving Alzheimer's Diagnosis with Machine Learning Techniques