CAPITAL EXPENDITURES

a learning investment in Data Science, entrepreneurship, and Biotech

by Vanessa Mahoney

Feature Engineering vs Learning

6/26/2018

1 Comment

In my field, you hear a lot of buzzwords: artificial intelligence, cognitive computing, neural networks, deep learning, Watson (hehe, gotta include that one). They sure sound good, but what do they really mean? How do they fit together?

I've copied this awesome visual from Deep Learning in Biology that illustrates the relationships between the various AI disciplines. As you can see, each of these disciplines is trying to move from an input to an output: we're trying to build an understanding of the input and make a model or prediction from it. One of the most important steps is figuring out what to look for in those inputs - which features are the most telling, the most important in representing that thing? Machine learning often uses hand-designed features - we humans choose the characteristics - but as we move into representation learning and deep learning, the machine can actually learn the features that are most useful for prediction.
[Image: diagram of the relationships between artificial intelligence, machine learning, representation learning, and deep learning]
In artificial intelligence, a computer or machine produces an output without explicitly being given step-by-step instructions for how to produce that output (as opposed to, say, an Excel macro, where we tell the program exactly what to do). However, the complexity of the instructions we give the machine can vary quite a bit. We could give the machine very specific instructions: perhaps we want a machine to classify documents, so we ingest several documents that have already been classified correctly. Let's say we have a set of documents that are press releases, legal contracts, and reports. We would feed in the text of these documents along with their classifications. The machine learning algorithm would "look" at these labeled documents and "learn" which combinations of features most probably indicate a press release, a legal contract, or a report. Notice that we also told the algorithm which features to use for learning: words. The algorithm would look for key words, and the frequency of those key words, to decide where a new document belongs.
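To make that concrete, here's a minimal sketch of that workflow in Python using scikit-learn (my choice of library - the example documents and labels are invented placeholders). The word counts are exactly the hand-designed features we just talked about.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical labeled training documents (placeholders, not real data)
train_texts = [
    "FOR IMMEDIATE RELEASE: company announces new product launch",
    "The parties hereby agree to the terms set forth in this contract",
    "Quarterly report: revenue grew while operating costs declined",
]
train_labels = ["press release", "legal contract", "report"]

# Hand-designed features: we tell the algorithm to represent each
# document as counts of its words (a bag-of-words).
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# The classifier learns which word counts are most probable
# for each document class.
model = MultinomialNB()
model.fit(X_train, train_labels)

# Classify a new, unlabeled document.
new_doc = ["This agreement is binding upon both parties"]
print(model.predict(vectorizer.transform(new_doc)))
```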

In this example, we told the algorithm what it needed to learn because we provided it with the correct classifications of past data. We also told the algorithm which features to use - words (as well as structure, sentiment, concepts, and other features we didn't talk about). However, we don't tell the algorithm how to classify new documents; instead, the machine learns from the historical labeled data and applies what it learned to classify new documents. The machine produced an output - classified new documents - but we gave it the rules.

As you can imagine, there is great utility in training a machine to produce an output. However, you'll also see that in this example we still had quite a bit of manual intervention: we had to feed in correctly labeled documents, and we had to hand-design the features the machine should use. Machine learning can therefore still be manual and time consuming because of this feature engineering step, especially when hundreds of complicated variables may be necessary to classify, for example, images.
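For images, hand-designing features might look something like the sketch below: a hypothetical extractor that boils each image down to a handful of numbers we picked ourselves (brightness, average color, a rough edge measure). The function and the features are illustrative, not from any particular library - the point is that the algorithm never sees anything we didn't compute here.

```python
import numpy as np

def hand_designed_features(image: np.ndarray) -> np.ndarray:
    """Reduce an H x W x 3 RGB image to a few human-chosen numbers."""
    brightness = image.mean()                            # overall brightness
    channel_means = image.mean(axis=(0, 1))              # average R, G, B
    edges = np.abs(np.diff(image.mean(axis=2))).mean()   # rough edge density
    return np.concatenate([[brightness], channel_means, [edges]])

# A random "image" just to show the shape of the output.
fake_image = np.random.rand(64, 64, 3)
print(hand_designed_features(fake_image))  # 5 numbers per image
```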

However, what if the machine could also learn which features it should use to build models? In a more complex branch of machine learning called representation learning, the algorithm can actually discover the features it should use to build a model. By putting an artificial neural network in place, we can create algorithms that find important features without being told what to look for. Given enough examples, this type of algorithm could decipher the features that make a cat a cat, a dog a dog, and an apple an apple. Deep learning takes it a step further, learning more complex features by first extracting simpler features and combining them (like whiskers + small triangular nose). Deep learning achieves this by building multiple layers (more than two) into a neural network. One area where deep learning algorithms are being used very successfully today is medical image classification: from disease diagnosis, to cell segmentation, to tissue classification, deep learning algorithms have reached expert-level diagnosis and recognition.
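To show what "multiple layers" means in code, here's a minimal sketch of a small convolutional network in PyTorch (my choice of framework; the layer sizes are arbitrary). Note that we never list any features: each layer learns its own filters, with later layers combining what earlier layers found.

```python
import torch
import torch.nn as nn

# A tiny convolutional network: each layer learns its own features,
# so simple patterns (edges, textures) found by early layers get
# combined into more complex ones (shapes, parts) by later layers.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # layer 1: simple features
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # layer 2: combinations
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 3),  # classify into, say, cat / dog / apple
)

# One fake 32x32 RGB image; the untrained network already produces
# a score per class - training would tune every filter above.
scores = model(torch.rand(1, 3, 32, 32))
print(scores.shape)  # torch.Size([1, 3])
```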

Anyway, hope you learned something! Here's an interesting perspective on deep learning, especially as it collides with new data protection regulations. 
1 Comment

Vanessa Mahoney, PhD

Biomedical scientist & data analyst who loves learning how things work - from mortgage-backed securities to cardiac electrophysiology to Donald Trump's comb-over

    The postings on this site are my own and don't necessarily represent IBM's positions, strategies, or opinions. 

    Archives

    December 2018
    November 2018
    June 2018
    December 2017
    June 2017
    April 2017
    September 2016
    July 2016
    June 2016
    May 2016
    February 2016
    January 2016
    November 2015
    September 2015
    August 2015
    June 2015
    May 2015

    Categories

    All
    Biotech/Healthcare

    RSS Feed

